Someone in /flip/ wanted to make wordclouds based on multiple threads, so I wrote a little python script. I figured some of you might be interested as well.
I'm pretty much done with the wordcloud script. There are a few things that could be improved, but I will release it now and maybe go back to it sometime in the future. I use Python 3.12.6, and I'm just going to assume that every 3.x version is going to work.
The script uses Selenium to scrape data from warosu.org and then creates a wordcloud with the Python WordCloud module, so it comes with dependencies.
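For anyone who wants to see roughly what the wordcloud half looks like, here is a minimal sketch. It is not the code from my script: count_words and render_cloud are names made up for this example, the word regex and image size are assumptions, and the Selenium scraping part is left out completely.

```python
import re
from collections import Counter


def count_words(posts):
    """Count how often each lowercase word appears across all posts."""
    counts = Counter()
    for post in posts:
        # Assumption: a "word" is a run of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", post.lower()))
    return counts


def render_cloud(frequencies, out_path):
    """Draw the word frequencies as an image file using the wordcloud package."""
    # Imported here so count_words still works without wordcloud installed.
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)
```

Calling render_cloud(count_words(posts), "cloud.png") with a list of post strings would write the image to cloud.png, assuming the wordcloud package is installed.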
Install Python and then type this in cmd: pip install selenium wordcloud
If you are using Chrome you will also need to install chromedriver. Get the version that matches your Chrome version here:
https://googlechromelabs.github.io/chrome-for-testing/#stable
You might need to set environment variables for chromedriver and Python, ask Google how to do this for your Windows version.
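If you are not sure whether the environment variables are set correctly, a quick check like this might help. It is only a sketch: find_tools is a name I made up, and it assumes the executables are called python and chromedriver on your system.

```python
import shutil


def find_tools(names=("python", "chromedriver")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Print where each tool was found, if it was found at all.
for tool, path in find_tools().items():
    print(f"{tool}: {path or 'not found on PATH'}")
```

If chromedriver shows up as not found, the folder it sits in is probably missing from your PATH.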
Inside the script are some settings. You will have to set a valid path in line 40. Check out the rest of the script while you are at it.
After all that you can run the script. It's CLI only.
You can use the following arguments:
-g specify a general by subject, e.g. flip. This will find the latest thread. Note this might catch other threads that have "flip" in their subject
-thread specify any thread by its thread number, this is the preferred method
-to the last thread that should be included, specified by its thread number
-n the number of threads that should be used for the wordcloud, overrides -to
-e exports the posts as an array, useful for testing wordcloud settings and debugging
-i imports the array, overrides other inputs
-wc counts unique words
-nocloud doesn't create a wordcloud, only useful for debugging
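To show how the arguments above fit together, here is a rough sketch using argparse. This is not necessarily how the script parses them: build_parser and thread_limit are made-up names, and treating -e and -i as plain on/off switches with integer thread numbers is my assumption for the example.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the flags listed above."""
    parser = argparse.ArgumentParser(
        description="Wordcloud from warosu.org threads", allow_abbrev=False
    )
    parser.add_argument("-g", metavar="SUBJECT", help="general to look up by subject, e.g. flip")
    parser.add_argument("-thread", type=int, help="thread number to start from")
    parser.add_argument("-to", type=int, help="last thread number to include")
    parser.add_argument("-n", type=int, help="number of threads to use, overrides -to")
    parser.add_argument("-e", action="store_true", help="export the posts as an array")
    parser.add_argument("-i", action="store_true", help="import the array, overrides other inputs")
    parser.add_argument("-wc", action="store_true", help="count unique words")
    parser.add_argument("-nocloud", action="store_true", help="skip creating the wordcloud")
    return parser


def thread_limit(args):
    """Return which limit applies: -n wins over -to when both are given."""
    if args.n is not None:
        return ("count", args.n)
    if args.to is not None:
        return ("last_thread", args.to)
    # Neither was given; what the real script does here is not covered by this sketch.
    return None
```

With made-up thread numbers, build_parser().parse_args(["-thread", "123", "-to", "100", "-n", "3"]) followed by thread_limit(args) gives the -n limit, because -n overrides -to.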
I hope I didn't break anything with my last-minute changes, just ask if there are any big issues.
https://pastebin.com/3s5adeUe