>>1278485
I started by searching for “list of actors” on Wikipedia and found
https://en.wikipedia.org/wiki/Lists_of_actors.
It’s a list of lists, so I wanted to collect every list it links to except those under “See also”. I inspected some of those lists and saw that many, e.g.,
https://en.wikipedia.org/wiki/List_of_British_actors,
shared a structure. There are other types of pages, but I ignored those because handling them would have been a lot of work. Many of the articles have an infobox with a birthplace div, so I extracted the birthplace from there and skipped pages that lack this element. I extracted the name from the title, the first h1 element, which in hindsight is not the best way of doing it. I did all of this in Python, with BeautifulSoup for the scraping and Firefox’s developer tools to find the elements I needed. I did a little post-processing with common Unix tools and Emacs. The code is here:
https://pastebin.com/79DMMy5Q
Be advised that this code is messy and inefficient, as it was written hastily with little forethought.
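
For a rough idea of the approach without opening the paste, here is a minimal sketch. It is not the pastebin code. The function names, the `/wiki/List_of_` prefix filter, the `table.infobox .birthplace` selector, and the User-Agent string are all assumptions made for illustration, and the step that pulls actor links out of each list page is left out because it depends on the shared page structure.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

BASE = "https://en.wikipedia.org"


def fetch(url):
    """Download a page as text. The User-Agent value is a placeholder."""
    req = Request(url, headers={"User-Agent": "actor-birthplace-sketch/0.1"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


def list_links(index_html):
    """Return links to 'List of ...' pages that appear before 'See also'."""
    soup = BeautifulSoup(index_html, "html.parser")
    links = []
    # Walk headings and anchors in document order so we can stop at "See also".
    for tag in soup.find_all(["h2", "a"]):
        if tag.name == "h2":
            if tag.get_text(strip=True).startswith("See also"):
                break
            continue
        href = tag.get("href", "")
        # Assumed filter: only keep pages whose path starts with List_of_.
        if href.startswith("/wiki/List_of_") and BASE + href not in links:
            links.append(BASE + href)
    return links


def name_and_birthplace(article_html):
    """Return (name, birthplace), or None if the page lacks either element."""
    soup = BeautifulSoup(article_html, "html.parser")
    title = soup.find("h1")  # first h1, used as the name
    place = soup.select_one("table.infobox .birthplace")  # assumed selector
    if title is None or place is None:
        return None  # skip pages without a birthplace element
    # Collapse runs of whitespace inside the birthplace text.
    birthplace = " ".join(place.get_text().split())
    return title.get_text(strip=True), birthplace
```

Usage would be along the lines of `list_links(fetch(BASE + "/wiki/Lists_of_actors"))`, then calling `name_and_birthplace(fetch(url))` on each actor article and dropping the `None` results.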