What would be the easiest way to get all articles about people from Wikipedia? I know I can download a dump of all the pages, but then how do I filter those and get only the ones about people? I need as many as I can get (preferably more than a million) so using any sort of API is probably not an option.
The average person who visits the site spends four minutes and 26 seconds per day browsing the 5.5 million English Wikipedia articles. 1.5 million articles are biographies.
You can download the Wikipedia database directly and parse all pages to XML with Wiki Parser, which is a standalone application. The first paragraph is a separate node in the resulting XML. Alternatively, you can extract the first paragraph from its plain-text output.
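If you would rather filter the dump yourself, a streaming pass over the XML is one possible approach. The following is a minimal sketch, not a definitive implementation. It assumes the standard pages-articles XML export layout (`page`, `title`, `ns`, `text` elements) and that a biography can be recognized by a marker template in its wikitext. The marker used here is Persondata, which another answer suggests; it may be absent from recent dumps, so treat the pattern as a placeholder. The names `PERSON_MARKER`, `local_name`, `iter_person_titles`, and `SAMPLE` are made up for this illustration.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumption: biographies carry a marker template; swap in another pattern as needed.
PERSON_MARKER = re.compile(r"\{\{\s*Persondata\b", re.IGNORECASE)


def local_name(tag):
    """Strip the XML namespace prefix that MediaWiki exports put on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_person_titles(xml_stream):
    """Yield titles of main-namespace pages whose wikitext contains the marker."""
    title, ns, text = None, None, None
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            # ns "0" is the article namespace; talk pages and the like are skipped.
            if ns == "0" and text is not None and PERSON_MARKER.search(text):
                yield title
            title, ns, text = None, None, None
            elem.clear()  # drop the page's children to keep memory use low


# Tiny in-memory stand-in for a dump, used only to show the call.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Ada Lovelace</title><ns>0</ns>
    <revision><text>Bio text {{Persondata |NAME=Lovelace, Ada}}</text></revision></page>
  <page><title>Analytical Engine</title><ns>0</ns>
    <revision><text>A proposed mechanical computer.</text></revision></page>
  <page><title>Talk:Ada Lovelace</title><ns>1</ns>
    <revision><text>{{Persondata}} discussion</text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    print(list(iter_person_titles(io.BytesIO(SAMPLE))))
```

For a real dump you would pass a decompressing file object instead of `io.BytesIO`, for example `bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb")`. Because the parser streams, the whole file never has to fit in memory.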
Wikipedia is not a reliable source for citations elsewhere on Wikipedia. As a user-generated source, it can be edited by anyone at any time, and any information it contains at a particular time could be vandalism, a work in progress, or simply incorrect.
You can create a Wikipedia page about yourself, but you shouldn't. Wikipedia's rules say that you should not create your own page, because doing so would be a conflict of interest. If you decide to write one anyway and it gets taken down, it is very difficult to get a new page because your name will be flagged.
Since articles about people usually contain the Persondata template, you can just search for all articles that contain Persondata. You can find a sample API query for doing just that here:
Does the Wikipedia API support searches for a specific template?
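A query of this kind can be sketched with the MediaWiki API's `list=embeddedin` module, which lists pages that transclude a given template. The code below is only an illustration under a few assumptions: the English Wikipedia endpoint, the template title `Template:Persondata` taken from the answer above (the template may no longer be in use, in which case the query returns little or nothing), and hypothetical helper names `embeddedin_url` and `iter_embedding_titles`. Results are paged, so the loop follows the `eicontinue` token until none is returned.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def embeddedin_url(template, cont=None, limit=500):
    """Build a query URL listing main-namespace pages that embed `template`."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,
        "einamespace": 0,   # articles only
        "eilimit": limit,
        "format": "json",
    }
    if cont:
        params["eicontinue"] = cont  # resume where the previous batch stopped
    return API_URL + "?" + urllib.parse.urlencode(params)


def iter_embedding_titles(template):
    """Yield page titles batch by batch (performs network requests)."""
    cont = None
    while True:
        request = urllib.request.Request(
            embeddedin_url(template, cont),
            headers={"User-Agent": "people-list-sketch/0.1"},  # placeholder agent
        )
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        for page in data["query"]["embeddedin"]:
            yield page["title"]
        cont = data.get("continue", {}).get("eicontinue")
        if not cont:
            break


if __name__ == "__main__":
    # Print only the URL here; iterating would hit the live API.
    print(embeddedin_url("Template:Persondata"))
```

At 500 titles per request, collecting more than a million pages this way takes thousands of round trips, which is why the dump-based route tends to be the better fit for the volume asked about in the question.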