
How do I get all articles about people from Wikipedia?

What would be the easiest way to get all articles about people from Wikipedia? I know I can download a dump of all the pages, but then how do I filter those and get only the ones about people? I need as many as I can get (preferably more than a million) so using any sort of API is probably not an option.

Johnny asked Oct 25 '10 17:10



1 Answer

Since articles about people usually contain the Persondata template, you can just search for all articles that contain Persondata. You can find a sample API query for doing just that here:

Does the Wikipedia API support searches for a specific template?
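
As a rough sketch of what such a query could look like, the snippet below uses the MediaWiki `list=embeddedin` module to list pages that transclude a given template. The template name `Template:Persondata` is taken from the answer above; if the template is not in use on the wiki you query, the result is simply empty and a different marker is needed. The function names, the `User-Agent` string, and the choice of English Wikipedia are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; swap the host for another language edition.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(template, namespace=0, limit=500, cont=None):
    """Build the URL for one page of list=embeddedin results."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,       # e.g. "Template:Persondata"
        "einamespace": namespace,  # 0 = article namespace
        "eilimit": limit,
        "format": "json",
    }
    if cont:
        # Pass back the whole "continue" object from the previous response.
        params.update(cont)
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_titles(response):
    """Pull page titles out of a decoded embeddedin response."""
    return [page["title"] for page in response.get("query", {}).get("embeddedin", [])]


def iter_titles(template):
    """Yield every title transcluding `template`, following continuation."""
    cont = None
    while True:
        request = urllib.request.Request(
            build_query(template, cont=cont),
            # Hypothetical identifier; use one that describes your own script.
            headers={"User-Agent": "people-list-sketch/0.1 (example script)"},
        )
        with urllib.request.urlopen(request) as resp:
            data = json.load(resp)
        yield from extract_titles(data)
        cont = data.get("continue")
        if not cont:
            break


if __name__ == "__main__":
    for title in iter_titles("Template:Persondata"):
        print(title)
```

This only returns titles, at a few hundred per request, so for a million or more articles it is probably more practical to use the resulting list to pick pages out of a downloaded dump than to fetch each article through the API.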

lambshaanxy answered Oct 11 '22 13:10