
How to get all article pages under a Wikipedia Category and its sub-categories?

I want to get the names of all articles under a category and its sub-categories.

Options I'm aware of:

  1. Using the Wikipedia API. Does it have such an option?
  2. Downloading the dump. Which format would be better for my usage?
  3. Searching Wikipedia with something like incategory:"music", but I didn't see an option to get the results as XML.

Please share your thoughts

Noam asked Apr 24 '11 16:04



2 Answers

The CatScan tool can help you get all pages from a category and all of its subcategories:

http://en.wikipedia.org/wiki/Wikipedia:CatScan

The MediaWiki API also has a categorymembers list for this:

https://www.mediawiki.org/wiki/API:Categorymembers

Datageek answered Sep 30 '22 21:09


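You can do this with the following two API requests, where YOUR_URL is the base URL of the wiki.

To list the article pages in a category (without cmtype=page, the result also includes subcategories and files):

YOUR_URL/api.php?action=query&format=json&list=categorymembers&cmtype=page&cmtitle=Category:Music

To list its subcategories:

YOUR_URL/api.php?action=query&format=json&list=categorymembers&cmtype=subcat&cmtitle=Category:Music

Repeat both requests for each subcategory to cover the whole tree. See the MediaWiki API documentation for more information.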
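The two requests above can be combined into a walk over the category tree. The following is only a rough sketch, not an official client: the endpoint in API_URL, the User-Agent string, the max_depth cutoff, and the function names fetch_json, category_members and articles_in_tree are all assumptions made for illustration. It follows the API's "continue" tokens so that large categories are read completely, and it remembers visited categories because category graphs can contain cycles.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the English Wikipedia endpoint; replace with YOUR_URL/api.php.
API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Send one GET request to the API and decode the JSON response."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent; identify your own tool here.
    request = urllib.request.Request(
        url, headers={"User-Agent": "category-walker-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def category_members(category, cmtype, fetch=fetch_json):
    """Yield member titles of one category, following 'continue' tokens."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": cmtype,  # "page" for articles, "subcat" for subcategories
        "cmlimit": "500",
    }
    while True:
        data = fetch(params)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}


def articles_in_tree(root, max_depth=3, fetch=fetch_json):
    """Collect article titles under root and its subcategories, level by level."""
    seen = {root}       # categories already queued, guards against cycles
    articles = set()
    frontier = [root]
    for depth in range(max_depth + 1):
        next_frontier = []
        for category in frontier:
            articles.update(category_members(category, "page", fetch))
            if depth == max_depth:
                continue  # do not descend below the cutoff
            for sub in category_members(category, "subcat", fetch):
                if sub not in seen:
                    seen.add(sub)
                    next_frontier.append(sub)
        frontier = next_frontier
        if not frontier:
            break
    return articles
```

Calling articles_in_tree("Category:Music", max_depth=1) would query the category and its direct subcategories only. A depth limit is worth keeping, since broad categories can expand to a very large number of pages.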

Adexe Rivera answered Sep 30 '22 23:09