
How to get table data from Wikipedia page?

Does anyone know how to use the Wikipedia API to get JSON or XML data from a table on a specific Wikipedia page?

Alternatively, is there a different way to do this?

For example, from this page: https://en.wikipedia.org/wiki/List_of_action_films_of_the_2010s

asked Aug 04 '16 09:08 by ST80


1 Answer

You can use curl (or any other HTTP tool) to retrieve and/or parse a Wikipedia page through the public MediaWiki API. Here are two examples that should help:

Retrieval of List_of_action_films_of_the_2010s:

  • JSON, unparsed (raw wikitext), via the query action
  • JSON, parsed (rendered HTML), via the parse action
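The two requests above can be sketched with Python's standard library as follows. This is a minimal sketch, not the exact URLs from the answer: the parameter choices (prop=revisions, rvprop=content, rvslots=main, prop=text, formatversion=2) are standard MediaWiki API parameters picked for illustration, and the helper names query_url, parse_url and fetch_json as well as the User-Agent string are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"
PAGE = "List_of_action_films_of_the_2010s"


def query_url(title):
    """Hypothetical helper: URL for the raw, unparsed wikitext via action=query."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_url(title):
    """Hypothetical helper: URL for the rendered HTML via action=parse."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",
        "format": "json",
        "formatversion": "2",
    }
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Download a URL and decode the JSON body (performs a network request)."""
    # Wikimedia asks API clients to send a descriptive User-Agent;
    # this value is a placeholder, replace it with your own contact details.
    request = urllib.request.Request(
        url, headers={"User-Agent": "wikitable-example/0.1 (you@example.com)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With formatversion=2, fetch_json(parse_url(PAGE))["parse"]["text"] should hold the page's rendered HTML as a string, while the raw wikitext from the query request should sit under ["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]. The equivalent curl call is simply curl on the string returned by either URL helper.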

Next, you need to parse the result and select the sub-elements relevant to your analysis. In this case, I assume those are the tables with the wikitable class.
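Selecting the wikitable elements from the parse result could look like this minimal sketch. It assumes the input is the HTML string returned by the parse action and uses only Python's built-in html.parser; the names WikitableParser, wikitables, to_records and tables_to_json are hypothetical. It treats the first row of each table as its header and does not expand rowspan/colspan, so merged cells on real pages may shift columns.

```python
import json
from html.parser import HTMLParser


class WikitableParser(HTMLParser):
    """Collect cell text from every <table> whose class list contains "wikitable"."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per wikitable: a list of rows, each a list of cell strings
        self._depth = 0    # table nesting depth inside the current wikitable (0 = outside)
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def _close_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace and newlines inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self._depth == 0 and "wikitable" in classes:
                self.tables.append([])
                self._depth = 1
            elif self._depth > 0:
                self._depth += 1  # nested table: its text stays in the enclosing cell
        elif self._depth == 1:
            if tag == "tr":
                self._close_row()  # tolerate an unclosed previous row
                self._row = []
            elif tag in ("td", "th") and self._row is not None:
                self._close_cell()  # tolerate an unclosed previous cell
                self._cell = []

    def handle_endtag(self, tag):
        if self._depth == 0:
            return
        if tag == "table":
            if self._depth == 1:
                self._close_row()
            self._depth -= 1
        elif self._depth == 1:
            if tag in ("td", "th"):
                self._close_cell()
            elif tag == "tr":
                self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def wikitables(html):
    """Return every wikitable in an HTML string as a list of rows."""
    parser = WikitableParser()
    parser.feed(html)
    parser.close()
    return parser.tables


def to_records(table):
    """Use the first row as the header and turn the remaining rows into dicts."""
    if not table:
        return []
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


def tables_to_json(html):
    """Serialize all wikitables in the HTML as a JSON array of record lists."""
    return json.dumps([to_records(t) for t in wikitables(html)], ensure_ascii=False)
```

For example, tables_to_json(html) on the HTML from the parse action should produce one JSON array per wikitable on the page, each containing one object per film row keyed by that table's header cells.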

For reference and a detailed explanation, see the general MediaWiki API page and the list of parameters describing how to use the API to parse Wikipedia pages for specific data elements.

answered Oct 15 '22 08:10 by MWiesner