Rails + MediaWiki API for Wikipedia data extraction

I am trying to use Rails to extract data from Wikipedia, based on a search term.

For example,

1) If I have the string "American Idol", I want to pass it to Wikipedia and get back a list of related articles. My goal is to take the first 3 hyperlinks and display them on the website.

2) One step further would be extracting small pieces of data from Wikipedia, say the infobox or the first few words of the Wikipedia article.

Any tips?

Thanks!

Carlos F asked Oct 20 '11 04:10



1 Answer

You don't need to resort to screen-scraping: MediaWiki has a very comprehensive API for precisely this kind of thing. See https://github.com/jpatokal/mediawiki-gateway for a handy Ruby wrapper around it.
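
If you would rather not pull in a gem, here is a rough sketch of part 1 of the question using only Ruby's standard library against the API's `list=search` module. The method names (`wikipedia_search_uri`, `parse_search_results`, `top_wikipedia_links`), the English Wikipedia endpoint, and the User-Agent string are my own assumptions for illustration, not part of the gem above, and the way the article URL is built from the title is a simplification.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Assumed endpoint: English Wikipedia. Other wikis expose the same API path.
API_ENDPOINT = 'https://en.wikipedia.org/w/api.php'

# Build the request URI for the full-text search module (list=search).
def wikipedia_search_uri(term, limit: 3)
  uri = URI(API_ENDPOINT)
  uri.query = URI.encode_www_form(
    action: 'query',
    list: 'search',
    srsearch: term,
    srlimit: limit,
    format: 'json'
  )
  uri
end

# Convert the JSON response body into an array of { title:, url: } hashes.
# The URL is derived from the title (spaces become underscores), which is a
# simplification that should cover ordinary article titles.
def parse_search_results(body)
  hits = JSON.parse(body).dig('query', 'search') || []
  hits.map do |hit|
    title = hit['title']
    slug = URI.encode_www_form_component(title.tr(' ', '_'))
    { title: title, url: "https://en.wikipedia.org/wiki/#{slug}" }
  end
end

# Perform the HTTP request (needs network access) and return the top links.
def top_wikipedia_links(term, limit: 3)
  uri = wikipedia_search_uri(term, limit: limit)
  request = Net::HTTP::Get.new(uri)
  # Placeholder User-Agent; replace with something identifying your app.
  request['User-Agent'] = 'ExampleRailsApp/0.1 (contact@example.com)'
  response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  response.is_a?(Net::HTTPSuccess) ? parse_search_results(response.body) : []
end
```

In a Rails controller you could then call something like `@links = top_wikipedia_links("American Idol")` and render each `link[:url]` in the view. Keeping the URI building and JSON parsing separate from the network call makes both easy to unit-test without hitting Wikipedia.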

Alternatively, if you're only interested in data like infoboxes, see DBpedia for the database version of Wikipedia.
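
If all you want is the first few sentences of an article (part 2 of the question), DBpedia may be more than you need: on Wikipedia the same API can return a plain-text intro via `prop=extracts`. A minimal sketch, assuming the English Wikipedia endpoint; `wikipedia_intro_uri`, `parse_intro`, and `wikipedia_intro` are hypothetical helper names, and the User-Agent is a placeholder.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Build a request for the plain-text intro of one article.
# exintro limits output to the text before the first section heading,
# explaintext strips the HTML, exsentences caps the sentence count.
def wikipedia_intro_uri(title, sentences: 2)
  uri = URI('https://en.wikipedia.org/w/api.php')
  uri.query = URI.encode_www_form(
    action: 'query',
    prop: 'extracts',
    titles: title,
    exintro: 1,
    explaintext: 1,
    exsentences: sentences,
    redirects: 1,
    format: 'json'
  )
  uri
end

# Pull the extract out of the response. Pages are keyed by page ID, so take
# the first (and only) entry. Returns nil when the page does not exist.
def parse_intro(body)
  pages = JSON.parse(body).dig('query', 'pages') || {}
  page = pages.values.first
  page && page['extract']
end

# Perform the HTTP request (needs network access).
def wikipedia_intro(title, sentences: 2)
  uri = wikipedia_intro_uri(title, sentences: sentences)
  request = Net::HTTP::Get.new(uri)
  # Placeholder User-Agent; replace with something identifying your app.
  request['User-Agent'] = 'ExampleRailsApp/0.1 (contact@example.com)'
  response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  response.is_a?(Net::HTTPSuccess) ? parse_intro(response.body) : nil
end
```

Infoboxes are a different story: the API gives you the raw wikitext, which you would have to parse yourself, so for structured infobox fields DBpedia is likely the easier route.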

lambshaanxy answered Sep 19 '22 02:09