Query Wikipedia pages with properties

Question

I need to use Wikipedia API Query or any other api such as Opensearch to query for a simple list of pages with some properties.

Input: a list of page (article) titles or ids.
Output: a list of pages that contain the following properties each:
page id
title
snippet/description (like in opensearch api)
page url
image url (like in opensearch api)

A result similar to this:
http://en.wikipedia.org/w/api.php?action=opensearch&search=miles%20davis&limit=20&format=xml
Only with page ids and not for a search, but rather an exact list of pages by either titles or pageids.

This should be a fairly simple thing but I have been stuck with that for quite some time trying all kinds of URL combinations from the MW api manual, without success.

leo · Accepted Answer

I dont't think there is another way than the Open Search API to fetch Open Search data, but depending on which Wikipedia you are interested in, there might be other extensions installed to help you. Taking English Wikipedia as an example, we can make use of the MobileFrontend and PageImages extensions, that happens to be installed there.

Title and url are available from the native MediaWiki API. To get the url, you can use prop=info, and specify with inprop=url that it is the url you are interested in.
Prominent images of a page is returned by prop=pageimages, thanks to PageImages.
MobileFrontend adds a property called extracts, that you can use with the directive exintro to get the first paragraph. Note however that MediWiki markup is complex, and result might not always be perfect. If we put it all together in one single query, it would be something like this:

http://en.wikipedia.org/w/api.php?action=query&pageids=21482&prop=pageimages|info|extracts&inprop=url&exintro

giving this:

<api>
  <query>
    <pages>
      <page pageid="21482" ns="0" title="Nairobi" pageimage="Nairobi_Montage.jpg" contentmodel="wikitext" pagelanguage="en" touched="2014-02-06T06:10:01Z" lastrevid="594161616" counter="" length="89157" fullurl="http://en.wikipedia.org/wiki/Nairobi" editurl="http://en.wikipedia.org/w/index.php?title=Nairobi&amp;action=edit">
        <thumbnail source="http://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Nairobi_Montage.jpg/45px-Nairobi_Montage.jpg" width="45" height="50" />
        <extract xml:space="preserve">
             &lt;p&gt;&lt;b&gt;Nairobi&lt;/b&gt; /naɪˈroʊbi/ is the [...]
        </extract>
      </page>
    </pages>
  </query>
</api>

Query Wikipedia pages with properties

Tags:

search

api

wikipedia

mediawiki

yuvalr80

1 Answers

leo

Recent Activity

Donate For Us

Query Wikipedia pages with properties

Tags:

search

api

wikipedia

mediawiki

yuvalr80

1 Answers

leo

Related questions

Recent Activity

Donate For Us