
Wikipedia search API get redirect pageID

I have many Wikipedia page IDs stored in a database, and some of them now redirect to other pages.

I want to know how to get the page IDs of the redirect targets.

For example, when I open this page on Wikipedia:

http://en.wikipedia.org/wiki/?curid=11601783

It says "(Redirected from ...)", which means it is not the main page I want. The correct link is:

http://en.wikipedia.org/wiki/?curid=34344124

How can I get the final page ID from an API query such as:

http://en.wikipedia.org/w/api.php?action=query&format=json&prop=extracts&pageids=11601783

What parameters should I use?

Benny Ae asked Mar 06 '14 21:03


1 Answer

To make the API resolve redirects, add the redirects parameter to the query. For example:

http://en.wikipedia.org/w/api.php?action=query&format=json&pageids=11601783&redirects

will give you the page id of the redirect target.
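
As a rough illustration of the query above, here is a minimal Python sketch using only the standard library. The function names, the https endpoint, and the User-Agent string are placeholders of mine, not part of the API, and the parsing assumes the default JSON layout in which `query.pages` is keyed by the target's page ID:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint (https variant)


def build_redirect_query(page_id):
    """Build the query URL that asks the API to resolve redirects."""
    params = {
        "action": "query",
        "format": "json",
        "pageids": page_id,
        "redirects": "",  # flag parameter: being present is what matters
    }
    return API_URL + "?" + urlencode(params)


def target_page_id(response):
    """Return the page ID found in an already-parsed response, or None."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        # Skip entries the API marks as missing.
        if "pageid" in page and "missing" not in page:
            return page["pageid"]
    return None


def fetch_target_page_id(page_id):
    """Hypothetical helper: perform the request and parse the result."""
    request = Request(
        build_redirect_query(page_id),
        headers={"User-Agent": "redirect-example/0.1"},  # placeholder value
    )
    with urlopen(request) as resp:
        return target_page_id(json.load(resp))
```

Calling `fetch_target_page_id(11601783)` would then return the ID of whatever page the redirect currently points to; if the page is not a redirect, the same ID should come back unchanged.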

There doesn't seem to be a good way to do this for multiple pages in a single query, because the redirects part of the response maps from title to title, not from page ID to page ID (I'm assuming you don't know the titles of the redirect pages).

One way to work around that would be to combine redirects with prop=redirects:

http://en.wikipedia.org/w/api.php?action=query&format=json&pageids=11601783&redirects&prop=redirects&rdlimit=max

This will give you all redirects to the target page, including their page ids.
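
To sketch how that combined response could be turned into an old-ID to new-ID mapping, here is a small Python function. It assumes each entry in `query.pages` carries a `redirects` list whose items include a `pageid` field, as in the default JSON output; the function name and the sample data (IDs and titles) are made up for illustration. It also ignores continuation, so a target with more redirects than one response holds would need follow-up requests:

```python
def map_to_targets(response, requested_ids):
    """Map each requested page ID to the ID of the page it resolves to.

    `response` is the parsed JSON of a query that used both
    `redirects` and `prop=redirects`.
    """
    wanted = set(requested_ids)
    mapping = {}
    for page in response.get("query", {}).get("pages", {}).values():
        target_id = page.get("pageid")
        if target_id is None or "missing" in page:
            continue
        # A requested ID that was not a redirect resolves to itself.
        if target_id in wanted:
            mapping[target_id] = target_id
        # Each listed redirect points at this target page.
        for redirect in page.get("redirects", []):
            if redirect.get("pageid") in wanted:
                mapping[redirect["pageid"]] = target_id
    return mapping


# Invented sample response: pages 101 and 102 redirect to page 200.
sample = {
    "query": {
        "pages": {
            "200": {
                "pageid": 200,
                "ns": 0,
                "title": "Target",
                "redirects": [
                    {"pageid": 101, "ns": 0, "title": "Old name"},
                    {"pageid": 102, "ns": 0, "title": "Other name"},
                ],
            }
        }
    }
}
result = map_to_targets(sample, [101, 200])
```

Requested IDs that do not show up in the mapping were not found in the response and would need to be checked separately.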

svick answered Nov 17 '22 21:11