
Wikipedia Mediawiki API get Pageid from URL

I have a set of full urls like

http://en.wikipedia.org/wiki/Episkopi_Bay
http://en.wikipedia.org/wiki/Monte_Lauro
http://en.wikipedia.org/wiki/Lampedusa
http://en.wikipedia.org/wiki/Himera
http://en.wikipedia.org/wiki/Lago_Cecita
http://en.wikipedia.org/wiki/Aspromonte

I want to find the Wikipedia page IDs for these URLs. I have used the MediaWiki API before, but I can't figure out how to do this.

I have tried extracting the page title from each URL by taking the substring after the last "/" and then querying the API with that title to get the page ID.

http://en.wikipedia.org/wiki/Episkopi_Bay --> Episkopi_Bay
http://en.wikipedia.org/wiki/Monte_Lauro --> Monte_Lauro
http://en.wikipedia.org/wiki/Lampedusa --> Lampedusa
http://en.wikipedia.org/wiki/Himera --> Himera
http://en.wikipedia.org/wiki/Lago_Cecita --> Lago_Cecita
http://en.wikipedia.org/wiki/Aspromonte --> Aspromonte
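
For illustration, the extraction step could be sketched in Python as follows. This is only a sketch: the function name title_from_url is made up here, and it assumes every URL uses the /wiki/<title> path form. It takes everything after /wiki/ rather than after the last "/", because a title can itself contain a slash, and it decodes percent-escapes.

```python
from urllib.parse import urlsplit, unquote


def title_from_url(url: str) -> str:
    """Return the page title encoded in a /wiki/<title> URL (hypothetical helper)."""
    path = urlsplit(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"not a /wiki/ article URL: {url}")
    # Keep everything after /wiki/ (titles may contain "/"),
    # decode percent-escapes, and turn underscores back into spaces.
    return unquote(path[len(prefix):]).replace("_", " ")


print(title_from_url("http://en.wikipedia.org/wiki/Episkopi_Bay"))
```

The result is still only the title as written in the link, so a redirect link yields the redirect's title, not the target's.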

But the problem is that some of my links might be redirects and hence the substring might not always be the title of the page.

TL;DR: How can I find the pageid of a Wikipedia page from a URL?

Shreyas Chavan asked Jul 28 '15 17:07




1 Answer

You can add &indexpageids to your query, plus &redirects=1 (used in the second example below) if some of your titles may be redirects.

For example

https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Main%20Pages&indexpageids

or if you are looking for a summary at the same time, here's a more comprehensive example link:

https://en.wikipedia.org/w/api.php?action=query&format=json&titles=barberton%20daisy&prop=extracts&exintro&explaintext&redirects=1&indexpageids

Then, if you parse the JSON, you will see a property named pageids under query. It is an array containing the page IDs (as strings) of the pages in the result.
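
As a rough sketch of how this could be scripted in Python with only the standard library: the helper names (build_query_url, fetch, resolve, pageids_by_title) and the User-Agent string are assumptions made for this example, not part of the API. It builds the same kind of query as the links above for several titles at once, then reads the page IDs out of the parsed JSON, following the normalized and redirects mappings that the API returns when redirects=1 is set.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles):
    """Build a query URL for several titles, resolving redirects."""
    params = {
        "action": "query",
        "format": "json",
        "titles": "|".join(titles),  # multiple titles are separated by "|"
        "redirects": 1,
        "indexpageids": 1,
    }
    return API + "?" + urlencode(params)


def fetch(url):
    """Fetch and parse the JSON response (the User-Agent value is a placeholder)."""
    req = Request(url, headers={"User-Agent": "pageid-lookup-sketch/0.1 (example)"})
    with urlopen(req) as resp:
        return json.load(resp)


def resolve(title, response):
    """Follow the 'normalized' and then 'redirects' mappings for one requested title."""
    query = response.get("query", {})
    for key in ("normalized", "redirects"):
        mapping = {m["from"]: m["to"] for m in query.get(key, [])}
        title = mapping.get(title, title)
    return title


def pageids_by_title(response):
    """Map each resolved page title to its page ID, skipping missing pages."""
    pages = response.get("query", {}).get("pages", {})
    return {p["title"]: p["pageid"] for p in pages.values() if "pageid" in p}
```

A possible usage: call fetch(build_query_url(titles)), then look up pageids_by_title(response).get(resolve(title, response)) for each requested title; a result of None means the page does not exist. The API limits how many titles one request may carry (50 for ordinary users), so a longer list would need to be sent in batches.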

Ari answered Sep 18 '22 14:09