I have a set of full URLs like:
http://en.wikipedia.org/wiki/Episkopi_Bay
http://en.wikipedia.org/wiki/Monte_Lauro
http://en.wikipedia.org/wiki/Lampedusa
http://en.wikipedia.org/wiki/Himera
http://en.wikipedia.org/wiki/Lago_Cecita
http://en.wikipedia.org/wiki/Aspromonte
I want to find the Wikipedia pageids for these URLs. I have used the MediaWiki API before, but I can't figure out how to do this.
I have tried extracting the page title from each URL by taking the substring from just after the last "/" to the end of the string, and then querying the API for the pageid:
http://en.wikipedia.org/wiki/Episkopi_Bay --> Episkopi_Bay
http://en.wikipedia.org/wiki/Monte_Lauro --> Monte_Lauro
http://en.wikipedia.org/wiki/Lampedusa --> Lampedusa
http://en.wikipedia.org/wiki/Himera --> Himera
http://en.wikipedia.org/wiki/Lago_Cecita --> Lago_Cecita
http://en.wikipedia.org/wiki/Aspromonte --> Aspromonte
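A minimal Python sketch of that extraction step, assuming every URL has the /wiki/<Title> form (the function name title_from_url is hypothetical). It keeps everything after /wiki/ instead of cutting at the last "/", because some titles contain a slash (subpages, or articles such as AC/DC), and it percent-decodes the result. It does not solve the redirect problem by itself; the API still has to resolve that.

```python
from urllib.parse import urlsplit, unquote


def title_from_url(url: str) -> str:
    """Return the page title from a /wiki/<Title> URL (hypothetical helper)."""
    # urlsplit separates the path from any ?query or #fragment part.
    path = urlsplit(url).path  # e.g. "/wiki/Monte_Lauro"
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"not a /wiki/ article URL: {url}")
    # Keep everything after /wiki/ (titles may contain "/"), decode %XX
    # escapes, and turn underscores into spaces as MediaWiki does.
    return unquote(path[len(prefix):]).replace("_", " ")
```

Usage: title_from_url("http://en.wikipedia.org/wiki/Lago_Cecita") gives "Lago Cecita"; the API accepts either the underscore or the space form.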
But the problem is that some of my links might be redirects, so the substring is not always the title of the actual page.
TL;DR: How can I find the pageid of a Wikipedia page from its URL?
In the desktop view of Wikipedia, in the default skin and most others, the left-hand panel has a "Wikidata item" link under "Tools". Copy the URL of that link, paste it into a text editor, and read (or copy) the ID from the end of it. Note that this is the Wikidata item ID (the Q-number), not the Wikipedia pageid.
As others have mentioned, you would use prop=pageimages in your API query. If you also want the image description, use prop=pageimages|pageterms instead. You can get the original image with piprop=original, or a thumbnail of a specified size with piprop=thumbnail plus a pithumbsize value in pixels.
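A small Python sketch that only builds such a query URL with the standard library and fetches nothing. The helper name pageimages_url and the 300-pixel default are illustrative assumptions, not values taken from the answer.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def pageimages_url(title: str, original: bool = True, thumb_size: int = 300) -> str:
    """Build an action=query URL asking for a page's image (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "titles": title,
        # pageterms is requested alongside pageimages, as the answer suggests.
        "prop": "pageimages|pageterms",
    }
    if original:
        params["piprop"] = "original"  # full-size original image
    else:
        params["piprop"] = "thumbnail"  # scaled thumbnail instead
        params["pithumbsize"] = str(thumb_size)  # thumbnail size in pixels
    # urlencode percent-escapes "|" as %7C, which the API accepts.
    return API + "?" + urlencode(params)
```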
If you want to get all the links on the page "Title", use just prop=links; you don't need the generator. Raise the limit to the maximum possible by adding pllimit=max (pl is the parameter prefix for links). Then use the value given in the query-continue element to fetch the second and following pages of results.
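The loop described above might look like this in Python. It is a sketch that assumes a current MediaWiki, which by default returns a continue object (carrying plcontinue) instead of the older query-continue element; fetch is a hypothetical callable you supply that sends the parameters to api.php and returns the decoded JSON.

```python
from typing import Callable, Dict, List


def all_links(title: str, fetch: Callable[[Dict[str, str]], dict]) -> List[str]:
    """Collect every link title on a page, following API continuation.

    `fetch` is a hypothetical helper: it takes the query parameters,
    performs the GET request, and returns the parsed JSON response.
    """
    params = {
        "action": "query",
        "format": "json",
        "titles": title,
        "prop": "links",   # no generator needed
        "pllimit": "max",  # "pl" is the parameter prefix for prop=links
    }
    links: List[str] = []
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("pages", {}).values():
            links.extend(link["title"] for link in page.get("links", []))
        if "continue" not in data:
            break  # no more result pages
        # Merge the continuation values (e.g. plcontinue) into the next request.
        params = {**params, **data["continue"]}
    return links
```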
MediaWiki is free and open-source wiki software. It is used on Wikipedia and almost all other Wikimedia websites, including Wiktionary, Wikimedia Commons and Wikidata; these sites define a large part of the requirement set for MediaWiki.
You can add &indexpageids to your query.
For example:
https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Main%20Page&indexpageids
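For a whole list of URLs, a Python sketch could batch the titles into as few requests as possible. It assumes the usual cap of 50 titles per request for ordinary (non-bot) clients, adds redirects=1 as in the more comprehensive example link, and uses a placeholder User-Agent that you should replace with your own contact details; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
# Placeholder User-Agent: Wikimedia asks API clients to identify themselves.
USER_AGENT = "pageid-lookup-example/0.1 (replace with your contact info)"


def pageid_query_urls(titles, batch_size=50):
    """Build one api.php URL per batch of titles (hypothetical helper)."""
    urls = []
    for i in range(0, len(titles), batch_size):
        params = {
            "action": "query",
            "format": "json",
            "titles": "|".join(titles[i:i + batch_size]),
            "redirects": "1",     # resolve redirects to the target page
            "indexpageids": "1",  # add a query.pageids list to the response
        }
        urls.append(API + "?" + urllib.parse.urlencode(params))
    return urls


def fetch_json(url):
    """GET one URL and decode the JSON body (hypothetical helper)."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage: responses = [fetch_json(u) for u in pageid_query_urls(["Episkopi Bay", "Monte Lauro", "Lampedusa"])].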
Or, if you also want a summary, here is a more comprehensive example link (redirects=1 makes the API resolve redirects to their target page):
https://en.wikipedia.org/w/api.php?action=query&format=json&titles=barberton%20daisy&prop=extracts&exintro&explaintext&redirects=1&indexpageids
When you parse the JSON response, you will find a property named pageids under query.
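Because redirects=1 and title normalization (such as underscores becoming spaces) change which title ends up in the response, a lookup has to map each requested title through the normalized and redirects lists before reading its ID. A Python sketch, assuming the default JSON layout (formatversion=1, where pageids is a list of string keys into pages and a missing page gets a negative key and no pageid field); pageids_for is a hypothetical name.

```python
def pageids_for(titles, response):
    """Map each requested title to its page ID, or None if the page is missing."""
    query = response.get("query", {})
    # query.pageids holds the keys of query.pages; missing pages have
    # negative keys and carry a "missing" flag instead of a "pageid".
    id_by_title = {}
    for key in query.get("pageids", []):
        page = query["pages"][key]
        if "pageid" in page:
            id_by_title[page["title"]] = page["pageid"]
    normalized = {n["from"]: n["to"] for n in query.get("normalized", [])}
    redirects = {r["from"]: r["to"] for r in query.get("redirects", [])}
    result = {}
    for title in titles:
        final = normalized.get(title, title)  # e.g. "Monte_Lauro" -> "Monte Lauro"
        final = redirects.get(final, final)   # follow a redirect to its target
        result[title] = id_by_title.get(final)
    return result
```

Usage: pass the titles you sent and the decoded JSON of that request, e.g. pageids_for(titles, response); a title that maps to None does not exist on the wiki.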