Wikipedia Mediawiki API get Pageid from URL

Tags:

wikipedia-api

mediawiki-api

mediawiki

mediawiki-extensions

I have a set of full URLs like:

http://en.wikipedia.org/wiki/Episkopi_Bay
http://en.wikipedia.org/wiki/Monte_Lauro
http://en.wikipedia.org/wiki/Lampedusa
http://en.wikipedia.org/wiki/Himera
http://en.wikipedia.org/wiki/Lago_Cecita
http://en.wikipedia.org/wiki/Aspromonte

I want to find the Wikipedia pageids for these URLs. I have used the MediaWiki API before, but I can't figure out how to do this.

I have tried extracting the page title from each URL by taking the substring from lastIndexOf("/") to the end of the string, and then querying the API with that title to get the pageid.

http://en.wikipedia.org/wiki/Episkopi_Bay --> Episkopi_Bay
http://en.wikipedia.org/wiki/Monte_Lauro --> Monte_Lauro
http://en.wikipedia.org/wiki/Lampedusa --> Lampedusa
http://en.wikipedia.org/wiki/Himera --> Himera
http://en.wikipedia.org/wiki/Lago_Cecita --> Lago_Cecita
http://en.wikipedia.org/wiki/Aspromonte --> Aspromonte
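
The extraction described above can be sketched in Python roughly as follows. This is only an illustration: the helper name `title_from_url` is hypothetical, and it assumes every URL uses the standard `/wiki/<title>` path. It strips the `/wiki/` prefix instead of splitting on the last "/", because a title may itself contain a slash.

```python
from urllib.parse import urlsplit, unquote


def title_from_url(url: str) -> str:
    """Return the page title encoded in a /wiki/<title> URL (hypothetical helper)."""
    path = urlsplit(url).path  # drops the scheme, host, query string and #fragment
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"not a /wiki/ article URL: {url}")
    # Keep everything after /wiki/ (titles may contain "/"),
    # then decode percent-escapes such as %20.
    return unquote(path[len(prefix):])


urls = [
    "http://en.wikipedia.org/wiki/Episkopi_Bay",
    "http://en.wikipedia.org/wiki/Monte_Lauro",
]
titles = [title_from_url(u) for u in urls]
```

Note that this only recovers the title as written in the URL; it does not tell you whether that title is a redirect.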

But the problem is that some of my links might be redirects, so the substring might not always be the actual title of the page.

TL;DR: How can I find the pageid of a Wikipedia page from its URL?

958

asked Jul 28 '15 17:07

Shreyas Chavan

1 Answer

You can add &indexpageids to your query.

For example:

https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Main%20Pages&indexpageids

Or, if you are looking for a summary at the same time, here is a more comprehensive example link:

https://en.wikipedia.org/w/api.php?action=query&format=json&titles=barberton%20daisy&prop=extracts&exintro&explaintext&redirects=1&indexpageids

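Then, if you parse the JSON, you will see a property named pageids under query. The redirects=1 parameter in the second link makes the API resolve redirects, so the returned pageid belongs to the target page rather than to the redirect.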
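
As a rough sketch of how this could look in code, here is a small Python example using only the standard library. The function names (`build_query_url`, `pageids_from_response`, `fetch_pageids`) and the User-Agent string are my own placeholders, not part of the API. It assumes you already have the titles extracted from your URLs, joins several of them with "|" into one request, and turns on both redirects and indexpageids.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles):
    """Build an action=query URL for several titles at once (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "titles": "|".join(titles),  # multiple titles are separated by "|"
        "redirects": 1,              # resolve redirects to their target pages
        "indexpageids": 1,           # adds the query.pageids list to the response
    }
    return API + "?" + urlencode(params)


def pageids_from_response(data):
    """Return the list under query.pageids, or an empty list if it is absent."""
    return data.get("query", {}).get("pageids", [])


def fetch_pageids(titles):
    """Call the API over the network and return the page IDs (as strings)."""
    # Placeholder User-Agent; replace it with something identifying your tool.
    req = Request(build_query_url(titles),
                  headers={"User-Agent": "pageid-example/0.1"})
    with urlopen(req) as resp:
        return pageids_from_response(json.load(resp))
```

Calling `fetch_pageids(["Episkopi_Bay", "Monte_Lauro"])` should then give you the IDs as strings. A title that does not exist shows up with a negative ID such as "-1", so it is worth filtering those out before using the result.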

140

answered Sep 18 '22 14:09

Ari
