This page: http://wikidata.dbpedia.org/downloads/20160111/ has a dump called wikidatawiki-20160111-page-ids.ttl.bz2, which maps Wikidata ids to what it calls wikipage ids. The wikipage id seems to be different from the Wikipedia pageid, though.
e.g. for Germany:
So basically this dump maps Q183 to 322, while I need to map Q183 to 11867.
For reference: in https://en.wikipedia.org/w/index.php?title=Germany&curid=11867 the curid parameter in the URL is the Wikipedia page id.
Is there an equivalent dump file that contains both the Wikidata ids and the Wikipedia pageids? (I don't want to call an API and loop over my Wikipedia page ids one by one, as this request does: https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&format=xml&pageids=11867)
Edit: I'm not sure what exactly the wikipage id is, but maybe there is a wikipageId to Wikipedia pageid mapping file that could be used on top of the dump I mentioned in the question.
I created a Python package and command line tool called wikimapper to deal with this issue. It can be installed via pip install wikimapper. It uses the Wikipedia SQL dumps to create an index that can then be queried many times very quickly (much faster than the Wikidata SPARQL endpoint). You can either download one of my precomputed indices and query the sqlite3 database directly, or use the package to map Wikipedia page titles or Wikipedia URLs to Wikidata IDs and vice versa. Using page names or URLs instead of internal Wikipedia IDs should be more comfortable.
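Since the question asks for page ids rather than titles, querying the sqlite3 index directly may be the shortest route. The sketch below is only illustrative: it assumes the index has a table named mapping with columns wikipedia_id, wikipedia_title and wikidata_id (check the schema of the file you actually download), and it builds a tiny in-memory stand-in so that it runs without any download. The helper names build_demo_index and page_ids_to_wikidata are made up for this example.

```python
import sqlite3
from typing import Dict, Iterable


def build_demo_index() -> sqlite3.Connection:
    """Create an in-memory stand-in for a precomputed index.

    ASSUMPTION: the real index exposes a `mapping` table with these
    three columns; verify against the file you downloaded.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mapping ("
        "wikipedia_id INTEGER PRIMARY KEY, "
        "wikipedia_title TEXT, "
        "wikidata_id TEXT)"
    )
    # The two rows use the ids quoted in this thread.
    conn.executemany(
        "INSERT INTO mapping VALUES (?, ?, ?)",
        [(11867, "Germany", "Q183"), (736, "Albert_Einstein", "Q937")],
    )
    return conn


def page_ids_to_wikidata(
    conn: sqlite3.Connection, page_ids: Iterable[int]
) -> Dict[int, str]:
    """Map Wikipedia page ids to Wikidata ids in a single query.

    Page ids that are not in the index are simply absent from the result.
    For very long id lists, split the input into chunks, because SQLite
    limits the number of bound parameters per statement.
    """
    ids = list(page_ids)
    if not ids:
        return {}
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        "SELECT wikipedia_id, wikidata_id FROM mapping "
        f"WHERE wikipedia_id IN ({placeholders})",
        ids,
    )
    return {page_id: qid for page_id, qid in rows}


if __name__ == "__main__":
    demo = build_demo_index()
    print(page_ids_to_wikidata(demo, [11867, 736]))
```

With a real index you would replace build_demo_index() with sqlite3.connect() on the path of the downloaded file and keep the lookup function unchanged, provided the schema assumption holds.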
If you are willing to consider an API call instead of using the dump plus format adjustment, you could use the pageprops property of the query action.
For instance, to find the Wikidata item for Albert Einstein given the Wikipedia page title, you would request:
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=pageprops&titles=Albert Einstein
Which gives:
{
    "batchcomplete": "",
    "query": {
        "pages": {
            "736": {
                "pageid": 736,
                "ns": 0,
                "title": "Albert Einstein",
                "pageprops": {
                    "defaultsort": "Einstein, Albert",
                    "page_image": "Einstein_1921_by_F_Schmutzer_-_restoration.jpg",
                    "wikibase-badge-Q17437798": "1",
                    "wikibase_item": "Q937"
                }
            }
        }
    }
}
This way we can read the Wikidata item id from the wikibase_item field.
(This is as originally answered by Dmitry Brant in the Mediawiki-api mailing list)
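To avoid one request per page, the same query can be batched: the pageids parameter accepts several ids separated by "|" (up to 50 per request for ordinary clients). The following is a minimal sketch using only the standard library; the function names and the User-Agent string are placeholders invented for this example, and ppprop=wikibase_item is added to restrict the response to the one property needed.

```python
import json
from typing import Dict, Iterable
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(page_ids: Iterable[int]) -> str:
    """Build one pageprops query for a batch of page ids."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "pageprops",
        "ppprop": "wikibase_item",  # only return the Wikidata item id
        "pageids": "|".join(str(p) for p in page_ids),
    }
    return f"{API_URL}?{urlencode(params)}"


def extract_wikibase_items(response: dict) -> Dict[int, str]:
    """Pull {pageid: wikibase_item} out of a decoded API response.

    Pages without a linked Wikidata item (or missing pages) are skipped.
    """
    pages = response.get("query", {}).get("pages", {})
    result = {}
    for page in pages.values():
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid is not None and "pageid" in page:
            result[page["pageid"]] = qid
    return result


def fetch_wikibase_items(page_ids: Iterable[int]) -> Dict[int, str]:
    """Perform the HTTP request (needs network access; not run here)."""
    request = Request(
        build_query_url(page_ids),
        # Placeholder User-Agent; use one that identifies your own tool.
        headers={"User-Agent": "pageid-to-qid-sketch/0.1 (example)"},
    )
    with urlopen(request) as resp:
        return extract_wikibase_items(json.load(resp))


if __name__ == "__main__":
    # Offline demonstration on a trimmed copy of the response shown above.
    sample = {
        "query": {
            "pages": {
                "736": {
                    "pageid": 736,
                    "title": "Albert Einstein",
                    "pageprops": {"wikibase_item": "Q937"},
                }
            }
        }
    }
    print(extract_wikibase_items(sample))
```

Calling fetch_wikibase_items([736, 11867]) would issue a single request for both pages; for larger inputs, split the ids into chunks of at most 50 and merge the resulting dictionaries.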
Potentially this is a better solution because: