
Check if a MediaWiki page exists (Python)

I'm working on a Python script that transforms this:

foo
bar

Into this:

[[Component foo]]
[[bar]]

For each input line, the script checks whether the page "Component foo" exists. If it does, a link to that page is created; if it doesn't, a direct link to the plain name is created instead.
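For illustration, the transformation described above can be sketched as follows. This is a minimal sketch that assumes the set of existing page titles is already known; the function name `make_links` and the `existing` set are hypothetical and not part of the original script.

```python
def make_links(names, existing, prefix="Component "):
    """Build one wiki link per input name.

    `existing` is assumed to be a set of page titles known to exist.
    """
    links = []
    for name in names:
        candidate = prefix + name
        # Link to the prefixed page if it exists, otherwise to the plain name.
        target = candidate if candidate in existing else name
        links.append("[[%s]]" % target)
    return links


print("\n".join(make_links(["foo", "bar"], {"Component foo"})))
# prints:
# [[Component foo]]
# [[bar]]
```

The open question is then only how to fill `existing` cheaply.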

The problem is that I need a quick and cheap way to check whether a lot of wiki pages exist. I don't want to (try to) download all the 'Component' pages.

I already figured out a fast way to do this by hand: edit a new wiki page, paste all the 'Component' links into the page, press preview, and then save the resulting preview HTML page. The resulting HTML file contains a different link for existing pages than for non-existing pages.

So to rephrase my question: How can I save a MediaWiki preview page in Python?

(I don't have local access to the database.)

asked Dec 03 '22 05:12 by compie


2 Answers

You can definitely use the API to check if a page exists:

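# assuming words is a list of page titles you wish to query for
import urllib.parse
import urllib.request

# replace en.wikipedia.org with the address of the wiki you want to access
# quote() escapes spaces and the "|" separator so the URL stays valid
titles = urllib.parse.quote("|".join(words))
query = "http://en.wikipedia.org/w/api.php?action=query&titles=%s&format=xml" % titles
pages = urllib.request.urlopen(query).read()  # raw XML response as bytes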

Now pages will contain XML like this:

<?xml version="1.0"?><api><query><pages>

   <page ns="0" title="DOESNOTEXIST" missing="" />

   <page pageid="600799" ns="0" title="FOO" />

   <page pageid="11178" ns="0" title="Foobar" />

</pages></query></api>

Pages that don't exist still appear in the response, but they carry the missing="" attribute, as can be seen above. To be on the safe side, you can also check for the invalid attribute, which marks titles the wiki cannot accept.

Now you can use your favorite XML parser to check for these attributes and react accordingly.
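As one possible way to do that check, here is a minimal sketch using the standard library's xml.etree.ElementTree. The helper name `existing_titles` is hypothetical, and the sample string simply repeats the response shown above so the snippet runs without network access.

```python
import xml.etree.ElementTree as ET


def existing_titles(xml_text):
    """Return the titles of all <page> elements that are neither missing nor invalid."""
    root = ET.fromstring(xml_text)
    found = set()
    for page in root.iter("page"):
        # The attributes are empty strings, so test for presence, not value.
        if "missing" in page.attrib or "invalid" in page.attrib:
            continue
        found.add(page.get("title"))
    return found


# Sample response copied from the answer above (assumption: real responses
# follow the same layout).
sample = """<?xml version="1.0"?><api><query><pages>
   <page ns="0" title="DOESNOTEXIST" missing="" />
   <page pageid="600799" ns="0" title="FOO" />
   <page pageid="11178" ns="0" title="Foobar" />
</pages></query></api>"""

print(sorted(existing_titles(sample)))
```

Keep in mind that the API may normalize titles (for example by capitalizing the first letter), so the title in the response is not always character-for-character identical to the one you sent.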

See also: http://www.mediawiki.org/wiki/API:Query

answered Dec 28 '22 21:12 by Garns


Use Pywikibot to interact with the MediaWiki software. It's probably the most powerful bot framework available.

The Python Wikipediabot Framework (pywikipedia or PyWikipediaBot) is a collection of tools that automate work on MediaWiki sites. Originally designed for Wikipedia, it is now used throughout the Wikimedia Foundation's projects and on many other MediaWiki wikis. It's written in Python, which is a free, cross-platform programming language. This page provides links to general information for people who want to use the bot software.

answered Dec 28 '22 21:12 by poke