How to export text from all pages of a MediaWiki?

I run a MediaWiki that serves as a dictionary of German terms and their translations into a local dialect. Each page holds one term, its translation and some additional information.

Now, for a printable version of the dictionary, I need a full export of all terms and their translations. Since each entry is an extract of a page's content, I guess I need a complete export of all pages in their newest version in a parsable format, e.g. XML or CSV.

Has anyone done that, or can anyone point me to a tool? I should mention that I don't have full access to the server (no command line), but I am able to add MediaWiki extensions and to access the MySQL database.

Alexander Rühl asked Jul 18 '11 22:07



2 Answers

You can export the page content directly from the database. It will be the raw wiki markup, as when using Special:Export. But it will be easier to script the export, and you don't need to make sure all your pages are in some special category.

Here is an example:

SELECT page_title, page_touched, old_text
FROM revision,page,text
WHERE revision.rev_id=page.page_latest
AND text.old_id=revision.rev_text_id;

If your wiki uses PostgreSQL, the table "text" is named "pagecontent", and you may need to specify the schema. In that case, the same query would be:

SET search_path TO mediawiki,public;

SELECT page_title, page_touched, old_text 
FROM revision,page,pagecontent
WHERE revision.rev_id=page.page_latest
AND pagecontent.old_id=revision.rev_text_id;
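
Since the question asks for a parsable format such as CSV, here is a minimal Python sketch of how the rows returned by either query could be written out. It assumes you already have the result rows as (page_title, page_touched, old_text) tuples from whatever database driver you use; the helper names and the sample row are hypothetical. It decodes byte values (MySQL typically returns these columns as binary) and turns the underscores in page_title back into spaces.

```python
import csv
import io


def _to_text(value):
    """Decode bytes to str; pass other values through as strings."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode("utf-8", errors="replace")
    return "" if value is None else str(value)


def rows_to_csv(rows, out):
    """Write (page_title, page_touched, old_text) rows as CSV to a file-like object.

    Returns the number of data rows written.
    """
    writer = csv.writer(out)
    writer.writerow(["title", "touched", "wikitext"])
    count = 0
    for title, touched, text in rows:
        # MediaWiki stores titles with underscores instead of spaces.
        writer.writerow([
            _to_text(title).replace("_", " "),
            _to_text(touched),
            _to_text(text),
        ])
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample row standing in for a real query result.
    sample = [(b"Guten_Morgen", b"20110718220700", b"'''Guten Morgen''' ...")]
    buf = io.StringIO()
    rows_to_csv(sample, buf)
    print(buf.getvalue())
```

When writing to a real file, open it with newline="" as the csv module documentation recommends, so that multi-line wiki markup is quoted correctly.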
mivk answered Sep 20 '22 14:09


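This worked very well for me. Note that I redirected the output to the file backup.xml. The --full option dumps every revision of every page; dumpBackup.php also accepts --current if you only need the latest revision of each page. From a Windows Command Processor (CMD.exe) prompt: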

cd \PATH_TO_YOUR_WIKI_INSTALLATION\maintenance
\PATH_OF_PHP.EXE\php dumpBackup.php --full > backup.xml
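
To get from backup.xml to the terms themselves, the dump still has to be parsed. The following is a rough Python sketch, assuming the usual MediaWiki export layout (page elements containing a title and one or more revision elements, each with a timestamp and text). Because a --full dump holds several revisions per page, it keeps only the one with the newest timestamp. The function name is hypothetical, and the XML namespace is stripped rather than hard-coded, since it differs between MediaWiki versions.

```python
import io
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def latest_revisions(source):
    """Yield (title, wikitext) for the newest revision of each page.

    `source` is a filename or a binary file object holding the XML dump.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        title = ""
        best_ts, best_text = "", ""
        for child in elem:
            name = _local(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "revision":
                ts, text = "", ""
                for part in child:
                    part_name = _local(part.tag)
                    if part_name == "timestamp":
                        ts = part.text or ""
                    elif part_name == "text":
                        text = part.text or ""
                # ISO 8601 UTC timestamps sort correctly as plain strings.
                if ts >= best_ts:
                    best_ts, best_text = ts, text
        yield title, best_text
        elem.clear()  # free memory for pages already processed


if __name__ == "__main__":
    # Tiny hypothetical dump standing in for backup.xml.
    demo = (
        b'<mediawiki><page><title>Haus</title>'
        b'<revision><timestamp>2011-07-18T22:07:00Z</timestamp>'
        b'<text>example markup</text></revision></page></mediawiki>'
    )
    for page_title, wikitext in latest_revisions(io.BytesIO(demo)):
        print(page_title, "->", wikitext)
```

The text returned is still raw wiki markup, so extracting the term and its translation depends on how the pages are structured.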
Robert Stevens answered Sep 21 '22 14:09
