Possible to create ZIM file of whole Wiki? (my own, based on mediawiki)

Tags:

mediawiki

I want to generate an offline ZIM version of our own wiki (which runs on MediaWiki). The Collection extension is a breeze to install, but it only works on individually selected pages, which can then be combined into a single ZIM file.

But with a wiki of hundreds of pages, selecting pages one by one is too laborious. I want a ZIM dump of the whole wiki. I know it's possible, because there is a ZIM file for the complete Wikipedia.

However, I can't find out how this is done. Is anyone able to help? Thanks in advance!

Dr.Bob asked Apr 12 '13 11:04



2 Answers

I don't know to what extent this answer is still relevant, but here goes…

After much trouble, I finally managed to create a ZIM file out of my private MediaWiki-based wiki:

  • I started with this page: OpenZIM - Build your ZIM file
  • I tested all of the listed possibilities, but only mwoffliner worked (for me)
  • The installation was done in a VirtualBox (version 6.0.0) Ubuntu 18.10 Desktop guest, hosted on a Mac (macOS Mojave, version 10.14.2)
    • Note that I ended up running the guest OS headless, so the graphical interface was unnecessary; next time I would use a server version of Ubuntu
  • After much struggle, I managed to make mwoffliner work, but not without the precious help of the developers on GitHub

Below are step-by-step instructions for what I did. The main instructions come from the mwoffliner project of openZIM on GitHub, so most of the credit goes to its developers.

NodeJS

$ sudo apt install curl
$ curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.11/install.sh | bash && source ~/.bashrc && nvm install stable && node --version

Image processing, Redis, git, meson, gcc, g++ and pkg-config installation

$ sudo apt install jpegoptim advancecomp gifsicle pngquant imagemagick redis-server git meson g++ pkg-config libzim-dev

libzim-dev: manual upgrade from version 2.0.0 to version >=4.0.0

  • (Ubuntu: uninstall packages source)
  • (libzim installation instructions source)

1- If libzim 2.0.0 (libzim-dev) is already installed, uninstall it first; otherwise continue with point 2.

$ sudo apt remove libzim-dev #removes libzim 2.0.0
$ sudo apt purge libzim-dev
$ sudo apt autoremove #removes libzim2

2- Install libzim version >=4.0.0

$ sudo apt install cython3 liblzma-dev libgumbo-dev libicu-dev libmagic-dev libxapian-dev python-dev python-pip python-virtualenv zlib1g-dev
$ git clone https://github.com/openzim/libzim.git
$ cd libzim
$ meson . build
$ ninja -C build
$ sudo ninja -C build install
$ sudo ldconfig
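
To confirm that the freshly built library replaced the old one, a check along these lines may help. This is only a sketch: it assumes pkg-config can find the newly installed libzim.pc (PKG_CONFIG_PATH may need adjusting, depending on where meson installed it), and version_ge is a hypothetical helper name, not something shipped with libzim.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="4.0.0"
# Stays empty when pkg-config is missing or cannot find libzim.
installed="$(pkg-config --modversion libzim 2>/dev/null || true)"

if [ -n "$installed" ] && version_ge "$installed" "$required"; then
    echo "libzim $installed is recent enough"
else
    echo "libzim ${installed:-not found}: need >= $required"
fi
```

If the old 2.0.0 version is still reported, the distribution package is probably still installed or is found first on the search path.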

ZimWriterFS Manual installation

(Source)

$ cd ~/Downloads/
$ sudo apt install librsvg2-bin
$ git clone https://github.com/openzim/zimwriterfs.git
$ cd zimwriterfs
$ meson . build
$ ninja -C build
$ sudo ninja -C build install
$ zimwriterfs
The zimwriterfs usage page should appear.

VirtualBox - Access VirtualBox Guest from host OS

  • (Source)

    1. Start VirtualBox 6.x.x
    2. Menu File
    3. Choose Host Network Manager…
    4. Choose tab DHCP Server
    5. Click Create (upper left corner of the window)
    6. Select Enable Server
    7. Server Address: 192.168.56.2
    8. Server Mask 255.255.255.0
    9. Lower Address Bound: 192.168.56.3
    10. Upper Address Bound: 192.168.56.254
    11. Choose tab Adapter
    12. Verify that "Configure Adapter Manually" is selected and,
    13. IPv4 Address: 192.168.56.1
    14. IPv4 Network Mask: 255.255.255.0
    15. Click Close
    16. Right-click on the guest machine
    17. Select Settings… (or just press cmd-s)
    18. Choose tab Network
    19. Select tab Adapter 2
    20. Click Enable Network Adapter
    21. Attached to: select Host-only Adapter
    22. Name: vboxnet0
    23. Click OK
    24. Start Guest machine
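
The same host-only setup can probably be scripted on the host with VBoxManage instead of clicking through the GUI. The sketch below makes several assumptions: the host-only interface vboxnet0 already exists (otherwise create it first with `VBoxManage hostonlyif create`), no DHCP server is configured for it yet, the guest is powered off, and the guest is registered under the hypothetical name ubuntu-zim. By default it only prints the commands (DRY_RUN=1), so they can be reviewed before running them for real.

```shell
#!/usr/bin/env bash
# Hypothetical VM name; replace with the name shown by `VBoxManage list vms`.
VM_NAME="${VM_NAME:-ubuntu-zim}"
IFACE="${IFACE:-vboxnet0}"
# With DRY_RUN=1 (the default) the commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 11-14: static address for the host side of the host-only network.
run VBoxManage hostonlyif ipconfig "$IFACE" --ip 192.168.56.1 --netmask 255.255.255.0
# Steps 4-10: DHCP server handing out 192.168.56.3 to 192.168.56.254 to guests.
run VBoxManage dhcpserver add --ifname "$IFACE" --ip 192.168.56.2 --netmask 255.255.255.0 \
    --lowerip 192.168.56.3 --upperip 192.168.56.254 --enable
# Steps 16-23: attach the guest's second adapter to the host-only network.
run VBoxManage modifyvm "$VM_NAME" --nic2 hostonly --hostonlyadapter2 "$IFACE"
```

Run it with `DRY_RUN=0 VM_NAME="Your VM name"` once the printed commands look right.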

mwoffliner command issued

This command assumes that:

  • The MediaWiki wiki is up and running,
  • VirtualBox assigned the IP address 192.168.56.5 to the guest OS (see the section "VirtualBox - Access VirtualBox Guest from host OS" above; check the guest's IP address with ifconfig)
  • LocalSettings.php contains $wgServer = "http://192.168.56.5"; (use the guest's IP address as reported by ifconfig)
  • The name of your wiki is YourWiki
  • The MediaWiki folder containing your wiki is in /var/www/html/ (i.e., /var/www/html/YourWiki)

The actual command:

mwoffliner --mwUrl=http://192.168.56.5/YourWiki [email protected] --verbose --redis=redis://127.0.0.1:6379 --mwWikiPath=/ --mwApiPath=api.php --localParsoid
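
Before launching mwoffliner, it may be worth checking that the assumptions above actually hold. The sketch below is not part of mwoffliner: api_url, check_wiki and check_redis are hypothetical helper names, and it assumes the API lives at the wiki URL followed by api.php, that curl is installed, and that redis-cli came along with redis-server.

```shell
#!/usr/bin/env bash
# Values taken from the assumptions listed above; adjust to your setup.
MW_URL="${MW_URL:-http://192.168.56.5/YourWiki}"
API_PATH="${API_PATH:-api.php}"

# api_url BASE PATH: joins the two parts with exactly one slash.
api_url() {
    printf '%s/%s\n' "${1%/}" "${2#/}"
}

check_wiki() {
    # siteinfo is a standard MediaWiki API query; -f makes curl fail on HTTP errors.
    curl -fsS --connect-timeout 3 --max-time 5 \
        "$(api_url "$MW_URL" "$API_PATH")?action=query&meta=siteinfo&format=json" \
        >/dev/null 2>&1
}

check_redis() {
    # A running Redis server answers PING with PONG.
    [ "$(redis-cli -h 127.0.0.1 -p 6379 ping 2>/dev/null)" = "PONG" ]
}

if check_wiki; then echo "wiki API reachable"; else echo "wiki API not reachable"; fi
if check_redis; then echo "Redis reachable"; else echo "Redis not reachable"; fi
```

If either check fails, mwoffliner is unlikely to get far, so it is quicker to fix $wgServer, the guest's network settings, or the Redis service first.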

pdeli answered Jan 02 '23 18:01



Yes, you can, but it's not easy. The Kiwix developers are now working on a Parsoid-based solution: http://sourceforge.net/p/kiwix/other/ci/master/tree/mwoffliner/

Parsoid is, in short, the backend of the MediaWiki VisualEditor; it handles the translation of wikitext to HTML and vice versa. It has a cache of HTML versions that can be exploited for this kind of task. https://www.mediawiki.org/wiki/Parsoid should give some information on how to set it up.

Nemo answered Jan 02 '23 19:01
