How to convert a webpage (from an intranet wiki) to an Office document?

I have a set of Wiki pages (MediaWiki style) on my company's intranet that I would like to convert to Microsoft Office Word documents (or to something I can import into Word). I am looking for a solution that meets the following:

Requirements

  • Keeps as much of the formatting as possible
  • Does not require changing anything on the server that hosts the Wiki (I cannot add plugins or modify configuration files)
  • Can be programmatic (I am a developer too), in Python, C#, C++ or similar

Exclusions

  • Not a "Wiki to Acrobat PDF Pro to Microsoft Office Word" route, as we do not have Acrobat PDF Pro. In fact, even the non-Pro version (which offers a "Save as Microsoft Word online" option) is not available at my company (we have a very old version of the Adobe suite). I can still export a page as a PDF from our Wiki, but the result does not look good: some elements are too large for A4, and the parts that overflow are cut off in the generated PDF. I would like them to be included anyway, so that I can fix the "bad" formatting in Word afterwards.
  • Since this is an intranet wiki, online solutions are out of scope
  • Solutions that require copying the Wiki's database and doing the conversion elsewhere (at home, for example) are also out of scope

Options

  • The solution can be either on Windows or Linux-like (CentOS)
  • Batch processing would be a plus, but is not required

Question

Do you have any suggestions for a solution that fits my needs?

Marc-Olivier Titeux asked Jun 04 '12 20:06

2 Answers

A very simple solution is to open the URL of the Wiki page in Word's Open Document dialog, e.g. by pasting the URL http://en.wikipedia.org/w/index.php?title=Microsoft_Word&printable=yes into the File Name text box. This requires no programming and still gives a satisfying result.
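
The `printable=yes` URL above is MediaWiki's printable view, which leaves out the navigation sidebar, so the same pattern should work for your own intranet pages. A minimal Python sketch that builds such URLs from page titles; the intranet address and the `printable_url` helper are hypothetical, and it assumes your wiki serves `index.php` the way Wikipedia does under `/w/`.

```python
from urllib.parse import urlencode


def printable_url(index_php: str, title: str) -> str:
    """Return the printable-view URL for a MediaWiki page title.

    MediaWiki titles use underscores instead of spaces; urlencode then
    percent-encodes any remaining special characters (e.g. '+' or '&').
    """
    query = urlencode({"title": title.replace(" ", "_"), "printable": "yes"})
    return f"{index_php}?{query}"


if __name__ == "__main__":
    # Hypothetical intranet wiki location; replace with your own index.php URL.
    base = "http://intranet-wiki.example/w/index.php"
    for page in ["Main Page", "Build process", "C++ coding rules"]:
        print(printable_url(base, page))
```

Each printed URL can be pasted into Word's Open dialog, or fed to a batch script.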

If you need a batch solution, you can write a simple script in VBA that creates and saves the documents for you:

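Sub OpenFromWiki()
    ' Open the printable view of the wiki page straight from its URL
    Dim doc As Document
    Set doc = Documents.Open(FileName:= _
        "http://en.wikipedia.org/w/index.php?title=Microsoft_Word&printable=yes", _
        ConfirmConversions:=False, ReadOnly:=True, AddToRecentFiles:=False)

    ' Save a local .docx copy (example path; the folder must exist).
    ' SaveAs2 needs Word 2010 or later; older versions can use SaveAs.
    doc.SaveAs2 FileName:="C:\Temp\Microsoft_Word.docx", FileFormat:=wdFormatXMLDocument
    doc.Close SaveChanges:=False
End Sub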
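
The macro above handles one hard-coded page. Since Python was mentioned in the question, here is a minimal sketch of the same open-and-save loop driven from Python through Word's COM interface. It assumes Windows, an installed copy of Word and the pywin32 package; `convert_all` and `docx_name` are hypothetical helper names, not part of any library, and arguments are passed positionally to stay compatible with late-bound COM calls.

```python
import os
import re
from urllib.parse import parse_qs, urlparse

# Word's file-format code for .docx (wdFormatXMLDocument = 12).
WD_FORMAT_XML_DOCUMENT = 12


def docx_name(url: str) -> str:
    """Derive a .docx file name from the ?title=... part of a wiki URL."""
    title = parse_qs(urlparse(url).query).get("title", ["page"])[0]
    # Replace characters that Windows does not allow in file names.
    return re.sub(r'[\\/:*?"<>|]', "_", title) + ".docx"


def convert_all(urls, out_dir: str) -> None:
    """Open each URL in Word, as the VBA macro does, and save it as .docx.

    Requires Windows, Microsoft Word and the pywin32 package.
    """
    import win32com.client  # imported here so docx_name works on any platform

    os.makedirs(out_dir, exist_ok=True)
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        for url in urls:
            # Positional args: FileName, ConfirmConversions, ReadOnly, AddToRecentFiles
            doc = word.Documents.Open(url, False, True, False)
            target = os.path.abspath(os.path.join(out_dir, docx_name(url)))
            # SaveAs2(FileName, FileFormat) needs Word 2010 or later
            doc.SaveAs2(target, WD_FORMAT_XML_DOCUMENT)
            doc.Close(False)  # SaveChanges=False
    finally:
        word.Quit()
```

For example, `convert_all(urls, r"C:\wiki-export")` with a list of printable-view URLs should leave one .docx per page in that folder.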

Dirk Vollmar answered Oct 20 '22 14:10


You could install the OpenDocument Export Extension, which lets you download single pages or Collections in OpenDocument format; Word can open those files.

With the mwlib Python package, which the extension uses internally, you can also easily run batch conversions from scripts.
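
mwlib also ships a command-line renderer, `mw-render`, so a batch job can be a loop over page titles. A minimal sketch, assuming `mw-render` and its ODF writer (the mwlib.odf package) are installed and on the PATH, that `--config` accepts the base URL of your wiki, and that the `--config`, `--writer` and `--output` options match mwlib's documentation (check `mw-render --help` on your installation). The wiki URL and the `mw_render_cmd`/`render_all` helpers are hypothetical. Run on your own machine, this should not require installing the extension on the wiki server, provided the wiki is reachable over HTTP from there.

```python
import os
import subprocess


def mw_render_cmd(wiki_url: str, title: str, output: str) -> list:
    """Build an mw-render command that renders one wiki page to an ODF file.

    Assumes mwlib's mw-render with the ODF writer; option names follow
    mwlib's docs, so verify them with `mw-render --help`.
    """
    return ["mw-render", "--config", wiki_url, "--writer", "odf",
            "--output", output, title]


def render_all(wiki_url: str, titles, out_dir: str = ".") -> None:
    """Render each title to <out_dir>/<Title>.odt, one mw-render call per page."""
    os.makedirs(out_dir, exist_ok=True)
    for title in titles:
        safe = title.replace(" ", "_").replace("/", "_")
        output = os.path.join(out_dir, safe + ".odt")
        # check=True stops the batch at the first page that fails to render
        subprocess.run(mw_render_cmd(wiki_url, title, output), check=True)
```

For example, `render_all("http://intranet-wiki.example/w/", ["Main Page", "Build process"], "export")` would write `export/Main_Page.odt` and `export/Build_process.odt`, which Word can then open.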

Bergi answered Oct 20 '22 14:10