How do you convert a Word Document into very simple html in Python? [closed]

Every now and then I receive a Word document that I have to display as a web page. I currently do this with Django's flatpages, pasting in the HTML that MS Word generates. That HTML is quite messy. Is there a better way, using Python, to generate very simple HTML instead?

Thierry Lam asked Oct 20 '09 19:10



1 Answer

A good solution is to upload the document to Google Docs and export the HTML version from there. (There must be an API for that?)

Google Docs does a lot of the cleanup for you. After that, Beautiful Soup can be used to make any further changes, as appropriate. It is one of the most capable and elegant HTML parsing libraries around.

This is a well-known workflow at news organizations.
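
To give a rough idea of what the "further changes" step can look like, here is a minimal sketch that strips exported HTML down to a small set of tags. It is an assumption-laden illustration, not a definitive implementation: it uses the standard library's html.parser instead of Beautiful Soup so that it runs without extra installs, and the names KEEP_TAGS, DROP_WITH_CONTENT, SimpleHtmlCleaner and simplify_html are made up for this example. The same whitelist idea carries over to Beautiful Soup.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: the only tags kept in the "very simple" output.
KEEP_TAGS = {"p", "h1", "h2", "h3", "ul", "ol", "li", "a",
             "strong", "em", "b", "i", "br"}
# Tags that are dropped together with everything inside them.
DROP_WITH_CONTENT = {"head", "title", "style", "script"}
# Tags that never get a closing tag.
VOID_TAGS = {"br"}


class SimpleHtmlCleaner(HTMLParser):
    """Re-emit only whitelisted tags, without attributes (except href on <a>)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_WITH_CONTENT tag

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in KEEP_TAGS:
            return  # the tag is dropped, but its text is still kept
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.out.append('<a href="%s">' % escape(href, quote=True))
                return
        self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in KEEP_TAGS or tag in VOID_TAGS:
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Text arrives unescaped, so escape it again on the way out.
            self.out.append(escape(data, quote=False))


def simplify_html(html):
    """Return html reduced to the whitelisted tags."""
    cleaner = SimpleHtmlCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)


if __name__ == "__main__":
    messy = ('<html><head><style>p.MsoNormal{margin:0}</style></head><body>'
             '<p class="MsoNormal" style="margin:0">'
             '<span style="font-family:Calibri">Hello <b>world</b></span>'
             '<o:p></o:p></p></body></html>')
    print(simplify_html(messy))
```

Comments, including Word's conditional comments, are dropped because the parser's default comment handler does nothing. Adjust the whitelist to whatever your flatpages templates actually need.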

lprsd answered Sep 25 '22 22:09