Every now and then I receive a Word document that I have to display as a web page. I'm currently using Django's flatpages to achieve this by grabbing the HTML content generated by MS Word. The generated HTML is quite messy. Is there a better way, using Python, to generate very simple HTML from the document?
Use MS Word's built-in "save as HTML" option: go to the File menu and select Save As. In the file type drop-down box, select "Web Page, Filtered". Click Save.
Click the File menu and choose Save As. Choose where you want to save the file, and then give it a name. Click the "Save as type" menu and select Web Page. Click Save to save your new HTML code to the desired location.
A good solution involves uploading the document to Google Docs and exporting the HTML version from it. (There must be an API for that?)
Google Docs does a lot of cleanup on its own; Beautiful Soup can then be used further down the road to make any additional changes, as appropriate. It is a very powerful and elegant HTML parsing library.
This is a well-known workflow at journalism companies.
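As a rough illustration of the kind of post-export cleanup described above, here is a minimal sketch that uses only the standard library's `html.parser` instead of Beautiful Soup, so it runs without extra installs. The names `WordHTMLCleaner`, `clean_word_html`, and the `ALLOWED_TAGS` whitelist are hypothetical choices for this example, not part of any library; adjust the whitelist to whatever your flatpages should allow.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist: tags kept in the output. Any other tag is unwrapped
# (the tag is dropped, its text is kept).
ALLOWED_TAGS = {"p", "h1", "h2", "h3", "h4", "ul", "ol", "li", "a",
                "b", "strong", "i", "em", "br", "table", "tr", "td", "th"}
# Tags whose entire content is discarded (Word puts a lot of CSS in <style>).
DROPPED_WITH_CONTENT = {"style", "script", "head", "title"}
VOID_TAGS = {"br"}


class WordHTMLCleaner(HTMLParser):
    """Hypothetical helper: strips attributes and non-whitelisted tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        if tag == "a":
            # Keep only the link target; drop style, class, target, etc.
            href = dict(attrs).get("href")
            if href:
                self.parts.append(f'<a href="{escape(href, quote=True)}">')
                return
        self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROPPED_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(escape(data, quote=False))


def clean_word_html(raw_html):
    """Return a simplified version of Word-generated HTML."""
    cleaner = WordHTMLCleaner()
    cleaner.feed(raw_html)
    cleaner.close()
    return "".join(cleaner.parts)
```

The cleaned string could then be pasted into (or saved as) the flatpage content. Beautiful Soup would let you do the same with less code and more tolerance for badly broken markup; this version only shows the idea.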