 

Best way to programmatically save a webpage to a Static HTML File

The more research I do, the more grim the outlook becomes.

I am trying to flat-save (static-save) a webpage with Python. This means merging all the styles into inline style attributes and changing all links to absolute URLs.
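To make the second half of that goal concrete, here is a minimal sketch of rewriting links as absolute URLs using only the Python standard library. The names LinkAbsolutizer, absolutize, and URL_ATTRS are hypothetical and chosen for illustration, and the list of URL-carrying attributes is an assumption that is far from exhaustive (it ignores srcset and url(...) inside CSS, for example).

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes assumed to carry URLs; illustrative, not exhaustive.
URL_ATTRS = {"href", "src", "action"}


class LinkAbsolutizer(HTMLParser):
    """Re-emit HTML, resolving URL attributes against base_url."""

    def __init__(self, base_url):
        # convert_charrefs=False keeps entities such as &amp; intact in text.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def _start(self, tag, attrs, closer):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # boolean attribute such as "disabled"
                parts.append(name)
                continue
            if name in URL_ATTRS:
                value = urljoin(self.base_url, value)
            # The parser unescapes attribute values, so escape them again.
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + closer)

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def absolutize(html, base_url):
    """Return html with href/src/action values made absolute."""
    parser = LinkAbsolutizer(base_url)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling absolutize(page_source, page_url) returns the rewritten markup as a string. Because the parser normalizes tag and attribute names to lowercase and the sketch re-quotes every attribute, the output is not byte-for-byte identical to the input even where no URL changed.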

I've tried nearly every free conversion website, API, and library on GitHub, and none is impressive. The best Python implementation I could find for flattening styles is https://github.com/davecranwell/inline-styler. I adapted it slightly for Flask, but the generated file still renders poorly. Here's how it looks:

[screenshot of the generated output; image not preserved]

Obviously, it should look better. Here's what it should look like:

https://dzwonsemrish7.cloudfront.net/items/3U302I3Y1H0J1h1Z0t1V/Screen%20Shot%202012-12-19%20at%205.51.44%20PM.png?v=2d0e3d26

It feels like a never-ending struggle with malformed HTML, unrecognized CSS properties, Unicode errors, and so on. Does anyone have a suggestion for a better way to do this? I understand I can use File -> Save in my local browser, but that isn't viable when I need to do this en masse and extract a particular XPath.

It looks like Evernote's web clipper uses iframes, which seems more complicated than it should be. At least the clippings look decent in Evernote.

Nick Woodhams asked Dec 19 '12 23:12


1 Answer

After walking away for a while, I managed to install a Ruby library that flattens CSS much better than anything else I've used. It's the library behind the very slow web interface at http://premailer.dialect.ca/

Thank goodness they released the source on GitHub. It's the best, hands down: https://github.com/alexdunae/premailer

It flattens styles, creates absolute URLs, works with a URL or a string, and can even create plain-text email templates. I'm very impressed with this library.
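For readers staying in Python, the idea of "flattening styles" can be illustrated with a deliberately naive sketch. This is not Premailer's algorithm and not a replacement for it: it assumes the CSS is passed in as a separate string, handles only bare tag, .class, and #id selectors, applies rules in source order without specificity, and ignores media queries and combinators. The names parse_simple_css, matches, StyleInliner, and inline_styles are hypothetical.

```python
import re
from html import escape
from html.parser import HTMLParser


def parse_simple_css(css):
    """Return (selector, declarations) pairs for tag, .class, and #id selectors."""
    rules = []
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        selector = selector.strip()
        body = body.strip().rstrip(";")
        # Skip anything more complex than a single simple selector.
        if body and re.fullmatch(r"[.#]?[\w-]+", selector):
            rules.append((selector, body))
    return rules


def matches(selector, tag, attrs):
    """Check one simple selector against an element's tag and attributes."""
    if selector.startswith("."):
        return selector[1:] in (attrs.get("class") or "").split()
    if selector.startswith("#"):
        return attrs.get("id") == selector[1:]
    return selector == tag


class StyleInliner(HTMLParser):
    """Re-emit HTML with matching rules merged into each style attribute."""

    def __init__(self, rules):
        super().__init__(convert_charrefs=False)
        self.rules = rules
        self.out = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        styles = [decl for sel, decl in self.rules if matches(sel, tag, attr_map)]
        # An existing inline style goes last so that it wins over the rules.
        if attr_map.get("style"):
            styles.append(attr_map["style"].strip().rstrip(";"))
        if styles:
            attr_map["style"] = "; ".join(styles)
        rendered = "".join(
            " %s" % k if v is None else ' %s="%s"' % (k, escape(v, quote=True))
            for k, v in attr_map.items()
        )
        self.out.append("<%s%s>" % (tag, rendered))

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)
        self.out[-1] = self.out[-1][:-1] + " />"

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def inline_styles(html, css):
    """Return html with the simple rules found in css written inline."""
    inliner = StyleInliner(parse_simple_css(css))
    inliner.feed(html)
    inliner.close()
    return "".join(inliner.out)
```

The gap between this sketch and a real page (cascade order, specificity, inheritance, shorthand properties, malformed markup) is exactly where the struggle described in the question comes from, which is why a mature library is the more practical choice.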

Update Nov 2013

I ended up writing my own bookmarklet that works purely client-side. It is compatible with WebKit and Firefox only. It recurses through each node and adds inline styles, then sends the flattened HTML to the clippy.in API to save it to the user's dashboard.

Client Side Bookmarklet

Nick Woodhams answered Sep 28 '22 17:09