I'm looking for a way to generate a PDF from an HTML5/CSS3 document on the server side.
I know there are plenty of solutions for creating a PDF (FOP, iText, ...), but I need the result to look exactly the same as the HTML page, so I don't want to build the PDF element by element as with FOP or iText.
Something like this should exist, because that's what happens when you print to PDF from your browser. Ideally, the solution would embed a web browser engine (WebKit or Gecko). I tried wkhtmltopdf, but the result is not good at all (the HTML5 canvas is not even printed).
If someone has an idea for any solution, free or not, in any language, I would appreciate it a lot. Thanks!
I have used PhantomJS to generate PNG images from web pages. It can produce PDFs as well, and the quality is usually good. The feature is called screen capture in the PhantomJS documentation. The supported formats are PNG, JPEG, GIF and PDF.
When a page is converted to PDF, its text is kept as text rather than being turned into an image.
After testing a few other libraries and programs, I found PhantomJS to be the best solution. PhantomJS uses WebKit, a real layout and rendering engine.
A few examples are listed at https://github.com/ariya/phantomjs/wiki/Examples. The Rendering/rasterization section mentions the following script, which helps with the process:
rasterize.js rasterizes a web page to image or PDF
The PhantomJS Quick Start guide says:
Producing PDF output is possible, e.g. from a Wikipedia article:
phantomjs rasterize.js 'http://en.wikipedia.org/w/index.php?title=Jakarta&printable=yes' jakarta.pdf
or when creating printer-ready cheat sheet:
phantomjs rasterize.js http://www.nihilogic.dk/labs/webgl_cheat_sheet/WebGL_Cheat_Sheet.htm webgl.pdf
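For reference, the core of what such a script does can be sketched in a few lines. This is a minimal sketch, not the rasterize.js that ships with PhantomJS: the file name render.js, the helper parseArgs, the 1280x800 viewport, the A4 paper size and the 200 ms delay are all assumptions chosen for illustration.

```javascript
// render.js: minimal sketch of a PhantomJS rendering script (hypothetical file name).
// Usage: phantomjs render.js <url> <output.(png|pdf)>

// Hypothetical helper: turns the script arguments into render settings.
function parseArgs(args) {
    if (args.length !== 2) {
        return null;
    }
    var output = args[1];
    var isPdf = /\.pdf$/i.test(output);
    return {
        url: args[0],
        output: output,
        // Assumed defaults; adjust them to the page being rendered.
        viewportSize: { width: 1280, height: 800 },
        paperSize: isPdf ? { format: 'A4', orientation: 'portrait', margin: '1cm' } : null
    };
}

// Only run the rendering part inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    // system.args[0] is the script name, so skip it.
    var settings = parseArgs(system.args.slice(1));
    if (!settings) {
        console.log('Usage: phantomjs render.js <url> <output.(png|pdf)>');
        phantom.exit(1);
    } else {
        var page = require('webpage').create();
        page.viewportSize = settings.viewportSize;
        if (settings.paperSize) {
            page.paperSize = settings.paperSize;
        }
        page.open(settings.url, function (status) {
            if (status !== 'success') {
                console.log('Unable to load ' + settings.url);
                phantom.exit(1);
            } else {
                // Give late-loading resources a moment before rendering (arbitrary delay).
                window.setTimeout(function () {
                    page.render(settings.output);
                    phantom.exit();
                }, 200);
            }
        });
    }
}
```

The output format is picked from the file extension passed to page.render, which is why the same script can write either a PNG or a PDF.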
I tested PDF generation on a few pages, and if a page follows standards, the results are good. Text is selectable and prints in high quality, but on some pages the layout in the PDF is not exactly the same as in the PNG. Below are two screenshots generated with these commands:
$ phantomjs rasterize.js 'http://windows.microsoft.com/en-US/windows/home' microsoft.png
$ phantomjs rasterize.js 'http://windows.microsoft.com/en-US/windows/home' microsoft.pdf
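To compare the two formats from a single page load instead of two separate runs, something along these lines might work. It is only a sketch: the file name compare.js, the helper outputNames and the viewport size are assumptions, not part of PhantomJS.

```javascript
// compare.js: render one page load to both PNG and PDF (hypothetical file name).
// Usage: phantomjs compare.js <url> <basename>

// Hypothetical helper: derives the two output file names from a base name.
function outputNames(base) {
    return [base + '.png', base + '.pdf'];
}

// Only run the rendering part inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var args = require('system').args;
    if (args.length !== 3) {
        console.log('Usage: phantomjs compare.js <url> <basename>');
        phantom.exit(1);
    } else {
        var page = require('webpage').create();
        page.viewportSize = { width: 1280, height: 800 }; // assumed size
        page.open(args[1], function (status) {
            if (status === 'success') {
                // Render the same loaded page once per output format.
                outputNames(args[2]).forEach(function (name) {
                    page.render(name);
                });
            } else {
                console.log('Unable to load ' + args[1]);
            }
            phantom.exit(status === 'success' ? 0 : 1);
        });
    }
}
```

Running it as phantomjs compare.js 'http://windows.microsoft.com/en-US/windows/home' microsoft would write microsoft.png and microsoft.pdf side by side.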
I also tested http://lab.simurai.com/buttons/. The PDF and PNG were nearly identical. Below is a sample of the PDF, which I rasterized to 5641px wide and then cropped to one region. As in the previous PDF example, the text is selectable, and as you can see, it stays sharp (no antialiasing!).
INSTALLING
I first tried to install the Qt library and PhantomJS on CentOS 5 by compiling from source, but had no luck. On Ubuntu 11.10 the process was painless:
I downloaded http://phantomjs.googlecode.com/files/phantomjs-1.7.0-linux-x86_64.tar.bz2 and extracted it using
tar -xjvf phantomjs-1.7.0-linux-x86_64.tar.bz2
Then I copied the phantomjs executable to the system bin directory:
$ cp phantomjs-1.7.0-linux-x86_64/bin/phantomjs /usr/local/bin/phantomjs
and phantomjs was ready to run.
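A quick way to confirm that the copied binary actually runs is a one-off script that prints its version. This is a small sketch; the file name version.js and the helper formatVersion are made up for illustration, while phantom.version is the version object PhantomJS exposes to scripts.

```javascript
// version.js: print the PhantomJS version (hypothetical file name).
// Usage: phantomjs version.js

// Hypothetical helper: formats a {major, minor, patch} object as "major.minor.patch".
function formatVersion(v) {
    return [v.major, v.minor, v.patch].join('.');
}

// Only touch `phantom` when the script runs inside PhantomJS.
if (typeof phantom !== 'undefined') {
    console.log('PhantomJS ' + formatVersion(phantom.version));
    phantom.exit();
}
```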
If the generated PDF is not good enough, you could try updating WebKit, but I expect the result to be sufficient in most cases. PhantomJS has an excellent update cycle, so bugs should be fixed in reasonable time.
The PhantomJS FAQ also has good information about what is possible.
Depending on the complexity of your HTML you could use XmlWorker, which is a project by the iText developers and uses iText.