
HTML5 to PDF serverside [closed]

I'm looking for a way to generate a PDF from an HTML5/CSS3 document on the server side.

I know there are plenty of solutions for creating a PDF (FOP, iText, ...), but I need the result to look exactly the same as the HTML page. So I don't want to build the PDF element by element, as you do with FOP or iText.

Something like this should exist, because that's what happens when you print to PDF from your browser. Ideally, the solution would embed a web browser engine (WebKit or Gecko). I tried wkhtmltopdf, but the result was not good at all (the HTML5 canvas was not even printed).

If anyone has an idea for a solution, free or not, in any language, I would appreciate it a lot. Thanks!

Olivier asked Oct 02 '12 07:10


2 Answers

I have used PhantomJS to generate PNG images from web pages. It can produce PDF as well, and the quality is usually good. The feature is called screen capture and is described in the PhantomJS documentation. The supported formats are PNG, JPEG, GIF and PDF.

When a page is converted to PDF, its text is kept as text rather than being turned into an image.

After testing a few other libraries and programs, I found PhantomJS to be the best solution. PhantomJS uses WebKit, a real layout and rendering engine.

A few examples are at https://github.com/ariya/phantomjs/wiki/Examples. The Rendering/rasterization section mentions the following script, which helps with the process:

rasterize.js rasterizes a web page to image or PDF

The PhantomJS Quick Start guide says:

Producing PDF output is possible, e.g. from a Wikipedia article:

phantomjs rasterize.js 'http://en.wikipedia.org/w/index.php?title=Jakarta&printable=yes' jakarta.pdf

or when creating printer-ready cheat sheet:

phantomjs rasterize.js http://www.nihilogic.dk/labs/webgl_cheat_sheet/WebGL_Cheat_Sheet.htm webgl.pdf
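For server-side use, the commands above can be wrapped in a small shell function. This is only a sketch: the function name `render_page` and the `PHANTOMJS` and `RASTERIZE_JS` variables are made up for illustration, and it assumes `rasterize.js` from the PhantomJS examples is available in the working directory. It checks that an output file was actually written, in case the script exits with status 0 even though the page failed to load.

```shell
#!/bin/sh
# Hypothetical settings: where the PhantomJS binary and rasterize.js live.
PHANTOMJS=${PHANTOMJS:-phantomjs}
RASTERIZE_JS=${RASTERIZE_JS:-rasterize.js}

# render_page URL OUTPUT
# Runs "phantomjs rasterize.js URL OUTPUT" and fails if no file is produced.
render_page() {
  url=$1
  output=$2

  # Only accept the formats mentioned in the answer: PNG, JPEG, GIF and PDF.
  case $output in
    *.pdf|*.png|*.jpg|*.jpeg|*.gif) ;;
    *) echo "unsupported output format: $output" >&2; return 2 ;;
  esac

  # Remove any stale file so an old result is not mistaken for a new one.
  rm -f -- "$output"

  "$PHANTOMJS" "$RASTERIZE_JS" "$url" "$output" || return 1

  # Treat a missing or empty output file as a failure.
  [ -s "$output" ] || { echo "no output written for $url" >&2; return 1; }
}
```

A call such as `render_page 'http://en.wikipedia.org/w/index.php?title=Jakarta&printable=yes' jakarta.pdf` then returns a non-zero status when rendering fails, which is easier to handle from a server-side job than parsing console output.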

I tested PDF generation on a few pages, and if a page follows standards, the results are good. Text is selectable and prints at high quality, but on some pages the layout in the PDF is not quite the same as in the PNG. Below are two screenshots generated with these commands:

$ phantomjs rasterize.js 'http://windows.microsoft.com/en-US/windows/home' microsoft.png

$ phantomjs rasterize.js 'http://windows.microsoft.com/en-US/windows/home' microsoft.pdf 

[Image: example of PNG and PDF generation using PhantomJS]

I also tested http://lab.simurai.com/buttons/. The PDF and PNG were nearly identical. Below is a sample of the PDF, which I rasterized to 5641px wide and then cropped to one region. As in the previous PDF example, the text is selectable, and as you can see, it is sharp (no antialiasing!).

[Image: CSS3Buttons]

INSTALLING

I first tried to install the Qt library and PhantomJS on CentOS 5 by compiling from source, but had no luck. On Ubuntu 11.10 the process was painless:

I downloaded http://phantomjs.googlecode.com/files/phantomjs-1.7.0-linux-x86_64.tar.bz2 and extracted it with:

tar -xjvf phantomjs-1.7.0-linux-x86_64.tar.bz2

Then I copied the phantomjs executable to the system bin directory:

$ cp phantomjs-1.7.0-linux-x86_64/bin/phantomjs /usr/local/bin/phantomjs

and phantomjs was ready to run.
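The extract-and-copy steps above could be collected into one function, sketched below. The name `install_phantomjs` is hypothetical, and the sketch assumes a tarball that has already been downloaded and that contains a `bin/phantomjs` file, as the 1.7.0 archive above does. Writing to `/usr/local/bin` normally requires root.

```shell
#!/bin/sh
# install_phantomjs TARBALL [DEST_DIR]
# Extracts a PhantomJS .tar.bz2 archive into a temporary directory and
# copies its bin/phantomjs executable into DEST_DIR (default /usr/local/bin).
install_phantomjs() {
  tarball=$1
  dest=${2:-/usr/local/bin}

  [ -f "$tarball" ] || { echo "tarball not found: $tarball" >&2; return 1; }

  workdir=$(mktemp -d) || return 1

  # Same extraction as "tar -xjvf", without the verbose listing.
  if ! tar -xjf "$tarball" -C "$workdir"; then
    rm -rf "$workdir"
    return 1
  fi

  # Locate the executable inside the extracted tree.
  bin=$(find "$workdir" -type f -path '*/bin/phantomjs' | head -n 1)
  if [ -z "$bin" ]; then
    echo "no bin/phantomjs found in $tarball" >&2
    rm -rf "$workdir"
    return 1
  fi

  # Copy with executable permissions, then clean up.
  if install -m 755 "$bin" "$dest/phantomjs"; then
    status=0
  else
    status=1
  fi
  rm -rf "$workdir"
  return $status
}
```

After `install_phantomjs phantomjs-1.7.0-linux-x86_64.tar.bz2`, running `phantomjs --version` is a quick way to confirm the binary is on the path and starts.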

If the generated PDF is not good, you could try updating WebKit, but I expect the result to be sufficient. PhantomJS has an excellent update cycle, so bugs should be fixed in a reasonable time.

The PhantomJS FAQ also has good information about what is possible.

Timo Kähkönen answered Sep 28 '22 09:09


Depending on the complexity of your HTML, you could use XmlWorker, a project by the iText developers that is built on iText.

Michaël Demey answered Sep 28 '22 08:09