How to convert a simple html to pdf using wkhtmltopdf? [closed]

Here is what I did:

  1. Created a linux virtual machine in the Amazon cloud.
  2. Followed the instructions from https://code.google.com/p/wkhtmltopdf/wiki/compilation to download and compile the source code of wkhtmltopdf-qt and of wkhtmltopdf. In the end I have a static build of wkhtmltopdf.
  3. Took this html (http://jsfiddle.net/mark69_fnd/8CtjB/):

    <html> <head> <style type="text/css">p{font-family: sans-serif;};</style> </head> <body> <p>Let's Test</p> </body> </html>

  4. Ran wkhtmltopdf test.html test.pdf

  5. Copied test.pdf to my Windows desktop, opened it and got this (https://docs.google.com/file/d/0B2pbsdBJxJI3MV8zby14cGk5VWs/edit?usp=sharing).
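
For anyone retracing steps 3 and 4, here is a minimal sketch of the same test as a script. It assumes a `wkhtmltopdf` binary is on `PATH` (the static build described above lives elsewhere, so the path may need adjusting), and the scratch directory is only for illustration, not part of the original setup.

```shell
#!/bin/sh
# Recreate the one-line test page from step 3 in a scratch directory
# (the directory is an assumption made for illustration).
workdir=$(mktemp -d)
cat > "$workdir/test.html" <<'EOF'
<html> <head> <style type="text/css">p{font-family: sans-serif;};</style> </head> <body> <p>Let's Test</p> </body> </html>
EOF

# Step 4: convert the page, but only if a wkhtmltopdf binary is on PATH.
if command -v wkhtmltopdf >/dev/null 2>&1; then
    if wkhtmltopdf "$workdir/test.html" "$workdir/test.pdf"; then
        ls -l "$workdir/test.pdf"
    else
        echo "wkhtmltopdf exited with an error" >&2
    fi
else
    echo "wkhtmltopdf not found on PATH; only $workdir/test.html was created" >&2
fi
```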

I followed the guide closely. The Qt configuration options were taken from ../wkhtmltopdf/static_qt_conf_base and ../wkhtmltopdf/static_qt_conf_linux, as the guide suggests.

Needless to say, I am a bit disappointed with the result. Can anyone explain what I am doing wrong?

P.S.

In reality I need to convert a much more complex HTML page, but there is no point in discussing it while I cannot even convert a trivial one.

EDIT

I wish to emphasize that I do not work on Linux; I only open a terminal to an Amazon-hosted Linux box. This means I do not have an X11 environment.

This is what I get when I try the prebuilt wkhtmltopdf package from the Ubuntu repositories:

ubuntu@ip-10-245-78-162:~$ which wkhtmltopdf
ubuntu@ip-10-245-78-162:~$ /usr/bin/wkhtmltopdf
-bash: /usr/bin/wkhtmltopdf: No such file or directory
ubuntu@ip-10-245-78-162:~$ sudo apt-get install wkhtmltopdf
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  wkhtmltopdf
0 upgraded, 1 newly installed, 0 to remove and 120 not upgraded.
Need to get 0 B/104 kB of archives.
After this operation, 303 kB of additional disk space will be used.
Selecting previously unselected package wkhtmltopdf.
(Reading database ... 36679 files and directories currently installed.)
Unpacking wkhtmltopdf (from .../wkhtmltopdf_0.9.9-3_amd64.deb) ...
Processing triggers for man-db ...
Setting up wkhtmltopdf (0.9.9-3) ...
ubuntu@ip-10-245-78-162:~$ l test.*
-rw-r--r-- 1 ubuntu ubuntu 123 Mar 30 12:46 test.html
ubuntu@ip-10-245-78-162:~$ cat test.html
<html> <head> <style type="text/css">p{font-family: sans-serif;};</style> </head> <body> <p>Let's Test</p> </body> </html>
ubuntu@ip-10-245-78-162:~$ /usr/bin/wkhtmltopdf test.html test.pdf
wkhtmltopdf: cannot connect to X server
ubuntu@ip-10-245-78-162:~$

EDIT2

  1. I have downloaded ftp://rpmfind.net/linux/fedora/linux/development/rawhide/x86_64/os/Packages/u/urw-fonts-2.4-14.fc19.noarch.rpm
  2. Followed instructions from http://www.howtogeek.com/howto/ubuntu/install-an-rpm-package-on-ubuntu-linux/ to convert the rpm to a deb format.
  3. Installed the deb
  4. Produced the PDF, but I still see just the squares.

Here is the transcript:

ubuntu@ip-10-245-78-162:~$ sudo alien urw-fonts-2.4-14.fc19.noarch.rpm --scripts
warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
[the same warning is printed 16 more times]
urw-fonts_2.4-15_all.deb generated
ubuntu@ip-10-245-78-162:~$ sudo dpkg -i urw-fonts_2.4-15_all.deb
Selecting previously unselected package urw-fonts.
(Reading database ... 38529 files and directories currently installed.)
Unpacking urw-fonts (from urw-fonts_2.4-15_all.deb) ...
Setting up urw-fonts (2.4-15) ...
Processing triggers for fontconfig ...
ubuntu@ip-10-245-78-162:~$  ./wkhtmltopdf/bin/wkhtmltopdf test.html test.pdf
Loading pages (1/6)
Counting pages (2/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)
Done
ubuntu@ip-10-245-78-162:~$

EDIT3

I have installed the xvfb-run package, and the default version (/usr/bin/wkhtmltopdf) can now be run through it. It does convert the simple test.html to PDF, but it fails on a complex HTML page with JavaScript code. It appears that /usr/bin/wkhtmltopdf is unable to run any JavaScript on the page being converted.

I am still puzzled why the compiled version does not work.
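
The X server workaround described here can be wrapped in a small helper. This is a sketch rather than a tested setup: `x_prefix` and `html2pdf` are made-up names, and it assumes the Debian/Ubuntu `xvfb-run` script, whose `-a` option picks a free display number.

```shell
#!/bin/sh
# Hypothetical helper: print the command prefix needed to give
# wkhtmltopdf an X display, or fail if none can be provided.
x_prefix() {
    if [ -n "${DISPLAY:-}" ]; then
        # A display is already available; no wrapper is needed.
        echo ""
    elif command -v xvfb-run >/dev/null 2>&1; then
        # No display: run under a virtual framebuffer X server.
        echo "xvfb-run -a"
    else
        echo "no X display and xvfb-run is not installed" >&2
        return 1
    fi
}

# Hypothetical helper. Usage: html2pdf input.html output.pdf
html2pdf() {
    prefix=$(x_prefix) || return 1
    # $prefix is left unquoted on purpose so "xvfb-run -a" splits into words.
    $prefix wkhtmltopdf "$1" "$2"
}
```

With these defined, `html2pdf test.html test.pdf` would behave like the plain command on a desktop and like `xvfb-run -a wkhtmltopdf test.html test.pdf` on a headless box.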

EDIT4

I was unfair to the default wkhtmltopdf version. It is capable of running JavaScript in the page; it successfully converts the following HTML:

<html>
  <head>
    <style type="text/css">
      body {
        font-family: sans-serif;
      }
    </style>
  </head>
  <body id='body'>
    <script>
      document.getElementById('body').innerHTML = 'Hello world!';
    </script>
  </body>
</html>

I will try to understand why it fails with a real page, but I do not know how to troubleshoot it other than by stripping pieces out of the original page until I get a minimal failing one.

EDIT5

OK, here is the minimal example that does not work with the default wkhtmltopdf version:

<!DOCTYPE html>
<html>
  <head>
    <style type="text/css">
        html, body {
                height: 100%;
                overflow: hidden;
        }
    </style>
  </head>
  <body>
    Hello World!
  </body>
</html>

The created pdf is empty. Here is the transcript:

ubuntu@ip-10-245-78-162:~$ cat test2.html
<!DOCTYPE html>
<html>
  <head>
    <style type="text/css">
        html, body {
                height: 100%;
                overflow: hidden;
        }
    </style>
  </head>
  <body>
    Hello World!
  </body>
</html>
ubuntu@ip-10-245-78-162:~$ xvfb-run /usr/bin/wkhtmltopdf test2.html test2.pdf ; l test2.pdf
Loading page (1/2)
Printing pages (2/2)
Done
-rw-r--r-- 1 ubuntu ubuntu 1266 Mar 31 11:16 test2.pdf
ubuntu@ip-10-245-78-162:~$ cat test2.html |sed 6d | xvfb-run /usr/bin/wkhtmltopdf - test2.pdf ; l test2.pdf
Loading page (1/2)
Printing pages (2/2)
Done
-rw-r--r-- 1 ubuntu ubuntu 4284 Mar 31 11:16 test2.pdf
ubuntu@ip-10-245-78-162:~$

Notice how removing the 6th line (height: 100%;) changes the size of the created pdf file.
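
The line-removal experiment above can be automated, which is one way to do the "throw away pieces" bisection mentioned in EDIT4. This is a minimal sketch, assuming the same `xvfb-run /usr/bin/wkhtmltopdf` invocation as in the transcript. `make_variants` and `report_sizes` are hypothetical helper names, and the scratch directory is only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: for every line N of an HTML file, write a copy with
# only that line deleted (the same edit as `sed 6d` in the transcript).
make_variants() {
    src=$1
    outdir=$2
    total=$(( $(wc -l < "$src") ))
    n=1
    while [ "$n" -le "$total" ]; do
        sed "${n}d" "$src" > "$outdir/without_line_$n.html"
        n=$((n + 1))
    done
}

# Hypothetical helper: convert each variant and print the PDF size in bytes,
# so that variants producing a noticeably larger file stand out.
report_sizes() {
    for page in "$1"/without_line_*.html; do
        pdf="${page%.html}.pdf"
        if xvfb-run /usr/bin/wkhtmltopdf "$page" "$pdf" >/dev/null 2>&1; then
            printf '%s\t%s bytes\n' "$page" "$(( $(wc -c < "$pdf") ))"
        else
            printf '%s\tconversion failed\n' "$page"
        fi
    done
}

# Build the variants of test2.html in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/test2.html" <<'EOF'
<!DOCTYPE html>
<html>
  <head>
    <style type="text/css">
        html, body {
                height: 100%;
                overflow: hidden;
        }
    </style>
  </head>
  <body>
    Hello World!
  </body>
</html>
EOF
make_variants "$workdir/test2.html" "$workdir"
```

Running `report_sizes "$workdir"` afterwards prints one size per variant. Variants whose PDF is clearly larger than the 1266-byte empty one point at the lines worth a closer look.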

EDIT6

The custom version is linked statically, whereas the default one depends on quite a few shared libraries, including the Qt and QtWebKit ones:

The custom version:

ubuntu@ip-10-245-78-162:~/wkhtmltopdf/bin$ l wkhtmltopdf
-rwxr-xr-x 1 ubuntu ubuntu 35020224 Mar 31 22:26 wkhtmltopdf
ubuntu@ip-10-245-78-162:~/wkhtmltopdf/bin$ ldd !$
ldd wkhtmltopdf
        linux-vdso.so.1 =>  (0x00007fff195ff000)
        libXrender.so.1 => /usr/lib/x86_64-linux-gnu/libXrender.so.1 (0x00007fefc06db000)
        libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007fefc03a7000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fefc01a2000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fefbff9a000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fefbfd7d000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fefbfa7c000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fefbf780000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fefbf56a000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fefbf1aa000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fefc08ef000)
        libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007fefbef8c000)
        libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007fefbed88000)
        libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007fefbeb82000)
ubuntu@ip-10-245-78-162:~/wkhtmltopdf/bin$

Now the default version:

ubuntu@ip-10-245-78-162:/usr/bin$ l wkhtmltopdf
-rwxr-xr-x 1 root root 233512 May  7  2011 wkhtmltopdf
ubuntu@ip-10-245-78-162:/usr/bin$ ldd wkhtmltopdf
        linux-vdso.so.1 =>  (0x00007fff031ff000)
        libQtWebKit.so.4 => /usr/lib/x86_64-linux-gnu/libQtWebKit.so.4 (0x00007f28a33bc000)
        libQtGui.so.4 => /usr/lib/x86_64-linux-gnu/libQtGui.so.4 (0x00007f28a26ee000)
        libQtNetwork.so.4 => /usr/lib/x86_64-linux-gnu/libQtNetwork.so.4 (0x00007f28a23a1000)
        libQtCore.so.4 => /usr/lib/x86_64-linux-gnu/libQtCore.so.4 (0x00007f28a1ecf000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f28a1bcf000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f28a19b8000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f28a15f9000)
        libsqlite3.so.0 => /usr/lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f28a1356000)
        libXrender.so.1 => /usr/lib/x86_64-linux-gnu/libXrender.so.1 (0x00007f28a114b000)
        libgstapp-0.10.so.0 => /usr/lib/x86_64-linux-gnu/libgstapp-0.10.so.0 (0x00007f28a0f3f000)
        libgstinterfaces-0.10.so.0 => /usr/lib/x86_64-linux-gnu/libgstinterfaces-0.10.so.0 (0x00007f28a0d2d000)
        libgstpbutils-0.10.so.0 => /usr/lib/x86_64-linux-gnu/libgstpbutils-0.10.so.0 (0x00007f28a0b09000)
        libgstvideo-0.10.so.0 => /usr/lib/x86_64-linux-gnu/libgstvideo-0.10.so.0 (0x00007f28a08ed000)
        libgstbase-0.10.so.0 => /usr/lib/x86_64-linux-gnu/libgstbase-0.10.so.0 (0x00007f28a069a000)
        libgstreamer-0.10.so.0 => /usr/lib/x86_64-linux-gnu/libgstreamer-0.10.so.0 (0x00007f28a03b2000)
        libgobject-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0 (0x00007f28a0163000)
        libglib-2.0.so.0 => /lib/x86_64-linux-gnu/libglib-2.0.so.0 (0x00007f289fe6e000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f289fc50000)
        libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f289f91c000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f289f620000)
        libfontconfig.so.1 => /usr/lib/x86_64-linux-gnu/libfontconfig.so.1 (0x00007f289f3e9000)
        libaudio.so.2 => /usr/lib/x86_64-linux-gnu/libaudio.so.2 (0x00007f289f1d1000)
        libpng12.so.0 => /lib/x86_64-linux-gnu/libpng12.so.0 (0x00007f289efa9000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f289ed91000)
        libfreetype.so.6 => /usr/lib/x86_64-linux-gnu/libfreetype.so.6 (0x00007f289eaf5000)
        libSM.so.6 => /usr/lib/x86_64-linux-gnu/libSM.so.6 (0x00007f289e8ed000)
        libICE.so.6 => /usr/lib/x86_64-linux-gnu/libICE.so.6 (0x00007f289e6d2000)
        libXi.so.6 => /usr/lib/x86_64-linux-gnu/libXi.so.6 (0x00007f289e4c3000)
        libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007f289e2b2000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f289e0ad000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f289dea5000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f28a517e000)
        liborc-0.4.so.0 => /usr/lib/x86_64-linux-gnu/liborc-0.4.so.0 (0x00007f289dc29000)
        libgmodule-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libgmodule-2.0.so.0 (0x00007f289da25000)
        libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f289d6ca000)
        libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007f289d4c1000)
        libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f289d284000)
        libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f289d065000)
        libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007f289ce3b000)
        libXt.so.6 => /usr/lib/x86_64-linux-gnu/libXt.so.6 (0x00007f289cbd5000)
        libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f289c9d1000)
        libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007f289c7cc000)
        libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f289c5c5000)
ubuntu@ip-10-245-78-162:/usr/bin$
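
Two long `ldd` listings are hard to compare by eye. Here is a rough sketch of a diff, run on a few sample lines copied from the output above with the load addresses left out. `lib_names` and `only_in_second` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: reduce `ldd` output on stdin to a sorted list of
# bare library names (first column, directory part stripped).
lib_names() {
    awk 'NF { print $1 }' | sed 's|.*/||' | LC_ALL=C sort -u
}

# Hypothetical helper: print the names that appear only in the second list.
only_in_second() {
    LC_ALL=C comm -13 "$1" "$2"
}

# Demo on a handful of lines taken from the two listings (addresses omitted).
workdir=$(mktemp -d)
lib_names > "$workdir/static.txt" <<'EOF'
        libXrender.so.1 => /usr/lib/x86_64-linux-gnu/libXrender.so.1
        libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
        /lib64/ld-linux-x86-64.so.2
EOF
lib_names > "$workdir/default.txt" <<'EOF'
        libQtWebKit.so.4 => /usr/lib/x86_64-linux-gnu/libQtWebKit.so.4
        libXrender.so.1 => /usr/lib/x86_64-linux-gnu/libXrender.so.1
        libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6
        libfontconfig.so.1 => /usr/lib/x86_64-linux-gnu/libfontconfig.so.1
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
        /lib64/ld-linux-x86-64.so.2
EOF
only_in_second "$workdir/static.txt" "$workdir/default.txt"
```

On the real machine the same comparison would be `ldd ~/wkhtmltopdf/bin/wkhtmltopdf | lib_names > static.txt`, the equivalent for `/usr/bin/wkhtmltopdf`, and then `only_in_second static.txt default.txt`. In the full listings, libfontconfig.so.1 appears only for the default version. Whether that has anything to do with the missing glyphs is not established here, since a static build could have the library compiled in.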

EDIT7

Guys, I do not understand how wkhtmltopdf works for you. I started again completely from scratch:

  1. Created a brand new Ubuntu Amazon micro instance (free tier)
  2. sudo apt-get update
  3. sudo apt-get upgrade
  4. sudo apt-get install libx11-dev
  5. sudo apt-get install libfontconfig1-dev
  6. wget https://wkhtmltopdf.googlecode.com/files/wkhtmltopdf-0.11.0_rc1-static-amd64.tar.bz2
  7. tar xjf wkhtmltopdf-0.11.0_rc1-static-amd64.tar.bz2
  8. Created test2.html with the contents from EDIT5 (see the EDIT5 transcript)
  9. Ran wkhtmltopdf-amd64 on test2.html. The produced pdf is empty!
  10. Removed line 6 or 7 from test2.html (the CSS property height or overflow) and suddenly it works!

Can anyone retrace my steps and confirm it?

EDIT8

Installed CentOS 6.4 in a VMware VM on my laptop. Same result: wkhtmltopdf does not work on the aforementioned trivial HTML file.

Asked by mark, Mar 28 '13 22:03



1 Answer

Try adding a charset declaration to the head tag of your HTML, like this:

<head>
  <meta charset="utf-8">
  ...
</head>
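
A rough way to apply this suggestion from the command line, as a sketch only: `has_charset` and `add_charset` are hypothetical helpers, the `sed` edit only matches a literal lowercase `<head>` tag, and whether the declaration actually cures the squares is not verified here.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file already declares a charset
# in a <meta> tag (case-insensitive, single-line match only).
has_charset() {
    grep -qi '<meta[^>]*charset' "$1"
}

# Hypothetical helper: print the page with <meta charset="utf-8"> added
# right after a literal lowercase <head> tag, unless one is already there.
add_charset() {
    if has_charset "$1"; then
        cat "$1"
    else
        sed 's|<head>|<head><meta charset="utf-8">|' "$1"
    fi
}

# Demo on the one-line test page from the question.
workdir=$(mktemp -d)
cat > "$workdir/test.html" <<'EOF'
<html> <head> <style type="text/css">p{font-family: sans-serif;};</style> </head> <body> <p>Let's Test</p> </body> </html>
EOF
add_charset "$workdir/test.html" > "$workdir/test_utf8.html"
```

The resulting `test_utf8.html` can then be passed to wkhtmltopdf in place of the original page.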

Answered by sepulchered, Oct 19 '22 20:10