trouble using xhtml2pdf with unicode

Question

I've been trying to convert Hebrew HTML files to PDF without success; the Hebrew characters show up in the output PDF as black rectangles regardless of which encoding I try.

I tried some of the unicode test files included in the pisa distribution: pisa-3.0.33\test\test-unicode-all.html and test-bidirectional-text.html. I ran xhtml2pdf from the command line both with and without --encoding utf-8. Same result: none of the non-Latin characters made it through.

Is this a fonts problem*? If the unicode test file works for you, was there anything you did to set it up?

*FWIW, at least some of these languages, including Hebrew, should work with Arial.

EDIT: Alternatively, if someone has pisa set up and could try converting the unicode test file above, I would be very grateful.

eviltrue · Accepted Answer

Inserting the following code into the HTML helped me:

<style>
    @page {
        size: a4;
        margin: 0.5cm;
    }

    @font-face {
        font-family: "Verdana";
        src: url("verdana.ttf");
    }

    html {
        font-family: Verdana;
        font-size: 11pt;
    }
</style>

In url(), instead of "verdana.ttf", you should put the absolute path to the font file on your OS.
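If the HTML is generated from Python, the absolute path can be filled in programmatically. This is a minimal sketch, not part of the original answer: the helper name font_face_css is hypothetical, and it assumes xhtml2pdf accepts a forward-slash absolute path inside url().

```python
import os
from pathlib import Path


def font_face_css(font_file, family):
    """Return an @font-face rule whose url() holds an absolute path."""
    # abspath() resolves a relative path against the current directory;
    # as_posix() keeps forward slashes on every platform (an assumption
    # about what the renderer accepts).
    absolute = Path(os.path.abspath(font_file)).as_posix()
    return (
        "@font-face {\n"
        f'    font-family: "{family}";\n'
        f'    src: url("{absolute}");\n'
        "}\n"
    )


# "verdana.ttf" is the placeholder file name from the snippet above.
css = font_face_css("verdana.ttf", "Verdana")
print(css)
```

The returned rule can be pasted into the <style> block in place of the hand-written @font-face rule.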

OrPo · Answer

If anyone in the future tries, like me, to figure out how to PROPERLY create a PDF file that contains Hebrew using xhtml2pdf, here's what worked for me:

First, include the font settings described here by @eviltrue in the HTML. Any font works as long as it supports Hebrew characters; otherwise the Hebrew characters in the input HTML simply appear as black rectangles in the PDF.
At the time of writing this answer, xhtml2pdf can output Hebrew characters to PDF, but it outputs them in reverse order, i.e. שלום כיתה א would come out as א התיכ םולש.
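To make the reversal concrete, here is a minimal standard-library sketch. It assumes the renderer simply draws characters left to right in storage order, which is one reading of the behaviour described above and has not been checked against xhtml2pdf internals; the variable names are illustrative.

```python
# Hebrew text as stored in the HTML (logical order, first letter first).
logical = "שלום כיתה א"

# Assumption: a renderer without bidi support draws characters left to right
# in storage order, so someone reading right to left meets them back to front.
read_back_to_front = logical[::-1]
print(read_back_to_front)

# Handing the renderer the reversed (visual-order) string cancels this out
# for a line that contains only Hebrew.
visual = logical[::-1]
print(visual[::-1] == logical)
```

Plain reversal only works for a line that is entirely Hebrew. On a mixed line it would also reverse English words and numbers, which is the case the bidi algorithm handles.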

At this point I was stuck, but then I stumbled upon this SO answer: https://stackoverflow.com/a/15449145/1918837

After installing the python-bidi package, here is an example of a complete solution (used in a python app):

from bidi import algorithm as bidialg
from xhtml2pdf import pisa

HTMLINPUT = """
            <!DOCTYPE html>
            <html>
            <head>
               <meta http-equiv="content-type" content="text/html; charset=utf-8">
               <style>
                  @page {
                      size: a4;
                      margin: 1cm;
                  }

                  @font-face {
                      font-family: DejaVu;
                      src: url(my_fonts_dir/DejaVuSans.ttf);
                  }

                  html {
                      font-family: DejaVu;
                      font-size: 11pt;
                  }
               </style>
            </head>
            <body>
               <div>Something in English - משהו בעברית</div>
            </body>
            </html>
            """

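# I'm using base_dir="L" so that "< >" signs in HTML tags wouldn't be
# flipped by the bidi algorithm
display_html = bidialg.get_display(HTMLINPUT, base_dir="L")

# "output.pdf" is a placeholder path; the output file must be opened in binary mode
with open("output.pdf", "wb") as output_file:
    pdf = pisa.CreatePDF(display_html, output_file)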

The nice thing about the bidi algorithm is that you can have mixed RTL and LTR languages in the same line (like in the HTML example above) and still have a correctly formatted result.

EDIT: The best way to go now is definitely using wkhtmltopdf
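
For anyone taking that route from a Python app, a minimal sketch follows. It assumes the wkhtmltopdf binary is installed and on PATH; the function names and the file names input.html and output.pdf are placeholders, not something from the answers above.

```python
import shutil
import subprocess


def build_wkhtmltopdf_command(html_path, pdf_path):
    """Build the argument list for a UTF-8 HTML to PDF conversion."""
    # --encoding sets the default text encoding wkhtmltopdf uses for input.
    return ["wkhtmltopdf", "--encoding", "utf-8", html_path, pdf_path]


def convert(html_path, pdf_path):
    """Run wkhtmltopdf; raises if the binary is missing or the run fails."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf was not found on PATH")
    subprocess.run(build_wkhtmltopdf_command(html_path, pdf_path), check=True)


# Placeholder file names; this only prints the command that would be run.
print(" ".join(build_wkhtmltopdf_command("input.html", "output.pdf")))
```

Calling convert("input.html", "output.pdf") would perform the actual conversion once the binary is available.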

Tags:

pdf

unicode

pisa

hebrew

user490616
