 

Convert HTML to PDF by iText & XMLWorker with Polish Letters

I have an example HTML string that converts to PDF really well, but when I add Polish letters, they're gone. I tried something like this:

        byte[] byteArray = str.getBytes(Charset.forName("UTF-8"));
        ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(byteArray);
        worker.parseXHtml(pdfWriter, document, byteArrayInputStream, Charset.forName("UTF-8"));

but it doesn't change anything. How do I get the Polish letters to show up?

EDIT: It still doesn't work.

Code:

        document.open();

        XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
        String str = "<html><head></head><body style=\"font-size:12.0pt; font-family:Times New Roman\">"+
                "<a href='http://www.rgagnon.com/howto.html'><b>Real's HowTo</b></a>" +
                "<h1>Show your support</h1>" +
                "<p>It DOES cost a lot to produce this site - in ISP storage and transfer fees</p>" +
                "<p>TEST POLSKICH ZNAKÓW: ĄąćCÓóŁłŻżŹźĘę</p>" +
                "<hr/>" +
                "<p>the huge amounts of time it takes for one person to design and write the actual content.</p>" +
                "<p>If you feel that effort has been useful to you, perhaps you will consider giving something back?</p>" +
                "<p>Donate using PayPalŽ</p>" +
                "<p>Contributions via PayPal are accepted in any amount</p>" +
                "<p><br/><table border='1'><tr><td>Java HowTo</td></tr><tr>" +
                "<td style='background-color:red;'>Javascript HowTo</td></tr>" +
                "<tr><td>Powerbuilder HowTo</td></tr></table></p>" +
                "</body></html>";

        byte[] byteArray = str.getBytes(Charset.forName("UTF-8"));
        ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(byteArray);
        worker.parseXHtml(pdfWriter, document, byteArrayInputStream, Charset.forName("UTF-8"));

        document.close();

Maybe someone can spot the bug.

asked Feb 11 '23 16:02 by KurdTt-


2 Answers

I have taken your sample HTML and I have used it to create the ParseHtml2 example. The resulting PDF, html_2.pdf, looks like this:

[Screenshot of html_2.pdf]

At first sight, I don't see any issues with the Polish characters.

The code I used looks like this:

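// iText 5 + XML Worker imports
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import java.io.ByteArrayInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public void createPdf(String file) throws IOException, DocumentException {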
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3
    document.open();
    // step 4
    String str = "<html><head></head><body style=\"font-size:12.0pt; font-family:Times New Roman\">"+
            "<a href='http://www.rgagnon.com/howto.html'><b>Real's HowTo</b></a>" +
            "<h1>Show your support</h1>" +
            "<p>It DOES cost a lot to produce this site - in ISP storage and transfer fees</p>" +
            "<p>TEST POLSKICH ZNAKÓW: \u0104\u0105\u0106\u0107\u00d3\u00f3\u0141\u0142\u0179\u017a\u017b\u017c\u017d\u017e\u0118\u0119</p>" +
            "<hr/>" +
            "<p>the huge amounts of time it takes for one person to design and write the actual content.</p>" +
            "<p>If you feel that effort has been useful to you, perhaps you will consider giving something back?</p>" +
            "<p>Donate using PayPal\u017d</p>" +
            "<p>Contributions via PayPal are accepted in any amount</p>" +
            "<p><br/><table border='1'><tr><td>Java HowTo</td></tr><tr>" +
            "<td style='background-color:red;'>Javascript HowTo</td></tr>" +
            "<tr><td>Powerbuilder HowTo</td></tr></table></p>" +
            "</body></html>";

    XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
    InputStream is = new ByteArrayInputStream(str.getBytes(StandardCharsets.UTF_8));
    worker.parseXHtml(writer, document, is, Charset.forName("UTF-8"));
    // step 5
    document.close();
}
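Step 4 turns the HTML string into UTF-8 bytes and tells parseXHtml to decode them as UTF-8 again. The following plain-JDK sketch (no iText involved; Utf8RoundTrip and reencode are hypothetical names) shows that such a round trip is lossless when both sides agree on the charset, and that a mismatch garbles the letters rather than simply dropping them. Letters that vanish completely therefore most likely point to the font, not to the byte conversion.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    // Encode with one charset and decode with another, mimicking what happens
    // when the bytes handed to a parser are read back with some encoding.
    static String reencode(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // The first eight letters of the Polish test string, written as escapes
        String polish = "\u0104\u0105\u0106\u0107\u00d3\u00f3\u0141\u0142";

        // Same charset on both sides: the text survives unchanged
        String same = reencode(polish, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // Mismatched charsets: every 2-byte UTF-8 sequence becomes 2 wrong characters
        String mismatched = reencode(polish, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 -> UTF-8 preserved: " + same.equals(polish));
        System.out.println("UTF-8 -> ISO-8859-1 preserved: " + mismatched.equals(polish));
        System.out.println("Mismatched length: " + mismatched.length());
    }
}
```

All eight letters lie between U+0080 and U+07FF, so each takes two bytes in UTF-8; read back as ISO-8859-1 the 8 characters turn into 16.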

Note that you have defined Times New Roman as the font. It is essential that your OS has access to a font with that name; otherwise you'll still end up with Helvetica, which can't display most Polish characters.
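To see why the Helvetica fallback loses letters: assuming iText's default encoding for it (Cp1252, i.e. WinAnsi), only characters in that single-byte code page can be shown. This stdlib-only sketch asks the JDK's windows-1252 encoder, used here as a stand-in for iText's own encoding tables, which letters of the test string it cannot hold. WinAnsiCoverage and unencodable are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class WinAnsiCoverage {

    // Returns the characters of text that windows-1252 cannot encode,
    // in their original order.
    static String unencodable(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        StringBuilder missing = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (!encoder.canEncode(c)) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // The full Polish test string from the answer, as escapes
        String polish = "\u0104\u0105\u0106\u0107\u00d3\u00f3\u0141\u0142"
                + "\u0179\u017a\u017b\u017c\u017d\u017e\u0118\u0119";
        System.out.println("Not in windows-1252: " + unencodable(polish));
    }
}
```

For the sample string this reports Ą ą Ć ć Ł ł Ź ź Ż ż Ę ę as missing; only Ó, ó, Ž and ž exist in windows-1252.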

Also be aware that using non-ASCII characters in source code is considered a crime against good taste. Source code is stored as a text file, but using which encoding? There is no guarantee that your file will be stored as UTF-8, no guarantee that the compiler will read it as UTF-8, no guarantee that your version control system will accept UTF-8, and so on. Hence I replaced all non-ASCII characters with their Unicode escape sequences (for example \u0141 for Ł), which allows me to keep the source file in ASCII.
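Converting the characters by hand is tedious. A small helper such as the hypothetical escapeNonAscii below can generate the escapes for you; it is only a sketch. It works per UTF-16 char, so a character outside the Basic Multilingual Plane comes out as two escapes (a surrogate pair), which is also how Java source represents it.

```java
public class UnicodeEscaper {

    // Replaces every non-ASCII char with a Java escape made of a backslash,
    // the letter u and four lowercase hex digits, so the result can be
    // pasted into an ASCII-only source file.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Built from a code point so that this source file itself stays ASCII
        String sample = "ZNAK" + (char) 0x00d3 + "W";
        System.out.println(escapeNonAscii(sample)); // prints ZNAK\u00d3W
    }
}
```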

answered Mar 01 '23 09:03 by Bruno Lowagie


I took Bruno's sample HTML and adapted the function for C# users. I use PdfFileName as a property to get and set the file name.

    using System;
    using System.Diagnostics;
    using System.IO;
    using iTextSharp.text;
    using iTextSharp.text.pdf;
    using iTextSharp.tool.xml;

    public string PdfFileName { get; set; }

    public void CreatePdf()
    {
        // Replace this with the full name of the PDF file you want to create
        // (EU.Master_Data_Utility and _connFlag are helpers from my own project)
        PdfFileName = EU.Master_Data_Utility.obj.Get_Current_DateTimeInteger(_connFlag) + ".pdf";

        String str = "<html><head></head><body style=\"font-size:12.0pt; font-family:Times New Roman\">" +
                "<a href='http://www.rgagnon.com/howto.html'><b>Real's HowTo</b></a>" +
                "<h1>Show your support</h1>" +
                "<p>It DOES cost a lot to produce this site - in ISP storage and transfer fees</p>" +
                "<p>TEST POLSKICH ZNAKÓW: \u0104\u0105\u0106\u0107\u00d3\u00f3\u0141\u0142\u0179\u017a\u017b\u017c\u017d\u017e\u0118\u0119</p>" +
                "<hr/>" +
                "<p>the huge amounts of time it takes for one person to design and write the actual content.</p>" +
                "<p>If you feel that effort has been useful to you, perhaps you will consider giving something back?</p>" +
                "<p>Donate using PayPal\u017d</p>" +
                "<p>Contributions via PayPal are accepted in any amount</p>" +
                "<p><br/><table border='1'><tr><td>Java HowTo</td></tr><tr>" +
                "<td style='background-color:red;'>Javascript HowTo</td></tr>" +
                "<tr><td>Powerbuilder HowTo</td></tr></table></p>" +
                "</body></html>";

        Document doc = new Document(PageSize.A4, 10f, 10f, 10f, 10f);
        // Dispose the file stream and the reader even if parsing throws
        using (FileStream fs = new FileStream(Server.MapPath(PdfFileName), FileMode.Create))
        using (StringReader sr = new StringReader(str))
        {
            PdfWriter pdfWriter = PdfWriter.GetInstance(doc, fs);
            doc.Open();
            // The HTML is passed as a .NET string, so no byte encoding step is involved
            XMLWorkerHelper.GetInstance().ParseXHtml(pdfWriter, doc, sr);
            doc.Close();
        }
        // Open the newly created file
        OpenPDFFile();
    }

    protected void OpenPDFFile()
    {
        // Open the PDF file. Under ASP.NET this starts a viewer on the server,
        // not in the user's browser.
        Process.Start(Server.MapPath(PdfFileName));
    }

answered Mar 01 '23 10:03 by Nitish Saini