This is the second day I have spent investigating this with no results. At least I am now able to ask something very specific.
I am trying to render valid HTML that contains some non-Latin characters to a PDF file using iText, and more specifically using ITextRenderer from Flying Saucer.
My short example starts by initializing a string variable doc with this value:
String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">"
+ "<body>Some greek characters: Καλημέρα Some greek characters"
+ "</body></html>";
Here is the code that I use for debugging purposes. I save this string to an HTML file and then open it in a browser, just to double-check that the HTML content is valid and that I can still read the Greek characters:
//write for debugging purposes in an html file
File newTextFile = new File("C:/work/test.html");
FileWriter fw = new FileWriter(newTextFile);
fw.write(doc);
fw.close();
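One caveat about the debugging step above: on the Java version listed further down, FileWriter encodes with the platform default charset, so the bytes on disk may not match the UTF-8 declared in the XML prolog. A minimal sketch that forces UTF-8 when writing the debug file follows; the class and method names are made up for illustration, and the temp file stands in for C:/work/test.html.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical helper, not part of iText or Flying Saucer.
class DebugHtmlWriter {

    // Writes the markup as UTF-8 regardless of the platform default charset.
    static void writeUtf8(File target, String markup) throws IOException {
        Writer out = new OutputStreamWriter(new FileOutputStream(target), "UTF-8");
        try {
            out.write(markup);
        } finally {
            out.close(); // also flushes the buffered characters
        }
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes spell the Greek word from the question, so the
        // example does not depend on the encoding of this source file.
        String greek = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1";
        File target = File.createTempFile("test", ".html");
        writeUtf8(target, "<html><body>" + greek + "</body></html>");
        System.out.println("Wrote " + target.getAbsolutePath());
    }
}
```

If the browser still shows the Greek text from a file written this way, the string itself is intact and the problem lies later in the pipeline.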
The next step is to write this value to the PDF file. This is my code:
ITextRenderer renderer = new ITextRenderer();
//add some fonts - if paths are not right, an exception will be thrown
renderer.getFontResolver().addFont("c:/work/fonts/TIMES.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.getFontResolver().addFont("c:/work/fonts/TIMESBD.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.getFontResolver().addFont("c:/work/fonts/TIMESBI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.getFontResolver().addFont("c:/work/fonts/TIMESI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
final DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory
.newInstance();
documentBuilderFactory.setValidating(false);
DocumentBuilder builder = documentBuilderFactory.newDocumentBuilder();
builder.setEntityResolver(FSEntityResolver.instance());
org.w3c.dom.Document document = builder.parse(new ByteArrayInputStream(
doc.toString().getBytes("UTF-8")));
renderer.setDocument(document, null);
renderer.layout();
renderer.createPDF(os);
The final outcome of my code is:
In HTML file I get: Some greek characters: Καλημέρα Some greek characters (expected)
In PDF file I get: Some greek characters: Some greek characters (unexpected - greek characters are ignored!!)
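One way to narrow this down is to check whether the Greek text survives the DOM parsing step on its own, before Flying Saucer is involved. The sketch below uses only the JDK's XML parser and mirrors the parsing calls above, minus the FSEntityResolver, which this document does not need because it has no DOCTYPE. The class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical diagnostic class, not part of Flying Saucer.
class ParseCheck {

    // Parses the XHTML from UTF-8 bytes and returns the text inside <body>.
    static String bodyText(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(
                new ByteArrayInputStream(xhtml.getBytes("UTF-8")));
        return document.getElementsByTagName("body").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Same markup as the question, with the Greek word as Unicode escapes.
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">"
                + "<body>Some greek characters: "
                + "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1"
                + " Some greek characters</body></html>";
        System.out.println(bodyText(doc));
    }
}
```

If the returned text still contains the Greek word, the parsing step is not what drops the characters, which points at font selection during rendering.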
Dependencies:
java version "1.6.0_27"
itext-2.0.8.jar
de.huxhorn.lilith.3rdparty.flyingsaucer.core-renderer-8Pre2.jar
I have also experimented with many more fonts, but I suspect that my problem has nothing to do with using the wrong fonts. Any help is more than welcome.
Thanks
I am from the Czech Republic and had the same problem with our national characters. After some searching, I managed to solve it with this solution.
Specifically, with this (which you already have):
renderer
.getFontResolver()
.addFont(fonts.get(i).getFile().getPath(),
BaseFont.IDENTITY_H,
BaseFont.NOT_EMBEDDED);
and then the important part, in the CSS:
* {
font-family: Verdana;
/* font-family: Times New Roman; - alternative. Without ""! */
}
It seems to me that without this CSS rule, the fonts you registered are not used. When I remove these lines from the CSS, the encoding is broken again.
Hope this will help!
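Applied to the string from the question, this could look roughly like the sketch below: the same XHTML, with a head and a style rule naming the font family. The helper class is hypothetical, and the family name passed in is assumed to match the name reported by a font file registered through addFont, which is worth checking for your own fonts.

```java
// Hypothetical helper, not part of Flying Saucer.
class StyledXhtml {

    // Wraps body text in XHTML whose stylesheet applies the given font
    // family to every element. The body text is assumed to be XML-safe
    // already; no escaping is done here.
    static String wrap(String fontFamily, String bodyText) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">"
                + "<head><style type=\"text/css\">"
                + "* { font-family: " + fontFamily + "; }"
                + "</style></head>"
                + "<body>" + bodyText + "</body></html>";
    }

    public static void main(String[] args) {
        // Unicode escapes spell the Greek word from the question.
        String greek = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1";
        String doc = wrap("Verdana", "Some greek characters: " + greek);
        System.out.println(doc);
    }
}
```

The resulting string can then go through the same parse, setDocument, layout and createPDF calls shown in the question.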
Add to your HTML something like this:
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html>
<html>
<head>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/>
<style type='text/css'>
* { font-family: 'Arial Unicode MS'; }
</style>
</head>
<body>
<span>Some text with šđčćž characters</span>
</body>
</html>
and then register the font with the ITextRenderer's FontResolver in the Java code:
ITextRenderer renderer = new ITextRenderer();
renderer.getFontResolver().addFont("fonts/ARIALUNI.TTF", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
This works great for Croatian characters.
The jars used for generating the PDF are:
core-renderer.jar
iText-2.0.8.jar