I'm using openhtmltopdf to convert HTML to PDF. Currently I get an exception if the HTML contains German characters such as ä, ö, ü.
PdfRendererBuilder builder = new PdfRendererBuilder();
builder.useFastMode();
builder.withHtmlContent(html,"file://localhost/");
builder.toStream(out);
builder.run();
org.xml.sax.SAXParseException; lineNumber: 17; columnNumber: 31; The entity "auml" was referenced, but not declared.
Here is my HTML:
<html>
<head>
<meta charset="UTF-8" />
</head>
<body>
k&auml;se
</body>
</html>
The exported word is "käse" (cheese).
UPDATE
I have tried with an entity resolver, in this way:
DocumentBuilderFactory factory=DocumentBuilderFactory.newInstance();
DocumentBuilder builder=null;
try{
builder=factory.newDocumentBuilder();
ByteArrayInputStream input=new ByteArrayInputStream(html.getBytes("UTF-8"));
builder.setEntityResolver(FSEntityResolver.instance());
org.w3c.dom.Document doc=builder.parse(input);
}catch(Exception e){
logger.error(e.getMessage(),e);
}
but I'm still getting the same exception at "parse".
It looks like you need to either provide a DTD that declares the entity or replace the entity name auml with its corresponding hex or decimal character reference, i.e. &#xE4; or &#228; respectively. See A.2. Entity Sets and HTML 4 Entity Names.
With an inline DTD, the HTML content would look like this:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html [
<!ENTITY auml "ä">
]>
<html>
<head>
</head>
<body>
k&auml;se
</body>
</html>
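To check that an inline declaration like the one above is enough for the XML parser, here is a minimal sketch using only the JDK's built-in parser. The class InlineDtdDemo and both method names are hypothetical, and the helper assumes the incoming HTML has no XML declaration or DOCTYPE of its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of openhtmltopdf.
class InlineDtdDemo {

    // Prepends an internal DTD subset that declares the auml entity.
    // Assumption: the input does not already start with an XML
    // declaration or a DOCTYPE, otherwise the result is not well-formed.
    static String withEntityDeclarations(String html) {
        String doctype = "<!DOCTYPE html [\n<!ENTITY auml \"&#228;\">\n]>\n";
        return doctype + html;
    }

    // Parses the markup with the default JDK parser and returns the
    // trimmed text content of the first <body> element.
    static String parseBodyText(String xhtml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        return doc.getElementsByTagName("body").item(0).getTextContent().trim();
    }
}
```

Passing the result of withEntityDeclarations(html) to parseBodyText should no longer raise the "entity was referenced, but not declared" error for &auml;. Other named entities would each need their own declaration.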
Alternatively, you can run through the HTML string and replace the entity names with their corresponding decimal or hex character references, or just prepend the DTD to your HTML string before passing it to the PDF builder.
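The replacement approach could be sketched roughly as follows. The class NamedEntityReplacer and its lookup table are hypothetical and cover only a handful of German-related names; a real table would need all HTML 4 entity names. The sketch assumes Java 9 or later (Map.of and Matcher.appendReplacement with a StringBuilder).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of openhtmltopdf.
class NamedEntityReplacer {

    // Illustrative subset of HTML 4 entity names and their code points.
    private static final Map<String, Integer> CODE_POINTS = Map.of(
            "auml", 0xE4, "ouml", 0xF6, "uuml", 0xFC,
            "Auml", 0xC4, "Ouml", 0xD6, "Uuml", 0xDC,
            "szlig", 0xDF, "nbsp", 0xA0);

    // Matches named references such as &auml; but not numeric ones like &#228;
    private static final Pattern ENTITY =
            Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Replaces every known entity name with its decimal character reference.
    // Unknown names, including the XML built-ins (&amp;, &lt;, ...), are kept.
    static String toNumeric(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Integer codePoint = CODE_POINTS.get(m.group(1));
            String replacement =
                    (codePoint == null) ? m.group() : "&#" + codePoint + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The converted string can then be handed to withHtmlContent as in the question. Because numeric character references need no declaration, no DTD is required with this variant.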
Update
You might want to give the jsoup library a try. It parses the HTML leniently, including named entities, and its W3CDom helper converts the result into an org.w3c.dom.Document, e.g.
Document jsoupDoc = Jsoup.parse(html); // org.jsoup.nodes.Document
W3CDom w3cDom = new W3CDom(); // org.jsoup.helper.W3CDom
org.w3c.dom.Document w3cDoc = w3cDom.fromJsoup(jsoupDoc);
You can then pass the w3cDoc to the PDF builder like so:
PdfRendererBuilder builder = new PdfRendererBuilder();
builder.withW3cDocument(w3cDoc, "file://localhost/");
builder.toStream(out); // same output stream as in the question
builder.run();