
Preventing Jsoup.parse from removing the closing </img> tag

I'm parsing a piece of HTML with Jsoup.parse.

Everything else works, but I later have to pass the resulting HTML to a PDF converter.

For some reason Jsoup.parse removes the closing img tag, and the PDF converter then throws an exception about the missing closing tag:

Can't load the XML resource (using TRaX transformer). org.xml.sax.SAXParseException; lineNumber: 115; columnNumber: 4; The element type "img" must be terminated by the matching end-tag "</img>"

How can I prevent Jsoup.parse from removing the closing img tag?

For example, this line:

<img src="C:\path\to\image\image.png"></img>

turns into:

<img src="C:\path\to\image\image.png">

The same happens with:

<img src="C:\path\to\image\image.png"/>

Here's the code:

private void createPdf(File file, String content) throws IOException, DocumentException {
    // try-with-resources closes the stream even if rendering throws
    try (OutputStream os = new FileOutputStream(file)) {
        content = tidyUpHTML(content);
        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocumentFromString(content);
        renderer.layout();
        renderer.createPDF(os);
    }
}

Here's the tidyUpHTML method that is called in the method above:

private String tidyUpHTML(String html) {
    org.jsoup.nodes.Document doc = Jsoup.parse(html);
    doc.select("a").unwrap();
    String fixedTags = doc.toString().replace("<br>", "<br />");
    fixedTags = fixedTags.replace("<hr>", "<hr />");
    fixedTags = fixedTags.replaceAll("&nbsp;","&#160;");
    return fixedTags;
}
Steve Waters asked Dec 23 '22 22:12


1 Answer

Your PDF converter expects XHTML, which is why it insists on a closed img element. Configure Jsoup to serialize its output as XHTML (XML syntax) instead:

org.jsoup.nodes.Document doc = Jsoup.parse(html);
// Serialize with XML syntax so that void elements such as img are written self-closed.
doc.outputSettings().syntax(org.jsoup.nodes.Document.OutputSettings.Syntax.xml);
doc.select("a").unwrap();
String fixedTags = doc.html();

See also: "Is it possible to convert HTML into XHTML with Jsoup 1.8.1?"
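
To see why the converter rejects the HTML-style output, the same well-formedness rule can be reproduced with the XML parser that ships with the JDK, without Jsoup or the PDF library. The following is only a minimal sketch: the class name WellFormedCheck, the helper isWellFormed, and the sample markup strings are made up for illustration and are not part of the original code. It also suggests why the question's code replaces &nbsp; with &#160;, since a plain XML parser does not know the nbsp entity unless a DTD declares it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original question or answer.
class WellFormedCheck {

    // Returns true if the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Includes SAXParseException, the type reported in the question.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative inputs only; image.png is a placeholder file name.
        String[] samples = {
            "<p><img src=\"image.png\"></p>",       // HTML-style void element
            "<p><img src=\"image.png\" /></p>",     // self-closed, XML-style
            "<p><img src=\"image.png\"></img></p>", // explicit end tag
            "<p>a&nbsp;b</p>",                      // named entity without a DTD
            "<p>a&#160;b</p>"                       // numeric character reference
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

A check like this could be run on the string returned by tidyUpHTML before it is passed to renderer.setDocumentFromString, so that a markup problem surfaces before PDF rendering starts.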

Joeri Hendrickx answered Dec 26 '22 12:12