The quasi-HTML text looks like this:
Simple<br> text <b>simple</b> text simple <BR><BR>text simple text
I would like to parse it and create a DOM document, but the unclosed tags are a problem. When I try this:
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource source = new InputSource(new StringReader(html)); // html: the text above
Document doc = builder.parse(source);
the parser fails with: org.xml.sax.SAXParseException; The element type "br" must be terminated by the matching end-tag
I don't want to replace every <br> with <br></br>. Any solution or advice?
Use jsoup, an HTML parser that copes with unclosed tags such as <br>, and enjoy the ease of use.
You have to rewrite the HTML so that it is well-formed. Basically, you go through the text and build a List of all opening tags. Whenever you find the corresponding closing tag, you remove that entry from the List. If the List still has entries when you reach the end of the text, you know it is not well-formed.
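A minimal sketch of that check in Java, assuming a simple regular expression is good enough to find the tags in this kind of text (it ignores comments, CDATA and ">" inside attribute values). The class TagBalanceCheck and its method are hypothetical names. Like the description above, it only reports opening tags that are never closed; it does not flag stray closing tags or wrong nesting such as <b><i></b></i>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagBalanceCheck {
    // Group 1: "/" for a closing tag, group 2: tag name, group 3: "/" for a self-closing tag.
    // Simplification: does not understand comments, CDATA or ">" inside attribute values.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>");

    /** Returns the names of tags that were opened but never closed, in document order. */
    public static List<String> unclosedTags(String text) {
        List<String> open = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            // HTML tag names are case-insensitive, so <BR> and <br> are the same tag
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (!m.group(3).isEmpty()) {
                continue; // self-closing, e.g. <br/>, needs no end tag
            }
            if (m.group(1).isEmpty()) {
                open.add(name); // opening tag: remember it
            } else {
                // closing tag: drop the most recent matching opening tag, if any
                int i = open.lastIndexOf(name);
                if (i >= 0) {
                    open.remove(i);
                }
            }
        }
        return open; // empty list means every opening tag was closed
    }

    public static void main(String[] args) {
        String html = "Simple<br> text <b>simple</b> text simple <BR><BR>text simple text";
        System.out.println(unclosedTags(html)); // [br, br, br]
    }
}
```

On the sample text, <b>...</b> is matched and removed from the List, while the three <br>/<BR> tags are left over, so the text is reported as not well-formed.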
The hard part is deciding where to insert the missing closing tags. You could try inserting the matching closing tag right after the next word. In your case, if <br> is the only unclosed tag, you can simply replace it with the self-closing <br />. Here, string holds the document's content:
string = string.replaceAll("(?i)<br>", "<br />"); // (?i) makes it case-insensitive, so <BR> is caught too
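Putting the replacement together with the parsing code from the question gives something like the following sketch. Two assumptions are built in: the replacement is case-insensitive, because the sample also contains <BR> (and XML element names are case-sensitive), and the fragment is wrapped in a hypothetical <root> element, because an XML parser needs a single root element and would otherwise reject the leading text "Simple". The class BrToDom and its methods are illustrative names. Only <br> is handled; entities such as &nbsp; or other unclosed tags would still make the parser fail.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BrToDom {
    /** Turns every <br> (any letter case, optional trailing spaces) into a self-closing <br />. */
    public static String closeBrTags(String html) {
        // "<br/>" and "<br />" are not matched, so already-closed tags are left alone
        return html.replaceAll("(?i)<br\\s*>", "<br />");
    }

    /** Parses the quasi-HTML fragment into a W3C DOM document. */
    public static Document parse(String html) throws Exception {
        // Hypothetical wrapper element: XML requires exactly one root element
        String xml = "<root>" + closeBrTags(html) + "</root>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String html = "Simple<br> text <b>simple</b> text simple <BR><BR>text simple text";
        Document doc = parse(html);
        // All three line breaks are now <br> elements in the DOM
        System.out.println(doc.getElementsByTagName("br").getLength()); // 3
    }
}
```

The design choice here is to do the smallest textual fix that makes the input well-formed XML, so the standard DocumentBuilder can be kept; if the input can contain arbitrary HTML, a real HTML parser such as jsoup is the more robust route.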