I want to parse an HTML file in Java, and I am using the DocumentBuilder class for it. My HTML contains an <img src="xyz"> tag without a closing </img> tag, which browsers allow. But when I pass the file to DocumentBuilder for parsing, it gives me this error:
The element type "img" must be terminated by the matching end-tag "</img>".
Java:
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document document = docBuilder.parse(is);
What should I do to get rid of this error?
DocumentBuilder is part of Java's XML parsing framework. An XML parser will not correctly parse HTML: the languages look similar, but XML has stricter requirements. (You've already seen one of the differences: in XML, every start tag must have a matching end tag or be self-closed, as in <img src="xyz"/>, while in HTML some elements, such as img, never take an end tag.)
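To see the difference for yourself, here is a minimal sketch using only the JDK's own XML classes. The class name XmlStrictnessDemo and the helper isWellFormed are made up for this illustration; they are not part of any library. It feeds three small snippets to DocumentBuilder and reports which ones the XML parser accepts.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows which markup an XML parser accepts.
class XmlStrictnessDemo {

    // Returns true if DocumentBuilder can parse the markup as XML.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(
                    markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Thrown for markup that is not well-formed XML.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // HTML-style img with no end tag: rejected by the XML parser.
        System.out.println(isWellFormed("<p><img src=\"xyz\"></p>"));
        // Self-closed img: accepted.
        System.out.println(isWellFormed("<p><img src=\"xyz\"/></p>"));
        // Explicit end tag: accepted.
        System.out.println(isWellFormed("<p><img src=\"xyz\"></img></p>"));
    }
}
```

Rewriting every tag by hand to satisfy the XML parser only works if you control the HTML; for pages you don't control, a real HTML parser is the practical route.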
Try an HTML parser instead. I've heard good things about jsoup (http://jsoup.org/).