 

I need to parse non-well-formed XML data (HTML)

I have some non-well-formed XML (HTML) data in Java. I tried the JAXP DOM parser, but it complains.

The question is: is there any way to use JAXP to parse such documents?

I have a file containing data such as:

<employee>
 <name value="ahmed" > <!-- note: this element is not closed, so this is not well-formed XML -->
</employee>
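
For reference, the complaint can be reproduced with the JDK alone. The following is a minimal sketch, not the asker's actual code: the class name `WellFormedCheck` and the helper `parseError` are hypothetical, and the sample is assumed to be held in a string rather than a file. It feeds the snippet above to a JAXP `DocumentBuilder` and reports the parser's error, then does the same for a variant in which `<name>` is self-closed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, for illustration only.
public class WellFormedCheck {

    // Returns null if the input parses as XML, otherwise a short
    // description of the first fatal error reported by the parser.
    static String parseError(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXParseException e) {
            // A well-formedness violation is a fatal error in XML,
            // so the parser stops here and no DOM tree is produced.
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // The sample from the question: <name> is never closed.
        String broken = "<employee>\n <name value=\"ahmed\" >\n</employee>";
        // Same data with <name> self-closed, which makes it well-formed.
        String fixed = "<employee>\n <name value=\"ahmed\" />\n</employee>";

        System.out.println("broken: " + parseError(broken));
        System.out.println("fixed:  " + parseError(fixed));
    }
}
```

The exact wording of the error message depends on the parser implementation bundled with the JDK, so it is not reproduced here. JAXP itself offers no lenient mode for this kind of input, which is why the document has to be repaired before it reaches the parser.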
Muhammad Hewedy asked Apr 01 '10 13:04



1 Answer

You could try running your document through the JTidy API first. It can convert HTML into well-formed XHTML, which JAXP can then parse as XML: http://jtidy.sourceforge.net/howto.html

Tidy tidy = new Tidy();
tidy.setXHTML(true);
tidy.parse(......)...
simonlord answered Oct 28 '22 16:10