
Which methods can be used to return valid and invalid XML data from a file in Java?

I have the following data that is supposed to be XML:

<?xml version="1.0" encoding="UTF-8"?>
<Product>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</Product>

<Product>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</Product>

<ProductTTTTT>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</Product>

<Product>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</ProductAAAAAA>

So, basically I have multiple root elements (product)...

The point is that I'm trying to transform this data into two XML documents: one for the valid nodes and another for the invalid nodes.

Valid node:

<Product>
   ...
</Product>

Invalid nodes: <ProductTTTTT>...</Product> and <Product>...</ProductAAAAAA>

So I am wondering how I can achieve this using Java (not web).

  • If I am not wrong, validating it against an XSD will reject the whole file, so that is not an option.
  • Using the default JAXB parser (unmarshaller) leads to the same problem as the item above, since internally it creates an XSD from my entity.
  • XPath (from what I know) will just return the whole file; I did not find a way to express something like GET !VALID (this is only to illustrate the idea).
  • XQuery (maybe?). By the way, how do I use XQuery with JAXB?
  • XSL(T) has the same problem as XPath, since it uses XPath to select the content.

So... which method can I use to achieve the objective? (And if possible, provide links or code please)

Felipe C. asked Mar 08 '26 02:03


1 Answer

Firstly, you're confusing valid and well-formed. You say you want to find invalid elements, but your examples aren't just invalid, they are ill-formed. That means that no XML parser is going to do anything with them other than throwing an error message at you. You can't use JAXB or XPath, or XQuery, or XSLT, or anything to process something that isn't XML.
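To illustrate the point about ill-formed input, here is a minimal sketch using the JDK's built-in DOM parser. The class name `WellFormedCheck` and the helper `parseError` are hypothetical names chosen for this example, and the exact wording of the error messages depends on the parser implementation, so none is shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class WellFormedCheck {

    /** Returns null if the text parses as XML, otherwise the parser's error message. */
    static String parseError(String text) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // parse() throws a SAXException (normally a SAXParseException)
            // as soon as it meets input that is not well-formed.
            builder.parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String single = "<Product><id>1</id></Product>";
        // Two root elements, as in the question's data.
        String twoRoots = single + "\n" + single;
        // Start and end tag names do not match.
        String mismatched = "<ProductTTTTT><id>1</id></Product>";

        System.out.println("single:     " + parseError(single));
        System.out.println("twoRoots:   " + parseError(twoRoots));
        System.out.println("mismatched: " + parseError(mismatched));
    }
}
```

Only the first input should come back without an error. For the other two the parser stops at the first fatal error and hands back no document at all, so there is nothing for JAXB, XPath, XQuery or XSLT to work on.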

You say "unfortunately I do not have access to the system that sends this xml format". I'm not sure why you call it an XML format: it isn't. I also don't understand why you (and many others on StackOverflow) are prepared to spend your time digging in garbage like this rather than telling the sender to get their act together. If you were served a salad with maggots in it, would you try to pick them out, or would you send it back for replacement? You should adopt a zero-tolerance approach to bad data; that's the only way senders will learn to improve the quality.

Michael Kay answered Mar 09 '26 18:03