
How to parse XML file with "Opening and ending tag mismatch" in Java

I have an XML file with an unclosed Price tag. Is there a way to parse the file despite the error? How can I skip the product with the error and continue parsing?

<Products>
  <Product Name="Gummi bears">
    <Price Currency="GBP">4.07</Price>
    <BestBefore Date="19-02-2014"/>
  </Product>
  <Product Name="Mounds">
    <Price Currency="AUD">5.64
    <BestBefore Date="08-04-2014"/>
  </Product>
  <Product Name="Vodka">
    <Price Currency="RUB">70</Price>
    <BestBefore Date="11-10-2014"/>
  </Product>
</Products>
Elchin Valiyev asked Jan 12 '16 22:01



2 Answers

Here's the code. It's an implementation of what BrandonArp has already mentioned.

There's a parser feature that needs to be set so that parsing continues after a fatal error: continue-after-fatal-error

http://apache.org/xml/features/continue-after-fatal-error 
true:   Attempt to continue parsing after a fatal error.  
false:  Stops parse on first fatal error.  
default:    false  
XMLUni Predefined Constant:     fgXercesContinueAfterFatalError  
note:   The behavior of the parser when this feature is set to true is undetermined! Therefore use this feature with extreme caution because the parser may get stuck in an infinite loop or worse.  

More detail can be found here
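Given the warning about infinite loops in the note above, it may be worth capping how many fatal errors the parser is allowed to report before giving up. The following is only a sketch of that idea, not something the Xerces documentation prescribes: the class name BoundedErrorHandler and the maxFatalErrors parameter are made up for illustration. It relies only on the standard SAX rule that throwing from fatalError aborts the parse.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: tolerates a limited number of fatal errors, then aborts.
class BoundedErrorHandler implements ErrorHandler {

    private final int maxFatalErrors;                 // assumed, caller-chosen limit
    private final List<String> messages = new ArrayList<>();

    BoundedErrorHandler(int maxFatalErrors) {
        this.maxFatalErrors = maxFatalErrors;
    }

    @Override
    public void warning(SAXParseException e) {
        // Warnings are ignored in this sketch.
    }

    @Override
    public void error(SAXParseException e) {
        // Recoverable errors are ignored in this sketch.
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        messages.add("Line " + e.getLineNumber() + ": " + e.getMessage());
        if (messages.size() > maxFatalErrors) {
            // Rethrowing stops the parse instead of letting it run on.
            throw e;
        }
    }

    // Every fatal error message recorded so far, in order of arrival.
    List<String> messages() {
        return messages;
    }
}
```

It would be installed with xmlReader.setErrorHandler(new BoundedErrorHandler(10)); the limit of 10 is an arbitrary example value.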

PriceReader class

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class PriceReader {

    public static void main(String[] args) {

        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();

            XMLReader xmlReader = saxParser.getXMLReader();

            try {
                // Xerces-specific feature: keep parsing after a fatal error.
                xmlReader.setFeature(
                        "http://apache.org/xml/features/continue-after-fatal-error",
                        true);
            } catch (SAXException e) {
                // Thrown when the parser does not recognize or support the feature.
                System.out.println("error in setting up parser feature");
            }

            xmlReader.setContentHandler(new PriceHandler());
            xmlReader.setErrorHandler(new MyErrorHandler());
            xmlReader.parse("bin\\com\\test\\stack\\overflow\\sax\\prices.xml");

        } catch (Throwable e) {
            System.out.println("Error -- " + e.getMessage());
        }
    }
}

PriceHandler class

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class PriceHandler extends DefaultHandler {

    // Prints the Name attribute of every Product start tag the parser reports.
    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes attributes)
            throws SAXException {

        if (qName.equalsIgnoreCase("Product")) {
            System.out.println("Product ::: " + attributes.getValue("Name"));
        }
    }
}

MyErrorHandler class

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class MyErrorHandler implements ErrorHandler {

    private String getParseExceptionInfo(SAXParseException spe) {
        String systemId = spe.getSystemId();

        if (systemId == null) {
            systemId = "null";
        }

        String info = "URI=" + systemId + " Line=" 
            + spe.getLineNumber() + ": " + spe.getMessage();

        return info;
    }

    public void warning(SAXParseException spe) throws SAXException {
        System.out.println("Warning: " + getParseExceptionInfo(spe));
    }

    public void error(SAXParseException spe) throws SAXException {
        String message = "Error: " + getParseExceptionInfo(spe);
        System.out.println(message);
    }

    public void fatalError(SAXParseException spe) throws SAXException {
        // Log only; not rethrowing is what lets the parser carry on.
        String message = "Fatal Error: " + getParseExceptionInfo(spe);
        System.out.println(message);
    }
}

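Output (judging by the product names and line numbers, the prices.xml used for this run was longer than the sample in the question)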

Product ::: Gummi bears
Product ::: Mounds
Fatal Error: URI=file:///C:/Developer/pachat/workspaces/eclipse-default/stack-overflow/bin/com/test/stack/overflow/sax/prices.xml Line=9: The element type "Price" must be terminated by the matching end-tag "</Price>".
Fatal Error: URI=file:///C:/Developer/pachat/workspaces/eclipse-default/stack-overflow/bin/com/test/stack/overflow/sax/prices.xml Line=9: The end-tag for element type "Price" must end with a '>' delimiter.
Product ::: Vodka
Product ::: Rum
Product ::: Brezzer
Fatal Error: URI=file:///C:/Developer/pachat/workspaces/eclipse-default/stack-overflow/bin/com/test/stack/overflow/sax/prices.xml Line=21: The element type "Price" must be terminated by the matching end-tag "</Price>".
Fatal Error: URI=file:///C:/Developer/pachat/workspaces/eclipse-default/stack-overflow/bin/com/test/stack/overflow/sax/prices.xml Line=21: The end-tag for element type "Price" must end with a '>' delimiter.
Product ::: Water
Fatal Error: URI=file:///C:/Developer/pachat/workspaces/eclipse-default/stack-overflow/bin/com/test/stack/overflow/sax/prices.xml Line=26: The end-tag for element type "Product" must end with a '>' delimiter.
Fatal Error: URI=file:///C:/Developer/pachat/workspaces/eclipse-default/stack-overflow/bin/com/test/stack/overflow/sax/prices.xml Line=26: XML document structures must start and end within the same entity.
Fatal Error: URI=file:///C:/Developer/pachat/workspaces/eclipse-default/stack-overflow/bin/com/test/stack/overflow/sax/prices.xml Line=26: Premature end of file.
Error -- processing event: -1
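Note that the run above still reports Mounds, the product with the broken Price element, so it does not by itself skip bad products. One possible alternative, which is a different technique from the continue-after-fatal-error feature, is to cut the file into Product blocks first and parse each block on its own, dropping the ones that fail. The sketch below assumes that Product elements are never nested and that each one ends with a literal </Product>; the names ProductSplitter and wellFormedProductNames are invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses each Product block separately and skips broken ones.
class ProductSplitter {

    // One <Product ...>...</Product> block. Assumption: blocks are never nested.
    // The \b keeps the pattern from matching the <Products> root tag.
    private static final Pattern PRODUCT =
            Pattern.compile("<Product\\b.*?</Product>", Pattern.DOTALL);

    // Returns the Name attribute of every block that is well-formed on its own.
    static List<String> wellFormedProductNames(String xml)
            throws ParserConfigurationException, IOException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing anything.
        builder.setErrorHandler(new DefaultHandler());

        List<String> names = new ArrayList<>();
        Matcher m = PRODUCT.matcher(xml);
        while (m.find()) {
            try {
                Element product = builder
                        .parse(new InputSource(new StringReader(m.group())))
                        .getDocumentElement();
                names.add(product.getAttribute("Name"));
            } catch (SAXException e) {
                // The block is not well-formed by itself: skip it and move on.
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = String.join("\n",
                "<Products>",
                "  <Product Name=\"Gummi bears\">",
                "    <Price Currency=\"GBP\">4.07</Price>",
                "  </Product>",
                "  <Product Name=\"Mounds\">",
                "    <Price Currency=\"AUD\">5.64",
                "  </Product>",
                "  <Product Name=\"Vodka\">",
                "    <Price Currency=\"RUB\">70</Price>",
                "  </Product>",
                "</Products>");
        System.out.println(wellFormedProductNames(xml)); // prints [Gummi bears, Vodka]
    }
}
```

The trade-off is that the whole file is held in memory as a string and the splitting is done with a regular expression rather than a parser, so this only suits files whose layout is as regular as the sample in the question.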
TheCodingFrog answered Oct 04 '22 12:10



The general way to deal with errors like this is to use a streaming parser. The one that comes to mind for Java is SAX.

When you create a handler, you can override/implement the error and fatalError methods. These let you continue parsing, but they still leave you to handle the actual errors.

Obviously there are many possible errors in an XML document, and it only makes sense to handle some of them. Hopefully this gives you a place to start with a parser, though.

BrandonArp answered Oct 04 '22 13:10
