Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Better way to parse xml

Tags:

java

xml

sax

I've been parsing XML like this for years, and I have to admit when the number of different element becomes larger I find it a bit boring and exhausting to do, here is what I mean, sample dummy XML:

<?xml version="1.0"?>
<Order>
    <Date>2003/07/04</Date>
    <CustomerId>123</CustomerId>
    <CustomerName>Acme Alpha</CustomerName>
    <Item>
        <ItemId> 987</ItemId>
        <ItemName>Coupler</ItemName>
        <Quantity>5</Quantity>
    </Item>
    <Item>
        <ItemId>654</ItemId>
        <ItemName>Connector</ItemName>
        <Quantity unit="12">3</Quantity>
    </Item>
    <Item>
        <ItemId>579</ItemId>
        <ItemName>Clasp</ItemName>
        <Quantity>1</Quantity>
    </Item>
</Order>

This is relevant part (using sax) :

public class SaxParser extends DefaultHandler {

    boolean isItem = false;
    boolean isOrder = false;
    boolean isDate = false;
    boolean isCustomerId = false;
    private Order order;
    private Item item;

        @Override
    public void startElement(String namespaceURI, String localName, String qName, Attributes atts) {
        if (localName.equalsIgnoreCase("ORDER")) {
            order = new Order();
        }

        if (localName.equalsIgnoreCase("DATE")) {
            isDate = true;
        }

        if (localName.equalsIgnoreCase("CUSTOMERID")) {
            isCustomerId = true;
        }

        if (localName.equalsIgnoreCase("ITEM")) {
            isItem = true;
        }
    }

    public void characters(char ch[], int start, int length) throws SAXException {

        if (isDate){
            SimpleDateFormat formatter = new SimpleDateFormat("yyyy/MM/dd");
            String value = new String(ch, start, length);
            try {
                order.setDate(formatter.parse(value));
            } catch (ParseException e) {
                e.printStackTrace();
            }
        }

        if(isCustomerId){
            order.setCustomerId(Integer.valueOf(new String(ch, start, length)));
        }

        if (isItem) {
            item = new Item();
            isItem = false;
        }



    }

}

I'm wondering is there a way to get rid of these hideous booleans which keep growing with number of elements. There must be a better way to parse this relatively simple xml. Just by looking the lines of code necessary to do this task looks ugly.

Currently I'm using SAX parser, but I'm open to any other suggestions (other than DOM, I can't afford in memory parsers I have huge XML files).

like image 558
Gandalf StormCrow Avatar asked Mar 25 '13 23:03

Gandalf StormCrow


People also ask

How many ways we can parse XML file?

Java provides many ways to parse an XML file. There are two parsers in Java which parses an XML file: Java DOM Parser. Java SAX Parser.

How do you parse an XML file?

To parse XML documents, use the XML PARSE statement, specifying the XML document that is to be parsed and the processing procedure for handling XML events that occur during parsing, as shown in the following code fragment.

Is XML easy to parse?

Well parsing XML is not an easy task. Its basic structure is a tree with any node in tree capable of holding a container which consists of an array of more trees.


2 Answers

If you control the definition of the XML, you could use an XML binding tool, for example JAXB (Java Architecture for XML Binding.) In JAXB you can define a schema for the XML structure (XSD and others are supported) or annotate your Java classes in order to define the serialization rules. Once you have a clear declarative mapping between XML and Java, marshalling and unmarshalling to/from XML becomes trivial.

Using JAXB does require more memory than SAX handlers, but there exist methods to process the XML documents by parts: Dealing with large documents.

JAXB page from Oracle

like image 97
Marcelo Avatar answered Sep 22 '22 08:09

Marcelo


Here's an example of using JAXB with StAX.

Input document:

<?xml version="1.0" encoding="UTF-8"?>
<Personlist xmlns="http://example.org">
    <Person>
        <Name>Name 1</Name>
        <Address>
            <StreetAddress>Somestreet</StreetAddress>
            <PostalCode>00001</PostalCode>
            <CountryName>Finland</CountryName>
        </Address>
    </Person>
    <Person>
        <Name>Name 2</Name>
        <Address>
            <StreetAddress>Someotherstreet</StreetAddress>
            <PostalCode>43400</PostalCode>
            <CountryName>Sweden</CountryName>
        </Address>
    </Person>
</Personlist>

Person.java:

@XmlRootElement(name = "Person", namespace = "http://example.org")
public class Person {
    @XmlElement(name = "Name", namespace = "http://example.org")
    private String name;
    @XmlElement(name = "Address", namespace = "http://example.org")
    private Address address;

    public String getName() {
        return name;
    }

    public Address getAddress() {
        return address;
    }
}

Address.java:

public class Address {
    @XmlElement(name = "StreetAddress", namespace = "http://example.org")
    private String streetAddress;
    @XmlElement(name = "PostalCode", namespace = "http://example.org")
    private String postalCode;
    @XmlElement(name = "CountryName", namespace = "http://example.org")
    private String countryName;

    public String getStreetAddress() {
        return streetAddress;
    }

    public String getPostalCode() {
        return postalCode;
    }

    public String getCountryName() {
        return countryName;
    }
}

PersonlistProcessor.java:

public class PersonlistProcessor {
    public static void main(String[] args) throws Exception {
        new PersonlistProcessor().processPersonlist(PersonlistProcessor.class
                .getResourceAsStream("personlist.xml"));
    }

    // TODO: Instead of throws Exception, all exceptions should be wrapped
    // inside runtime exception
    public void processPersonlist(InputStream inputStream) throws Exception {
        JAXBContext jaxbContext = JAXBContext.newInstance(Person.class);
        XMLStreamReader xss = XMLInputFactory.newFactory().createXMLStreamReader(inputStream);
        // Create unmarshaller
        Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
        // Go to next tag
        xss.nextTag();
        // Require Personlist
        xss.require(XMLStreamReader.START_ELEMENT, "http://example.org", "Personlist");
        // Go to next tag
        while (xss.nextTag() == XMLStreamReader.START_ELEMENT) {
            // Require Person
            xss.require(XMLStreamReader.START_ELEMENT, "http://example.org", "Person");
            // Unmarshall person
            Person person = (Person)unmarshaller.unmarshal(xss);
            // Process person
            processPerson(person);
        }
        // Require Personlist
        xss.require(XMLStreamReader.END_ELEMENT, "http://example.org", "Personlist");
    }

    private void processPerson(Person person) {
        System.out.println(person.getName());
        System.out.println(person.getAddress().getCountryName());
    }
}
like image 24
Sami Korhonen Avatar answered Sep 24 '22 08:09

Sami Korhonen