Java SAX Parsing

Tags: java, xml, sax

I need to parse an XML stream. Since I only need to read it once to build my Java objects, SAX looks like the natural choice. I'm extending DefaultHandler and implementing the startElement, endElement and characters methods, with fields in my class that hold the most recently read value (captured in the characters method).

I can do what I need, but my code has become quite complex, and I'm sure that's unnecessary and there's a simpler way. The structure of my XML is something like this:

<players>
  <player>
    <id></id>
    <name></name>
    <teams total="2">
      <team>
        <id></id>
        <name></name>
        <start-date>
          <year>2009</year>
          <month>9</month>
        </start-date>
        <is-current>true</is-current>
      </team>
      <team>
        <id></id>
        <name></name>
        <start-date>
          <year>2007</year>
          <month>11</month>
        </start-date>
        <end-date>
          <year>2009</year>
          <month>7</month>
        </end-date>
      </team>
    </teams>
  </player>
</players>

My problem started when I realized that the same tag names are used in several areas of the file. For example, id and name exist for both a player and a team. I want to create instances of my Java classes Player and Team. While parsing, I keep boolean flags that tell me whether I'm in the teams section, so that in endElement I know that a name is a team's name and not a player's name, and so on.

Here's what my code looks like:

public class MyParser extends DefaultHandler {

    private String currentValue;
    private boolean inTeamsSection = false;
    private Player player;
    private Team team;
    private List<Team> teams;

    public void characters(char[] ch, int start, int length) throws SAXException {
        currentValue = new String(ch, start, length);
    }

    public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
        if(name.equals("player")){
            player = new Player();
        }
        if (name.equals("teams")) {
            inTeamsSection = true;
            teams = new ArrayList<Team>();
        }
        if (name.equals("team")){
            team = new Team();
        }
    }   

    public void endElement(String uri, String localName, String name) throws SAXException {
        if (name.equals("id")) {
            if(inTeamsSection){
                team.setId(currentValue);
            }
            else{
                player.setId(currentValue);
            }
        }
        if (name.equals("name")){
            if(inTeamsSection){
                team.setName(currentValue);
            }
            else{
                player.setName(currentValue);
            }
        }
        if (name.equals("team")){
            teams.add(team);
        }
        if (name.equals("teams")){
            player.setTeams(teams);
            inTeamsSection = false;
        }
    }
}

In my real scenario a player has more child nodes besides teams, and those nodes also contain tags like name and id. I ended up with several booleans similar to inTeamsSection, and my endElement method has become long and complex, with many conditions.

What should I do differently? How can I know what a name tag, for instance, belongs to?

Thanks!

Haji, asked Dec 22 '11 09:12


3 Answers

There is one neat trick when writing a SAX parser: you are allowed to change the ContentHandler of an XMLReader while parsing. This lets you separate the parsing logic for different elements into multiple classes, which makes the parsing more modular and reusable. When a handler sees its own end element, it switches back to its parent. How many handlers you implement is up to you. The code would look like this:

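import java.util.LinkedList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class RootHandler extends DefaultHandler {
    private XMLReader reader;
    private List<Team> teams;

    public RootHandler(XMLReader reader) {
        this.reader = reader;
        this.teams = new LinkedList<Team>();
    }

    // Called by TeamHandler once a complete team element has been parsed
    public void addTeam(Team team) {
        teams.add(team);
    }

    public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
        if (name.equals("team")) {
            // Switch handler to parse the team element
            reader.setContentHandler(new TeamHandler(reader, this));
        }
    }
}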

public class TeamHandler extends DefaultHandler {
    private XMLReader reader;
    private RootHandler parent;
    private Team team;
    private StringBuilder content;

    public TeamHandler(XMLReader reader, RootHandler parent) {
        this.reader = reader;
        this.parent = parent;
        this.content = new StringBuilder();
        this.team = new Team();
    }

    // characters can be called multiple times per element so aggregate the content in a StringBuilder
    public void characters(char[] ch, int start, int length) throws SAXException {
        content.append(ch, start, length);
    }

    public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
        content.setLength(0);
    }

    public void endElement(String uri, String localName, String name) throws SAXException {
        if (name.equals("name")) {
            team.setName(content.toString());
        } else if (name.equals("team")) {
            parent.addTeam(team);
            // Switch handler back to our parent
            reader.setContentHandler(parent);
        }
    }
}
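
To show how the handlers above might be wired to a reader, here is a minimal self-contained sketch of the same handler-switching idea. The class names HandlerSwitchDemo and PlayerHandler, the field-based Team class and the parse helper are assumptions made for illustration; they are not part of the question's or the answer's code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class HandlerSwitchDemo {

    // Hypothetical stand-in for the question's Team class, which is not shown.
    static class Team {
        String id;
        String name;
    }

    // Handles everything outside <team>; hands each <team> to a TeamHandler.
    static class PlayerHandler extends DefaultHandler {
        private final XMLReader reader;
        private final StringBuilder content = new StringBuilder();
        String playerName;
        final List<Team> teams = new ArrayList<>();

        PlayerHandler(XMLReader reader) {
            this.reader = reader;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            content.setLength(0);
            if (qName.equals("team")) {
                // From here on, events go to the TeamHandler until </team>
                reader.setContentHandler(new TeamHandler(reader, this));
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            content.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Only the player's own <name> reaches this handler
            if (qName.equals("name")) {
                playerName = content.toString();
            }
        }
    }

    // Handles the children of one <team> and then switches back to the parent.
    static class TeamHandler extends DefaultHandler {
        private final XMLReader reader;
        private final PlayerHandler parent;
        private final StringBuilder content = new StringBuilder();
        private final Team team = new Team();

        TeamHandler(XMLReader reader, PlayerHandler parent) {
            this.reader = reader;
            this.parent = parent;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            content.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            content.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("id")) {
                team.id = content.toString();
            } else if (qName.equals("name")) {
                team.name = content.toString();
            } else if (qName.equals("team")) {
                parent.teams.add(team);
                reader.setContentHandler(parent);
            }
        }
    }

    // Parses the XML string and returns the root handler holding the results.
    static PlayerHandler parse(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            PlayerHandler root = new PlayerHandler(reader);
            reader.setContentHandler(root);
            reader.parse(new InputSource(new StringReader(xml)));
            return root;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Because the handler is set on the XMLReader itself rather than passed to SAXParser.parse, each handler can replace itself during the parse. With more sections per player, each one would get its own small handler in the same way.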
Jörn Horstmann, answered Nov 01 '22 20:11


It's difficult to advise without knowing more about your requirements, but the fact that you are surprised that "my code got quite complex" suggests that you were not well informed when you chose SAX. SAX is a low-level programming interface capable of very high performance, but that's because the parser is doing far less work for you, and you therefore need to do a lot more work yourself.

Michael Kay, answered Nov 01 '22 22:11


I do something very similar, but instead of having boolean flags to tell me what state I'm in, I test for player or team being non-null. Makes things a bit neater. This requires you to set them to null when you detect the end of each element, after you've added it to the relevant list.
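
A rough sketch of that idea, assuming hypothetical field-based Player and Team classes rather than the question's setter-based ones (the names NullStateDemo, PlayerSaxHandler and parse are also made up for illustration):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NullStateDemo {

    // Hypothetical model classes; the question's real classes are not shown.
    static class Team {
        String id;
        String name;
    }

    static class Player {
        String id;
        String name;
        final List<Team> teams = new ArrayList<>();
    }

    static class PlayerSaxHandler extends DefaultHandler {
        final List<Player> players = new ArrayList<>();
        private final StringBuilder content = new StringBuilder();
        private Player player; // non-null only while inside <player>
        private Team team;     // non-null only while inside <team>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            content.setLength(0);
            if (qName.equals("player")) {
                player = new Player();
            } else if (qName.equals("team")) {
                team = new Team();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element, so accumulate
            content.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String text = content.toString();
            switch (qName) {
                case "id":
                    // The innermost open object owns the value
                    if (team != null) team.id = text;
                    else if (player != null) player.id = text;
                    break;
                case "name":
                    if (team != null) team.name = text;
                    else if (player != null) player.name = text;
                    break;
                case "team":
                    player.teams.add(team);
                    team = null; // leaving <team>: later ids and names belong to the player
                    break;
                case "player":
                    players.add(player);
                    player = null;
                    break;
                default:
                    break;
            }
        }
    }

    // Parses the XML string and returns the players that were built.
    static List<Player> parse(String xml) {
        try {
            PlayerSaxHandler handler = new PlayerSaxHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.players;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

The object references double as the state flags, so there is no separate boolean to keep in sync. The sketch assumes team elements only ever appear inside a player element, as in the sample XML.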

Graham Borland, answered Nov 01 '22 21:11