There's an XML stream which I need to parse. Since I only need to do it once and build my Java objects, SAX looks like the natural choice. I'm extending DefaultHandler and implementing the startElement, endElement and characters methods, with members in my class where I save the most recently read value (taken in the characters method).
I have no problem doing what I need, but my code has become quite complex. I'm sure there's no good reason for that and that I could do things differently. The structure of my XML is something like this:
<players>
  <player>
    <id></id>
    <name></name>
    <teams total="2">
      <team>
        <id></id>
        <name></name>
        <start-date>
          <year>2009</year>
          <month>9</month>
        </start-date>
        <is-current>true</is-current>
      </team>
      <team>
        <id></id>
        <name></name>
        <start-date>
          <year>2007</year>
          <month>11</month>
        </start-date>
        <end-date>
          <year>2009</year>
          <month>7</month>
        </end-date>
      </team>
    </teams>
  </player>
</players>
My problem started when I realized that the same tag names are used in several areas of the file. For example, id and name exist for both a player and a team. I want to create instances of my Java classes Player and Team. While parsing, I keep boolean flags telling me whether I'm in the teams section, so that in endElement I know that the name is a team's name, not a player's name, and so on.
Here's what my code looks like:
public class MyParser extends DefaultHandler {
    private String currentValue;
    private boolean inTeamsSection = false;
    private Player player;
    private Team team;
    private List<Team> teams;

    public void characters(char[] ch, int start, int length) throws SAXException {
        currentValue = new String(ch, start, length);
    }

    public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
        if (name.equals("player")) {
            player = new Player();
        }
        if (name.equals("teams")) {
            inTeamsSection = true;
            teams = new ArrayList<Team>();
        }
        if (name.equals("team")) {
            team = new Team();
        }
    }

    public void endElement(String uri, String localName, String name) throws SAXException {
        if (name.equals("id")) {
            if (inTeamsSection) {
                team.setId(currentValue);
            } else {
                player.setId(currentValue);
            }
        }
        if (name.equals("name")) {
            if (inTeamsSection) {
                team.setName(currentValue);
            } else {
                player.setName(currentValue);
            }
        }
        if (name.equals("team")) {
            teams.add(team);
        }
        if (name.equals("teams")) {
            player.setTeams(teams);
            inTeamsSection = false;
        }
    }
}
Since in my real scenario a player has more child nodes in addition to the teams, and those nodes also have tags like name and id, I ended up with several boolean flags similar to inTeamsSection, and my endElement method has become long and complex, with many conditions.
What should I do differently? How can I know what a name tag, for instance, belongs to?
Thanks!
There is one neat trick when writing a SAX parser: you are allowed to change the
ContentHandler of an XMLReader while parsing. This lets you separate the
parsing logic for different elements into multiple classes, which makes the
parsing more modular and reusable. When a handler sees its end element, it
switches back to its parent. How many handlers you implement is up to you.
The code would look like this:
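import java.util.LinkedList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class RootHandler extends DefaultHandler {
    private XMLReader reader;
    private List<Team> teams;
    public RootHandler(XMLReader reader) {
        this.reader = reader;
        this.teams = new LinkedList<Team>();
    }
    public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
        if (name.equals("team")) {
            // Switch handler to parse the team element
            reader.setContentHandler(new TeamHandler(reader, this));
        }
    }
    // Called by TeamHandler once it has seen the end of a team element
    public void addTeam(Team team) {
        teams.add(team);
    }
}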
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class TeamHandler extends DefaultHandler {
    private XMLReader reader;
    private RootHandler parent;
    private Team team;
    private StringBuilder content;
    public TeamHandler(XMLReader reader, RootHandler parent) {
        this.reader = reader;
        this.parent = parent;
        this.content = new StringBuilder();
        this.team = new Team();
    }
    // characters can be called multiple times per element so aggregate the content in a StringBuilder
    public void characters(char[] ch, int start, int length) throws SAXException {
        content.append(ch, start, length);
    }
    public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
        // A new child element begins, so discard any text collected so far
        content.setLength(0);
    }
    public void endElement(String uri, String localName, String name) throws SAXException {
        if (name.equals("name")) {
            team.setName(content.toString());
        } else if (name.equals("team")) {
            parent.addTeam(team);
            // Switch handler back to our parent
            reader.setContentHandler(parent);
        }
    }
}
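To try the handler-switching idea end to end, here is a minimal self-contained sketch. It is not the code above: the class names (TeamNamesHandler, SingleTeamHandler, HandlerSwitchDemo) and the parseTeamNames method are made up for illustration, and it only collects team names as strings so that it does not depend on the Player and Team classes from the question. It assumes the default, non-namespace-aware parser from SAXParserFactory, where the element name arrives in the qName argument.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical root handler: it only watches for <team> and delegates.
class TeamNamesHandler extends DefaultHandler {
    private final XMLReader reader;
    final List<String> teamNames = new ArrayList<>();

    TeamNamesHandler(XMLReader reader) {
        this.reader = reader;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("team")) {
            // Hand the events inside <team> over to a dedicated handler
            reader.setContentHandler(new SingleTeamHandler(reader, this));
        }
    }
}

// Hypothetical child handler: every <name> it sees belongs to the team.
class SingleTeamHandler extends DefaultHandler {
    private final XMLReader reader;
    private final TeamNamesHandler parent;
    private final StringBuilder content = new StringBuilder();
    private String name;

    SingleTeamHandler(XMLReader reader, TeamNamesHandler parent) {
        this.reader = reader;
        this.parent = parent;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        content.setLength(0); // new child element, drop previously collected text
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("name")) {
            name = content.toString().trim();
        } else if (qName.equals("team")) {
            parent.teamNames.add(name);
            reader.setContentHandler(parent); // switch back to the parent handler
        }
    }
}

class HandlerSwitchDemo {
    // Parses the XML string and returns the team names in document order.
    static List<String> parseTeamNames(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        TeamNamesHandler root = new TeamNamesHandler(reader);
        reader.setContentHandler(root);
        reader.parse(new InputSource(new StringReader(xml)));
        return root.teamNames;
    }
}
```

Calling HandlerSwitchDemo.parseTeamNames with a document shaped like the one in the question should return only the team names, because the player's own name element is seen by the root handler, which ignores it.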
It's difficult to advise without knowing more about your requirements, but the fact that you are surprised that "my code got quite complex" suggests that you were not well informed when you chose SAX. SAX is a low-level programming interface capable of very high performance, but that's because the parser is doing far less work for you, and you therefore need to do a lot more work yourself.
I do something very similar, but instead of having boolean flags to tell me what state I'm in, I test for player or team being non-null. Makes things a bit neater. This requires you to set them to null when you detect the end of each element, after you've added it to the relevant list.
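A rough sketch of how that non-null check could look, in case it helps. The names here (SketchPlayer, SketchTeam, NullCheckHandler, parse) are invented stand-ins rather than the classes from the question, and the fields are plain strings to keep it short. It also assumes the default, non-namespace-aware parser, so the element name is read from qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-ins for the question's Player and Team classes.
class SketchTeam {
    String id;
    String name;
}

class SketchPlayer {
    String id;
    String name;
    final List<SketchTeam> teams = new ArrayList<>();
}

class NullCheckHandler extends DefaultHandler {
    final List<SketchPlayer> players = new ArrayList<>();
    private final StringBuilder content = new StringBuilder();
    private SketchPlayer player; // non-null only while inside <player>
    private SketchTeam team;     // non-null only while inside <team>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        content.setLength(0);
        if (qName.equals("player")) {
            player = new SketchPlayer();
        } else if (qName.equals("team")) {
            team = new SketchTeam();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = content.toString().trim();
        if (qName.equals("id")) {
            // The innermost open object wins: check team before player
            if (team != null) {
                team.id = value;
            } else if (player != null) {
                player.id = value;
            }
        } else if (qName.equals("name")) {
            if (team != null) {
                team.name = value;
            } else if (player != null) {
                player.name = value;
            }
        } else if (qName.equals("team")) {
            player.teams.add(team);
            team = null; // leaving <team>, so later ids and names go to the player
        } else if (qName.equals("player")) {
            players.add(player);
            player = null;
        }
    }

    // Convenience helper for the sketch: parse an XML string into players.
    static List<SketchPlayer> parse(String xml) throws Exception {
        NullCheckHandler handler = new NullCheckHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.players;
    }
}
```

The order of the checks matters: a team is always nested inside a player, so team has to be tested first, and resetting it to null at the end of each team element is what stops a later id or name from being attributed to the wrong object.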