 

SAX parsing - efficient way to get text nodes

Tags: java, xml, sax

Given this XML snippet:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>

In SAX, it is easy to get attribute values:

@Override
public void startElement (String uri, String localName,
              String qName, Attributes attributes) throws SAXException{
    if(qName.equals("book")){
        String bookId = attributes.getValue("id");
        ...
    }
}

But getting the value of a text node, e.g. the text inside the <author> element, is much harder:

private StringBuffer curCharValue = new StringBuffer(1024);

@Override
public void startElement (String uri, String localName,
              String qName, Attributes attributes) throws SAXException {
    if(qName.equals("author")){
        // StringBuffer has no clear(); setLength(0) empties it
        curCharValue.setLength(0);
    }
}

@Override
public void characters (char[] ch, int start, int length) throws SAXException
{
    //already synchronized
    // append the slice of ch that the parser handed us
    curCharValue.append(ch, start, length);
}

@Override
public void endElement (String uri, String localName, String qName)
throws SAXException
{
    if(qName.equals("author")){
        // all text of the element has been seen by now
        String author = curCharValue.toString();
    }
}
  1. I'm not sure the above sample even works. What do you think of this approach?
  2. Is there a better way to get the text node's value?
Eran Medan asked Jan 14 '10 14:01


2 Answers

That's the usual way to do it with SAX.

Just be aware that characters() may be called more than once for the text of a single element, so keep appending to the buffer and only read it in endElement(). See this question for more info. Here is a complete example.
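
To illustrate the point about characters() being called several times, here is a minimal sketch of a handler that accumulates text until the closing tag. The class name AuthorHandler, the inAuthor flag, and the parseAuthors helper are made up for this illustration and are not from the question; it also assumes the <author> element has no child elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <author> element.
public class AuthorHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> authors = new ArrayList<>();
    private boolean inAuthor = false;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        if (qName.equals("author")) {
            text.setLength(0);   // reset the buffer for a new element
            inAuthor = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May fire several times for one element, so append, never assign.
        if (inAuthor) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("author")) {
            authors.add(text.toString());   // the text is complete only here
            inAuthor = false;
        }
    }

    public List<String> getAuthors() {
        return authors;
    }

    // Convenience helper (an assumption for this sketch): parse an XML string.
    public static List<String> parseAuthors(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        AuthorHandler handler = new AuthorHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getAuthors();
    }
}
```

The inAuthor flag keeps whitespace and text from other elements out of the buffer. A StringBuilder is used here instead of StringBuffer because a SAX handler is normally driven by a single thread, so synchronization is not needed.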

Otherwise, you could give StAX a try.
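
For comparison, a rough sketch of what the StAX route might look like for the XML in the question. With the cursor API, XMLStreamReader.getElementText() returns the whole text of a text-only element in one call, so no manual buffering is needed. The class name StaxAuthors and the parseAuthors helper are hypothetical names chosen for this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: pulls the text of every <author> element using StAX.
public class StaxAuthors {
    public static List<String> parseAuthors(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> authors = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("author")) {
                    // Reads up to the matching end tag and returns the text;
                    // throws if the element contains child elements.
                    authors.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return authors;
    }
}
```

Because the application pulls events instead of receiving callbacks, the "which element am I in" state lives in ordinary control flow rather than in handler fields.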

ewernli answered Oct 20 '22 06:10


public void startElement(String strNamespaceURI, String strLocalName,
        String strQName, Attributes al) throws SAXException {
    if(strLocalName.equalsIgnoreCase("HIT"))
    {
        String output1 = al.getValue("NAME");
        // this will work, but how can we parse if NAME="abc" only?
    }
}
Venkat answered Oct 20 '22 07:10