Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

stax - get xml node as string

xml looks like so:

<statements>
   <statement account="123">
      ...stuff...
   </statement>
   <statement account="456">
      ...stuff...
   </statement>
</statements>

I'm using stax to process one "<statement>" at a time and I got that working. I need to get that entire statement node as a string so I can create "123.xml" and "456.xml" or maybe even load it into a database table indexed by account.

using this approach: http://www.devx.com/Java/Article/30298/1954

I'm looking to do something like this:

String statementXml = staxXmlReader.getNodeByName("statement");

//load statementXml into database
like image 520
Jason Avatar asked Dec 04 '10 03:12

Jason


4 Answers

I had a similar task and although the original question is older than a year, I couldn't find a satisfying answer. The most interesting answer up to now was Blaise Doughan's answer, but I couldn't get it running on the XML I am expecting (maybe some parameters for the underlying parser could change that?). Here the XML, very simplyfied:

<many-many-tags>
    <description>
        ...
        <p>Lorem ipsum...</p>
        Devils inside...
        ...
    </description>
</many-many-tags>

My solution:

public static String readElementBody(XMLEventReader eventReader)
    throws XMLStreamException {
    StringWriter buf = new StringWriter(1024);

    int depth = 0;
    while (eventReader.hasNext()) {
        // peek event
        XMLEvent xmlEvent = eventReader.peek();

        if (xmlEvent.isStartElement()) {
            ++depth;
        }
        else if (xmlEvent.isEndElement()) {
            --depth;

            // reached END_ELEMENT tag?
            // break loop, leave event in stream
            if (depth < 0)
                break;
        }

        // consume event
        xmlEvent = eventReader.nextEvent();

        // print out event
        xmlEvent.writeAsEncodedUnicode(buf);
    }

    return buf.getBuffer().toString();
}

Usage example:

XMLEventReader eventReader = ...;
while (eventReader.hasNext()) {
    XMLEvent xmlEvent = eventReader.nextEvent();
    if (xmlEvent.isStartElement()) {
        StartElement elem = xmlEvent.asStartElement();
        String name = elem.getName().getLocalPart();

        if ("DESCRIPTION".equals(name)) {
            String xmlFragment = readElementBody(eventReader);
            // do something with it...
            System.out.println("'" + fragment + "'");
        }
    }
    else if (xmlEvent.isEndElement()) {
        // ...
    }
}

Note that the extracted XML fragment will contain the complete extracted body content, including white space and comments. Filtering those on demand, or making the buffer size parametrizable have been left out for code brevity:

'
    <description>
        ...
        <p>Lorem ipsum...</p>
        Devils inside...
        ...
    </description>
    '
like image 143
t0r0X Avatar answered Nov 15 '22 04:11

t0r0X


You can use StAX for this. You just need to advance the XMLStreamReader to the start element for statement. Check the account attribute to get the file name. Then use the javax.xml.transform APIs to transform the StAXSource to a StreamResult wrapping a File. This will advance the XMLStreamReader and then just repeat this process.

import java.io.File;
import java.io.FileReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public class Demo {

    public static void main(String[] args) throws Exception  {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
        xsr.nextTag(); // Advance to statements element

        while(xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer();
            File file = new File("out" + xsr.getAttributeValue(null, "account") + ".xml");
            t.transform(new StAXSource(xsr), new StreamResult(file));
        }
    }

}
like image 30
bdoughan Avatar answered Nov 15 '22 05:11

bdoughan


Stax is a low-level access API, and it does not have either lookups or methods that access content recursively. But what you actually trying to do? And why are you considering Stax?

Beyond using a tree model (DOM, XOM, JDOM, Dom4j), which would work well with XPath, best choice when dealing with data is usually data binding library like JAXB. With it you can pass Stax or SAX reader and ask it to bind xml data into Java beans and instead of messing with xml process Java objects. This is often more convenient, and it is usually quite performance. Only trick with larger files is that you do not want to bind the whole thing at once, but rather bind each sub-tree (in your case, one 'statement' at a time). This is easiest done by iterating Stax XmlStreamReader, then using JAXB to bind.

like image 39
StaxMan Avatar answered Nov 15 '22 04:11

StaxMan


I've been googling and this seems painfully difficult.

given my xml I think it might just be simpler to:

StringBuilder buffer = new StringBuilder();
for each line in file {
   buffer.append(line)
   if(line.equals(STMT_END_TAG)){
      parse(buffer.toString())
      buffer.delete(0,buffer.length)
   }
 }

 private void parse(String statement){
    //saxParser.parse( new InputSource( new StringReader( xmlText ) );
    // do stuff
    // save string
 }
like image 38
Jason Avatar answered Nov 15 '22 06:11

Jason