
How do I extract child element from XML to a string in Java?

Tags: java, xml
If I have an XML document like

<root>   
   <element1>
        <child attr1="blah">
           <child2>blahblah</child2>
        </child>
   </element1> 
</root>

I want to get the first child element of the root as an XML string. The output string would be:

<element1>
    <child attr1="blah">
       <child2>blahblah</child2>
    </child>
</element1>

There are many possible approaches, and I would like to see some ideas. I've been trying to use the Java XML APIs for this, but it's not clear that they offer a good way to do it.

thanks

phil swenson asked Mar 10 '09 20:03



3 Answers

You're right: the standard XML API doesn't offer a good way to do this. Here's one example (it may be bug-ridden; it runs, but I wrote it a long time ago).

import javax.xml.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
import java.io.*;

public class Proc
{
    public static void main(String[] args) throws Exception
    {
        //Parse the input document
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new File("in.xml"));

        //Set up the transformer to write the output string
        TransformerFactory tFactory = TransformerFactory.newInstance();
        Transformer transformer = tFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter sw = new StringWriter();
        StreamResult result = new StreamResult(sw);

        //Find the first child node - this could be done with xpath as well
        NodeList nl = doc.getDocumentElement().getChildNodes();
        DOMSource source = null;
        for(int x = 0;x < nl.getLength();x++)
        {
            Node e = nl.item(x);
            if(e instanceof Element)
            {
                source = new DOMSource(e);
                break;
            }
        }

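        //Fail clearly if the root has no child element
        if(source == null)
        {
            throw new IllegalStateException("Root element has no child elements");
        }

        //Do the transformation and output
        transformer.transform(source, result);
        System.out.println(sw.toString());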
    }
}

It might seem that you could get the first child just by calling doc.getDocumentElement().getFirstChild(). The problem is that any whitespace between the root and the child element creates a Text node in the tree, and you get that node instead of the actual element node. The output from this program is:

D:\home\tmp\xml>java Proc
<?xml version="1.0" encoding="UTF-8"?>
<element1>
        <child attr1="blah">
           <child2>blahblah</child2>
       </child>
   </element1>

If you don't need the XML declaration, you can suppress it by setting the OutputKeys.OMIT_XML_DECLARATION output property to "yes" on the transformer. I would probably try to use a third-party XML library if at all possible.
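
As a rough sketch of the XPath route mentioned in the code comment above, the following selects the first element child of the root and serializes it without the XML declaration. The class name FirstChildXml and the method firstChildElementXml are made up for illustration, and the input is parsed from a string rather than from in.xml so the example is self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FirstChildXml
{
    //Hypothetical helper: returns the first element child of the root as an XML string
    public static String firstChildElementXml(String xml) throws Exception
    {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        //"/*/*[1]" matches the first element child of the document element;
        //whitespace-only Text nodes are not elements, so they are skipped
        Node first = (Node) XPathFactory.newInstance().newXPath()
            .evaluate("/*/*[1]", doc, XPathConstants.NODE);
        if(first == null)
        {
            throw new IllegalArgumentException("Root element has no child elements");
        }

        //Serialize only that element, leaving out the <?xml ...?> declaration
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        transformer.transform(new DOMSource(first), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception
    {
        String xml =
            "<root>\n" +
            "   <element1>\n" +
            "        <child attr1=\"blah\">\n" +
            "           <child2>blahblah</child2>\n" +
            "        </child>\n" +
            "   </element1>\n" +
            "</root>";
        System.out.println(firstChildElementXml(xml));
    }
}
```

Using XPath here avoids the manual loop over getChildNodes(), at the cost of creating an XPath object for a single lookup. Whitespace inside the selected element is kept as it appears in the input.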

Matt McMinn answered Nov 07 '22 09:11



Since this is the top Google answer, here is a version for those of you who just want the basics:

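import java.io.ByteArrayOutputStream;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;

public static String serializeXml(Element element) throws Exception
{
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    StreamResult result = new StreamResult(buffer);

    DOMSource source = new DOMSource(element);
    TransformerFactory.newInstance().newTransformer().transform(source, result);

    // The transformer writes UTF-8 unless told otherwise, so decode with the
    // same charset instead of relying on the platform default
    return new String(buffer.toByteArray(), "UTF-8");
}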

I use this for debugging, which is most likely what you need it for.

Monica answered Nov 07 '22 07:11



I would recommend JDOM. It's a Java XML library that makes dealing with XML much easier than the standard W3C approach.

duffymo answered Nov 07 '22 08:11
