Parsing XML file contents without knowing the XML file structure

Tags: java, xml

I've been learning some new tech, using Java to parse files, and for the most part it's going well. However, I'm at a loss as to how I could parse an XML file whose structure is not known upon receipt. There are lots of examples of how to do so if you know the structure (getElementsByTagName seems to be the way to go), but no dynamic options, at least not that I've found.

So the tl;dr version of this question: how can I parse an XML file when I cannot rely on knowing its structure?

697

asked Feb 23 '14 01:02

canadiancreed

1 Answer

Well, the parsing part is easy; as helderdarocha stated in the comments, the parser only requires well-formed XML and does not care about the structure. You can use Java's standard DocumentBuilder to obtain a Document:

InputStream in = new FileInputStream(...);
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);

(If you're parsing multiple documents, you can keep reusing the same DocumentBuilder.)
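To illustrate the reuse point, here is a minimal sketch that parses several documents with one DocumentBuilder. The class name ReuseBuilder and the method rootNames are made up for this example, and it assumes everything runs on a single thread, since DocumentBuilder implementations are not guaranteed to be thread-safe.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper for illustration; not part of the original answer.
class ReuseBuilder {

    // Parses every XML string with the same DocumentBuilder and
    // returns the tag name of each document's root element.
    static List<String> rootNames(List<String> xmlDocs) throws Exception {
        // Created once, then used for every parse() call below.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        List<String> names = new ArrayList<>();
        for (String xml : xmlDocs) {
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            names.add(doc.getDocumentElement().getTagName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = Arrays.asList("<objects><circle/></objects>", "<note>hi</note>");
        System.out.println(rootNames(docs));
    }
}
```

Run as-is, this should print [objects, note]. If you need to parse from several threads, one simple option is to give each thread its own DocumentBuilder.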

Then you can start with the root document element and use familiar DOM methods from there on out:

Element root = doc.getDocumentElement(); // perform DOM operations starting here.

As for processing it, well, it really depends on what you want to do with it, but you can use the methods of Node like getFirstChild() and getNextSibling() to iterate through children and process them as you see fit based on structure, tags, and attributes.

Consider the following example:

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;   
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;


public class XML {

    public static void main (String[] args) throws Exception {

        String xml = "<objects><circle color='red'/><circle color='green'/><rectangle>hello</rectangle><glumble/></objects>";

        // parse
        InputStream in = new ByteArrayInputStream(xml.getBytes("utf-8"));
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);

        // process
        Node objects = doc.getDocumentElement();
        for (Node object = objects.getFirstChild(); object != null; object = object.getNextSibling()) {
            if (object instanceof Element) {
                Element e = (Element)object;
                if (e.getTagName().equalsIgnoreCase("circle")) {
                    String color = e.getAttribute("color");
                    System.out.println("It's a " + color + " circle!");
                } else if (e.getTagName().equalsIgnoreCase("rectangle")) {
                    String text = e.getTextContent();
                    System.out.println("It's a rectangle that says \"" + text + "\".");
                } else {
                    System.out.println("I don't know what a " + e.getTagName() + " is for.");
                }
            }
        }

    }

}

The input XML document (hard-coded in the example, shown here with indentation for readability) is:

<objects>
    <circle color='red'/>
    <circle color='green'/>
    <rectangle>hello</rectangle>
    <glumble/>
</objects>

The output is:

It's a red circle!
It's a green circle!
It's a rectangle that says "hello".
I don't know what a glumble is for.
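
The example above only looks at the direct children of the root. If you don't know how deep the document goes either, the same getFirstChild()/getNextSibling() loop can be applied recursively. Here is a rough sketch that prints an indented outline of whatever elements and attributes it finds; the class name XmlOutline and its methods are made up for illustration, and it ignores text content for brevity.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper for illustration; not part of the original answer.
class XmlOutline {

    // Parses the XML text and returns one indented line per element.
    static String outline(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    // Appends a line for this element, then recurses into its child elements.
    private static void walk(Element e, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(e.getTagName());

        // List whatever attributes happen to be present.
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            out.append(' ').append(a.getNodeName()).append('=').append(a.getNodeValue());
        }
        out.append('\n');

        // Skip text, comment, and other non-element nodes.
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                walk((Element) c, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<objects><circle color='red'/><circle color='green'/>"
                + "<rectangle>hello</rectangle><glumble/></objects>";
        System.out.print(outline(xml));
    }
}
```

For the sample document this prints objects on the first line, followed by circle color=red, circle color=green, rectangle, and glumble, each indented by two spaces. From there you can replace the append calls with whatever handling you need for each tag.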

166

answered Nov 10 '22 00:11

Jason C
