I've been working on learning some new tech using Java to parse files, and for the most part it's going well. However, I'm at a loss as to how I could parse an XML file whose structure is not known upon receipt. There are lots of examples of how to do so if you know the structure (getElementsByTagName seems to be the way to go), but no dynamic options, at least not that I've found.
So the tl;dr version of this question: how can I parse an XML file where I cannot rely on knowing its structure?
Well, the parsing part is easy; as helderdarocha stated in the comments, the parser only requires well-formed XML, it does not care about the structure. You can use Java's standard DocumentBuilder to obtain a Document:
InputStream in = new FileInputStream(...);
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
(If you're parsing multiple documents, you can keep reusing the same DocumentBuilder.)
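As a rough sketch of that reuse, the snippet below parses several in-memory documents with one builder and collects each root tag name. The class name ReuseBuilder, the method rootNames, and the sample strings are made up for illustration, and it assumes the builder is only used from a single thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, not part of the original answer.
public class ReuseBuilder {

    // Parses every XML string with one shared DocumentBuilder
    // and returns the tag name of each document's root element.
    static List<String> rootNames(List<String> xmlDocs) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        List<String> names = new ArrayList<>();
        for (String xml : xmlDocs) {
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            names.add(doc.getDocumentElement().getTagName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Two unrelated documents; nothing about their structure is assumed.
        System.out.println(rootNames(Arrays.asList("<a/>", "<b><c/></b>")));
    }
}
```

Running main should print [a, b], one root name per input document.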
Then you can start with the root document element and use familiar DOM methods from there on out:
Element root = doc.getDocumentElement(); // perform DOM operations starting here.
As for processing it, well, it really depends on what you want to do with it, but you can use the methods of Node, like getFirstChild() and getNextSibling(), to iterate through children and process as you see fit based on structure, tags, and attributes.
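If the attribute names are not known up front either, one option is to enumerate them through getAttributes(), which returns a NamedNodeMap. The following is only a sketch; the class name ListAttributes and the helper describe are invented for the example, and the order in which attributes come back should not be relied on.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper class, not part of the original answer.
public class ListAttributes {

    // Builds "tag name=value ..." for an element without knowing
    // its attribute names in advance.
    static String describe(Element e) {
        StringBuilder sb = new StringBuilder(e.getTagName());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            sb.append(' ').append(attr.getNodeName())
              .append('=').append(attr.getNodeValue());
        }
        return sb.toString();
    }

    // Parses an XML string and describes its root element.
    static String describeRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return describe(doc.getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeRoot("<circle color='red'/>"));
    }
}
```

For the single-attribute input in main this should print: circle color=red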
Consider the following example:
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class XML {

    public static void main (String[] args) throws Exception {

        String xml = "<objects><circle color='red'/><circle color='green'/><rectangle>hello</rectangle><glumble/></objects>";

        // parse
        InputStream in = new ByteArrayInputStream(xml.getBytes("utf-8"));
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);

        // process: visit each direct child of the root and dispatch on its tag name
        Node objects = doc.getDocumentElement();
        for (Node object = objects.getFirstChild(); object != null; object = object.getNextSibling()) {
            // skip text, comment, and other non-element nodes
            if (object instanceof Element) {
                Element e = (Element)object;
                if (e.getTagName().equalsIgnoreCase("circle")) {
                    String color = e.getAttribute("color");
                    System.out.println("It's a " + color + " circle!");
                } else if (e.getTagName().equalsIgnoreCase("rectangle")) {
                    String text = e.getTextContent();
                    System.out.println("It's a rectangle that says \"" + text + "\".");
                } else {
                    System.out.println("I don't know what a " + e.getTagName() + " is for.");
                }
            }
        }

    }

}
The input XML document (hard-coded for example) is:
<objects>
<circle color='red'/>
<circle color='green'/>
<rectangle>hello</rectangle>
<glumble/>
</objects>
The output is:
It's a red circle!
It's a green circle!
It's a rectangle that says "hello".
I don't know what a glumble is for.
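The example above only looks at the direct children of the root. When the nesting depth is also unknown, the same getFirstChild()/getNextSibling() loop can be applied recursively. Here is a minimal sketch that builds an indented outline of whatever elements it finds; the class name WalkXml, the methods walk and outline, and the extra group element in the sample input are all invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class, not part of the original answer.
public class WalkXml {

    // Appends the tag name of every element under 'node' to 'out',
    // indented two spaces per nesting level. Non-element nodes are skipped.
    static void walk(Node node, int depth, StringBuilder out) {
        if (!(node instanceof Element)) {
            return;
        }
        Element e = (Element) node;
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(e.getTagName()).append('\n');
        for (Node child = e.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    // Parses an XML string and returns the outline of its element tree.
    static String outline(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(outline(
                "<objects><circle color='red'/><rectangle>hello</rectangle>"
                + "<group><glumble/></group></objects>"));
    }
}
```

From there you could replace the append calls with whatever per-element handling you need, since the traversal itself makes no assumption about tag names or depth.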