
xml.etree.ElementTree equivalent in Java

Tags:

java

python

xml

I've been doing quite a bit of simple XML processing in Python and have grown to like the ElementTree way of doing things.

Is there something similar and as easy to use in Java? I find the DOM model a bit cumbersome and find myself writing much more code than I would like to do simple things.

Or am I asking the wrong thing?

Maybe my question is: Is there a better option than the "XMLUtils" classes I see people implementing in some places to simplify their code when dealing with DOM?
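To illustrate the kind of verbosity meant here, a minimal sketch that appends one element with an attribute and a text value through the W3C DOM interface. It uses Python's xml.dom.minidom as a stand-in for the Java DOM API (an assumption made to keep the example short and runnable; createElement, setAttribute, createTextNode and appendChild are the standard DOM method names in both).

```python
from xml.dom import minidom

xml_string = '<top><sub a="x"></sub></top>'

# Parse into a DOM Document; every new node has to be created via the document.
doc = minidom.parseString(xml_string)

# Creating the element, its attribute and its text are three separate calls.
tag = doc.createElement("tag")
tag.setAttribute("a", "x")
tag.appendChild(doc.createTextNode("value"))

# Attach the new element under the root, then serialize the root element.
doc.documentElement.appendChild(tag)
new_xml_string = doc.documentElement.toxml()
```

Text content is a separate node that has to be created and attached by hand, which is a large part of why DOM code grows quickly.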


Adding a little bit here about why I like ElementTree, since the question was asked.

  • Simplicity (I guess anything seems simple after working with DOM though)
  • Feels like a natural fit in Python
  • Requires very little code on my part.

I'm trying to come up with a simple code example to illustrate, but it's sort of hard to give a good one. Here's an attempt though. This just adds a tag with a value and an attribute to an existing XML string.

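from xml.etree.ElementTree import fromstring, SubElement, tostring

xml_string = '<top><sub a="x"></sub></top>'
parsed = fromstring(xml_string)      # parse the string into an Element
se = SubElement(parsed, "tag")       # append a new <tag> child to the root
se.text = "value"
se.attrib["a"] = "x"
# In Python 3, encoding="unicode" makes tostring() return str instead of bytes.
new_xml_string = tostring(parsed, encoding="unicode")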

After that, the new_xml_string is

<top><sub a="x" /><tag a="x">value</tag></top>

Not an example that really covers everything, but still. There's also the fairly simple looping over tags when you want to do stuff, easy testing for presence of tags and attributes and other things.
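A minimal sketch of the looping and presence tests mentioned above, reusing the output string from the earlier example. The variable names (has_tag, b_value and so on) are only illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<top><sub a="x" /><tag a="x">value</tag></top>')

# Iterating an element yields its direct children.
for child in root:
    print(child.tag, child.attrib, child.text)
child_tags = [child.tag for child in root]

# find() returns None when there is no matching child, so compare against
# None explicitly (an element without children is falsy on its own).
has_tag = root.find("tag") is not None
has_missing = root.find("missing") is not None

# Attributes live in a plain dict; get() takes an optional default.
sub = root.find("sub")
has_a = "a" in sub.attrib
b_value = sub.get("b", "no b")
```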

Mattias Nilsson asked Nov 02 '09 16:11


2 Answers

To be honest, all XML APIs in Java suck; you can only vary the level of suckage you push yourself into, which ranges from horrible/slow through manageable/decent to even surprisingly OK at times.

This mostly stems from the fact that Java APIs try to be as W3C DOM compliant as possible. In fact, Xerces (Java's current native XML solution) prides itself on being compliant with a whole bunch of XML-related W3C specifications, as you can see from their front page.

The actual Xerces API is very unpleasant to work with, though, and because of that multiple other Java XML libraries have popped up over the years. Currently the most popular ones are:

  • JDOM, which simplifies DOM operations a lot and is, dare I say, even pleasant at times. It works like a charm when mixed with Jaxen - well, unless you hit this problem with namespaces.
  • XOM, which has a wonderful presentation about what's wrong with Java's XML right now and how they propose their way of doing things as a solution. In part it is actually better than JDOM, but it's not widespread enough yet, so I can't really say how it behaves in the real world out there. Definitely worth a check though.
  • dom4j, well-rounded library, supports all kinds of important features and plays out as a down-to-earth solution for XML. dom4j is basically the "old, proven and reliable" option of the popular ones.

Last but definitely not least, I just have to mention StAX because it's different: it's an event-based streaming API for XML, in which your code pulls events from the parser instead of working on a tree. Definitely worth a look, just out of curiosity.
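To give a feel for the streaming style, here is a rough sketch. StAX itself is a Java API (javax.xml.stream); the code below is not StAX but uses XMLPullParser from Python's xml.etree.ElementTree, which I'm assuming is a close enough analogue for illustration. The chunking of the input and the drain helper are made up for the example.

```python
from xml.etree.ElementTree import XMLPullParser

# The document arrives in pieces, as it might from a socket or a large file.
chunks = ['<top><sub a="x" />', '<tag a="x">value</tag></top>']

parser = XMLPullParser(events=("start", "end"))
seen = []

def drain():
    # Pull whatever events the parser has produced so far.
    for event, elem in parser.read_events():
        seen.append((event, elem.tag))

for chunk in chunks:
    parser.feed(chunk)   # hand the parser more data
    drain()              # our code decides when to pull events

parser.close()           # signal end of input
drain()                  # collect any events still pending

print(seen)
```

The point is that the caller controls the loop and never needs the whole document in memory at once.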

PS. I'm currently actually writing my own XML parser/navigator as an exercise but haven't decided on what kind of API it will have. I'm really aiming for ease of use which seems to be quite rare in Java XML APIs so far, but I'm not entirely sure what kind of API I am going to provide. Python's ElementTree seems interesting, but since I'm not entirely familiar with it, would you like to maybe give a short summary on what exactly in it you find enjoyable?

Esko answered Nov 15 '22 13:11


You might look into the following alternatives:

dom4j

xom

jdom

Since I have never used ElementTree, I don't know which one is the closest. If you can use Groovy inside your project, it offers a set of classes that help a lot when processing XML.

jassuncao answered Nov 15 '22 13:11