I have an XML document in a specific format that will be pushed to me. It will always be the same type, so the format is very strict.
I need to parse it so that I can convert it into JSON (well, a slightly bastardized version so someone else can use it with DOJO).
My question is: should I use a very fast, lightweight XML parser (no need for SAX, etc.; any ideas?) or write my own, basically converting the input into a StringBuffer and spinning through the array? Under the covers, I assume all XML parsers spin through the string (or memory buffer) and parse it, producing output on the way through.
Thanks
Edit:
The XML will be between 3-4 lines and about 50 lines at most (at the extreme).
A SAX parser is generally faster and uses less memory than a DOM parser, which makes it the better choice for large files. A DOM parser is better suited to smaller files and is convenient for building or editing XML documents in Java.
Well, parsing XML is not an easy task. Its basic structure is a tree in which any node can hold a list of child nodes, each of which is itself a tree.
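To make the tree structure concrete, and since the goal in the question is JSON output, here is a minimal sketch of a recursive walk over a DOM tree that emits JSON. The output shape (`"name"`, `"attrs"`, `"text"`, `"children"`) and the class name `XmlToJson` are assumptions for illustration. This is not the Dojo-specific format the question mentions, which isn't specified.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class XmlToJson {

    // Recursively converts one element (and its subtree) into a JSON object string.
    // The output shape is a hypothetical example, not a standard XML-to-JSON mapping.
    static String toJson(Element el) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"name\":").append(quote(el.getTagName()));

        // Attributes become a flat JSON object (order is whatever the DOM returns).
        NamedNodeMap attrs = el.getAttributes();
        if (attrs.getLength() > 0) {
            sb.append(",\"attrs\":{");
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                if (i > 0) sb.append(',');
                sb.append(quote(a.getNodeName())).append(':').append(quote(a.getNodeValue()));
            }
            sb.append('}');
        }

        // Each child element is itself a tree, so recurse; text nodes are concatenated.
        StringBuilder text = new StringBuilder();
        StringBuilder children = new StringBuilder();
        NodeList kids = el.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node k = kids.item(i);
            if (k.getNodeType() == Node.ELEMENT_NODE) {
                if (children.length() > 0) children.append(',');
                children.append(toJson((Element) k));
            } else if (k.getNodeType() == Node.TEXT_NODE
                    || k.getNodeType() == Node.CDATA_SECTION_NODE) {
                text.append(k.getNodeValue());
            }
        }
        String trimmed = text.toString().trim();
        if (!trimmed.isEmpty()) sb.append(",\"text\":").append(quote(trimmed));
        if (children.length() > 0) sb.append(",\"children\":[").append(children).append(']');
        return sb.append('}').toString();
    }

    // Minimal JSON string escaping for quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Parses the XML string with the default DOM parser and converts the root element.
    static String convert(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return toJson(doc.getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convert("<order id=\"7\"><item>apple</item><item>pear</item></order>"));
    }
}
```

For the input in `main`, this prints `{"name":"order","attrs":{"id":"7"},"children":[{"name":"item","text":"apple"},{"name":"item","text":"pear"}]}`. For documents of 3 to 50 lines, building the whole DOM tree first costs very little.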
In PHP there are two major types of XML parsers: tree-based parsers and event-based parsers.
No, you should not try to write your own XML parser for this.
SAX itself is very lightweight and fast, so I'm not sure why you think it's too much. Also, using a string buffer would actually be much less scalable than using SAX, because SAX doesn't require you to load the whole XML file into memory. I've used SAX to parse multi-gigabyte XML files, which you wouldn't be able to do with string buffers on a 32-bit machine.
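For comparison, here is a minimal SAX sketch that streams through a document and counts elements by name without ever building a tree. Memory use depends only on what the handler chooses to keep. The class name `ElementCounter` and the counting task are illustrative assumptions. `SAXParserFactory`, `SAXParser.parse(InputStream, DefaultHandler)` and `DefaultHandler` are the standard JDK classes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ElementCounter extends DefaultHandler {
    // Tag name -> number of occurrences; TreeMap keeps the printed output sorted.
    final Map<String, Integer> counts = new TreeMap<>();

    // The parser calls this once per start tag as it reads the input.
    // No tree is built; only the counts map is kept in memory.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        counts.merge(qName, 1, Integer::sum);
    }

    static Map<String, Integer> count(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter handler = new ElementCounter();
        parser.parse(in, handler);
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><item/><item/><note>hi</note></order>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(count(in)); // prints {item=2, note=1, order=1}
    }
}
```

Because the handler only sees one event at a time, the same code works on a `FileInputStream` over a very large file.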
If you have small files and you don't need to worry about performance, look into using the DOM. Java's implementation can be kind of annoying to use (you create a document using a DocumentBuilder, which comes from a DocumentBuilderFactory).
The code to create a document from a file looks like this:
Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new FileInputStream("file.xml"));
(Note that keeping a reference to your DocumentBuilder will speed things up if you need to parse multiple files.)
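Here is a minimal sketch of that reuse. It assumes many small documents arrive as strings. `DocumentBuilder` instances are not guaranteed to be thread-safe, so this keeps one per thread via `ThreadLocal` and calls `reset()` before each parse. The class name `ReusableParser` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ReusableParser {
    // Looking up the factory and creating a builder has a cost, so do it once per
    // thread instead of once per document. DocumentBuilder is not guaranteed
    // thread-safe, hence the ThreadLocal.
    private static final ThreadLocal<DocumentBuilder> BUILDER = ThreadLocal.withInitial(() -> {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    });

    static Document parse(String xml) throws Exception {
        DocumentBuilder b = BUILDER.get();
        b.reset(); // restore the builder to its original configuration before reuse
        return b.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The same builder instance handles each document in turn.
        for (String xml : new String[] {"<a/>", "<b/>", "<c/>"}) {
            System.out.println(parse(xml).getDocumentElement().getTagName());
        }
    }
}
```

If everything runs on a single thread, a plain static `DocumentBuilder` field would work just as well.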
Then you use the methods in org.w3c.dom.Document to read or manipulate the contents. For example, getElementsByTagName() returns a NodeList of all the elements with a given tag name.
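As a rough example of reading values out of a parsed `Document`, the sketch below collects the text of every element with a given tag name. On a `Document`, `getElementsByTagName` searches the whole tree and returns matches in document order. The sample XML and the `TagLookup`/`textsOf` names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TagLookup {
    // Returns the trimmed text content of every element named `tag`, in document order.
    static List<String> textsOf(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() concatenates the text of all descendant nodes.
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document d = parse("<order><item>apple</item><box><item>pear</item></box></order>");
        System.out.println(textsOf(d, "item")); // prints [apple, pear]
    }
}
```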