I have a load of XML files, and an XSD for them.
I'd like to simply convert them into POJOs and insert them into a database. The DB schema is under my control, so it can be whatever I like.
I've looked around at a load of APIs, but wanted another opinion on what works best.
Does Hibernate have some API to create POJOs from an XSD, then read the XML into those POJOs, and then insert the data into the database?
Or does Spring have any features to help with this?
I guess I'm just after your views, just in case there is an API I've missed that will help do what I want.
Thanks Jeff Porter
The DOM parser is the easiest Java XML parser to learn. It loads the whole XML file into memory, and you then traverse it node by node to read the data. The DOM parser is good for small files, but as file size increases it gets slower and consumes more memory.
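As a rough illustration of that node-by-node traversal, here is a minimal sketch using the DOM API that ships with the JDK. The class name `DomSketch`, the method `readNames`, and the `<customers>`/`<name>` element names are made up for the example; they are not from the question's XSD.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomSketch {

    // Loads the whole document into memory, then collects the text of every <name> element.
    // The element name "name" is an assumption made for this example.
    static List<String> readNames(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            List<String> names = new ArrayList<>();
            NodeList nodes = doc.getElementsByTagName("name"); // matches in document order
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent().trim());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<customers>"
                + "<customer><name>Ann</name></customer>"
                + "<customer><name>Bob</name></customer>"
                + "</customers>";
        System.out.println(readNames(xml)); // prints [Ann, Bob]
    }
}
```

Because the entire tree is built before any value is read, this approach fits the "small files" case; for large inputs a streaming parser would be the usual alternative.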
Or you could bypass the step of translating into POJOs and store the XML directly as a CLOB. It'll allow "duck typing" later on, which you might find advantageous.
Mapping to Java POJOs makes sense if you need to query for those objects individually later on. If you need the entire stream, all the time, without ever having to query for values in the XML (e.g., XPath), then I'd say that storing XML as a CLOB makes more sense.
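If an occasional value does need to be pulled out of XML stored as text, it can be done on the application side with the JDK's XPath API. The sketch below assumes the CLOB has already been read back into a `String`; the class name `XPathSketch`, the method `evaluate`, and the `<order>` document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathSketch {

    // Evaluates an XPath expression against XML text (for example, a CLOB column
    // read back as a String) and returns the result as a string.
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document; the element and attribute names are assumptions.
        String xml = "<order id=\"42\"><customer>Ann</customer><total>19.99</total></order>";
        System.out.println(evaluate(xml, "/order/customer")); // prints Ann
        System.out.println(evaluate(xml, "/order/@id"));      // prints 42
    }
}
```

This parses the document on every call, so it suits occasional lookups; if you find yourself doing it for every row, that is a hint that mapping to columns would serve you better.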
Quick answer: JAXB, JPA and Spring
When inserting XML into a database you need to consider what operations you'd like to perform on the data that the XML is representing.
You could, for example, consider the XML to be the input data and then create a schema that holds the data in an easily queryable manner. If that's what you'd like to do then use JAXB as the unmarshaller, because you can easily generate suitably annotated pojos/entities from the XSD via the xjc tool. A bit of additional JPA annotating and you'll have a quick solution that maps the XML to a complete schema that allows a variety of mix and match queries and alternative views. Of course, JAXB annotations are also understood by other libraries (EclipseLink MOXy and Jackson's JAXB annotations module can produce JSON, for example), so you're not limited to XML when you want to output this data.
Next, you could consider the XML to be the complete entity that you wish to store. In that case you want to store it either as a CLOB or as XML (in Oracle). Oracle certainly supports XPath based searches so you'd get a good opportunity for querying the resulting dataset.
Finally, if you're thinking that XML is too bloated, and you're in control of any resulting changes to the pojos you could serialize the unmarshalled pojos directly into the database as BLOBs. You'll have a fairly compact schema and database, but you'll suffer when it comes to querying since it's all gonna be binary. And you'll have binary version compatibility issues later on if you have to deserialize a very old dataset based on old pojos.
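To make the BLOB option concrete, here is a minimal sketch of turning a pojo into bytes with standard Java serialization and reading it back. The `Customer` class and the `BlobSketch` helper are invented for the example; in practice the pojo would be one of your unmarshalled classes, and the byte array would be bound to a BLOB column (for instance with `PreparedStatement.setBytes`).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class BlobSketch {

    // Hypothetical pojo. Pinning serialVersionUID keeps old rows readable as long
    // as later changes to the class stay serialization-compatible.
    static class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        Customer(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Serializes an object to the bytes you would store in a BLOB column.
    static byte[] toBytes(Serializable object) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Restores an object from bytes read back out of the BLOB column.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class of stored object is no longer available", e);
        }
    }

    public static void main(String[] args) {
        byte[] blob = toBytes(new Customer("Ann", 30));
        Customer restored = (Customer) fromBytes(blob);
        System.out.println(restored.name + " " + restored.age); // prints Ann 30
    }
}
```

The `ClassNotFoundException` branch is where the version compatibility problem shows up: the stored bytes are only meaningful while a compatible class is still on the classpath.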
So, to summarise, JAXB is a very good way to handle the unmarshalling and later marshalling processes. It's quick and simple and (nod to @Blaise Doughan here) very well supported on SO for one thing. JPA is the technology of choice to perform your database operations. Hibernate is one implementer of JPA (with good extensions), and Spring supports it beautifully. Note that Spring's HibernateTemplate wraps Hibernate's native Session API rather than JPA. Equally, you can use Spring's JpaTemplate, which has, perhaps, a slightly shallower learning curve, but be aware that it was deprecated in Spring 3.1 and removed in Spring 4 in favour of injecting an EntityManager with @PersistenceContext.