I have a file that consists of concatenated valid XML documents. I'd like to separate individual XML documents efficiently.
The contents of the concatenated file look like this, so the concatenated file is not itself a valid XML document:
<?xml version="1.0" encoding="UTF-8"?>
<someData>...</someData>
<?xml version="1.0" encoding="UTF-8"?>
<someData>...</someData>
<?xml version="1.0" encoding="UTF-8"?>
<someData>...</someData>
Each individual XML document is around 1-4 KB, but there are potentially a few hundred of them. All XML documents conform to the same XML Schema.
Any suggestions or tools? I am working in the Java environment.
Edit: I am not sure whether the XML declaration will be present in every document.
Edit: Let's assume that the encoding for all the xml docs is UTF-8.
Don't split! Add one big tag around it! Then it becomes one XML file again:
<BIGTAG>
<?xml version="1.0" encoding="UTF-8"?>
<someData>...</someData>
<?xml version="1.0" encoding="UTF-8"?>
<someData>...</someData>
<?xml version="1.0" encoding="UTF-8"?>
<someData>...</someData>
</BIGTAG>
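Now, using /BIGTAG/someData (XML names are case-sensitive) would give you all the XML roots. Note that the embedded <?xml ...?> declarations have to be stripped before parsing, because an XML declaration is only allowed at the very start of a document.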
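The wrapping idea above could be sketched in Java roughly as follows. This is only a minimal sketch under some assumptions: the whole file fits in memory as a String (plausible for a few hundred documents of 1-4 KB), none of the documents carries a DOCTYPE, and the text `<?xml ...?>` never occurs inside a comment or CDATA section. The class name `ConcatenatedXmlSplitter` and the method `split` are made up for illustration; only standard JAXP classes are used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class ConcatenatedXmlSplitter {

    // Wrapper element name taken from the answer; it is discarded after parsing.
    private static final String WRAPPER = "BIGTAG";

    static List<String> split(String concatenated) throws Exception {
        // Drop every XML declaration; the "\\s" after "xml" keeps other
        // processing instructions such as <?xml-stylesheet ...?> untouched.
        String body = concatenated.replaceAll("<\\?xml\\s[^?]*\\?>", "");
        String wrapped = "<" + WRAPPER + ">" + body + "</" + WRAPPER + ">";

        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(wrapped)));

        // Serialize each child element on its own, without a declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> documents = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text between the documents
            }
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(child), new StreamResult(out));
            documents.add(out.toString());
        }
        return documents;
    }
}
```

Calling `ConcatenatedXmlSplitter.split(fileContents)` returns one string per original root element. Reading the file as UTF-8 into a String first (as the question's second edit allows) sidesteps the encoding declarations entirely, since they are removed before parsing.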
By merging them into one file, you've altered the encoding...