Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert an XML file to CSV file using java

Tags:

java

xml

csv

I need help understanding the steps involved in converting an XML file into a CSV file using java. Here is an example of an XML file

<?xml version="1.0"?>
<Sites>
<Site id="101" name="NY-01" location="New York">
    <Hosts>
        <Host id="1001">
           <Host_Name>srv001001</Host_Name>
           <IP_address>10.1.2.3</IP_address>
           <OS>Windows</OS>
           <Load_avg_1min>1.3</Load_avg_1min>
           <Load_avg_5min>2.5</Load_avg_5min>
           <Load_avg_15min>1.2</Load_avg_15min>
        </Host>
        <Host id="1002">
           <Host_Name>srv001002</Host_Name>
           <IP_address>10.1.2.4</IP_address>
           <OS>Linux</OS>
           <Load_avg_1min>1.4</Load_avg_1min>
           <Load_avg_5min>2.5</Load_avg_5min>
           <Load_avg_15min>1.2</Load_avg_15min>
        </Host>
        <Host id="1003">
           <Host_Name>srv001003</Host_Name>
           <IP_address>10.1.2.5</IP_address>
           <OS>Linux</OS>
           <Load_avg_1min>3.3</Load_avg_1min>
           <Load_avg_5min>1.6</Load_avg_5min>
           <Load_avg_15min>1.8</Load_avg_15min>
        </Host>
        <Host id="1004">
           <Host_Name>srv001004</Host_Name>
           <IP_address>10.1.2.6</IP_address>
           <OS>Linux</OS>
           <Load_avg_1min>2.3</Load_avg_1min>
           <Load_avg_5min>4.5</Load_avg_5min>
           <Load_avg_15min>4.2</Load_avg_15min>
        </Host>     
    </Hosts>
</Site>
</Sites>

and here is the resulting CSV file.

site_id, site_name, site_location, host_id, host_name, ip_address, operative_system, load_avg_1min, load_avg_5min, load_avg_15min
101, NY-01, New York, 1001, srv001001, 10.1.2.3, Windows, 1.3, 2.5, 1.2
101, NY-01, New York, 1002, srv001002, 10.1.2.4, Linux, 1.4, 2.5, 1.2
101, NY-01, New York, 1003, srv001003, 10.1.2.5, Linux, 3.3, 1.6, 1.8
101, NY-01, New York, 1004, srv001004, 10.1.2.6, Linux, 2.3, 4.5, 4.2

I was thinking of using a DOM parser to read the xml file. The problem I have with that is I would need to specify specific elements in to code by name, but I want it to be able to parse it without doing that.

Are there any tools or libraries in java that would be able to help me achieve this.

If I have a XML file of this format below and want to add the value of the InitgPty in the same row with MSgId (Pls note :InitgPty is in the next tag level, so it prints the value in the next row)

<?xml version="1.0"?>
<CstmrCdtTrfInitn>
<GrpHdr>
<MsgId>XYZ07/ABC</MsgId>
<NbOfTxs>100000</NbOfTxs>
<InitgPty>
<Nm>XYZ</Nm>
</InitgPty>
like image 567
Emre801 Avatar asked Jan 28 '14 18:01

Emre801


4 Answers

here's a working example, data.xml has your data:

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.w3c.dom.Document;

class Xml2Csv {

    public static void main(String args[]) throws Exception {
        File stylesheet = new File("src/main/resources/style.xsl");
        File xmlSource = new File("src/main/resources/data.xml");

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(xmlSource);

        StreamSource stylesource = new StreamSource(stylesheet);
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(stylesource);
        Source source = new DOMSource(document);
        Result outputTarget = new StreamResult(new File("/tmp/x.csv"));
        transformer.transform(source, outputTarget);
    }
}

style.xsl

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" >
<xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
Host_Name,IP_address,OS,Load_avg_1min,Load_avg_5min,Load_avg_15min
<xsl:for-each select="//Host">
<xsl:value-of select="concat(Host_Name,',',IP_address,',',OS,Load_avg_1min,',',Load_avg_5min,',',Load_avg_15min,'&#xA;')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

output:

Host_Name,IP_address,OS,Load_avg_1min,Load_avg_5min,Load_avg_15min
srv001001,10.1.2.3,Windows1.3,2.5,1.2
srv001002,10.1.2.4,Linux1.4,2.5,1.2
srv001003,10.1.2.5,Linux3.3,1.6,1.8
srv001004,10.1.2.6,Linux2.3,4.5,4.2
like image 74
Guy Gavriely Avatar answered Oct 21 '22 08:10

Guy Gavriely


Three steps:

  1. Parse the XML file into a java XML library object.
  2. Retrieve relevant data from the object for each row.
  3. Write the results to a text file using native java functions, saving with *.csv extension.
like image 26
Jono Avatar answered Oct 21 '22 09:10

Jono


Your best best is to use XSLT to "transform" the XML to CSV. There are some Q/As on so (like here) that cover how to do this. The key is to provide a schema for your source data so the XSLT transform process knows how to read it so it can properly format the results.

Then you can use Xalan to input the XML, read the XSLT and output your results.

like image 2
Pedantic Avatar answered Oct 21 '22 07:10

Pedantic


The answer has already been provided by Pedantic (using the DOM-like approach {Document Object Model}) and Jono (with the SAX-like approach this time) in January.

My opinion is that both methods work well for small files but the latter works better with big XML files. You didn't mention the actual size of your XML files but you should take this into account.

Whatever method is used a specific program (which would detect special tags tailored to your local XML) will be easier to write but won't work without code adaptations for another XML flavor, while a more generic program will be harder to devise but will work for all XML files. You said you wanted to be able to parse a file without specifying specific element names so I guess the generic approach is what you prefer, and I agree with that, but please note that it's easier said than done. Indeed, I had the same problem on january too, implying this time a big XML file (>>100Mo) and I was surprised that nothing was available over the Internet so far. Turning frustration into something better is always a good thing so I decided to deal with that specific problem in the most generic way by myself, with a special concern for the big-XML-file-issue.

You might be interested to know that the generic Java library I wrote, which is now published as free software, converted your XML file into CSV the way you expected (in -x -u mode {please refer to the documentation for further information}).

So the answer to the last part of your question is: yes, there is at least one library which will help you achieve your goal, mine, which is named "XML2CSV-Generic-Converter". There might be other ones of course, and better ones certainly, but I couldn't pick any decent (free) one by myself.

I won't provide any link here to comply with Peter Foti 's judicious remark - but if you key "XML2CSV-Generic-Converter" in your favorite search engine you should find it easily.

like image 1
Lochrann Avatar answered Oct 21 '22 07:10

Lochrann