
Fast CSV parsing

Tags: java, parsing, csv

I have a Java server app that downloads a CSV file and parses it. The parsing can take from 5 to 45 minutes and happens every hour. This method is the bottleneck of the app, so it's not premature optimization. The code so far:

        client.executeMethod(method);
        InputStream in = method.getResponseBodyAsStream(); // this is http stream

        String line;
        String[] record;

        reader = new BufferedReader(new InputStreamReader(in), 65536);

        try {
            // read the header line
            line = reader.readLine();
            // some code
            while ((line = reader.readLine()) != null) {
                 // more code

                 line = line.replaceAll("\"\"", "\"NULL\"");

                 // Now remove all of the quotes
                 line = line.replaceAll("\"", "");     


                 if (!line.startsWith("ERROR")) {
                   //bla bla 
                    continue;
                 }

                 record = line.split(",");
                 //more error handling
                 // build the object and put it in HashMap
         }
         //exceptions handling, closing connection and reader

Is there an existing library that would help me speed things up? Can I improve the existing code?

Lukasz Madon asked Jul 28 '11 10:07



2 Answers

Apache Commons CSV

Have you seen Apache Commons CSV?

Caveat On Using split

Bear in mind that split only returns views of the data, meaning that the original line's character data is not eligible for garbage collection whilst there is a reference to any of its views. This applies to JVMs older than Java 7u6, where substrings share the parent string's character array. Making a defensive copy of each field may help. (Java bug report)
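
A minimal sketch of the defensive copy mentioned above, assuming the parsed fields end up in a long-lived HashMap as in the question. The names `DefensiveCopy` and `copyFields` are hypothetical, chosen only for illustration. On the older JVMs where substrings share their parent's character array, `new String(field)` gives each field its own storage so the full line can be collected; on newer JVMs the copy is unnecessary but harmless.

```java
import java.util.HashMap;
import java.util.Map;

final class DefensiveCopy {
    // Hypothetical helper: splits a line and copies every field so that
    // no stored field is backed by the original line's storage.
    static String[] copyFields(String line) {
        String[] parts = line.split(",");
        String[] copies = new String[parts.length];
        for (int i = 0; i < parts.length; i++) {
            copies[i] = new String(parts[i]); // explicit copy of the field
        }
        return copies;
    }

    public static void main(String[] args) {
        // Stand-in for the long-lived map built in the question's loop.
        Map<String, String[]> records = new HashMap<>();
        String[] fields = copyFields("42,alice,NULL");
        records.put(fields[0], fields);
        System.out.println(records.get("42")[1]);
    }
}
```

Whether this helps at all depends on the JVM version in use, so it is worth measuring heap usage before and after rather than assuming a gain.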

split is also unreliable for quoted CSV columns that contain commas: it breaks such a column into several pieces.
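
To illustrate the point about columns containing commas, here is a small sketch that compares `split(",")` with a hand-written quote-aware splitter. `QuoteAwareSplit` and `splitCsvLine` are hypothetical names, and the sketch assumes the common convention that fields may be wrapped in double quotes and that a doubled quote inside a quoted field stands for a literal quote. It handles one line at a time and does not support quoted fields that span several lines, which is one reason a dedicated CSV library is usually the safer choice.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteAwareSplit {
    // Hypothetical helper: splits one CSV line on commas that are outside quotes.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas are kept inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"Smith, John\",\"\"";
        // Naive split cuts the quoted name in two, giving 4 pieces.
        System.out.println(line.split(",").length);
        // The quote-aware version keeps "Smith, John" as one of 3 fields.
        System.out.println(splitCsvLine(line).size());
    }
}
```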

Jeff Foster answered Sep 21 '22 19:09



opencsv

Take a look at opencsv.

This blog post, opencsv is an easy CSV parser, has example usage.

flash answered Sep 22 '22 19:09
