
Reading CSVs from a zip file a line at a time

Tags: java, csv, zip

I've got a Spring MVC app with a file upload capability. Files are passed to the controller as a MultipartFile, from which it's easy to get an InputStream. I'm uploading zip files that contain CSVs, and I'm struggling to find a way to open the CSVs and read them a line at a time. There are plenty of examples on the 'net of reading into a fixed-size buffer. I've tried this, but the buffers don't concatenate very well, and it soon gets out of sync and uses a lot of memory:

        ZipEntry entry = input.getNextEntry();

        while(entry != null)
        {
            if (entry.getName().matches("Data/CSV/[a-z]{0,1}[a-z]{0,1}.csv"))
            {
                final String fullPath = entry.getName();
                final String filename = fullPath.substring(fullPath.lastIndexOf('/') + 1);

                visitor.startFile(filename);                    

                final StringBuilder fileContent = new StringBuilder();

                final byte[] buffer = new byte[1024];                   

                while (input.read(buffer) > 0)
                    fileContent.append(new String(buffer));

                final String[] lines = fileContent.toString().split("\n");  

                for(String line : lines)
                {
                    final String[] columns = line.split(",");
                    final String postcode = columns[0].replace(" ", "").replace("\"", "");

                    if (columns.length > 3)
                        visitor.location(postcode, "", "");
                }   

                visitor.endFile();                  
            }

            entry = input.getNextEntry();
        }

There must be a better way that actually works.

Paul Grenyer asked Nov 04 '13 19:11



2 Answers

Not clear if this suits your need, but have you tried opencsv (http://opencsv.sourceforge.net)? Their example is really intuitive:

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
    // nextLine[] is an array of values from the line
    System.out.println(nextLine[0] + nextLine[1] + "etc...");
}

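For your case, all you need to do is wrap the zip stream in a BufferedReader and pass that reader to a CSVReader. Note that a zip archive needs ZipInputStream (GZIPInputStream only handles gzip data), and the stream must be positioned at an entry before you read from it:

FileInputStream fis = new FileInputStream(file);
ZipInputStream zis = new ZipInputStream(fis);
zis.getNextEntry(); // position the stream at the first entry before reading
InputStreamReader isr = new InputStreamReader(zis);
BufferedReader br = new BufferedReader(isr);
CSVReader reader = new CSVReader(br);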
neurite answered Sep 21 '22 13:09


You could use a BufferedReader, which includes the convenient readLine() method and won't load the entire contents of the file into memory, e.g.:

BufferedReader br = new BufferedReader(new InputStreamReader(input), 1024);
String line;
while ((line = br.readLine()) != null) {
   String[] columns = line.split(",");
   //rest of your code
}
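
Putting that together with the entry loop from the question, a minimal self-contained sketch might look like the following. The class name ZipCsvLines, the helper sampleZip(), the sample postcodes, the simplified ".csv" name filter, and the UTF-8 charset are all assumptions made for illustration; they are not from the question. It relies on ZipInputStream reporting end-of-stream at the end of each entry, so a fresh BufferedReader per entry never reads past that entry.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipCsvLines {

    /** Collects the cleaned first column of every line in every .csv entry. */
    public static List<String> firstColumns(InputStream upload) throws IOException {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(upload)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Simplified filter (assumption); the question uses a stricter regex.
                if (entry.isDirectory() || !entry.getName().endsWith(".csv")) {
                    continue;
                }
                // One reader per entry. It is deliberately not closed here:
                // closing it would also close the underlying zip stream.
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(zip, StandardCharsets.UTF_8));
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] columns = line.split(",");
                    result.add(columns[0].replace(" ", "").replace("\"", ""));
                }
            }
        }
        return result;
    }

    /** Builds a small two-entry zip in memory (hypothetical sample data). */
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("Data/CSV/ab.csv"));
            out.write("\"AB1 0AA\",1,2,3\n\"AB1 0AB\",4,5,6\n".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("Data/CSV/cd.csv"));
            out.write("\"CD2 9ZZ\",7,8,9\n".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstColumns(new ByteArrayInputStream(sampleZip())));
    }
}
```

This also shows why the loop in the question drifts out of sync: read(buffer) may fill only part of the array, but new String(buffer) converts all 1024 bytes every time, including stale bytes left over from the previous read. Reading through a Reader avoids that, and it also avoids splitting a multi-byte character across two buffers.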
samlewis answered Sep 19 '22 13:09