 

Parsing a CSV file for a unique row using the new Java 8 Streams API

I am trying to use the new Java 8 Streams API (to which I am a complete newbie) to find a particular row (the one with 'Neda' in the name column) in a CSV file. Using the following article for motivation, I modified and fixed some errors so that I could parse a file containing 3 columns: 'name', 'age' and 'height'.

name,age,height
Marianne,12,61
Julie,13,73
Neda,14,66
Julia,15,62
Maryam,18,70

The parsing code is as follows:

@Override
public void init() throws Exception {
    Map<String, String> params = getParameters().getNamed();
    if (params.containsKey("csvfile")) {
        Path path = Paths.get(params.get("csvfile"));
        if (Files.exists(path)){
            // use the new java 8 streams api to read the CSV column headings
            Stream<String> lines = Files.lines(path);
            List<String> columns = lines
                .findFirst()
                .map((line) -> Arrays.asList(line.split(",")))
                .get();
            columns.forEach((l)->System.out.println(l));
            // find the relevant sections from the CSV file
            // we are only interested in the row with Neda's name
            int nameIndex = columns.indexOf("name");
            int ageIndex = columns.indexOf("age");
            int heightIndex = columns.indexOf("height");
            // we need to know the index positions of the columns and
            // have to re-read the csv file to extract the values
            lines = Files.lines(path);
            List<List<String>> values = lines
                .skip(1)
                .map((line) -> Arrays.asList(line.split(",")))
                .collect(Collectors.toList());
            values.forEach((l)->System.out.println(l));
        }
    }        
}

Is there any way to avoid re-reading the file following the extraction of the header line? Although this is a very small example file, I will be applying this logic to a large CSV file.

Is there a technique using the Streams API to create a map between the extracted column names (from the first scan of the file) and the values in the remaining rows?

How can I return just one row in the form of a List<String> (instead of a List<List<String>> containing all the rows)? I would prefer to find the row as a mapping between the column names and their corresponding values (a bit like a result set in JDBC). I see a Collectors.mapMerger function that might be helpful here, but I have no idea how to use it.

asked Jan 06 '16 by johnco3



2 Answers

Using a CSV-processing library

The other answers are good, but I recommend using a CSV-processing library to read your input files. As others noted, the CSV format is not as simple as it may seem. To begin with, the values may or may not be wrapped in quote marks. And there are many variations of CSV, such as those used by Postgres, MySQL, Mongo, Microsoft Excel, and so on.

The Java ecosystem offers several such libraries. I use Apache Commons CSV.

The Apache Commons CSV library does not make use of streams, but you have no need for streams if a library is doing the scut work for you. It makes easy work of looping over the rows of the file without loading a large file into memory.

create a map between the extracted column names (in the first scan of the file) to the values in the remaining rows?

Apache Commons CSV does this automatically when you call withHeader.
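
For example, a minimal sketch (assuming path is the Path to your file, as set up below, and that the usual Apache Commons CSV imports are in place): each CSVRecord can be read by its column name, and CSVRecord.toMap() gives you a column-name-to-value Map for that one row.

CSVFormat format = CSVFormat.RFC4180.withHeader() ;
try ( CSVParser parser = CSVParser.parse( path , StandardCharsets.UTF_8 , format ) )
{
    for ( CSVRecord record : parser )
    {
        String name = record.get( "name" ) ;                    // Access a field by its column name.
        Map< String , String > rowByColumn = record.toMap() ;   // Column name → value, for this one row.
        System.out.println( rowByColumn ) ;
    }
}
catch ( IOException e )
{
    e.printStackTrace() ;
}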

return just one row in the form of List<String>

Yes, easy to do.

As you requested, we can fill a List with the 3 field values for one particular row. This List acts as a tuple.

List<String> tuple = List.of();  // Our goal is to fill this list of values from a single row. Initialize to an empty nonmodifiable list.

We specify the format we expect of our input file: standard CSV (RFC 4180), with the first row populated by column names.

CSVFormat format =  CSVFormat.RFC4180.withHeader() ;

We specify the file path where to find our input file.

Path path = Path.of("/Users/basilbourque/people.csv");

We use try-with-resources syntax (see Tutorial) to automatically close our parser.

As we read in each row, we check for the name being Neda. If found, we fill our tuple List with that row's field values and break out of the loop. We use List.of to conveniently return a List object of some unknown concrete class that is unmodifiable, meaning you cannot add or remove elements from the list.

try (
        CSVParser parser = CSVParser.parse( path , StandardCharsets.UTF_8 , format ) ;
)
{
    for ( CSVRecord record : parser )
    {
        if ( record.get( "name" ).equals( "Neda" ) )
        {
            tuple = List.of( record.get( "name" ) , record.get( "age" ) , record.get( "height" ) );
            break ;
        }
    }
}
catch ( FileNotFoundException e )
{
    e.printStackTrace();
}
catch ( IOException e )
{
    e.printStackTrace();
}

If we found success, we should see some items in our List.

if ( tuple.isEmpty() )
{
    System.out.println( "Bummer. Failed to report a row for `Neda` name." );
} else
{
    System.out.println( "Success. Found this row for name of `Neda`:" );
    System.out.println( tuple.toString() );
}

When run.

Success. Found this row for name of Neda:

[Neda, 14, 66]

Instead of using a List as a tuple, I suggest you define a Person class to represent this data with proper data types. Our code here would return a Person instance rather than a List<String>.
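
A minimal sketch of such a Person class, here written as a Java 16+ record (the numeric field types are assumptions):

public record Person ( String name , int age , int height ) {}

// Inside the parsing loop above, build a Person from the CSVRecord instead of a List of strings.
Person person = new Person(
        record.get( "name" ) ,
        Integer.parseInt( record.get( "age" ) ) ,
        Integer.parseInt( record.get( "height" ) )
) ;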

answered Oct 20 '22 by Basil Bourque


Use a BufferedReader explicitly:

List<String> columns;
List<List<String>> values;
try(BufferedReader br=Files.newBufferedReader(path)) {
    String firstLine=br.readLine();
    if(firstLine==null) throw new IOException("empty file");
    columns=Arrays.asList(firstLine.split(","));
    values = br.lines()
        .map(line -> Arrays.asList(line.split(",")))
        .collect(Collectors.toList());
}

Files.lines(…) also resorts to BufferedReader.lines(…). The only difference is that Files.lines will configure the stream so that closing the stream will close the reader, which we don’t need here, as the explicit try(…) statement already ensures the closing of the BufferedReader.

Note that there is no guarantee about the state of the reader after the stream returned by lines() has been processed, but we can safely read lines before performing the stream operation.
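
Building on that, extracting only the Neda row in a single pass could look like the following sketch (assuming the same naive comma splitting as above, and that the header actually contains a name column):

try(BufferedReader br=Files.newBufferedReader(path)) {
    String firstLine=br.readLine();
    if(firstLine==null) throw new IOException("empty file");
    List<String> columns=Arrays.asList(firstLine.split(","));
    int nameIndex=columns.indexOf("name");
    // stream the remaining lines once, keeping only the first row whose name column is "Neda"
    Optional<List<String>> neda = br.lines()
        .map(line -> Arrays.asList(line.split(",")))
        .filter(row -> row.get(nameIndex).equals("Neda"))
        .findFirst();
    neda.ifPresent(System.out::println);   // prints [Neda, 14, 66] for the sample file
}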

answered Oct 20 '22 by Holger