 

Parsing a CSV file for a unique row using the new Java 8 Streams API

I am trying to use the new Java 8 Streams API (to which I am a complete newbie) to find a particular row (the one with 'Neda' in the name column) in a CSV file. Using the following article for motivation, I modified and fixed some errors so that I could parse a file containing 3 columns: 'name', 'age' and 'height'.

name,age,height
Marianne,12,61
Julie,13,73
Neda,14,66
Julia,15,62
Maryam,18,70

The parsing code is as follows:

@Override
public void init() throws Exception {
    Map<String, String> params = getParameters().getNamed();
    if (params.containsKey("csvfile")) {
        Path path = Paths.get(params.get("csvfile"));
        if (Files.exists(path)){
            // use the new java 8 streams api to read the CSV column headings
            Stream<String> lines = Files.lines(path);
            List<String> columns = lines
                .findFirst()
                .map((line) -> Arrays.asList(line.split(",")))
                .get();
            columns.forEach((l)->System.out.println(l));
            // find the relevant sections from the CSV file
            // we are only interested in the row with Neda's name
            int nameIndex = columns.indexOf("name");
            int ageIndex = columns.indexOf("age");
            int heightIndex = columns.indexOf("height");
            // we need to know the index positions of the columns and
            // have to re-read the csv file to extract the values
            lines = Files.lines(path);
            List<List<String>> values = lines
                .skip(1)
                .map((line) -> Arrays.asList(line.split(",")))
                .collect(Collectors.toList());
            values.forEach((l)->System.out.println(l));
        }
    }        
}

Is there any way to avoid re-reading the file following the extraction of the header line? Although this is a very small example file, I will be applying this logic to a large CSV file.

Is there a technique using the Streams API to create a map between the extracted column names (from the first scan of the file) and the values in the remaining rows?

How can I return just one row in the form of a List<String> (instead of a List<List<String>> containing all the rows)? I would prefer to find the row as a mapping between the column names and their corresponding values (a bit like a result set in JDBC). I see a Collectors.mapMerger function that might be helpful here, but I have no idea how to use it.

asked Jan 06 '16 by johnco3



2 Answers

Using a CSV-processing library

The other answers are good, but I recommend using a CSV-processing library to read your input files. As others noted, the CSV format is not as simple as it may seem. To begin with, the values may or may not be wrapped in quote marks. And there are many variations of CSV, such as those used by Postgres, MySQL, Mongo, Microsoft Excel, and so on.

The Java ecosystem offers several such libraries. I use Apache Commons CSV.

The Apache Commons CSV library does not make use of streams, but you have no need for streams if a library is doing the scut work for you. It makes easy work of looping over the rows of the file without loading a large file into memory.

create a map between the extracted column names (in the first scan of the file) to the values in the remaining rows?

Apache Commons CSV does this automatically when you call withHeader.
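
For example, a minimal sketch (assuming path is the Path to your file, as set up below, and that the usual Apache Commons CSV imports are in place): each CSVRecord can be read by its column name, and CSVRecord.toMap() gives you a column-name-to-value Map for that one row.

CSVFormat format = CSVFormat.RFC4180.withHeader() ;
try ( CSVParser parser = CSVParser.parse( path , StandardCharsets.UTF_8 , format ) )
{
    for ( CSVRecord record : parser )
    {
        String name = record.get( "name" ) ;                    // Access a field by its column name.
        Map< String , String > rowByColumn = record.toMap() ;   // Column name → value, for this one row.
        System.out.println( rowByColumn ) ;
    }
}
catch ( IOException e )
{
    e.printStackTrace() ;
}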

return just one row in the form of List<String>

Yes, easy to do.

As you requested, we can fill a List with the 3 field values for one particular row. This List acts as a tuple.

List<String> tuple = List.of();  // Our goal is to fill this list of values from a single row. Initialize to an empty nonmodifiable list.

We specify the format we expect of our input file: standard CSV (RFC 4180), with the first row populated by column names.

CSVFormat format =  CSVFormat.RFC4180.withHeader() ;

We specify the file path where to find our input file.

Path path = Path.of("/Users/basilbourque/people.csv");

We use try-with-resources syntax (see Tutorial) to automatically close our parser.

As we read in each row, we check for the name being Neda. If found, we fill our tuple List with that row's field values and break out of the loop. We use List.of to conveniently return a List object of some unknown concrete class that is unmodifiable, meaning you cannot add or remove elements from the list.

try (
        CSVParser parser = CSVParser.parse( path , StandardCharsets.UTF_8 , format ) ;
)
{
    for ( CSVRecord record : parser )
    {
        if ( record.get( "name" ).equals( "Neda" ) )
        {
            tuple = List.of( record.get( "name" ) , record.get( "age" ) , record.get( "height" ) );
            break ;
        }
    }
}
catch ( FileNotFoundException e )
{
    e.printStackTrace();
}
catch ( IOException e )
{
    e.printStackTrace();
}

If we found success, we should see some items in our List.

if ( tuple.isEmpty() )
{
    System.out.println( "Bummer. Failed to report a row for `Neda` name." );
} else
{
    System.out.println( "Success. Found this row for name of `Neda`:" );
    System.out.println( tuple.toString() );
}

When run.

Success. Found this row for name of Neda:

[Neda, 14, 66]

Instead of using a List as a tuple, I suggest you define a Person class to represent this data with proper data types. Our code here would return a Person instance rather than a List<String>.
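
A minimal sketch of such a Person class, here written as a Java 16+ record (the numeric field types are assumptions):

public record Person ( String name , int age , int height ) {}

// Inside the parsing loop above, build a Person from the CSVRecord instead of a List of strings.
Person person = new Person(
        record.get( "name" ) ,
        Integer.parseInt( record.get( "age" ) ) ,
        Integer.parseInt( record.get( "height" ) )
) ;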

answered Oct 20 '22 by Basil Bourque


Use a BufferedReader explicitly:

List<String> columns;
List<List<String>> values;
try(BufferedReader br=Files.newBufferedReader(path)) {
    String firstLine=br.readLine();
    if(firstLine==null) throw new IOException("empty file");
    columns=Arrays.asList(firstLine.split(","));
    values = br.lines()
        .map(line -> Arrays.asList(line.split(",")))
        .collect(Collectors.toList());
}

Files.lines(…) also resorts to BufferedReader.lines(…). The only difference is that Files.lines will configure the stream so that closing the stream will close the reader, which we don’t need here, as the explicit try(…) statement already ensures the closing of the BufferedReader.

Note that there is no guarantee about the state of the reader after the stream returned by lines() has been processed, but we can safely read lines before performing the stream operation.
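
Building on that, extracting only the Neda row in a single pass could look like the following sketch (assuming the same naive comma splitting as above, and that the header actually contains a name column):

try(BufferedReader br=Files.newBufferedReader(path)) {
    String firstLine=br.readLine();
    if(firstLine==null) throw new IOException("empty file");
    List<String> columns=Arrays.asList(firstLine.split(","));
    int nameIndex=columns.indexOf("name");
    // stream the remaining lines once, keeping only the first row whose name column is "Neda"
    Optional<List<String>> neda = br.lines()
        .map(line -> Arrays.asList(line.split(",")))
        .filter(row -> row.get(nameIndex).equals("Neda"))
        .findFirst();
    neda.ifPresent(System.out::println);   // prints [Neda, 14, 66] for the sample file
}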

answered Oct 20 '22 by Holger