Handle duplicate header names in CSV file

Tags:

I use a simulation program which generates a CSV file in the background containing the simulation data (such as the length of the road, how long is the simulation etc)

One example of how the file looks like:

example

So, in this case, I know that there is only one car ( because of the carId). Now if another CSV gets generated and the simulation contains 2 cars the columns changes:

two different cars

So as you can see different columns where added ( a lot of them are irrelevant as I only need carId and mph).

If there are more and more cars the number of columns increases and I never know at which position is the carId with the mph.

What I want is to create a CarObject for each carId with mph appearance.

Is there a way in java to create like an own table for each (carId, mph) column. So I have for eg. a List<CarDataRows> and CarDataRows is like a List where each Item contains the carId with the specific mph.

EDIT

In this case the collection should contain two items. And the value from the Item should also smth like a list which contains the carId with the different mph.:

enter image description here

myList.get(0)
-> 1,3
   1,4
   1,7


myList.get(1)
-> 2,12
   2,23
   2,0

In phyton for example you can use panda and create foreach pattern(carId with mph) and own dataframe. This is what I am trying to do here

331

asked Aug 21 '18 11:08

plshm

1 Answers

Your CSV looks quite odd, I wouldn't expect people to append columns, but rather rows, especially when you have an ID column to tell the rows apart.

That said, I'd parse the data into Car objects and then work on those. This involves three steps.

On the first line of the file, find the header fields that indicate the interesting columns, here carId and mph. Apache's CSV library chokes on duplicated header names (and I cannot really blame it for that), so you'll need to find a more lenient library or parse the header yourself.

With the information from the header, you create Car objects, telling them their columns. Each Car is in charge of its own column numbers as well as all the speeds (the information you really want).

Then you read the rest of the file, and assign the speeds to their cars. Since the cars own their data (their column numbers and speeds), you just pass each record field to all cars and ask them to act on it if it's their's (Tell Don't Ask principle).

Your Car class could look like this:

import java.util.List;
import java.util.ArrayList;

public class Car {
    private int id;
    private final int idColumn;
    private final int speedColumn;
    private List<Integer> speeds;

    public Car(int idColumn, int speedColumn) {
        id = 0;
        this.idColumn = idColumn;
        this.speedColumn = speedColumn;
        speeds = new ArrayList<>();
    }

    public void add(int column, String value) {
        if (this.idColumn == column) {
            int i = Integer.parseInt(value);
            if (id != 0 && id != i) {
                throw new IllegalStateException("changed ID from "  + id + " to " + i);
            }
            id = i;
        }
        if (this.speedColumn == column) {
            speeds.add(Integer.parseInt(value));
        }
    }

    public List<Integer> getSpeeds() {
        return speeds;
    }
}

Then you work the CSV input like this, using the Apache Common CSV parser:

import java.io.*;
import java.util.*;
import org.apache.commons.csv.*;

public class YourCarCsvParser {
    /** Column name for a car's ID. */
    private static String CAR_ID = "carId";

    public static void main(String[] args) throws Exception {
        App app = new App();
        for (String a : args) {
            app.parse(new InputStreamReader(new FileInputStream(a)));
        }
    }

    public List<Car> parse(Reader input) throws IOException {
        // Initialize the list so you'll get an empty list not null if there is no CSV data.
        List<Car> cars = new ArrayList<>(); 
        CSVParser parser = CSVParser.parse(input, CSVFormat.RFC4180);
        boolean hadHeaders = false;
        for (CSVRecord record : parser) {
            if (!hadHeaders) {
                cars = makeCars(record);
                hadHeaders = true;
            }
            else {
                addSpeedsToCars(cars, record);
            }
        }
        return cars;
    }

    private List<Car> makeCars(CSVRecord record) {
        List<Car> cars = new ArrayList<>();
        for (int i = 0; i < record.size(); i++) {
            String field = record.get(i);
            if (CAR_ID.equals(field)) {
                cars.add(new Car(i, speedColumnFor(i)));
            }
        }
        return cars;
    }

    private int speedColumnFor(int idColumn) {
        // Assumes speed is always right of car ID.
        return idColumn + 1;
    }

    private void addSpeedsToCars(List<Car> cars, CSVRecord record) {
        for (int i = 0; i < record.size(); i++) {
            for (Car c : cars) {
                 c.add(i, record.get(i));
            }
        }
    }
}

This will turn your CSV into a list of Cars, each with a list of their speeds.

This code is based on a few assumptions, but you can probably fix them easily as needed:

Column names in your CSV are case-sensitive. If they are not, change the equals to equalsIgnoreCase when checking the CAR_ID.
The speed column is right next to the car ID column: if that's not the case, adjust the speedColumnFor method to return the appropriate column. This gets a bit more tricky when you don't have a fixed offset but need to look at the headers, e.g., to pick the following mph header after you saw an ID column. In that case you'll have to adjust the logic makeCars to remember the ID column position and only create the Car when you encounter an mph column, using the remembered ID column position.
Car IDs are Integers, like in your example data. You could also change them to be Strings.
Speeds are Integers. If you need floats, adjust the Integer.parseInt to Float.parseFloat when converting from Strings.

156

answered Oct 26 '22 16:10

Robert

Related questions
                            
                                Why is JSON document not fully consumed?
                            
                                Alternative to deprecated XmlJsonDataFormat in Apache Camel
                            
                                Spring Boot Unable to determine jdbc url from datasource
                            
                                When pitest cannot find the test classes
                            
                                VS Code: The declared package "controller" does not match the expected package ""
                            
                                jstack in mixed mode: WrongTypeException: No suitable match for type of address
                            
                                How to deploy and access a Java WebSocket endpoint on Heroku without any additional frameworks
                            
                                Is there a way to print a functional interface?
                            
                                Combining multiple java streams in a structured way
                            
                                Jackson JSON serialization: How to ignore a nested object when all its fields are null?
                            
                                possible ways to store JSON in dynamodb
                            
                                Spark SQL - Encoders for Tuple Containing a List or Array as an Element
                            
                                Spring CriteriaBuilder search enum by it's name
                            
                                Linking between modules in javadoc
                            
                                Could not find method implementation() for arguments [directory 'libs'] on object
                            
                                Maven - Package does not exist
                            
                                Maven Compilation Failure but no errors found
                            
                                AWS Kinesis - how to resume consuming from last checkpoint
                            
                                java.lang.IllegalStateException: Encountered invalid @Scheduled method 'execute': For input string: "1#1"
                            
                                Avoiding unchecked casting using generics

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Handle duplicate header names in CSV file

Tags:

java

duplicates

csv

plshm

People also ask

1 Answers

Robert

Recent Activity

Donate For Us