
Apache CSV parser with duplicate column headers

I need to process CSV files that have duplicate headers. Each value is spread over three columns (min, avg and max, in that order), but all three columns share the same header name.

The Apache Commons CSV parser throws:

java.lang.IllegalArgumentException: The header contains a duplicate name:

How can I configure the parser to accept duplicate headers?

klonq asked Jul 24 '16 07:07


2 Answers

There is no pre-defined configuration parameter in CSVParser that controls whether duplicate column names are acceptable.

A look at the source code shows that the initializeHeader method creates a Map which will have column names as keys and column indices as values. If you want to use header mappings, the column names must be unique.
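
To see why duplicate names cannot coexist in such a map, here is a minimal sketch in plain Java. It is not the library's code: the class HeaderMapDemo, the method indexByName, and the column names temp and load are made up for illustration. It only shows that a name-to-index map keeps a single entry per name, so a later duplicate would replace the earlier index, which is presumably why the parser rejects duplicates rather than silently losing columns.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HeaderMapDemo {

    // Hypothetical helper: builds a name -> column index map,
    // the same shape of structure the answer describes.
    static Map<String, Integer> indexByName(String[] header) {
        Map<String, Integer> indexByName = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            // put() overwrites any earlier entry stored under the same key.
            indexByName.put(header[i], i);
        }
        return indexByName;
    }

    public static void main(String[] args) {
        // Six columns, but only two distinct names (illustrative data).
        String[] header = {"temp", "temp", "temp", "load", "load", "load"};
        System.out.println(indexByName(header)); // prints {temp=2, load=5}
    }
}
```

Only the last column of each group stays reachable by name; the first two indices are lost.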

However, there is a solution:

Specify a CSVFormat that ignores the column names defined on the first row of the CSV file, and define your column names manually.

From the CSVFormat documentation:

Defining column names

To define the column names you want to use to access records, write:

CSVFormat.EXCEL.withHeader("Col1", "Col2", "Col3");

Calling withHeader(String...) lets you use the given names to address values in a CSVRecord, and assumes that your CSV source does not contain a first record that also defines column names. If it does, then you are overriding this metadata with your names and you should skip the first record by calling withSkipHeaderRecord(boolean) with true.
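
One way to produce the manual column names for the layout in the question is to derive them from the duplicated header row. The sketch below assumes the min, avg, max column order stated in the question. The class UniqueHeaders, the method rename, the underscore suffix scheme, and the sample names temp and load are hypothetical, not part of Commons CSV.

```java
import java.util.Arrays;

final class UniqueHeaders {

    // Assumed column order within each group, as stated in the question.
    private static final String[] SUFFIXES = {"min", "avg", "max"};

    // Turns a header row with repeated names into unique names
    // by appending the suffix for each column's position in its group.
    static String[] rename(String[] rawHeader) {
        if (rawHeader.length % SUFFIXES.length != 0) {
            throw new IllegalArgumentException(
                    "Expected groups of " + SUFFIXES.length + " columns, got "
                            + rawHeader.length + " columns");
        }
        String[] unique = new String[rawHeader.length];
        for (int i = 0; i < rawHeader.length; i++) {
            unique[i] = rawHeader[i].trim() + "_" + SUFFIXES[i % SUFFIXES.length];
        }
        return unique;
    }

    public static void main(String[] args) {
        String[] raw = {"temp", "temp", "temp", "load", "load", "load"};
        System.out.println(Arrays.toString(rename(raw)));
        // prints [temp_min, temp_avg, temp_max, load_min, load_avg, load_max]
    }
}
```

The returned array could then be passed to withHeader(String...), combined with withSkipHeaderRecord(true) so that the original header row is not read as data.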

Matthias Wiehl answered Dec 09 '22 04:12


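Newer versions of Commons CSV let you configure the format to allow duplicate headers. withAllowDuplicateHeaderNames() is an instance method, so call it on an existing format such as CSVFormat.DEFAULT:

CSVFormat csvFormat = CSVFormat.DEFAULT.withAllowDuplicateHeaderNames();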
Miha Hribar answered Dec 09 '22 06:12