I need to process CSV files that have duplicate headers. Each measurement spans three columns (min, average and max), but the header is the same for all three. The first column is min, the second is average, the third is max.
The Apache Commons CSV parser throws:
java.lang.IllegalArgumentException: The header contains a duplicate name:
How can I configure the parser to accept duplicate headers?
There is no predefined configuration parameter in CSVParser that controls whether duplicate column names are acceptable.
A look at the source code shows that the initializeHeader method creates a Map with column names as keys and column indices as values. If you want to use header mappings, the column names must be unique.
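To see why a name-to-index Map cannot represent duplicate headers, here is a minimal sketch using only the JDK. It is not the library's actual code; the class HeaderMapSketch and the helper indexByName are hypothetical names used for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HeaderMapSketch {
    // Hypothetical helper (not Commons CSV code): maps each column name to its index.
    static Map<String, Integer> indexByName(String[] header) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            // A repeated name overwrites the index stored for the earlier column.
            map.put(header[i], i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = indexByName(new String[] {"temp", "temp", "temp"});
        System.out.println(map); // prints {temp=2}
    }
}
```

Three columns collapse into a single entry, so the first two columns can no longer be reached by name. Rejecting duplicate names up front avoids that silent loss.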
However, there is a solution: specify a CSVFormat that ignores the column names defined on the first row of the CSV file, and define your column names manually.
From the CSVFormat documentation:
Defining column names
To define the column names you want to use to access records, write:
CSVFormat.EXCEL.withHeader("Col1", "Col2", "Col3");
Calling withHeader(String...) lets you use the given names to address values in a CSVRecord, and assumes that your CSV source does not contain a first record that also defines column names. If it does, then you are overriding this metadata with your names and you should skip the first record by calling withSkipHeaderRecord(boolean) with true.
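Applied to the question's min/average/max layout, the approach above might look like the following sketch. It assumes Apache Commons CSV is on the classpath. The column names temp_min, temp_avg and temp_max, the class DuplicateHeaderCsv and the helper readAverages are made up for illustration. Note that withHeader and withSkipHeaderRecord are deprecated in newer releases in favour of CSVFormat.Builder, although they are still available.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class DuplicateHeaderCsv {
    // Hypothetical helper: returns the second ("average") column of every data row.
    static List<String> readAverages(String csv) {
        CSVFormat format = CSVFormat.DEFAULT
                // Our own unique names replace the duplicate names in the file.
                .withHeader("temp_min", "temp_avg", "temp_max")
                // The file's first record is a header row, so skip it.
                .withSkipHeaderRecord(true);
        List<String> averages = new ArrayList<>();
        try (CSVParser parser = format.parse(new StringReader(csv))) {
            for (CSVRecord record : parser) {
                averages.add(record.get("temp_avg"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return averages;
    }

    public static void main(String[] args) {
        // Sample input with three identically named columns, as in the question.
        String csv = "temp,temp,temp\n1,2,3\n4,5,6\n";
        System.out.println(readAverages(csv));
    }
}
```

Because the parser never builds a header map from the file's first row, the duplicate names in the file are never checked. Only the manually supplied names have to be unique.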
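You can now configure the parser to allow duplicate headers. withAllowDuplicateHeaderNames() was added in Commons CSV 1.7. It is an instance method, so call it on an existing format such as CSVFormat.DEFAULT:
CSVFormat csvFormat = CSVFormat.DEFAULT.withAllowDuplicateHeaderNames();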