
LumenWorks CSV reader: read columns with the same name, or avoid `An item with the same key has already been added`

Tags: c#, csv, lumenworks

I want to know whether there is any way to make the CSV reader read all the columns in a CSV file that contains duplicate column names. I get an `An item with the same key has already been added` error. I need this to work because my logic builds an array of the similarly named columns, if any exist, and then runs further logic for each element of that array.

The bottom line is that I want to be able to read all the columns even when several of them share a name. I am using a custom object to hold the name/value data, so there is no need to worry about a dictionary raising the duplicate-key error. If LumenWorks CSV doesn't support this, what can I use instead? Also, my CSV file contains JSON data (with double quotes and commas), which I need to handle too.

Rajshekar Reddy asked Dec 30 '14 07:12


2 Answers

You've stumped me -- I don't know of any CSV parser that accounts for duplicate column headers, and I've tested quite a few of them. There are CSV parsers that will give you raw column data, though, and with some legwork you can use that as a building block to get your data into a friendlier state.

This will return a sequence of Dictionary<string, List<string>>, one for each record, with the key being the header and the list being all the columns with the same header:

using System.IO;
using System.Collections.Generic;
using Ctl.Data;

static IEnumerable<Dictionary<string, List<string>>> ReadCsv(string filePath)
{
    using (StreamReader sr = new StreamReader(filePath))
    {
        CsvReader csv = new CsvReader(sr);

        // first read in the header.

        if (!csv.Read())
        {
            yield break; // an empty file, break out early.
        }

        RowValue header = csv.CurrentRow;

        // now the records.

        while (csv.Read())
        {
            Dictionary<string, List<string>> dict =
                new Dictionary<string, List<string>>(header.Count);

            RowValue record = csv.CurrentRow;

            // map each column to a header value

            for (int i = 0; i < record.Count; ++i)
            {
                // if there are more values in the record than the header,
                // assume an empty string as header.

                string headerValue = (i < header.Count ? header[i].Value : null)
                    ?? string.Empty;

                // get the list, or create if it doesn't exist.

                List<string> list;

                if (!dict.TryGetValue(headerValue, out list))
                {
                    dict[headerValue] = list = new List<string>();
                }

                // finally add column value to the list.

                list.Add(record[i].Value);
            }

            yield return dict;
        }
    }
}

I'm not familiar enough with LumenWorks to say whether it can do this, so this uses Ctl.Data, which I know will allow for formatted JSON data and any other weirdness within columns so long as it is properly quoted. (Disclaimer: I'm the author of Ctl.Data.)
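The grouping step itself does not depend on any particular parser. As a minimal sketch, assuming the header and one record are already available as lists of strings from whatever CSV library is in use, the same idea might look like the following. The class and method names (`DuplicateHeaderGrouping`, `GroupByHeader`) are hypothetical and are not part of Ctl.Data or LumenWorks.

```csharp
using System;
using System.Collections.Generic;

static class DuplicateHeaderGrouping
{
    // Hypothetical helper: groups one record's values by header name,
    // so columns that share a header end up in the same list.
    public static Dictionary<string, List<string>> GroupByHeader(
        IReadOnlyList<string> header, IReadOnlyList<string> record)
    {
        var groups = new Dictionary<string, List<string>>();

        for (int i = 0; i < record.Count; i++)
        {
            // Values beyond the last header (or under a null header)
            // are collected under the empty-string key.
            string key = i < header.Count ? (header[i] ?? string.Empty) : string.Empty;

            List<string> list;
            if (!groups.TryGetValue(key, out list))
            {
                groups[key] = list = new List<string>();
            }

            list.Add(record[i]);
        }

        return groups;
    }
}
```

For example, calling `GroupByHeader` with the header `Id,Tag,Tag` and the record `1,a,b` would produce a `Tag` list holding `a` and `b`, which matches the "array of similarly named columns" the question asks for.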

Cory Nelson answered Nov 02 '22 10:11


This is supported as of LumenWorks 4.0 thanks to jonreis.

See LumenWorks.Framework.Tests.Unit/IO/Csv/CsvReaderTest.cs

    using (CsvReader csvReader = new CsvReader(new StringReader("Header,Header\r\nValue1,Value2"), true))
    {
        // Give each duplicate header a unique name by appending its index.
        csvReader.DuplicateHeaderEncountered += (s, e) => e.HeaderName = $"{e.HeaderName}_{e.Index}";
    }
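To illustrate the renaming idea outside the library, here is a rough, library-independent sketch of what suffixing duplicate headers with an index could produce. It is not LumenWorks code: the helper name `MakeUnique` is hypothetical, and using the zero-based column position as the suffix is an assumption about what `e.Index` represents.

```csharp
using System;
using System.Collections.Generic;

static class HeaderRenaming
{
    // Hypothetical helper: returns header names in which every repeat
    // has been suffixed with "_<column index>" until it is unique.
    public static string[] MakeUnique(IReadOnlyList<string> headers)
    {
        var seen = new HashSet<string>();
        var result = new string[headers.Count];

        for (int i = 0; i < headers.Count; i++)
        {
            string candidate = headers[i] ?? string.Empty;

            // HashSet.Add returns false when the name is already taken;
            // keep appending the index until the name is free.
            while (!seen.Add(candidate))
            {
                candidate = candidate + "_" + i;
            }

            result[i] = candidate;
        }

        return result;
    }
}
```

With unique names in place, a name-keyed lookup no longer hits the duplicate-key error, and the original header can still be recovered by stripping the suffix if the columns need to be regrouped later.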
LeslieM answered Nov 02 '22 11:11