Lumenworks Fast CsvReader - Exception reading tab-delimited file due to quote character

Tags:

c#

csv

lumenworks

I'm using LumenWorks Fast CsvReader, and the following exception occurred while reading a Kelley Blue Book file:

The CSV appears to be corrupt near record '1281' field '5' at position '1169'

The file is tab delimited. I found that double quotes are used in the data, but I don't see how to escape them and continue parsing normally, since the data is tab delimited.

--Characters in Text File--
12345    2013    RAV4 "Sport" Sport Utility 4D    2

--Source Code--
using(CsvReader csvReader = new CsvReader(new StreamReader(filePath), false, '\t', '"', '"', '#', LumenWorks.Framework.IO.Csv.ValueTrimmingOptions.QuotedOnly))
{
}

I tried a number of different CsvReader settings with no luck. What do you use that works well? I don't have this much trouble with comma-delimited files.

fletchsod asked Feb 06 '14 14:02


1 Answer

There is a missing closing parenthesis after the StreamReader:

using (CsvReader csvReader = new CsvReader(new StreamReader(filePath), false, '\t', '"', '"', '#', LumenWorks.Framework.IO.Csv.ValueTrimmingOptions.All))
{
    int fieldCount = csvReader.FieldCount;
    while (csvReader.ReadNextRecord())
    {
        for (int i = 0; i < fieldCount; i++)
            Console.WriteLine("Column {0}: {1}", i + 1, csvReader[i]);
    }
}

I have tested it with your line above (forced tab as delimiter in the file) and it worked.

Output was:

Column 1: 12345
Column 2: 2013
Column 3: RAV4
Column 4: Sport
Column 5: Sport Utility
Column 6: 4D
Column 7: 2

Update, based on your comment and the provided text file:

This CSV reader lets you handle the FillError and ParseError exceptions raised by invalid or corrupt data. You can handle them to get more information and for logging purposes.

For example:

void csv_ParseError(object sender, ParseErrorEventArgs e)
{
    // if the error is that a field is missing, then skip to next line
    if (e.Error is MissingFieldCsvException)
    {
        //Log.Write(e.Error, "--MISSING FIELD ERROR OCCURRED!" + Environment.NewLine);
        e.Action = ParseErrorAction.AdvanceToNextLine;
    }
    else if (e.Error is MalformedCsvException)
    {
        //Log.Write(e.Error, "--MALFORMED CSV ERROR OCCURRED!" + Environment.NewLine);
        e.Action = ParseErrorAction.AdvanceToNextLine;
    }
    else
    {
        //Log.Write(e.Error, "--UNKNOWN PARSE ERROR OCCURRED!" + Environment.NewLine);
        e.Action = ParseErrorAction.AdvanceToNextLine;
    }
}

You need to listen to this event:

csvReader.MissingFieldAction = MissingFieldAction.ParseError;
csvReader.DefaultParseErrorAction = ParseErrorAction.RaiseEvent;
csvReader.ParseError += csv_ParseError;

I noticed that using " as the quote character doesn't work with your text file, because some fields contain data like RAV4 "Sport" Sport Utility 4D, so the field itself contains the quote character. You don't need a quote character at all, since no fields are quoted. So don't provide one in the constructor, or set it to '\0'. Then this runs without a problem:

// Collected records; each inner list holds the fields of one row.
var lines = new List<List<string>>();
// Passing '\0' as the quote and escape characters turns quote handling off.
using (var rd = new StreamReader(filePath))
using (var csvReader = new CsvReader(rd, false, '\t', '\0', '\0', '#', ValueTrimmingOptions.All))
{
    csvReader.MissingFieldAction = MissingFieldAction.ParseError;
    csvReader.DefaultParseErrorAction = ParseErrorAction.RaiseEvent;
    csvReader.ParseError += csv_ParseError;
    csvReader.SkipEmptyLines = true;
    int fieldCount = csvReader.FieldCount;
    while (csvReader.ReadNextRecord())
    {
        var fields = new List<string>();
        for (int i = 0; i < fieldCount; i++)
        {
            fields.Add(csvReader[i]);
        }
        lines.Add(fields);
    }
}
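
As a rough illustration of why dropping the quote character helps, here is a minimal sketch that uses only string.Split from the base class library instead of LumenWorks. It assumes the fields are separated by real tab characters and never contain embedded tabs or line breaks. The names TabSplitSketch and SplitTabLine are hypothetical and exist only for this example. With no quote handling at all, the embedded double quotes are kept as ordinary data:

```csharp
using System;

public static class TabSplitSketch
{
    // Splits one line on tab characters only.
    // Double quotes get no special treatment and stay in the field text.
    public static string[] SplitTabLine(string line)
    {
        return line.Split('\t');
    }

    public static void Main()
    {
        // Sample record from the question, written with explicit tab characters.
        string line = "12345\t2013\tRAV4 \"Sport\" Sport Utility 4D\t2";

        string[] fields = SplitTabLine(line);
        for (int i = 0; i < fields.Length; i++)
        {
            Console.WriteLine("Column {0}: {1}", i + 1, fields[i]);
        }
    }
}
```

This yields four fields, the third being RAV4 "Sport" Sport Utility 4D with its quotes intact. It is not a replacement for the reader above, which also takes care of comments, trimming, empty lines and parse-error events.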

Tim Schmelter answered Nov 14 '22 21:11
