How to read a header from a specific line with CsvHelper?

I'm trying to read a CSV file where the header is on row 3:

some crap line
some empty line
COL1,COL2,COl3,...
val1,val2,val3
val1,val2,val3

How do I tell CsvHelper that the header is not on the first row?

I tried skipping two lines with Read(), but the subsequent call to ReadHeader() throws an exception saying that the header has already been read.

using (var csv = new CsvReader(new StreamReader(stream), csvConfiguration)) {
   csv.Read();
   csv.Read();
   csv.ReadHeader();
   .....

If I set csvConfiguration.HasHeaderRecord to false, ReadHeader() fails again.

UserControl asked Sep 22 '16 14:09


2 Answers

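Try skipping the two leading lines on the underlying StreamReader before handing it to CsvReader:

using (var reader = new StreamReader(stream)) {
    // Discard the two lines that precede the header.
    reader.ReadLine();
    reader.ReadLine();
    using (var csv = new CsvReader(reader)) {
        // The next row CsvHelper sees is the real header.
        csv.ReadHeader();
    }
}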
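
For newer CsvHelper releases, the same idea might look like the sketch below. It assumes a version whose CsvReader constructor takes a CultureInfo and which expects Read() before ReadHeader(), as in the CsvHelper 27 code in the other answer; the sample text and the Program class are illustrative only.

```csharp
using System;
using System.Globalization;
using System.IO;
using CsvHelper;

class Program
{
    static void Main()
    {
        // Sample input mirroring the question: two preamble lines, then the header.
        var csvText = "some crap line\nsome empty line\nCOL1,COL2,COl3\nval1,val2,val3\nval1,val2,val3\n";
        using var reader = new StringReader(csvText);

        // Discard the preamble on the raw reader, before CsvHelper sees it.
        reader.ReadLine();
        reader.ReadLine();

        using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
        csv.Read();        // Advance to the header row.
        csv.ReadHeader();  // Treat the current row as the header.

        // Read the remaining rows and look fields up by header name.
        while (csv.Read())
        {
            Console.WriteLine(csv.GetField("COL1"));
        }
    }
}
```

Skipping with ReadLine() counts physical lines, blank or not, so it sidesteps CsvHelper's blank-line handling. It does assume the preamble contains no quoted fields that span several lines.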
Evk answered Sep 20 '22 10:09


As of CsvHelper 27.0 the problem is no longer reproducible: a header can now be read from any line. This may have been possible as far back as release 3.0.0 from 2017, whose change log includes:

3.0.0

Read more than 1 header row.

Thus the following code now just works, and has worked for a while:

var csvText = "some crap line\nsome empty line\nCOL1,COL2,COl3\nval1,val2,val3\nval1,val2,val3\n\n";
using var stream = new MemoryStream(Encoding.UTF8.GetBytes(csvText));

var csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    // Your settings here.
};
using (var csv = new CsvReader(new StreamReader(stream), csvConfiguration))
{
    csv.Read(); // Read in the first row, "some crap line".
    csv.Read(); // Read in the second row, "some empty line".
    csv.Read(); // Read in the third row, which is the actual header.
    csv.ReadHeader(); // Process the currently read row as the header.

    Assert.AreEqual(3, csv.HeaderRecord.Length);
    Assert.AreEqual(@"COL1,COL2,COl3", String.Join(",", csv.HeaderRecord));
}

Successful demo fiddle #1 here.

Warning: CsvHelper skips blank lines by default. If any of the preliminary lines to be skipped might be blank, csv.Read() will silently read past them, and a later csv.Read() will then consume your header as well, resulting in the wrong row being used as the header row!

Failing demo fiddle #2 here.
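
The failure mode can be sketched as follows, assuming CsvHelper 27 with default settings (so IgnoreBlankLines is left at its default of true). The sample text, whose second line really is empty, and the Program class are illustrative only.

```csharp
using System;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

class Program
{
    static void Main()
    {
        // Here the second line is genuinely blank.
        var csvText = "some crap line\n\nCOL1,COL2,COl3\nval1,val2,val3\nval1,val2,val3\n";

        // Default configuration: blank lines are skipped while reading.
        var csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture);

        using var csv = new CsvReader(new StringReader(csvText), csvConfiguration);
        csv.Read(); // Reads "some crap line".
        csv.Read(); // Intended to read the blank line, but it is skipped, so this lands on the header.
        csv.Read(); // Intended to read the header, but this is already the first data row.
        csv.ReadHeader();

        // If blank lines are skipped as the warning describes, this prints the
        // first data row rather than COL1,COL2,COl3.
        Console.WriteLine(string.Join(",", csv.HeaderRecord));
    }
}
```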

To avoid this and deterministically skip a fixed number of lines at the beginning of the file, set CsvConfiguration.IgnoreBlankLines = false. This property cannot be changed once the CsvReader has been created, so if you still need to skip blank lines among the data rows, use a ShouldSkipRecord callback:

bool ignoreBlankLines = false;
var csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    IgnoreBlankLines = false,
    // Skip blank records only once ignoreBlankLines has been switched on.
    ShouldSkipRecord = (args) => ignoreBlankLines
        && (args.Record.Length == 0 || (args.Record.Length == 1 && string.IsNullOrEmpty(args.Record[0]))),
    // Your settings here.
};
using (var csv = new CsvReader(new StreamReader(stream), csvConfiguration))
{
    csv.Read(); // Read in the first row, "some crap line".
    csv.Read(); // Read in the second row, which is empty.
    csv.Read(); // Read in the third row, which is the actual header.
    csv.ReadHeader(); // Process the currently read row as the header.
    ignoreBlankLines = true; // Now that the header has been read, ignore blank data lines.
}

Successful demo fiddle #3 here.

dbc answered Sep 22 '22 10:09