
CsvHelper : How to detect the Delimiter from the given csv file

Tags:

csvhelper

I am using CsvHelper to read and write data in a CSV file. Now I want to detect the delimiter used in the CSV file. How can I get it?

My code:

     var parser = new CsvParser(txtReader);
     delimiter = parser.Configuration.Delimiter;

The delimiter I get is always ",", but the actual delimiter in the CSV file is "\t".

jamie2015 asked Oct 26 '15 08:10


3 Answers

Since the CSV file (saved in MS Excel) could contain a different delimiter depending on the user's localization settings, I ended up with the following approach:

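public static string DetectDelimiter(StreamReader reader)
{
    // Assume one of the following delimiters.
    var possibleDelimiters = new List<string> { ",", ";", "\t", "|" };

    // ReadLine returns null for an empty stream; treat that as an empty header.
    var headerLine = reader.ReadLine() ?? string.Empty;

    // Reset the reader to its initial position so the caller can reuse it.
    // Otherwise CsvHelper won't find the header line, because it has already been read.
    // Note: this requires a seekable underlying stream.
    reader.BaseStream.Position = 0;
    reader.DiscardBufferedData();

    // Return the first candidate (in list order) that appears in the header line.
    foreach (var possibleDelimiter in possibleDelimiters)
    {
        if (headerLine.Contains(possibleDelimiter))
        {
            return possibleDelimiter;
        }
    }

    // Fall back to the first candidate (comma) when none is found.
    return possibleDelimiters[0];
}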

I also needed to reset the reader's read position, since it was the same instance I used in the CsvReader constructor.

The usage was then as follows:

using (var textReader = new StreamReader(memoryStream))
{
    var delimiter = DetectDelimiter(textReader);

    using (var csv = new CsvReader(textReader))
    {
        csv.Configuration.Delimiter = delimiter;

        // ... rest of the csv reader process

    }
}
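One limitation of the approach above is that it returns the first candidate that appears anywhere in the header, so a header such as `Last, First;Age;City` would be detected as comma-separated. A minimal sketch of a variation that counts occurrences instead is shown below. The class name `DelimiterSketch` and the method `DetectByCount` are hypothetical names introduced for illustration, and the method takes the header line as a string so that no stream repositioning is needed. It does not account for delimiters inside quoted values.

```csharp
using System.Linq;

public static class DelimiterSketch
{
    // Candidate delimiters; the same set assumed in the answer above.
    private static readonly char[] Candidates = { ',', ';', '\t', '|' };

    // Returns the candidate that occurs most often in the header line,
    // or "," when the line is null, empty, or contains no candidate.
    public static string DetectByCount(string headerLine)
    {
        if (string.IsNullOrEmpty(headerLine))
            return ",";

        char best = ',';
        int bestCount = 0;

        foreach (char candidate in Candidates)
        {
            // Count how many times this candidate appears in the header.
            int count = headerLine.Count(c => c == candidate);
            if (count > bestCount)
            {
                best = candidate;
                bestCount = count;
            }
        }

        return best.ToString();
    }
}
```

The returned string can be assigned to the delimiter setting in the same way as the result of `DetectDelimiter`.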
Steven answered Sep 19 '22 05:09


I found this piece of code on this site:

public static char Detect(TextReader reader, int rowCount, IList<char> separators)
{
    IList<int> separatorsCount = new int[separators.Count];

    int character;

    int row = 0;

    bool quoted = false;
    bool firstChar = true;

    while (row < rowCount)
    {
        character = reader.Read();

        switch (character)
        {
            case '"':
                if (quoted)
                {
                    // Inside a quoted value: a quote followed by a non-quote closes the value.
                    if (reader.Peek() != '"')
                        quoted = false;
                    else
                        reader.Read(); // Escaped quote (""): skip the peeked quote.
                }
                else
                {
                    // Set the value as quoted only if this quote is its first character.
                    if (firstChar)
                        quoted = true;
                }
                break;
            case '\n':
                if (!quoted)
                {
                    ++row;
                    firstChar = true;
                    continue;
                }
                break;
            case -1:
                row = rowCount;
                break;
            default:
                if (!quoted)
                {
                    int index = separators.IndexOf((char)character);
                    if (index != -1)
                    {
                        ++separatorsCount[index];
                        firstChar = true;
                        continue;
                    }
                }
                break;
        }

        if (firstChar)
            firstChar = false;
    }

    int maxCount = separatorsCount.Max();

    return maxCount == 0 ? '\0' : separators[separatorsCount.IndexOf(maxCount)];
}

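Here, separators is the list of possible separators to look for, and rowCount is the number of rows to examine. The method returns '\0' if none of the separators is found.

Hope that helps :)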

Maraboc answered Sep 20 '22 05:09


CSV stands for Comma Separated Values. I don't think you can reliably detect whether a different character is used as the separator. If there is a header row, then you might be able to count on it.

You should know the separator that is used. You should be able to see it when opening the file. If the source of the files gives you a different separator each time and is not reliable, then I'm sorry. ;)

If you just want to parse using a different delimiter, then you can set csv.Configuration.Delimiter. http://joshclose.github.io/CsvHelper/#configuration-delimiter
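As a rough sketch of setting the delimiter explicitly: the snippet below assumes a newer CsvHelper release in which the configuration is passed to the `CsvReader` constructor as a `CsvConfiguration` object (older releases set `csv.Configuration.Delimiter` on the reader, as in the other answers). The class name `ExplicitDelimiterSketch` and the method `ReadHeader` are hypothetical names for illustration, and the code requires the CsvHelper package.

```csharp
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

public static class ExplicitDelimiterSketch
{
    // Reads only the header row of the given CSV text using the given delimiter.
    public static string[] ReadHeader(string csvText, string delimiter)
    {
        // Assumption: a CsvHelper version whose CsvConfiguration takes a CultureInfo.
        var config = new CsvConfiguration(CultureInfo.InvariantCulture)
        {
            Delimiter = delimiter,
        };

        using (var reader = new StringReader(csvText))
        using (var csv = new CsvReader(reader, config))
        {
            csv.Read();        // advance to the first row
            csv.ReadHeader();  // treat that row as the header
            return csv.HeaderRecord;
        }
    }
}
```

For example, `ExplicitDelimiterSketch.ReadHeader("id\tname\n1\tAda\n", "\t")` should return the two header fields `id` and `name`.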

Josh Close answered Sep 19 '22 05:09