
CSV Text file parser with TextFieldParser - MalformedLineException

Tags: c#, parsing, csv

I am working on a CSV parser using the C# TextFieldParser class.

My CSV data is delimited by commas (,) and string values are enclosed in double quotes (").

However, a cell can sometimes contain a " character itself, which appears to make the parser throw a MalformedLineException.


This is my C# code so far:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using Microsoft.VisualBasic.FileIO;

namespace CSV_Parser
{
    class Program
    {
        static void Main(string[] args)
        {
            // Init
            string CSV_File = "test.csv";

            // Proceed If File Is Found
            if (File.Exists(CSV_File))
            {
                // Test
                Parse_CSV(CSV_File);
            }

            // Finished
            Console.WriteLine("Press any key to exit ...");
            Console.ReadKey();
        }

        static void Parse_CSV(String Filename)
        {
            using (TextFieldParser parser = new TextFieldParser(Filename))
            {
                parser.TextFieldType = FieldType.Delimited;
                parser.SetDelimiters(",");
                parser.TrimWhiteSpace = true;
                while (!parser.EndOfData)
                {
                    string[] fieldRow = parser.ReadFields();
                    foreach (string fieldRowCell in fieldRow)
                    {
                        // todo
                    }
                }
            }
        }
    }
}

This is the content of my test.csv file:

" dummy test"s data",   b  ,  c  
d,e,f
gh,ij

What is the best way to deal with a " character inside my cell data?


UPDATE

Based on Tim Schmelter's answer, I have modified my code to the following:

static void Parse_CSV(String Filename)
{
    using (TextFieldParser parser = new TextFieldParser(Filename))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        parser.HasFieldsEnclosedInQuotes = false;
        parser.TrimWhiteSpace = true;
        while (parser.PeekChars(1) != null)
        {
            var cleanFieldRowCells = parser.ReadFields().Select(
                f => f.Trim(new[] { ' ', '"' }));

            Console.WriteLine(String.Join(" | ", cleanFieldRowCells));
        }
    }
}

This appears to produce the correct output.

Is this the best way to deal with quote-enclosed strings that themselves contain quotes?

Latheesan asked Mar 10 '14 10:03


1 Answer

Could you skip the quote handling altogether by setting HasFieldsEnclosedInQuotes to false?

using (var parser = new TextFieldParser(@"Path"))
{
    parser.HasFieldsEnclosedInQuotes = false;
    parser.Delimiters = new[]{","};
    while(parser.PeekChars(1) != null)
    {
        string[] fields = parser.ReadFields();
    }
}
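
As a rough, self-contained sketch of the same idea, the snippet below runs CSV text through TextFieldParser from a StringReader instead of a file, so it can be tried without test.csv. The names QuoteDemo and ReadRows are made up for illustration and are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class QuoteDemo
{
    // Hypothetical helper: parses CSV text with quote handling switched off,
    // so a stray " inside a cell is treated as an ordinary character.
    public static List<string[]> ReadRows(string csvText)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = false;
            while (!parser.EndOfData)
            {
                // Each row is split purely on commas; quotes stay in the fields.
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

One trade-off to keep in mind: with quote handling disabled, a comma inside a quoted cell is no longer protected, so a line such as "a,b",c would come back as three fields rather than two.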

You can remove the quotes manually:

var cleanFields = fields.Select(f => f.Trim(new[]{ ' ', '"' }));
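
To make the effect of that Trim call concrete, here is a small sketch that applies it to an array of raw fields. TrimDemo and CleanFields are hypothetical names used only for this example.

```csharp
using System;
using System.Linq;

static class TrimDemo
{
    // Hypothetical helper: strips spaces and double quotes from both ends
    // of every field. Characters in the middle of a field are left alone.
    public static string[] CleanFields(string[] fields)
    {
        return fields.Select(f => f.Trim(' ', '"')).ToArray();
    }
}
```

Because Trim only works on the ends of a string, the quote inside test"s survives while the enclosing quotes are removed. The flip side is that a cell whose real value begins or ends with a quote or a space would lose those characters too.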

Tim Schmelter answered Sep 22 '22 14:09