
Read Csv file encoding error

I am using the following method for reading Csv file content:

    /// <summary>
    /// Reads data from a CSV file to a datatable
    /// </summary>
    /// <param name="filePath">Path to the CSV file</param>
    /// <returns>Datatable filled with data read from the CSV file</returns>
    public DataTable ReadCsv(string filePath)
    {
        if (string.IsNullOrEmpty(filePath))
        {
            log.Error("Invalid CSV file name.");
            return null;
        }

        try
        {
            DataTable dt = new DataTable();

            string folder = FileMngr.Instance.ExtractFileDir(filePath);
            string fileName = FileMngr.Instance.ExtractFileName(filePath);
            string connectionString = string.Concat(
                @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=",
                folder, ";");

            using (OdbcConnection conn = new OdbcConnection(connectionString))
            {
                string selectCommand = string.Concat("select * from [", fileName, "]");
                using (OdbcDataAdapter da = new OdbcDataAdapter(selectCommand, conn))
                {
                    da.Fill(dt);
                }
            }

            return dt;
        }
        catch (Exception ex)
        {
            log.Error("Error loading CSV content", ex);
            return null;
        }
    }

This method works if I have a UTF-8 encoded Csv file with a schema.ini that looks something like this:

    [Example.csv]
    Format=Delimited(,)
    ColNameHeader=True
    MaxScanRows=2
    CharacterSet=ANSI

If I have German characters in a Csv file with Unicode encoding, the method cannot read the data correctly.

What modifications can I make to the above method to read Unicode Csv files? If there is no way to do it this way, what Csv-reading code can you suggest?

asked Jan 12 '09 07:01 by Germstorm


2 Answers

Try using CharacterSet=UNICODE in your schema.ini file. Although this is not documented on MSDN, it works according to this thread on Microsoft Forums.
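
As a rough sketch, the schema.ini from the question with only that one setting changed would look like the fragment below. It assumes the CSV file is saved as UTF-16 (what Windows tools call "Unicode"). Because the value is undocumented, treat it as something to test against your driver version rather than guaranteed behaviour.

```ini
[Example.csv]
Format=Delimited(,)
ColNameHeader=True
MaxScanRows=2
; Undocumented value suggested in this answer; assumes a UTF-16 encoded file
CharacterSet=UNICODE
```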

answered Oct 04 '22 06:10 by csgero


Well, a very good and well-used streaming CSV reader is on CodeProject; that is the first thing I'd try. It sounds like your encoding may be broken, though, which might not make it simple. Of course, it could just be ODBC that is breaking, in which case that reader might work fine.

For simple CSV you could try parsing it yourself (string.Split etc), but there are enough edge-cases that a pre-rolled parser is worth using.
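
To illustrate the "parse it yourself" route, here is a minimal sketch that reads a file into a DataTable with an explicit encoding instead of going through ODBC. The class name `SimpleCsv` and its `Read` method are made up for this example. It assumes the first line is a header with unique column names and that no field contains commas, quotes, or line breaks, which are exactly the edge cases a pre-rolled parser handles for you.

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of any library.
public static class SimpleCsv
{
    // Reads a simple comma-separated file into a DataTable of string columns.
    // fallbackEncoding is used when the file has no byte order mark (BOM).
    public static DataTable Read(string filePath, Encoding fallbackEncoding)
    {
        DataTable dt = new DataTable();

        // The third argument lets a BOM (UTF-8, UTF-16) override the fallback.
        using (StreamReader reader = new StreamReader(filePath, fallbackEncoding, true))
        {
            string headerLine = reader.ReadLine();
            if (headerLine == null)
            {
                return dt; // empty file: no columns, no rows
            }

            // Assumption: header names are unique; duplicates would throw.
            foreach (string name in headerLine.Split(','))
            {
                dt.Columns.Add(name.Trim());
            }

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length == 0)
                {
                    continue; // skip blank lines
                }

                // Naive split: does not understand quoted fields.
                string[] fields = line.Split(',');
                DataRow row = dt.NewRow();
                for (int i = 0; i < fields.Length && i < dt.Columns.Count; i++)
                {
                    row[i] = fields[i];
                }
                dt.Rows.Add(row);
            }
        }

        return dt;
    }
}
```

A call such as `SimpleCsv.Read(filePath, Encoding.Unicode)` treats a file without a BOM as UTF-16; pass `Encoding.UTF8` or `Encoding.Default` if that matches how the file was actually saved.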

answered Oct 04 '22 04:10 by Marc Gravell