I am using the following method to read CSV file content:
/// <summary>
/// Reads data from a CSV file into a DataTable
/// </summary>
/// <param name="filePath">Path to the CSV file</param>
/// <returns>DataTable filled with data read from the CSV file</returns>
public DataTable ReadCsv(string filePath)
{
    if (string.IsNullOrEmpty(filePath))
    {
        log.Error("Invalid CSV file name.");
        return null;
    }

    try
    {
        DataTable dt = new DataTable();
        string folder = FileMngr.Instance.ExtractFileDir(filePath);
        string fileName = FileMngr.Instance.ExtractFileName(filePath);
        string connectionString =
            string.Concat(@"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=",
                folder, ";");
        using (OdbcConnection conn = new OdbcConnection(connectionString))
        {
            string selectCommand = string.Concat("select * from [", fileName, "]");
            using (OdbcDataAdapter da = new OdbcDataAdapter(selectCommand, conn))
            {
                da.Fill(dt);
            }
        }
        return dt;
    }
    catch (Exception ex)
    {
        log.Error("Error loading CSV content", ex);
        return null;
    }
}
This method works if I have a UTF-8 encoded CSV file with a schema.ini that looks something like this:
[Example.csv]
Format=Delimited(,)
ColNameHeader=True
MaxScanRows=2
CharacterSet=ANSI
If the CSV file contains German characters and is saved with Unicode encoding, the method cannot read the data correctly.
What modifications can I make to the above method so it can read Unicode CSV files? If that is not possible with this approach, what CSV-reading code can you suggest?
UTF-8, or "Unicode Transformation Format, 8 Bit" is a marketing operations pro's best friend when it comes to data imports and exports. It refers to how a file's character data is encoded when moving files between systems.
The Best Answer is The evaluated encoding of the open file will display on the bottom bar, far right side. The encodings supported can be seen by going to Settings -> Preferences -> New Document/Default Directory and looking in the drop down.
It is recommended to always use UTF-8 encoding in your CSV files. Why? UTF-8 encoding contains 1,112,064 characters, which encompasses just about any character you would type, in any language.
Try using CharacterSet=UNICODE in your schema.ini file. Although this setting is not documented on MSDN, it works according to this thread on the Microsoft Forums.
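For example, the schema.ini from the question would then look something like this (the only change is the CharacterSet line):

[Example.csv]
Format=Delimited(,)
ColNameHeader=True
MaxScanRows=2
CharacterSet=UNICODE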
Well, a very good and well-used streaming CSV reader is on CodeProject; that is the first thing I'd try... but it sounds like your encoding may be broken, which might not make it simple... of course, it could just be ODBC that is breaking, in which case the above might work fine.
For simple CSV you could try parsing it yourself (string.Split etc.), but there are enough edge cases that a pre-rolled parser is worth using; a rough sketch of the do-it-yourself approach follows.
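A minimal sketch of that do-it-yourself approach, assuming the file starts with a byte order mark so StreamReader can detect the UTF-8/UTF-16 encoding, and that no field contains quoted commas or embedded line breaks (ReadCsvManually is just an illustrative name, not part of the original code):

using System.Data;
using System.IO;
using System.Text;

public DataTable ReadCsvManually(string filePath)
{
    DataTable dt = new DataTable();
    // Detect UTF-8/UTF-16 from the byte order mark; fall back to
    // Encoding.Default when no BOM is present.
    using (StreamReader reader = new StreamReader(filePath, Encoding.Default, true))
    {
        string headerLine = reader.ReadLine();
        if (headerLine == null)
        {
            return dt; // empty file
        }
        foreach (string columnName in headerLine.Split(','))
        {
            dt.Columns.Add(columnName.Trim());
        }
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Naive split: does not handle quoted fields that contain commas.
            dt.Rows.Add(line.Split(','));
        }
    }
    return dt;
}

For anything beyond the simplest files, a dedicated parser (such as the CodeProject reader mentioned above) handles quoting and escaping far better than a plain string.Split.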