How do I remove the BOM (ï»¿) characters from a UTF-8 encoded CSV?

Tags: c#, parsing, csv, utf-8

I need to parse a UTF-8 encoded CSV. After converting it, I saw that the problem is the BOM (ï»¿) at the beginning of the file. I cannot simply create the CSV without a BOM, because I need to parse the file as it arrives, even when it is UTF-8 encoded with a BOM.

Can anyone tell me how to remove the BOM (ï»¿) characters from a CSV using C#/.NET?

Update: Here is the code I use to read the CSV headers, since I'm getting the BOM at the beginning of the file.

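    using System.Collections.Generic;
    using System.Configuration;
    using System.Data;
    using System.Data.Odbc;

    // Returns the column names from the header row of CSVFileName.
    // (The method name is illustrative; only the body was posted originally.)
    static List<string> GetCSVHeaders(string CSVFileName)
    {
        // Dbq points the Microsoft Text Driver at the folder that holds the CSV files
        string CSVConnectionString = "Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq="
            + ConfigurationManager.AppSettings["CSVFolder"]
            + ";Extensions=asc,csv,tab,txt;Persist Security Info=False;";

        using (OdbcConnection Connection = new OdbcConnection(CSVConnectionString))
        using (OdbcCommand Command = new OdbcCommand(
            string.Format(@"SELECT TOP 1 * FROM [{0}]", CSVFileName), Connection))
        {
            List<string> CSVHeaders = new List<string>();

            Connection.Open();

            using (OdbcDataReader Reader = Command.ExecuteReader(CommandBehavior.CloseConnection))
            {
                // The driver exposes the header line as the column names
                for (int column = 0; column < Reader.FieldCount; column++)
                {
                    CSVHeaders.Add(Reader.GetName(column));
                }
            }

            return CSVHeaders;
        }
    }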


asked Jun 07 '11 05:06

Harun

3 Answers

Actually, C# can read UTF-8 encoded files containing a BOM just fine; it's the broken CSV text driver you're using that is causing the problem. I'd recommend one of the other CSV reading solutions from this answer.
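To illustrate the point, here is a minimal sketch (not from the answer) that reads the header row with `StreamReader` instead of the ODBC driver. With `detectEncodingFromByteOrderMarks` set to `true`, `StreamReader` recognizes a UTF-8 BOM and consumes it, so the first header name comes back clean. The class and method names are hypothetical, and the naive `Split(',')` assumes a header row without quoted fields.

```csharp
using System;
using System.IO;
using System.Text;

public static class CsvHeaderReader
{
    // Reads the header row of a CSV file and splits it on commas.
    // StreamReader consumes a leading BOM (EF BB BF for UTF-8) instead of
    // returning it as text, so no "\uFEFF" ends up in the first name.
    // Assumption: a simple header with no quoted or escaped commas.
    public static string[] ReadHeaders(string path)
    {
        using (var reader = new StreamReader(path, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
        {
            string firstLine = reader.ReadLine() ?? string.Empty;
            return firstLine.Split(',');
        }
    }
}
```

For a file written with `new UTF8Encoding(true)` (which emits the BOM), `ReadHeaders(path)[0]` is the plain column name without a leading BOM character.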

answered Oct 17 '22 16:10

Daniel Pryden

Here is a function that rewrites the file in place as UTF-8 without a byte order mark:

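    using System;
    using System.IO;
    using System.Text;

    // Rewrites fileName in place as UTF-8 without a BOM.
    public static void SaveAsUTF8WithoutByteOrderMark(string fileName)
    {
        SaveAsUTF8WithoutByteOrderMark(fileName, null);
    }

    // encoding is only the fallback for reading a file that has no BOM;
    // File.ReadAllText still detects a BOM and honours it when present.
    public static void SaveAsUTF8WithoutByteOrderMark(string fileName, Encoding encoding)
    {
        if (fileName == null)
            throw new ArgumentNullException(nameof(fileName));

        if (encoding == null)
        {
            encoding = Encoding.Default;
        }

        // new UTF8Encoding(false) writes UTF-8 without the EF BB BF preamble
        File.WriteAllText(fileName, File.ReadAllText(fileName, encoding), new UTF8Encoding(false));
    }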
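A variant that never decodes the text, and so cannot be affected by the choice of fallback encoding, is to drop the three BOM bytes directly. This is a sketch, not part of the answer above; the class and method names are hypothetical, and like the function above it loads the whole file into memory.

```csharp
using System;
using System.IO;

public static class BomStripper
{
    // The UTF-8 encoding of U+FEFF, i.e. the byte order mark.
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Removes a leading UTF-8 BOM from the file in place.
    // Returns true if a BOM was found and removed, false if the file was left untouched.
    public static bool StripUtf8Bom(string fileName)
    {
        if (fileName == null)
            throw new ArgumentNullException(nameof(fileName));

        byte[] bytes = File.ReadAllBytes(fileName);
        bool hasBom = bytes.Length >= 3
            && bytes[0] == Utf8Bom[0]
            && bytes[1] == Utf8Bom[1]
            && bytes[2] == Utf8Bom[2];

        if (!hasBom)
            return false;

        // Write every byte after the BOM back unchanged.
        byte[] rest = new byte[bytes.Length - 3];
        Array.Copy(bytes, 3, rest, 0, rest.Length);
        File.WriteAllBytes(fileName, rest);
        return true;
    }
}
```

Calling it a second time on the same file is a no-op that returns `false`.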

answered Oct 17 '22 17:10

Simon Mourier

Instead of changing horses (using another CSV driver) or helping the given horse by pulling the wagon yourself (changing the encoding), you should tell the horse (the standard ODBC Text driver) what it needs to know to do the job, by adding a schema.ini file in the same folder as the data files:

[withbomgood.txt]
Format=TabDelimited
ColNameHeader=True
CharacterSet=65001
Col1=FrsColümn CHAR

to define the format of withbomgood.txt:

FrsColümn
whätever

which is an exact copy of withbombad.txt (which has no schema.ini section); both files start with a BOM, and viewed as ANSI they look like this:

ï»¿FrsColÃ¼mn
whÃ¤tever
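The `ï»¿` is just the three BOM bytes EF BB BF displayed in an ANSI code page (Windows-1252 and ISO-8859-1 both map them to ï, », ¿). If header names come back from the driver in that form and you cannot add a schema.ini, a minimal workaround (not part of this answer) is to strip that prefix from the first header. The helper name is hypothetical, and it only removes the BOM: other mis-decoded characters such as `Ã¼` stay garbled, which is why declaring the character set is the better fix.

```csharp
using System;

public static class HeaderCleanup
{
    // UTF-8 BOM bytes EF BB BF decoded as Windows-1252 / ISO-8859-1: "ï»¿"
    private const string MisreadBom = "\u00EF\u00BB\u00BF";

    // Removes a leading BOM from a header name, whether it arrived correctly
    // decoded (U+FEFF) or misread as the three characters "ï»¿".
    // Other mis-decoded characters in the name are NOT repaired.
    public static string StripBom(string header)
    {
        if (header == null)
            throw new ArgumentNullException(nameof(header));

        // Ordinal comparison: culture-aware StartsWith can ignore zero-width characters.
        if (header.StartsWith(MisreadBom, StringComparison.Ordinal))
            return header.Substring(MisreadBom.Length);

        return header.TrimStart('\uFEFF');
    }
}
```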

If you now call a slightly modified copy

static void Harun00(string CSVFileName)
{
    string CSVFilePath = @"E:\trials\SoTrials\answers\6260911\data";
    string CSVConnectionString = 
        "Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + 
        CSVFilePath +
        ";Extensions=asc,csv,tab,txt;Persist Security Info=False;";

    using (OdbcConnection Connection = new OdbcConnection(CSVConnectionString))
    {
        List<string> CSVHeaders = new List<string>();

        string SelectQuery = string.Format(@"SELECT TOP 1 * FROM [{0}]", CSVFileName);

        OdbcCommand Command = new OdbcCommand(SelectQuery, Connection);

        Connection.Open();

        OdbcDataReader Reader = Command.ExecuteReader(System.Data.CommandBehavior.CloseConnection);

        int ColumnCount = Reader.FieldCount;

        for (int column = 0; column < ColumnCount; column++)
        {
            CSVHeaders.Add(Reader.GetName(column));
        }

        Console.WriteLine(CSVHeaders[0]);
    }
}

of your code twice:

static void Main(string[] args)
{
    Harun00("withbombad.txt");
    Harun00("withbomgood.txt");
}

you get:

ï»¿FrsColÃ¼mn
FrsColümn

which shows that the driver reads a UTF-8 file with a BOM correctly, without any further ADO, if you follow the rule: define your CSV tables in a schema.ini file.
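If there are many CSV files, the schema.ini sections can be generated instead of written by hand. This is a minimal sketch under stated assumptions: comma-separated files with a header row and plain ASCII file names; the class name is hypothetical. It uses the schema.ini keys `Format`, `ColNameHeader` and `CharacterSet` but omits the optional `ColN=` lines that the answer's example declares, so whether the driver handles the BOM without them is worth verifying on your system.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SchemaIniWriter
{
    // Writes (overwrites) schema.ini in folder, with one section per CSV file,
    // telling the Microsoft Text Driver that each file is comma-delimited,
    // has a header row, and is UTF-8 encoded (code page 65001).
    // Assumption: file names are plain ASCII.
    public static string Write(string folder, IEnumerable<string> csvFileNames)
    {
        var sb = new StringBuilder();
        foreach (string name in csvFileNames)
        {
            sb.AppendLine("[" + name + "]");
            sb.AppendLine("Format=CSVDelimited");
            sb.AppendLine("ColNameHeader=True");
            sb.AppendLine("CharacterSet=65001");
            sb.AppendLine();
        }

        // schema.ini must sit in the same folder as the data files (the Dbq folder).
        string path = Path.Combine(folder, "schema.ini");
        // Encoding.ASCII has no preamble, so schema.ini gets no BOM of its own.
        File.WriteAllText(path, sb.ToString(), Encoding.ASCII);
        return path;
    }
}
```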

answered Oct 17 '22 16:10

Ekkehard.Horner
