How do I ignore the UTF-8 Byte Order Marker in String comparisons?

Tags: c#, equality, unit-testing, visual-studio-2010, utf-8

I'm having a problem comparing strings in a unit test in C# 4.0 using Visual Studio 2010. The same test case works properly in Visual Studio 2008 (targeting .NET 3.5).

Here's the relevant code snippet:

byte[] rawData = GetData();
string data = Encoding.UTF8.GetString(rawData);

Assert.AreEqual("Constant", data, false, CultureInfo.InvariantCulture);

While debugging this test, the data string appears to the naked eye to contain exactly the same string as the literal. When I called data.ToCharArray(), I noticed that the first character of the string data has the value 65279 (U+FEFF), which is the Byte Order Mark. What I don't understand is why Encoding.UTF8.GetString() keeps this character around.

How do I get Encoding.UTF8.GetString() to not put the Byte Order Marker in the resulting string?

Update: The problem was that GetData(), which reads a file from disk, reads the data from the file using FileStream.readbytes(). I corrected this by using a StreamReader and converting the string to bytes using Encoding.UTF8.GetBytes(), which is what it should've been doing in the first place! Thanks for all the help.
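The fix described in the update could look roughly like the sketch below. The real GetData() is not shown in the question, so the method body, the `path` parameter, and the class name are assumptions made for illustration only.

```csharp
using System;
using System.IO;
using System.Text;

static class BomFreeFileDemo
{
    // Hypothetical stand-in for the question's GetData(); the path parameter is an assumption.
    public static byte[] GetData(string path)
    {
        string text;
        // StreamReader recognizes a leading UTF-8 BOM and consumes it while decoding.
        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            text = reader.ReadToEnd();
        }
        // GetBytes encodes only the characters in the string; it does not prepend a BOM.
        return Encoding.UTF8.GetBytes(text);
    }

    static void Main()
    {
        // Write a temporary file that starts with the UTF-8 BOM (EF BB BF).
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "Constant", new UTF8Encoding(true));

        string data = Encoding.UTF8.GetString(GetData(path));
        Console.WriteLine(data == "Constant"); // ordinal comparison of the decoded text
        File.Delete(path);
    }
}
```

If the caller only needs the text, returning the string from the reader directly would avoid the round trip through bytes.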


asked May 26 '10 17:05

Skrud

2 Answers

Well, I assume it's because the raw binary data includes the BOM. You could always remove the BOM yourself after decoding, if you don't want it - but you should consider whether the byte array should contain the BOM to start with.
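Removing the BOM after decoding might be sketched as follows. The helper name StripLeadingBom is hypothetical, and the sketch assumes that only a single leading U+FEFF needs to be dropped.

```csharp
using System;
using System.Text;

static class BomTrimDemo
{
    // Hypothetical helper: drops one leading U+FEFF character if the string starts with it.
    public static string StripLeadingBom(string s)
    {
        if (!string.IsNullOrEmpty(s) && s[0] == '\uFEFF')
        {
            return s.Substring(1);
        }
        return s;
    }

    static void Main()
    {
        byte[] withBom = { 0xEF, 0xBB, 0xBF, 0x41 };
        string decoded = Encoding.UTF8.GetString(withBom);

        Console.WriteLine(decoded.Length);                  // length including the BOM character
        Console.WriteLine(StripLeadingBom(decoded).Length); // length after stripping it
    }
}
```

The check compares a char rather than calling the culture-sensitive StartsWith(string) overload, which can treat U+FEFF as an ignorable character and report a match even when no BOM is present.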

EDIT: Alternatively, you could use a StreamReader to perform the decoding. Here's an example, showing the same byte array being converted into two characters using Encoding.GetString or one character via a StreamReader:

using System;
using System.IO;
using System.Text;

class Test
{
    static void Main()
    {
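        // UTF-8 BOM (EF BB BF) followed by the letter 'A'.
        byte[] withBom = { 0xef, 0xbb, 0xbf, 0x41 };

        // GetString decodes the BOM as the character U+FEFF, so the length is 2.
        string viaEncoding = Encoding.UTF8.GetString(withBom);
        Console.WriteLine(viaEncoding.Length);

        // StreamReader consumes the BOM while decoding, so the length is 1.
        string viaStreamReader;
        using (StreamReader reader = new StreamReader
               (new MemoryStream(withBom), Encoding.UTF8))
        {
            viaStreamReader = reader.ReadToEnd();
        }
        Console.WriteLine(viaStreamReader.Length);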
    }
}


answered Oct 16 '22 18:10

Jon Skeet

There is a slightly more efficient way to do it than creating StreamReader and MemoryStream:

1) If you know that there is always a BOM:

string viaEncoding = Encoding.UTF8.GetString(withBom, 3, withBom.Length - 3);

2) If you don't know, check:

string viaEncoding;
if (withBom.Length >= 3 && withBom[0] == 0xEF && withBom[1] == 0xBB && withBom[2] == 0xBF)
    viaEncoding = Encoding.UTF8.GetString(withBom, 3, withBom.Length - 3);
else
    viaEncoding = Encoding.UTF8.GetString(withBom);
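The same check can be written without hard-coding the three bytes by asking the encoding for its preamble. This is only a sketch: the helper name DecodeSkippingPreamble is hypothetical, and it assumes a non-null byte array.

```csharp
using System;
using System.Text;

static class PreambleDemo
{
    // Hypothetical helper: decodes bytes, skipping the encoding's preamble (BOM)
    // when the data starts with it.
    public static string DecodeSkippingPreamble(byte[] bytes, Encoding encoding)
    {
        byte[] preamble = encoding.GetPreamble();
        int offset = 0;

        if (preamble.Length > 0 && bytes.Length >= preamble.Length)
        {
            bool matches = true;
            for (int i = 0; i < preamble.Length; i++)
            {
                if (bytes[i] != preamble[i])
                {
                    matches = false;
                    break;
                }
            }
            if (matches)
            {
                offset = preamble.Length;
            }
        }

        return encoding.GetString(bytes, offset, bytes.Length - offset);
    }

    static void Main()
    {
        byte[] withBom = { 0xEF, 0xBB, 0xBF, 0x41 };
        byte[] withoutBom = { 0x41 };

        Console.WriteLine(DecodeSkippingPreamble(withBom, Encoding.UTF8).Length);
        Console.WriteLine(DecodeSkippingPreamble(withoutBom, Encoding.UTF8).Length);
    }
}
```

Because the preamble comes from the encoding instance, the same helper should also work for other encodings that define one, such as Encoding.Unicode.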

answered Oct 16 '22 17:10

Tergiver
