Strip Byte Order Mark from string in C#

In C#, I have a string that I'm obtaining from WebClient.DownloadString. I've tried setting client.Encoding to new UTF8Encoding(false), but that's made no difference - I still end up with a byte order mark for UTF-8 at the beginning of the result string. I need to remove this (to parse the resulting XML with LINQ), and want to do so in memory.

So I have a string that starts with \x00EF\x00BB\x00BF, and I want to remove that if it exists. Right now I'm using

if (xml.StartsWith(ByteOrderMarkUtf8))
{
    xml = xml.Remove(0, ByteOrderMarkUtf8.Length);
}

but that just feels wrong. I've tried all sorts of code with streams, GetBytes, and encodings, and nothing works. Can anyone provide the "right" algorithm to strip a BOM from a string?

605

asked Aug 23 '09 03:08

TrueWill

1 Answer

I ran into this recently after upgrading to .NET 4. Up to and including .NET 3.5, the simple answer is

String.Trim()

which removes the BOM.

However, in .NET 4 you need to change it slightly:

String.Trim(new char[]{'\uFEFF'});

That gets rid of the byte order mark again. You may also want to remove the ZERO WIDTH SPACE (U+200B):

String.Trim(new char[]{'\uFEFF','\u200B'});

You can use the same approach to remove other unwanted characters.
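For what it's worth, here is a minimal sketch of how this might look when applied to a string like the `xml` in the question. The class and method names (`BomStripper`, `StripLeadingBom`) are made up for illustration. It uses `TrimStart` rather than `Trim`, on the assumption that only characters at the start of the string should go.

```csharp
using System;

public static class BomStripper
{
    // A decoded UTF-8 BOM shows up in a .NET string as the single char U+FEFF.
    // U+200B (ZERO WIDTH SPACE) is included because .NET 4+ no longer trims it by default.
    private static readonly char[] LeadingJunk = { '\uFEFF', '\u200B' };

    // Hypothetical helper: removes any leading U+FEFF / U+200B characters.
    public static string StripLeadingBom(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        // TrimStart only touches the beginning, so trailing content is left alone.
        return text.TrimStart(LeadingJunk);
    }

    public static void Main()
    {
        string xml = "\uFEFF<root />";
        string cleaned = StripLeadingBom(xml);

        Console.WriteLine(xml.Length);     // 9 (BOM plus 8 visible characters)
        Console.WriteLine(cleaned.Length); // 8
    }
}
```

A string without a BOM passes through unchanged, so calling this unconditionally before parsing should be safe.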

Further information, from the String.Trim Method documentation:

The .NET Framework 3.5 SP1 and earlier versions maintain an internal list of white-space characters that this method trims. Starting with the .NET Framework 4, the method trims all Unicode white-space characters (that is, characters that produce a true return value when they are passed to the Char.IsWhiteSpace method). Because of this change, the Trim method in the .NET Framework 3.5 SP1 and earlier versions removes two characters, ZERO WIDTH SPACE (U+200B) and ZERO WIDTH NO-BREAK SPACE (U+FEFF), that the Trim method in the .NET Framework 4 and later versions does not remove. In addition, the Trim method in the .NET Framework 3.5 SP1 and earlier versions does not trim three Unicode white-space characters: MONGOLIAN VOWEL SEPARATOR (U+180E), NARROW NO-BREAK SPACE (U+202F), and MEDIUM MATHEMATICAL SPACE (U+205F).
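If you want to check this on your own runtime, a small sketch like the following shows whether the parameterless `Trim()` will touch a given character. It assumes .NET Framework 4 or later, and `WhiteSpaceCheck` is just an illustrative name.

```csharp
using System;

public static class WhiteSpaceCheck
{
    // On .NET 4 and later, parameterless Trim() removes exactly the characters
    // for which Char.IsWhiteSpace returns true.
    public static bool IsTrimmedByDefault(char c)
    {
        return char.IsWhiteSpace(c);
    }

    public static void Main()
    {
        // Characters named in the documentation excerpt above.
        char[] candidates = { '\uFEFF', '\u200B', '\u202F', '\u205F' };

        foreach (char c in candidates)
        {
            Console.WriteLine("U+{0:X4}: {1}", (int)c, IsTrimmedByDefault(c));
        }
    }
}
```

U+FEFF and U+200B should both report False here, which is why they have to be passed to `Trim` explicitly.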

110

answered Sep 21 '22 15:09

PJUK

Tags: string, c#, encoding
