
someString.IndexOf(someString) returns 1 instead of 0 under .NET 4

Tags: string, c#, .net-3.5, .net-4.0

We have recently upgraded all our projects from .NET 3.5 to .NET 4. I have come across a rather strange issue with respect to string.IndexOf().

My code obviously does something slightly different, but in the process of investigating the issue, I found that calling IndexOf() on a string with itself returned 1 instead of 0. In other words:

    string text = "\xAD\x2D";          // problem happens with "-dely N.China", too
    int index = text.IndexOf(text);    // see update note below

Gave me an index of 1, instead of 0. A couple of things to note about this problem:

The problem seems related to these hyphens (the first character is the Unicode soft hyphen, the second is a regular hyphen).
I have double-checked: this does not happen in .NET 3.5 but does in .NET 4.
Changing the IndexOf() to do an ordinal compare fixes the issue, so for some reason that first character is ignored by the default IndexOf.

Does anyone know why this happens?

EDIT

Sorry guys, I made a bit of a stuff-up in the original post and got the hidden dash in there twice. I have updated the string; it should now return an index of 1 instead of 2, as long as you paste it into the correct editor.

Update:

Changed the original problem string to one where every actual character is clearly visible (using escaping). This simplifies the question a bit.

556

asked Jul 13 '12 09:07

knersis

2 Answers

Your string consists of two characters: a soft hyphen (Unicode code point 173, U+00AD) and a hyphen (Unicode code point 45, U+002D).

Wiki: According to the Unicode standard, a soft hyphen is not displayed if the line is not broken at that point.

When using "\xAD\x2D".IndexOf("\xAD\x2D") in .NET 4, it seems to ignore that you're looking for the soft hyphen, returning a starting index of 1 (the index of \x2D). In .NET 3.5, this returns 0.

More fun, if you run this code (so when only looking for the soft hyphen):

    string text = "\xAD\x2D";
    string shy = "\xAD";
    int i1 = text.IndexOf(shy);

then i1 becomes 0, regardless of the .NET version used. The result of text.IndexOf(text); varies indeed, which at a glance looks like a bug to me.
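To check which behaviour a given runtime exhibits, the two searches above can be run side by side with explicit comparison modes. This is only a minimal sketch: the class name IndexOfProbe and the helper Probe are made up for illustration, and the culture-sensitive results are deliberately not annotated because they depend on the framework version and the collation library underneath it.

```csharp
using System;

static class IndexOfProbe
{
    // \u00AD is the soft hyphen, \u002D the regular hyphen (the same string as "\xAD\x2D").
    public const string Text = "\u00AD\u002D";
    public const string Shy = "\u00AD";

    // Hypothetical helper: runs one search under the given comparison mode.
    public static int Probe(string source, string value, StringComparison mode)
    {
        return source.IndexOf(value, mode);
    }

    static void Main()
    {
        foreach (StringComparison mode in new[]
        {
            StringComparison.CurrentCulture,    // what IndexOf(string) uses by default
            StringComparison.InvariantCulture,
            StringComparison.Ordinal
        })
        {
            // Prints the index found for the soft hyphen alone and for the whole string.
            Console.WriteLine("{0}: shy={1}, self={2}",
                mode, Probe(Text, Shy, mode), Probe(Text, Text, mode));
        }
    }
}
```

Only the Ordinal line is guaranteed to be the same everywhere (0 for both searches); the other two lines are what you would compare between .NET 3.5 and .NET 4.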

As far as I can track back through the framework, older .NET versions use an InternalCall to IndexOfString() (I can't figure out to which API call that goes), while from .NET 4 a QCall to InternalFindNLSStringEx() is made, which in turn calls FindNLSStringEx().

The issue (I really can't figure out if this is intended behaviour) indeed occurs when calling FindNLSStringEx:

    LPCWSTR lpStringSource = L"\xAD\x2D";
    LPCWSTR lpStringValue = L"\xAD";

    int length;

    int i = FindNLSStringEx(
        LOCALE_NAME_SYSTEM_DEFAULT,
        FIND_FROMSTART,
        lpStringSource,
        -1,
        lpStringValue,
        -1,
        &length,
        NULL,
        NULL,
        1);

    Console::WriteLine(i);

    i = FindNLSStringEx(
        LOCALE_NAME_SYSTEM_DEFAULT,
        FIND_FROMSTART,
        lpStringSource,
        -1,
        lpStringSource,
        -1,
        &length,
        NULL,
        NULL,
        1);

    Console::WriteLine(i);

    Console::ReadLine();

Prints 0 and then 1. Note that length, an out parameter indicating the length of the found string, is 0 after the first call and 1 after the second; the soft hyphen is counted as having a length of 0.
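A match length of 0 is consistent with the soft hyphen being treated as an ignorable character in linguistic comparisons. The same idea can be probed from C# without P/Invoke. This is a rough sketch, assuming a runtime with culture-sensitive collation enabled; the class and method names are illustrative only, and the linguistic result is printed without an expected value because it depends on the platform.

```csharp
using System;
using System.Globalization;

static class IgnorableDemo
{
    // Culture-sensitive (linguistic) equality, using the invariant culture.
    public static bool LinguisticEquals(string a, string b)
    {
        return string.Compare(a, b, CultureInfo.InvariantCulture, CompareOptions.None) == 0;
    }

    // Ordinal equality: compares the UTF-16 code units one by one.
    public static bool OrdinalEquals(string a, string b)
    {
        return string.Equals(a, b, StringComparison.Ordinal);
    }

    static void Main()
    {
        string shy = "\u00AD"; // soft hyphen

        // If this prints True, the soft hyphen carries no weight in linguistic comparisons.
        Console.WriteLine(LinguisticEquals(shy, string.Empty));

        // Ordinal comparison always sees one character versus none.
        Console.WriteLine(OrdinalEquals(shy, string.Empty)); // False
    }
}
```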

The workaround is to use text.IndexOf(text, StringComparison.OrdinalIgnoreCase), as you've noted. This makes a QCall to InternalCompareStringOrdinalIgnoreCase(), which in turn calls FindStringOrdinal(), which returns 0 for both cases.
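If the search is done in many places, the workaround can be wrapped so that call sites cannot fall back to the culture-sensitive default by accident. A sketch, assuming case-sensitive matching is what you want; the extension method name OrdinalIndexOf is invented here, and you would swap in StringComparison.OrdinalIgnoreCase if case should be ignored.

```csharp
using System;

static class StringSearchExtensions
{
    // Hypothetical helper: always searches by UTF-16 code unit, never by culture rules.
    public static int OrdinalIndexOf(this string source, string value)
    {
        if (source == null) throw new ArgumentNullException("source");
        return source.IndexOf(value, StringComparison.Ordinal);
    }
}

static class Program
{
    static void Main()
    {
        string text = "\u00AD\u002D"; // soft hyphen followed by a regular hyphen

        Console.WriteLine(text.OrdinalIndexOf(text));      // 0
        Console.WriteLine(text.OrdinalIndexOf("\u002D"));  // 1
    }
}
```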

166

answered Oct 13 '22 10:10

CodeCaster

It seems to be a bug in .NET 4. The new changes were reverted in .NET 4 Beta 1 to the previous behavior, the same as in .NET 2.0/3.0/3.5.

What's New in the BCL in .NET 4.0 CTP (MSDN blogs):

String Security Changes in .NET 4

The default partial matching overloads on System.String (StartsWith, EndsWith, IndexOf, and LastIndexOf) have been changed to be culture-agnostic (ordinal) by default.

This change affected the behavior of the String.IndexOf method by changing it to perform an ordinal (byte-for-byte) comparison by default, and it will be changed to use CultureInfo.InvariantCulture instead of CultureInfo.CurrentCulture.

UPDATE for .NET 4 Beta 1

In order to maintain high compatibility between .NET 4 and previous releases, we have decided to revert this change. The behavior of String's default partial matching overloads and String and Char's ToUpper and ToLower methods now behave the same as they did in .NET 2.0/3.0/3.5. The change back to the original behavior is present in .NET 4 Beta 1.

To fix this, change the string comparison method to an overload that accepts the System.StringComparison enumeration as a parameter, and specify either Ordinal or OrdinalIgnoreCase.

    // string contains 'unicode dash' \x2D
    string text = "\xAD\x2D";

    // works in .NET 2.0/3.0/3.5 and .NET 4 Beta 1 and later
    // but seems to be buggy in .NET 4 because of 'culture-sensitive' comparison
    int index = text.IndexOf(text);

    // fixed version
    index = text.IndexOf(text, StringComparison.Ordinal);

answered Oct 13 '22 10:10

Ria
