Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Could string comparisons really differ based on culture when the string is guaranteed not to change?

I'm reading encrypted credentials/connection strings from a config file. Resharper tells me, "String.IndexOf(string) is culture-specific here" on this line:

if (line.Contains("host=")) {
    _host = line.Substring(line.IndexOf(
        "host=") + "host=".Length, line.Length - "host=".Length);

...and so wants to change it to:

if (line.Contains("host=")) {
    _host = line.Substring(line.IndexOf("host=", System.StringComparison.Ordinal) + "host=".Length, line.Length -   "host=".Length);

The value I'm reading will always be "host=" regardless of where the app may be deployed. Is it really sensible to add this "System.StringComparison.Ordinal" bit?

More importantly, could it hurt anything (to use it)?

like image 371
B. Clay Shannon-B. Crow Raven Avatar asked Jun 07 '12 23:06

B. Clay Shannon-B. Crow Raven


3 Answers

Absolutely. Per MSDN (http://msdn.microsoft.com/en-us/library/d93tkzah.aspx),

This method performs a word (case-sensitive and culture-sensitive) search using the current culture.

So you may get different results if you run it under a different culture (via regional and language settings in Control Panel).

In this particular case, you probably won't have a problem, but throw an i in the search string and run it in Turkey and it will probably ruin your day.

See MSDN: http://msdn.microsoft.com/en-us/library/ms973919.aspx

These new recommendations and APIs exist to alleviate misguided assumptions about the behavior of default string APIs. The canonical example of bugs emerging where non-linguistic string data is interpreted linguistically is the "Turkish-I" problem.

For nearly all Latin alphabets, including U.S. English, the character i (\u0069) is the lowercase version of the character I (\u0049). This casing rule quickly becomes the default for someone programming in such a culture. However, in Turkish ("tr-TR"), there exists a capital "i with a dot," character (\u0130), which is the capital version of i. Similarly, in Turkish, there is a lowercase "i without a dot," or (\u0131), which capitalizes to I. This behavior occurs in the Azeri culture ("az") as well.

Therefore, assumptions normally made about capitalizing i or lowercasing I are not valid among all cultures. If the default overloads for string comparison routines are used, they will be subject to variance between cultures. For non-linguistic data, as in the following example, this can produce undesired results:

    Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US")
Console.WriteLine("Culture = {0}",
   Thread.CurrentThread.CurrentCulture.DisplayName);
Console.WriteLine("(file == FILE) = {0}", 
   (String.Compare("file", "FILE", true) == 0));

Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
Console.WriteLine("Culture = {0}",
   Thread.CurrentThread.CurrentCulture.DisplayName);
Console.WriteLine("(file == FILE) = {0}", 
   (String.Compare("file", "FILE", true) == 0));

Because of the difference of the comparison of I, results of the comparisons change when the thread culture is changed. This is the output:

Culture = English (United States)
(file == FILE) = True
Culture = Turkish (Turkey)
(file == FILE) = False

Here is an example without case:

var s1 = "é"; //é as one character (ALT+0233)
var s2 = "é"; //'e', plus combining acute accent U+301 (two characters)

Console.WriteLine(s1.IndexOf(s2, StringComparison.Ordinal)); //-1
Console.WriteLine(s1.IndexOf(s2, StringComparison.InvariantCulture)); //0
Console.WriteLine(s1.IndexOf(s2, StringComparison.CurrentCulture)); //0
like image 71
Mark Sowul Avatar answered Nov 09 '22 14:11

Mark Sowul


CA1309: UseOrdinalStringComparison

It doesn't hurt to not use it, but "by explicitly setting the parameter to either the StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase, your code often gains speed, increases correctness, and becomes more reliable.".


What exactly is Ordinal, and why does it matter to your case?

An operation that uses ordinal sort rules performs a comparison based on the numeric value (Unicode code point) of each Char in the string. An ordinal comparison is fast but culture-insensitive. When you use ordinal sort rules to sort strings that start with Unicode characters (U+), the string U+xxxx comes before the string U+yyyy if the value of xxxx is numerically less than yyyy.

And, as you stated... the string value you are reading in is not culture sensitive, so it makes sense to use an Ordinal comparison as opposed to a Word comparison. Just remember, Ordinal means "this isn't culture sensitive".

like image 43
myermian Avatar answered Nov 09 '22 14:11

myermian


To answer your specific question: No, but a static analysis tool is not going to be able to realize that your input value will never have locale-specific information in it.

like image 7
500 - Internal Server Error Avatar answered Nov 09 '22 12:11

500 - Internal Server Error