In C#, you can compare two strings with String.Equals and supply a StringComparison.
I've recently been looking to update my archaic method of comparing with ToLower(), because I read that it doesn't work for all languages/cultures.
From what I can tell, the comparison types are used to determine order: when confronted with a list containing aé and ae, they decide which should appear first (some cultures order things differently).
With string.Equals, ordering is not important. Is it therefore safe to assume that many of the options are irrelevant, and that only Ordinal and OrdinalIgnoreCase are important?
The MSDN article for String.Equals says:
The comparisonType parameter indicates whether the comparison should use the current or invariant culture, honor or ignore the case of the two strings being compared, or use word or ordinal sort rules.
string.Equals(myString, theirString, StringComparison.OrdinalIgnoreCase)
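As a minimal sketch of the call above, the following wraps it in a helper so it can replace a ToLower()-based comparison. The class and method names (EqualityDemo, SameIgnoringCase) are hypothetical and only serve the illustration.

```csharp
using System;

public static class EqualityDemo
{
    // Hypothetical helper: case-insensitive equality that applies
    // no culture-specific rules (ordinal, ignoring case).
    public static bool SameIgnoringCase(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        Console.WriteLine(SameIgnoringCase("Windows", "windows")); // True
        Console.WriteLine(SameIgnoringCase("Windows", "Window"));  // False
    }
}
```

Unlike a.ToLower() == b.ToLower(), this does not allocate lowered copies and does not depend on the current thread's culture.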
I'd also be interested to know how the sort method works internally: does it use String.Compare to work out the relative positioning of two strings?
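On the sorting side, a rough sketch: as far as I know, calling Sort() on a list of strings without arguments falls back to the default string comparer, which is culture-sensitive, so passing an explicit comparer makes the ordering rule visible. The example below assumes StringComparer.Ordinal is the rule you want; SortDemo and SortOrdinal are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class SortDemo
{
    // Hypothetical helper: returns a copy sorted by UTF-16 code unit
    // values, independent of the current culture.
    public static List<string> SortOrdinal(IEnumerable<string> items)
    {
        var sorted = new List<string>(items);
        sorted.Sort(StringComparer.Ordinal);
        return sorted;
    }

    public static void Main()
    {
        // "a\u00E9" is "aé" written with an escape to avoid source-encoding issues.
        var words = new[] { "a\u00E9", "af", "ae" };

        // Ordinal order puts é (U+00E9) after every ASCII letter:
        // ae, af, aé
        Console.WriteLine(string.Join(", ", SortOrdinal(words)));
    }
}
```

A culture-sensitive comparer such as StringComparer.CurrentCulture would typically place aé next to ae instead, which is the ordering difference described in the question.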
The StringComparison enumeration is used to specify whether a string comparison should use the current culture or the invariant culture, word or ordinal sort rules, and be case-sensitive or case-insensitive. When you call a string comparison method such as String.Compare, String.Equals, or String.
In C, by contrast, two strings can be compared with strcmp(), a C library function that compares two strings lexicographically. The function returns 0 if the two strings are equal.
Ordinal comparisons are string comparisons in which each byte of each string is compared without linguistic interpretation; for example, "windows" does not match "Windows".
Case-insensitive comparisons are culture dependent. For example, in Turkish culture, i is not the lowercase of I. In that culture, I is paired with ı, and İ is paired with i. See Dotted and dotless I on Wikipedia.
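The Turkish pairing can be observed with a small sketch like the one below. It assumes the runtime has culture data for tr-TR available (under invariant globalization mode the culture may be missing or behave like the invariant culture); TurkishIDemo and its methods are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class TurkishIDemo
{
    // Hypothetical helper: compares two strings by upper-casing both
    // with the casing rules of the named culture.
    public static bool EqualsViaCultureUpper(string a, string b, string cultureName)
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        return a.ToUpper(culture) == b.ToUpper(culture);
    }

    // Culture-independent alternative.
    public static bool EqualsOrdinalIgnoreCase(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        // With Turkish rules, "i" upper-cases to İ (U+0130), not I,
        // so this is expected to differ from the ordinal result.
        Console.WriteLine(EqualsViaCultureUpper("i", "I", "tr-TR"));

        // Ordinal, case-insensitive: the same on every machine.
        Console.WriteLine(EqualsOrdinalIgnoreCase("i", "I")); // True
    }
}
```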
There are a number of weird effects related to culture-sensitive string operations. For example, "KonNy".StartsWith("Kon") can return false.
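A short sketch of the difference: StartsWith(string) without a StringComparison argument uses the current culture, so its result can vary between machines, while the ordinal overload only compares character values. PrefixDemo and HasPrefixOrdinal are hypothetical names.

```csharp
using System;

public static class PrefixDemo
{
    // Hypothetical helper: prefix test on raw character values,
    // with no linguistic interpretation.
    public static bool HasPrefixOrdinal(string text, string prefix) =>
        text.StartsWith(prefix, StringComparison.Ordinal);

    public static void Main()
    {
        // Culture-sensitive: the result depends on the current culture.
        Console.WriteLine("KonNy".StartsWith("Kon"));

        // Ordinal: always True for this input.
        Console.WriteLine(HasPrefixOrdinal("KonNy", "Kon")); // True
    }
}
```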
So I recommend switching to culture-insensitive comparisons even for seemingly harmless operations.
And even with culture-insensitive operations there is plenty of unintuitive behavior in Unicode, such as multiple representations of the same glyph, different code points that look identical, and zero-width characters that are ignored by some operations but observed by others.
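To illustrate the "multiple representations of the same glyph" case, here is a small sketch comparing a precomposed é with its decomposed form. It assumes a runtime with full Unicode normalization support; NormalizationDemo and EqualAfterNormalizing are hypothetical names, and normalizing to form C before an ordinal comparison is one possible mitigation, not the only one.

```csharp
using System;
using System.Text;

public static class NormalizationDemo
{
    // Hypothetical helper: normalizes both strings to form C,
    // then compares them ordinally.
    public static bool EqualAfterNormalizing(string a, string b) =>
        string.Equals(
            a.Normalize(NormalizationForm.FormC),
            b.Normalize(NormalizationForm.FormC),
            StringComparison.Ordinal);

    public static void Main()
    {
        string precomposed = "\u00E9";  // é as a single code point
        string decomposed = "e\u0301";  // e followed by a combining acute accent

        // Same glyph on screen, different code points underneath.
        Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.Ordinal)); // False
        Console.WriteLine(EqualAfterNormalizing(precomposed, decomposed));                   // True
    }
}
```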