
ToUpperInvariant() – is MSDN wrong on its recommendation?

In Best Practices for Using Strings in the .NET Framework, StringComparison.OrdinalIgnoreCase is recommended for case-insensitive file paths. (Let's call it Statement A.)

I can agree with that, because I can create two files in the same directory:

é.txt
é.txt

Their filenames are not the same: the second one is composed of e plus a combining accent, so it actually contains two characters where the first has one. (You can try it yourself using copy-paste.)

If an invariant-culture comparison (rather than an ordinal comparison) were in effect, NTFS would not allow both files, because the same article explains that in the invariant culture a + ̊ = å.
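A minimal sketch of that difference, assuming the first name uses the precomposed U+00E9 and the second uses e followed by U+0301 (the class and method names are made up for illustration). The ordinal comparison looks only at code units, so it keeps the two names apart. The invariant-culture result depends on the runtime's globalization setup, so no output is claimed for it here.

```csharp
using System;

static class ComposedVsDecomposed
{
    // Assumed encodings of the two file names from the example above.
    public const string Precomposed = "\u00E9.txt";  // single code point U+00E9
    public const string Decomposed = "e\u0301.txt";  // 'e' + combining acute accent

    // Ordinal, case-insensitive: compares UTF-16 code units, no normalization.
    public static bool OrdinalEqual() =>
        string.Equals(Precomposed, Decomposed, StringComparison.OrdinalIgnoreCase);

    // Invariant culture, case-insensitive: linguistic comparison.
    public static bool InvariantEqual() =>
        string.Equals(Precomposed, Decomposed, StringComparison.InvariantCultureIgnoreCase);

    static void Main()
    {
        Console.WriteLine("Ordinal:   " + OrdinalEqual());   // False
        Console.WriteLine("Invariant: " + InvariantEqual());
    }
}
```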

But the article on String.ToUpperInvariant() gives a different recommendation (let's call it Statement B):

If you need the lowercase or uppercase version of an operating system identifier, such as a file name, named pipe, or registry key, use the ToLowerInvariant or ToUpperInvariant methods.

I need to build a collection of file paths (in fact a HashSet) to detect duplicates. So if I follow Statement B when creating the collection, I could end up with false positives, because the above-mentioned filenames é.txt and é.txt would be considered the same. Am I correct that Statement B in MSDN is misleading? Or am I missing something?

I'm about to build a library, preferably without known bugs from the start, so I don't want to neglect this.

Update:

Statement B seems to have one more issue: ToLowerInvariant() should not actually be used for this. The Best Practices article says: "DO: Use ToUpperInvariant rather than ToLowerInvariant when normalizing strings for comparison." The stated reason: "There is a small range of characters that do not roundtrip, and going to lowercase will make these characters unavailable." (source)

miroxlav, asked Sep 23 '15 13:09



1 Answer

Neither uppercasing nor lowercasing is correct when you want to compare strings for equality case-insensitively. In both variants there are characters that mess this up.

The correct way to compare strings case-insensitively is to use one of the case-insensitive StringComparison options (as you already know).

The right way to use a data structure case-insensitively is to pass it one of the StringComparer.*IgnoreCase comparers. For file paths that would be the ordinal one, in line with Statement A:

new HashSet<string>(StringComparer.OrdinalIgnoreCase)

Do not uppercase strings before adding them to a data structure. I would fail that in any code review.
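As a rough sketch of duplicate detection without any manual uppercasing, assuming the paths should follow the OrdinalIgnoreCase recommendation quoted in the question as Statement A (the class name, method name, and sample paths are invented for illustration):

```csharp
using System;
using System.Collections.Generic;

static class PathDuplicates
{
    // Returns every path that matches an earlier one when compared
    // case-insensitively, code unit by code unit (no Unicode normalization).
    public static List<string> FindDuplicates(IEnumerable<string> paths)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var duplicates = new List<string>();
        foreach (var path in paths)
        {
            // Add returns false when an equal element is already in the set.
            if (!seen.Add(path))
                duplicates.Add(path);
        }
        return duplicates;
    }

    static void Main()
    {
        var paths = new[]
        {
            @"C:\data\Report.txt",
            @"c:\DATA\report.TXT", // differs only in case: reported as a duplicate
            "\u00E9.txt",          // precomposed é
            "e\u0301.txt",         // e + combining accent: kept as a distinct name
        };
        foreach (var duplicate in FindDuplicates(paths))
            Console.WriteLine(duplicate); // prints only c:\DATA\report.TXT
    }
}
```

The original strings go into the set untouched, so nothing is lost to case conversion. The comparer alone decides what counts as equal.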

If you need the lowercase or uppercase version of an operating system identifier

You do not need such a thing. This statement does not apply to your case.

usr, answered Sep 21 '22 07:09