
Normalization of Strings With String.ToUpperInvariant()

I am currently storing normalized versions of strings in my SQL Server database in lower case. For example, in my Users table, I have a UserName and a LoweredUserName field. Depending on the context, I either use T-SQL's LOWER() function or C#'s String.ToLower() method to generate the lower case version of the user name to fill the LoweredUserName field. According to Microsoft's guidelines and Visual Studio's code analysis rule CA1308, I should be using C#'s String.ToUpperInvariant() instead of ToLower(). According to Microsoft, this is both a performance and globalization issue: converting to upper case is safe, while converting to lower case can cause a loss of information (for example, the Turkish 'I' problem).
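To make the Turkish 'I' problem concrete, here is a minimal sketch (the class and helper names are made up for illustration, and it assumes the runtime has culture data for tr-TR available). It lower-cases the same string with Turkish casing rules and with the invariant culture, then prints the code units. Under Turkish rules the capital I is expected to become the dotless ı (U+0131) rather than i (U+0069), so the two results should not compare equal ordinally.

```csharp
using System;
using System.Globalization;

public static class TurkishIDemo
{
    // Lower-cases using the casing rules of the named culture.
    public static string LowerWithCulture(string value, string cultureName) =>
        value.ToLower(CultureInfo.GetCultureInfo(cultureName));

    // Formats each UTF-16 code unit as U+XXXX so look-alike letters can be told apart.
    public static string CodePoints(string value)
    {
        var parts = new string[value.Length];
        for (int i = 0; i < value.Length; i++)
            parts[i] = "U+" + ((int)value[i]).ToString("X4");
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        const string name = "ADMIN";

        // Culture-sensitive: depends on the culture passed in (assumed: tr-TR data is installed).
        string turkish = LowerWithCulture(name, "tr-TR");

        // Culture-insensitive: same result regardless of the current culture.
        string invariant = name.ToLowerInvariant();

        Console.WriteLine("tr-TR:     " + CodePoints(turkish));
        Console.WriteLine("invariant: " + CodePoints(invariant));
        Console.WriteLine("ordinal equal: " +
            string.Equals(turkish, invariant, StringComparison.Ordinal));
    }
}
```

The same concern applies to a parameterless String.ToLower() call, which uses the current culture, so a value normalized on a server running under a Turkish culture may not match one normalized elsewhere.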

If I move to using ToUpperInvariant for string normalization, I will have to change my database schema as well, since my schema is based on Microsoft's ASP.NET Membership framework (see this related question), which normalizes strings to lower case.

Isn't Microsoft contradicting itself by telling us to use upper case normalization in C#, while its own code in the Membership tables and procedures uses lower case normalization? Should I switch everything to upper case normalization, or just continue using lower case normalization?

Kevin Albrecht asked Apr 21 '09 17:04



2 Answers

According to CA1308, the reason to do this is that some characters cannot make a round trip when converted to lower case. The important thing is that you always normalize in one direction, so if your standard is to always convert to lower case, there is no reason to change it.
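One way to keep the direction consistent is to route every normalization through a single helper. The sketch below is only an illustration: UserNameNormalizer and its methods are hypothetical names, not part of the Membership framework, and it assumes lower case stays the chosen direction.

```csharp
using System;

public static class UserNameNormalizer
{
    // Hypothetical helper: the single place where the casing direction is chosen.
    // Uses the invariant culture so the result does not depend on the server's culture.
    public static string Normalize(string userName) => userName.ToLowerInvariant();

    // Compares two user names by their normalized forms, using an ordinal comparison.
    public static bool SameUser(string a, string b) =>
        string.Equals(Normalize(a), Normalize(b), StringComparison.Ordinal);

    public static void Main()
    {
        Console.WriteLine(SameUser("Kevin", "KEVIN")); // True
        Console.WriteLine(SameUser("Kevin", "Kevan")); // False
    }
}
```

If the value written to LoweredUserName and the value used for lookups both come from the same helper, the stored and queried forms are produced by the same rules.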

JoshBerke answered Oct 12 '22 13:10


To answer your first question: yes, Microsoft is a bit inconsistent. To answer your second question: no, do not switch anything until you have confirmed that this is causing a bottleneck in your application.

Think how much forward progress you can make on your project instead of wasting time switching everything. Your development time is much more valuable than the savings you would get from such a change.

Remember:

Premature optimization is the root of all evil (or at least most of it) in programming. - Donald Knuth

Andrew Hare answered Oct 12 '22 13:10