I wish to write an HTTP module that converts URLs to lowercase. My first attempt ignored international character sets and works great:
// Convert URL virtual path to lowercase
string lowercase = context.Request.FilePath.ToLowerInvariant();
// If anything changed then issue 301 Permanent Redirect
if (!lowercase.Equals(context.Request.FilePath, StringComparison.Ordinal))
{
    context.Response.RedirectPermanent(...lowercase URL...);
}
But what about cultures other than en-US? I referred to the Turkey Test to come up with a test URL:
http://example.com/Iıİi
This little insidious gem destroys any notion that case conversion in URLs is simple! Its lowercase and uppercase versions, respectively, are:
http://example.com/ııii
http://example.com/IIİİ
For case conversion to work with Turkish URLs, I first had to set the current culture of ASP.NET to Turkish:
<system.web>
    <globalization culture="tr-TR" />
</system.web>
Next, I had to change my code to use the current culture for the case conversion:
// Convert URL virtual path to lowercase
string lowercase = context.Request.FilePath.ToLower(CultureInfo.CurrentCulture);
// If anything changed then issue 301 Permanent Redirect
if (!lowercase.Equals(context.Request.FilePath, StringComparison.Ordinal))
{
    context.Response.RedirectPermanent(...);
}
But wait! Will StringComparison.Ordinal still work? Or should I use StringComparison.CurrentCulture? I'm really not certain of either!
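On the comparison question: the redirect check only needs to know whether lowercasing changed any character, and an ordinal comparison answers exactly that, whichever culture did the lowercasing. Below is a minimal sketch of that idea, not the module itself. The class and method names are made up for illustration, and the Turkish behavior assumes the runtime has tr-TR culture data available (that is, it is not running in globalization-invariant mode).

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of ASP.NET.
public static class LowercaseCheck
{
    // Lowercases a path using the casing rules of the given culture.
    public static string Lower(string path, CultureInfo culture)
    {
        return path.ToLower(culture);
    }

    // True when lowercasing changed at least one character.
    // Ordinal compares UTF-16 code units, so no culture rules are involved.
    public static bool NeedsRedirect(string path, CultureInfo culture)
    {
        return !string.Equals(Lower(path, culture), path, StringComparison.Ordinal);
    }
}
```

With new CultureInfo("tr-TR"), Lower("/I", ...) should give "/ı" (dotless i, U+0131), while CultureInfo.InvariantCulture gives "/i". NeedsRedirect reports true in both cases because the result differs from the input either way.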
Even if the above works, converting case with the current culture breaks matching against the NTFS file system! Let's say I have a static file with the name Iıİi.html:
http://example.com/Iıİi.html
Even though the Windows file system is case-insensitive, it does not apply culture-specific casing rules. Converting the above URL to lowercase with Turkish rules results in a 404 Not Found because the file system doesn't consider the two names equal:
http://example.com/ııii.html
The MSDN article, Best Practices for Using Strings in the .NET Framework, has a note (about halfway through the article):
Note: The string behavior of the file system, registry keys and values, and environment variables is best represented by StringComparison.OrdinalIgnoreCase.
Huh? Best represented??? Is that the best we can do in C#? So just what is the correct case conversion to match the file system? Who knows?!!? About all we can say is that string comparisons using the above will probably work MOST of the time.
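To make the note concrete, here is a small sketch contrasting an ordinal ignore-case comparison with a culture-sensitive one. The class name, method names, and sample file names are invented for the example, and the Turkish result described afterwards assumes the runtime has tr-TR culture data.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only.
public static class NameComparison
{
    // Culture-independent, case-insensitive comparison: the kind the
    // MSDN note says best represents file system behavior.
    public static bool OrdinalIgnoreCaseEquals(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    // Case-insensitive comparison using one specific culture's rules.
    public static bool CultureIgnoreCaseEquals(string a, string b, CultureInfo culture)
    {
        return string.Compare(a, b, culture, CompareOptions.IgnoreCase) == 0;
    }
}
```

OrdinalIgnoreCaseEquals("INDEX.HTML", "index.html") is true whatever the current culture is. CultureIgnoreCaseEquals with new CultureInfo("tr-TR") should report false for the same pair, because Turkish treats I and i as different letters.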
For static URLs, the comparison to match is StringComparison.OrdinalIgnoreCase. And please note there is no string.ToLowerOrdinal() method, so it's very difficult to know exactly what case conversion equates to the OrdinalIgnoreCase string comparison. Using string.ToLowerInvariant() is probably the best bet, yet it breaks language culture.
For dynamic URLs, the option is string.ToLower(CultureInfo.CurrentCulture), but it breaks file system matching and it is somewhat unclear what edge cases exist that may break this strategy.
Thus, it appears case conversion first requires detecting whether a URL is static or dynamic before choosing one of two conversion methods. For static URLs there is uncertainty about how to change case without breaking the Windows file system. For dynamic URLs it is questionable whether case conversion using culture will similarly break the URL.
Whew! Anyone have a solution to this mess? Or should I just close my eyes and pretend everything is ASCII?
I would challenge the premise here that there is any utility whatsoever in attempting to auto-convert URLs to lowercase.
Whether a full URL is case-sensitive or not depends entirely on the web server, web application framework, and underlying file system.
You're only guaranteed case-insensitivity in the scheme (http://, etc.) and hostname portions of the URL. And remember that not all URL schemes (file and news, for example) even include a hostname.
Everything else can be case-sensitive to the server, including paths (/), filenames, queries (?), fragments (#), and authority info (usernames/passwords before the @ in mailto, http, ftp, and some other schemes).
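If some normalization is still wanted, one conservative option consistent with the point above is to touch only the scheme and host. A minimal sketch, assuming .NET's System.Uri, which lowercases those two components when it parses an absolute URL and leaves the case of the rest alone. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class UrlNormalizer
{
    // Returns the URL with scheme and host lowercased by System.Uri.
    // Path, query, and fragment keep their original case.
    // Note: AbsoluteUri may return non-ASCII characters percent-encoded.
    public static string NormalizeSchemeAndHost(string url)
    {
        var uri = new Uri(url, UriKind.Absolute);
        return uri.AbsoluteUri;
    }
}
```

For example, NormalizeSchemeAndHost("HTTP://EXAMPLE.COM/Some/Path?Q=A#Frag") returns "http://example.com/Some/Path?Q=A#Frag".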