
Invalid CultureInfo no longer throws CultureNotFoundException

Creating a CultureInfo with es-CA, which is obviously not a real culture, should throw an exception, but it no longer does.

This previously threw a CultureNotFoundException: new CultureInfo("es-CA"). It now seems to fall back to es with an "Unknown Locale" display name. Something like xy-ZZ also works, which is rather odd.

Why does this no longer throw an exception? Was this changed in a recent version of .NET?

Update 1

The documentation mentions the following:

if the operating system does not support that culture, and if name is not the name of a supplementary or replacement culture, the method throws a CultureNotFoundException exception.

Testing this on Windows 7, it throws CultureNotFoundException but on Windows 10 it does not throw the exception.

Filip Ekberg asked Jan 28 '16 23:01



2 Answers

Adding an answer based on the comments.

Because of changes in Windows' design, a culture name is no longer "invalid" as long as it is well-formed per BCP-47, so instead of throwing an exception, .NET Framework and .NET Core accept the new culture.

See the GitHub discussion, and the quote below:

As the framework depends on the OS for getting the cultures, the OS's is moving to the model any BCP-47 culture name become valid even the OS is not supporting it. for instance, Windows 10 is supporting any well formed culture name even the OS don't have real data for such culture. for example, if trying to create a culture "xx-XXXX" in Windows 10 it will succeed. considering that, it doesn't make sense to have culture enumeration as any set we return doesn't mean these are the only supported cultures. looking at your issue, you workaround is acceptable. if really want to have more better way, we can consider something like CultureInfo.TryGetCulture() but as I said previously moving forward almost any culture will be valid.
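
The quote floats CultureInfo.TryGetCulture() only as an idea; it is not an existing API. As a rough sketch of what such a helper might look like in user code, the class and method names below (CultureLookup, TryGetCulture) are hypothetical, and treating a blank name as a failure is an assumption of this sketch (the constructor itself maps an empty string to the invariant culture):

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of .NET: wraps the constructor so that
// call sites do not need their own try/catch.
public static class CultureLookup
{
    public static bool TryGetCulture(string name, out CultureInfo culture)
    {
        culture = null;

        // Assumption of this sketch: null/blank names are rejected rather
        // than mapped to CultureInfo.InvariantCulture.
        if (string.IsNullOrWhiteSpace(name)) return false;

        try
        {
            culture = new CultureInfo(name);
            return true;
        }
        catch (CultureNotFoundException)
        {
            // Thrown when the runtime/OS rejects the name.
            return false;
        }
    }
}
```

Note that on Windows 10 a helper like this would still return true for almost any well-formed name, which is exactly the point the quote makes; it only tidies up the call site.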

Lex Li answered Nov 02 '22 13:11



I'm posting this as an answer to two questions that are almost asked in the OP's question-post:

After Windows 10's breaking changes to support BCP-47...

  1. How can I tell if a given CultureInfo object is a "real" Culture, or a fake/contrived/private CultureInfo created from-scratch in code?
  2. How can I tell if a user-supplied String cultureName value is valid for new CultureInfo(String) and that the runtime environment (.NET and/or OS) has meaningful culture data for that name (more than just the DisplayName)?

Question 1: Validating a given CultureInfo instance:

As per the documentation for CultureTypes, prior to Windows 10, if the CultureInfo.CultureTypes property has the flag UserCustomCulture then it was a custom culture. Since Windows 10, the UserCustomCulture flag indicates custom cultures, but also "system cultures that are not backed by a complete set of cultural data and that do not have unique local identifiers".

So if you want to validate a CultureInfo on Windows 10 identically as though it were on Windows 8.1 or earlier, just check that:

  1. The CultureInfo.CultureTypes does not have the CultureTypes.UserCustomCulture flag set.
  2. If it does have UserCustomCulture, ensure CultureInfo.ThreeLetterWindowsLanguageName != "ZZZ"
    • The "ZZZ" magic-string seems to be within Windows itself, and it only appears on Windows 10 or later.
    • .NET Core's own test-cases include a test for it, but never explain it beyond the comment ".GetThreeLetterWindowsLanguageName(cultureName) ?? "ZZZ" /* default lang name */;".

So this works for me:

public static Boolean ValidateCultureInfoWithPreWindows10Logic( CultureInfo ci )
{
    Boolean hasUserCustom = ( ci.CultureTypes & CultureTypes.UserCustomCulture ) == CultureTypes.UserCustomCulture;
    if( hasUserCustom )
    {
        if( ci.ThreeLetterWindowsLanguageName == "ZZZ" )
        {
            // Windows doesn't have a name for this language - this CultureInfo is invalid under Windows 8.1 or earlier.
            return false;
        }
        else
        {
            // The `UserCustomCulture` flag means *some* CultureData is missing, but not enough to make them useless.
            // On both Win8 and Win10, the same 8 Neutral Cultures match here: [ jv, jv-Latn, mg, nqo, sn, sn-Latn, zgh, zgh-Tfng ]
            return true;
        }
    }
    else
    {
        // The `UserCustomCulture` flag is not set, which means 100% of the CultureInfo's CultureData exists in the system.
        return true;
    }
}

Question 2: Validating a given String cultureName:

  • Remember that a culture-name is hierarchical, with 3 main levels:

    • Invariant = CultureInfo.InvariantCulture.
    • Neutral = a language-name without a region, e.g. en, fr, etc.
    • Specific = a language-name for a specific region, e.g. en-US, en-GB, fr-CA, fr-FR.
    • Additionally there are some names for sub-specific-regions, e.g. ca-ES-valencia. I've never encountered more than 3 levels of depth, though.
  • Validating a cultureName depends on what your business/domain/application requirements are:

    • If you want to require the name to match an OS-known language and region, then it's sufficient to do ValidateCultureInfoWithPreWindows10Logic( new CultureInfo( cultureName ) ) (after validating that the format of cultureName complies with BCP-47, of course).
    • If you want to require the name to match an OS-known language, but allow any OS-known region to be specified, even if the OS doesn't have Specific CultureData for it (e.g. when using CultureInfo.CreateSpecificCulture("en-FR")) then checking ci.ThreeLetterWindowsLanguageName != "ZZZ" is sufficient.
    • If you want to require the name to match an OS-known language, but allow any region to be specified, even if the OS doesn't even know about the region, then it's complicated...

Here's a table showing results of new CultureInfo vs CultureInfo.CreateSpecificCulture on Windows 10 vs. Server 2012 R2, and .NET 4.8 vs .NET 6:

| Expression | Windows 10 + .NET 6 | Windows 10 + .NET 4.8 | Windows Server 2012 R2 + .NET 4.8 |
|---|---|---|---|
| CultureInfo ci1 = new CultureInfo("en-FR") | | | |
| ci1.DisplayName | "English (France)" | "Unknown Locale (en-FR)" | CultureNotFoundException |
| ci1.ThreeLetterWindowsLanguageName | "ZZZ" | "ENU" | CultureNotFoundException |
| ci1.CultureTypes | SpecificCultures \| UserCustomCulture \| InstalledWin32Cultures | SpecificCultures \| UserCustomCulture | CultureNotFoundException |
| CultureInfo spec = CultureInfo.CreateSpecificCulture("en-FR") | | | |
| spec.DisplayName | "English (France)" | "Unknown Locale (en-FR)" | "English (United States)" |
| spec.ThreeLetterWindowsLanguageName | "ZZZ" | "ENU" | "ENU" |
| spec.CultureTypes | SpecificCultures \| UserCustomCulture \| InstalledWin32Cultures | SpecificCultures \| UserCustomCulture | SpecificCultures \| InstalledWin32Cultures \| FrameworkCultures |

So far, so very inconsistent.
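
If you want to see which column of that table your own machine matches, something like the following might help. It is only a sketch: the class and method names (CultureTableProbe, Describe, Probe) are made up for illustration, and the output depends entirely on your OS and .NET version, so no expected output is shown.

```csharp
using System;
using System.Globalization;

// Hypothetical diagnostic helper: prints the same three properties as the
// table, for both new CultureInfo(name) and CreateSpecificCulture(name).
public static class CultureTableProbe
{
    public static string Describe(string label, CultureInfo ci)
    {
        return $"{label}: DisplayName=\"{ci.DisplayName}\", " +
               $"ThreeLetterWindowsLanguageName=\"{ci.ThreeLetterWindowsLanguageName}\", " +
               $"CultureTypes={ci.CultureTypes}";
    }

    // Returns two lines: one for the constructor, one for CreateSpecificCulture.
    public static string[] Probe(string name)
    {
        string viaCtor;
        try { viaCtor = Describe("new CultureInfo", new CultureInfo(name)); }
        catch (CultureNotFoundException) { viaCtor = "new CultureInfo: CultureNotFoundException"; }

        string viaSpecific;
        try { viaSpecific = Describe("CreateSpecificCulture", CultureInfo.CreateSpecificCulture(name)); }
        catch (CultureNotFoundException) { viaSpecific = "CreateSpecificCulture: CultureNotFoundException"; }

        return new[] { viaCtor, viaSpecific };
    }
}
```

For example, foreach (var line in CultureTableProbe.Probe("en-FR")) Console.WriteLine(line); prints one line per creation path.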

If you want to allow arbitrary language names, even if the OS doesn't know about the language (let alone the region) - be it Neutral or Specific CultureInfo... uhh... I'll have to answer that question later.


Other tips: How to reliably validate cultureName when you want it restricted to OS-supported cultures (Neutral and/or Specific):

A quick-fix is to have this:

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class KnownCultureInfoNameValidator
{

private static readonly HashSet<String> _preWindows10BuiltInCustomNames = new String[]
{
     "jv", "jv-Latn", "mg", "nqo", "sn", "sn-Latn", "zgh", "zgh-Tfng"
}
    .ToHashSet();

private static readonly HashSet<String> _knownLanguages = BuildHashSet( CultureInfo.GetCultures( CultureTypes.NeutralCultures ) );

private static readonly HashSet<String> _knownSpecific = BuildHashSet( CultureInfo.GetCultures( CultureTypes.SpecificCultures ) );

private static HashSet<String> BuildHashSet( IEnumerable<CultureInfo> cultures )
{
    return cultures
        .Where( ci => ci.ThreeLetterWindowsLanguageName != "ZZZ" )
        .Where( ci => ci.LCID != 127 ) // Exclude InvariantCulture
#if LIKE_PRE_WINDOWS_10
        .Where( ci =>
            _preWindows10BuiltInCustomNames.Contains( ci.Name )
            ||
            ( ci.CultureTypes & CultureTypes.UserCustomCulture ) == 0
        )
#endif
        .Select( ci => ci.Name )
        .ToHashSet();
}

// Only returns true if `cultureName` is an OS-known culture with sufficient OS-provided culture data. This method will return false for partially-known cultures.
public static Boolean ValidateCultureName( String cultureName, Boolean allowNeutral, Boolean allowSpecific )
{
     if( allowNeutral && _knownLanguages.Contains( cultureName ) ) return true;

     if( allowSpecific && _knownSpecific.Contains( cultureName ) ) return true;

     return false;
}

}

Research:

  • I've been poring over the internals of .NET's CultureInfo and (internal) CultureData; here are my findings:

    • When a new CultureData instance is created using any of the String name constructors (including internally), a new empty CultureData object is created, and then its sRealName and bUseOverrides fields set with the earlier cultureName and useUserOverride values (respectively) from the CultureInfo's constructor call-site.

    • This CultureData is then passed into a function nativeInitCultureData that's internal to the .NET CLR runtime (i.e. MethodImplOptions.InternalCall).

      • Notice how the .NET Framework 4.8 code (i.e. the source on referencesource.microsoft.com) basically assumes that sWindowsName is the official name for a CultureInfo, so provided it's non-null it must be good (e.g. in DoGetLocaleInfoInt):

        int DoGetLocaleInfoInt(uint lctype)
        {
            // Ask OS for data, note that we presume it returns success, so we have to know that
            // sWindowsName is valid before calling.
            Contract.Assert(this.sWindowsName != null, "[CultureData.DoGetLocaleInfoInt] Expected this.sWindowsName to be populated by COMNlsInfo::nativeInitCultureData already");
            int result = CultureInfo.nativeGetLocaleInfoExInt(this.sWindowsName, lctype);

            return result;
        }
        
    • So we can't use CultureInfo.LCID because that's now 4096 == 0x1000 for system-provided, but partial, CultureInfo objects - just as it is for "fake" CultureInfo objects.

    • We can't use CultureInfo.CompareInfo.LCID either, because there's still a lot of "real" (but also incomplete) system-provided cultures with 0x1000 there, such as

    • Because Windows 10 now always returns a non-null string for sWindowsName whenever any BCP-47-compliant cultureName is passed in, there's no instant way to detect "fake" vs. "real" CultureInfo objects in .NET.

  • That means there are now only two ways to check whether a given CultureInfo is "fake" or "real":

    • Option 1: During program start-up, build your own private immutable HashSet<String> of CultureInfo names from CultureInfo.GetCultures and use that to validate, see KnownCultureInfoNameValidator above.
    • Option 2: Check if CultureInfo.EnglishName starts with "Unknown Locale" and/or CultureInfo.Parent.EnglishName starts with "Unknown Language".
      • While it always feels wrong to compare magic strings, especially human-readable strings, at least EnglishName is always English and won't break if a user is running a non-English build of Windows, unlike with Exception.Message, for example.
      • There just doesn't seem to be any other documented property or value that indicates whether a CultureInfo's data is really system-provided. None of the other non-String members of CultureData seem to correlate with it.
      • Be sure to use a String.StartsWith check, not String.Equals, due to the parenthesized CultureName at the end.
      • I did initially think that ThreeLetterWindowsLanguageName == "ZZZ" might work, but on my computer the CultureInfo.GetCultures method returns 114 neutral cultures and 326 specific cultures with "ZZZ" values for that property, erk.
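
Option 2 could be sketched roughly as below. The class and method names (UnknownCultureHeuristic, LooksUnknown, LooksFake) are hypothetical, and the two prefixes are just the strings mentioned above; this assumes the runtime actually produces them, which may not hold everywhere (the table above shows .NET 6 on Windows 10 reporting "English (France)" as the DisplayName for en-FR, for instance).

```csharp
using System;
using System.Globalization;

// Hypothetical helper implementing the "magic string" heuristic (Option 2).
public static class UnknownCultureHeuristic
{
    // Pure string check: prefix match, because the culture name is appended
    // in parentheses, e.g. "Unknown Locale (xy-ZZ)".
    public static bool LooksUnknown(string englishName)
    {
        return englishName != null
            && ( englishName.StartsWith("Unknown Locale", StringComparison.Ordinal)
              || englishName.StartsWith("Unknown Language", StringComparison.Ordinal) );
    }

    // Flags a CultureInfo if either its own EnglishName or its parent's
    // EnglishName carries one of the "Unknown ..." prefixes.
    public static bool LooksFake(CultureInfo ci)
    {
        if (ci == null) throw new ArgumentNullException(nameof(ci));
        return LooksUnknown(ci.EnglishName) || LooksUnknown(ci.Parent.EnglishName);
    }
}
```

Usage would be along the lines of UnknownCultureHeuristic.LooksFake(new CultureInfo(cultureName)), wrapped in a try/catch for CultureNotFoundException on systems that still throw.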
Dai answered Nov 02 '22 14:11
