Introduced in Java 8, Locale.lookup(), based on RFC 4647, allows the user to find the best match in a list of Locale objects according to a priority list of Locale.LanguageRange objects. However, I don't understand every corner case of this method. The following code exposes one particular case I would like an explanation for:
// Create a collection of Locale objects to search
Collection<Locale> locales = new ArrayList<>();
locales.add(Locale.forLanguageTag("en-GB"));
locales.add(Locale.forLanguageTag("en"));
// Express the user's preferences with a Language Priority List
String ranges = "en-US;q=1.0,en-GB;q=1.0";
List<Locale.LanguageRange> languageRanges = Locale.LanguageRange.parse(ranges);
// Find the BEST match, and return just one result
Locale result = Locale.lookup(languageRanges,locales);
System.out.println(result.toString());
This prints en, where I would have intuitively expected en-GB.
Note that:
with "en-GB;q=1.0,en-US;q=1.0" (GB and US reversed), this will print en-GB,
with "en-US;q=0.9,en-GB;q=1.0" (GB has a higher priority than US), this will print en-GB.
Could someone explain the rationale behind this behavior?
If you provide language alternatives with the same priority, the list order becomes significant. This becomes apparent when you inspect the parsed list of "en-US;q=1.0,en-GB;q=1.0". It contains two entries, representing "en-US;q=1.0", followed by "en-GB;q=1.0".
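To make the ordering visible, here is a minimal sketch that prints the parsed priority list. The class name InspectRanges and the helper describe are made up for illustration; only Locale.LanguageRange.parse, getRange and getWeight come from the JDK. Note that the JDK stores ranges in lower case.

```java
import java.util.List;
import java.util.Locale;

class InspectRanges {
    // Hypothetical helper: parses a language priority list and renders
    // each entry as "range (q=weight)" in the order parse() returns them.
    static String describe(String ranges) {
        List<Locale.LanguageRange> parsed = Locale.LanguageRange.parse(ranges);
        StringBuilder sb = new StringBuilder();
        for (Locale.LanguageRange r : parsed) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(r.getRange()).append(" (q=").append(r.getWeight()).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Equal weights: the original order is kept, so en-us comes first.
        System.out.println(describe("en-US;q=1.0,en-GB;q=1.0"));
        // Different weights: the list is sorted by descending weight.
        System.out.println(describe("en-US;q=0.9,en-GB;q=1.0"));
    }
}
```

With equal weights the first entry should be en-us; with the weights from the question's second variant, en-gb should move to the front.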
See https://www.ietf.org/rfc/rfc4647.txt
3.4. Lookup
Lookup is used to select the single language tag that best matches the language priority list for a given request. When performing lookup, each language range in the language priority list is considered in turn, according to priority. … The first matching tag found, according to the user's priority, is considered the closest match and is the item returned. For example, if the language range is "de-ch", a lookup operation can produce content with the tags "de" or "de-CH" but never content with the tag "de-CH-1996". If no language tag matches the request, the "default" value is returned.
…
In the lookup scheme, the language range is progressively truncated from the end until a matching language tag is located. …
The last sentence restates what the example in the first paragraph already showed, i.e. a language range of de-CH might match either de-CH or de. This lookup with fallback is performed for each item of the list, stopping at the first one for which a match is found.
In other words, specifying "en-US;q=1.0,en-GB;q=1.0" is like specifying "en-US,en,en-GB,en".
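The lookup with fallback described above can be sketched by hand over plain tag strings. This is a simplified illustration, not the JDK's implementation: the class LookupSketch and its lookup method are hypothetical, the ranges are assumed to be already sorted by priority, and wildcard ranges are not handled.

```java
import java.util.Arrays;
import java.util.List;

class LookupSketch {
    // Simplified RFC 4647 lookup: try each range in priority order and
    // progressively truncate it from the end until some tag matches.
    static String lookup(List<String> ranges, List<String> tags) {
        for (String range : ranges) {
            String candidate = range;
            while (true) {
                for (String tag : tags) {
                    if (tag.equalsIgnoreCase(candidate)) {
                        return tag; // first match wins
                    }
                }
                int cut = candidate.lastIndexOf('-');
                if (cut < 0) {
                    break; // nothing left to truncate, try the next range
                }
                candidate = candidate.substring(0, cut);
                // A single-character subtag left at the end (e.g. "-x")
                // is removed together with the subtag that followed it.
                int prev = candidate.lastIndexOf('-');
                if (prev >= 0 && candidate.length() - prev == 2) {
                    candidate = candidate.substring(0, prev);
                }
            }
        }
        return null; // the caller would substitute its default here
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("en-GB", "en");
        // en-US is truncated to en before en-GB is ever considered.
        System.out.println(lookup(Arrays.asList("en-US", "en-GB"), tags)); // en
        System.out.println(lookup(Arrays.asList("en-GB", "en-US"), tags)); // en-GB
    }
}
```

The first call never reaches the en-GB range, because the truncated en-US already matches en.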
Maybe what you want is filtering, see
3.3. Filtering
Filtering is used to select the set of language tags that matches a given language priority list. …
In filtering, each language range represents the least specific language tag (that is, the language tag with fewest number of subtags) that is an acceptable match.
Thus, given your original list of selectable locales
List<Locale> filtered = Locale.filter(
Locale.LanguageRange.parse("en-US;q=1.0,en-GB;q=1.0"), locales);
System.out.println("filtered: "+filtered);
produces [en_GB], whereas
Collection<Locale> locales = Arrays.asList(Locale.forLanguageTag("en"),
Locale.forLanguageTag("en-GB"), Locale.forLanguageTag("en-US"));
List<Locale> filtered = Locale.filter(
Locale.LanguageRange.parse("en-US;q=1.0,en-GB;q=1.0"), locales);
System.out.println("filtered: "+filtered);
produces [en_US, en_GB] (note the prioritized order and the absence of an en fallback). So depending on the context, you may attempt to select from a filtered list first and only resort to lookup when the filtered list is empty.
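One possible way to combine the two operations is sketched below. The class FilterThenLookup and the method bestMatch are made-up names for illustration; whether taking the first filtered entry is the right choice depends on your application.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

class FilterThenLookup {
    // Hypothetical helper: prefer an exact-or-more-specific match found by
    // filtering, and only fall back to lookup (with truncation) otherwise.
    // Returns null when neither operation finds a match.
    static Locale bestMatch(String ranges, Collection<Locale> available) {
        List<Locale.LanguageRange> priorityList = Locale.LanguageRange.parse(ranges);
        List<Locale> filtered = Locale.filter(priorityList, available);
        if (!filtered.isEmpty()) {
            return filtered.get(0); // filter() keeps the priority order
        }
        return Locale.lookup(priorityList, available);
    }

    public static void main(String[] args) {
        Collection<Locale> locales = Arrays.asList(
                Locale.forLanguageTag("en-GB"), Locale.forLanguageTag("en"));
        // Filtering finds en-GB, so the en fallback is never used.
        System.out.println(bestMatch("en-US;q=1.0,en-GB;q=1.0", locales));
        // Filtering finds nothing for en-US alone, so lookup falls back to en.
        System.out.println(bestMatch("en-US", locales));
    }
}
```

With the locales from the question, the first call should yield en_GB and the second should fall back to en.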
At least, the behavior of Java’s implementation is in line with the specification. As you already noted, changing the priority, or changing the order when the priorities are equal, changes the result in the way the specification prescribes.
The steps to get this result are as follows:
Does en-US match en-GB? → no
Does en-US match en? → no
Truncate en-US to en
Does en match en-GB? → no
Does en match en? → yes, matching tag found, return it
It works according to RFC 4647:
3.4. Lookup
...
The first matching tag found, according to the user's priority, is considered the closest match and is the item returned.
...
In the lookup scheme, the language range is progressively truncated from the end until a matching language tag is located.
The core of the lookup algorithm is implemented in sun.util.locale.LocaleMatcher#lookupTag. You can check out the source code.