 

Inconsistent locale case sensitivity in Java compared to Bash

When I try to format Date in Polish, I get consistent formatting:

new SimpleDateFormat("EEEE", Locale.forLanguageTag("pl-PL")).format(new Date())

results in

wtorek

The same result in bash:

$ LC_ALL=pl_PL date +"%A %b %d"
wtorek maj 22

Notice the lowercase w in wtorek in both cases.

When I do the same for Czech, the result is inconsistent:

new SimpleDateFormat("EEEE", Locale.forLanguageTag("cs-CZ")).format(new Date())

results in

Pondělí

When run in bash:

$ LC_ALL=cs_CZ date +"%A %b %d"
pondělí kvě 21

Notice the uppercase P in the Java result. How does this happen? Does it mean SimpleDateFormat doesn't use standard Locales installed on the system?

Vojtěch asked Mar 06 '23 03:03


2 Answers

Java picks up its locale data (including the names of the days of the week in different locales) from up to four sources. And yes, the host operating system is one of them, but it is not the default. To quote the LocaleServiceProvider documentation:

Java Runtime Environment provides the following four locale providers:

  • "CLDR": A provider based on Unicode Consortium's CLDR Project.
  • "COMPAT": represents the locale sensitive services that is compatible with the prior JDK releases up to JDK8 (same as JDK8's "JRE").
  • "SPI": represents the locale sensitive services implementing the subclasses of this LocaleServiceProvider class.
  • "HOST": A provider that reflects the user's custom settings in the underlying operating system. This provider may not be available, depending on the Java Runtime Environment implementation.
  • "JRE": represents a synonym to "COMPAT". This name is deprecated and will be removed in the future release of JDK.

Up to Java 8, JRE (now called COMPAT) was the default. I am using java.time here because SimpleDateFormat is outdated and no one should have to struggle with it:

    DateTimeFormatter dayOfWeekFormatter 
            = DateTimeFormatter.ofPattern("EEEE", Locale.forLanguageTag("cs-CZ"));
    LocalDate date = LocalDate.now(ZoneId.of("Europe/Prague"));
    System.out.println(date.format(dayOfWeekFormatter));

The output on my Oracle jdk1.8.0_131 matches your result in that the day name starts with an uppercase letter (here S):

Středa

We may control the locale data used through a system property. To prefer CLDR, for example, either run the program with VM command line option -Djava.locale.providers=CLDR,COMPAT or insert the following line at the start of the program:

    System.setProperty("java.locale.providers", "CLDR,COMPAT");

středa

Now we get the lowercase s.
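
To compare providers on your own machine, a small program with a fixed date makes the runs reproducible. This is only a sketch: the class name LocaleProviderDemo and the helper dayName are made up for illustration, and the date is the Monday from the question (21 May 2018). Run it once without any option and once with -Djava.locale.providers=CLDR,COMPAT and compare the Czech line. What it prints depends on your JDK version and the provider order, so no expected output is given here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical demo class, not part of any library.
class LocaleProviderDemo {

    // Formats the full day-of-week name (pattern "EEEE") for the given date and locale.
    static String dayName(LocalDate date, Locale locale) {
        return date.format(DateTimeFormatter.ofPattern("EEEE", locale));
    }

    public static void main(String[] args) {
        // Prints null if the property was not set, in which case the JDK's default provider order applies.
        System.out.println("java.locale.providers = "
                + System.getProperty("java.locale.providers"));
        System.out.println("java.version = " + System.getProperty("java.version"));

        // A fixed Monday, so that runs under different settings are comparable.
        LocalDate monday = LocalDate.of(2018, 5, 21);
        System.out.println("cs-CZ: " + dayName(monday, Locale.forLanguageTag("cs-CZ")));
        System.out.println("pl-PL: " + dayName(monday, Locale.forLanguageTag("pl-PL")));
    }
}
```

The property is passed on the command line here rather than set in code, which avoids any question of whether it was set early enough to take effect.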

My shell on macOS Sierra 10.12.6 just gives Wednesday, so apparently my OS doesn't have Czech locale data (which sounds odd; the problem is probably somewhere else), and so HOST is not an option for me. You may try putting HOST in front of the locale provider string above, that is, -Djava.locale.providers=HOST,CLDR,COMPAT, and see whether the result agrees with your bash.

In Java 9 and later, CLDR is the default. So running the same snippet on JDK 9.0.4 without setting any system property also gives the day of the week in lowercase:

středa

Ole V.V. answered Mar 11 '23 15:03


Does it mean SimpleDateFormat doesn't use standard Locales installed on the system

Yes, the system locales are not used, and the available locales depend on the JVM/JRE vendor. For example, check lib\ext\localedata.jar in the JRE directory. After extracting it, you can find the file sun\text\resources\cs\FormatData_cs.class, which decompiles into:

public class FormatData_cs extends ParallelListResourceBundle
{
    @Override
    protected final Object[][] getContents() {
        return new Object[][] { { "MonthNames", 
        { "ledna", "\u00fanora", "b\u0159ezna", "dubna", "kv\u011btna", "\u010dervna", "\u010dervence", "srpna", "z\u00e1\u0159\u00ed", "\u0159\u00edjna", "listopadu", "prosince", "" } }, 
        { "standalone.MonthNames", { "leden", "\u00fanor", "b\u0159ezen", "duben", "kv\u011bten", "\u010derven", "\u010dervenec", "srpen", "z\u00e1\u0159\u00ed", "\u0159\u00edjen", "listopad", "prosinec", "" } }, 
        { "MonthAbbreviations", { "Led", "\u00dano", "B\u0159e", "Dub", "Kv\u011b", "\u010cer", "\u010cvc", "Srp", "Z\u00e1\u0159", "\u0158\u00edj", "Lis", "Pro", "" } }, 
        { "standalone.MonthAbbreviations", { "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X", "XI", "XII", "" } }, 
        { "MonthNarrows", { "l", "\u00fa", "b", "d", "k", "\u010d", "\u010d", "s", "z", "\u0159", "l", "p", "" } },
        { "standalone.MonthNarrows", { "l", "\u00fa", "b", "d", "k", "\u010d", "\u010d", "s", "z", "\u0159", "l", "p", "" } }, 
        { "DayNames", { "Ned\u011ble", "Pond\u011bl\u00ed", "\u00dater\u00fd", "St\u0159eda", "\u010ctvrtek", "P\u00e1tek", "Sobota" } }, 
        { "standalone.DayNames", { "ned\u011ble", "pond\u011bl\u00ed", "\u00fater\u00fd", "st\u0159eda", "\u010dtvrtek", "p\u00e1tek", "sobota" } }, 
        { "DayAbbreviations", { "Ne", "Po", "\u00dat", "St", "\u010ct", "P\u00e1", "So" } }, 
        { "standalone.DayAbbreviations", { "ne", "po", "\u00fat", "st", "\u010dt", "p\u00e1", "so" } }, 
        { "DayNarrows", { "N", "P", "\u00da", "S", "\u010c", "P", "S" } }, 
        { "standalone.DayNarrows", { "N", "P", "\u00da", "S", "\u010c", "P", "S" } },
        { "AmPmMarkers", { "dop.", "odp." } }, 
        { "Eras", { "p\u0159.Kr.", "po Kr." } }, 
        { "short.Eras", { "p\u0159. n. l.", "n. l." } }, 
        { "narrow.Eras", { "p\u0159.n.l.", "n. l." } }, 
        { "NumberElements", { ",", " ", ";", "%", "0", "#", "-", "E", "\u2030", "\u221e", "\ufffd" } }, 
        { "TimePatterns", { "H:mm:ss z", "H:mm:ss z", "H:mm:ss", "H:mm" } }, 
        { "DatePatterns", { "EEEE, d. MMMM yyyy", "d. MMMM yyyy", "d.M.yyyy", "d.M.yy" } }, 
        { "DateTimePatterns", { "{1} {0}" } }, 
        { "DateTimePatternChars", "GuMtkHmsSEDFwWahKzZ" } };
    }
}

and contains "Pond\u011bl\u00ed" (Pondělí, with an uppercase P) in "DayNames", while "standalone.DayNames" holds the lowercase forms.
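
If you would rather not decompile anything, the weekday names that SimpleDateFormat works with can also be read at runtime through DateFormatSymbols. The following is a minimal sketch, with the class name WeekdayNamesDemo and both helper methods invented for illustration. It prints the Monday entry next to what SimpleDateFormat produces for a fixed Monday (21 May 2018), so you can see that both come from the same locale data. The actual capitalization depends on the JDK version and the active locale provider.

```java
import java.text.DateFormatSymbols;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo class, not part of any library.
class WeekdayNamesDemo {

    // Returns the full weekday name for a Calendar day constant such as Calendar.MONDAY.
    static String weekdayName(int calendarDay, Locale locale) {
        // getWeekdays() has 8 entries; index 0 is unused, so Calendar.SUNDAY (1)
        // through Calendar.SATURDAY (7) can be used as indexes directly.
        return DateFormatSymbols.getInstance(locale).getWeekdays()[calendarDay];
    }

    // Formats the day of the week the same way the question does.
    static String formatDay(Calendar cal, Locale locale) {
        return new SimpleDateFormat("EEEE", locale).format(cal.getTime());
    }

    public static void main(String[] args) {
        // Noon on a fixed Monday in the default time zone.
        Calendar monday = new GregorianCalendar(2018, Calendar.MAY, 21, 12, 0);
        for (String tag : new String[] { "cs-CZ", "pl-PL" }) {
            Locale locale = Locale.forLanguageTag(tag);
            System.out.println(tag + " symbols:   " + weekdayName(Calendar.MONDAY, locale));
            System.out.println(tag + " formatted: " + formatDay(monday, locale));
        }
    }
}
```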

user158037 answered Mar 11 '23 14:03