
Parsing french dates in Java [duplicate]

Tags:

java

I am given the following date string

10 juil 2014

Looking up the name of the months of the year in French, I see that juil is an abbreviation for juillet, which refers to July in English.

I try to parse it using SimpleDateFormat with French locale:

System.out.println(new SimpleDateFormat("dd MMM yyyy", Locale.FRENCH).parse("11 juil 2014"));

But it throws an exception

java.text.ParseException: Unparseable date: "11 juil 2014"
    at java.text.DateFormat.parse(DateFormat.java:357)

I then try adding a period right after the month name

System.out.println(new SimpleDateFormat("dd MMM yyyy", Locale.FRENCH).parse("11 juil. 2014"));

And now I get the following output

Fri Jul 11 00:00:00 EDT 2014

So it looks like I need a period. But when I try to parse a March date (mars) with a period added, it is not recognized.

How should I parse French dates? I can do it in two passes, first with a period and then without, and hope that one of them works, but is there a better way?

MxLDevs asked Jul 28 '14 18:07


1 Answer

In French, abbreviated month names have a period.

See the Yale University Library page, Abbreviations of the Names of the Months, which lists a few dozen languages.

“mars” is the full name for March (four letters). The name is short enough that it needs no abbreviation, and with no abbreviation there is no period. The same goes for “mai” (May), “juin” (June), and “août” (August).

Also, as you may have noticed, the first letter is lowercase in French but uppercase in English.

Joda-Time

I tried this in Joda-Time 2.4 in Java 8 on Mac OS X Mountain Lion. [Jump down for java.time, Joda-Time’s replacement]

LocalDate localDate = DateTimeFormat.forPattern( "dd MMM yyyy" ).withLocale( java.util.Locale.FRENCH ).parseLocalDate( "10 juil 2014" );

Same Problem: Missing Period

Both juillet and juil. successfully parse as French, but juil fails and throws an exception. The month abbreviation is expected to have a period terminator.

Workaround: Insert Period

Let's use substring and lastIndexOf to tear apart the string, add a period, and rebuild the string.

First test whether the string contains one of: " janv ", " févr ", " avr ", " juil ", " sept ", " oct ", " nov ", " déc ". Note the spaces on both sides, which keep a full month name (such as juillet) from being mistaken for its abbreviation.

String inputRaw = "10 juil 2014";
// The last space separates the month from the year.
int indexOfSecondSpace = inputRaw.lastIndexOf( " " );
// Rebuild as "<day> <month>" + "." + " <year>".
String input = inputRaw.substring( 0, indexOfSecondSpace ) + "." + inputRaw.substring( indexOfSecondSpace );
DateTimeFormatter formatter = DateTimeFormat.forPattern( "dd MMM yyyy" ).withLocale( java.util.Locale.FRENCH );
LocalDate localDate = formatter.parseLocalDate( input );

System.out.println( inputRaw + " → " + input + " → " + localDate );

When run:

10 juil 2014 → 10 juil. 2014 → 2014-07-10

Alternatively, call replace on the string for each of these substitutions:

  • " janv " → " janv. "
  • " févr " → " févr. "
  • " avr " → " avr. "
  • " juil " → " juil. "
  • " sept " → " sept. "
  • " oct " → " oct. "
  • " nov " → " nov. "
  • " déc " → " déc. "
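
The replacements listed above can be folded into a single regular expression. This is only a minimal sketch using the standard java.util.regex classes. The class name FrenchMonthPeriods and the method addMissingPeriod are hypothetical names chosen for illustration, and the accented letters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.regex.Pattern;

class FrenchMonthPeriods {
    // Abbreviations that take a period. "mars", "mai", "juin" and "août" are
    // full names and are deliberately absent. \u00e9 is the letter é.
    // The lookbehind and lookahead require a space on each side, so full
    // names such as "juillet" and already-dotted input such as "juil." are left alone.
    private static final Pattern BARE_ABBREVIATION = Pattern.compile(
            "(?<= )(janv|f\u00e9vr|avr|juil|sept|oct|nov|d\u00e9c)(?= )");

    // Hypothetical helper: appends a period to a bare abbreviated month name.
    static String addMissingPeriod(String input) {
        return BARE_ABBREVIATION.matcher(input).replaceAll("$1.");
    }

    public static void main(String[] args) {
        System.out.println(addMissingPeriod("10 juil 2014"));    // 10 juil. 2014
        System.out.println(addMissingPeriod("10 mars 2014"));    // 10 mars 2014
        System.out.println(addMissingPeriod("10 juillet 2014")); // 10 juillet 2014
    }
}
```

The normalized string can then be handed to whichever French-locale formatter you already use.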

Sanity-Check

In the real world, I would add some sanity-checks to ensure the input matches our expectations, such as having exactly two spaces in the middle and none at the beginning or end.
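
One way such a check might look is a single regular expression describing the expected shape. This is a sketch under the assumption that inputs always look like "day month year" with a four-digit year. The class name DateInputCheck and the method hasExpectedShape are hypothetical.

```java
import java.util.regex.Pattern;

class DateInputCheck {
    // One or two digits, a single space, a run of letters with an optional
    // trailing period, a single space, then four digits. \p{L} matches any
    // Unicode letter, so accented names such as "févr" and "août" pass.
    private static final Pattern EXPECTED_SHAPE =
            Pattern.compile("\\d{1,2} \\p{L}+\\.? \\d{4}");

    // Hypothetical helper: true only if the whole input has the expected shape.
    static boolean hasExpectedShape(String input) {
        return input != null && EXPECTED_SHAPE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasExpectedShape("10 juil 2014"));   // true
        System.out.println(hasExpectedShape(" 10 juil 2014"));  // false, leading space
        System.out.println(hasExpectedShape("10  juil 2014"));  // false, doubled space
    }
}
```

The check only validates the layout. Whether the month text is a real month is still left to the parser.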

java.time

Java 8 and later come with the java.time framework built-in. These new classes supplant the old java.util.Date/.Calendar and related classes that have proven to be poorly designed, confusing, and troublesome. The new java.time classes are inspired by Joda-Time, defined by JSR 310, extended by the ThreeTen-Extra project, explained in the Oracle Tutorial, and backported to Java 6 & 7 as well as to Android.

The java.time classes include the handy Month enum. Its getDisplayName method generates the localized name of a month.

Similarly, the DateTimeFormatter class also generates localized text. Call the ofLocalized… methods.

System.out.println ( "US | Québec | France" );
for ( Month month : Month.values () ) {
    TextStyle style = TextStyle.SHORT;
    String us = month.getDisplayName ( style , Locale.US );
    String quebec = month.getDisplayName ( style , Locale.CANADA_FRENCH );
    String france = month.getDisplayName ( style , Locale.FRANCE );
    System.out.println ( us + " | " + quebec + " | " + france );
}

We get the same behavior in java.time as seen in Joda-Time: in French, the abbreviated month names end with a period, and month names are entirely lowercase.

US | Québec | France
Jan | janv. | janv.
Feb | févr. | févr.
Mar | mars | mars
Apr | avr. | avr.
May | mai | mai
Jun | juin | juin
Jul | juil. | juil.
Aug | août | août
Sep | sept. | sept.
Oct | oct. | oct.
Nov | nov. | nov.
Dec | déc. | déc.
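
Since the locale data above expects the period, another option in java.time is to bypass the locale's month text and supply your own lookup table. The following is a sketch, not part of the original answer. It uses DateTimeFormatterBuilder.appendText with a custom map and treats the period as optional. The class name LenientFrenchDateParser is hypothetical, and the formatter is assumed to see only abbreviated month names (a full name such as juillet would be rejected).

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class LenientFrenchDateParser {
    private static final DateTimeFormatter FORMATTER = buildFormatter();

    private static DateTimeFormatter buildFormatter() {
        // Period-less month texts; \u00e9 is é and \u00fb is û.
        String[] names = {"janv", "f\u00e9vr", "mars", "avr", "mai", "juin",
                          "juil", "ao\u00fbt", "sept", "oct", "nov", "d\u00e9c"};
        Map<Long, String> months = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            months.put((long) (i + 1), names[i]);
        }
        return new DateTimeFormatterBuilder()
                .parseCaseInsensitive()
                .appendPattern("d ")                               // day of month, then a space
                .appendText(ChronoField.MONTH_OF_YEAR, months)     // our own month texts
                .optionalStart().appendLiteral('.').optionalEnd()  // period may or may not be present
                .appendPattern(" uuuu")                            // a space, then the year
                .toFormatter(Locale.FRENCH);
    }

    // Hypothetical helper: parses input with or without the period.
    static LocalDate parse(String input) {
        return LocalDate.parse(input, FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(parse("10 juil 2014"));  // 2014-07-10
        System.out.println(parse("10 juil. 2014")); // 2014-07-10
        System.out.println(parse("5 mars 2014"));   // 2014-03-05
    }
}
```

Because the month texts come from the map rather than from the JDK's locale data, this approach does not depend on which locale provider the runtime uses.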
Basil Bourque answered Oct 03 '22 02:10