Multi-locale date parsing

Question

I'm trying to write a class that can parse multi-format and multi-locale strings into DateTime.

Multi-format means the date might be written as dd/MM/yyyy, MMM dd yyyy, ... (up to 10 formats).

Multi-locale means the date might be written as 29 Dec 2015, 29 Dez 2015, dice 29 2015, ... (up to 10 locales, such as en, gr, it, jp).

Based on the answer to "Using Joda Date & Time API to parse multiple formats", I wrote:

val locales = List(
  Locale.ENGLISH,
  Locale.GERMAN,
  ...
)

val patterns = List(
  "yyyy/MM/dd",
  "yyyy-MM-dd",
  "MMMM dd, yyyy",
  "dd MMMM yyyy",
  "dd MMM yyyy"
)

val parsers = patterns.flatMap(patt => locales.map(locale => DateTimeFormat.forPattern(patt).withLocale(locale).getParser)).toArray
val birthDateFormatter = new DateTimeFormatterBuilder().append(null, parsers).toFormatter

but it doesn't work:

birthDateFormatter.parseDateTime("29 Dec 2015") // ok
birthDateFormatter.parseDateTime("29 Dez 2015") // exception below

Invalid format: "29 Dez 2015" is malformed at "Dez 2015"
java.lang.IllegalArgumentException: Invalid format: "29 Dez 2015" is malformed at "Dez 2015"

I found that all parsers: List[DateTimeParser] had "lost" their locales after being appended to birthDateFormatter: DateTimeFormatter, and that birthDateFormatter has only one locale: en.

I can write:

val birthDateFormatter = locales.map(new DateTimeFormatterBuilder().append(null, parsers).toFormatter.withLocale(_))

and use it like:

birthDateFormatter.map(_.parseDateTime(stringDate))

but it will throw a lot of exceptions, which is terrible.

How can I parse multi-format and multi-locale strings using joda-time? How can I do it any other way?

Adam Michalik · Accepted Answer

That was interesting to investigate. This is a test suite that helped me (in Java, but I hope you'll get the idea):

import java.util.*;
import java.util.stream.Collectors;

import org.joda.time.DateTime;
import org.joda.time.format.*;
import org.junit.Test;

import static org.assertj.core.api.Assertions.*;

public class JodaTimeLocaleTest {

    @Test // fails on both assertions
    public void testTwoLocales() {
        List<Locale> locales = Arrays.asList(Locale.FRENCH, Locale.GERMAN);
        DateTimeParser[] parsers = locales.stream()
                .map(locale -> DateTimeFormat.forPattern("dd MMM yyyy").withLocale(locale).getParser())
                .collect(Collectors.toList())
                .toArray(new DateTimeParser[0]);
        DateTimeFormatter formatter = new DateTimeFormatterBuilder().append(null, parsers).toFormatter();

        DateTime dateTime1 = formatter.parseDateTime("29 déc. 2015");
        DateTime dateTime2 = formatter.parseDateTime("29 Dez 2015");

        assertThat(dateTime1).isEqualTo(new DateTime("2015-12-29T00:00:00"));
        assertThat(dateTime2).isEqualTo(new DateTime("2015-12-29T00:00:00"));
    }

    @Test // passes
    public void testFrench() {
        DateTimeFormatter formatter = DateTimeFormat.forPattern("dd MMM yyyy").withLocale(Locale.FRENCH);

        DateTime dateTime = formatter.parseDateTime("29 déc. 2015");

        assertThat(dateTime).isEqualTo(new DateTime("2015-12-29T00:00:00"));
    }

    @Test // passes
    public void testGerman() {
        DateTimeFormatter formatter = DateTimeFormat.forPattern("dd MMM yyyy").withLocale(Locale.GERMAN);

        DateTime dateTime = formatter.parseDateTime("29 Dez 2015");

        assertThat(dateTime).isEqualTo(new DateTime("2015-12-29T00:00:00"));
    }
}

First of all, your first example

birthDateFormatter.parseDateTime("29 Dec 2015")

passes only because your machine's default locale is English. If it were different, this case would fail too. That's why I'm using French and German when running on a machine with an English locale. In my case, both assertions fail.

It turns out that the locale is not stored in the parser, but in the formatter only. So when you do

DateTimeFormat.forPattern("dd MMM yyyy").withLocale(locale).getParser()

the locale is set on the formatter, but is then lost when creating the parser:

// DateTimeFormatter#withLocale:
public DateTimeFormatter withLocale(Locale locale) {
    if (locale == getLocale() || (locale != null && locale.equals(getLocale()))) {
        return this;
    }
    // Notice how locale does not affect the parser
    return new DateTimeFormatter(iPrinter, iParser, locale,
            iOffsetParsed, iChrono, iZone, iPivotYear, iDefaultYear);
}

Next, when you create a new formatter

new DateTimeFormatterBuilder().append(null, parsers).toFormatter()

it's created with the system's default locale (unless you override it with withLocale()). And that locale is used during parsing:

// DateTimeFormatter#parseDateTime
public DateTime parseDateTime(String text) {
    InternalParser parser = requireParser();

    Chronology chrono = selectChronology(null);
    // Notice how the formatter's locale is used
    DateTimeParserBucket bucket = new DateTimeParserBucket(0, chrono, iLocale, iPivotYear, iDefaultYear);
    int newPos = parser.parseInto(bucket, text, 0);
    // ... snipped
}

So it turns out that although you can have multiple parsers to support multiple formats, still only a single locale can be used per formatter instance.

Meno Hochschild · Answer

Answer to question 1 (How can I parse multi-format and multi-locale strings using joda-time?):

No, this is not possible the way you want; see also the good answer by @Adam Michalik. The only way is to write a list of multiple Joda formatters and try each one for a given input, possibly catching exceptions. You have already found the right workaround, so I won't describe the details here.
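
A minimal sketch of that try-each-formatter loop, written against java.time instead of Joda-Time so that it runs on a plain JDK with no extra dependency. The class name MultiLocaleDateParser, the pattern list, and the locale list are illustrative assumptions, not taken from the question. Full month names (MMMM) are used because abbreviated month names can differ between JDK versions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of any library: tries one formatter per
// (pattern, locale) pair until one of them accepts the input.
class MultiLocaleDateParser {

    // Assumed locale list; put the most common locales first.
    private static final List<Locale> LOCALES =
            Arrays.asList(Locale.ENGLISH, Locale.GERMAN, Locale.ITALIAN);

    // Assumed pattern list; put the most common patterns first.
    private static final List<String> PATTERNS =
            Arrays.asList("yyyy-MM-dd", "dd MMMM yyyy", "MMMM dd, yyyy");

    // Built once and reused: every formatter keeps its own locale.
    private static final List<DateTimeFormatter> FORMATTERS = buildFormatters();

    private static List<DateTimeFormatter> buildFormatters() {
        List<DateTimeFormatter> result = new ArrayList<>();
        for (String pattern : PATTERNS) {
            for (Locale locale : LOCALES) {
                result.add(DateTimeFormatter.ofPattern(pattern, locale));
            }
        }
        return result;
    }

    // Returns the first successful parse, or empty if no formatter matches.
    static Optional<LocalDate> parse(String text) {
        for (DateTimeFormatter formatter : FORMATTERS) {
            try {
                return Optional.of(LocalDate.parse(text, formatter));
            } catch (DateTimeParseException e) {
                // This pattern/locale pair does not fit; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("29 December 2015"));
        System.out.println(parse("29 Dezember 2015"));
        System.out.println(parse("dicembre 29, 2015"));
        System.out.println(parse("not a date"));
    }
}
```

The same loop should carry over to a list of Joda-Time formatters built with DateTimeFormat.forPattern(pattern).withLocale(locale), catching IllegalArgumentException from parseLocalDate instead. Every failed attempt still costs one exception, so the order of the pairs matters.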

Answer to question 2 (How can I do it any other way?):

My library Time4J has had a MultiFormatParser class since v4.11. However, I discovered some general performance issues with its format engine (mainly due to Java's autoboxing), so I decided to hold this answer until release v4.12, where I have improved the performance. According to my first benchmarks, Time4J 4.12 seems to be quicker than Joda-Time (v2.9.1) because internal exceptions are strongly reduced. So I think you can give the latest version of Time4J a try and report back whether it works for you.

private static final MultiFormatParser<PlainDate> TIME4J;

static {
    ChronoFormatter<PlainDate> f1 = 
      ChronoFormatter.ofDatePattern("dd.MM.uuuu", PatternType.CLDR, Locale.ROOT);
    ChronoFormatter<PlainDate> f2 = 
      ChronoFormatter.ofDatePattern("MM/dd/uuuu", PatternType.CLDR, Locale.ROOT);
    ChronoFormatter<PlainDate> f3 = 
      ChronoFormatter.ofDatePattern("uuuu-MM-dd", PatternType.CLDR, Locale.ROOT);
    ChronoFormatter<PlainDate> f4 = 
      ChronoFormatter.ofDatePattern("uuuuMMdd", PatternType.CLDR, Locale.ROOT);
    ChronoFormatter<PlainDate> f5 = 
      ChronoFormatter.ofDatePattern("d. MMMM uuuu", PatternType.CLDR, Locale.GERMAN);
    ChronoFormatter<PlainDate> f6 = 
      ChronoFormatter.ofDatePattern("d. MMMM uuuu", PatternType.CLDR, Locale.FRENCH);
    ChronoFormatter<PlainDate> f7 = 
      ChronoFormatter.ofDatePattern("MMMM d, uuuu", PatternType.CLDR, Locale.US);
    TIME4J = MultiFormatParser.of(f1, f2, f3, f4, f5, f6, f7);
}

...

static List<PlainDate> parse(List<String> input) {
    ParseLog plog = new ParseLog();
    int n = input.size();
    List<PlainDate> result = new ArrayList<>(n);

    for (int i = 0; i < n; i++){
        String s = input.get(i);
        plog.reset();
        PlainDate date = TIME4J.parse(s, plog);
        if (!plog.isError()) {
            result.add(date);
        } else {
            // log or report error
        }
    }
    return result;
}

Every single parser within MultiFormatParser keeps its own locale.
The order of the parser components matters for performance. Put the patterns and locales that are most common in your input first.
I strongly recommend using a static constant for the MultiFormatParser because a) it is immutable and b) constructing formatters is expensive in every library (and Time4J is no exception here).
For interoperability with Joda-Time you can consider this conversion: LocalDate joda = new LocalDate(plainDate.getYear(), plainDate.getMonth(), plainDate.getDayOfMonth()); But keep in mind that every conversion has some extra cost. On the other hand, Joda-Time offers fewer features than Time4J, so the latter can also handle all date, time, and zone related tasks on its own.
I am not a Scala person, but I assume that the following Scala code might compile: val parser = MultiFormatParser.of(patterns.flatMap(patt => locales.map(locale => ChronoFormatter.ofDatePattern(patt, PatternType.CLDR, locale))).toArray)
By the way: the performance of Joda-Time is not bad at all; it was a tough task for me to beat it in Time4J v4.12. Parsing so many different patterns and locales is always a complex task. What surprised me: the new time library built into Java 8 (package java.time) performed worst in my own experiments (apparently due to internal exception handling).
If you don't work on a Java 8 platform, you can use Time4J v3.15 (the backport to Java 6 platforms).

Tags: scala, jodatime

sheh
