I'm trying to write a class that can parse multi-format and multi-locale strings into DateTime.
Multi-format means the date might be in any of: dd/MM/yyyy, MMM dd yyyy, ... (up to 10 formats)
Multi-locale means the date might be: 29 Dec 2015, 29 Dez 2015, dice 29 2015 ... (up to 10 locales, like en, gr, it, jp)
Based on the answer to "Using Joda Date & Time API to parse multiple formats", I wrote:
val locales = List(
  Locale.ENGLISH,
  Locale.GERMAN,
  ...
)
val patterns = List(
  "yyyy/MM/dd",
  "yyyy-MM-dd",
  "MMMM dd, yyyy",
  "dd MMMM yyyy",
  "dd MMM yyyy"
)
val parsers = patterns.flatMap(patt =>
  locales.map(locale => DateTimeFormat.forPattern(patt).withLocale(locale).getParser)
).toArray
val birthDateFormatter = new DateTimeFormatterBuilder().append(null, parsers).toFormatter
but it doesn't work:
birthDateFormatter.parseDateTime("29 Dec 2015") // ok
birthDateFormatter.parseDateTime("29 Dez 2015") // exception below
java.lang.IllegalArgumentException: Invalid format: "29 Dez 2015" is malformed at "Dez 2015"
I found that all the parsers (List[DateTimeParser]) had "lost" their locales after being appended to birthDateFormatter: DateTimeFormatter, and that birthDateFormatter has only one locale: en.
I can write:
val birthDateFormatter = locales.map(new DateTimeFormatterBuilder().append(null, parsers).toFormatter.withLocale(_))
and use it like:
birthDateFormatter.map(_.parseDateTime(stringDate))
but it will throw a lot of exceptions, which is terrible.
How can I parse multi-format and multi-locale strings using joda-time? How can I do it any other way?
That was interesting to investigate. This is a test suite that helped me (in Java, but I hope you'll get the idea):
import java.util.*;
import java.util.stream.Collectors;
import org.joda.time.DateTime;
import org.joda.time.format.*;
import org.junit.Test;
import static org.assertj.core.api.Assertions.*;
public class JodaTimeLocaleTest {

    @Test // fails on both assertions
    public void testTwoLocales() {
        List<Locale> locales = Arrays.asList(Locale.FRENCH, Locale.GERMAN);
        DateTimeParser[] parsers = locales.stream()
                .map(locale -> DateTimeFormat.forPattern("dd MMM yyyy").withLocale(locale).getParser())
                .collect(Collectors.toList())
                .toArray(new DateTimeParser[0]);
        DateTimeFormatter formatter = new DateTimeFormatterBuilder().append(null, parsers).toFormatter();
        DateTime dateTime1 = formatter.parseDateTime("29 déc. 2015");
        DateTime dateTime2 = formatter.parseDateTime("29 Dez 2015");
        assertThat(dateTime1).isEqualTo(new DateTime("2015-12-29T00:00:00"));
        assertThat(dateTime2).isEqualTo(new DateTime("2015-12-29T00:00:00"));
    }

    @Test // passes
    public void testFrench() {
        DateTimeFormatter formatter = DateTimeFormat.forPattern("dd MMM yyyy").withLocale(Locale.FRENCH);
        DateTime dateTime = formatter.parseDateTime("29 déc. 2015");
        assertThat(dateTime).isEqualTo(new DateTime("2015-12-29T00:00:00"));
    }

    @Test // passes
    public void testGerman() {
        DateTimeFormatter formatter = DateTimeFormat.forPattern("dd MMM yyyy").withLocale(Locale.GERMAN);
        DateTime dateTime = formatter.parseDateTime("29 Dez 2015");
        assertThat(dateTime).isEqualTo(new DateTime("2015-12-29T00:00:00"));
    }
}
First of all, your first example
birthDateFormatter.parseDateTime("29 Dec 2015")
passes only because your machine's default locale is English. If it were different, this case would have failed as well. That's why I'm using French and German when running on a machine with an English locale. In my case, both assertions fail.
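This dependence on the default locale can be reproduced with the JDK alone. The sketch below swaps in java.time for Joda-Time, a substitution made only so the snippet needs no external dependency; the class name DefaultLocaleDemo and the helper parsesUnderDefault are hypothetical. A formatter built without an explicit locale takes whatever default is in effect when it is created, so the same input succeeds or fails depending on that default.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical demo class; java.time is used here as a stand-in for Joda-Time.
final class DefaultLocaleDemo {

    // Temporarily switches the JVM default locale, builds a formatter WITHOUT
    // an explicit locale, and reports whether the text parses.
    static boolean parsesUnderDefault(Locale defaultLocale, String text) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(defaultLocale);
        try {
            // No locale argument: the formatter picks up the current default.
            DateTimeFormatter formatter = DateTimeFormatter.ofPattern("dd MMMM uuuu");
            LocalDate.parse(text, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        } finally {
            // Restore the previous default (good enough for a demo).
            Locale.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesUnderDefault(Locale.ENGLISH, "29 December 2015"));
        System.out.println(parsesUnderDefault(Locale.GERMAN, "29 December 2015"));
        System.out.println(parsesUnderDefault(Locale.GERMAN, "29 Dezember 2015"));
    }
}
```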
It turns out that the locale is not stored in the parser, but in the formatter only. So when you do
DateTimeFormat.forPattern("dd MMM yyyy").withLocale(locale).getParser()
the locale is set on the formatter, but is then lost when creating the parser:
// DateTimeFormatter#withLocale:
public DateTimeFormatter withLocale(Locale locale) {
    if (locale == getLocale() || (locale != null && locale.equals(getLocale()))) {
        return this;
    }
    // Notice how locale does not affect the parser
    return new DateTimeFormatter(iPrinter, iParser, locale,
            iOffsetParsed, iChrono, iZone, iPivotYear, iDefaultYear);
}
Next, when you create a new formatter
new DateTimeFormatterBuilder().append(null, parsers).toFormatter()
it's created with the system's default locale (unless you override it with withLocale()). And that locale is used during parsing:
// DateTimeFormatter#parseDateTime
public DateTime parseDateTime(String text) {
    InternalParser parser = requireParser();
    Chronology chrono = selectChronology(null);
    // Notice how the formatter's locale is used
    DateTimeParserBucket bucket = new DateTimeParserBucket(0, chrono, iLocale, iPivotYear, iDefaultYear);
    int newPos = parser.parseInto(bucket, text, 0);
    // ... snipped
}
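The effect (month names being resolved against the locale of the formatter that does the parsing, not against anything carried by the appended pieces) can be observed with the JDK alone. The sketch below uses java.time instead of Joda-Time, a substitution made only to keep it dependency-free; BuilderLocaleDemo and its helpers are hypothetical names, and the behaviour shown is that of java.time, offered as an analogy.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical demo class; java.time is used here as a stand-in for Joda-Time.
final class BuilderLocaleDemo {

    // A German formatter appended into a builder that is finished with an
    // English locale: the outer (English) locale is the one used for parsing.
    static DateTimeFormatter germanInsideEnglish() {
        DateTimeFormatter german = DateTimeFormatter.ofPattern("dd MMMM uuuu", Locale.GERMAN);
        return new DateTimeFormatterBuilder()
                .append(german)
                .toFormatter(Locale.ENGLISH);
    }

    // Returns true if the text parses to a date with the given formatter.
    static boolean parses(DateTimeFormatter formatter, String text) {
        try {
            LocalDate.parse(text, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        DateTimeFormatter combined = germanInsideEnglish();
        System.out.println(parses(combined, "29 Dezember 2015"));
        System.out.println(parses(combined, "29 December 2015"));
    }
}
```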
So although you can combine multiple parsers to support multiple formats, only a single locale can be used per formatter instance.
Answer to question 1 (How can I parse multi-format and multi-locale strings using joda-time?):
No, this is not possible the way you want; see also the good answer by @Adam Michalik. The only way is to write a list of multiple Joda formatters and try each one for a given input, possibly catching exceptions. You have already found the right workaround, so I won't describe the details here.
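A minimal sketch of the shape of that workaround follows. To stay runnable with only the JDK it uses java.time rather than Joda-Time, so treat it as an illustration and not as Joda code; the pattern list, the locale list, and the names PerLocaleFormatters, FORMATTERS and parse are assumptions. It builds one formatter per locale, each accepting several patterns as optional sections, and tries the formatters in order, so at most one exception is caught per locale.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper; java.time is used here as a stand-in for Joda-Time.
final class PerLocaleFormatters {

    // Assumed sample patterns and locales, loosely mirroring the question.
    static final List<String> PATTERNS =
            Arrays.asList("uuuu/MM/dd", "uuuu-MM-dd", "MMMM dd, uuuu", "dd MMMM uuuu");
    static final List<Locale> LOCALES =
            Arrays.asList(Locale.ENGLISH, Locale.GERMAN, Locale.FRENCH);

    // One immutable formatter per locale, built once and reused.
    static final List<DateTimeFormatter> FORMATTERS = build();

    private static List<DateTimeFormatter> build() {
        List<DateTimeFormatter> result = new ArrayList<>();
        for (Locale locale : LOCALES) {
            DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder();
            for (String pattern : PATTERNS) {
                // Each pattern becomes an optional section; the locale of the
                // inner formatter is irrelevant, the outer one decides.
                builder.appendOptional(DateTimeFormatter.ofPattern(pattern));
            }
            result.add(builder.toFormatter(locale));
        }
        return result;
    }

    // Tries every per-locale formatter; empty if none accepts the text.
    static Optional<LocalDate> parse(String text) {
        for (DateTimeFormatter formatter : FORMATTERS) {
            try {
                return Optional.of(LocalDate.parse(text, formatter));
            } catch (DateTimeParseException e) {
                // Not parseable in this locale: fall through to the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("29 Dezember 2015"));
        System.out.println(parse("December 29, 2015"));
        System.out.println(parse("not a date"));
    }
}
```

With Joda-Time the loop has the same shape: one DateTimeFormatterBuilder().append(null, parsers).toFormatter.withLocale(locale) per locale, as in the question, with the formatters created once and reused.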
Answer to question 2 (How can I do it any other way?):
My library Time4J has had a MultiFormatParser class since v4.11. However, I discovered some performance issues with its format engine in general (mainly due to Java's autoboxing), so I decided to hold this answer until release v4.12, where I improved the performance. According to my first benchmarks, Time4J 4.12 seems to be quicker than Joda-Time (v2.9.1) because internal exceptions are strongly reduced. So I think you can give the latest version of Time4J a try and report back on whether it works for you.
private static final MultiFormatParser<PlainDate> TIME4J;

static {
    ChronoFormatter<PlainDate> f1 =
        ChronoFormatter.ofDatePattern("dd.MM.uuuu", PatternType.CLDR, Locale.ROOT);
    ChronoFormatter<PlainDate> f2 =
        ChronoFormatter.ofDatePattern("MM/dd/uuuu", PatternType.CLDR, Locale.ROOT);
    ChronoFormatter<PlainDate> f3 =
        ChronoFormatter.ofDatePattern("uuuu-MM-dd", PatternType.CLDR, Locale.ROOT);
    ChronoFormatter<PlainDate> f4 =
        ChronoFormatter.ofDatePattern("uuuuMMdd", PatternType.CLDR, Locale.ROOT);
    ChronoFormatter<PlainDate> f5 =
        ChronoFormatter.ofDatePattern("d. MMMM uuuu", PatternType.CLDR, Locale.GERMAN);
    ChronoFormatter<PlainDate> f6 =
        ChronoFormatter.ofDatePattern("d. MMMM uuuu", PatternType.CLDR, Locale.FRENCH);
    ChronoFormatter<PlainDate> f7 =
        ChronoFormatter.ofDatePattern("MMMM d, uuuu", PatternType.CLDR, Locale.US);
    TIME4J = MultiFormatParser.of(f1, f2, f3, f4, f5, f6, f7);
}
...
static List<PlainDate> parse(List<String> input) {
    ParseLog plog = new ParseLog();
    int n = input.size();
    List<PlainDate> result = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
        String s = input.get(i);
        plog.reset();
        PlainDate date = TIME4J.parse(s, plog);
        if (!plog.isError()) {
            result.add(date);
        } else {
            // log or report error
        }
    }
    return result;
}
Some notes:
Each formatter in a MultiFormatParser keeps its own locale.
Keep the MultiFormatParser in a static constant because a) it is immutable and b) constructing formatters is expensive in every library (and Time4J is no exception here).
To convert the result to Joda-Time, a bridge like this can be used:
LocalDate joda = new LocalDate(plainDate.getYear(), plainDate.getMonth(), plainDate.getDayOfMonth());
But keep in mind that every conversion has some extra cost. On the other hand, Joda-Time offers fewer features than Time4J, so the latter can also do the full job of all date-time-zone relevant tasks.
In Scala, the parser can be built from the pattern and locale lists:
val parser = MultiFormatParser.of(patterns.flatMap(patt => locales.map(locale => ChronoFormatter.ofDatePattern(patt, PatternType.CLDR, locale))).toArray)
java.time is the worst in terms of performance according to my own experiments (obviously due to internal exception handling).