
SimpleDateFormat leniency leads to unexpected behavior

I have found that SimpleDateFormat::parse(String source) is (unfortunately) lenient by default: setLenient(true).

By default, parsing is lenient: If the input is not in the form used by this object's format method but can still be parsed as a date, then the parse succeeds.

If I set leniency to false, the documentation says that with strict parsing, inputs must match this object's format. I used parsing with SimpleDateFormat with lenient mode turned off and, by mistake, had a typo in the date (letter o instead of number 0). Here is the brief working code:

// PASSED (year 199)
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("03.12.199o"));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("03.12.199o"));        //WTF?

To my surprise, this passed and no ParseException was thrown. I went further:

// PASSED (year 1990)
String string = "just a String to mess with SimpleDateFormat";

SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("03.12.1990" + string));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("03.12.1990" + string));

Let's go on:

// FAILED on the 2nd line
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("o3.12.1990"));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("o3.12.1990"));

Finally, the exception is thrown: Unparseable date: "o3.12.1990". I wonder where the difference in leniency lies and why the last line of my first code snippet did not throw an exception. The documentation says:

With strict parsing, inputs must match this object's format.

My input clearly doesn't strictly match the format - I expect this parsing to be really strict. Why does this (not) happen?

Nikolas Charalambidis asked Mar 06 '23 01:03



2 Answers

Why does this (not) happen?

It’s not very well explained in the documentation.

With lenient parsing, the parser may use heuristics to interpret inputs that do not precisely match this object's format. With strict parsing, inputs must match this object's format.

The documentation does help a bit, though, by mentioning that it is the Calendar object that the DateFormat uses that is lenient. That Calendar object is not used for the parsing itself, but for interpreting the parsed values into a date and time (I am quoting DateFormat documentation since SimpleDateFormat is a subclass of DateFormat).

  • SimpleDateFormat, no matter if lenient or not, will accept 3-digit year, for example 199, even though you have specified yyyy in the format pattern string. The documentation says about year:

    For parsing, if the number of pattern letters is more than 2, the year is interpreted literally, regardless of the number of digits. So using the pattern "MM/dd/yyyy", "01/11/12" parses to Jan 11, 12 A.D.

  • DateFormat, no matter if lenient or not, accepts and ignores text after the parsed text, like the small letter o in your first example. It objects to unexpected text before or inside the text, as when in your last example you put the letter o in front. The documentation of DateFormat.parse says:

    The method may not use the entire text of the given string.

  • As I indirectly said, leniency makes a difference when interpreting the parsed values into a date and time. So a lenient SimpleDateFormat will interpret 29.02.2019 as 01.03.2019 because there are only 28 days in February 2019. A strict SimpleDateFormat will refuse to do that and will throw an exception. The default lenient behaviour can lead to very surprising and downright inexplicable results. As a simple example, giving the day, month and year in the wrong order: 1990.03.12 will result in August 11 year 17 AD (2001 years ago).
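The three points above can be checked with a small sketch. The class name LeniencyDemo and the helpers tryParse, year and format are made up for illustration; the pattern uses uppercase MM for month (unlike the question's mm) and Locale.ROOT is passed so the result does not depend on the default locale.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class LeniencyDemo {

    static final String PATTERN = "dd.MM.yyyy";

    // Hypothetical helper: parses text with the given leniency, returns null on ParseException.
    static Date tryParse(String text, boolean lenient) {
        SimpleDateFormat parser = new SimpleDateFormat(PATTERN, Locale.ROOT);
        parser.setLenient(lenient);
        try {
            return parser.parse(text);
        } catch (ParseException e) {
            return null;
        }
    }

    // Hypothetical helper: extracts the calendar year of a parsed date.
    static int year(Date date) {
        Calendar calendar = new GregorianCalendar();
        calendar.setTime(date);
        return calendar.get(Calendar.YEAR);
    }

    // Hypothetical helper: formats a date back with the same pattern.
    static String format(Date date) {
        return new SimpleDateFormat(PATTERN, Locale.ROOT).format(date);
    }

    public static void main(String[] args) {
        // A 3-digit year is accepted even when strict; the trailing 'o' is ignored.
        System.out.println(year(tryParse("03.12.199o", false)));          // 199

        // Text after the date is ignored, strict or not.
        System.out.println(year(tryParse("03.12.1990 trailing", false))); // 1990

        // Unexpected text in front is rejected, even when lenient.
        System.out.println(tryParse("o3.12.1990", true));                 // null

        // Leniency only matters when the parsed values are turned into a date.
        System.out.println(format(tryParse("29.02.2019", true)));         // 01.03.2019
        System.out.println(tryParse("29.02.2019", false));                // null
    }
}
```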

The solution

VGR already mentioned LocalDate from java.time, the modern Java date and time API, in a comment. In my experience java.time is so much nicer to work with than the old date and time classes, so let’s give it a shot. Try a correct date string first:

    DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("dd.mm.yyyy");
    System.out.println(LocalDate.parse("03.12.1990", dateFormatter));

We get:

java.time.format.DateTimeParseException: Text '03.12.1990' could not be parsed: Unable to obtain LocalDate from TemporalAccessor: {Year=1990, DayOfMonth=3, MinuteOfHour=12},ISO of type java.time.format.Parsed

This is because I used your format pattern string of dd.mm.yyyy, where lowercase mm means minute. When we read the error message closely enough, it does state that the DateTimeFormatter interpreted 12 as minute of hour, which was not what we intended. While SimpleDateFormat tacitly accepted this (even when strict), java.time is more helpful in pointing out our mistake. What the message only indirectly says is that it is missing a month value. We need to use uppercase MM for month. At the same time I am trying your date string with the typo:

    DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("dd.MM.yyyy");
    System.out.println(LocalDate.parse("03.12.199o", dateFormatter));

We get:

java.time.format.DateTimeParseException: Text '03.12.199o' could not be parsed at index 6

Index 6 is where it says 199. It objects because we had specified 4 digits and are only supplying 3. The docs say:

The count of letters determines the minimum field width …

It would also object to unparsed text after the date. In short it seems to me that it gives you everything that you had expected.
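To see these rejections side by side, here is a minimal sketch. The class name StrictLocalDateParsing and the helper tryParse are made up for illustration; the formatter is the same dd.MM.yyyy one used above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

public class StrictLocalDateParsing {

    private static final DateTimeFormatter DATE_FORMATTER =
            DateTimeFormatter.ofPattern("dd.MM.yyyy");

    // Hypothetical helper: returns the parsed date, or empty if the text is rejected.
    static Optional<LocalDate> tryParse(String text) {
        try {
            return Optional.of(LocalDate.parse(text, DATE_FORMATTER));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("03.12.1990"));          // Optional[1990-12-03]
        System.out.println(tryParse("03.12.199o"));          // Optional.empty (only 3 year digits)
        System.out.println(tryParse("03.12.1990 trailing")); // Optional.empty (unparsed text after the date)
        System.out.println(tryParse("o3.12.1990"));          // Optional.empty (typo in front)
    }
}
```

Returning an Optional rather than letting the exception propagate is just one way to wrap it; in real code you may prefer to catch DateTimeParseException where you can report the error to the user.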

Links

  • DateFormat.setLenient documentation
  • Oracle tutorial: Date Time explaining how to use java.time.
Ole V.V. answered Mar 18 '23 13:03



Leniency is not about whether the entire input matches but whether the format matches. Your input can still be 3.12.1990somecrap and it would work.

The actual parsing is done in parse(String, ParsePosition) which you could use as well. Basically parse(String) will pass a ParsePosition that is set up to start at index 0 and when the parsing is done the current index of that position is checked.

If it's still 0 the start of the input didn't match the format, not even in lenient mode.

However, to the parser 03.12.199 is a valid date, so it stops after the character at index 8 (the ParsePosition then points at index 9, the letter o), which isn't 0, and thus the parsing succeeds. If you want to check whether everything was parsed, you'd have to pass your own ParsePosition and check whether its index matches the length of the input.
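A minimal sketch of that check follows. The class name WholeInputParser and the method parseWholeInput are made up for illustration, and the pattern uses uppercase MM for month.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class WholeInputParser {

    // Hypothetical helper: returns the date only if the whole input was consumed, else null.
    static Date parseWholeInput(SimpleDateFormat parser, String text) {
        ParsePosition position = new ParsePosition(0);
        Date date = parser.parse(text, position); // returns null instead of throwing on failure
        if (date == null || position.getIndex() != text.length()) {
            return null;
        }
        return date;
    }

    public static void main(String[] args) {
        SimpleDateFormat parser = new SimpleDateFormat("dd.MM.yyyy", Locale.ROOT);
        parser.setLenient(false);

        System.out.println(parseWholeInput(parser, "03.12.1990") != null);         // true
        System.out.println(parseWholeInput(parser, "03.12.199o") != null);         // false
        System.out.println(parseWholeInput(parser, "03.12.1990somecrap") != null); // false
    }
}
```

Note that this only catches leftover text. An input such as 03.12.199 is consumed completely, so it would still be accepted as year 199.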

Thomas answered Mar 18 '23 15:03
