
Java or Scala fast way to parse dates with many different formats using java.time

I would like a generic, fast parser for dates that arrive in arbitrary formats, such as:

  • 2018
  • 2018-12-31
  • 2018/12/31
  • 2018 dec 31
  • 20181231151617
  • 2018-12-31T15:16:17
  • 2018-12-31T15:16:17.123456
  • 2018-12-31T15:16:17.123456Z
  • 2018-12-31T15:16:17.123456 UTC
  • 2018-12-31T15:16:17.123456+01:00
  • ... so many possibilities

Is there a nice way or a "magic" function to do that?

Currently I am planning to use something like this:

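import java.time.{Instant, LocalDateTime, ZoneId, ZonedDateTime}
import java.time.format.DateTimeFormatterBuilder
import java.time.temporal.ChronoField

val formatter = new DateTimeFormatterBuilder()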
  .appendPattern("[yyyy-MM-dd'T'HH:mm:ss]")
  .appendPattern("[yyyy-MM-dd]")
  .appendPattern("[yyyy]")
  // add so many things here
  .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
  .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
  .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
  .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
  .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
  .parseDefaulting(ChronoField.MICRO_OF_SECOND, 0)
  .toFormatter()


val temporalAccessor = formatter.parse("2018")
val localDateTime = LocalDateTime.from(temporalAccessor)
localDateTime.getHour
val zonedDateTime = ZonedDateTime.of(localDateTime, ZoneId.systemDefault)
val result = Instant.from(zonedDateTime)

But is there a smarter way than specifying hundreds of formats?

Most of the answers I found are outdated (pre-Java 8) or do not focus on performance and on handling many different formats.

Benjamin asked Nov 06 '22 21:11


1 Answer

No, there is no nice/magic way to do this, for two main reasons:

  1. There are variations and ambiguities in date formats that make a generic parser very difficult, for example 11/11/11.

  2. You are looking for very high performance, which rules out any brute-force methods. A budget of 1 µs per date leaves only a few thousand instructions for the full parse.
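To make the ambiguity in point 1 concrete, here is a minimal sketch that parses one string under three plausible field orders. The object name `AmbiguousDates`, the pattern list, and the sample input `01/02/03` are illustrative assumptions, not part of the question.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object AmbiguousDates {
  // Three plausible field orders for a slash-separated date (illustrative choice).
  val patterns: Seq[String] = Seq("dd/MM/yy", "MM/dd/yy", "yy/MM/dd")

  // Parse the same text under each pattern and pair the pattern with its result.
  def interpretations(text: String): Seq[(String, LocalDate)] =
    patterns.map(p => p -> LocalDate.parse(text, DateTimeFormatter.ofPattern(p)))
}

// Every pattern accepts the input, but each yields a different date.
AmbiguousDates.interpretations("01/02/03").foreach { case (pattern, date) =>
  println(s"$pattern -> $date")
}
```

Since all three patterns accept the text, a parser that simply tries formats in turn returns whichever interpretation happens to be listed first.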

At some level you are going to have to specify what formats are valid and how to interpret them. The best way to do this is probably one or more regular expressions that extract the appropriate fields from all the allowable combinations of characters that might form a date, and then much simpler validation of the individual fields.

Here is an example that deals with all the dates you listed:

val DateMatch = """(\d\d\d\d)[-/ ]?((?:\d\d)|(?:\w\w\w))?[-/ ]?(\d\d)?T?(\d\d)?:?(\d\d)?:?(\d\d)?[\.]*(\d+)?(.*)?""".r

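// `date` is the input string to parse.
date match {
  case DateMatch(year, month, day, hour, min, sec, usec, timezone) =>
    // Groups that did not take part in the match are null, so default them.
    // All defaults are strings so that every field has the same type.
    (year, Option(month).getOrElse("1"), Option(day).getOrElse("1"), Option(hour).getOrElse("0"), Option(min).getOrElse("0"), Option(sec).getOrElse("0"), Option(usec).getOrElse("0"), Option(timezone).getOrElse(""))
  case _ =>
    throw new IllegalArgumentException(s"Invalid date: $date")
}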

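As you can see, it is going to get very hairy once all the possible dates are included. But if the regex engine can handle it, it should be efficient, because ideally the regex compiles to a state machine that looks at each character once. Note that java.util.regex, which Scala's Regex wraps, is a backtracking engine, so it is worth measuring this on real input rather than assuming a single pass.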
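The "much simpler validation of the individual fields" step could look roughly like the sketch below. It reuses the regular expression from the example and lets `LocalDateTime.of` and `ZoneId.of` reject out-of-range values. The names `FieldParser`, `MonthNames`, `num` and `monthOf` are hypothetical. Treating a missing zone as UTC and accepting only English three-letter month names are assumptions made for the sketch (the question used the system default zone).

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}
import scala.util.Try

object FieldParser {
  // Same regular expression as in the example above.
  private val DateMatch =
    """(\d\d\d\d)[-/ ]?((?:\d\d)|(?:\w\w\w))?[-/ ]?(\d\d)?T?(\d\d)?:?(\d\d)?:?(\d\d)?[\.]*(\d+)?(.*)?""".r

  // Assumption: only English three-letter month abbreviations are supported.
  private val MonthNames: Map[String, Int] =
    Seq("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec")
      .zipWithIndex.map { case (name, i) => name -> (i + 1) }.toMap

  // Unmatched groups are null; fall back to the given default.
  private def num(s: String, default: Int): Int = if (s == null) default else s.toInt

  // A month is either a known abbreviation or a number; anything else throws.
  private def monthOf(s: String): Int =
    if (s == null) 1 else MonthNames.getOrElse(s.toLowerCase, s.toInt)

  // Assumption: inputs without a zone are interpreted in `defaultZone` (UTC here).
  def parse(text: String, defaultZone: ZoneId = ZoneOffset.UTC): Try[Instant] = Try {
    text match {
      case DateMatch(year, month, day, hour, min, sec, frac, zone) =>
        // Right-pad the fractional digits to nanoseconds, e.g. "123456" -> 123456000.
        val nanos = if (frac == null) 0 else (frac + "000000000").take(9).toInt
        // LocalDateTime.of validates ranges (month 13 or 30 February throw).
        val local = LocalDateTime.of(
          year.toInt, monthOf(month), num(day, 1), num(hour, 0), num(min, 0), num(sec, 0), nanos)
        val zoneId =
          if (zone == null || zone.trim.isEmpty) defaultZone else ZoneId.of(zone.trim)
        local.atZone(zoneId).toInstant
      case _ =>
        throw new IllegalArgumentException(s"Unrecognised date: $text")
    }
  }
}
```

Calling `FieldParser.parse("2018 dec 31")` returns a `Success`, while `FieldParser.parse("2018-02-30")` returns a `Failure`. Anything the regex sweeps into its trailing catch-all group has to be a valid zone ID or offset, so unexpected trailing text is rejected rather than silently ignored.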

Tim answered Nov 14 '22 22:11