
How to retrieve all kinds of dates and temporal values from text

I want to retrieve dates and other temporal entities from a set of strings. Can this be done in Java without a conventional date parser, given that most parsers handle only a limited range of input patterns? The input here is entered manually and is therefore ambiguous.

The inputs can look like this:

12th Sep |mid-March |12.September.2013

Sep 12th |12th September| 2013

Sept 13 |12th, September |12th,Feb,2013

I've gone through many answers on finding dates in Java, but most of them don't handle such a wide range of input patterns.

I've tried using the SimpleDateFormat class and its parse() methods, treating a parse failure as a sign that the string is not a date. I've tried regexes, but I'm not sure they fit this scenario. I've also used ClearNLP to annotate the dates, but its annotations aren't reliable.
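For comparison, the try-each-pattern idea can be sketched roughly as follows. This is a minimal sketch, not a complete solution: the class name TryPatterns, the pattern list and the normalize() clean-up step are illustrative assumptions. It uses the parse(String, ParsePosition) overload, which returns null on failure, and checks that the whole string was consumed, because parse(String) silently ignores trailing text. Inputs such as "mid-March" still fall through, and dates without a year come back in 1970, SimpleDateFormat's default.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

class TryPatterns {
    // Hypothetical pattern list, tried in order. When parsing, MMM accepts
    // both the short ("Sep") and the long ("September") month name.
    private static final List<String> PATTERNS =
            Arrays.asList("d MMM yyyy", "MMM d yyyy", "d MMM", "MMM d");

    // Hypothetical clean-up step that reduces the input variants to the patterns above.
    static String normalize(String raw) {
        return raw.trim()
                .replaceAll("(?i)\\bsept\\b", "Sep")       // "Sept" is neither the long nor the short name
                .replaceAll("(\\d+)(st|nd|rd|th)\\b", "$1") // "12th" -> "12"
                .replaceAll("[.,]", " ")                   // "12.Sep", "12,Feb" -> spaces
                .replaceAll("\\s+", " ")                   // collapse runs of whitespace
                .trim();
    }

    static Optional<Date> tryParse(String raw) {
        String text = normalize(raw);
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
            format.setLenient(false); // reject impossible dates such as "31 Feb 2013"
            ParsePosition pos = new ParsePosition(0);
            // This overload returns null on failure instead of throwing ParseException.
            Date date = format.parse(text, pos);
            // parse(String) would also accept trailing text, so require that the
            // whole input was consumed.
            if (date != null && pos.getIndex() == text.length()) {
                return Optional.of(date);
            }
        }
        return Optional.empty(); // e.g. "mid-March": no pattern covers it
    }

    public static void main(String[] args) {
        String[] inputs = {"12th Sep", "mid-March", "12.September.2013", "Sept 13", "12th,Feb,2013"};
        for (String input : inputs) {
            System.out.println(input + " -> " + tryParse(input));
        }
    }
}
```

Every new input format means another entry in PATTERNS or another rule in normalize(), which is exactly the maintenance problem described above.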

The closest approach to extracting these values might be a chain of responsibility, as described in the answer below. Is there a library that provides a ready-made set of date patterns I could use?

Identity1 asked Sep 27 '22 12:09



1 Answer

A clean and modular approach to this problem is to use a chain of responsibility. Every element of the chain tries to match the input string against a regex. If the regex matches, the element converts the input string into something that a SimpleDateFormat can parse, converts that into the data structure you prefer (a Date, or a different temporal representation that better suits your needs) and returns it. If the regex doesn't match, the element simply delegates to the next element in the chain.

Each element of the chain has a single responsibility: test its regex against the string, then either return a result or ask the next element of the chain to give it a try.

Chains can be built and recomposed easily without changing the implementation of any of their elements.
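A minimal sketch of such a chain, assuming three regexes that cover some of the formats listed in the question. The names DateChain, DateHandler, RegexDateHandler, abbrev() and buildChain(), as well as the regexes and normalizer lambdas, are hypothetical. Each link either rewrites its own format into text that one SimpleDateFormat pattern can parse, or hands the string to the next link; supporting a new format means appending a link, not editing existing ones.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateChain {
    // One link of the chain: try to handle the input, otherwise delegate.
    abstract static class DateHandler {
        private DateHandler next;

        // Returns the appended link so calls can be chained fluently.
        DateHandler setNext(DateHandler next) {
            this.next = next;
            return next;
        }

        Optional<Date> handle(String input) {
            Optional<Date> result = tryMatch(input.trim());
            if (result.isPresent() || next == null) {
                return result;
            }
            return next.handle(input);
        }

        protected abstract Optional<Date> tryMatch(String input);
    }

    // A link that tests one regex and, on a match, rewrites the input into
    // a string that a single SimpleDateFormat pattern can parse.
    static final class RegexDateHandler extends DateHandler {
        private final Pattern regex;
        private final Function<Matcher, String> normalizer;
        private final String formatPattern;

        RegexDateHandler(String regex, Function<Matcher, String> normalizer, String formatPattern) {
            this.regex = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
            this.normalizer = normalizer;
            this.formatPattern = formatPattern;
        }

        @Override
        protected Optional<Date> tryMatch(String input) {
            Matcher m = regex.matcher(input);
            if (!m.matches()) {
                return Optional.empty(); // not this link's format
            }
            String normalized = normalizer.apply(m);
            SimpleDateFormat format = new SimpleDateFormat(formatPattern, Locale.ENGLISH);
            format.setLenient(false); // reject impossible dates such as "31 Feb 2013"
            ParsePosition pos = new ParsePosition(0);
            Date date = format.parse(normalized, pos); // null on failure
            boolean consumedAll = date != null && pos.getIndex() == normalized.length();
            return consumedAll ? Optional.of(date) : Optional.empty();
        }
    }

    // "September", "Sept" and "Sep" all become "Sep", which SimpleDateFormat accepts.
    static String abbrev(String month) {
        return month.substring(0, 3); // regexes below guarantee at least 3 letters
    }

    private static final String ORD = "(?:st|nd|rd|th)?"; // optional ordinal suffix
    private static final String SEPARATOR = "[\\s.,]+";   // separators seen in the inputs

    // Hypothetical chain for a few of the question's formats.
    static DateHandler buildChain() {
        DateHandler head = new RegexDateHandler(
                "(\\d{1,2})" + ORD + SEPARATOR + "([a-z]{3,})" + SEPARATOR + "(\\d{4})",
                m -> m.group(1) + " " + abbrev(m.group(2)) + " " + m.group(3), "d MMM yyyy");
        head.setNext(new RegexDateHandler(
                        "(\\d{1,2})" + ORD + SEPARATOR + "([a-z]{3,})",
                        m -> m.group(1) + " " + abbrev(m.group(2)), "d MMM"))
            .setNext(new RegexDateHandler(
                        "([a-z]{3,})" + SEPARATOR + "(\\d{1,2})" + ORD,
                        m -> abbrev(m.group(1)) + " " + m.group(2), "MMM d"));
        return head;
    }

    public static void main(String[] args) {
        DateHandler chain = buildChain();
        String[] inputs = {"12th Sep", "mid-March", "12.September.2013", "Sept 13", "12th,Feb,2013"};
        for (String input : inputs) {
            System.out.println(input + " -> " + chain.handle(input));
        }
    }
}
```

One design choice in this sketch: a link whose regex matches but whose normalized text still fails to parse (for example "12 dogs") also delegates instead of stopping the chain. "mid-March" matches no link and comes back empty, so it would need its own handler.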

In the end, the result is the same as in @KirkoR's response, with a 'bit' (:D) more code but a more modular design. (I prefer the regex approach to the try/catch one.)

Reference: https://en.wikipedia.org/wiki/Chain-of-responsibility_pattern

Emanuele Ivaldi answered Sep 30 '22 06:09
