
How can I extract datetime from freeform text?

I'm trying to build something along the lines of Google Calendar (or even some Gmail messages), where freeform text is parsed and converted to specific dates and times.

Some examples (assume for simplicity that right now is January 01, 2013 at 1am):

"I should call Mom tomorrow to wish her a happy birthday" -> "tomorrow" => "2013-01-02"
"The super bowl is on Feb 3rd at 6:30pm" -> "Feb 3rd at 6:30pm" => "2013-02-03T18:30:00Z"
"Remind me to take out the trash on Friday" -> "Friday" => "2013-01-04"

First of all: are there any existing open source libraries that do this (or part of it)? If not, what sort of approach do you think I should take?

I am thinking of a few different possibilities:

  1. Lots of regular expressions, as many as I can come up with for each different use case
  2. Some sort of Bayesian Net that looks at n-grams and categorizes them into different scenarios like "relative date", "relative day of week", "specific date", "date and time", and then runs it through a rules engine (maybe more regex) to figure out the actual date.
  3. Sending it to a Google search and trying to extract meaningful information from the search results (this one is probably not realistic)
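
To make option 1 concrete, here is a minimal sketch that handles only the three example sentences above. The names (`extractDate`, `addDays`, `toIsoDate`) are hypothetical, and it makes several simplifying assumptions: all times are treated as UTC, the year is taken from the reference date, and a bare weekday means its next occurrence.

```javascript
// Sketch of approach 1 (plain regular expressions). Not a full parser.
var MONTHS = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"];
var WEEKDAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"];

// "Feb 3rd", optionally followed by "at 6:30pm" or "at 6pm".
var ABSOLUTE = /\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?\s+(\d{1,2})(?:st|nd|rd|th)?(?:\s+at\s+(\d{1,2})(?::(\d{2}))?\s*(am|pm))?/i;
var RELATIVE = /\b(today|tomorrow)\b/i;
var WEEKDAY = /\b(sunday|monday|tuesday|wednesday|thursday|friday|saturday)\b/i;

// Format a Date as YYYY-MM-DD using its UTC fields.
function toIsoDate(d) {
    return d.toISOString().slice(0, 10);
}

// Return midnight UTC of the day that is `days` days after `d`.
function addDays(d, days) {
    return new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate() + days));
}

// Find the first supported date expression in `text` and resolve it against `now`.
function extractDate(text, now) {
    var m = ABSOLUTE.exec(text);
    if (m) {
        var month = MONTHS.indexOf(m[1].toLowerCase());
        var day = parseInt(m[2], 10);
        // Assumption: the date falls in the reference year.
        var year = now.getUTCFullYear();
        if (m[3] === undefined) {
            return { text: m[0], date: toIsoDate(new Date(Date.UTC(year, month, day))) };
        }
        // Convert the 12-hour clock to 24-hour: 12am -> 0, 12pm -> 12, 6pm -> 18.
        var hour = parseInt(m[3], 10) % 12 + (m[5].toLowerCase() === "pm" ? 12 : 0);
        var minute = m[4] === undefined ? 0 : parseInt(m[4], 10);
        return { text: m[0], date: new Date(Date.UTC(year, month, day, hour, minute)).toISOString() };
    }
    m = RELATIVE.exec(text);
    if (m) {
        var offset = m[1].toLowerCase() === "tomorrow" ? 1 : 0;
        return { text: m[1], date: toIsoDate(addDays(now, offset)) };
    }
    m = WEEKDAY.exec(text);
    if (m) {
        var target = WEEKDAYS.indexOf(m[1].toLowerCase());
        // Days until the next occurrence; today's own weekday means a week from now.
        var delta = (target - now.getUTCDay() + 7) % 7 || 7;
        return { text: m[1], date: toIsoDate(addDays(now, delta)) };
    }
    return null;
}

var now = new Date(Date.UTC(2013, 0, 1, 1, 0)); // January 1, 2013 at 1am, taken as UTC
console.log(extractDate("I should call Mom tomorrow to wish her a happy birthday", now));
console.log(extractDate("The super bowl is on Feb 3rd at 6:30pm", now));
console.log(extractDate("Remind me to take out the trash on Friday", now));
```

Even this small version shows where the approach starts to strain: every new phrasing ("next Friday", "in two weeks", "the 3rd of Feb") needs another pattern and another resolution rule.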
asked Dec 29 '12 18:12 by Paul


1 Answer

You can use this library: https://github.com/wanasit/chrono

Demo:

// Sample sentences from the question.
var inputs = ["I should call Mom tomorrow to wish her a happy birthday",
    "The super bowl is on Feb 3rd at 6:30pm",
    "Remind me to take out the trash on Friday"];

for (var i = 0; i < inputs.length; i++) {
    var input = inputs[i];
    // chrono.parse returns an array with one result per date expression found.
    var parsed = chrono.parse(input);
    // Print the matched phrase together with the date it resolved to.
    console.log(input + " parsed as: " + JSON.stringify(parsed.map(function(p) { return [p.text, p.startDate]; })));
}

Output:

I should call Mom tomorrow to wish her a happy birthday parsed as: [["tomorrow","2012-12-31T06:30:00.000Z"]]
The super bowl is on Feb 3rd at 6:30pm parsed as: [["Feb 3rd at 6:30pm","2013-02-03T13:00:00.000Z"]]
Remind me to take out the trash on Friday parsed as: [["Friday","2013-01-04T06:30:00.000Z"]]

http://jsfiddle.net/TXX3Z/
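
The timestamps in the output differ from the values in the question because the phrases are resolved against the clock and timezone of the machine that ran the demo, then printed in UTC. The numbers are consistent with a run on Dec 30, 2012 in a UTC+05:30 zone, with date-only phrases defaulting to noon local time. That is an inference from the output, not something stated here. A quick sketch of the arithmetic, where `localToUtcIso` is a hypothetical helper and the offset is the assumed one:

```javascript
// Assumption: the demo ran in a UTC+05:30 environment (inferred from the output).
var OFFSET_MINUTES = 5 * 60 + 30;

// Convert a wall-clock time in the assumed zone to the UTC instant it denotes.
// `month` is 1-based here; Date.UTC expects a 0-based month.
function localToUtcIso(year, month, day, hour, minute) {
    var asIfUtc = Date.UTC(year, month - 1, day, hour, minute);
    return new Date(asIfUtc - OFFSET_MINUTES * 60 * 1000).toISOString();
}

console.log(localToUtcIso(2012, 12, 31, 12, 0)); // "tomorrow", noon local
console.log(localToUtcIso(2013, 2, 3, 18, 30));  // "Feb 3rd at 6:30pm"
console.log(localToUtcIso(2013, 1, 4, 12, 0));   // "Friday", noon local
```

If you need results relative to a fixed "now", check whether your version of the library lets you pass a reference date to the parse call instead of relying on the system clock.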

answered Sep 20 '22 16:09 by Dogbert