
Temporal Extraction (i.e. Extracting Date/Time Entities from Free-Form Text) - How?

Has anyone found a simple but effective way to extract date references from text? I've done a fair amount of searching for temporal extraction tools, but there isn't much out there. There are a few white papers, but the topic seems to be treated as a niche corner of the broader semantic web field and doesn't get much attention.

I'm just looking for something that is 80% effective. There's no need to capture things like "the month after Jan 2009", but handling basic, common date entities would be nice.

I'm open to all suggestions, even fancy regular expressions.

Fire away!

(and thanks - Henry)

henry74 asked Jul 16 '09 00:07


2 Answers

  1. If the temporal expressions in your data come in only a limited set of formats, use regular expressions and refine your patterns iteratively against your data.

  2. Otherwise, use SUTime from the Stanford CoreNLP toolkit, which might be overkill but will definitely meet your needs.
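A minimal Python sketch of the first option, assuming your data only uses a handful of formats: ISO `2009-07-16`, US numeric `7/16/2009`, and month-name forms such as `Jul 16, 2009` or `16 July 2009`. The pattern list and the `find_dates` name are hypothetical. The iterative part is running this over a sample of your text, checking what it misses or wrongly matches, and adding or tightening patterns.

```python
import re

# Full or abbreviated month names (hypothetical starting set; extend as needed).
MONTHS = (r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|"
          r"aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)")

DATE_PATTERNS = [
    # ISO style: 2009-07-16
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    # US numeric: 7/16/2009 or 07/16/09
    re.compile(r"\b\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})\b"),
    # Month name first: Jul 16, 2009 / July 16 2009 / Jul. 16, 2009
    re.compile(rf"\b{MONTHS}\.?\s+\d{{1,2}},?\s+\d{{4}}\b", re.IGNORECASE),
    # Day first: 16 July 2009
    re.compile(rf"\b\d{{1,2}}\s+{MONTHS}\.?\s+\d{{4}}\b", re.IGNORECASE),
]


def find_dates(text):
    """Return (start, end, matched_text) for every pattern hit, in text order."""
    hits = []
    for pattern in DATE_PATTERNS:
        for m in pattern.finditer(text):
            hits.append((m.start(), m.end(), m.group()))
    return sorted(hits)


if __name__ == "__main__":
    sample = "Shipped 2009-07-16, invoiced Jul 20, 2009 and paid 3 August 2009 (ref 7/31/09)."
    for start, end, found in find_dates(sample):
        print(start, end, found)
```

Ambiguous words such as "may" or "march" are exactly the kind of false positive this refinement loop is meant to catch.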

JXITC answered Sep 30 '22 05:09


One way I have done this is to look for anything that is 4 digits and convert it to a number. If the number falls within the range of years you are interested in, you probably have a year you can use. If you also want months and days, check the adjacent words to see whether they are a month name or a number between 1 and 31. I am confident this would satisfy your 80% requirement.

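Regex for years: \b[0-9]{4}\b - you will need to convert it to a number and check that it's within the range of years you consider valid. The word boundaries (\b) stop it from matching four digits inside a longer number such as 12345.

Regex for months: \b(jan|january|feb|february|...)\b, and so on for each month, matched case-insensitively. The word boundaries stop it from firing inside words such as "Janet" or "mayor".

Regex for days of the month: \b[0-9]{1,2}\b - you would need to convert it to a number and check that it is 1-31.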
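A minimal Python sketch of the heuristic described in this answer. Assumptions (mine, not the answerer's): the valid year range is 1900-2100, "adjacent words" means the two words immediately before the year (so it handles "July 16, 2009" and "16 July 2009" but not "2009 July 16"), and `extract_dates` is a hypothetical name.

```python
import re

# Hypothetical range: change it to the years you consider valid.
MIN_YEAR, MAX_YEAR = 1900, 2100

MONTH_ABBREVS = ["jan", "feb", "mar", "apr", "may", "jun",
                 "jul", "aug", "sep", "oct", "nov", "dec"]
# Full or abbreviated month names; used with fullmatch so "Janet" is rejected.
MONTH_RE = re.compile(
    r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|"
    r"aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)",
    re.IGNORECASE,
)


def extract_dates(text):
    """Return a (year, month, day) tuple per plausible year; month/day may be None."""
    # Split into runs of letters or runs of digits, dropping punctuation.
    tokens = re.findall(r"[A-Za-z]+|[0-9]+", text)
    results = []
    for i, tok in enumerate(tokens):
        # Year candidate: exactly four digits, inside the valid range.
        if not re.fullmatch(r"[0-9]{4}", tok) or not MIN_YEAR <= int(tok) <= MAX_YEAR:
            continue
        month = day = None
        # Look at the (up to) two words before the year for a month name or a day.
        for prev in tokens[max(0, i - 2):i]:
            if MONTH_RE.fullmatch(prev):
                month = MONTH_ABBREVS.index(prev[:3].lower()) + 1
            elif re.fullmatch(r"[0-9]{1,2}", prev) and 1 <= int(prev) <= 31:
                day = int(prev)
        results.append((int(tok), month, day))
    return results


if __name__ == "__main__":
    print(extract_dates("Posted July 16, 2009; revised 3 Aug 2010. Ticket 12345 and model 1066."))
```

Because the text is tokenized into whole digit runs first, 12345 is never split into a year, and 1066 is dropped by the range check.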

jjxtra answered Sep 30 '22 05:09