Extract Dates and events associated with the date from Text corpus

I am currently running Python code that goes through every line of a text file and parses the line for dates. If a date is found in a line, the line is copied to a new output file. I repeat this process on 100 documents, and at the end I get an output file containing lines that have dates like "2013", "August 2014", "01-11-1987", and so on.

The problem is that this does not give accurate information about the events associated with some of the dates.

Is there a more elegant approach to this problem? Below is the file from which I am trying to extract events for the date December 2010:

Taipei is the most competitive place among all major cities and counties, according to a study published by a local magazine yesterday. Taipei came in first in each of the categories - economy, employment, education, environmental protection, public safety, medical care and local finances - evaluated in the study by Global View Magazine. In terms of overall competitiveness, Taipei is therefore number one, followed by Hsinchu City, Chiayi City and New Taipei. Taipei, with more than six decades of privileged development heavily funded by the central government, will remain unchallenged in the foreseeable future, Global View commented. Taipei and New Taipei are two of the country's five Cabinet-level special municipalities, but the other three - Taichung, Tainan and Kaohsiung - failed to receive good ratings in the study though they have more resources than most other local governments. Taichung ranks seventh, Tainan 12th and Kaohsiung 15th of all 19 local governments graded in the study. The three special municipalities grew to the present size by merging neighboring counties in December 2010. But Global View said the mergers crippled their competitiveness. But all five special municipalities are in the top-10 in terms of economic competitiveness. At the bottom is the agricultural Pingtung County. But another agricultural county, Taitung, made it to the top-10, occupying the eighth place mainly because of its low crime rate, the magazine said.

As you can see, when I parse the line containing December 2010, I don't get any meaningful information. But there is actually one major event, the merging of neighboring counties, and it is not captured. Hence I need to know: is there any algorithm or library that can help me capture events that occurred on a particular date?

Sriram asked Feb 10 '15 17:02



1 Answer

I suggest trying the NLTK library for Python. Here is a basic guide to it: http://www.nltk.org/book/ch07.html

It has many algorithms for extracting meaning from text, including modules that allow you to:

1) Extract entities
2) Extract dates
3) Establish relationships between the extracted entities and dates

I suggest paying attention to the timex.py module, which lives in the nltk_contrib repository rather than in NLTK itself: https://github.com/nltk/nltk_contrib/blob/master/nltk_contrib/timex.py

It is mainly built to tokenize dates and times in text.
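
To give a rough idea of what tagging dates in text looks like, here is a minimal sketch that uses only the standard library `re` module instead of timex.py. The pattern and the `find_dates` helper are hypothetical and cover only the three formats mentioned in the question, so treat it as an illustration and not as a replacement for a real temporal tagger.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Hypothetical pattern: handles only "01-11-1987", "August 2014"
# and a bare four-digit year such as "2013".
DATE_PATTERN = re.compile(
    r"\b(?:"
    r"\d{2}-\d{2}-\d{4}"            # 01-11-1987
    rf"|(?:{MONTHS})\s+\d{{4}}"     # August 2014
    r"|(?:19|20)\d{2}"              # 2013
    r")\b"
)


def find_dates(text):
    """Return every date-like substring, in order of appearance."""
    return [match.group(0) for match in DATE_PATTERN.finditer(text)]


print(find_dates("Founded in 2013, expanded in August 2014, born 01-11-1987."))
```

Longer alternatives are listed before the bare year so that "August 2014" is reported as one expression and not just as "2014".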

And here is a guide to extracting relationships between entities: http://www.nltk.org/howto/relextract.html

So I believe you could extract interesting entities from your text (like the event you mentioned), extract dates as another set of entities, and use NLTK to establish relationships between the two sets. As a result you should get what you need: what happened when.
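
Even before any relation extraction, working per sentence instead of per line already helps with the December 2010 example, because the whole article sits on one line. The sketch below is only an assumption about how that could look: `sentences_mentioning` is a hypothetical helper, and the sentence splitter is a naive regex (NLTK's `sent_tokenize` could be swapped in for better splitting).

```python
import re


def sentences_mentioning(text, date_string, context=0):
    """Return sentences containing date_string, each followed by
    up to `context` of the sentences that come after it."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for i, sentence in enumerate(sentences):
        if date_string in sentence:
            hits.append(" ".join(sentences[i : i + 1 + context]))
    return hits


excerpt = (
    "Taichung ranks seventh, Tainan 12th and Kaohsiung 15th of all 19 local "
    "governments graded in the study. The three special municipalities grew "
    "to the present size by merging neighboring counties in December 2010. "
    "But Global View said the mergers crippled their competitiveness."
)

for hit in sentences_mentioning(excerpt, "December 2010"):
    print(hit)
```

On this excerpt it prints only the sentence about the merger. Passing `context=1` also returns the following sentence, which can be useful when the event is described across sentence boundaries.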

Maksim Khaitovich answered Oct 03 '22 10:10