
Causal Sentences Extraction Using NLTK python

Tags:

nlp

nltk

I am extracting causal sentences from accident reports on water, using NLTK. I manually created a regexp chunk grammar from 20 causal sentence structures (see the examples below). The grammar has the form:

grammar = r'''Cause: {<DT|IN|JJ>?<NN.*|PRP|EX><VBD><NN.*|PRP|VBD>?<.*>+<VBD|VBN>?<.*>+}'''

The grammar has 100% recall on the test set (I built my own toy dataset with 50 causal and 50 non-causal sentences) but low precision. I would like to ask:

  1. How can NLTK be trained to build the regexp grammar automatically for extracting a particular type of sentence?
  2. Has anyone ever tried to extract causal sentences? Example causal sentences are:

    • There was poor sanitation in the village, as a consequence, she had health problems.

    • The water was impure in her village, For this reason, she suffered from parasites.

    • She had health problems because of poor sanitation in the village.

I want to extract only sentences of the above type from a large text.
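For reference, the recall and precision figures mentioned above can be computed as in the following minimal sketch. The function name `precision_recall` and the six labels are hypothetical and only illustrate the arithmetic; they are not the toy dataset or its results.

```python
def precision_recall(gold, predicted):
    """Return (precision, recall) for two parallel lists of booleans.

    gold[i] is True if sentence i is really causal,
    predicted[i] is True if the grammar matched sentence i.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # causal and matched
    fp = sum(1 for g, p in pairs if not g and p)    # non-causal but matched
    fn = sum(1 for g, p in pairs if g and not p)    # causal but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for six sentences (illustration only).
gold = [True, True, True, False, False, False]
predicted = [True, True, True, True, True, False]
print(precision_recall(gold, predicted))  # (0.6, 1.0)
```

A grammar that matches almost every sentence behaves like `predicted` here: nothing causal is missed, so recall is perfect, but the false positives pull precision down.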

Santosh Tirunagari asked Oct 25 '12 12:10


1 Answer

I had a brief discussion with Jacob Perkins, the author of the book "Python Text Processing with NLTK 2.0 Cookbook". He said: "A generalized grammar for sentences is pretty hard. I would instead see if you can find common tag patterns, and use those. But then you're essentially doing classification by regexp matching. Parsing is usually used to extract phrases within a sentence, or to produce deep parse trees of a sentence, but you're just trying to identify/extract sentences, which is why I think classification is a much better approach. Consider including tagged words as features when you try this, since the grammar could be significant."

Taking his suggestions, I looked at the causal sentences I had and found that they contain connectives such as:

consequently
as a result
Therefore
as a consequence
For this reason
For all these reasons
Thus
because
since
because of
on account of
due to
for the reason
so, that

These words connect the cause and the effect within a sentence. Using these connectors, it is now easy to extract causal sentences. A detailed report can be found on arXiv: https://arxiv.org/pdf/1507.02447.pdf
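As a rough illustration of the connector idea, here is a minimal sketch using only Python's standard `re` module. It is not the method from the report: the names `CONNECTORS`, `split_sentences`, `find_connector` and `extract_causal` are made up for this example, the sentence splitter is deliberately naive, and the ambiguous entry "so, that" from the list is left out.

```python
import re

# Connectives taken from the list above (lower-cased; "so, that" omitted).
CONNECTORS = [
    "consequently", "as a result", "therefore", "as a consequence",
    "for this reason", "for all these reasons", "thus", "because",
    "since", "because of", "on account of", "due to", "for the reason",
]

# Longest phrases first, so "because of" is preferred over "because".
_alternatives = "|".join(
    r"\s+".join(re.escape(word) for word in phrase.split())
    for phrase in sorted(CONNECTORS, key=len, reverse=True)
)
CONNECTOR_RE = re.compile(r"\b(?:" + _alternatives + r")\b", re.IGNORECASE)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_connector(sentence):
    """Return the first connector found in the sentence, or None."""
    match = CONNECTOR_RE.search(sentence)
    return match.group(0).lower() if match else None


def extract_causal(text):
    """Return (sentence, connector) pairs for sentences containing a connector."""
    results = []
    for sentence in split_sentences(text):
        connector = find_connector(sentence)
        if connector is not None:
            results.append((sentence, connector))
    return results


text = (
    "There was poor sanitation in the village, as a consequence, she had "
    "health problems. The water was impure in her village, For this reason, "
    "she suffered from parasites. She had health problems because of poor "
    "sanitation in the village. The village is near the river."
)
for sentence, connector in extract_causal(text):
    print(connector, "->", sentence)
```

On the example sentences from the question this picks out the three causal sentences and skips the last one. Some connectors are ambiguous ("since" can also be temporal), so plain matching like this will still produce false positives; the matched connector could instead be used as one feature in a classifier, alongside tagged words, as suggested in the quote above.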

Santosh Tirunagari answered Oct 31 '22 14:10