
How to detect from a text whether an event/action occurred?

I was wondering if there's an NLP/ML technique for this.

Suppose I am given a set of sentences:

  1. I watched the movie.
  2. Heard the movie is great, have to watch it.
  3. Got the tickets for the movie.
  4. I am at the movie.

If I had to assign each of these sentences a probability that the writer has "actually" watched the movie, I would rank them in decreasing order as 1, 4, 3, 2.

Is there a way to do this automatically, using some classifier or rules? Any paper/link would help.

excray asked Apr 21 '12 16:04



3 Answers

These are common issues in textual entailment. I'll refer you to some papers. Although they are motivated by textual entailment, I believe your problem should be easier than that.

Determining Modality and Factuality for Textual Entailment

Learning to recognize features of valid textual entailments

Some of these suggestions should help you decide on some features/keywords to consider when ranking.

Kenston Choi answered Oct 21 '22 16:10



Except for sentence 1, none of the statements necessarily implies that the person has watched the movie. They could have bought the tickets for somebody else (3), or they might be the person who sells popcorn outside the halls (4). I don't think there is any clever system out there that will read between the lines of each sentence and return an answer that exactly agrees with your intuitions (which, by the way, might differ from other people's intuitions about the same sentence).

If, oddly, this is the only case that you care about (which is possible if you are explicitly working with movie reviews), then it might be worth your time to come up with a large number of heuristics that, patched together, yield a function that agrees almost exactly with your intuitions.

Otherwise, look at the surrounding text that these sentences come from to find relevant clues. Somebody who has actually watched the movie may comment on how they liked it, or express opinions about specific scenes, characters and actors from the movie, etc. So if the text contains a lot of strongly opinionated sentences and refers to words and phrases from the movie, then the person has probably watched it. If a lot of it is in the future tense, then maybe not.
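To make the idea concrete, here is a minimal sketch of such a clue-counting heuristic. The cue patterns and the `watched_score` function are made up for illustration; a usable version would need far more cues and some tuning.

```python
import re

# Hypothetical cue patterns, chosen only for illustration.
PAST_PATTERNS = [r"\b(watched|saw|seen|enjoyed|loved|hated)\b"]
FUTURE_PATTERNS = [r"\b(will|going to|have to|want to|plan to|can't wait)\b"]
OPINION_PATTERNS = [r"\b(scene|acting|ending|plot|character)s?\b"]


def count_matches(patterns, text):
    """Count how many times any of the patterns occurs in the text."""
    return sum(len(re.findall(p, text, flags=re.IGNORECASE)) for p in patterns)


def watched_score(text):
    """Higher score = more evidence that the writer watched the movie.

    Past-tense viewing verbs and mentions of movie details count for it,
    future/intent phrases count against it.
    """
    past = count_matches(PAST_PATTERNS, text)
    opinion = count_matches(OPINION_PATTERNS, text)
    future = count_matches(FUTURE_PATTERNS, text)
    return past + opinion - future


seen_it = watched_score("I watched the movie. The ending was great.")
not_yet = watched_score("Heard the movie is great, have to watch it.")
```

With these toy cues the first text scores higher than the second, but the score is only a ranking signal, not a probability.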

Aditya Mukherji answered Oct 21 '22 18:10



If you are working with a specific domain, such as "watched the movie or not", or maybe more generally "attended an event or not", it's basically a case of the Text Classification task.

The common approach in NLP is to use a large number of sentences tagged as watched or didn't watch to train a machine-learning-based classifier. The most commonly used features are the presence/absence of keywords, bigrams (sequences of 2 words) and maybe trigrams (sequences of 3 words).
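As a rough sketch of that approach, the snippet below trains a simplified naive Bayes classifier on the presence of unigrams and bigrams. Naive Bayes is just one possible choice of classifier, the six training sentences are invented, and the class and function names are my own; a real dataset would need to be much larger.

```python
import math
import re
from collections import Counter


def features(sentence):
    """Return the set of unigrams and bigrams present in the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return set(tokens) | set(bigrams)


class NaiveBayes:
    """Simplified naive Bayes that scores only the features present."""

    def fit(self, sentences, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.feat_counts = {label: Counter() for label in self.labels}
        for sentence, label in zip(sentences, labels):
            self.feat_counts[label].update(features(sentence))
        self.vocab = set().union(*self.feat_counts.values())
        return self

    def predict_proba(self, sentence):
        present = features(sentence) & self.vocab
        total = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.labels:
            n = self.doc_counts[label]
            score = math.log(n / total)  # class prior
            for feat in present:
                # Laplace-smoothed P(feature present | label)
                score += math.log((self.feat_counts[label][feat] + 1) / (n + 2))
            log_scores[label] = score
        # Normalize the log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}


# Invented toy training data, for illustration only.
train = [
    ("I watched the movie last night", "watched"),
    ("Just saw the movie, loved it", "watched"),
    ("We watched it and the ending was great", "watched"),
    ("I have to watch the movie", "didn't watch"),
    ("Planning to watch it next week", "didn't watch"),
    ("Heard it is great, want to watch it", "didn't watch"),
]
model = NaiveBayes().fit([s for s, _ in train], [l for _, l in train])
probs = model.predict_proba("I watched the movie")
```

Here `probs` maps each tag to a probability, and on this toy data "watched" comes out well ahead for the query sentence.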

Since you talked about probability, things may get a little more complex. As adi92 noted, in 3 of your sentences the answer is not clear. One way to represent that in the training data is to have a sentence with a 0.3 probability of watched appear 3 times tagged as watched and 7 times tagged as didn't watch. The output of most classifiers can easily be turned into probabilities.
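The replication trick can be sketched like this. The helper name `expand_soft_label` and the choice of 10 copies per sentence are my own assumptions, not a standard API.

```python
def expand_soft_label(sentence, p_watched, copies=10):
    """Turn one sentence with a soft label into `copies` hard-labeled rows.

    A sentence with p_watched=0.3 becomes 3 rows tagged "watched"
    and 7 rows tagged "didn't watch".
    """
    n_watched = round(p_watched * copies)
    return ([(sentence, "watched")] * n_watched
            + [(sentence, "didn't watch")] * (copies - n_watched))


rows = expand_soft_label("Got the tickets for the movie.", 0.3)
n_watched_rows = sum(1 for _, tag in rows if tag == "watched")
```

The expanded rows can then be fed to any classifier that expects one hard tag per training example.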

Anyway, I believe that the main difficulty would be creating a training dataset for the task.

erickrf answered Oct 21 '22 18:10
