Does an algorithm exist to help detect the "primary topic" of an English sentence?

I'm trying to find out if there is a known algorithm that can detect the "key concept" of a sentence.

The use case is as follows:

  1. The user enters a sentence as a query ("Does chicken taste like turkey?")
  2. Our system identifies the concepts in the sentence (chicken, turkey)
  3. It then runs a search of our corpus content

The area where we're lacking is identifying what the core "topic" of the sentence is really about. The sentence "Does chicken taste like turkey?" has a primary topic of "chicken", because the user is asking about the taste of chicken, while "turkey" is a helper topic of less importance.

So I'm trying to find out if there is an algorithm that will help me identify the primary topic of a sentence. Let me know if you are aware of any!

rockit asked Apr 04 '11 21:04


2 Answers

I actually did a research project on this; it won two competitions, and I am now competing in nationals.

There are two steps to the method:

  1. Parse the sentence with a Context-Free Grammar
  2. In the resulting parse trees, find all nouns which are only subordinate to Noun-Phrase-like constituents

For example, "I ate pie" has two nouns: "I" and "pie". Looking at the parse tree, "pie" is inside a Verb Phrase, so it cannot be the subject. "I", however, is only inside NP-like constituents. Being the only subject candidate, it is the subject. You can find an early copy of this program at http://www.candlemind.com. Note that the vocabulary is limited to basic singular words and there are no verb conjugations, so it has "man" but not "men", and "eat" but not "ate". Also, the CFG I used was hand-made and limited. I will be updating this program shortly.
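As a rough illustration of step 2 on the "I ate pie" example, here is a minimal Python sketch. It is not the program linked above. The parse tree is written by hand as a stand-in for the output of a CFG parser, and the names `TREE`, `NOUN_TAGS`, `NP_LIKE`, and `subject_candidates` are hypothetical. Treating the root `S` as NP-like is also an assumption made so that the example works out.

```python
# Hand-written parse tree for "I ate pie", standing in for CFG parser output.
# A node is (label, child, ...); a leaf is (tag, word).
TREE = (
    "S",
    ("NP", ("N", "I")),
    ("VP", ("V", "ate"), ("NP", ("N", "pie"))),
)

NOUN_TAGS = {"N"}      # assumption: tags that mark a noun leaf
NP_LIKE = {"S", "NP"}  # assumption: labels a subject may sit under


def subject_candidates(tree, ancestors=()):
    """Return the nouns whose ancestors are all NP-like constituents."""
    label, *children = tree
    # A leaf has exactly one child, and that child is the word itself.
    if len(children) == 1 and isinstance(children[0], str):
        word = children[0]
        if label in NOUN_TAGS and all(a in NP_LIKE for a in ancestors):
            return [word]
        return []
    found = []
    for child in children:
        found.extend(subject_candidates(child, ancestors + (label,)))
    return found


print(subject_candidates(TREE))
```

Here "pie" is rejected because `VP` appears among its ancestors, which leaves "I" as the only candidate.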

Anyway, there are limitations to this program. My mentor pointed out that in its current state, it cannot recognize sentences with subjects that are "real" NPs (what grammar actually calls NPs). For example, take "That the moon is flat is not a debate any longer." The subject is actually "that the moon is flat", but the program would recognize "moon" as the subject. I will be fixing this shortly.

Still, this is good enough for most sentences.

My research paper can be found there too. Go to page 11 of it to read the methods.

Hope this helps.

Michael answered Sep 27 '22 22:09


Most basic NLP parsing techniques will be able to extract the basic aspects of the sentence, i.e., that "chicken" and "turkey" are NPs and that they are linked by the preposition 'like', etc. Getting from these to a 'topic' or 'concept' is more difficult.

Techniques such as Latent Semantic Analysis and its many derivatives transform this information into a vector (some have methods of retaining, in part, the hierarchy/relations between parts of speech) and then compare it to existing vectors, which are usually pre-classified by concept. See http://en.wikipedia.org/wiki/Latent_semantic_analysis to get started.
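To make the "compare it to pre-classified vectors" step concrete, here is a small Python sketch. Note that it is not full LSA: it uses plain term-count vectors and cosine similarity, and leaves out the SVD-based dimensionality reduction that LSA adds on top. The `CONCEPTS` texts are invented for illustration, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical pre-classified concept texts, invented for this sketch.
CONCEPTS = {
    "poultry": "chicken turkey duck meat taste roast",
    "linux": "kernel ubuntu shell package install",
}


def vectorize(text):
    """Turn text into a sparse term-count vector (lowercased words only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_concept(query):
    """Return the name of the concept whose vector is most similar to the query."""
    q = vectorize(query)
    return max(CONCEPTS, key=lambda name: cosine(q, vectorize(CONCEPTS[name])))


print(closest_concept("Does chicken taste like turkey?"))
```

With these made-up concept texts, the query shares "chicken", "taste", and "turkey" with the "poultry" entry and nothing with "linux", so "poultry" is chosen. A real LSA setup would first project both the query and the concept vectors into a reduced space, so that related terms can match even when the exact words differ.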

Edit: Here's an example LSA app you can play around with to see if you might want to pursue it further: http://lsi.research.telcordia.com/lsi/demos.html

dfb answered Sep 28 '22 00:09