How to extract keywords from a block of text in Haskell

Question

I know this is a fairly large topic, but I need to accept a chunk of text and extract the most interesting keywords from it. The text comes from TV captions, so the subject can range from news to sports to pop culture references. The type of show the text came from can be supplied along with the text.

My idea is to match the text against a dictionary of terms I already know to be interesting.

Which libraries for Haskell can help me with this?

Assuming I do have a dictionary of interesting terms, and a database to store them in, is there a particular approach you'd recommend to matching keywords within the text?

Is there an obvious approach I'm not thinking of?

bpgergo · Accepted Answer

I'd stem the words in the chunks and then search for all the terms in the dictionary. Here are two libraries, picked more or less at random:

Stemming: http://hackage.haskell.org/packages/archive/stemmer/0.2/doc/html/NLP-Stemmer-C.html

Search: http://hackage.haskell.org/packages/archive/sphinx/0.2.1/doc/html/Text-Search-Sphinx.html
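The stem-then-look-up idea can be sketched in plain Haskell with only `base` and `containers`. This is a minimal illustration, not the API of either library above: `tokenize`, `crudeStem`, `mkDictionary` and `matchKeywords` are hypothetical names, and `crudeStem` is a deliberately naive stand-in (it only strips one trailing "s") that you would replace with a real stemmer. It also handles single-word terms only.

```haskell
import           Data.Char (isAlpha, toLower)
import qualified Data.Set  as Set

-- Split text into lowercase alphabetic tokens; everything else is a separator.
tokenize :: String -> [String]
tokenize = words . map (\c -> if isAlpha c then toLower c else ' ')

-- ASSUMPTION: placeholder for a real stemmer. It only drops one trailing 's'
-- from words longer than three letters, which is far cruder than Porter/Snowball.
crudeStem :: String -> String
crudeStem w
  | length w > 3 && last w == 's' = init w
  | otherwise                     = w

-- Build the dictionary of interesting terms, normalised the same way as the text.
mkDictionary :: [String] -> Set.Set String
mkDictionary = Set.fromList . map (crudeStem . map toLower)

-- Return the distinct dictionary terms (in stemmed form) that occur in the text.
matchKeywords :: Set.Set String -> String -> Set.Set String
matchKeywords dict = Set.intersection dict . Set.fromList . map crudeStem . tokenize
```

The important design point is that the dictionary and the incoming text go through the same normalisation, so "Elections" in the dictionary and "elections" in a caption end up as the same key.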

Gene T · Answer

To expand on bpgergo's answer (though I don't have any Haskell-specific info): it's pretty straightforward to enter documents into a relational database and index them with Solr/Lucene or Sphinx, either of which should have a stemmer in its default or suggested configuration. You can then search for the documents that contain pairs, triples, etc. of the terms from your list of "interesting terms".
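To make the "pairs of interesting terms" idea concrete, here is a small in-memory sketch in Haskell. It is not how Solr or Sphinx work internally; it only assumes that each document has already been reduced to the set of interesting terms found in it. `DocId`, `termPairs` and `pairIndex` are hypothetical names introduced for illustration.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set        as Set

-- ASSUMPTION: documents are identified by a plain integer.
type DocId = Int

-- All unordered pairs of distinct terms in one document's keyword set,
-- with each pair written in alphabetical order.
termPairs :: Set.Set String -> [(String, String)]
termPairs s = [ (a, b) | a <- xs, b <- xs, a < b ]
  where xs = Set.toList s

-- Map each pair of terms to the documents in which both terms occur,
-- keeping document ids in input order.
pairIndex :: [(DocId, Set.Set String)] -> Map.Map (String, String) [DocId]
pairIndex docs =
  Map.fromListWith (flip (++))
    [ (p, [d]) | (d, ks) <- docs, p <- termPairs ks ]
```

Looking up a pair such as `("election", "senate")` in the resulting map then gives the documents where both terms appear. A real search index would do this lookup at query time rather than precomputing every pair.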

You might also look into named entity recognition, statistically unusual phrase detection, automatic tag generation, and similar topics. LingPipe is a good place to start, along with these books:

http://alias-i.com/lingpipe/demos/tutorial/read-me.html

http://www.manning.com/marmanis/excerpt_contents.html

http://www.manning.com/alag/excerpt_contents.html
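As a rough illustration of the "statistically unusual" idea mentioned above, one simple approach is to compare how often a word occurs in the caption text with how often it occurs in a background corpus. The sketch below works on single words rather than phrases and uses add-one smoothing; it is a simplification chosen for illustration, not the method used by LingPipe or the books. `Counts`, `countWords`, `unusualness` and `rankUnusual` are hypothetical names.

```haskell
import           Data.List       (sortBy)
import qualified Data.Map.Strict as Map
import           Data.Ord        (Down (..), comparing)

type Counts = Map.Map String Int

-- Count occurrences of each word in a token list.
countWords :: [String] -> Counts
countWords ws = Map.fromListWith (+) [ (w, 1) | w <- ws ]

-- Ratio of a word's relative frequency in the document to its add-one
-- smoothed relative frequency in the background corpus. Higher means the
-- word is more unusual for this document. Assumes a non-empty document.
unusualness :: Counts -> Counts -> String -> Double
unusualness bg doc w = docFreq / bgFreq
  where
    docTotal = fromIntegral (sum (Map.elems doc))
    bgTotal  = fromIntegral (sum (Map.elems bg) + Map.size bg + 1)
    docFreq  = fromIntegral (Map.findWithDefault 0 w doc) / docTotal
    bgFreq   = fromIntegral (Map.findWithDefault 0 w bg + 1) / bgTotal

-- Rank every word of the document, most unusual first.
rankUnusual :: Counts -> Counts -> [(String, Double)]
rankUnusual bg doc =
  sortBy (comparing (Down . snd))
    [ (w, unusualness bg doc w) | w <- Map.keys doc ]
```

Because the type of show is known, one option is to keep a separate background corpus per show type, so that a word that is routine in sports captions can still stand out in a news broadcast.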

Tags:

haskell

nlp

Sean Clark Hess
