
Term extraction: Generating tags out of text


How can I get the same results as Yahoo's Term Extraction service (http://developer.yahoo.com/search/content/V1/termExtraction.html)?

This question has been asked quite a few times before.

  • Best approach to analyze text in PHP?

  • What is a good keyword extraction web service?

  • What is a simple way to generate keywords from a text?

While looking for existing solutions, I came across the "Text Analysis" that Solr performs on a document before indexing it, as described in http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters. This analysis includes stemming as well.

So the final index will consist mostly of terms used to describe the document.
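To make the idea concrete, here is a minimal Python sketch of such an analysis chain: a tokenizer followed by a stop-word filter and a stemming filter, with the surviving terms ranked by frequency. It is only an illustration, not Solr's implementation. The function names, the tiny stop-word list, and the suffix-stripping "stemmer" are all assumptions made for the example; a real analyzer would use a proper stemmer such as Porter's.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real analyzers ship larger ones.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "it",
    "for", "on", "with", "as", "by", "that", "this",
}


def tokenize(text):
    """Tokenizer step: split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Token filter step: drop words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Token filter step: strip a few English suffixes.

    A stand-in for a real stemmer, not the Porter algorithm.
    """
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Run the whole chain: tokenize, filter stop words, stem."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]


def top_terms(text, n=5):
    """Return the n most frequent analyzed terms as candidate tags."""
    return [term for term, _ in Counter(analyze(text)).most_common(n)]
```

Calling analyze("Indexing the documents") yields ["index", "document"], and top_terms() on a longer text gives a rough list of candidate tags. Raw frequency is a weak signal on its own, which is part of why I am looking for an existing solution.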

Is there a solution that provides analyzers, tokenizers, and token filters for direct use? If Solr is the way to go, what is the best way to get this data out of Solr's index?

Sukumar asked Jul 08 '09 21:07



1 Answer

Solr is a platform for building custom search engines, so it does not seem to be the right tool for this job. The Wikipedia article on term extraction lists several term-extraction web applications in its "External links" section. OpenNLP also offers a set of tools that may be useful; its Chunker in particular may help.

Yuval F answered Oct 12 '22 01:10
