
Java tools for extracting relevant keywords/tags from articles

I'm looking for Java-based tools for extracting relevant tags from a given article. I need a tool that tries to identify the main subjects and terms an article relates to. Thanks for helping.

tomermes asked May 17 '26 04:05


2 Answers

Check out the following keyword/topic extraction tools:

  • KEA: keyword extraction
  • TMT: the Stanford Topic Modeling Toolbox (integrates with Excel; scripts are written in Scala). It supports a semi-automatic topic detection mode that uses the user's feedback.
  • Maui

If you would like to develop your own topic detection system, take a look at the LDA implementation in MALLET (link to a working LDA sample; the one on the MALLET homepage does not work with the newest MALLET version).

Skarab answered May 18 '26 17:05


You can use HtmlUnit to parse the article's HTML and select the parts of the document you want to analyze. Then you can apply a simple algorithm of your own design to determine tags/keywords.

For instance, split() the text on whitespace and count how many times each word occurs. The words that occur most often (ignoring stop words such as "and", "the", "if", etc.) are good candidates for keywords.
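A rough sketch of that counting idea in plain Java, using only the standard library. The class name KeywordCounter, the method topKeywords, and the tiny stop-word list are all made up for illustration; a real stop-word list would be much longer, and you would feed in the text extracted from the article rather than a hard-coded string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class KeywordCounter {
    // Hypothetical, deliberately short stop-word list (an assumption, not a standard list).
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "if", "of", "to", "in", "is", "it", "on", "for", "or"));

    /** Returns up to 'limit' words, most frequent first (ties broken alphabetically). */
    static List<String> topKeywords(String text, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.split("\\s+")) {
            // Lower-case the token and strip punctuation from both ends.
            String word = token.toLowerCase(Locale.ROOT)
                    .replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue;
            }
            counts.merge(word, 1, Integer::sum);
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by count (descending), then by word so the result is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        String article = "The cat and the dog. The cat sleeps; the CAT eats!";
        System.out.println(topKeywords(article, 2));
    }
}
```

Raw counts like this favor words that are common everywhere, so the quality depends heavily on the stop-word list. It is a starting point, not a replacement for the dedicated tools mentioned in the other answer.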

aroth answered May 18 '26 16:05


