
What's an effective library for suggesting keywords for content?

I'm currently designing a CMS for my website and am wondering whether there are any free libraries for generating tags from the content.

Example

I like trees. Trees are plants that have leaves. Leaves on tree can be multi-colored.

This would produce the tags 'trees' and 'leaves'.

The library should be PHP or JS.

EDIT 1:

I have found a simple library that covers half of my task: http://www.cafewebmaster.com/get-top-100-words-keywords-text-php

I have revised the library specifications (thanks to guidance from @NullUserException):

  • Count all words (ignoring case and inflections), throw out stop words and pick the ones with the highest frequency

  • Give more weight to words that are more specific to the genre, even if they have a lower frequency. In the example, 'multi-colored' should rank higher because it is more specific to the subject. However, it should carry a prefix indicating which subject it relates to (it would become leaves-multi-colored).
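A minimal sketch of the first bullet (count words ignoring case, drop stop words, keep the most frequent), written in TypeScript so it runs anywhere JS does. Assumptions: the stop-word list is a small illustrative sample, and "ignoring inflections" is approximated by folding a trailing "s" (so "tree" and "trees" count together); a real stemmer would be needed for irregular forms. The second bullet (genre-specific weighting with a subject prefix) is not covered here.

```typescript
// Small illustrative stop-word list (assumption; real lists are much longer).
const STOP_WORDS = new Set<string>([
  "a", "an", "and", "are", "be", "can", "have", "i", "in", "is",
  "it", "like", "of", "on", "that", "the", "to", "with",
]);

// Crude inflection folding (assumption): strip one trailing "s" so that
// "trees" and "tree" share the key "tree". Words ending in "ss" are left alone.
function foldKey(word: string): string {
  if (word.length > 3 && word.endsWith("s") && !word.endsWith("ss")) {
    return word.slice(0, -1);
  }
  return word;
}

interface Keyword {
  tag: string;   // most frequent surface form in the text, e.g. "trees"
  count: number; // total occurrences of every form sharing the folded key
}

function suggestKeywords(text: string, limit: number): Keyword[] {
  // Words are runs of letters, optionally hyphenated ("multi-colored").
  const words = text.toLowerCase().match(/[a-z]+(?:-[a-z]+)*/g) ?? [];
  const groups = new Map<string, Map<string, number>>();
  for (const word of words) {
    if (STOP_WORDS.has(word)) continue;
    const key = foldKey(word);
    const forms = groups.get(key) ?? new Map<string, number>();
    forms.set(word, (forms.get(word) ?? 0) + 1);
    groups.set(key, forms);
  }
  const keywords: Keyword[] = [];
  for (const forms of groups.values()) {
    let tag = "";
    let best = 0;
    let count = 0;
    for (const [form, n] of forms) {
      count += n;
      if (n > best) {
        best = n;
        tag = form;
      }
    }
    keywords.push({ tag, count });
  }
  // Highest count first; ties broken alphabetically so the output is stable.
  keywords.sort((a, b) => b.count - a.count || a.tag.localeCompare(b.tag));
  return keywords.slice(0, limit);
}

const sample = "I like trees. Trees are plants that have leaves. " +
  "Leaves on tree can be multi-colored.";
console.log(suggestKeywords(sample, 2).map((k) => k.tag)); // → ["trees", "leaves"]
```

Reporting the most frequent surface form rather than the folded key keeps the tags readable ("leaves" instead of "leave").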

EDIT 2:

The algorithm should remove words with fewer than 3 characters unless they are in capitals or otherwise formatted.
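One way to express the short-word rule, as a sketch: it has to run on the original-case text, before any lowercasing. Assumptions: "in capitals" means ASCII letters only and at least two of them, since the pronoun "I" is also capitalized and would otherwise always survive; the "otherwise formatted" case (bold, links, etc.) would require inspecting the markup and is not handled.

```typescript
// Keep a word if it has at least 3 characters, or if it is an all-caps
// abbreviation of 2+ letters such as "UK" or "AI" (assumption: ASCII only).
function keepToken(word: string): boolean {
  const isAbbreviation = word.length >= 2 && /^[A-Z]+$/.test(word);
  return word.length >= 3 || isAbbreviation;
}

// Split original-case text into words and apply the rule above.
function filterShortWords(text: string): string[] {
  const words = text.match(/[A-Za-z]+(?:-[A-Za-z]+)*/g) ?? [];
  return words.filter(keepToken);
}

console.log(filterShortWords("I like trees in the UK.")); // → ["like", "trees", "the", "UK"]
```

Note that "the" passes this filter because it has 3 characters; removing it is the job of the stop-word step.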

liamzebedee asked Sep 11 '11 02:09


1 Answer

Are the tags in your CMS already defined? If so, you could index your text in memory and run each known tag as a query against it, then present the highest-scoring tags to the user.
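For small sites the "search every known tag against the text" idea can be tried without a search server. The following is a hypothetical in-memory sketch: the score is a plain count of whole-word, case-insensitive matches, not the relevance scoring a real index such as Solr would compute, and the optional trailing "s" is a crude plural assumption.

```typescript
// Escape regex metacharacters so a tag like "c++" is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

interface TagScore {
  tag: string;
  score: number; // number of whole-word matches in the text
}

function rankKnownTags(text: string, knownTags: string[], limit: number): TagScore[] {
  const scores: TagScore[] = [];
  for (const tag of knownTags) {
    // Whole word, case-insensitive; "s?" also accepts a simple plural.
    const pattern = new RegExp(`\\b${escapeRegExp(tag)}s?\\b`, "gi");
    const score = (text.match(pattern) ?? []).length;
    if (score > 0) scores.push({ tag, score });
  }
  // Best match first; ties broken alphabetically.
  scores.sort((a, b) => b.score - a.score || a.tag.localeCompare(b.tag));
  return scores.slice(0, limit);
}

// Hypothetical tag list, as it might be maintained from an admin panel.
const known = ["tree", "leaves", "flower", "plant"];
const article = "I like trees. Trees are plants that have leaves. " +
  "Leaves on tree can be multi-colored.";
console.log(rankKnownTags(article, known, 3));
```

Because only predefined tags can ever be suggested, the tag vocabulary stays under the site owner's control.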

Indexing and searching could be done with Apache Solr: http://lucene.apache.org/solr/

Edit: I do suggest that your tags/keywords be defined and managed from an administration panel (as in WordPress, for example). Otherwise you would end up with thousands of keywords generated from your articles, which would not help the end user.

cherouvim answered Oct 26 '22 20:10