
Algorithm for computing the relevance of a keyword to a short text (50 - 100 words)

I want to compute the relevance of a keyword to a short description text. What would be the best approach in terms of efficiency and ease of implementation? I am using C++.

fgungor asked Dec 28 '10 12:12


2 Answers

Simple solution: Count the occurrences of the word in the text.
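A minimal sketch of that counting approach, assuming whitespace-separated words, case-insensitive matching, and ASCII text. The names `normalize`, `count_keyword`, and `relevance` are made up for illustration, and dividing by the word count is just one possible way to turn the raw count into a score:

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Lowercase a token and drop every non-alphanumeric character,
// so that "Cat," and "CAT!" both become "cat". Assumes ASCII input.
std::string normalize(const std::string& token) {
    std::string out;
    for (unsigned char c : token) {
        if (std::isalnum(c)) out += static_cast<char>(std::tolower(c));
    }
    return out;
}

struct KeywordStats {
    std::size_t matches = 0;      // words equal to the keyword
    std::size_t total_words = 0;  // all non-empty words in the text
};

// Count whole-word occurrences of the keyword in the text.
KeywordStats count_keyword(const std::string& text, const std::string& keyword) {
    KeywordStats stats;
    const std::string needle = normalize(keyword);
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        const std::string word = normalize(token);
        if (word.empty()) continue;  // token was pure punctuation
        ++stats.total_words;
        if (word == needle) ++stats.matches;
    }
    return stats;
}

// Hypothetical score: fraction of the words that match the keyword.
double relevance(const std::string& text, const std::string& keyword) {
    const KeywordStats s = count_keyword(text, keyword);
    return s.total_words == 0
               ? 0.0
               : static_cast<double>(s.matches) / static_cast<double>(s.total_words);
}
```

For a 50 to 100 word description this is a single pass over the text, so efficiency should not be a concern; the limitation is that it only sees exact matches.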

To do a good job, though, is a hard problem that companies like Google have been working on for years. If possible, you might want to take a look at using their technology.

To expand, try the following:

  • Use a dictionary (e.g. WordNet) to replace all synonyms with a common word
  • Detect similar words using Levenshtein distance
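For the second point, Levenshtein distance can be computed with the standard dynamic-programming recurrence. The sketch below keeps only two rows of the table; `is_similar` and its default threshold of 1 edit are illustrative assumptions, not something prescribed here, and you would probably want to tune the threshold to the word length:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein (edit) distance: the minimum number of single-character
// insertions, deletions, and substitutions needed to turn a into b.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    // prev[j] holds the distance between the first i-1 chars of a and
    // the first j chars of b; curr is the row being filled for i chars.
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // deleting all i characters of a
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // deletion
                                curr[j - 1] + 1,      // insertion
                                prev[j - 1] + cost}); // substitution or match
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Hypothetical helper: treat a word as a hit for the keyword when it is
// within max_distance edits (the default of 1 is an arbitrary choice).
bool is_similar(const std::string& word, const std::string& keyword,
                std::size_t max_distance = 1) {
    return levenshtein(word, keyword) <= max_distance;
}
```

With this, a word such as "colour" would count as a match for the keyword "color". Short keywords need care, since a threshold of 1 also makes "cat" match "car".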

That's still only going to get you so far. To distinguish between multiple texts that contain the keyword the same number of times, you'll need to perform some natural language processing to truly understand what each description is about.

moinudin answered Sep 21 '22 04:09


Refer to these previous Stack Overflow questions:

  • What are Useful Ranking Algorithms for Documents without Links (e.g. PDF, MS Documents, etc…)?

  • Algorithm for generating a 'top list' using word frequency.

Leniel Maccaferri answered Sep 20 '22 04:09