I have a simple text analyzer with generates keywords for a given input text. Until now I have been doing a manual evaluation of it, i.e., manually selecting keywords of a text and comparing them against the ones generated by the analyzer.
Is there any way in which I can automate this? I tried googling a lot for some free keyword generators which can help in this evaluation but have not found any till now. I will appreciate any suggestions on how to go about this.
Testing keyword generation is a difficult problem. In the past, I have used the following method to evaluate it.
Identify the popular association-rule generation methods like Confidence, Jaccard, Lift, Chi-Squared, Mutual Information etc. There are many papers that compare such measures.
Implementing these measures is fairly simple. They all involve some simple algebraic expression using one or more of term frequencies, document frequencies and co-occurrence frequencies.
Generate related keywords using all of these measures and compute their union. Call this set TOTAL.
Compute the intersection of the keywords generated by your algorithm with the above TOTAL-set. When viewed as a fraction (intersection/TOTAL), it is a rough indicator of how powerful your measure is.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With