I've built a content aggregator and would like to add a tag cloud representing the current trends.
Unfortunately this is quite complex, as I have to look for keywords that represent the context of each article.
For example, words such as "I", "was", "the", "amazing", and "nice" say nothing about an article's context.
Help would be much appreciated! :)
Use NLTK, and in particular its Stopwords corpus:
Besides regular content words, there is another class of words called stop words that perform important grammatical functions, but are unlikely to be interesting by themselves. These include prepositions, complementizers, and determiners. NLTK comes bundled with the Stopwords corpus, a list of 2400 stop words across 11 different languages (including English).
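To make that concrete, here is a rough sketch of counting non-stop-words across a batch of articles. The function names (`load_stopwords`, `tag_counts`), the tiny fallback list, and the crude regex tokenizer are my own placeholders, not part of NLTK. The only NLTK call used is `nltk.corpus.stopwords.words()`, and it needs the corpus downloaded once with `nltk.download("stopwords")`. Note that opinion words like "amazing" and "nice" are not in NLTK's English list, so you would have to add those yourself.

```python
import re
from collections import Counter

# Hypothetical fallback, used only if NLTK or its Stopwords corpus is missing.
FALLBACK_STOPWORDS = {"i", "was", "the", "a", "an", "and", "is", "it", "of", "to", "in"}


def load_stopwords(language="english", extra=()):
    """Return a set of stop words for `language`, plus any `extra` words."""
    try:
        from nltk.corpus import stopwords
        words = set(stopwords.words(language))
    except (ImportError, LookupError):
        # LookupError means the corpus has not been downloaded yet.
        words = set(FALLBACK_STOPWORDS)
    return words | {w.lower() for w in extra}


def tag_counts(articles, stop_words, top_n=20):
    """Count words across all articles, skipping stop words and very short tokens."""
    counts = Counter()
    for text in articles:
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stop_words and len(w) > 2)
    return counts.most_common(top_n)


if __name__ == "__main__":
    articles = [
        "The Python release was amazing",
        "I was reading the Python docs",
    ]
    stop_words = load_stopwords(extra=["amazing", "nice"])
    for word, count in tag_counts(articles, stop_words, top_n=5):
        print(word, count)
```

The counts returned by `tag_counts` can be used directly as tag cloud weights. Plain frequency is the simplest option; it will still favour words that are common across every article, so treat it as a starting point.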
NLTK can help you analyze the content in order to pick out relevant terms.