I'm writing an RSS reader in Python as a learning exercise, and I would really like to be able to tag individual entries with keywords for searching. Unfortunately, most real-world feeds don't include keyword metadata. I currently have about 60,000 entries in my test database from about 600 feeds, so tagging them manually is not practical. So far I have only been able to find two solutions:
1: Use the Natural Language Toolkit (NLTK) to extract keywords.
2: Use the Google AdWords API to fetch keyword suggestions from the article URL.
Can anyone offer any suggestions? Are my fears about getting my AdWords account banned unfounded?
There are a number of free and commercial text annotation tools and services you might consider, depending on your specific needs. They are listed under the question "Is there a better tool than OpenCalais?".
A number of these provide entities, some provide a measure of keyword relevance, and others provide topic tags.
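As a rough local baseline for the "keyword relevance" idea, here is a minimal sketch that scores terms with TF-IDF using only the standard library. It is not NLTK and not any of the services above; the function names `tokenize` and `top_keywords`, the tiny stopword list, and the smoothing in the IDF formula are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real list would be much longer).
STOPWORDS = frozenset(
    "a an and are as at be by for from in is it of on or that the this to with".split()
)


def tokenize(text):
    """Lowercase the text and return alphabetic tokens longer than two
    characters that are not in the stopword list."""
    return [
        tok
        for tok in re.findall(r"[a-z]+", text.lower())
        if tok not in STOPWORDS and len(tok) > 2
    ]


def top_keywords(entries, k=3):
    """Return the k highest-scoring TF-IDF terms for each entry (hypothetical helper)."""
    docs = [Counter(tokenize(text)) for text in entries]

    # Document frequency: the number of entries each term appears in.
    df = Counter()
    for counts in docs:
        df.update(counts.keys())

    n = len(docs)
    results = []
    for counts in docs:
        total = sum(counts.values()) or 1
        # Term frequency times a smoothed inverse document frequency.
        scores = {
            term: (freq / total) * math.log((1 + n) / (1 + df[term]))
            for term, freq in counts.items()
        }
        # Highest score first; ties are broken alphabetically so output is stable.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:k])
    return results


if __name__ == "__main__":
    sample = [
        "python python parser release",
        "football cup final release",
        "football league python",
    ]
    for text, tags in zip(sample, top_keywords(sample, k=2)):
        print(tags, "<-", text)
```

Terms that appear in every entry score zero under this formula, so very common feed vocabulary drops out on its own. With 60,000 entries you would probably compute the document frequencies once and store them rather than rebuilding them per query.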