 

Automatically pick tags from context using Python

Tags:

python

tags

How can I pick tags from an article or a user's post using Python?

Is the following method ok?

  1. Build a list of word frequency from the text and sort them.

  2. Remove some common words and pick the top 10 remaining words in the list as the tags.

If the above method is OK, what library can detect which words are common, like "the", "if", "you", etc., and which are descriptive words?
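The two steps above can be sketched in plain Python with `collections.Counter`. This is a minimal sketch, not a recommended tagger: the `STOP_WORDS` set is a tiny hypothetical list chosen for illustration, the tokenizer is a simple regex, and `pick_tags` is a hypothetical helper name.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word set for illustration only;
# a real stop word list is much longer.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "if", "you", "i", "it", "is", "are",
    "of", "to", "in", "on", "for", "with", "this", "that", "be", "can",
}

def pick_tags(text, stop_words=STOP_WORDS, n=10):
    """Return up to n of the most frequent non-stop words in text."""
    # Lowercase, then keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Step 2 (remove common words) is applied while building the
    # frequency count of step 1.
    counts = Counter(w for w in words if w not in stop_words)
    # most_common() sorts by descending count; ties keep first-seen order.
    return [word for word, _ in counts.most_common(n)]

if __name__ == "__main__":
    post = ("Python makes it easy to parse text. If you parse text in Python, "
            "you can count words and pick tags.")
    print(pick_tags(post, n=3))  # ['python', 'parse', 'text']
```

Raw frequency favours words that are merely repeated, so on short posts the result depends heavily on how complete the stop word list is.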

jack asked Dec 30 '22 10:12



2 Answers

Here's an article on removing stop words. The link to the stop word list in the article is broken, but here's another one.
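Stop word lists are commonly distributed as plain text with one word per line. A minimal sketch for loading such a list and filtering with it, assuming that format (the `#` comment convention, the helper names, and the file name `stopwords.txt` are hypothetical):

```python
def load_stop_words(lines):
    """Build a stop word set from an iterable of lines (e.g. an open file).

    Assumes one word per line; blank lines and lines starting with '#'
    are skipped, and words are lowercased.
    """
    words = set()
    for line in lines:
        line = line.strip().lower()
        if line and not line.startswith("#"):
            words.add(line)
    return words

def remove_stop_words(words, stop_words):
    """Keep only the words that are not in the stop word set."""
    return [w for w in words if w.lower() not in stop_words]

if __name__ == "__main__":
    # Hypothetical file name; any one-word-per-line list works.
    with open("stopwords.txt", encoding="utf-8") as f:
        stop = load_stop_words(f)
    print(remove_stop_words(["The", "quick", "fox"], stop))
```

Accepting any iterable of lines (rather than a path) keeps the loader usable with files, downloaded text, or an in-memory list.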

ʞɔıu answered Jan 13 '23 09:01



The Natural Language Toolkit (NLTK) offers a broad variety of methods for this kind of task. I can't give you hands-on advice, as I'm not familiar with this subject, but I think it's worth reading a few articles on the topic before you start. Just picking words directly from the text probably won't get you very far; you should probably try to find words similar to the ones for which tags already exist. And of course you need to filter out the common words of the language, like "the". Again, NLTK can help you with this, at least for a few common languages.
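The suggestion to map words onto tags that already exist can be approximated with the standard library's `difflib.get_close_matches`. Note that this is a swapped-in technique: it measures *string* similarity (catching typos and plural forms), not meaning, which is what NLTK's WordNet-based tools would address. The helper name `match_existing_tags` and the `0.8` cutoff are hypothetical choices for this sketch.

```python
import difflib

def match_existing_tags(candidates, existing_tags, cutoff=0.8):
    """Map each candidate word to the most similar existing tag, if any.

    Similarity is difflib's character-based ratio (0.0 to 1.0), so this
    catches spelling variants, not synonyms.
    """
    matches = {}
    for word in candidates:
        # Returns at most n matches scoring >= cutoff, best match first.
        close = difflib.get_close_matches(word, existing_tags, n=1, cutoff=cutoff)
        if close:
            matches[word] = close[0]
    return matches

if __name__ == "__main__":
    tags = ["python", "regex", "java"]
    print(match_existing_tags(["pythn", "regexes", "banana"], tags))
    # {'pythn': 'python', 'regexes': 'regex'}
```

Candidates with no close match (like "banana" here) are simply dropped, which keeps the tag vocabulary from growing with every post.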

paprika answered Jan 13 '23 09:01
