What is a simple way to generate keywords from a text?

Tags: python, metadata, perl

I suppose I could take a text and remove the high-frequency English words from it. By keywords, I mean the words that best characterize the content of the text (tags). It doesn't have to be perfect; a good approximation is fine for my needs.

Has anyone done anything like that? Do you know a Perl or Python library that does it?

Edit: Lingua::EN::Tagger is exactly what I asked for; however, I need a library that also works for French text.

asked Jan 21 '09 15:01

Emmanuel Caradec

2 Answers

The name for the "high-frequency English words" is stop words, and there are many lists available. I'm not aware of any Python or Perl libraries for this, but you could store your stop word list in a binary tree or a hash (or a Python frozenset); then, as you read each word from the input text, check whether it is in your stop list and filter it out.
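A minimal Python sketch of that filter, assuming a simple regex tokenizer and tiny illustrative stop lists (`STOP_WORDS`, `tokenize` and `remove_stop_words` are hypothetical names; a real list has a few hundred entries per language). Since the question also asks about French, the lists are keyed by language, and `l`/`d` are included so elided forms like `l'analyse` reduce to `analyse`. In Python 3, `re` patterns on `str` are Unicode-aware, so accented words such as `déjà` stay intact.

```python
import re

# Tiny illustrative stop lists (assumption: real lists are much longer).
STOP_WORDS = {
    "en": frozenset({"the", "a", "an", "and", "of", "to", "is", "in",
                     "it", "for", "on", "with"}),
    "fr": frozenset({"le", "la", "les", "un", "une", "et", "de", "des",
                     "du", "est", "en", "pour", "l", "d"}),
}

# A word is a run of letters: word characters minus digits and underscore.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return WORD_RE.findall(text.lower())


def remove_stop_words(text, lang="en"):
    """Return the words of `text` that are not in the stop list for `lang`."""
    stop = STOP_WORDS[lang]  # frozenset gives O(1) average membership tests
    return [w for w in tokenize(text) if w not in stop]
```

For example, `remove_stop_words("Le chat et la souris", lang="fr")` returns `["chat", "souris"]`.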

Note that after you remove the stop words, you'll need to do some stemming to normalize the resulting text (remove plurals and -ing and -ed endings), and then remove all the duplicate "keywords".
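The stemming and de-duplication step could look roughly like this in Python. `collections.Counter` merges duplicate stems while keeping their counts, so the most frequent stems can serve as the keywords. `crude_stem` and `top_keywords` are hypothetical helpers, and the stemmer is a deliberately rough, English-only approximation loosely modeled on the first step of the Porter algorithm; for real use, or for French text, a proper stemmer such as NLTK's `SnowballStemmer("french")` is a better choice.

```python
from collections import Counter


def crude_stem(word):
    """Very rough English stemmer (NOT Porter): folds plurals, -ing and -ed."""
    # Plurals, loosely following Porter's step 1a.
    if word.endswith("sses"):
        word = word[:-2]                      # classes -> class
    elif word.endswith("ies") and len(word) > 4:
        word = word[:-3] + "y"                # libraries -> library
    elif word.endswith("s") and not word.endswith(("ss", "us")):
        word = word[:-1]                      # tags -> tag
    # -ing / -ed, only if a stem of 3+ letters containing a vowel remains.
    for suffix in ("ing", "ed"):
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= 3 and any(c in "aeiouy" for c in stem):
            word = stem
            # Undo a doubled final consonant: tagging -> tagg -> tag.
            if word[-1] == word[-2] and word[-1] not in "aeiouylsz":
                word = word[:-1]
            break
    return word


def top_keywords(words, n=5):
    """Stem already-filtered words, merge duplicates, return the n most frequent stems."""
    counts = Counter(crude_stem(w) for w in words)
    # most_common orders ties by first appearance (Python 3.7+).
    return [stem for stem, _ in counts.most_common(n)]
```

For example, `top_keywords(["tagging", "tags", "text", "tagged", "texts", "keyword"], n=2)` returns `["tag", "text"]`.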

answered Sep 20 '22 06:09

florin

You could try the Perl module Lingua::EN::Tagger for a quick and easy solution.

A more sophisticated module, Lingua::EN::Semtags::Engine, combines Lingua::EN::Tagger with a WordNet database to produce more structured output. Both are easy to use; check the documentation on CPAN, or run perldoc after installing the module.

answered Sep 19 '22 06:09

andymurd
