How to stem words in a Python list?

I have a Python list like the one below:

documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time",
             "The EPS user interface management system",
             "System and human system engineering testing of EPS",
             "Relation of user perceived response time to error measurement",
             "The generation of random binary unordered trees",
             "The intersection graph of paths in trees",
             "Graph minors IV Widths of trees and well quasi ordering",
             "Graph minors A survey"]

Now I need to stem each word in it and get another list. How do I do that?

asked Feb 18 '12 18:02

ChamingaD

2 Answers

from stemming.porter2 import stem

documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time",
             "The EPS user interface management system",
             "System and human system engineering testing of EPS",
             "Relation of user perceived response time to error measurement",
             "The generation of random binary unordered trees",
             "The intersection graph of paths in trees",
             "Graph minors IV Widths of trees and well quasi ordering",
             "Graph minors A survey"]

documents = [[stem(word) for word in sentence.split(" ")] for sentence in documents]

This uses a nested list comprehension. The outer comprehension loops through each string in the main list and splits it into a list of words; the inner one loops through those words, stemming each as it goes, and returns the new list of stemmed words.

Please note that I haven't tried this with the stemming package installed; I took the import from the comments and have never used it myself. This is, however, the basic approach for splitting each string into words. Note that it produces a list of lists of words, keeping the original sentence separation.
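If the nested comprehension is hard to read, here is roughly the same logic written as explicit loops. This is only a sketch: toy_stem is a made-up stand-in (it lowercases and drops one trailing "s") so that the example runs without any stemming library installed. In real use you would call the library's stem function instead.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: lowercase, drop one trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

documents = ["The intersection graph of paths in trees",
             "Graph minors A survey"]

stemmed = []
for sentence in documents:              # outer loop: one string per sentence
    words = []
    for word in sentence.split(" "):    # inner loop: one word at a time
        words.append(toy_stem(word))
    stemmed.append(words)               # one list of words per sentence

# The nested comprehension builds exactly the same list of lists
assert stemmed == [[toy_stem(w) for w in s.split(" ")] for s in documents]
print(stemmed)
```

The result keeps one inner list per original sentence, which is the "list of lists" shape mentioned above.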

If you do not want this separation, you can do:

documents = [stem(word) for sentence in documents for word in sentence.split(" ")]

This will instead leave you with one continuous, flat list of words.

If you wish to join the words back together at the end, you can do:

documents = [" ".join(sentence) for sentence in documents]

or to do it in one line:

documents = [" ".join([stem(word) for word in sentence.split(" ")]) for sentence in documents]

when keeping the sentence structure, or

documents = " ".join(documents)

when ignoring it (that is, when documents is the flat list of words).
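To make the difference between the two shapes concrete, here is a small sketch that builds both and joins each back into text. It assumes a placeholder stem (just str.lower, which is not a real stemmer) so it runs with nothing installed; substitute whatever stem function you actually use.

```python
stem = str.lower  # placeholder, NOT a real stemmer; swap in your library's stem

documents = ["Graph minors A survey",
             "The EPS user interface management system"]

# Nested: one list of words per sentence
nested = [[stem(word) for word in sentence.split(" ")] for sentence in documents]

# Flat: a single list holding every word from every sentence
flat = [stem(word) for sentence in documents for word in sentence.split(" ")]

# Join the nested form back into one string per sentence
per_sentence = [" ".join(words) for words in nested]

# Join the flat form into one long string
one_string = " ".join(flat)

print(per_sentence)
print(one_string)
```

Note that the first join only makes sense on the nested list and the second only on the flat list; mixing them up either raises a TypeError or joins individual characters.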

answered Sep 18 '22 01:09

Gareth Latty

You might want to have a look at NLTK (the Natural Language Toolkit). It has a module, nltk.stem, which contains several different stemmers.
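As a rough sketch of what that could look like, the snippet below uses PorterStemmer from nltk.stem, which is one of the stemmers in that module. It assumes NLTK is installed (pip install nltk); the fallback branch is my own addition so the snippet still runs without it, and it leaves words unchanged rather than stemming them.

```python
try:
    from nltk.stem import PorterStemmer
    stem = PorterStemmer().stem          # real Porter stemmer from NLTK
except ImportError:
    def stem(word):
        """Fallback placeholder when NLTK is missing: returns the word unchanged."""
        return word

documents = ["The EPS user interface management system",
             "Graph minors A survey"]

# Stem every word, keeping one list of words per sentence
stemmed = [[stem(word) for word in sentence.split()] for sentence in documents]
print(stemmed)
```

Other stemmers in nltk.stem can be dropped in the same way, since the comprehension only needs a function that maps one word to its stem.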

Thomas
