
Most efficient way to do multiple list comprehensions in Python

Given these three list comprehensions, is there a more efficient way to do this than making three separate passes over the list? I suspect explicit for loops would be bad form here, but if I had to iterate over a large number of lines in rowsaslist, what I have below does not seem very efficient.

import string
from nltk.corpus import stopwords

cachedStopWords = stopwords.words('english')

rowsaslist = [x.lower() for x in rowsaslist]
rowsaslist = [''.join(c for c in s if c not in string.punctuation) for s in rowsaslist]
rowsaslist = [' '.join([word for word in p.split() if word not in cachedStopWords]) for p in rowsaslist]

Is combining these all into one comprehension statement more efficient? I know from a readability standpoint it would probably be a mess of code.

Sean asked May 05 '26 00:05



1 Answer

Instead of iterating over the same list three times, you could simply define two functions and use them in a single list comprehension:

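import string
from nltk.corpus import stopwords

cachedStopWords = stopwords.words('english')


def remove_punctuation(text):
    # Lowercase the text and drop every punctuation character.
    return ''.join(c for c in text.lower() if c not in string.punctuation)

def remove_stop_words(text):
    # Keep only the words that are not stop words.
    return ' '.join([word for word in text.split() if word not in cachedStopWords])

# One pass over the list: each row goes through both functions in turn.
rowsaslist = [remove_stop_words(remove_punctuation(text)) for text in rowsaslist]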

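I've never used stopwords. If stopwords.words returns a list, you'd better convert it to a set first: that speeds up the word not in cachedStopWords test, because set membership is a constant-time lookup on average, while a list is scanned item by item.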
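As a rough illustration of the set conversion suggested above, here is a minimal sketch. The short stop_word_list is a hypothetical stand-in for stopwords.words('english') so that the example runs without NLTK, and the names clean and PUNCTUATION_TABLE are made up for this sketch. It also swaps the character-by-character join for str.translate, which is a different technique from the one in the answer but should remove the same characters.

```python
import string

# Hypothetical stand-in for stopwords.words('english'), so this runs without NLTK.
stop_word_list = ["the", "is", "a", "of", "and", "in", "on"]

# Convert once, up front: "word not in stop_word_set" is then a hash lookup
# instead of a scan through the whole list.
stop_word_set = frozenset(stop_word_list)

# Translation table that deletes every ASCII punctuation character.
PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def clean(text, stop_words=stop_word_set):
    """Lowercase, strip punctuation, and drop stop words for a single row."""
    no_punctuation = text.lower().translate(PUNCTUATION_TABLE)
    return " ".join(word for word in no_punctuation.split() if word not in stop_words)


rows = ["The cat is on the mat.", "A tale of two cities!"]
cleaned = [clean(row) for row in rows]
print(cleaned)
```

With the real NLTK list, the only change would presumably be building the set from stopwords.words('english') instead of the stand-in list. Whether this is measurably faster on your data is worth checking with timeit.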

Finally, the NLTK package might help you process text. See @alvas' answer.

Eric Duminil answered May 07 '26 13:05



