I have a list of approximately 300 words and a huge amount of text that I want to scan to know how many times each word appears.
I am using the re module from Python:
for word in list_word:
    search = re.compile(r"""(\s|,)(%s).?(\s|,|\.|\))""" % word)
    occurrences = search.subn("", text)[1]
This works, but is there a more efficient or more elegant way of doing it?
With a huge amount of text, I wouldn't use regexps here. I would simply split the text on whitespace and count the words you care about:
words = {"this": 0, "that": 0}
for w in text.split():
    if w in words:
        words[w] += 1
Afterwards, words holds the frequency of each word. Note that split() leaves punctuation attached, so "this," is not counted as "this".
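A minimal sketch of the same split-and-count idea using collections.Counter from the standard library. The function name count_words and its parameters are made up for illustration; like the loop above, it only matches tokens exactly as split() produces them.

```python
from collections import Counter

def count_words(text, targets):
    """Count whitespace-delimited tokens of text that appear in targets."""
    wanted = set(targets)  # set membership is O(1) per token
    counts = Counter(w for w in text.split() if w in wanted)
    # Counter returns 0 for missing keys, so words that never occur are reported too.
    return {w: counts[w] for w in targets}

print(count_words("this is that and this", ["this", "that", "other"]))
```

The text is scanned once regardless of how many words are in the list, which is the main gain over compiling and running one regexp per word.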
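Try stripping all the punctuation from your text and then splitting on whitespace, so that strippedText is a list of words (on a list, count() matches whole words only, whereas on a string it would also match substrings). Then simply do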
for word in list_word:
    occurence = strippedText.count(word)
Or, if you're using Python 3 (dict comprehensions also work in 2.7), you could do:
occurences = {word: strippedText.count(word) for word in list_word}
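For what it's worth, here is one way the strip-then-split step could look. This is only a sketch: count_stripped is a hypothetical name, and it assumes that removing every character in string.punctuation is acceptable for your text (that also removes apostrophes, so "don't" becomes "dont"). It counts all tokens once with Counter instead of calling count() once per word.

```python
import string
from collections import Counter

def count_stripped(text, list_word):
    """Strip ASCII punctuation, split on whitespace, and count the listed words."""
    # Translation table that deletes every ASCII punctuation character.
    table = str.maketrans("", "", string.punctuation)
    tokens = text.translate(table).split()
    counts = Counter(tokens)  # one pass over the tokens
    return {word: counts[word] for word in list_word}

print(count_stripped("This, that (this). this!", ["this", "that"]))
```

Matching is case-sensitive here, so "This" and "this" are different words. Lowercase the text first if that is not what you want.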