I have a list of lists as follows.
sentences = [
["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"],
["my", "favorite", "language", "is", "python"]
]
I want to get the count of each word in the sentences
list. So, my output should look as follows.
{
'stackoverflow': 1,
'question': 1,
'is': 2,
'language': 1,
'first': 1,
'in': 1,
'favorite': 2,
'python': 1,
'my': 3
}
I am currently doing it as follows.
frequency_input = [item for sublist in sentences for item in sublist]
frequency_output = dict(
(x,frequency_input.count(x))
for x in set(frequency_input)
)
However, it is not efficient at all for long lists. I have a really long list with about 1 million sentences in the list. It took me two days to run it, and it is still running.
In that case I would like to make my programme more efficient. My current first line of the code is O(n^2)
and my second line is O(n)
. Please let me know if there is a more efficient way of doing this in python. It would be reaaly ideal if I could run it with lesser time than now. I am not worried about space complexity.
I am happy to provide more details if needed.
A simpler and more performant approach would be to flatten the lists using itertools.chain
, and to count the strings with collections.Counter
:
from collections import Counter
from itertools import chain
Counter(chain.from_iterable(sentences))
Counter({'my': 3,
'first': 1,
'question': 1,
'in': 1,
'stackoverflow': 1,
'is': 2,
'favorite': 2,
'language': 1,
'python': 1})
You can use Counter class from collections module.
if you want to learn the number of words in each sentence separately you can do as follows
from collections import Counter
sentences = [["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"], ["my", "favorite", "language", "is", "python"]]
counter_list = [dict(Counter(sentence)) for sentence in sentences]
print(counter_list)
Output:
[{'my': 2, 'first': 1, 'question': 1, 'in': 1, 'stackoverflow': 1, 'is': 1, 'favorite': 1}, {'my': 1, 'favorite': 1, 'language': 1, 'is': 1, 'python': 1}]
Or if you want total word counts you can use chain method from itertools module.
import itertools
from collections import Counter
sentences = [["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"], ["my", "favorite", "language", "is", "python"]]
sentences = list(itertools.chain.from_iterable(sentences))
word_counts = Counter(sentences)
print(word_counts)
Output:
Counter({'my': 3, 'is': 2, 'favorite': 2, 'first': 1, 'question': 1, 'in': 1, 'stackoverflow': 1, 'language': 1, 'python': 1})
The complexity of Counter object as documentation show, Counter is a dict subclass for counting hashable objects. So constructing counter object from an iterable has the time complexity of O(n)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With