Count strings in nested list

Question

I have a list of lists as follows.

sentences = [
    ["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"], 
    ["my", "favorite", "language", "is", "python"]
]

I want to get the count of each word in the sentences list. So, my output should look as follows.

{
    'stackoverflow': 1,
     'question': 1,
     'is': 2,
     'language': 1,
     'first': 1,
     'in': 1,
     'favorite': 2,
     'python': 1,
     'my': 3
}

I am currently doing it as follows.

frequency_input = [item for sublist in sentences for item in sublist]
frequency_output = dict(
    (x,frequency_input.count(x)) 
    for x in set(frequency_input)
)

However, it is not efficient at all for long lists. I have a really long list with about 1 million sentences in the list. It took me two days to run it, and it is still running.

In that case I would like to make my programme more efficient. My current first line of the code is O(n^2) and my second line is O(n). Please let me know if there is a more efficient way of doing this in python. It would be reaaly ideal if I could run it with lesser time than now. I am not worried about space complexity.

I am happy to provide more details if needed.

yatu · Accepted Answer

A simpler and more performant approach would be to flatten the lists using itertools.chain, and to count the strings with collections.Counter:

from collections import Counter
from itertools import chain

Counter(chain.from_iterable(sentences))

Counter({'my': 3,
         'first': 1,
         'question': 1,
         'in': 1,
         'stackoverflow': 1,
         'is': 2,
         'favorite': 2,
         'language': 1,
         'python': 1})

EmreAydin · Answer

You can use Counter class from collections module.

if you want to learn the number of words in each sentence separately you can do as follows

from collections import Counter

sentences = [["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"], ["my", "favorite", "language", "is", "python"]]

counter_list = [dict(Counter(sentence)) for sentence in sentences]
print(counter_list)

Output:

[{'my': 2, 'first': 1, 'question': 1, 'in': 1, 'stackoverflow': 1, 'is': 1, 'favorite': 1}, {'my': 1, 'favorite': 1, 'language': 1, 'is': 1, 'python': 1}]

Or if you want total word counts you can use chain method from itertools module.

import itertools
from collections import Counter

sentences = [["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"], ["my", "favorite", "language", "is", "python"]]

sentences = list(itertools.chain.from_iterable(sentences))
word_counts = Counter(sentences)
print(word_counts)

Output:

Counter({'my': 3, 'is': 2, 'favorite': 2, 'first': 1, 'question': 1, 'in': 1, 'stackoverflow': 1, 'language': 1, 'python': 1})

The complexity of Counter object as documentation show, Counter is a dict subclass for counting hashable objects. So constructing counter object from an iterable has the time complexity of O(n)

Count strings in nested list

Tags:

python

list

EmJ

2 Answers

yatu

EmreAydin

Recent Activity

Donate For Us

Count strings in nested list

Tags:

python

list

EmJ

2 Answers

yatu

EmreAydin

Related questions

Recent Activity

Donate For Us