
Calculate count of all the elements in nested list

I have a list of lists and would like to create a data frame with the count of each unique element. Here is my test data:

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

I can count the elements of each sublist using Counter in a for loop:

from collections import Counter
for item in test:
    print(Counter(item))

But how can I sum the results of this loop into a new data frame?

Expected output as data frame:

P1 P2 P3 P4
15 4  1  2

ThomasJohnson asked Feb 14 '18 13:02



2 Answers

Here is one way.

from collections import Counter
from itertools import chain

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

c = Counter(chain.from_iterable(test))

for k, v in c.items():
    print(k, v)

# P1 15
# P2 4
# P3 1
# P4 2    

For output as dataframe:

import pandas as pd

df = pd.DataFrame.from_dict(c, orient='index').transpose()

#    P1 P2 P3 P4
# 0  15  4  1  2

jpp answered Oct 23 '22 08:10


For the best performance, use either:

  • collections.Counter with itertools.chain.from_iterable as:

    >>> from collections import Counter
    >>> from itertools import chain
    
    >>> Counter(chain.from_iterable(test))
    Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})
    
  • OR collections.Counter with a list comprehension (similar performance, and it avoids the itertools import) as:

    >>> from collections import Counter
    
    >>> Counter([x for a in test for x in a])
    Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})
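
Since Counter accepts any iterable, a generator expression is a third variant of the same idea. The sketch below is an illustration only and was not part of the timings in this answer; it assumes the test list from the question and checks that all three variants agree.

```python
from collections import Counter
from itertools import chain

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

# Lazy flattening: elements are fed to Counter one at a time.
by_chain = Counter(chain.from_iterable(test))

# Eager flattening: the full flat list is built before counting.
by_listcomp = Counter([x for a in test for x in a])

# Generator expression: lazy like chain, but needs no extra import.
by_genexp = Counter(x for a in test for x in a)

print(by_chain == by_listcomp == by_genexp)
```

The generator form skips the intermediate list, which may matter for memory on large inputs; whether it is faster would need measuring.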
    

Read on for alternative solutions and a performance comparison.


Approach 1: Concatenate the sublists into a single list and count its elements using collections.Counter.

  • Solution 1: Concatenate list using itertools.chain.from_iterable and find the count using collections.Counter as:

    test = [
        ["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]
    ]
    
    from itertools import chain 
    from collections import Counter
    
    my_counter = Counter(chain.from_iterable(test)) 
    
  • Solution 2: Flatten the list using a list comprehension as:

    from collections import Counter
    
    my_counter = Counter([x for a in test for x in a])
    
  • Solution 3: Concatenate list using sum

    from collections import Counter
    
    my_counter = Counter(sum(test, []))
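
To see why Solution 3 scales poorly, it helps to spell out what sum(test, []) does. The loop below is a rough equivalent written for illustration (the name flat is hypothetical): every + builds a brand-new list and copies everything accumulated so far, so the total work grows roughly quadratically with the number of sublists.

```python
from itertools import chain

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

# Rough equivalent of sum(test, []): start from [] and add each sublist.
flat = []
for sub in test:
    flat = flat + sub  # allocates a new list and copies all earlier items

# Same flat list as the built-in sum and as chain.from_iterable.
print(flat == sum(test, []) == list(chain.from_iterable(test)))
```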
    

Approach 2: Calculate count of elements in each sublist using collections.Counter and then sum the Counter objects in the list.

  • Solution 4: Count objects of each sublist using collections.Counter and map as:

    from collections import Counter
    
    my_counter = sum(map(Counter, test), Counter())
    
  • Solution 5: Count objects of each sublist using list comprehension as:

    from collections import Counter
    
    my_counter = sum([Counter(t) for t in test], Counter())
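
Both solutions in Approach 2 build a new Counter for every addition. A closely related variant, shown here only as a sketch and not included in the timings of this answer, keeps the question's for loop and accumulates in place with Counter.update (the name total is hypothetical):

```python
from collections import Counter

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

total = Counter()
for item in test:
    # update() adds the counts of this sublist to the running total in place.
    total.update(item)

print(total)
```

This avoids the intermediate Counter objects that sum creates, so it is likely cheaper, but that is an expectation rather than a measured result.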
    

In all the solutions above, my_counter will hold the value:

>>> my_counter
Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})
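
As a quick sanity check, the five solutions can be run side by side and compared. This is a small sketch using the test data from the question; the dictionary name solutions is made up for the example.

```python
from collections import Counter
from itertools import chain

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

solutions = {
    "chain.from_iterable": Counter(chain.from_iterable(test)),
    "list comprehension": Counter([x for a in test for x in a]),
    "sum of lists": Counter(sum(test, [])),
    "map + sum of Counters": sum(map(Counter, test), Counter()),
    "listcomp + sum of Counters": sum([Counter(t) for t in test], Counter()),
}

expected = Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})
all_equal = all(result == expected for result in solutions.values())
print(all_equal)

# A plain dict sorted by key gives the column order from the question.
ordered = dict(sorted(expected.items()))
print(ordered)
```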

Performance Comparison

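Below is a timeit comparison on Python 3 for a list of 1000 sublists with 100 elements each. The setup code is passed as an ordinary statement rather than via -s, so each timing also includes building my_list: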

  1. Fastest using chain.from_iterable (17.1 msec)

    mquadri$ python3 -m timeit "from collections import Counter; from itertools import chain; my_list = [list(range(100)) for i in range(1000)]" "Counter(chain.from_iterable(my_list))"
    100 loops, best of 3: 17.1 msec per loop 
    
  2. Second is flattening with a list comprehension and then counting (similar result to the above, but without the itertools import) (18.36 msec)

    mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "Counter([x for a in my_list for x in a])"
    100 loops, best of 3: 18.36 msec per loop
    
  3. Third is applying Counter to each sublist in a list comprehension and summing the results (162 msec)

    mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "sum([Counter(t) for t in my_list], Counter())"
    10 loops, best of 3: 162 msec per loop
    
  4. Fourth is Counter with map (very close to the list comprehension version above) (176 msec)

    mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "sum(map(Counter, my_list), Counter())"
    10 loops, best of 3: 176 msec per loop
    
  5. Slowest by far is concatenating the sublists with sum (526 msec)

    mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "Counter(sum(my_list, []))"
    10 loops, best of 3: 526 msec per loop
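
The same comparison can be scripted with the timeit module, keeping the setup out of the timed statement. This is a sketch rather than a reproduction of the figures above: the helper benchmark and the reduced list size (200 sublists of 50 elements, chosen so it runs quickly) are assumptions, and the numbers it prints will differ by machine and Python version.

```python
import timeit

# Setup runs once per timing run and is not included in the measurement.
SETUP = (
    "from collections import Counter\n"
    "from itertools import chain\n"
    "my_list = [list(range(50)) for _ in range(200)]"
)

STATEMENTS = {
    "chain.from_iterable": "Counter(chain.from_iterable(my_list))",
    "list comprehension": "Counter([x for a in my_list for x in a])",
    "listcomp + sum of Counters": "sum([Counter(t) for t in my_list], Counter())",
    "map + sum of Counters": "sum(map(Counter, my_list), Counter())",
    "sum of lists": "Counter(sum(my_list, []))",
}


def benchmark(number=5, repeat=3):
    """Return the best per-loop time in seconds for each statement."""
    results = {}
    for label, stmt in STATEMENTS.items():
        times = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
        results[label] = min(times) / number
    return results


if __name__ == "__main__":
    # Print the statements from fastest to slowest.
    for label, seconds in sorted(benchmark().items(), key=lambda kv: kv[1]):
        print(f"{label:<28} {seconds * 1e3:8.3f} msec per loop")
```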
    

Moinuddin Quadri answered Oct 23 '22 07:10