Count the frequency of a recurring list -- inside a list of lists

I have a list of lists in Python and I need to find how many times each sub-list occurs. Here is a sample:

from collections import Counter
list1 = [[ 1., 4., 2.5], [ 1., 2.66666667, 1.33333333], 
         [ 1., 2., 2.], [ 1., 2.66666667, 1.33333333], [ 1., 4., 2.5],
         [ 1., 2.66666667, 1.33333333]]   
c = Counter(x for x in iter(list1))
print c

The above code would work if the elements of the list were hashable (say, int), but in this case they are lists and I get an error:

TypeError: unhashable type: 'list'

How can I count these lists so I get something like

[ 1., 2.66666667, 1.33333333], 3
[ 1., 4., 2.5], 2
[ 1., 2., 2.], 1
rambalachandran asked Dec 11 '22 18:12



2 Answers

Just convert the lists to tuples:

>>> c = Counter(tuple(x) for x in iter(list1))
>>> c
Counter({(1.0, 2.66666667, 1.33333333): 3, (1.0, 4.0, 2.5): 2, (1.0, 2.0, 2.0): 1})

Remember to do the same for lookup:

>>> c[tuple(list1[0])]
2
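
To get output close to the format the question asks for, the tuple keys can be turned back into lists when printing. This is a minimal sketch, assuming Python 3 (`print` as a function); the names `counts` and `report` are introduced here for illustration and are not part of the original answer.

```python
from collections import Counter

list1 = [[1., 4., 2.5], [1., 2.66666667, 1.33333333],
         [1., 2., 2.], [1., 2.66666667, 1.33333333], [1., 4., 2.5],
         [1., 2.66666667, 1.33333333]]

# Tuples are hashable, so they can serve as Counter keys.
counts = Counter(tuple(row) for row in list1)

# most_common() yields (key, count) pairs, highest count first.
# Each key is converted back to a list for display only.
report = [(list(key), n) for key, n in counts.most_common()]
for row, n in report:
    print(row, n)
```

The conversion back to lists happens only at display time; the Counter itself keeps tuple keys, so lookups still need `tuple(...)`.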
tobias_k answered Apr 21 '23 17:04



Counter returns a dictionary-like object whose keys must be hashable. Since lists are not hashable, you can convert them to tuples using the map function:

>>> Counter(map(tuple, list1))
Counter({(1.0, 2.66666667, 1.33333333): 3, (1.0, 4.0, 2.5): 2, (1.0, 2.0, 2.0): 1})
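
The hashability difference behind the error can be checked directly with the built-in `hash()`. The snippet below is a small sketch, assuming Python 3; the variable names are chosen here for illustration.

```python
row = [1., 4., 2.5]

try:
    hash(row)                # lists are mutable, so they are not hashable
    list_hashable = True
except TypeError:
    list_hashable = False

key = tuple(row)             # an immutable copy with the same elements
tuple_hashable = isinstance(hash(key), int)

print(list_hashable, tuple_hashable)
```

A tuple is only hashable if all of its elements are, so sub-lists that themselves contain lists would need the conversion applied at each level.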

Note that in the timings below, map performs slightly better than a generator expression. When a generator expression is passed to Counter(), Python has to fetch each value by resuming the underlying generator function, whereas the built-in map function avoids that per-item overhead.

# Use generator expression
~ $ python -m timeit --setup "list1 = [[ 1., 4., 2.5], [ 1., 2.66666667, 1.33333333],[ 1., 2., 2.], [ 1., 2.66666667, 1.33333333], [ 1., 4., 2.5],[ 1., 2.66666667, 1.33333333]] ;from collections import Counter" "Counter(tuple(x) for x in iter(list1))"
100000 loops, best of 3: 9.86 usec per loop
# Use map
~ $ python -m timeit --setup "list1 = [[ 1., 4., 2.5], [ 1., 2.66666667, 1.33333333],[ 1., 2., 2.], [ 1., 2.66666667, 1.33333333], [ 1., 4., 2.5],[ 1., 2.66666667, 1.33333333]] ;from collections import Counter" "Counter(map(tuple, list1))"
100000 loops, best of 3: 7.92 usec per loop
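
The same comparison can be run from inside Python with the `timeit` module. This is a rough sketch, assuming Python 3; the helper names `with_genexp` and `with_map` and the repetition count are choices made for this example, and the numbers it prints will differ by machine and interpreter version.

```python
import timeit
from collections import Counter

list1 = [[1., 4., 2.5], [1., 2.66666667, 1.33333333],
         [1., 2., 2.], [1., 2.66666667, 1.33333333], [1., 4., 2.5],
         [1., 2.66666667, 1.33333333]]

def with_genexp():
    return Counter(tuple(x) for x in list1)

def with_map():
    return Counter(map(tuple, list1))

# Both variants must agree before their timings are worth comparing.
same_result = with_genexp() == with_map()

n = 10000  # arbitrary repetition count for this sketch
# Take the best of three runs, as the command-line tool reports above.
t_genexp = min(timeit.repeat(with_genexp, number=n, repeat=3))
t_map = min(timeit.repeat(with_map, number=n, repeat=3))

print("genexp: %.2f usec per loop" % (t_genexp / n * 1e6))
print("map:    %.2f usec per loop" % (t_map / n * 1e6))
```

With only six short sub-lists the difference is a matter of microseconds, so it is worth re-measuring on realistic data before choosing one form for speed alone.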

From PEP 0289 -- Generator Expressions:

The semantics of a generator expression are equivalent to creating an anonymous generator function and calling it. For example:

g = (x**2 for x in range(10))
print g.next()

is equivalent to:

def __gen(exp):
    for x in exp:
        yield x**2
g = __gen(iter(range(10)))
print g.next()
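
The PEP excerpt uses Python 2 syntax (`print g.next()`). A Python 3 rendering of the same equivalence might look like the sketch below; `_gen` is a stand-in name for the anonymous function the PEP describes, and the comparison variables are added here for illustration.

```python
# Generator expression form.
g1 = (x**2 for x in range(10))

# Equivalent explicit generator function.
def _gen(exp):
    for x in exp:
        yield x**2

g2 = _gen(iter(range(10)))

# In Python 3 the built-in next() replaces the g.next() method.
first = (next(g1), next(g2))

# The remaining values produced by both generators match as well.
rest_equal = list(g1) == list(g2)
print(first, rest_equal)
```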

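Note that generator expressions are better in terms of memory use (in Python 2, which this answer uses, map builds a complete list before Counter sees it), so if you are dealing with large data you are better off using a generator expression instead of the map function.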
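
As an illustration of the memory point, the sketch below counts rows coming from a generator so that no intermediate list of tuples is ever built. It assumes Python 3, where `map` is also lazy and would behave similarly; the `rows` function is a hypothetical data source invented for this example.

```python
from collections import Counter

def rows(n):
    # Hypothetical data source: yields one small list at a time,
    # cycling through three distinct rows.
    samples = [[1., 4., 2.5], [1., 2.66666667, 1.33333333], [1., 2., 2.]]
    for i in range(n):
        yield samples[i % 3]

# Each row is converted to a tuple and counted as it arrives.
counts = Counter(tuple(r) for r in rows(9000))
print(counts.most_common(1))
```

The Counter still stores one entry per distinct sub-list, so memory grows with the number of unique rows rather than with the total number of rows.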

Mazdak answered Apr 21 '23 17:04
