
Can Python's list comprehensions (ideally) do the equivalent of 'count(*)...group by...' in SQL?

I think list comprehensions may give me this, but I'm not sure: any elegant solutions in Python (2.6) in general for selecting unique objects in a list and providing a count?

(I've defined an __eq__ to define uniqueness on my object definition).

So in RDBMS-land, something like this:

CREATE TABLE x(n NUMBER(1));
INSERT INTO x VALUES(1);
INSERT INTO x VALUES(1);
INSERT INTO x VALUES(1);
INSERT INTO x VALUES(2);

SELECT COUNT(*), n FROM x
GROUP BY n;

Which gives:

COUNT(*) n
==========
3        1
1        2

So, here's my equivalent list in Python:

[1,1,1,2]

And I want the same output as the SQL SELECT gives above.

EDIT: The example I gave here was simplified; I'm actually processing lists of user-defined object instances. For completeness, here is the extra code I needed to get the whole thing to work:

import hashlib

def __hash__(self):
    # Digest every string in my_list_of_stuff (Python 2 byte strings).
    md5 = hashlib.md5()
    for item in self.my_list_of_stuff:
        md5.update(item)
    return int(md5.hexdigest(), 16)

The __hash__ method was needed to get the set conversion to work. I opted for the list-comprehension idea, which works in 2.6; I learnt that it involves an inefficiency (see comments), but my data set is small enough for that not to be an issue. my_list_of_stuff above is a list of strings on my object definition.
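A minimal sketch of a lighter alternative to the md5-based hash, assuming my_list_of_stuff only ever holds hashable items such as strings: hash a tuple of the items, and define __eq__ on the same data so that equal objects always hash equally. The class name Thing and its constructor are hypothetical stand-ins, not the original class.

```python
class Thing(object):
    """Hypothetical stand-in for the user-defined class in the question."""

    def __init__(self, my_list_of_stuff):
        self.my_list_of_stuff = list(my_list_of_stuff)

    def __eq__(self, other):
        # Two objects count as the same "row" when their lists match.
        if not isinstance(other, Thing):
            return NotImplemented
        return self.my_list_of_stuff == other.my_list_of_stuff

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Must agree with __eq__: equal lists give equal tuples,
        # and equal tuples give equal hashes.
        return hash(tuple(self.my_list_of_stuff))


things = [Thing(["a", "b"]), Thing(["a", "b"]), Thing(["c"])]
unique = set(things)

# Same shape as the set-based one-liner, applied to objects.
counts = [(t.my_list_of_stuff, things.count(t)) for t in unique]
```

The set conversion relies on __hash__ and things.count() relies on __eq__, which is why both are defined over the same attribute.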

asked Jan 27 '10 16:01 by monojohnny



4 Answers

Lennart Regebro provided a nice one-liner that does what you want:

>>> values = [1,1,1,2]
>>> print [(x,values.count(x)) for x in set(values)]
[(1, 3), (2, 1)]

As S.Lott mentions, a defaultdict can do the same thing.

answered Oct 13 '22 01:10 by mechanical_meat


>>> from collections import Counter
>>> Counter([1,1,1,2])
Counter({1: 3, 2: 1})

Counter is only available in Python 2.7 and 3.1 or later, so not in 2.6. It is a dict subclass.
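As a rough sketch of how the Counter result could be turned into the (COUNT(*), n) rows from the question, assuming a Python version that ships Counter. The variable names and the column formatting are illustrative only.

```python
from collections import Counter

values = [1, 1, 1, 2]
counts = Counter(values)

# One (count, n) pair per distinct value, ordered by n,
# mirroring "SELECT COUNT(*), n FROM x GROUP BY n".
rows = [(count, n) for n, count in sorted(counts.items())]

print("COUNT(*) n")
for row in rows:
    print("%-8d %d" % row)
```

Here rows ends up as [(3, 1), (1, 2)], matching the SQL output in the question.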

answered Oct 13 '22 00:10 by SilentGhost


Not easily doable as a list comprehension.

from collections import defaultdict
def group_by( someList ):
    counts = defaultdict(int)
    for value in someList:
        counts[value.aKey] += 1
    return counts

This is a very Pythonic solution. But not a list comprehension.
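A possible usage sketch for the function above, repeated here so the snippet runs on its own. The Record type and its payload field are hypothetical; the only assumption group_by makes is that each item has an aKey attribute.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type; only the aKey attribute matters to group_by.
Record = namedtuple("Record", ["aKey", "payload"])

def group_by(someList):
    # Count how many items share each aKey value.
    counts = defaultdict(int)
    for value in someList:
        counts[value.aKey] += 1
    return counts

records = [Record(1, "a"), Record(1, "b"), Record(1, "c"), Record(2, "d")]
counts = group_by(records)

# (count, key) pairs ordered by key, like the SQL result set.
rows = [(count, key) for key, count in sorted(counts.items())]
```

Because the counting happens in a single pass over the list, this avoids calling list.count() once per distinct value.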

answered Oct 13 '22 00:10 by S.Lott


You can use groupby from the itertools module:

Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element. If not specified or is None, key defaults to an identity function and returns the element unchanged. Generally, the iterable needs to already be sorted on the same key function.

>>> import itertools
>>> a = [1,1,1,2]
>>> [(len(list(v)), key) for (key, v) in itertools.groupby(sorted(a))]
[(3, 1), (1, 2)]

I would assume its runtime is worse than the dict-based solutions by SilentGhost or S.Lott, since it has to sort the input sequence, but you should time that yourself. It is a list comprehension, though. It should be faster than Adam Bernier's solution (the values.count() one-liner), since it doesn't have to do repeated linear scans of the input sequence. If needed, the sorted call can be avoided by sorting the input sequence in place first.
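The timing comparison suggested above could be set up roughly like this. It is only a sketch: the function names, the data size, and the repeat count are arbitrary choices, and the numbers will vary by machine and Python version, so no results are shown.

```python
import itertools
import random
import timeit
from collections import defaultdict

def count_with_list_count(values):
    # One linear scan of the list per distinct value.
    return sorted((x, values.count(x)) for x in set(values))

def count_with_groupby(values):
    # Sort once, then measure the length of each run of equal values.
    return [(key, len(list(group)))
            for key, group in itertools.groupby(sorted(values))]

def count_with_defaultdict(values):
    # Single pass, no sorting needed for the counting itself.
    counts = defaultdict(int)
    for value in values:
        counts[value] += 1
    return sorted(counts.items())

random.seed(0)
data = [random.randrange(100) for _ in range(2000)]

# All three approaches should agree before their speed is compared.
assert (count_with_list_count(data)
        == count_with_groupby(data)
        == count_with_defaultdict(data))

for func in (count_with_list_count, count_with_groupby, count_with_defaultdict):
    seconds = timeit.timeit(lambda: func(data), number=20)
    print("%-24s %.4f s" % (func.__name__, seconds))
```

Each function returns (value, count) pairs sorted by value so that the results can be compared directly.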

answered Oct 13 '22 00:10 by Torsten Marek