Python - Group by and sum a list of tuples

Tags:

python

group-by

list-comprehension

Given the following list:

[
    ('A', '', Decimal('4.0000000000'), 1330, datetime.datetime(2012, 6, 8, 0, 0)),
    ('B', '', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 6, 4, 0, 0)),
    ('AA', 'C', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 5, 31, 0, 0)),
    ('B', '', Decimal('7.0000000000'), 1330, datetime.datetime(2012, 5, 24, 0, 0)),
    ('A', '', Decimal('21.0000000000'), 1330, datetime.datetime(2012, 5, 14, 0, 0))
]

I would like to group these by the first, second, fourth and fifth columns in the tuple and sum the 3rd. For this example I'll name the columns as col1, col2, col3, col4, col5.

In SQL I would do something like this:

select col1, col2, sum(col3), col4, col5 from my_table
group by col1, col2, col4, col5

Is there a "cool" way to do this or is it all a manual loop?

348

asked Jun 15 '12 20:06

jbassking10

1 Answer

You want itertools.groupby.

Note that groupby only merges consecutive items that share a key, so sort the input by the same key function beforehand:

import itertools

# Group on every column except the third, which is summed.
keyfunc = lambda t: (t[0], t[1], t[3], t[4])
data.sort(key=keyfunc)
for key, rows in itertools.groupby(data, keyfunc):
    print(key, sum(r[2] for r in rows))
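
For anyone who wants to run this end to end, here is a minimal self-contained sketch using the list from the question. The helper name group_and_sum is hypothetical (it is not part of the answer or of itertools), and it returns rows in the same column order as the SQL query. It uses sorted() rather than list.sort() so the caller's list is left unchanged.

```python
import datetime
import itertools
from decimal import Decimal

# Sample rows copied from the question: (col1, col2, col3, col4, col5).
data = [
    ('A', '', Decimal('4.0000000000'), 1330, datetime.datetime(2012, 6, 8, 0, 0)),
    ('B', '', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 6, 4, 0, 0)),
    ('AA', 'C', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 5, 31, 0, 0)),
    ('B', '', Decimal('7.0000000000'), 1330, datetime.datetime(2012, 5, 24, 0, 0)),
    ('A', '', Decimal('21.0000000000'), 1330, datetime.datetime(2012, 5, 14, 0, 0)),
]


def group_and_sum(rows):
    """Hypothetical helper: group on col1, col2, col4, col5 and sum col3."""
    keyfunc = lambda t: (t[0], t[1], t[3], t[4])
    result = []
    # groupby only merges adjacent rows, so sort by the same key first.
    ordered = sorted(rows, key=keyfunc)
    for (c1, c2, c4, c5), group in itertools.groupby(ordered, key=keyfunc):
        total = sum(r[2] for r in group)
        result.append((c1, c2, total, c4, c5))
    return result


for row in group_and_sum(data):
    print(row)
```

With the sample data every row has a different date in col5, so each group contains exactly one row; the sums only change once two rows share all four key columns.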

128

answered Sep 22 '22 21:09

David Wolever
