I have a list of objects of type C, where C has the properties X, Y, and Z (e.g., c.X, c.Y, c.Z). Now I want to perform the following task:

Sum property Z over the objects that have the same value for property Y
Output a list of tuples (Y, sum of Zs with this Y)

What's the most concise way?

What's the most concise way in Python to group and sum a list of objects by the same property

1 Answer

The defaultdict approach is probably better, assuming c.Y is hashable, but here's another way:

from itertools import groupby
from operator import attrgetter
get_y = attrgetter('Y')
# Sort by Y so that equal values are adjacent, then sum Z within each run.
tuples = [(y, sum(c.Z for c in cs_with_y)) for y, cs_with_y in
          groupby(sorted(cs, key=get_y), get_y)]

To be a little more concrete about the differences:

This approach requires making a sorted copy of cs, which takes O(n log n) time and O(n) extra space. Alternatively, you can call cs.sort(key=get_y) to sort cs in place, which avoids the copy but modifies the list cs. Note that groupby returns an iterator, so it adds no extra overhead of its own. On the other hand, this approach still works if the c.Y values aren't hashable (as long as they can be ordered), whereas the defaultdict approach will raise a TypeError.
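
To make the copy-versus-in-place trade-off concrete, here is a minimal sketch on made-up data. The namedtuple C and the helper sums_by_y are stand-ins I'm assuming for illustration; they are not part of the question.

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Hypothetical stand-in for the question's type C.
C = namedtuple('C', ['X', 'Y', 'Z'])

cs = [C(1, 'b', 10), C(2, 'a', 1), C(3, 'b', 5), C(4, 'a', 2)]
get_y = attrgetter('Y')


def sums_by_y(sorted_cs):
    """Sum Z over each run of equal Y; the input must already be sorted by Y."""
    return [(y, sum(c.Z for c in group))
            for y, group in groupby(sorted_cs, get_y)]


# Variant 1: sorted() builds a copy, so cs keeps its original order.
from_copy = sums_by_y(sorted(cs, key=get_y))

# Variant 2: list.sort() reorders cs itself and builds no copy.
original_order = list(cs)
cs.sort(key=get_y)
in_place = sums_by_y(cs)

print(from_copy)  # [('a', 3), ('b', 15)]
```

Both variants give the same sums; the only observable difference is that cs has been reordered after the second one.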

But watch out: in Python 3, sorting raises TypeError if any of the values are complex numbers, and more generally whenever two values can't be compared with <. It might be possible to make this work with an appropriate key function: key=lambda e: (e.real, e.imag) if isinstance(e, complex) else e (where e is the c.Y value) worked for everything I tried at the time. In Python 3, though, a tuple can't be compared with a plain number either, so it only helps when the complex values aren't mixed with other numbers. Custom classes whose __lt__ raises an exception are still a no-go. Maybe you could define a more complicated key function that tests for such cases, and so on.
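
As a sketch of the key-function idea, assuming Python 3 and Y values that are all complex: the sort key maps each complex Y to a (real, imag) tuple, while groupby keeps grouping on the original value. The names C and complex_safe_key are illustrative assumptions.

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Hypothetical stand-in for the question's type C.
C = namedtuple('C', ['X', 'Y', 'Z'])

cs = [C(1, 2 + 1j, 10), C(2, 1 + 0j, 1), C(3, 2 + 1j, 5)]
get_y = attrgetter('Y')


def complex_safe_key(c):
    """Sort key: order complex Y values by (real, imag); leave others as-is."""
    y = c.Y
    return (y.real, y.imag) if isinstance(y, complex) else y


# Sorting on the raw complex values fails in Python 3.
try:
    sorted(cs, key=get_y)
    plain_sort_failed = False
except TypeError:
    plain_sort_failed = True

# Sort with the safe key, but group on the original Y value.
tuples = [(y, sum(c.Z for c in group))
          for y, group in groupby(sorted(cs, key=complex_safe_key), get_y)]

print(plain_sort_failed)  # True
print(tuples)             # [((1+0j), 1), ((2+1j), 15)]
```

If the Y values mix complex numbers with ints or floats, this key produces tuples next to plain numbers and the sort fails again, so a real version would need to normalize every numeric value the same way.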

Of course, all we care about here is that equal things end up next to each other, not that the list is actually sorted, so you could write an O(n^2) function that does that instead of sorting if you so desired. Or a function that's O(num_hashable + num_nonhashable^2), e.g. by bucketing the hashable values with a dict and falling back to pairwise comparison for the rest. Or you could write an O(n^2) / O(num_hashable + num_nonhashable^2) version of groupby that does the two together.
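
A rough sketch of the plain O(n^2) idea, which relies only on == and so copes with Y values that are neither hashable nor orderable. The function name sum_by_equality and the sample data are assumptions for illustration, and this folds the grouping and the summing into one pass rather than producing a groupby-style iterator.

```python
from collections import namedtuple

# Hypothetical stand-in for the question's type C.
C = namedtuple('C', ['X', 'Y', 'Z'])


def sum_by_equality(cs):
    """Sum Z per distinct Y using only ==.

    Each object is compared against the distinct Y values seen so far,
    so the worst case is O(n^2) comparisons.
    """
    totals = []  # list of [y, running_sum] pairs, in first-seen order
    for c in cs:
        for entry in totals:
            if entry[0] == c.Y:
                entry[1] += c.Z
                break
        else:
            # No existing entry matched: start a new group.
            totals.append([c.Y, c.Z])
    return [(y, total) for y, total in totals]


# Lists and dicts are unhashable, and a list can't be ordered against a dict.
cs = [C(1, [1, 2], 10), C(2, {'k': 1}, 1), C(3, [1, 2], 5)]
result = sum_by_equality(cs)
print(result)  # [([1, 2], 15), ({'k': 1}, 1)]
```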

sblom's answer works for hashable c.Y attributes, with minimal extra space (because it computes the sums directly).

philhag's answer is basically the same as sblom's, but uses more auxiliary memory by building a list of the cs for each Y value -- effectively doing what groupby does, but with hashing instead of assuming the input is sorted, and with actual lists instead of iterators.
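
For comparison, here is a minimal sketch of what the two defaultdict variants presumably look like. It is reconstructed from the descriptions above rather than quoted from those answers, and the sample class C and variable names are my own assumptions.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the question's type C.
C = namedtuple('C', ['X', 'Y', 'Z'])
cs = [C(1, 'b', 10), C(2, 'a', 1), C(3, 'b', 5), C(4, 'a', 2)]

# Sums only: one running total per distinct Y, nothing else is stored.
sums = defaultdict(int)
for c in cs:
    sums[c.Y] += c.Z
sum_tuples = list(sums.items())

# Grouping first: keeps every object in a per-Y list, then sums each list.
groups = defaultdict(list)
for c in cs:
    groups[c.Y].append(c)
group_tuples = [(y, sum(c.Z for c in members)) for y, members in groups.items()]

# Either variant needs hashable Y values; a list as a key raises TypeError.
try:
    defaultdict(int)[[1, 2]] += 1
    unhashable_key_failed = False
except TypeError:
    unhashable_key_failed = True

print(sorted(sum_tuples))  # [('a', 3), ('b', 15)]
```

The second variant is only worth its extra memory if the per-Y lists are needed for something besides the sums.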

So, if you know your c.Y attribute is hashable and only need the sums, use sblom's; if you know it's hashable but want the objects grouped for something else as well, use philhag's; if the values might not be hashable, use this one (taking the extra care noted above if they might be complex or of a custom type whose __lt__ raises).

175

answered Sep 18 '22 04:09

Danica


Tags:

python

KFL
