Pythonic way to aggregate object properties in a memory-efficient way?

Question

For example, suppose we have a large list of objects like this:

class KeyStatisticEntry:
    def __init__(self, value=""):
        self.usedBytes = len(value)
        self.encoding = get_string_encoding(value)

    @property
    def total(self):
        overhead = get_object_overhead(self.usedBytes)
        if self.encoding == 'some value':
            return overhead
        else:
            return self.usedBytes + overhead

    @property
    def aligned(self):
        return some_func_with(self.usedBytes)

    # Many more calculated properties are derived from the existing ones

We need to aggregate many metrics about these objects: the min, max, sum, mean, and stdev of their properties. Currently I do it with code like this:

import statistics

used_bytes = []
total_bytes = []
aligned_bytes = []
encodings = []

for obj in keys.items():
    used_bytes.append(obj.usedBytes)
    total_bytes.append(obj.total)
    aligned_bytes.append(obj.aligned)
    encodings.append(obj.encoding)

total_elements = len(used_bytes)
used_user = sum(used_bytes)
used_real = sum(total_bytes)
aligned = sum(aligned_bytes)
mean = statistics.mean(used_bytes)

Question:

Is there a more "pythonic" way to do this, with better performance and memory usage?

Mazdak · Accepted Answer

You can use operator.attrgetter to fetch multiple attributes from each object, then use itertools.zip_longest (itertools.izip_longest in Python 2.x) to group the corresponding attributes together.

from operator import attrgetter
all_result = [attrgetter('usedBytes','total','aligned','encoding')(obj) for obj in keys.items()]

Or use a generator expression to create a generator instead of a list:

all_result = (attrgetter('usedBytes','total','aligned','encoding')(obj) for obj in keys.items())
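To illustrate what attrgetter returns here, a minimal sketch follows. Entry is a hypothetical stand-in for the asker's KeyStatisticEntry, and the fixed 8-byte overhead is an assumption made only for the example. Given several names, attrgetter returns a callable that yields a tuple of those attributes, and it can be built once outside the loop:

```python
from operator import attrgetter


class Entry:
    """Hypothetical stand-in for KeyStatisticEntry, for illustration only."""

    def __init__(self, value=""):
        self.usedBytes = len(value)

    @property
    def total(self):
        # Assumed fixed overhead of 8 bytes; the real overhead
        # function is not shown in the question.
        return self.usedBytes + 8


# Build the getter once and reuse it for every object.
get_fields = attrgetter("usedBytes", "total")

entries = [Entry("abc"), Entry("hello")]
rows = [get_fields(e) for e in entries]
print(rows)  # [(3, 11), (5, 13)]
```

Properties work the same way as plain attributes, since attrgetter goes through normal attribute lookup.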

Then use zip_longest:

from itertools import zip_longest

used_bytes, total_bytes, aligned_bytes, encodings = zip_longest(*all_result)
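As a small sketch of what this transposition step does, using made-up tuples in place of real objects: unpacking the rows with * turns a sequence of per-object tuples into one tuple per attribute. When every row has the same length, as it does here, the built-in zip produces the same result as zip_longest.

```python
from itertools import zip_longest

# Made-up (usedBytes, total, aligned) tuples, one per object.
rows = [(3, 11, 16), (5, 13, 16), (2, 10, 16)]

# Transpose: one tuple per attribute instead of one tuple per object.
used_bytes, total_bytes, aligned_bytes = zip(*rows)
print(used_bytes)  # (3, 5, 2)

# With equal-length rows, zip_longest gives the same result as zip.
assert list(zip_longest(*rows)) == list(zip(*rows))
```

Note that the * unpacking consumes the whole input before transposing, so every column ends up in memory even when the input is a generator.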

Then use the map function to apply sum to each iterable whose total you need:

used_user, used_real, aligned = map(sum, (used_bytes, total_bytes, aligned_bytes))

And separately for len and mean:

total_elements = len(used_bytes)
mean = statistics.mean(used_bytes)
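The question also asks for min, max, and stdev. Once a column has been materialized as a tuple, those can be computed the same way. A minimal sketch with made-up sample values (the summary dict is only an illustration, not part of the original answer):

```python
import statistics

used_bytes = (3, 5, 2, 6)  # made-up sample values

summary = {
    "count": len(used_bytes),
    "min": min(used_bytes),
    "max": max(used_bytes),
    "sum": sum(used_bytes),
    "mean": statistics.mean(used_bytes),
    "stdev": statistics.stdev(used_bytes),  # sample standard deviation
}
print(summary["count"], summary["min"], summary["max"], summary["sum"])  # 4 2 6 16
```

Each of these calls makes its own pass over the tuple, which is simple but requires the whole column to be held in memory.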

If you want to handle all the sub-lists as generators (which uses less memory but runs slower), you can write a new class that calculates each desired result separately using generators:

import statistics
from itertools import tee

class Aggregator:
    def __init__(self, all_obj):
        # all_obj must be re-iterable and support len() (a list or dict view, not a generator)
        self.all_obj = all_obj
        self.used_user, self.mean = self.getTotalBytesAndMean()
        self.total_elements = len(self.all_obj)
        self.aligned = self.getAligned()

    def getTotalBytesAndMean(self):
        iter_1, iter_2 = tee((obj.usedBytes for obj in self.all_obj))
        return sum(iter_1), statistics.mean(iter_2)

    def getTotal(self):
        return sum(obj.total for obj in self.all_obj)

    def getAligned(self):
        return sum(obj.aligned for obj in self.all_obj)

    def getEncoding(self):
        return (obj.encoding for obj in self.all_obj)

Then you can do:

Agg = Aggregator(keys.items())

# And simply access the attributes
Agg.used_user
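One caveat with the tee-based method above: sum exhausts the first iterator before mean starts reading the second, so tee has to buffer every value in between. If constant memory matters, a possible alternative (not part of the original answer) is to keep running statistics and update them once per object. The sketch below uses Welford's algorithm for the mean and sample standard deviation. RunningStats is a hypothetical helper name, and the input values are made up:

```python
import math


class RunningStats:
    """Single-pass count/min/max/sum/mean/stdev using Welford's algorithm."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.min = math.inf
        self.max = -math.inf
        self._mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.count += 1
        self.total += x
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        delta = x - self._mean
        self._mean += delta / self.count
        self._m2 += delta * (x - self._mean)

    @property
    def mean(self):
        return self._mean

    @property
    def stdev(self):
        # Sample standard deviation; needs at least two values.
        if self.count < 2:
            raise ValueError("stdev requires at least two data points")
        return math.sqrt(self._m2 / (self.count - 1))


stats = RunningStats()
# Made-up values; in the question these would be obj.usedBytes.
for value in (3, 5, 2, 6):
    stats.add(value)

print(stats.count, stats.total, stats.min, stats.max)  # 4 16 2 6
```

One RunningStats instance per property (usedBytes, total, aligned) would let a single loop over the objects fill in all of them without building intermediate lists.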

Tags: python, list, aggregate

Question by Nick Bondarenko · 1 answer, by Mazdak
