 

Pythonic way to aggregate object properties in memory efficient way?

For example we have large list of objects like this:

class KeyStatisticEntry:
    def __init__(self, value=""):
        self.usedBytes = len(value)
        self.encoding = get_string_encoding(value)

    @property
    def total(self):
        overhead = get_object_overhead(self.usedBytes)
        if self.encoding == 'some value':
            return overhead
        else:
            return self.usedBytes + overhead

    @property
    def aligned(self):
        return some_func_with(self.usedBytes)

    # ...plus many more properties calculated from the existing ones
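Since the question is about a large number of these objects, one standard-library memory saving is to declare `__slots__`, which removes the per-instance `__dict__`. This is a minimal sketch, not the asker's code: `get_string_encoding`, `get_object_overhead` and `some_func_with` are not shown in the question, so the versions below are hypothetical stand-ins.

```python
class KeyStatisticEntry:
    # __slots__ removes the per-instance __dict__, which shrinks memory
    # when very many instances are alive at once.
    __slots__ = ("usedBytes", "encoding")

    def __init__(self, value=""):
        self.usedBytes = len(value)
        self.encoding = get_string_encoding(value)

    @property
    def total(self):
        overhead = get_object_overhead(self.usedBytes)
        if self.encoding == "some value":
            return overhead
        return self.usedBytes + overhead

    @property
    def aligned(self):
        return some_func_with(self.usedBytes)


# Hypothetical stand-ins for the helpers the question does not show.
def get_string_encoding(value):
    return "int" if value.isdigit() else "raw"


def get_object_overhead(used_bytes):
    return 16  # assumed fixed per-object overhead


def some_func_with(used_bytes):
    # Assumed alignment rule: round up to the next multiple of 8.
    return (used_bytes + 7) // 8 * 8
```

Properties keep working with `__slots__` because they live on the class, not on the instance.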

And we need to aggregate lots of metrics about these objects: the min, max, sum, mean, and stdev of their properties. Currently I do it with code like this:

import statistics

used_bytes = []
total_bytes = []
aligned_bytes = []
encodings = []

# keys.items() is assumed to yield KeyStatisticEntry objects
for obj in keys.items():
    used_bytes.append(obj.usedBytes)
    total_bytes.append(obj.total)
    aligned_bytes.append(obj.aligned)
    encodings.append(obj.encoding)

total_elements = len(used_bytes)
used_user = sum(used_bytes)
used_real = sum(total_bytes)
aligned = sum(aligned_bytes)
mean = statistics.mean(used_bytes)
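The remaining metrics named above (min, max, stdev) come straight from the same lists via built-ins and the `statistics` module. A sketch, with made-up sample values standing in for `used_bytes`:

```python
import statistics

used_bytes = [5, 12, 3, 40, 7]  # made-up sample data

summary = {
    "count": len(used_bytes),
    "sum": sum(used_bytes),
    "min": min(used_bytes),
    "max": max(used_bytes),
    "mean": statistics.mean(used_bytes),
    # Sample standard deviation; it needs at least two values.
    "stdev": statistics.stdev(used_bytes),
}
print(summary)
```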

Question:

Is there a more "pythonic" way, with better performance and memory usage?

Nick Bondarenko asked Feb 02 '16 15:02


1 Answer

You can use operator.attrgetter to fetch several attributes of each object in one call, then use itertools.zip_longest (itertools.izip_longest in Python 2.x) to group the matching attributes together. Since every tuple has the same length, the built-in zip works just as well here.

from operator import attrgetter

# Build the getter once and reuse it for every object.
get_attrs = attrgetter('usedBytes', 'total', 'aligned', 'encoding')
all_results = [get_attrs(obj) for obj in keys.items()]

Or use a generator expression to create a generator instead of a list:

all_results = (attrgetter('usedBytes', 'total', 'aligned', 'encoding')(obj) for obj in keys.items())
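Called with several names, `attrgetter` returns a callable that produces a tuple, and it reads properties like any other attribute. A self-contained sketch using a hypothetical minimal `Entry` class in place of `KeyStatisticEntry`:

```python
from operator import attrgetter


class Entry:
    # Hypothetical minimal class: one stored attribute, one property.
    def __init__(self, used_bytes):
        self.usedBytes = used_bytes

    @property
    def total(self):
        return self.usedBytes + 16  # assumed fixed overhead


get_stats = attrgetter("usedBytes", "total")
rows = [get_stats(e) for e in (Entry(5), Entry(10))]
print(rows)  # [(5, 21), (10, 26)]
```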

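Then transpose the tuples with zip_longest. Note that unpacking with * pulls every tuple into memory at once, so the generator version does not save memory at this step: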

from itertools import zip_longest

used_bytes, total_bytes, aligned_bytes, encodings = zip_longest(*all_results)
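Transposing with `zip(*rows)` turns a list of per-object tuples into one tuple per attribute. Because all rows have the same length, plain `zip` and `zip_longest` give identical results. A sketch on made-up rows:

```python
from itertools import zip_longest

rows = [(5, 21, "raw"), (10, 26, "int"), (3, 19, "raw")]  # made-up rows

used_bytes, total_bytes, encodings = zip(*rows)
print(used_bytes)  # (5, 10, 3)
print(encodings)   # ('raw', 'int', 'raw')

# zip_longest only differs when rows have unequal lengths (it pads with None).
assert list(zip(*rows)) == list(zip_longest(*rows))
```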

Then use map to apply sum to each sequence that needs a total:

used_user, used_real, aligned = map(sum,(used_bytes, total_bytes, aligned_bytes))

And compute the count and the mean separately:

import statistics

total_elements = len(used_bytes)
mean = statistics.mean(used_bytes)
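The `encodings` column is collected but never aggregated; for a categorical attribute like this, `collections.Counter` gives per-value counts. A sketch on made-up columns (as produced by the zip step), repeating the `map(sum, ...)`, count and mean steps:

```python
import statistics
from collections import Counter

# Made-up columns, as produced by the zip step.
used_bytes = (5, 10, 3)
total_bytes = (21, 26, 19)
aligned_bytes = (8, 16, 8)
encodings = ("raw", "int", "raw")

used_user, used_real, aligned = map(sum, (used_bytes, total_bytes, aligned_bytes))
total_elements = len(used_bytes)
mean = statistics.mean(used_bytes)
encoding_counts = Counter(encodings)

print(used_user, used_real, aligned, total_elements, mean)
print(encoding_counts.most_common())
```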

If you want to handle each attribute as a generator instead of building sublists (lower memory use, but slower, since the objects are iterated once per metric), you can write a small class that computes each result separately using generators:

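class Aggregator:
    def __init__(self, all_obj):
        # all_obj must be re-iterable (a list or a dict view), not a one-shot
        # generator, because each method below iterates over it again.
        self.all_obj = all_obj
        self.total_elements = len(self.all_obj)
        self.used_user, self.mean = self.getTotalBytesAndMean()
        self.used_real = self.getTotal()
        self.aligned = self.getAligned()

    def getTotalBytesAndMean(self):
        # One pass for the sum; the mean follows from the element count.
        # (itertools.tee would buffer every value here, defeating the purpose.)
        used_user = sum(obj.usedBytes for obj in self.all_obj)
        return used_user, used_user / self.total_elements

    def getTotal(self):
        return sum(obj.total for obj in self.all_obj)

    def getAligned(self):
        return sum(obj.aligned for obj in self.all_obj)

    def getEncoding(self):
        # Lazily yields the encodings; consume it only once.
        return (obj.encoding for obj in self.all_obj)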

Then you can do:

agg = Aggregator(keys.items())

# And simply access the attributes
agg.used_user
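If memory is the main concern, every metric from the question (count, sum, min, max, mean, stdev) can be updated in a single pass without storing any per-object lists, using Welford's online algorithm for the variance. This is a swapped-in technique, not the `Aggregator` above, and the `RunningStats` and `aggregate` names are hypothetical:

```python
import math


class RunningStats:
    """Single-pass count/sum/min/max/mean/stdev (Welford's algorithm)."""

    __slots__ = ("count", "total", "minimum", "maximum", "_mean", "_m2")

    def __init__(self):
        self.count = 0
        self.total = 0
        self.minimum = math.inf
        self.maximum = -math.inf
        self._mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.count += 1
        self.total += x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)
        delta = x - self._mean
        self._mean += delta / self.count
        self._m2 += delta * (x - self._mean)

    @property
    def mean(self):
        return self._mean

    @property
    def stdev(self):
        # Sample standard deviation, like statistics.stdev.
        if self.count < 2:
            raise ValueError("stdev requires at least two data points")
        return math.sqrt(self._m2 / (self.count - 1))


def aggregate(objects, attrs):
    """Return one RunningStats per attribute name, filled in a single loop."""
    stats = {name: RunningStats() for name in attrs}
    for obj in objects:
        for name, acc in stats.items():
            acc.add(getattr(obj, name))
    return stats
```

Memory use stays constant however many objects there are, at the cost of a Python-level loop over every attribute of every object. With the question's data this would be something like `aggregate(keys.items(), ("usedBytes", "total", "aligned"))`, assuming `keys.items()` yields the objects.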
Mazdak answered Nov 05 '22 17:11