For example, suppose we have a large list of objects like this:
class KeyStatisticEntry:
    def __init__(self, value=""):
        self.usedBytes = len(value)
        self.encoding = get_string_encoding(value)

    @property
    def total(self):
        overhead = get_object_overhead(self.usedBytes)
        if self.encoding == 'some value':
            return overhead
        else:
            return self.usedBytes + overhead

    @property
    def aligned(self):
        return some_func_with(self.usedBytes)

    # Many more calculated properties follow, all derived from the existing ones
We need to aggregate many metrics about these objects: the min, max, sum, mean, and stdev of their properties. Currently I do it with code like this:
used_bytes = []
total_bytes = []
aligned_bytes = []
encodings = []
for obj in keys.items():
    used_bytes.append(obj.usedBytes)
    total_bytes.append(obj.total)
    aligned_bytes.append(obj.aligned)
    encodings.append(obj.encoding)
total_elements = len(used_bytes)
used_user = sum(used_bytes)
used_real = sum(total_bytes)
aligned = sum(aligned_bytes)
mean = statistics.mean(used_bytes)
Is there a more "pythonic" way to do this, with better performance and memory usage?
You can use operator.attrgetter to fetch multiple attributes from each object, then use itertools.zip_longest (itertools.izip_longest in Python 2.x) to group the corresponding attributes together.
from operator import attrgetter
all_result = [attrgetter('usedBytes','total','aligned','encoding')(obj) for obj in keys.items()]
Or use a generator expression to create a generator instead of a list:
all_result = (attrgetter('usedBytes','total','aligned','encoding')(obj) for obj in keys.items())
Then use zip_longest:
from itertools import zip_longest
used_bytes, total_bytes, aligned_bytes, encodings = zip_longest(*all_result)
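To illustrate what the unpacking step does, here is a minimal sketch that transposes a few made-up rows (the values are hypothetical, not taken from the question). Since every row has the same number of fields, plain zip should produce the same columns as zip_longest in this case.

```python
from itertools import zip_longest

# Hypothetical rows of (usedBytes, total, aligned, encoding) values.
rows = [(3, 19, 8, "raw"), (5, 21, 8, "int"), (10, 26, 16, "raw")]

# The * unpacking passes each row as a separate argument, so zip
# regroups the values column by column.
used_bytes, total_bytes, aligned_bytes, encodings = zip(*rows)

# With equal-length rows, zip_longest never has anything to pad.
columns_longest = tuple(zip_longest(*rows))
```

Note that the unpacking on the left-hand side fails if there are no rows at all, so an empty input would need a separate check.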
Then use the map function to apply sum to each iterable that needs a total:
used_user, used_real, aligned = map(sum,(used_bytes, total_bytes, aligned_bytes))
And compute len and mean separately:
total_elements = len(used_bytes)
mean = statistics.mean(used_bytes)
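Putting the steps together, a self-contained sketch might look like the following. The Entry class and its fixed OVERHEAD are hypothetical stand-ins for KeyStatisticEntry and its helper functions, which are not shown in the question; the getter is also built once outside the loop rather than once per object.

```python
import statistics
from operator import attrgetter


class Entry:
    """Hypothetical stand-in for KeyStatisticEntry."""

    OVERHEAD = 16  # assumed constant, for illustration only

    def __init__(self, value=""):
        self.usedBytes = len(value)

    @property
    def total(self):
        return self.usedBytes + self.OVERHEAD


entries = [Entry("a"), Entry("bbb"), Entry("ccccc")]

# Build the getter once and reuse it for every object.
get_fields = attrgetter("usedBytes", "total")

# Transpose the per-object tuples into one tuple per attribute.
used_bytes, total_bytes = zip(*(get_fields(e) for e in entries))

used_user, used_real = map(sum, (used_bytes, total_bytes))
total_elements = len(used_bytes)
mean = statistics.mean(used_bytes)
```

This still materializes one tuple per attribute, so it is mostly tidier than the original loop rather than lighter on memory.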
If you want to handle all the sublists as generators (which uses less memory but runs more slowly), you can use a new class that calculates each result separately using generators:
import statistics
from itertools import tee

class Aggregator:
    def __init__(self, all_obj):
        # Store under the same name the methods below read from.
        self.all_obj = all_obj
        self.used_user, self.mean = self.getTotalBytesAndMean()
        # len() needs a sized collection (a list or dict view), not a generator.
        self.total_elements = len(self.all_obj)
        self.aligned = self.getAligned()

    def getTotalBytesAndMean(self):
        # tee lets one generator feed both sum and mean.
        iter_1, iter_2 = tee(obj.usedBytes for obj in self.all_obj)
        return sum(iter_1), statistics.mean(iter_2)

    def getTotal(self):
        return sum(obj.total for obj in self.all_obj)

    def getAligned(self):
        return sum(obj.aligned for obj in self.all_obj)

    def getEncoding(self):
        return (obj.encoding for obj in self.all_obj)
Then you can do:
Agg = Aggregator(keys.items())
# Then simply access the attributes
Agg.used_user
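If memory is the main concern, another option is to keep running totals and never store the values at all. The sketch below is not part of the approach above: it swaps in Welford's online algorithm to get min, max, sum, mean, and sample stdev in a single pass. The RunningStats class and the sample values are hypothetical.

```python
import math


class RunningStats:
    """Single-pass min, max, sum, mean and sample stdev (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.minimum = math.inf
        self.maximum = -math.inf
        self._mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.count += 1
        self.total += x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)
        # Welford update: adjust the mean, then accumulate the deviation.
        delta = x - self._mean
        self._mean += delta / self.count
        self._m2 += delta * (x - self._mean)

    @property
    def mean(self):
        return self._mean

    @property
    def stdev(self):
        # Sample standard deviation; requires at least two values.
        return math.sqrt(self._m2 / (self.count - 1))


stats = RunningStats()
for value in (2, 4, 4, 4, 5, 5, 7, 9):
    stats.add(value)
```

In the question's setting you would create one such accumulator per property (usedBytes, total, aligned) and call add on each inside a single loop over the objects.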