I have a list of dictionaries, and each dictionary contains exactly the same keys. I want to find the average value for each key, and I would like to know how to do it using reduce (or, if that is not possible, in another way that is more elegant than nested for loops).
Here is the list:
[
    {
        "accuracy": 0.78,
        "f_measure": 0.8169374016795885,
        "precision": 0.8192088044235794,
        "recall": 0.8172222222222223
    },
    {
        "accuracy": 0.77,
        "f_measure": 0.8159133315763016,
        "precision": 0.8174754717495807,
        "recall": 0.8161111111111111
    },
    {
        "accuracy": 0.82,
        "f_measure": 0.8226353934130455,
        "precision": 0.8238175920455686,
        "recall": 0.8227777777777778
    }, ...
]
I would like to get back a dictionary like this:
{
    "accuracy": 0.81,
    "f_measure": 0.83,
    "precision": 0.84,
    "recall": 0.83
}
Here is what I have so far, but I don't like it:
folds = [ ... ]
keys = folds[0].keys()
results = dict.fromkeys(keys, 0)
for fold in folds:
    for k in keys:
        results[k] += fold[k] / len(folds)
print(results)
Alternatively, if you do this kind of calculation on data regularly, you may wish to use pandas (overkill for a one-off, but it greatly simplifies such tasks):
import pandas as pd
data = [
    {
        "accuracy": 0.78,
        "f_measure": 0.8169374016795885,
        "precision": 0.8192088044235794,
        "recall": 0.8172222222222223
    },
    {
        "accuracy": 0.77,
        "f_measure": 0.8159133315763016,
        "precision": 0.8174754717495807,
        "recall": 0.8161111111111111
    },
    {
        "accuracy": 0.82,
        "f_measure": 0.8226353934130455,
        "precision": 0.8238175920455686,
        "recall": 0.8227777777777778
    },  # ...
]
result = pd.DataFrame.from_records(data).mean().to_dict()
Which gives you:
{'accuracy': 0.79000000000000004,
'f_measure': 0.8184953755563118,
'precision': 0.82016728940624295,
'recall': 0.81870370370370382}
Here you go, a solution using reduce():
from functools import reduce  # Python 3 compatibility
summed = reduce(
    lambda a, b: {k: a[k] + b[k] for k in a},
    list_of_dicts,
    dict.fromkeys(list_of_dicts[0], 0.0))
result = {k: v / len(list_of_dicts) for k, v in summed.items()}
This produces a starting point with 0.0 values from the keys of the first dictionary, then sums all values (per key) into a final dictionary. The sums are then divided by the number of dictionaries to produce the averages.
Demo:
>>> from functools import reduce
>>> list_of_dicts = [
...     {
...         "accuracy": 0.78,
...         "f_measure": 0.8169374016795885,
...         "precision": 0.8192088044235794,
...         "recall": 0.8172222222222223
...     },
...     {
...         "accuracy": 0.77,
...         "f_measure": 0.8159133315763016,
...         "precision": 0.8174754717495807,
...         "recall": 0.8161111111111111
...     },
...     {
...         "accuracy": 0.82,
...         "f_measure": 0.8226353934130455,
...         "precision": 0.8238175920455686,
...         "recall": 0.8227777777777778
...     },  # ...
... ]
>>> summed = reduce(
...     lambda a, b: {k: a[k] + b[k] for k in a},
...     list_of_dicts,
...     dict.fromkeys(list_of_dicts[0], 0.0))
>>> summed
{'recall': 2.4561111111111114, 'precision': 2.4605018682187287, 'f_measure': 2.4554861266689354, 'accuracy': 2.37}
>>> {k: v / len(list_of_dicts) for k, v in summed.items()}
{'recall': 0.8187037037037038, 'precision': 0.820167289406243, 'f_measure': 0.8184953755563118, 'accuracy': 0.79}
>>> from pprint import pprint
>>> pprint(_)
{'accuracy': 0.79,
'f_measure': 0.8184953755563118,
'precision': 0.820167289406243,
'recall': 0.8187037037037038}
You could use a Counter to do the summing elegantly:
from collections import Counter
summed = sum((Counter(d) for d in folds), Counter())
averaged = {k: v / len(folds) for k, v in summed.items()}
If you really feel like it, it can even be turned into a one-liner:
averaged = {
    k: v / len(folds)
    for k, v in sum((Counter(d) for d in folds), Counter()).items()
}
In any case, I consider either more readable than a complicated reduce(); sum() itself is an appropriately specialized version of that.
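One caveat with the Counter approach: adding Counter objects with + keeps only keys whose total is positive, so a metric whose running sum is zero or negative would silently disappear. That does not affect the scores in the question, but a minimal sketch of a variant that keeps every key, using Counter.update() instead, might look like this (the sample data is made up purely to show the effect):

```python
from collections import Counter

# Hypothetical data with negative values, chosen only to show the caveat.
folds = [
    {"gain": -0.5, "loss": 0.25},
    {"gain": -0.25, "loss": 0.5},
]

# Counter + Counter drops keys whose total is not positive.
summed_plus = sum((Counter(d) for d in folds), Counter())

# Counter.update() adds values in place and keeps every key.
summed_update = Counter()
for d in folds:
    summed_update.update(d)

averaged = {k: v / len(folds) for k, v in summed_update.items()}
print("gain" in summed_plus)  # False
print(averaged)
```

If all values are known to be strictly positive, the shorter sum() form is fine.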
An even simpler one-liner that doesn't require any imports:
averaged = {
    k: sum(d[k] for d in folds) / len(folds)
    for k in folds[0]
}
Interestingly, it's considerably faster (even than pandas?!), and the statistic is also easier to change.
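To illustrate how easily the statistic can be swapped, here is a minimal sketch that wraps the comprehension in a helper. The names aggregate, stat and mean are hypothetical, introduced only for this example, and the folds are a trimmed copy of the question's data; statistics.median comes from the standard library.

```python
from statistics import median

# Trimmed copy of the question's data (two of the four metrics).
folds = [
    {"accuracy": 0.78, "recall": 0.8172222222222223},
    {"accuracy": 0.77, "recall": 0.8161111111111111},
    {"accuracy": 0.82, "recall": 0.8227777777777778},
]

def aggregate(dicts, stat):
    """Apply stat to the list of values collected for each key."""
    return {k: stat([d[k] for d in dicts]) for k in dicts[0]}

def mean(values):
    # Same manual calculation as in the one-liner above.
    return sum(values) / len(values)

print(aggregate(folds, mean))
print(aggregate(folds, median))
print(aggregate(folds, max))
```

Any callable that takes a list of numbers and returns one value can be passed as stat.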
I tried replacing the manual calculation with the statistics.mean() function in Python 3.5, but that made it over 10 times slower.
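Timings like these depend on the machine and the Python version, so it is worth measuring them yourself. A rough sketch using timeit follows; the function names, the generated sample data and the repetition count are arbitrary choices made for this example, and no particular ratio is implied.

```python
import statistics
import timeit

# Synthetic folds, generated only to have something to time.
folds = [
    {"accuracy": 0.70 + i / 1000, "recall": 0.80 + i / 1000}
    for i in range(100)
]

def manual_mean(dicts):
    # Plain sum()/len() per key.
    return {k: sum(d[k] for d in dicts) / len(dicts) for k in dicts[0]}

def stats_mean(dicts):
    # Same result via statistics.mean().
    return {k: statistics.mean(d[k] for d in dicts) for k in dicts[0]}

t_manual = timeit.timeit(lambda: manual_mean(folds), number=200)
t_stats = timeit.timeit(lambda: stats_mean(folds), number=200)
print(f"manual sum/len:  {t_manual:.4f} s")
print(f"statistics.mean: {t_stats:.4f} s")
```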
Here is a terrible one-liner using list comprehensions. You are probably better off not using this.
final = dict(zip(lst[0].keys(), [n/len(lst) for n in [sum(i) for i in zip(*[tuple(x1.values()) for x1 in lst])]]))
for key, value in final.items():
    print(key, value)
#Output
recall 0.818703703704
precision 0.820167289406
f_measure 0.818495375556
accuracy 0.79
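Part of what makes this one-liner fragile is that it pairs values by position, so it only works if every dictionary yields its keys in the same order. A small sketch with made-up data, assuming Python 3.7+ where dictionaries preserve insertion order, shows how it differs from a key-based lookup (both helper names are hypothetical):

```python
# Made-up data: same keys, but inserted in a different order.
lst = [
    {"accuracy": 0.5, "recall": 1.0},
    {"recall": 1.0, "accuracy": 0.5},
]

def positional_average(dicts):
    # The zip-based approach: pairs values by position, not by key.
    sums = [sum(col) for col in zip(*[tuple(d.values()) for d in dicts])]
    return dict(zip(dicts[0].keys(), [s / len(dicts) for s in sums]))

def keyed_average(dicts):
    # Looks each value up by key, so ordering does not matter.
    return {k: sum(d[k] for d in dicts) / len(dicts) for k in dicts[0]}

print(positional_average(lst))  # {'accuracy': 0.75, 'recall': 0.75}
print(keyed_average(lst))       # {'accuracy': 0.5, 'recall': 1.0}
```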