
Elegant way to reduce a list of dictionaries?

I have a list of dictionaries, and every dictionary contains exactly the same keys. I want to find the average value for each key, and I would like to know how to do it using reduce (or, if that is not possible, in some way that is more elegant than nested for loops).

Here is the list:

[
  {
    "accuracy": 0.78,
    "f_measure": 0.8169374016795885,
    "precision": 0.8192088044235794,
    "recall": 0.8172222222222223
  },
  {
    "accuracy": 0.77,
    "f_measure": 0.8159133315763016,
    "precision": 0.8174754717495807,
    "recall": 0.8161111111111111
  },
  {
    "accuracy": 0.82,
    "f_measure": 0.8226353934130455,
    "precision": 0.8238175920455686,
    "recall": 0.8227777777777778
  }, ...
]

I would like to get back a dictionary like this:

{
  "accuracy": 0.81,
  "f_measure": 0.83,
  "precision": 0.84,
  "recall": 0.83
}

Here is what I have so far, but I don't like it:

folds = [ ... ]

keys = folds[0].keys()
results = dict.fromkeys(keys, 0)

for fold in folds:
    for k in keys:
        results[k] += fold[k] / len(folds)

print(results)
Christos Baziotis asked Jul 11 '16 13:07


4 Answers

As an alternative, if you're going to be doing such calculations on data regularly, you may wish to use pandas. It is overkill for a one-off, but it greatly simplifies this kind of task:

import pandas as pd

data = [
  {
    "accuracy": 0.78,
    "f_measure": 0.8169374016795885,
    "precision": 0.8192088044235794,
    "recall": 0.8172222222222223
  },
  {
    "accuracy": 0.77,
    "f_measure": 0.8159133315763016,
    "precision": 0.8174754717495807,
    "recall": 0.8161111111111111
  },
  {
    "accuracy": 0.82,
    "f_measure": 0.8226353934130455,
    "precision": 0.8238175920455686,
    "recall": 0.8227777777777778
  }, # ...
]

result = pd.DataFrame.from_records(data).mean().to_dict()

Which gives you:

{'accuracy': 0.79000000000000004,
 'f_measure': 0.8184953755563118,
 'precision': 0.82016728940624295,
 'recall': 0.81870370370370382}
Jon Clements answered Oct 17 '22 03:10


Here you go, a solution using reduce():

from functools import reduce  # Python 3 compatibility

summed = reduce(
    lambda a, b: {k: a[k] + b[k] for k in a},
    list_of_dicts,
    dict.fromkeys(list_of_dicts[0], 0.0))
result = {k: v / len(list_of_dicts) for k, v in summed.items()}

This builds a starting dictionary of 0.0 values from the keys of the first dictionary, then sums all values (per key) into a final dictionary. The sums are then divided by the number of dictionaries to produce the averages.

Demo:

>>> from functools import reduce
>>> list_of_dicts = [
...   {
...     "accuracy": 0.78,
...     "f_measure": 0.8169374016795885,
...     "precision": 0.8192088044235794,
...     "recall": 0.8172222222222223
...   },
...   {
...     "accuracy": 0.77,
...     "f_measure": 0.8159133315763016,
...     "precision": 0.8174754717495807,
...     "recall": 0.8161111111111111
...   },
...   {
...     "accuracy": 0.82,
...     "f_measure": 0.8226353934130455,
...     "precision": 0.8238175920455686,
...     "recall": 0.8227777777777778
...   }, # ...
... ]
>>> summed = reduce(
...     lambda a, b: {k: a[k] + b[k] for k in a},
...     list_of_dicts,
...     dict.fromkeys(list_of_dicts[0], 0.0))
>>> summed
{'recall': 2.4561111111111114, 'precision': 2.4605018682187287, 'f_measure': 2.4554861266689354, 'accuracy': 2.37}
>>> {k: v / len(list_of_dicts) for k, v in summed.items()}
{'recall': 0.8187037037037038, 'precision': 0.820167289406243, 'f_measure': 0.8184953755563118, 'accuracy': 0.79}
>>> from pprint import pprint
>>> pprint(_)
{'accuracy': 0.79,
 'f_measure': 0.8184953755563118,
 'precision': 0.820167289406243,
 'recall': 0.8187037037037038}
Martijn Pieters answered Oct 17 '22 05:10


You could use a Counter to do the summing elegantly:

from collections import Counter  # Counter lives in collections, not itertools

summed = sum((Counter(d) for d in folds), Counter())
averaged = {k: v/len(folds) for k, v in summed.items()}

If you really feel like it, it can even be turned into a one-liner:

averaged = {
    k: v/len(folds)
    for k, v in sum((Counter(d) for d in folds), Counter()).items()
}

In any case, I consider either more readable than a complicated reduce(); sum() itself is an appropriately specialized version of that.
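
One caveat worth knowing before relying on this: adding two Counter objects with + keeps only the keys whose resulting value is positive. The scores in the question are all positive, so it makes no difference there, but a metric that can reach zero or go negative would be dropped from the sum. Below is a minimal sketch of that behaviour, using made-up data with a hypothetical "delta" key, together with Counter.update() as one possible alternative that keeps such keys:

```python
from collections import Counter

# Made-up data for illustration; "delta" is a hypothetical metric that can be negative.
folds = [
    {"accuracy": 0.78, "delta": 0.25},
    {"accuracy": 0.77, "delta": -0.5},
]

# Counter + Counter discards keys whose summed value is not positive,
# so "delta" (0.25 + -0.5 = -0.25) disappears from the result.
summed_plus = sum((Counter(d) for d in folds), Counter())
print(dict(summed_plus))

# Counter.update() adds values in place and keeps zero or negative totals.
summed_update = Counter()
for d in folds:
    summed_update.update(d)

averaged = {k: v / len(folds) for k, v in summed_update.items()}
print(averaged)
```

If all values are guaranteed to be positive, the shorter sum() version above behaves the same.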

An even simpler one-liner that doesn't require any imports:

averaged = {
    k: sum(d[k] for d in folds)/len(folds)
    for k in folds[0]
}

Interestingly, it's considerably faster (even than pandas?!), and the statistic is also easier to change.

I tried replacing the manual calculation with the statistics.mean() function in Python 3.5, but that made it over 10 times slower.
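
To illustrate what "easier to change" can look like, here is a small sketch that swaps the manual mean for statistics.median from the standard library. The helper name summarize and the shortened, rounded folds data are assumptions made for this example, not part of the original answer, and no timing claim is implied:

```python
from statistics import median

# Shortened, rounded copy of the question's data (illustrative only).
folds = [
    {"accuracy": 0.78, "recall": 0.817},
    {"accuracy": 0.77, "recall": 0.816},
    {"accuracy": 0.82, "recall": 0.823},
]


def summarize(dicts, stat):
    """Apply `stat` to the values collected per key; keys come from the first dict."""
    return {k: stat([d[k] for d in dicts]) for k in dicts[0]}


def mean(values):
    # Same manual calculation as in the import-free one-liner.
    return sum(values) / len(values)


averaged = summarize(folds, mean)
medians = summarize(folds, median)
print(averaged)
print(medians)
```

Any function that takes a list of numbers can be passed as stat, so trying a different statistic is a one-word change.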

Thijs van Dien answered Oct 17 '22 05:10


Here is a terrible one-liner using list comprehensions. You are probably better off not using this.

# Note: relies on every dict in lst returning its values in the same key order.
final = dict(zip(lst[0].keys(), [n/len(lst) for n in [sum(i) for i in zip(*[tuple(x1.values()) for x1 in lst])]]))

for key, value in final.items():
    print(key, value)

#Output
recall 0.818703703704
precision 0.820167289406
f_measure 0.818495375556
accuracy 0.79
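
For anyone who wants to follow what the one-liner does, the same idea can be unrolled into named steps. This is only a sketch: the intermediate names (keys, rows, columns) are introduced here for illustration, lst holds made-up values, and each value is looked up by key so the result does not depend on every dictionary listing its keys in the same order:

```python
# Made-up data; note the second dict lists its keys in a different order.
lst = [
    {"accuracy": 0.75, "recall": 0.5},
    {"recall": 1.0, "accuracy": 0.25},
]

# Step 1: fix one key order, taken from the first dictionary.
keys = list(lst[0])

# Step 2: one row of values per dictionary, always in that key order.
rows = [[d[k] for k in keys] for d in lst]

# Step 3: transpose rows into columns, so each column holds one key's values.
columns = zip(*rows)

# Step 4: average each column and pair it back up with its key.
final = {k: sum(col) / len(lst) for k, col in zip(keys, columns)}

for key, value in final.items():
    print(key, value)
```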
Jeremy answered Oct 17 '22 05:10