I'm trying to merge logs from several servers. Each log is a list of tuples (date
, count
). date
may appear more than once, and I want the resulting dictionary to hold the sum of all counts from all servers.
Here's my attempt, with some data for example:
from collections import defaultdict a=[("13.5",100)] b=[("14.5",100), ("15.5", 100)] c=[("15.5",100), ("16.5", 100)] input=[a,b,c] output=defaultdict(int) for d in input: for item in d: output[item[0]]+=item[1] print dict(output)
Which gives:
{'14.5': 100, '16.5': 100, '13.5': 100, '15.5': 200}
As expected.
I'm about to go bananas because of a colleague who saw the code. She insists that there must be a more Pythonic and elegant way to do it, without these nested for loops. Any ideas?
Doesn't get simpler than this, I think:
a=[("13.5",100)] b=[("14.5",100), ("15.5", 100)] c=[("15.5",100), ("16.5", 100)] input=[a,b,c] from collections import Counter print sum( (Counter(dict(x)) for x in input), Counter())
Note that Counter
(also known as a multiset) is the most natural data structure for your data (a type of set to which elements can belong more than once, or equivalently - a map with semantics Element -> OccurrenceCount. You could have used it in the first place, instead of lists of tuples.
Also possible:
from collections import Counter from operator import add print reduce(add, (Counter(dict(x)) for x in input))
Using reduce(add, seq)
instead of sum(seq, initialValue)
is generally more flexible and allows you to skip passing the redundant initial value.
Note that you could also use operator.and_
to find the intersection of the multisets instead of the sum.
The above variant is terribly slow, because a new Counter is created on every step. Let's fix that.
We know that Counter+Counter
returns a new Counter
with merged data. This is OK, but we want to avoid extra creation. Let's use Counter.update
instead:
update(self, iterable=None, **kwds) unbound collections.Counter method
Like dict.update() but add counts instead of replacing them. Source can be an iterable, a dictionary, or another Counter instance.
That's what we want. Let's wrap it with a function compatible with reduce
and see what happens.
def updateInPlace(a,b): a.update(b) return a print reduce(updateInPlace, (Counter(dict(x)) for x in input))
This is only marginally slower than the OP's solution.
Benchmark: http://ideone.com/7IzSx (Updated with yet another solution, thanks to astynax)
(Also: If you desperately want an one-liner, you can replace updateInPlace
by lambda x,y: x.update(y) or x
which works the same way and even proves to be a split second faster, but fails at readability. Don't :-))
from collections import Counter a = [("13.5",100)] b = [("14.5",100), ("15.5", 100)] c = [("15.5",100), ("16.5", 100)] inp = [dict(x) for x in (a,b,c)] count = Counter() for y in inp: count += Counter(y) print(count)
output:
Counter({'15.5': 200, '14.5': 100, '16.5': 100, '13.5': 100})
Edit: As duncan suggested you can replace these 3 lines with a single line:
count = Counter() for y in inp: count += Counter(y)
replace by : count = sum((Counter(y) for y in inp), Counter())
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With