I have a list of dictionaries which I need to aggregate in Python:
data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10},
{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50},
{"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]
and I'm looking to sum budgetImpressions for each campaign.
So the final result should be:
data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 60},
{"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]
Note that every entry with a given campaignName will always have the same campaignCfid, startDate and endDate.
Can this be done in Python? I've tried using itertools without much success. Would it be a better approach to use Pandas?
Just to demonstrate that plain Python is sometimes perfectly fine for this kind of thing:
In [11]: from itertools import groupby
    ...: from operator import itemgetter
In [12]: data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10}, {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50}, {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]
In [13]: key = itemgetter('campaignName')
    ...: # groupby only merges adjacent rows with equal keys, so sort first
    ...: g = groupby(sorted(data, key=key), key)
In [14]: result = []
    ...: for campaign, campaign_data in g:
    ...:     rows = list(campaign_data)
    ...:     # the other fields are identical within a campaign, so copy the first row
    ...:     merged = dict(rows[0])
    ...:     # only budgetImpressions is summed
    ...:     merged['budgetImpressions'] = sum(row['budgetImpressions'] for row in rows)
    ...:     result.append(merged)
In [15]: result
Out[15]:
[{'startDate': 123, 'endDate': 456, 'campaignName': 'abc', 'campaignCfid': 789, 'budgetImpressions': 60},
 {'startDate': 456, 'endDate': 789, 'campaignName': 'def', 'campaignCfid': 123, 'budgetImpressions': 80}]
If you already have this collection of lists/dicts, it doesn't really make sense to promote it to a DataFrame; it's often cheaper to stay in pure Python.
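As a further illustration of staying in pure Python, here is a minimal single-pass sketch that uses a plain dict keyed by campaignName, so the input does not need to be sorted or grouped beforehand. It assumes, as the question states, that every field other than budgetImpressions is identical within a campaign. The function name aggregate_impressions is made up for this example.

```python
def aggregate_impressions(rows):
    """Sum budgetImpressions per campaignName in a single pass."""
    totals = {}
    for row in rows:
        name = row["campaignName"]
        if name not in totals:
            # First row for this campaign: copy it so the caller's
            # dictionaries are left untouched.
            totals[name] = dict(row)
        else:
            # Later rows only contribute their budgetImpressions.
            totals[name]["budgetImpressions"] += row["budgetImpressions"]
    # Dicts keep insertion order (Python 3.7+), so campaigns come out
    # in the order they were first seen.
    return list(totals.values())


data = [
    {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10},
    {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50},
    {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80},
]

result = aggregate_impressions(data)
print(result)
```

Unlike itertools.groupby, this does not depend on rows of the same campaign being adjacent, and it leaves the original list unmodified.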
Yes, use pandas. It's great. You can use the groupby functionality and aggregate by sums, then convert the output to a list of dicts if that is exactly what you want.
import pandas as pd

data = [{"startDate": 123, "endDate": 456, "campaignName": 'abc',
         "campaignCfid": 789, "budgetImpressions": 10},
        {"startDate": 123, "endDate": 456, "campaignName": 'abc',
         "campaignCfid": 789, "budgetImpressions": 50},
        {"startDate": 456, "endDate": 789, "campaignName": 'def',
         "campaignCfid": 123, "budgetImpressions": 80}]
df = pd.DataFrame(data)
# Group on every identifying column so that only budgetImpressions is summed
grouped = df.groupby(['startDate', 'endDate', 'campaignCfid',
                      'campaignName']).sum()
# reset_index turns the group keys back into ordinary columns
print(grouped.reset_index().to_dict('records'))
This prints:
[{'startDate': 123, 'endDate': 456, 'campaignCfid': 789, 'campaignName': 'abc', 'budgetImpressions': 60}, {'startDate': 456, 'endDate': 789, 'campaignCfid': 123, 'campaignName': 'def', 'budgetImpressions': 80}]