I am reading from an API which returns JSON I am using
from pandas.io.json import json_normalize
flatten = json_normalize(data['results'])
To flatten the JSON and now the output is like
breakdowns metric time value
0 [{u'key': u'platform', u'value': u'ios'}] fb_ad_network_imp 2018-08-29T07:00:00+0000 12
1 [{u'key': u'platform', u'value': u'android'}] fb_ad_network_imp 2018-08-29T07:00:00+0000 32
2 [{u'key': u'platform', u'value': u'ios'}] fb_ad_network_request 2018-08-29T07:00:00+0000 33
3 [{u'key': u'platform', u'value': u'android'}] fb_ad_network_request 2018-08-29T07:00:00+0000 132
now I want to squash these 4 rows into 2 based on the platform, something like this:
platform date clicks impressions
0 ios 2018-08-29 33 12
1 android 2018-08-29 132 32
I have also mapped these names:
fb_ad_network_request -> clicks
fb_ad_network_imp -> impressions
what's the best way to do that?
You can using pivot_table after flatten the dict
dddd['platform']=pd.concat([pd.DataFrame(x) for x in dddd.breakdowns]).value.values
dddd.pivot_table(index=['platform','time'],columns='metric',values='value',aggfunc=sum).reset_index()
Out[237]:
metric platform time fb_ad_network_imp fb_ad_network_request
0 android 2018-08-29 32 132
1 ios 2018-08-29 12 33
Setup
tmp = pd.Series([i[0].get('value', None) for i in df.breakdowns]).rename('platform')
mapping = {
'columns': {
'fb_ad_network_request': 'clicks',
'fb_ad_network_imp': 'impressions',
'time': 'date',
}
}
groupby and unstack:(df.join(tmp).groupby(['platform', df.time.dt.date, 'metric'])
.value.sum().unstack().reset_index().rename(**mapping))
metric platform date impressions clicks
0 android 2018-08-29 32 132
1 ios 2018-08-29 12 33
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With