Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas squash data frame based on a column

Tags:

python

pandas

I am reading from an API which returns JSON I am using

from pandas.io.json import json_normalize 
flatten = json_normalize(data['results'])

To flatten the JSON and now the output is like

                                     breakdowns                 metric                  time         value   
0      [{u'key': u'platform', u'value': u'ios'}]      fb_ad_network_imp    2018-08-29T07:00:00+0000  12
1  [{u'key': u'platform', u'value': u'android'}]      fb_ad_network_imp    2018-08-29T07:00:00+0000  32
2      [{u'key': u'platform', u'value': u'ios'}]  fb_ad_network_request    2018-08-29T07:00:00+0000  33    
3  [{u'key': u'platform', u'value': u'android'}]  fb_ad_network_request    2018-08-29T07:00:00+0000  132 

now I want to squash these 4 rows into 2 based on the platform, something like this:

           platform    date         clicks     impressions
0          ios         2018-08-29   33         12
1          android     2018-08-29   132        32

I have also mapped these names:

fb_ad_network_request -> clicks
fb_ad_network_imp -> impressions

what's the best way to do that?

like image 227
Am1rr3zA Avatar asked Jun 09 '26 23:06

Am1rr3zA


2 Answers

You can using pivot_table after flatten the dict

dddd['platform']=pd.concat([pd.DataFrame(x) for x in dddd.breakdowns]).value.values
dddd.pivot_table(index=['platform','time'],columns='metric',values='value',aggfunc=sum).reset_index()
Out[237]: 
metric platform        time  fb_ad_network_imp  fb_ad_network_request
0       android  2018-08-29                 32                    132
1           ios  2018-08-29                 12                     33
like image 76
BENY Avatar answered Jun 11 '26 13:06

BENY


Setup

tmp = pd.Series([i[0].get('value', None) for i in df.breakdowns]).rename('platform')

mapping = {
    'columns': {
        'fb_ad_network_request': 'clicks',
        'fb_ad_network_imp': 'impressions',
        'time': 'date',
    }
}

Using groupby and unstack:

(df.join(tmp).groupby(['platform', df.time.dt.date, 'metric'])
    .value.sum().unstack().reset_index().rename(**mapping))

metric platform        date  impressions  clicks
0       android  2018-08-29           32     132
1           ios  2018-08-29           12      33
like image 41
user3483203 Avatar answered Jun 11 '26 14:06

user3483203



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!