Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Aggregating Rows Pandas

I am quite new to pandas. I need to aggregate 'Names' if they have the same name and then make an average for 'Rating' and 'NumsHelpful' (without counting NaN). 'Review' should get concatenated whilst 'Weight(Pounds)'should remain untouched:

col names: ['Brand', 'Name', 'NumsHelpful', 'Rating', 'Weight(Pounds)', 'Review']

Name             'Brand'                             'Name'
1534             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1535             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1536             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1537             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1538             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1539             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1540             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   

        'NumsHelpful'     'Rating'       'Weight'
1534          NaN            2              4.5   
1535          NaN            2              4.5   
1536          NaN            NaN            4.5   
1537          NaN            NaN            4.5   
1538          2              NaN            4.5   
1539          3              5              4.5   
1540          5              NaN            4.5   

                        'Review'
1534                                     Yummy - Delish  
1535  The best Bloody Mary mix! - The best Bloody Ma...  
1536  Best Taste by far - I've tried several if not ...  
1537  Best bloody mary mix ever - This is also good ...  
1538  Outstanding - Has a small kick to it but very ...  
1539   OMG! So Good! - Spicy, terrific Bloody Mary mix!  
1540                      Good stuff - This is the best  

So the output should be something like this:

 'Brand'                'Name'                   'NumsHelpful'    'Rating' 
Zing Zang    Zing Zang Bloody Mary Mix, 32 fl oz     3.33             3

 'Weight'               'Review'
   4.5      Review1 / Review2 / ... / ReviewN

How shall I procede? Thanks.

like image 315
Stefano Pozzi Avatar asked Jul 08 '18 09:07

Stefano Pozzi


Video Answer


1 Answers

Use DataFrameGroupBy.agg with dictionary of columns and aggregated functions - columns Weight and Brand are agregated by first - it means first values per groups:

d = {'NumsHelpful':'mean', 
     'Review':'/'.join, 
     'Weight':'first',
     'Brand':'first', 
     'Rating':'mean'}
df = df.groupby('Name').agg(d).reset_index()
print (df)
                                  Name  NumsHelpful  \
0  Zing Zang Bloody Mary Mix, 32 fl oz     3.333333   

                                              Review  Weight      Brand  \
0  Yummy - Delish/The best Bloody Mary mix! - The...     4.5  Zing Zang   

   Rating  
0     3.0  

Also in pandas 0.23.1 pandas version get:

FutureWarning: 'Name' is both an index level and a column label. Defaulting to column, but this will raise an ambiguity error in a future version

Solution is remove index name Name:

df.index.name = None

Or:

df = df.rename_axis(None)

Another possible solution is not aggregate by first, but add these column to groupby:

d = {'NumsHelpful':'mean',  'Review':'/'.join, 'Rating':'mean'}
df = df.groupby(['Name', 'Weight','Brand']).agg(d).reset_index()

Both solutions return same output if per groups there are same values.

EDIT:

If need convert string (object) column to numeric first try convert by astype:

df['Weight(Pounds)'] = df['Weight(Pounds)'].astype(float)

And if it failed use to_numeric with parameter errors='coerce' for convert non parseable strings to NaNs:

df['Weight(Pounds)'] = pd.to_numeric(df['Weight(Pounds)'], errors='coerce')
like image 71
jezrael Avatar answered Oct 04 '22 12:10

jezrael