
Better alternative to a groupby with a merge [duplicate]

I was wondering if anyone knows of a better method than what I am currently doing. Here is an example data set:

ID  Number
a   1
a   2
a   3
b   4
c   5
c   6
c   7
c   8

For example, suppose I want a count of Number per ID in the table above. I would first group by ID and count Number, then merge the results back into the original table, like so:

df2 = df.groupby('ID').agg({'Number':'count'}).reset_index()

df2 = df2.rename(columns = {'Number':'Number_Count'})

df = pd.merge(df, df2, on = ['ID'])

This results in the original table with an extra Number_Count column holding the count for each row's ID.

It feels like a roundabout way of doing this. Does anyone know a better alternative? I ask because, when working with large data sets, this method can chew up a lot of memory (it creates another table and then merges the two).

Brian asked Oct 15 '25 14:10


1 Answer

You can do that in one step with groupby plus transform, which returns a result aligned to the original rows, so no merge is needed:

import pandas as pd

df = pd.DataFrame({'ID': list('aaabcccc'),
                   'Number': range(1,9)})

# Select 'Number' explicitly so this still works when df has more columns
df['Number_Count'] = df.groupby('ID')['Number'].transform('count')

df

#  ID  Number  Number_Count
#0  a       1             3
#1  a       2             3
#2  a       3             3
#3  b       4             1
#4  c       5             4
#5  c       6             4
#6  c       7             4
#7  c       8             4
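
As a quick sanity check, here is a minimal sketch that runs the groupby-and-merge approach from the question next to transform on the same sample data and compares the resulting Number_Count column. The variable names (counts, merged, transformed, same) are made up for illustration and are not from the original post.

```python
import pandas as pd

df = pd.DataFrame({'ID': list('aaabcccc'),
                   'Number': range(1, 9)})

# Approach from the question: aggregate, rename, then merge back.
counts = (df.groupby('ID')
            .agg({'Number': 'count'})
            .rename(columns={'Number': 'Number_Count'})
            .reset_index())
merged = pd.merge(df, counts, on=['ID'])

# transform computes the per-group count and broadcasts it back
# onto the original rows, so no intermediate table is merged.
transformed = df.copy()
transformed['Number_Count'] = df.groupby('ID')['Number'].transform('count')

# Compare the count column produced by the two approaches.
same = merged['Number_Count'].tolist() == transformed['Number_Count'].tolist()
print(same)
```

Note that 'count' skips missing values in Number. If you want the number of rows per ID regardless of NaN, transform('size') should be the closer fit.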
zipa answered Oct 18 '25 07:10