I was wondering if anyone knew of a better method to what I am currently doing. Here is an example data set:
ID Number
a 1
a 2
a 3
b 4
c 5
c 6
c 7
c 8
Example: if I wanted to get a count of Numbers by ID column in the table above. I would first do a groupby ID and do a count on Number, then merge the results back to the original table like so:
df2 = df.groupby('ID').agg({'Number':'count'}).reset_index()
df2 = df2.rename(columns = {'Number':'Number_Count'})
df = pd.merge(df, df2, on = ['ID'])
This results in:
It feels like a roundabout way of doing this, does anyone know a better alternative? The reason I ask is because when working with large data sets, this method can chew up a lot of memory (by creating another table and then merging them).
You can do that quite simply with this:
import pandas as pd
df = pd.DataFrame({'ID': list('aaabcccc'),
'Number': range(1,9)})
df['Number_Count'] = df.groupby('ID').transform('count')
df
# ID Number Number_Count
#0 a 1 3
#1 a 2 3
#2 a 3 3
#3 b 4 1
#4 c 5 4
#5 c 6 4
#6 c 7 4
#7 c 8 4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With