The DataFrame named df is shown as follows.
import pandas as pd
df = pd.DataFrame({'id': [1, 1, 3]})
Input:
id
0 1
1 1
2 3
I want to count how many times each id occurs and store the result in a new column, count.
Expected:
id count
0 1 2
1 1 2
2 3 1
pd.factorize and np.bincount
My favorite. factorize does not sort and runs in O(n) time. For big data sets, factorize should be preferred over np.unique, which sorts its input.
import numpy as np
i, u = df.id.factorize()  # i: integer code per row, u: the unique ids
df.assign(Count=np.bincount(i)[i])
id Count
0 1 2
1 1 2
2 3 1
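To make the one-liner above easier to follow, here is a minimal sketch that splits it into its intermediate steps. The variable names codes, uniques, counts and result are only illustrative and do not appear in the answer itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 3]})

# codes: one integer label per row; uniques: distinct ids in order of first appearance
codes, uniques = df['id'].factorize()

# counts[k] is the number of rows whose code is k
counts = np.bincount(codes)

# indexing counts by codes maps each group's count back onto its rows
result = df.assign(Count=counts[codes])
print(result)
```

One caveat, as far as I know: factorize labels missing values with -1 by default, and np.bincount rejects negative input, so this approach assumes the id column has no NaN.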
np.unique and np.bincount
u, i = np.unique(df.id, return_inverse=True)
df.assign(Count=np.bincount(i)[i])
id Count
0 1 2
1 1 2
2 3 1
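The two variants differ only in how the integer codes are produced. The sketch below uses a small hypothetical series (not the data from the question) to show that factorize keeps first-appearance order while np.unique returns sorted uniques, and that the per-row counts come out the same either way.

```python
import numpy as np
import pandas as pd

# hypothetical data chosen so that the first id seen is not the smallest
s = pd.Series([3, 1, 3], name='id')

# factorize: uniques in order of first appearance
codes_f, uniques_f = s.factorize()

# np.unique: uniques in sorted order
uniques_u, codes_u = np.unique(s.to_numpy(), return_inverse=True)

# the codes differ, but the counts mapped back to the rows agree
count_f = np.bincount(codes_f)[codes_f]
count_u = np.bincount(codes_u)[codes_u]

print(list(uniques_f), list(uniques_u))
print(count_f, count_u)
```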
Assign the new count column to the DataFrame by grouping on id and then transforming that column with value_counts (or size).
>>> df.assign(count=df.groupby('id')['id'].transform('value_counts'))
id count
0 1 2
1 1 2
2 3 1
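As a sketch of the size variant mentioned above: the string 'value_counts' may not be accepted by transform in every pandas version (an assumption worth checking against your install), whereas 'size' is a standard aggregation name. The second line, which maps each id through Series.value_counts, is a swapped-in alternative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 3]})

# group on id and broadcast each group's size back to its rows
by_size = df.assign(count=df.groupby('id')['id'].transform('size'))

# alternative (not from the answer): look up each id in the value_counts table
by_map = df.assign(count=df['id'].map(df['id'].value_counts()))

print(by_size)
```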