How to count the occurrences of each element in a column and store the result as a new column?

The DataFrame df is defined as follows.

import pandas as pd 
df = pd.DataFrame({'id': [1, 1, 3]})

Input:

   id
0   1
1   1
2   3

I want to count how many times each id occurs and store the result in a new column named count.

Expected:

   id  count
0   1      2
1   1      2
2   3      1
rosefun asked Dec 24 '22 04:12


2 Answers

pd.factorize and np.bincount

My favorite. factorize does not sort its input and runs in O(n) time, whereas np.unique sorts the values, which costs O(n log n). For large data sets, prefer factorize over np.unique.

import numpy as np

i, u = df.id.factorize()  # i: integer code per row, u: unique ids
df.assign(Count=np.bincount(i)[i])  # count per code, mapped back to rows

   id  Count
0   1      2
1   1      2
2   3      1
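
To make the indexing trick easier to follow, here is a minimal sketch that prints each intermediate value for the question's df. The names codes, uniques, and per_label are illustrative and not part of the original answer, and the new column is named count (lowercase) to match the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 3]})

# codes[k] is the integer label assigned to row k;
# uniques holds the distinct ids in order of first appearance
codes, uniques = pd.factorize(df['id'])

# per_label[j] is the number of rows that carry label j
per_label = np.bincount(codes)

# indexing per_label with codes maps each label's count back onto its rows
result = df.assign(count=per_label[codes])

print(codes.tolist())      # [0, 0, 1]
print(list(uniques))       # [1, 3]
print(per_label.tolist())  # [2, 1]
print(result)
```

Because bincount needs non-negative integers, going through factorize first also lets the same pattern work for string ids.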

np.unique and np.bincount

u, i = np.unique(df.id, return_inverse=True)  # u: sorted unique ids, i: position in u for each row
df.assign(Count=np.bincount(i)[i])  # count per unique id, mapped back to rows

   id  Count
0   1      2
1   1      2
2   3      1
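
The difference between the two approaches only shows up in the intermediate labels, not in the final counts. A small sketch, assuming a hypothetical unsorted frame (not from the question), illustrates this:

```python
import numpy as np
import pandas as pd

# Hypothetical unsorted input, chosen so the two labelings differ
unsorted = pd.DataFrame({'id': [3, 1, 3]})

# factorize labels ids in order of first appearance: 3 -> 0, 1 -> 1
f_codes, f_uniques = pd.factorize(unsorted['id'])

# np.unique labels ids in sorted order: 1 -> 0, 3 -> 1
n_uniques, n_codes = np.unique(unsorted['id'].to_numpy(), return_inverse=True)

# The labels differ, but the per-row counts are identical
f_counts = np.bincount(f_codes)[f_codes]
n_counts = np.bincount(n_codes)[n_codes]

print(f_codes.tolist(), list(f_uniques))          # [0, 1, 0] [3, 1]
print(n_codes.tolist(), n_uniques.tolist())       # [1, 0, 1] [1, 3]
print(f_counts.tolist(), n_counts.tolist())       # [2, 1, 2] [2, 1, 2]
```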
piRSquared answered Apr 06 '23 01:04


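Assign the new count column to the DataFrame by grouping on id and then transforming that column with size. (This answer originally used 'value_counts', which older pandas versions accepted; recent versions may reject it as a transform name, so 'size' is the safer choice.)

>>> df.assign(count=df.groupby('id')['id'].transform('size'))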
   id  count
0   1      2
1   1      2
2   3      1
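
As a self-contained sketch of the groupby approach, the snippet below also shows one possible alternative, mapping each id through its value_counts table. The names by_size and by_map are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 3]})

# Option 1: compute each group's size and broadcast it to every row of the group
by_size = df.assign(count=df.groupby('id')['id'].transform('size'))

# Option 2: build a lookup table of id -> occurrences, then map each row's id through it
by_map = df.assign(count=df['id'].map(df['id'].value_counts()))

print(by_size)
print(by_map)
```

Both variants return a new DataFrame and leave df unchanged; assign the result back to df if the column should persist.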
Alexander answered Apr 06 '23 00:04