I have the following df:
Col1 Col2
test Something
test2 Something
test3 Something
test Something
test2 Something
test5 Something
I want to get:
Col1 Col2 Occur
test Something 2
test2 Something 2
test3 Something 1
test Something 2
test2 Something 2
test5 Something 1
I've tried to use:
df["Occur"] = df["Col1"].value_counts()
But it didn't help: the Occur column ended up full of NaN.
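You can count values with the value_counts() method. Called on a single column (a Series), it returns the number of occurrences of each unique value; called on a DataFrame, it counts unique rows. The result is indexed by the unique values, not by the original row labels.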
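As a minimal sketch of how value_counts() can still produce the Occur column from the question: assigning its result directly aligns on the index, and the counts are indexed by the Col1 values ("test", "test2", ...) while df is indexed 0 to 5, so nothing matches and every row becomes NaN. Looking each Col1 value up in the counts with map() avoids that. The df below is rebuilt from the question's sample data, and the variable name counts is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Col1": ["test", "test2", "test3", "test", "test2", "test5"],
    "Col2": ["Something"] * 6,
})

# Series indexed by the unique Col1 values, holding how often each occurs.
counts = df["Col1"].value_counts()

# Look up each row's Col1 value in counts instead of aligning on the row index.
df["Occur"] = df["Col1"].map(counts)
print(df)
```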
With the DataFrame.insert() method, you can add a new column at a specific position in the column order. insert() takes a single column name and value per call, but you can call it repeatedly to add several columns.
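A short sketch of insert() applied to the question's data, assuming you want Occur placed between Col1 and Col2 rather than at the end. The position 1 and the variable name occur are illustrative choices, not something the question asks for.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Col1": ["test", "test2", "test3", "test", "test2", "test5"],
    "Col2": ["Something"] * 6,
})

# Per-row count of how often each Col1 value occurs.
occur = df.groupby("Col1")["Col1"].transform("size")

# insert(loc, column, value): place the new column at position 1,
# i.e. directly after Col1. This modifies df in place.
df.insert(1, "Occur", occur)
print(df)
```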
You can count duplicate rows by counting the True values in the pandas Series returned by duplicated(). The sum() method gives the number of True values.
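For illustration, here is how that looks on the question's data, assuming duplicates are judged on Col1 only. Note that this gives a single total rather than the per-row Occur column the question asks for. The variable names are hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Col1": ["test", "test2", "test3", "test", "test2", "test5"],
    "Col2": ["Something"] * 6,
})

# By default the first occurrence is not flagged, only the later repeats.
repeat_rows = df.duplicated(subset="Col1").sum()

# keep=False flags every row whose Col1 value appears more than once.
all_repeated = df.duplicated(subset="Col1", keep=False).sum()

print(repeat_rows, all_repeated)
```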
You can also use GroupBy + transform with size:
df['Occur'] = df.groupby('Col1')['Col1'].transform('size')
print(df)
Col1 Col2 Occur
0 test Something 2
1 test2 Something 2
2 test3 Something 1
3 test Something 2
4 test2 Something 2
5 test5 Something 1
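Use groupby on 'Col1' and then apply transform on 'Col2' to return a Series whose index is aligned to the original df, so you can add it as a column. This counts the values of Col2 within each group, so it equals the group size here only because Col2 holds a single value per Col1 group: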
In [3]:
df['Occur'] = df.groupby('Col1')['Col2'].transform(pd.Series.value_counts)
df
Out[3]:
Col1 Col2 Occur
0 test Something 2
1 test2 Something 2
2 test3 Something 1
3 test Something 2
4 test2 Something 2
5 test5 Something 1