I have a DF in Pandas, which looks like:
Letters Numbers
A 1
A 3
A 2
A 1
B 1
B 2
B 3
C 2
C 2
I'm looking to count the number of identical rows and save the result in a third column. For example, this is the output I'm looking for:
Letters Numbers Events
A 1 2
A 2 1
A 3 1
B 1 1
B 2 1
B 3 1
C 2 2
An example of what I'm looking to do is here. The best idea I've come up with is to use value_counts(), but I think that only works on a single column. Another idea is to use duplicated(). Either way, I don't want to write a for-loop; I'm pretty sure a Pythonic alternative exists.
You can groupby these two columns and then calculate the sizes of the groups:
In [16]: df.groupby(['Letters', 'Numbers']).size()
Out[16]:
Letters  Numbers
A        1          2
         2          1
         3          1
B        1          1
         2          1
         3          1
C        2          2
dtype: int64
To get a DataFrame like in your example output, you can reset the index with reset_index.
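As a minimal sketch of that last step, assuming the sample data from the question (the variable name counts and the column name "Events" passed to reset_index are my own choices for illustration):

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Letters": list("AAAABBBCC"),
    "Numbers": [1, 3, 2, 1, 1, 2, 3, 2, 2],
})

# size() counts the rows in each (Letters, Numbers) group.
# reset_index(name=...) moves the group keys back into columns
# and gives the count column a name.
counts = df.groupby(["Letters", "Numbers"]).size().reset_index(name="Events")
print(counts)
```

This should give one row per distinct (Letters, Numbers) pair, sorted by the group keys, with the count in Events.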
You can use a combination of groupby, transform and then drop_duplicates:
In [84]:
# Count the rows in each (Letters, Numbers) group and broadcast that count back to every row
df['Events'] = df.groupby(['Letters', 'Numbers'])['Numbers'].transform('size')
df.drop_duplicates()
Out[84]:
Letters Numbers Events
0 A 1 2
1 A 3 1
2 A 2 1
4 B 1 1
5 B 2 1
6 B 3 1
7 C 2 2
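For anyone who wants to run this end to end, here is a self-contained sketch of the same groupby / transform / drop_duplicates idea. It assumes the sample data from the question and uses the string alias "size" as the transform, grouping on both columns; the name deduped is just for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Letters": list("AAAABBBCC"),
    "Numbers": [1, 3, 2, 1, 1, 2, 3, 2, 2],
})

# transform("size") returns one value per original row (the size of that
# row's group), so the result lines up with df and can be assigned directly.
df["Events"] = df.groupby(["Letters", "Numbers"])["Numbers"].transform("size")

# Identical rows now carry identical counts, so dropping duplicates
# leaves one row per (Letters, Numbers) pair in the original row order.
deduped = df.drop_duplicates()
print(deduped)
```

Unlike the size() plus reset_index route, this keeps the original index labels and row order, which may or may not be what you want.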