I have this data frame:
df = pd.DataFrame({"A": ["Used", "Not used", "Not used", "Not used", "Used",
"Not used", "Used", "Used", "Used", "Not used"],
"B": ["Used", "Used", "Used", "Not used", "Not used",
"Used", "Not used", "Not used", "Used", "Not used"]})
I would like to find the quickest, cleanest way to find out the following:
I am new to Python and pandas (and coding in general), so I am sure this is very simple, but any guidance would be appreciated. I have tried groupby().aggregate(sum) but I did not get the result I needed (I would imagine because these are characters rather than integers.
If need all values percentages use value_counts
with normalize=True
, for multiple columns groupby
with size
for lengths of all pairs and divide it by length of df
(same as length of index):
print (100 * df['A'].value_counts(normalize=True))
Not used 50.0
Used 50.0
Name: A, dtype: float64
print (100 * df['B'].value_counts(normalize=True))
Not used 50.0
Used 50.0
Name: B, dtype: float64
print (100 * df.groupby(['A','B']).size() / len(df.index))
A B
Not used Not used 20.0
Used 30.0
Used Not used 30.0
Used 20.0
dtype: float64
If need filter values create mask and get mean
- True
s are processed like 1
s:
print (100 * df['A'].eq('Used').mean())
#alternative
#print (100 * (df['B'] == 'Used').mean())
50.0
print (100 * df['B'].eq('Used').mean())
#alternative
#print (100 * (df['B'] == 'Used').mean())
50.0
print (100 * (df['A'].eq('Used') & df['B'].eq('Used')).mean())
20.0
Use
1) Used A
In [4929]: 100.*df.A.eq('Used').sum()/df.shape[0]
Out[4929]: 50.0
2) Used B
In [4930]: 100.*df.B.eq('Used').sum()/df.shape[0]
Out[4930]: 50.0
3) Used A and Used B
In [4931]: 100.*(df.B.eq('Used') & df.A.eq('Used')).sum()/df.shape[0]
Out[4931]: 20.0
1) is same as
In [4933]: 100.*(df['A'] == 'Used').sum()/len(df.index)
Out[4933]: 50.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With