Get percentage of rows (strings) that fulfil a certain condition in a pandas data frame

Question

I have this data frame:

df = pd.DataFrame({"A": ["Used", "Not used", "Not used", "Not used", "Used",
                         "Not used", "Used", "Used", "Used", "Not used"],
                   "B": ["Used", "Used", "Used", "Not used", "Not used",
                        "Used", "Not used", "Not used", "Used", "Not used"]})

I would like to find the quickest, cleanest way to find out the following:

The percentage of rows of all the rows that have used A.
The percentage of rows of all the rows that have used B.
The percentage of rows of all the rows that have used A and B.

I am new to Python and pandas (and coding in general), so I am sure this is very simple, but any guidance would be appreciated. I have tried groupby().aggregate(sum) but I did not get the result I needed (I would imagine because these are characters rather than integers.

jezrael · Accepted Answer

If need all values percentages use value_counts with normalize=True, for multiple columns groupby with size for lengths of all pairs and divide it by length of df (same as length of index):

print (100 * df['A'].value_counts(normalize=True))
Not used    50.0
Used        50.0
Name: A, dtype: float64

print (100 * df['B'].value_counts(normalize=True))
Not used    50.0
Used        50.0
Name: B, dtype: float64

print (100 * df.groupby(['A','B']).size() / len(df.index))
A         B       
Not used  Not used    20.0
          Used        30.0
Used      Not used    30.0
          Used        20.0
dtype: float64

If need filter values create mask and get mean - Trues are processed like 1s:

print (100 * df['A'].eq('Used').mean())
#alternative
#print (100 * (df['B'] == 'Used').mean())
50.0

print (100 * df['B'].eq('Used').mean())
#alternative
#print (100 * (df['B'] == 'Used').mean())
50.0

print (100 * (df['A'].eq('Used') & df['B'].eq('Used')).mean())
20.0

Zero · Answer

Use

1) Used A

In [4929]: 100.*df.A.eq('Used').sum()/df.shape[0]
Out[4929]: 50.0

2) Used B

In [4930]: 100.*df.B.eq('Used').sum()/df.shape[0]
Out[4930]: 50.0

3) Used A and Used B

In [4931]: 100.*(df.B.eq('Used') & df.A.eq('Used')).sum()/df.shape[0]
Out[4931]: 20.0

1) is same as

In [4933]: 100.*(df['A'] == 'Used').sum()/len(df.index)
Out[4933]: 50.0

Get percentage of rows (strings) that fulfil a certain condition in a pandas data frame

Tags:

python

pandas

pandas-groupby

BadAtCoding

2 Answers

jezrael

Zero

Recent Activity

Donate For Us

Get percentage of rows (strings) that fulfil a certain condition in a pandas data frame

Tags:

python

pandas

pandas-groupby

BadAtCoding

2 Answers

jezrael

Zero

Related questions

Recent Activity

Donate For Us