
Get percentage of rows (strings) that fulfil a certain condition in a pandas data frame

I have this data frame:

import pandas as pd

df = pd.DataFrame({"A": ["Used", "Not used", "Not used", "Not used", "Used",
                         "Not used", "Used", "Used", "Used", "Not used"],
                   "B": ["Used", "Used", "Used", "Not used", "Not used",
                        "Used", "Not used", "Not used", "Used", "Not used"]})

I am looking for the quickest, cleanest way to compute the following:

  • The percentage of all rows that have used A.
  • The percentage of all rows that have used B.
  • The percentage of all rows that have used both A and B.

I am new to Python and pandas (and coding in general), so I am sure this is very simple, but any guidance would be appreciated. I have tried groupby().aggregate(sum), but I did not get the result I needed (I would imagine because these are strings rather than integers).

BadAtCoding asked Sep 29 '17 11:09



2 Answers

If you need the percentage of every value, use value_counts with normalize=True. For combinations across multiple columns, use groupby with size to count each pair, then divide by the length of df (the same as the length of its index):

print (100 * df['A'].value_counts(normalize=True))
Not used    50.0
Used        50.0
Name: A, dtype: float64

print (100 * df['B'].value_counts(normalize=True))
Not used    50.0
Used        50.0
Name: B, dtype: float64

print (100 * df.groupby(['A','B']).size() / len(df.index))
A         B       
Not used  Not used    20.0
          Used        30.0
Used      Not used    30.0
          Used        20.0
dtype: float64
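
The joint percentages can also be laid out as a two-way table. A minimal sketch, assuming pd.crosstab with normalize="all" as a swapped-in alternative to groupby with size (joint_pct is only an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({"A": ["Used", "Not used", "Not used", "Not used", "Used",
                         "Not used", "Used", "Used", "Used", "Not used"],
                   "B": ["Used", "Used", "Used", "Not used", "Not used",
                         "Used", "Not used", "Not used", "Used", "Not used"]})

# Share of all rows that fall into each (A, B) combination, as percentages.
joint_pct = 100 * pd.crosstab(df["A"], df["B"], normalize="all")
print(joint_pct)

# Percentage of rows where both A and B are "Used".
print(joint_pct.loc["Used", "Used"])
```

Each cell is a share of all rows, so the four cells add up to 100.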

If you only need the share of rows matching a condition, create a boolean mask and take its mean; True values are counted as 1:

print (100 * df['A'].eq('Used').mean())
#alternative
#print (100 * (df['A'] == 'Used').mean())
50.0

print (100 * df['B'].eq('Used').mean())
#alternative
#print (100 * (df['B'] == 'Used').mean())
50.0

print (100 * (df['A'].eq('Used') & df['B'].eq('Used')).mean())
20.0
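
Since eq returns booleans and mean treats True as 1 and False as 0, the same idea can be applied to the whole frame at once. A minimal sketch, assuming every column of interest holds the same "Used" / "Not used" labels (used, per_column and both are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({"A": ["Used", "Not used", "Not used", "Not used", "Used",
                         "Not used", "Used", "Used", "Used", "Not used"],
                   "B": ["Used", "Used", "Used", "Not used", "Not used",
                         "Used", "Not used", "Not used", "Used", "Not used"]})

# Boolean frame: True where a cell equals "Used".
used = df.eq("Used")

# The mean of each boolean column is the fraction of True values in it.
per_column = 100 * used.mean()

# all(axis=1) is True only for rows where every column is "Used".
both = 100 * used.all(axis=1).mean()

print(per_column)
print(both)
```

This avoids repeating the comparison for each column, which may be convenient if more columns are added later.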
jezrael answered Oct 10 '22 01:10



Use eq to build a boolean mask, sum it, and divide by the number of rows:

1) Used A

In [4929]: 100.*df.A.eq('Used').sum()/df.shape[0]
Out[4929]: 50.0

2) Used B

In [4930]: 100.*df.B.eq('Used').sum()/df.shape[0]
Out[4930]: 50.0

3) Used A and Used B

In [4931]: 100.*(df.B.eq('Used') & df.A.eq('Used')).sum()/df.shape[0]
Out[4931]: 20.0

1) is the same as

In [4933]: 100.*(df['A'] == 'Used').sum()/len(df.index)
Out[4933]: 50.0
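
If the calculation is needed repeatedly, the sum-and-divide pattern can be wrapped in a small function. A hedged sketch: percent_used is a hypothetical helper, not part of pandas, and it assumes a non-empty frame whose listed columns all use the same label:

```python
import pandas as pd


def percent_used(frame, columns, value="Used"):
    """Percentage of rows in which every listed column equals `value`.

    Hypothetical helper for illustration; assumes `frame` is not empty.
    """
    # True for rows where all of the selected columns match `value`.
    matches = frame[columns].eq(value).all(axis=1)
    # Count the matching rows and divide by the total number of rows.
    return 100.0 * matches.sum() / len(frame.index)


df = pd.DataFrame({"A": ["Used", "Not used", "Not used", "Not used", "Used",
                         "Not used", "Used", "Used", "Used", "Not used"],
                   "B": ["Used", "Used", "Used", "Not used", "Not used",
                         "Used", "Not used", "Not used", "Used", "Not used"]})

print(percent_used(df, ["A"]))
print(percent_used(df, ["B"]))
print(percent_used(df, ["A", "B"]))
```

Passing a list of columns keeps the single-column and combined cases in one code path.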
Zero answered Oct 10 '22 00:10
