Pandas: for all set of duplicate entries in a particular column, grab some information

Question

I have a large Dataframe that looks similar to this:

     ID_Code    Status1    Status2
0      A          Done       Not
1      A          Done       Done
2      B          Not        Not
3      B          Not        Done
4      C          Not        Not
5      C          Not        Not
6      C          Done       Done

What I want to do is calculate is for each of the set of duplicate ID codes, find out the percentage of Not-Not entries are present. (i.e. [# of Not-Not/# of total entries] * 100)

I'm struggling to do so using groupby and can't seem to get the right syntax to perform this.

sacuL · Accepted Answer

I may have misunderstood the question, but you appear to be referring to when values of Status1 and Status2 are both Not, correct? If that's the case, you can do something like:

df.groupby('ID_Code').apply(lambda x: (x[['Status1','Status2']] == 'Not').all(1).sum()/len(x)*100)

ID_Code
A     0.000000
B    50.000000
C    66.666667
dtype: float64

BENY · Answer

IIUC using crosstab

pd.crosstab(df['ID_Code'],(df['Status1'].eq('Not'))&(df['Status2'].eq('Not')),normalize ='index')
Out[713]: 
col_0       False     True 
ID_Code                    
A        1.000000  0.000000
B        0.500000  0.500000
C        0.333333  0.666667



#pd.crosstab(df['ID_Code'],(df['Status1'].eq('Not'))&(df['Status2'].eq('Not')),normalize ='index')[True]

Pandas: for all set of duplicate entries in a particular column, grab some information

Tags:

python

pandas

pandas-groupby

iby.helmy

2 Answers

sacuL

BENY

Recent Activity

Donate For Us

Pandas: for all set of duplicate entries in a particular column, grab some information

Tags:

python

pandas

pandas-groupby

iby.helmy

2 Answers

sacuL

BENY

Related questions

Recent Activity

Donate For Us