 

Pandas: Selecting and modifying dataframe based on even more complex criteria

Tags:

python

pandas

I was looking at two related threads, and although my question is similar, it differs in a few ways. I have a dataframe full of floats that I want to replace with strings. Say:

      A     B       C
 A    0     1.5     13
 B    0.5   100.2   7.3
 C    1.3   34      0.01

I want to replace the values in this table according to several criteria, but only the first replacement works:

df[df<1]='N' # Works
df[(df>1)&(df<10)]='L' # Doesn't work
df[(df>10)&(df<50)]='M'  # Doesn't work
df[df>50]='H'  # Doesn't work

If I instead restrict the selection in the second line to floats, it still doesn't work:

((df.applymap(type)==float) & (df<10) & (df>1)) #Doesn't work

I was wondering how to use pd.DataFrame.mask here, or whether there is another way. How should I solve this?

Alternatively, I know I could go column by column and apply the substitutions to each Series, but that seems counterproductive.

Edit: Could anyone explain why the 4 simple assignments above do not work?
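A likely explanation, offered as an assumption rather than something stated in the thread: the first assignment writes the string 'N' into float columns, so those columns end up holding a mix of strings and floats, and every later comparison such as df > 1 has to compare 'N' with a number, which Python 3 rejects with a TypeError. The sketch below reproduces that failure on a small object Series and then shows one way to use DataFrame.mask: build all masks from the untouched numeric frame first, then write the labels into an object-dtype copy. The names `masks`, `mixed` and `out` are illustrative.

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame(
    {'A': [0, 0.5, 1.3], 'B': [1.5, 100.2, 34], 'C': [13, 7.3, 0.01]},
    index=list('ABC'),
)

# Once a column mixes strings and numbers, ordering comparisons fail
mixed = pd.Series(['N', 1.5], dtype=object)
comparison_failed = False
try:
    mixed > 1
except TypeError as exc:
    comparison_failed = True
    print('comparison failed:', exc)

# Build every mask from the numeric frame before anything is overwritten
masks = {
    'N': df < 1,
    'L': (df > 1) & (df < 10),
    'M': (df > 10) & (df < 50),
    'H': df > 50,
}

# Write the labels into an object-dtype copy, one mask at a time
out = df.astype(object)
for label, m in masks.items():
    out = out.mask(m, label)
print(out)
```

Because the masks never look at `out`, the order of the replacements no longer matters.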

asked May 30 '18 at 13:05 by Sos


2 Answers

Use numpy.select with the DataFrame constructor:

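import numpy as np
import pandas as pd

m1 = df < 1
m2 = (df > 1) & (df < 10)
m3 = (df > 10) & (df < 50)
m4 = df > 50

vals = list('NLMH')

# For each cell, np.select takes the label of the first mask that is True;
# cells matching no mask (e.g. exactly 1, 10 or 50) get the default ''
df = pd.DataFrame(np.select([m1, m2, m3, m4], vals, default=''), index=df.index, columns=df.columns)
print(df)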
   A  B  C
A  N  L  M
B  N  H  L
C  L  M  N
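np.select evaluates the conditions in order and, for each element, takes the choice belonging to the first condition that is True; elements that match none of them get `default` (0 unless specified). A small sketch on a 1-D array, with boundary values picked purely for illustration, shows that values exactly equal to 1, 10 or 50 satisfy none of the strict masks used in the question, and that relying on first-match order with `<=` closes those gaps. The `<=` variant is an assumption about how boundaries should be handled, not part of the answer.

```python
import numpy as np

# Hypothetical test values, including the boundaries 1, 10 and 50
x = np.array([0.5, 1.0, 10.0, 50.0, 75.0])

# Strict masks as in the question: the boundary values match nothing
strict = [x < 1, (x > 1) & (x < 10), (x > 10) & (x < 50), x > 50]
labels_strict = np.select(strict, list('NLMH'), default='?')
# labels_strict.tolist() == ['N', '?', '?', '?', 'H']

# Since the first True condition wins, upper bounds alone are enough
ordered = [x <= 1, x <= 10, x <= 50]
labels_ordered = np.select(ordered, list('NLM'), default='H')
# labels_ordered.tolist() == ['N', 'N', 'L', 'M', 'H']

print(labels_strict)
print(labels_ordered)
```

The ordered variant assigns boundaries the same way as right-closed bins in pd.cut.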
answered Oct 28 '22 at 23:10 by jezrael


Using pd.cut:

pd.cut(df.stack(),[-1,1,10,50,np.inf],labels=list('NLMH')).unstack()
Out[309]: 
   A  B  C
A  N  L  M
B  N  H  L
C  L  M  N
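pd.cut bins are closed on the right by default, so [-1, 1, 10, 50, np.inf] yields the intervals (-1, 1], (1, 10], (10, 50] and (50, inf]: a value equal to a boundary falls into the lower bin, and anything at or below -1 becomes NaN. The sketch below shows the column-by-column variant the question mentions, applying pd.cut to each column with DataFrame.apply instead of stacking, then converting the categorical result to plain strings. The names `bins`, `binned`, `out` and `edge` are illustrative.

```python
import numpy as np
import pandas as pd

# The example frame from the question
df = pd.DataFrame(
    {'A': [0, 0.5, 1.3], 'B': [1.5, 100.2, 34], 'C': [13, 7.3, 0.01]},
    index=list('ABC'),
)

bins = [-1, 1, 10, 50, np.inf]
labels = list('NLMH')

# Bin each column separately; the resulting columns are categorical
binned = df.apply(lambda col: pd.cut(col, bins, labels=labels))

# Convert the categories to ordinary strings if categorical dtype is not wanted
out = binned.astype(str)
print(out)

# Boundary behaviour: -1 is excluded (right-closed bins), 1 lands in 'N', 10 in 'L'
edge = pd.cut(pd.Series([-1.0, 1.0, 10.0]), bins, labels=labels)
print(edge.tolist())
```

Applying pd.cut per column avoids the stack/unstack round trip and is only one line, so the column-wise approach is not as counterproductive as it might seem.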
answered Oct 29 '22 at 00:10 by BENY