Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replacing values in a pandas dataframe based on multiple conditions

I have a fairly simple question based on this sample code:

x1 = 10*np.random.randn(10,3)
df1 = pd.DataFrame(x1)

I am looking for a single DataFrame derived from df1 where positive values are replaced with "up", negative values are replaced with "down", and 0 values, if any, are replaced with "zero". I have tried using the .where() and .mask() methods but could not obtain the desired result.

I have seen other posts which filter according to multiple conditions at once, but they do not show how to replace values according to different conditions.

like image 226
laszlopanaflex Avatar asked Nov 29 '22 13:11

laszlopanaflex


2 Answers

For multiple conditions ie. (df['employrate'] <=55) & (df['employrate'] > 50)

use this:

df['employrate'] = np.where(
   (df['employrate'] <=55) & (df['employrate'] > 50) , 11, df['employrate']
   )

or you can do it this way as well,

gm.loc[(gm['employrate'] <55) & (gm['employrate'] > 50),'employrate']=11

here informal syntax can be:

<dataset>.loc[<filter1> & (<filter2>),'<variable>']='<value>'

out[108]:
       country  employrate alcconsumption
0  Afghanistan   55.700001            .03
1      Albania   11.000000           7.29
2      Algeria   11.000000            .69
3      Andorra         nan          10.17
4       Angola   75.699997           5.57

therefore syntax we used here is:

 df['<column_name>'] = np.where((<filter 1> ) & (<filter 2>) , <new value>, df['column_name'])

for single condition, ie. ( 'employrate'] > 70 )

       country        employrate alcconsumption
0  Afghanistan  55.7000007629394            .03
1      Albania  51.4000015258789           7.29
2      Algeria              50.5            .69
3      Andorra                            10.17
4       Angola  75.6999969482422           5.57

use this:

df.loc[df['employrate'] > 70, 'employrate'] = 7

       country  employrate alcconsumption
0  Afghanistan   55.700001            .03
1      Albania   51.400002           7.29
2      Algeria   50.500000            .69
3      Andorra         nan          10.17
4       Angola    7.000000           5.57

therefore syntax here is:

df.loc[<mask>(here mask is generating the labels to index) , <optional column(s)> ]

like image 20
Harshit Jain Avatar answered Dec 18 '22 07:12

Harshit Jain


df1.apply(np.sign).replace({-1: 'down', 1: 'up', 0: 'zero'})

Output:

      0     1     2
0  down    up    up
1    up  down  down
2    up  down  down
3  down  down    up
4  down  down    up
5  down    up    up
6  down    up  down
7    up  down  down
8    up    up  down
9  down    up    up

P.S. Getting exactly zero with randn is pretty unlikely, of course

like image 64
perl Avatar answered Dec 18 '22 06:12

perl