Pandas: Change values in multiple columns according to boolean condition

Tags: python, pandas

This should be a simple thing to do but somehow I can't wrap my head around all the different ways of selecting and masking things in Pandas yet.

So for a big dataframe (read in from a csv file) I want to change the values of a list of columns according to some boolean condition (tested on the same selected columns).

I tried something like this already, which doesn't work because of a mismatch of dimensions:

df.loc[df[my_cols]>0, my_cols] = 1

This also doesn't work (I think because it tries to change values in the wrong columns):

df[df[my_cols]>0] = 1

And this doesn't work because I'm only changing a copy of the dataframe:

df[my_cols][df[my_cols]>0] = 1

Here is the output of df.info():

Int64Index: 186171 entries, 0 to 186170
Columns: 737 entries, id to 733:zorg
dtypes: float64(734), int64(1), object(2)
memory usage: 1.0+ GB

Can some more advanced Pandas user help? Thank you.

asked Jul 31 '15 13:07 by aurora


2 Answers

So here is how I finally got the desired result, but I feel there must be a more pandas-ish solution for this task.

for col in my_cols:
    df.loc[df[col]>0, col] = 1 
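
For reference, here is a minimal sketch of that loop on a toy frame. The column names and values are made up for illustration and are not taken from the real data; my_cols is assumed to be a plain list of float column names.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; column names are hypothetical.
df = pd.DataFrame({
    "id": [10, 11, 12],
    "a": [0.0, 2.5, -1.0],
    "b": [3.0, np.nan, 0.0],
    "label": ["x", "y", "z"],
})
my_cols = ["a", "b"]  # assumed subset of float columns

# One column at a time: the row mask is 1-D, which is what .loc expects
# as a row indexer.
for col in my_cols:
    df.loc[df[col] > 0, col] = 1

print(df)
```

Positive entries in the selected columns become 1, while zeros, negative values, NaN, and the columns outside my_cols are left as they were.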

answered Nov 03 '22 01:11 by aurora


Try pandas.DataFrame.where

Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.

In your case this would become:

df[my_cols] = df[my_cols].where(~(df[my_cols]>0),other=1)
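
A small sketch of this on a toy frame, with made-up column names and values. It also shows pandas.DataFrame.mask, which replaces entries where the condition is True and so should give the same result without the negation; treat it as an illustration under those assumptions rather than a benchmarked recommendation.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; column names are hypothetical.
df = pd.DataFrame({
    "id": [10, 11, 12],
    "a": [0.0, 2.5, -1.0],
    "b": [3.0, np.nan, 0.0],
})
my_cols = ["a", "b"]  # assumed subset of float columns

# Boolean frame with the same shape as df[my_cols]; NaN > 0 is False.
positive = df[my_cols] > 0

# where keeps entries where the condition is True, hence the negation.
via_where = df[my_cols].where(~positive, other=1)

# mask replaces entries where the condition is True, so no negation.
via_mask = df[my_cols].mask(positive, 1)

# Assign the result back to the selected columns only.
df[my_cols] = via_where
print(df)
```

Both calls work on the whole selection at once, so no explicit loop over columns is needed, and NaN entries stay NaN because they never satisfy the > 0 test.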

answered Nov 03 '22 02:11 by jonaz