This should be a simple thing to do but somehow I can't wrap my head around all the different ways of selecting and masking things in Pandas yet.
So for a big dataframe (read in from a CSV file), I want to change the values in a list of columns according to some boolean condition (tested on those same columns).
I tried something like this already, which doesn't work because of a mismatch of dimensions:
df.loc[df[my_cols]>0, my_cols] = 1
This also doesn't work (because I'm trying to change values in the wrong columns I think):
df[df[my_cols]>0] = 1
And this doesn't work because I'm only changing a copy of the dataframe:
df[my_cols][df[my_cols]>0] = 1
Here is the output of df.info():
Int64Index: 186171 entries, 0 to 186170
Columns: 737 entries, id to 733:zorg
dtypes: float64(734), int64(1), object(2)
memory usage: 1.0+ GB
Can some more advanced Pandas user help? Thank you.
So here is how I finally got the desired result, but I feel there must be a more pandas-ish solution for this task.
for col in my_cols:
    df.loc[df[col]>0, col] = 1
Try pandas.DataFrame.where
Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
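Since where keeps the entries where the condition is True and replaces the rest, the condition has to be negated. In your case this would become: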
df[my_cols] = df[my_cols].where(~(df[my_cols]>0),other=1)
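To see the negation in action, here is a minimal sketch on a small made-up frame. The column names and values are hypothetical stand-ins for the real data. It also shows DataFrame.mask, which replaces entries where the condition is True and so should give the same result without the negation.

```python
import pandas as pd

# Hypothetical toy data standing in for the large CSV-backed frame.
df = pd.DataFrame({
    "id": [10, 20, 30],
    "a": [0.0, 2.5, -1.0],
    "b": [3.0, 0.0, 4.0],
    "label": ["x", "y", "z"],
})
my_cols = ["a", "b"]

# where() keeps values where the condition is True and substitutes
# `other` everywhere else, hence the negated condition.
via_where = df[my_cols].where(~(df[my_cols] > 0), other=1)

# mask() is the inverse: it replaces values where the condition is True.
via_mask = df[my_cols].mask(df[my_cols] > 0, 1)

# Both spellings produce the same frame.
assert via_where.equals(via_mask)

# Assign back to the selected columns only; "id" and "label" are untouched.
df[my_cols] = via_mask
print(df)
```

Both calls work on the whole column selection at once, so no explicit loop over my_cols is needed.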