I have a dataset that looks like this:
id col1 col2 col3
123 10 0 82
456 0 90 16
987 0 0 0
I'd like to replace all non-zero values in the columns that are not id with 1.
I've tried:
df.col1 = df.where(df.col1 != 0, 1)
But that replaces the non-zeroes with 1s and the zeros with the id.
Then I tried setting a new df to this function:
df2 = df.col1 = df.where(df.col1 != 0, 1)
And that works except it changes the id column to a 1 where it changed the non-zero value to a 1.
Any help troubleshooting the code that isn't working, or suggestions for another approach, would be greatly appreciated!
Option 1
astype
df
col1 col2 col3
id
123 10 0 82
456 0 90 16
987 0 0 0
df.astype(bool).astype(int)
col1 col2 col3
id
123 1 0 1
456 0 1 1
987 0 0 0
Or, create a mask with gt (this assumes there are no negative values; ne(0) would catch every non-zero value):
df.gt(0).astype(int)
col1 col2 col3
id
123 1 0 1
456 0 1 1
987 0 0 0
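To see where gt(0) and a strict "non-zero" test diverge, here is a small sketch. The `sample` frame and its negative value are made up for illustration and are not part of the question's data:

```python
import pandas as pd

# Hypothetical data with a negative value, to show where the two masks differ.
sample = pd.DataFrame({"a": [-5, 0, 3]})

# gt(0) flags only strictly positive values, so -5 becomes 0.
positive_only = sample.gt(0).astype(int)

# ne(0) flags every non-zero value, so -5 becomes 1.
any_nonzero = sample.ne(0).astype(int)

print(positive_only["a"].tolist())
print(any_nonzero["a"].tolist())
```

With the all-non-negative data in the question, both give the same result.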
If id isn't the index, set it first:
df = df.set_index('id')
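Putting Option 1 together, a minimal end-to-end sketch might look like this. It assumes the question's data is rebuilt by hand with id as an ordinary column; the name `result` is just for illustration:

```python
import pandas as pd

# Rebuild the data from the question, with id as a regular column.
df = pd.DataFrame({
    "id": [123, 456, 987],
    "col1": [10, 0, 0],
    "col2": [0, 90, 0],
    "col3": [82, 16, 0],
})

# Move id into the index so the conversion below does not touch it.
df = df.set_index("id")

# Non-zero values become True and zeros become False; then True/False -> 1/0.
result = df.astype(bool).astype(int)
print(result)
```

If id is needed as a column again afterwards, `result.reset_index()` brings it back.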
Option 2
mask
df.mask(df > 0, 1)
col1 col2 col3
id
123 1 0 1
456 0 1 1
987 0 0 0
Option 3
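df.where (note the difference from your approach: where keeps the values where the condition is True and replaces the rest, so the condition here is df == 0)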
df.where(df == 0, 1)
col1 col2 col3
id
123 1 0 1
456 0 1 1
987 0 0 0
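Since where and mask are easy to mix up, here is a small side-by-side sketch. It assumes id is already the index; the variable names are only for illustration:

```python
import pandas as pd

# The question's data, with id already set as the index.
df = pd.DataFrame(
    {"col1": [10, 0, 0], "col2": [0, 90, 0], "col3": [82, 16, 0]},
    index=pd.Index([123, 456, 987], name="id"),
)

# where: keep cells where the condition is True, replace all others with 1.
via_where = df.where(df == 0, 1)

# mask: replace cells where the condition is True with 1, keep all others.
via_mask = df.mask(df != 0, 1)

print(via_where)
print(via_mask)
```

The two calls are mirror images of each other: inverting the condition and swapping where for mask should give the same frame.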
Similar solution with np.where (requires import numpy as np):
pd.DataFrame(np.where(df > 0, 1, 0), index=df.index, columns=df.columns)
col1 col2 col3
id
123 1 0 1
456 0 1 1
987 0 0 0
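If you would rather keep id as a regular column instead of moving it to the index, one possible sketch is to restrict the replacement to the other columns. This is an alternative to the options above, and `value_cols` is a name made up for this example:

```python
import pandas as pd

# The question's data, leaving id as a regular column.
df = pd.DataFrame({
    "id": [123, 456, 987],
    "col1": [10, 0, 0],
    "col2": [0, 90, 0],
    "col3": [82, 16, 0],
})

# Select every column except id.
value_cols = df.columns.difference(["id"])

# Flag non-zero cells in those columns only; id is left untouched.
df[value_cols] = df[value_cols].ne(0).astype(int)
print(df)
```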