I'm trying to replace all numbers greater than 1 with 1 while keeping the original 1s and 0s untouched in the entire dataframe with the minimal effort. Any support is appreciated!!
My dataframe looks something like this but contains way more columns and rows.
Report No Apple Orange Lemon Grape Pear
One 5 0 2 1 1
Two 1 1 0 3 2
Three 0 0 2 1 3
Four 1 1 3 0 0
Five 4 0 0 1 1
Six 1 3 1 2 0
Desired Output:
Report No Apple Orange Lemon Grape Pear
One 1 0 1 1 1
Two 1 1 0 1 1
Three 0 0 1 1 1
Four 1 1 1 0 0
Five 1 0 0 1 1
Six 1 1 1 1 0
You can try this.
df.set_index('Report No',inplace=True)
df[df > 1] = 1
df.reset_index()
Report No Apple Orange Lemon Grape Pear
One 1 0 1 1 1
Two 1 1 0 1 1
Three 0 0 1 1 1
Four 1 1 1 0 0
Five 1 0 0 1 1
Six 1 1 1 1 0
Or use this if you have some non numeric columns. No need to use set_index
and reset_index
. This is equivalent to df.select_dtypes('number')
val = df._get_numeric_data()
val[val > 1] = 1
df
Report No Apple Orange Lemon Grape Pear
One 1 0 1 1 1
Two 1 1 0 1 1
Three 0 0 1 1 1
Four 1 1 1 0 0
Five 1 0 0 1 1
Six 1 1 1 1 0
df.mask
df.set_index('Report No',inplace=True)
df.mask(df>1,1).reset_index()
Report No Apple Orange Lemon Grape Pear
One 1 0 1 1 1
Two 1 1 0 1 1
Three 0 0 1 1 1
Four 1 1 1 0 0
Five 1 0 0 1 1
Six 1 1 1 1 0
np.where
df[df.columns[1:]] = df.iloc[:,1:].where(df.iloc[:,1:] >1 ,1)
np.select
This can be helpful when dealing with multiple conditions. If you want to convert values less than 0 to 0 and values greater than 1 to 1.
df.set_index('Report No', inplace=True)
condlist = [df >= 1, df <= 0] #you can have more conditions and add choices accordingly.
choice = [1, 0] #len(condlist) should be equal to len(choice).
df.loc[:] = np.select(condlist, choice)
Like Jan mentioned use df.clip
Not recommended but you can try this for fun. Using df.astype
.
df.set_index('Report No',inplace=True)
df.astype('bool').astype('int')
NOTE: This will only convert falsy values to False
and truthy values to True
i.e. this will convert 0
to False
and anything other than 0
is True
even negative numbers.
s = pd.Series([1,-1,0])
s.astype('bool')
0 True
1 True
2 False
dtype: bool
s.astype('bool').astype('int')
0 1
1 1
2 0
dtype: int32
np.sign
When values present are between [0, n]
i.e no negative values.
df.loc[:] = np.sign(df)
Use pandas.DataFrame.clip
:
new_df = df.clip(0, 1)
EDIT: To exclude the first column by name (this will edit the DataFrame in-place)
mask = df.columns != "Report No"
df.loc[:, mask] = df.loc[:, mask].clip(0, 1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With