I'm trying to replace all numbers greater than 1 with 1 while keeping the original 1s and 0s untouched in the entire dataframe with the minimal effort. Any support is appreciated!!
My dataframe looks something like this but contains way more columns and rows.
Report No   Apple   Orange   Lemon   Grape   Pear
One           5       0        2       1      1
Two           1       1        0       3      2
Three         0       0        2       1      3
Four          1       1        3       0      0
Five          4       0        0       1      1
Six           1       3        1       2      0
Desired Output:
Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0
                You can try this.
df.set_index('Report No',inplace=True)
df[df > 1] = 1
df.reset_index()
Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0
Or use this if you have some non numeric columns. No need to use set_index and reset_index. This is equivalent to df.select_dtypes('number')
val = df._get_numeric_data()
val[val > 1] = 1
df
Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0
df.maskdf.set_index('Report No',inplace=True)
df.mask(df>1,1).reset_index()
Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0
np.wheredf[df.columns[1:]] = df.iloc[:,1:].where(df.iloc[:,1:] >1 ,1)
np.selectThis can be helpful when dealing with multiple conditions. If you want to convert values less than 0 to 0 and values greater than 1 to 1.
df.set_index('Report No', inplace=True)
condlist = [df >= 1, df <= 0] #you can have more conditions and add choices accordingly.
choice = [1, 0] #len(condlist) should be equal to len(choice).
df.loc[:] = np.select(condlist, choice)
Like Jan mentioned use df.clip
Not recommended but you can try this for fun. Using df.astype.
df.set_index('Report No',inplace=True)
df.astype('bool').astype('int')
NOTE: This will only convert falsy values to False and truthy values to True i.e. this will convert 0 to False and anything other than 0 is True even negative numbers.
s = pd.Series([1,-1,0])
s.astype('bool')
0     True
1     True
2    False
dtype: bool
s.astype('bool').astype('int')
0    1
1    1
2    0
dtype: int32
np.signWhen values present are between [0, n] i.e no negative values.
df.loc[:] = np.sign(df)
                        Use pandas.DataFrame.clip:
new_df = df.clip(0, 1)
EDIT: To exclude the first column by name (this will edit the DataFrame in-place)
mask = df.columns != "Report No"
df.loc[:, mask] = df.loc[:, mask].clip(0, 1)
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With