Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replacing values greater 1 in a large pandas dataframe

I'm trying to replace all numbers greater than 1 with 1 while keeping the original 1s and 0s untouched in the entire dataframe with the minimal effort. Any support is appreciated!!

My dataframe looks something like this but contains way more columns and rows.

Report No   Apple   Orange   Lemon   Grape   Pear
One           5       0        2       1      1
Two           1       1        0       3      2
Three         0       0        2       1      3
Four          1       1        3       0      0
Five          4       0        0       1      1
Six           1       3        1       2      0

Desired Output:

Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0
like image 428
Matthias Gallagher Avatar asked Mar 02 '23 10:03

Matthias Gallagher


2 Answers

You can try this.

Using boolean mask

df.set_index('Report No',inplace=True)
df[df > 1] = 1
df.reset_index()

Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0

Or use this if you have some non numeric columns. No need to use set_index and reset_index. This is equivalent to df.select_dtypes('number')

val = df._get_numeric_data()
val[val > 1] = 1
df
Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0

df.mask

df.set_index('Report No',inplace=True)
df.mask(df>1,1).reset_index()
Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0

np.where

df[df.columns[1:]] = df.iloc[:,1:].where(df.iloc[:,1:] >1 ,1)

np.select

This can be helpful when dealing with multiple conditions. If you want to convert values less than 0 to 0 and values greater than 1 to 1.

df.set_index('Report No', inplace=True)
condlist = [df >= 1, df <= 0] #you can have more conditions and add choices accordingly.
choice = [1, 0] #len(condlist) should be equal to len(choice).
df.loc[:] = np.select(condlist, choice)

Like Jan mentioned use df.clip


Not recommended but you can try this for fun. Using df.astype.

df.set_index('Report No',inplace=True)
df.astype('bool').astype('int')

NOTE: This will only convert falsy values to False and truthy values to True i.e. this will convert 0 to False and anything other than 0 is True even negative numbers.

s = pd.Series([1,-1,0])
s.astype('bool')
0     True
1     True
2    False
dtype: bool

s.astype('bool').astype('int')
0    1
1    1
2    0
dtype: int32

np.sign

When values present are between [0, n] i.e no negative values.

df.loc[:] = np.sign(df)
like image 150
Ch3steR Avatar answered Mar 05 '23 15:03

Ch3steR


Use pandas.DataFrame.clip:

new_df = df.clip(0, 1)

EDIT: To exclude the first column by name (this will edit the DataFrame in-place)

mask = df.columns != "Report No"
df.loc[:, mask] = df.loc[:, mask].clip(0, 1)
like image 40
Jan Christoph Terasa Avatar answered Mar 05 '23 17:03

Jan Christoph Terasa