Replacing values greater 1 in a large pandas dataframe

Question

I'm trying to replace all numbers greater than 1 with 1 while keeping the original 1s and 0s untouched in the entire dataframe with the minimal effort. Any support is appreciated!!

My dataframe looks something like this but contains way more columns and rows.

Report No   Apple   Orange   Lemon   Grape   Pear
One           5       0        2       1      1
Two           1       1        0       3      2
Three         0       0        2       1      3
Four          1       1        3       0      0
Five          4       0        0       1      1
Six           1       3        1       2      0

Desired Output:

Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0

Ch3steR · Accepted Answer

You can try this.

Using boolean mask

df.set_index('Report No',inplace=True)
df[df > 1] = 1
df.reset_index()

Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0

Or use this if you have some non numeric columns. No need to use set_index and reset_index. This is equivalent to df.select_dtypes('number')

val = df._get_numeric_data()
val[val > 1] = 1
df
Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0

`df.mask`

df.set_index('Report No',inplace=True)
df.mask(df>1,1).reset_index()
Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0

`np.where`

df[df.columns[1:]] = df.iloc[:,1:].where(df.iloc[:,1:] >1 ,1)

`np.select`

This can be helpful when dealing with multiple conditions. If you want to convert values less than 0 to 0 and values greater than 1 to 1.

df.set_index('Report No', inplace=True)
condlist = [df >= 1, df <= 0] #you can have more conditions and add choices accordingly.
choice = [1, 0] #len(condlist) should be equal to len(choice).
df.loc[:] = np.select(condlist, choice)

Like Jan mentioned use df.clip

Not recommended but you can try this for fun. Using df.astype.

df.set_index('Report No',inplace=True)
df.astype('bool').astype('int')

NOTE: This will only convert falsy values to False and truthy values to True i.e. this will convert 0 to False and anything other than 0 is True even negative numbers.

s = pd.Series([1,-1,0])
s.astype('bool')
0     True
1     True
2    False
dtype: bool

s.astype('bool').astype('int')
0    1
1    1
2    0
dtype: int32

`np.sign`

When values present are between [0, n] i.e no negative values.

df.loc[:] = np.sign(df)

Jan Christoph Terasa · Answer

Use pandas.DataFrame.clip:

new_df = df.clip(0, 1)

EDIT: To exclude the first column by name (this will edit the DataFrame in-place)

mask = df.columns != "Report No"
df.loc[:, mask] = df.loc[:, mask].clip(0, 1)

Replacing values greater 1 in a large pandas dataframe

Tags:

python

pandas

dataframe

Matthias Gallagher

2 Answers

Using boolean mask

`df.mask`

`np.where`

`np.select`

`np.sign`

Ch3steR

Jan Christoph Terasa

Recent Activity

Donate For Us

Replacing values greater 1 in a large pandas dataframe

Tags:

python

pandas

dataframe

Matthias Gallagher

2 Answers

Using boolean mask

df.mask

np.where

np.select

np.sign

Ch3steR

Jan Christoph Terasa

Related questions

Recent Activity

Donate For Us

`df.mask`

`np.where`

`np.select`

`np.sign`