I would like to replace a value in my Pandas dataframe in Python. (replace float with string). I know the value itself, but not the column nor the row and want to run it afterwards with different inputs. I have the following dataframe:
P1899 P3486 P4074 P3352 P3500 P3447
Time
1997 100.0 89.745739 85.198939 87.377584 114.755270 81.131599
1998 100.0 101.597557 83.468442 86.369083 106.031629 95.263796
1999 100.0 97.234551 91.262551 88.759609 104.539337 95.859980
2000 100.0 100.759918 74.236098 88.295711 103.739557 90.272329
2001 100.0 96.873469 86.075067 87.530995 106.371072 91.807542
2002 100.0 95.000000 90.313561 82.699342 109.279845 94.444444
now I want to replace values larger than 110 with 'OVER' and smaller than 90 with 'UNDER'. I used the following, since I couldn't get any results with a for loop. I used lambda:
annual_rainfall_perc = annual_rainfall_perc.apply(lambda x: np.where(x > 110, 2000, x))
annual_rainfall_perc = annual_rainfall_perc.apply(lambda x: np.where(x < 90, 'UNDER', round(x, 2)))
Here I replaced all bigger values with 2000, because otherwise the second lambda won't be able to check a dataframe containing floats and strings... My dataframe now looks like the following:
P1899 P3486 P4074 P3352 P3500 P3447
Time
1997 100.0 Under Under Under 2000.0 Under
1998 100.0 101.6 Under Under 106.03 95.26
1999 100.0 97.23 91.26 Under 104.54 95.86
2000 100.0 100.76 Under Under 103.74 90.27
2001 100.0 96.87 Under Under 106.37 91.81
2002 100.0 95.0 90.31 Under 109.28 94.44
So now I was planning to replace all values equal to 2000 with 'OVER'. How do I do that?
I tried:
for x in annual_rainfall_perc:
for i in x:
if i == 2000:
annual_rainfall_perc[x][i]= 'Over'
else:
annual_rainfall_perc=annual_rainfall_perc
print(annual_rainfall_perc)
but nothing in the dataframe changed. Is there another way to this?
To replace multiple values in a DataFrame we can apply the method DataFrame. replace(). In Pandas DataFrame replace method is used to replace values within a dataframe object.
Very simple using mask
:
df.mask(df>110,'OVER').mask(df<90,'UNDER')
Result:
P1899 P3486 P4074 P3352 P3500 P3447
Time
1997 100 UNDER UNDER UNDER OVER UNDER
1998 100 101.598 UNDER UNDER 106.032 95.2638
1999 100 97.2346 91.2626 UNDER 104.539 95.86
2000 100 100.76 UNDER UNDER 103.74 90.2723
2001 100 96.8735 UNDER UNDER 106.371 91.8075
2002 100 95 90.3136 UNDER 109.28 94.4444
Here's a way to do it in a vectorized manner. Do all the strings operations in a separate data frame, and then assign the relevant values in one go:
new_df = df.copy()
new_df.loc[:, :] = " "
new_df[df > 110] = "over"
new_df[df < 90] = "under"
df[(df < 90) | (df > 110)] = new_df
The result:
P1899 P3486 P4074 P3352 P3500 P3447
Time
1997 100.0 under under under over under
1998 100.0 101.598 under under 106.032 95.2638
1999 100.0 97.2346 91.2626 under 104.539 95.86
2000 100.0 100.76 under under 103.74 90.2723
2001 100.0 96.8735 under under 106.371 91.8075
2002 100.0 95 90.3136 under 109.28 94.4444
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With