Replace a certain value in a pandas DataFrame without knowing either the column or the row

Tags: python, replace, pandas, dataframe

I would like to replace a value in my pandas DataFrame in Python (replacing a float with a string). I know the value itself, but not its column or row, and I want to rerun the replacement afterwards with different inputs. I have the following dataframe:

     P1899       P3486      P4074      P3352       P3500      P3447
Time                                                                
1997  100.0   89.745739  85.198939  87.377584  114.755270  81.131599
1998  100.0  101.597557  83.468442  86.369083  106.031629  95.263796
1999  100.0   97.234551  91.262551  88.759609  104.539337  95.859980
2000  100.0  100.759918  74.236098  88.295711  103.739557  90.272329
2001  100.0   96.873469  86.075067  87.530995  106.371072  91.807542
2002  100.0   95.000000  90.313561  82.699342  109.279845  94.444444

Now I want to replace values larger than 110 with 'OVER' and values smaller than 90 with 'UNDER'. Since I couldn't get any results with a for loop, I used apply with a lambda:

annual_rainfall_perc = annual_rainfall_perc.apply(lambda x: np.where(x > 110, 2000, x))
annual_rainfall_perc = annual_rainfall_perc.apply(lambda x: np.where(x < 90, 'UNDER', round(x, 2)))

Here I first replaced all values above 110 with 2000, because otherwise the second lambda would have to compare a dataframe containing both floats and strings. My dataframe now looks like the following:

     P1899   P3486  P4074  P3352   P3500  P3447
Time                                            
1997  100.0   Under  Under  Under  2000.0  Under
1998  100.0   101.6  Under  Under  106.03  95.26
1999  100.0   97.23  91.26  Under  104.54  95.86
2000  100.0  100.76  Under  Under  103.74  90.27
2001  100.0   96.87  Under  Under  106.37  91.81
2002  100.0    95.0  90.31  Under  109.28  94.44

So now I was planning to replace all values equal to 2000 with 'OVER'. How do I do that?

I tried:

for x in annual_rainfall_perc:
    for i in x:
        if i == 2000:
            annual_rainfall_perc[x][i]= 'Over'
        else:
            annual_rainfall_perc=annual_rainfall_perc
print(annual_rainfall_perc)

but nothing in the dataframe changed. Is there another way to do this?


asked Jun 22 '20 13:06

ma2020

2 Answers

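This is very simple using mask:

# mask() returns a new DataFrame; both conditions are evaluated on the original numeric df
df.mask(df > 110, 'OVER').mask(df < 90, 'UNDER')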

Result:

      P1899    P3486    P4074  P3352    P3500    P3447
Time                                                  
1997    100    UNDER    UNDER  UNDER     OVER    UNDER
1998    100  101.598    UNDER  UNDER  106.032  95.2638
1999    100  97.2346  91.2626  UNDER  104.539    95.86
2000    100   100.76    UNDER  UNDER   103.74  90.2723
2001    100  96.8735    UNDER  UNDER  106.371  91.8075
2002    100       95  90.3136  UNDER   109.28  94.4444
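For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same mask idea. It uses only a few rows and columns of the sample data. The names too_high, too_low and labelled are made up for illustration, and the cast to object dtype is my own assumption (so each column can hold floats and strings side by side), not part of the one-liner above.

```python
import pandas as pd

# A small subset of the question's data (three years, three stations).
df = pd.DataFrame(
    {
        "P1899": [100.0, 100.0, 100.0],
        "P3486": [89.745739, 101.597557, 97.234551],
        "P3500": [114.755270, 106.031629, 104.539337],
    },
    index=pd.Index([1997, 1998, 1999], name="Time"),
)

# Build both boolean masks from the numeric frame before any strings are introduced.
too_high = df > 110
too_low = df < 90

# mask() replaces values where the condition is True and returns a new frame.
# Casting to object first lets every column hold floats and strings together.
labelled = df.astype(object).mask(too_high, "OVER").mask(too_low, "UNDER")

print(labelled)
```

Because the masks are computed up front, there is no need for a numeric placeholder such as 2000, and df itself stays numeric so it can be reused with different thresholds.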


answered Sep 30 '22 18:09

Stef

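Here's a way to do it in a vectorized manner. Do all the string operations in a separate data frame, and then assign the relevant values in one go:

new_df = df.copy()

# Fill the copy with placeholder strings, then write the labels where the conditions hold
new_df.loc[:, :] = " "
new_df[df > 110] = "over"
new_df[df < 90] = "under"

# Copy the labels back into df only at the out-of-range positions
df[(df < 90) | (df > 110)] = new_df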

The result:

      P1899    P3486    P4074  P3352    P3500    P3447
Time                                                  
1997  100.0    under    under  under     over    under
1998  100.0  101.598    under  under  106.032  95.2638
1999  100.0  97.2346  91.2626  under  104.539    95.86
2000  100.0   100.76    under  under   103.74  90.2723
2001  100.0  96.8735    under  under  106.371  91.8075
2002  100.0       95  90.3136  under   109.28  94.4444
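A possible variation on the same idea, sketched here under my own assumptions rather than taken from the answer: build the label frame from scratch and combine it with the numbers using mask, so the original df is never modified. The names labels, out_of_range and result are hypothetical, and the sample data is a small subset of the question's table.

```python
import pandas as pd

# A small subset of the question's data.
df = pd.DataFrame(
    {
        "P1899": [100.0, 100.0, 100.0],
        "P3486": [89.745739, 101.597557, 97.234551],
        "P3500": [114.755270, 106.031629, 104.539337],
    },
    index=pd.Index([1997, 1998, 1999], name="Time"),
)

# Separate frame that only ever holds strings, with the same shape as df.
labels = pd.DataFrame("", index=df.index, columns=df.columns, dtype=object)
labels[df > 110] = "over"
labels[df < 90] = "under"

# Take the label wherever the value is out of range, keep the number otherwise.
out_of_range = (df < 90) | (df > 110)
result = df.astype(object).mask(out_of_range, labels)

print(result)
```

Keeping df numeric means the thresholds can be changed and the labelling rerun without having to rebuild the data.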

answered Sep 30 '22 17:09

Roy2012
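On the literal question in the title (swapping one known value wherever it occurs, without knowing its row or column), DataFrame.replace is another option. As for the loop in the question: iterating over a DataFrame yields its column labels, so the inner loop walks over the characters of each label string and the comparison with 2000 is never true. The sketch below uses a small made-up frame shaped like the asker's intermediate result; the variable name replaced is hypothetical.

```python
import pandas as pd

# Frame shaped like the intermediate result in the question:
# floats mixed with strings, with 2000.0 as the placeholder for "too large".
annual_rainfall_perc = pd.DataFrame(
    {
        "P1899": [100.0, 100.0],
        "P3486": ["UNDER", 101.6],
        "P3500": [2000.0, 106.03],
    },
    index=pd.Index([1997, 1998], name="Time"),
)

# replace() looks for the value in every cell, so neither row nor column is needed.
# It returns a new frame; the original is left untouched.
replaced = annual_rainfall_perc.replace(2000, "OVER")

print(replaced)
```

To run it again with different inputs, only the two arguments to replace need to change.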
