Replace column value based on value in other column

Tags:

python

pandas

So far my dataframe looks like this:

ID   Area   Stage
1    P      X
2    Q      X
3    P      X
4    Q      Y

I would like to replace the area 'Q' with 'P' for every row where the Stage is equal to 'X'.

So the result should look like:

ID   Area   Stage
1    P      X
2    P      X
3    P      X
4    Q      Y

I tried:

data.query('Stage in ["X"]')['Area']=data.query('Stage in ["X"]')['Area'].replace('Q','P')

It does not work. Help is appreciated! :)

asked Oct 04 '20 16:10

Michelle

4 Answers

You can combine two boolean conditions and use loc:

df.loc[df['Area'].eq("Q") & df['Stage'].eq('X'),'Area']='P'
print(df)

   ID Area Stage
0   1    P     X
1   2    P     X
2   3    P     X
3   4    Q     Y
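
For reference, here is a minimal self-contained sketch of the loc approach above, rebuilding the sample dataframe from the question. The variable name mask is my own choice. As far as I can tell, the attempt in the question has no effect because query returns a new DataFrame, so the assignment lands on a temporary object instead of on data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Area": ["P", "Q", "P", "Q"],
    "Stage": ["X", "X", "X", "Y"],
})

# Rows where both conditions hold: Area is 'Q' AND Stage is 'X'.
mask = df["Area"].eq("Q") & df["Stage"].eq("X")

# .loc with a boolean mask writes into df itself, not into a copy.
df.loc[mask, "Area"] = "P"

print(df["Area"].tolist())  # ['P', 'P', 'P', 'Q']
```

The parentheses-free form works here because .eq() is a method call; with the == operator each comparison would need its own parentheses around it before combining with &.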

Or use np.where (this needs import numpy as np):

df['Area'] = np.where(df['Area'].eq("Q") & df['Stage'].eq('X'),'P',df['Area'])
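
A runnable sketch of the np.where variant, again on the question's sample data. The second half uses Series.mask, which is not part of this answer; it is shown only as a pandas-native alternative that should give the same result here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Area": ["P", "Q", "P", "Q"],
    "Stage": ["X", "X", "X", "Y"],
})

cond = df["Area"].eq("Q") & df["Stage"].eq("X")

# np.where(cond, a, b): take 'P' where cond is True, else the old value.
via_where = np.where(cond, "P", df["Area"])

# Alternative (my addition): Series.mask replaces values where cond is True.
via_mask = df["Area"].mask(cond, "P")

df["Area"] = via_where
print(df["Area"].tolist())  # ['P', 'P', 'P', 'Q']
```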

answered Sep 29 '22 13:09

anky

Could you please try the following:

import pandas as pd
import numpy as np
df['Area']=np.where(df['Stage']=='X','P',df['Area'])

answered Sep 29 '22 11:09

RavinderSingh13

You can use loc to specify where you want to replace, and pass the replaced series to the assignment:

df.loc[df['Stage']=='X', 'Area'] = df['Area'].replace('Q','P')

Output:

   ID Area Stage
0   1    P     X
1   2    P     X
2   3    P     X
3   4    Q     Y
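
A short sketch of why this works, assuming the sample data from the question. The right-hand side is the whole column with every 'Q' replaced; the loc assignment then aligns it on the index and only writes into the rows selected on the left. The name replaced is mine.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Area": ["P", "Q", "P", "Q"],
    "Stage": ["X", "X", "X", "Y"],
})

# Full-length Series: 'Q' is replaced in ALL rows, including the 'Y' row.
replaced = df["Area"].replace("Q", "P")
print(replaced.tolist())  # ['P', 'P', 'P', 'P']

# Only rows with Stage == 'X' are written; values are matched by index label,
# so the last row (Stage 'Y') keeps its original 'Q'.
df.loc[df["Stage"] == "X", "Area"] = replaced
print(df["Area"].tolist())  # ['P', 'P', 'P', 'Q']
```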

answered Sep 29 '22 12:09

Quang Hoang

Note: this is not an answer proposing a new method, but a comparison of the execution time each existing proposal needs.

All the proposals in the answers are quite 'magic', doing the job in one line of code thanks to pandas/numpy. Getting the job done is good, but doing it quickly is better, so I wanted to compare the execution time of each.

Here is my program. In each loop I modify the dataframe twice so that it is unchanged from one iteration to the next (I am not a Python programmer like you, so sorry in advance if the approach is 'poor'):

import pandas as pd
import numpy as np
import time

df=pd.DataFrame({'ID' : [i for i in range(1,1000)],
                 'Area' : ['P' if (i & 1) else 'Q' for i in range(1,1000)],
                 'Stage' : [ 'X' if (i & 2) else 'Y' for i in range(1,1000)]})

t0=time.process_time()
for i in range(1,100):
    df.loc[df['Stage']=='X', 'Area'] = df['Area'].replace('Q','q')
    df.loc[df['Stage']=='X', 'Area'] = df['Area'].replace('q','Q')

print("Quang Hoang", '%.2f' % (time.process_time() - t0))

t0=time.process_time()
for i in range(1,100):
    df.loc[df['Stage'] == 'X', 'Area'] = 'q'
    df.loc[df['Stage'] == 'X', 'Area'] = 'Q'

print("Joe Ferndz", '%.2f' % (time.process_time() - t0))

t0=time.process_time()
for i in range(1,100):
    df.loc[df['Area'].eq("Q") & df['Stage'].eq('X'),'Area']='q'
    df.loc[df['Area'].eq("q") & df['Stage'].eq('X'),'Area']='Q'

print("anky 1", '%.2f' % (time.process_time() - t0))

t0=time.process_time()
for i in range(1,100):
    df['Area'] = np.where(df['Area'].eq("Q") & df['Stage'].eq('X'),'q',df['Area'])
    df['Area'] = np.where(df['Area'].eq("q") & df['Stage'].eq('X'),'Q',df['Area'])

print("anky 2", '%.2f' % (time.process_time() - t0))

t0=time.process_time()
for i in range(1,100):
    df['Area']=np.where(df['Stage']=='X','q',df['Area'])
    df['Area']=np.where(df['Stage']=='X','Q',df['Area'])

print("RavinderSingh13", '%.2f' % (time.process_time() - t0))

On my Pi 4 the result (in seconds, as returned by time.process_time) is:

Quang Hoang 1.60
Joe Ferndz 1.12
anky 1 1.55
anky 2 0.86
RavinderSingh13 0.38

If I use a dataframe with 100000 rows rather than 1000, the result is:

Quang Hoang 10.79
Joe Ferndz 6.61
anky 1 10.91
anky 2 9.64
RavinderSingh13 4.75
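
For anyone who wants to repeat this kind of measurement, here is a hedged sketch using the standard timeit module instead of manual process_time bookkeeping. It is not the program that produced the numbers above: the helper names with_loc and with_where are hypothetical, each call works on a copy so the original frame is never modified, and the cost of that copy is included in the timings.

```python
import timeit

import numpy as np
import pandas as pd

n = 1000  # number of rows; raise this to test larger frames
df = pd.DataFrame({
    "ID": range(1, n + 1),
    "Area": ["P" if i & 1 else "Q" for i in range(1, n + 1)],
    "Stage": ["X" if i & 2 else "Y" for i in range(1, n + 1)],
})

def with_loc(frame):
    # Two-condition boolean mask with .loc, applied to a copy.
    out = frame.copy()
    out.loc[out["Area"].eq("Q") & out["Stage"].eq("X"), "Area"] = "P"
    return out

def with_where(frame):
    # Same two conditions with np.where, applied to a copy.
    out = frame.copy()
    cond = out["Area"].eq("Q") & out["Stage"].eq("X")
    out["Area"] = np.where(cond, "P", out["Area"])
    return out

for func in (with_loc, with_where):
    seconds = timeit.timeit(lambda: func(df), number=100)
    print(f"{func.__name__}: {seconds:.3f} s for 100 runs")
```

Absolute numbers will depend on the machine and on the pandas/numpy versions, so only the relative ordering is worth comparing.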

Note that the proposals from Joe Ferndz and RavinderSingh13 assume Area only contains 'P' or 'Q'.
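
To illustrate that assumption, here is a small sketch with a third, made-up area 'R' that is not in the question's data. Testing only Stage overwrites 'R' as well, while testing both columns leaves it alone.

```python
import numpy as np
import pandas as pd

# 'R' is a hypothetical extra area added for illustration.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Area": ["P", "Q", "R", "Q", "R"],
    "Stage": ["X", "X", "X", "Y", "Y"],
})

# Condition on Stage only: every 'X' row becomes 'P', including the 'R' row.
stage_only = np.where(df["Stage"] == "X", "P", df["Area"])

# Condition on both columns: only 'Q' rows in stage 'X' become 'P'.
both = np.where(df["Area"].eq("Q") & df["Stage"].eq("X"), "P", df["Area"])

print(stage_only.tolist())  # ['P', 'P', 'P', 'Q', 'R']
print(both.tolist())        # ['P', 'P', 'R', 'Q', 'R']
```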

answered Sep 29 '22 11:09

bruno
