Replace a column with another column if another is not null in pandas DataFrame

Tags: python, pandas, dataframe

I'm working with a DataFrame in pandas, and I need to replace the values of one column with those of another column wherever the other column's value is not null.

My dataframe is something like:

v_4      v5          s_5      vt_5    ex_5       pfv            pfv_cat
0-50     StoreSale   Clothes  8-Apr   above 100  FatimaStore    Shoes
0-50     StoreSale   Clothes  8-Apr   0-50       DiscountWorld  Clothes
51-100   CleanShop   Clothes  4-Dec   51-100     BetterUncle    Shoes

So I want to replace v5 with pfv wherever pfv is not null. How can I achieve that?


asked Oct 09 '19 11:10

Abdul Rehman

4 Answers

Because the missing values are the string 'null', use:

df.loc[df['pfv'].ne('null'), 'v5'] = df["pfv"]
print (df)
      v_4             v5      s_5   vt_5       ex_5            pfv  pfv_cat
0    0-50      StoreSale  Clothes  8-Apr  above 100           null    Shoes
1    0-50  DiscountWorld  Clothes  8-Apr       0-50  DiscountWorld  Clothes
2  51-100    BetterUncle  Clothes  4-Dec     51-100    BetterUncle    Shoes
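A minimal self-contained sketch of the approach above, assuming a small hypothetical frame (only the v5 and pfv columns) in which the missing entry is the literal string 'null'. It also illustrates that the right-hand side df['pfv'] is aligned on the index, so only the rows selected by the mask are overwritten.

```python
import pandas as pd

# Hypothetical sample data: the missing value is the literal string 'null'
df = pd.DataFrame({
    'v5': ['StoreSale', 'StoreSale', 'CleanShop'],
    'pfv': ['null', 'DiscountWorld', 'BetterUncle'],
})

# Boolean mask: True where pfv holds a real value
mask = df['pfv'].ne('null')

# Only rows where mask is True are overwritten; the right-hand side is
# aligned on the index, so each selected row receives its own pfv value
df.loc[mask, 'v5'] = df['pfv']

print(df['v5'].tolist())
# ['StoreSale', 'DiscountWorld', 'BetterUncle']
```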

If the missing values are NaN or None (not strings), use Series.fillna:

df['v5'] = df['pfv'].fillna(df['v5'])

print (df)
      v_4             v5      s_5   vt_5       ex_5            pfv  pfv_cat
0    0-50      StoreSale  Clothes  8-Apr  above 100            NaN    Shoes
1    0-50  DiscountWorld  Clothes  8-Apr       0-50  DiscountWorld  Clothes
2  51-100    BetterUncle  Clothes  4-Dec     51-100    BetterUncle    Shoes
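A runnable sketch of the fillna variant, assuming a hypothetical frame whose pfv column mixes None and NaN as missing markers. fillna treats both as missing and, when given a Series, fills each gap from the same index position of that Series.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: pfv uses both None and np.nan for missing values
df = pd.DataFrame({
    'v5': ['StoreSale', 'StoreSale', 'CleanShop', 'StoreSale'],
    'pfv': [None, 'DiscountWorld', 'BetterUncle', np.nan],
})

# Where pfv is missing (None or NaN), fall back to the current v5 value
df['v5'] = df['pfv'].fillna(df['v5'])

print(df['v5'].tolist())
# ['StoreSale', 'DiscountWorld', 'BetterUncle', 'StoreSale']
```

Note that fillna does not treat an empty string '' as missing; such values must be converted to NaN first.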


answered Oct 18 '22 22:10

jezrael

Consider using NumPy's where function, which is vectorized and typically much faster than row-wise apply methods.

np.where is essentially an element-wise if/else for arrays: the first argument is a boolean array with the condition, the second gives the values to use where the condition is True, and the third the values to use where it is False. Here is how it would look:

import numpy as np
df['v5'] = np.where(df['pfv'].notna(), df['pfv'], df['v5'])

Good luck
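np.where returns a plain NumPy array rather than a Series, so assigning it back is positional. A minimal sketch, assuming a small hypothetical frame with NaN as the missing marker, that compares np.where with pandas' own Series.where, which performs the same selection but keeps an index-aligned Series. The columns v5_np and v5_pd are hypothetical names used only to hold the two results side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a NaN in pfv
df = pd.DataFrame({
    'v5': ['StoreSale', 'StoreSale', 'CleanShop'],
    'pfv': [np.nan, 'DiscountWorld', 'BetterUncle'],
})

# np.where(condition, value_if_true, value_if_false), element by element;
# the result is a NumPy array assigned back by position
df['v5_np'] = np.where(df['pfv'].notna(), df['pfv'], df['v5'])

# pandas-native equivalent: keep pfv where the condition holds, else take v5
df['v5_pd'] = df['pfv'].where(df['pfv'].notna(), df['v5'])

print(df[['v5_np', 'v5_pd']])
```

Both columns end up identical here; Series.where is the safer choice if the right-hand values come from a Series whose index order might differ.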

answered Oct 18 '22 22:10

YagoCaruso

My solution is the same as jezrael's, but with one extra step, based on an experiment I ran with the null problem. I've added one more row in which pfv has no value (an empty string).

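    import numpy as np
    import pandas as pd

    data = [['0-50', 'StoreSale', 'Clothes', '8-Apr', 'above 100', 'FatimaStore', 'Shoes'],
            ['0-50', 'StoreSale', 'Clothes', '8-Apr', '0-50', 'DiscountWorld', 'Clothes'],
            ['51-100', 'CleanShop', 'Clothes', '4-Dec', '51-100', 'BetterUncle', 'Shoes'],
            ['0-50', 'StoreSale', 'Clothes', '12-Apr', 'above 100', '', 'Clothes']]  # last row: empty pfv
    df = pd.DataFrame(data, columns=['v_4', 'v5', 's_5', 'vt_5', 'ex_5', 'pfv', 'pfv_cat'])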

The first step is to turn the empty strings into real nulls (NaN), because fillna does not treat '' as missing. 'df' is the DataFrame.

    df = df.replace('', np.nan)

          v_4         v5      s_5    vt_5       ex_5            pfv  pfv_cat
    0    0-50  StoreSale  Clothes   8-Apr  above 100    FatimaStore    Shoes
    1    0-50  StoreSale  Clothes   8-Apr       0-50  DiscountWorld  Clothes
    2  51-100  CleanShop  Clothes   4-Dec     51-100    BetterUncle    Shoes
    3    0-50  StoreSale  Clothes  12-Apr  above 100            NaN  Clothes

Now update the v5 column. The following command replaces v5 with pfv, but where pfv is NaN it keeps the current value of v5.

    df['v5'] = df['pfv'].fillna(df['v5'])

    print(df)

          v_4             v5      s_5    vt_5       ex_5            pfv  pfv_cat
    0    0-50    FatimaStore  Clothes   8-Apr  above 100    FatimaStore    Shoes
    1    0-50  DiscountWorld  Clothes   8-Apr       0-50  DiscountWorld  Clothes
    2  51-100    BetterUncle  Clothes   4-Dec     51-100    BetterUncle    Shoes
    3    0-50      StoreSale  Clothes  12-Apr  above 100            NaN  Clothes
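replace('', np.nan) only catches strings that are exactly empty. A sketch, assuming some hypothetical rows where pfv contains only whitespace, that uses a regular expression so that both empty and whitespace-only strings become NaN before the fillna step.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one pfv is empty, another contains only spaces
df = pd.DataFrame({
    'v5': ['StoreSale', 'StoreSale', 'CleanShop'],
    'pfv': ['', 'DiscountWorld', '   '],
})

# ^\s*$ matches strings made of zero or more whitespace characters,
# so both '' and '   ' are replaced by NaN
cleaned = df['pfv'].replace(r'^\s*$', np.nan, regex=True)

# Same fallback as above: use pfv where present, otherwise keep v5
df['v5'] = cleaned.fillna(df['v5'])

print(df['v5'].tolist())
# ['StoreSale', 'DiscountWorld', 'CleanShop']
```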

answered Oct 18 '22 22:10

powerPixie

Late to the game, but if the missing values are true nulls (not 'null' strings), you could also use

df['v5'] = df['pfv'].combine_first(df['v5'])

which is equivalent to COALESCE() in SQL.
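Like COALESCE(), combine_first takes the first non-null value per row, and chaining it extends the idea to more than two columns. A sketch, assuming a hypothetical frame with an extra fallback column named backup that is not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; 'backup' is an illustrative extra fallback column
df = pd.DataFrame({
    'pfv': [np.nan, 'DiscountWorld', np.nan],
    'v5': ['StoreSale', 'StoreSale', np.nan],
    'backup': ['X', 'Y', 'Z'],
})

# COALESCE(pfv, v5): row 2 stays NaN because both inputs are missing
two = df['pfv'].combine_first(df['v5'])

# COALESCE(pfv, v5, backup): chain combine_first from left to right
three = df['pfv'].combine_first(df['v5']).combine_first(df['backup'])

print(three.tolist())
# ['StoreSale', 'DiscountWorld', 'Z']
```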

answered Oct 19 '22 00:10

Yannick Einsweiler
