Replace a column with another column if another is not null in pandas DataFrame

I'm working with a pandas DataFrame in which I have to replace a column's value whenever another column's value is not null.

My dataframe is something like:

v_4        v5             s_5     vt_5     ex_5          pfv           pfv_cat
0-50      StoreSale     Clothes   8-Apr   above 100   FatimaStore       Shoes
0-50      StoreSale     Clothes   8-Apr   0-50        DiscountWorld     Clothes
51-100    CleanShop     Clothes   4-Dec   51-100      BetterUncle       Shoes

So, I want to replace v5 with pfv wherever pfv is not null. How can I achieve that?

Abdul Rehman asked Oct 09 '19 11:10


4 Answers

If the missing values are the string 'null', use:

df.loc[df['pfv'].ne('null'), 'v5'] = df["pfv"]
print (df)
      v_4             v5      s_5   vt_5       ex_5            pfv  pfv_cat
0    0-50      StoreSale  Clothes  8-Apr  above 100           null    Shoes
1    0-50  DiscountWorld  Clothes  8-Apr       0-50  DiscountWorld  Clothes
2  51-100    BetterUncle  Clothes  4-Dec     51-100    BetterUncle    Shoes
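
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The two-column frame is rebuilt by hand from the question, and the 'null' string in the first row is an assumption made to match the printed output above.

```python
import pandas as pd

# Sample data rebuilt from the question; the 'null' string in row 0 is an assumption.
df = pd.DataFrame({
    "v5": ["StoreSale", "StoreSale", "CleanShop"],
    "pfv": ["null", "DiscountWorld", "BetterUncle"],
})

# Boolean mask: True where pfv holds a real value rather than the string 'null'.
has_value = df["pfv"].ne("null")

# .loc aligns the right-hand Series on the index, so only the masked rows change.
df.loc[has_value, "v5"] = df["pfv"]

print(df)
```

Rows where the mask is False keep their original v5 value.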

If the missing values are NaN or None (not strings), use Series.fillna:

df['v5'] = df['pfv'].fillna(df['v5'])

print (df)
      v_4             v5      s_5   vt_5       ex_5            pfv  pfv_cat
0    0-50      StoreSale  Clothes  8-Apr  above 100            NaN    Shoes
1    0-50  DiscountWorld  Clothes  8-Apr       0-50  DiscountWorld  Clothes
2  51-100    BetterUncle  Clothes  4-Dec     51-100    BetterUncle    Shoes
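
A runnable sketch of the fillna variant, assuming a hand-built two-column frame in which pfv contains one NaN and one None (both placed there only for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative data; the NaN and None entries are assumptions, not the asker's data.
df = pd.DataFrame({
    "v5": ["StoreSale", "StoreSale", "CleanShop"],
    "pfv": [np.nan, "DiscountWorld", None],
})

# Start from pfv and fill its missing entries with the v5 value from the same row.
# fillna treats both NaN and None as missing and matches rows by index label.
df["v5"] = df["pfv"].fillna(df["v5"])

print(df["v5"].tolist())
```

The pfv column itself is left untouched; only v5 is reassigned.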

jezrael answered Oct 18 '22 22:10


You should consider using the numpy where function, which runs much faster than all the apply methods.

where is basically an if-else function for vectors. The first argument is a vector holding the condition, the second is the value to use where the condition is true, and the third is the value to use where it is false. Here is how it would look:

import numpy as np
# The column in the question's table is named v5 (not v_5)
df['v5'] = np.where(df['pfv'].notna(), df['pfv'], df['v5'])
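
As a minimal self-contained sketch of that call, using the v5 column name from the question's table and a hand-built frame with one NaN in pfv (the NaN is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative data; the NaN in row 0 is an assumption.
df = pd.DataFrame({
    "v5": ["StoreSale", "StoreSale", "CleanShop"],
    "pfv": [np.nan, "DiscountWorld", "BetterUncle"],
})

# np.where(condition, value_if_true, value_if_false), applied element by element.
# Note that it works by position and returns a plain NumPy array, not a Series.
df["v5"] = np.where(df["pfv"].notna(), df["pfv"], df["v5"])

print(df)
```

Because np.where ignores index labels, both columns should come from the same DataFrame, as they do here.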

Good luck

YagoCaruso answered Oct 18 '22 22:10


My solution is the same as jezrael's, but with one more step, based on a test I ran on the null problem. I've added one more row in which pfv has no value.

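    import numpy as np
    import pandas as pd

    data = [['0-50','StoreSale','Clothes','8-Apr','above 100','FatimaStore','Shoes'],
    ['0-50','StoreSale','Clothes','8-Apr','0-50','DiscountWorld','Clothes'],
    ['51-100','CleanShop','Clothes','4-Dec','51-100','BetterUncle','Shoes'],
    ['0-50','StoreSale','Clothes','12-Apr','above 100','','Clothes']]
    # Column names taken from the table in the question
    df = pd.DataFrame(data, columns=['v_4','v5','s_5','vt_5','ex_5','pfv','pfv_cat'])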

First step is to handle nulls. 'df' is the DataFrame.

    df = df.replace('', np.nan)

          v_4         v5      s_5    vt_5       ex_5            pfv  pfv_cat
    0    0-50  StoreSale  Clothes   8-Apr  above 100    FatimaStore    Shoes
    1    0-50  StoreSale  Clothes   8-Apr       0-50  DiscountWorld  Clothes
    2  51-100  CleanShop  Clothes   4-Dec     51-100    BetterUncle    Shoes
    3    0-50  StoreSale  Clothes  12-Apr  above 100            NaN  Clothes

Now let's update the v5 column. The statement takes pfv as the new value of v5, but wherever pfv is NaN it keeps the current value of v5.

    df['v5'] = df['pfv'].fillna(df['v5'])


    print(df)

          v_4             v5      s_5    vt_5       ex_5            pfv  pfv_cat
    0    0-50    FatimaStore  Clothes   8-Apr  above 100    FatimaStore    Shoes
    1    0-50  DiscountWorld  Clothes   8-Apr       0-50  DiscountWorld  Clothes
    2  51-100    BetterUncle  Clothes   4-Dec     51-100    BetterUncle    Shoes
    3    0-50      StoreSale  Clothes  12-Apr  above 100            NaN  Clothes

powerPixie answered Oct 18 '22 22:10


Late to the game, but if the values are true nulls (not 'null' strings), you could also use

df['v5'] = df['pfv'].combine_first(df['v5'])

which is equivalent to COALESCE() in SQL.
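
A small runnable sketch of that COALESCE-style behaviour, assuming a hand-built frame with a single NaN in pfv (illustrative data, not the asker's):

```python
import numpy as np
import pandas as pd

# Illustrative data; the NaN in row 0 is an assumption.
df = pd.DataFrame({
    "v5": ["StoreSale", "StoreSale", "CleanShop"],
    "pfv": [np.nan, "DiscountWorld", "BetterUncle"],
})

# Keep pfv where it is non-null, otherwise take v5: roughly COALESCE(pfv, v5).
coalesced = df["pfv"].combine_first(df["v5"])
df["v5"] = coalesced

print(df)
```

When both Series share the same index, as here, this gives the same values as fillna. The difference shows up with mismatched indexes: combine_first returns the union of the two indexes, while fillna keeps the index of the calling Series.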

Yannick Einsweiler answered Oct 19 '22 00:10