I'm working with a DataFrame in pandas in which I have to replace the values of one column wherever another column's value is not null.
My DataFrame looks something like this:
v_4     v5         s_5      vt_5   ex_5       pfv            pfv_cat
0-50    StoreSale  Clothes  8-Apr  above 100  FatimaStore    Shoes
0-50    StoreSale  Clothes  8-Apr  0-50       DiscountWorld  Clothes
51-100  CleanShop  Clothes  4-Dec  51-100     BetterUncle    Shoes
So, I want to replace v5 with pfv wherever pfv is not null. How can I achieve that?
You can replace values of all or selected columns based on a condition by using the DataFrame.loc[] property. loc[] accesses a group of rows and columns by label(s) or by a boolean array, and it can be used both to read and to modify the values of a pandas DataFrame.
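A minimal sketch of conditional assignment with .loc[] and a boolean mask. The frame and the column names price and band are hypothetical, chosen only to illustrate the pattern:

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({"price": [10, 250, 40], "band": ["low", "low", "low"]})

# The boolean mask selects rows; the second .loc argument selects the column.
mask = df["price"] > 100
df.loc[mask, "band"] = "high"

print(df)
```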
To replace a specific value in a pandas DataFrame, use the replace() method with the column and the "from" and "to" values.
The .replace() method is extremely powerful: it lets you replace values in a single column, in multiple columns, or across an entire DataFrame. It also supports regular expressions, which makes complex replacements easier. For more details, see the official pandas documentation for .replace().
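A small sketch of .replace() applied to a whole DataFrame, first with an exact value and then with a regular expression (regex=True). The frame reuses two columns from the question's sample; the replacement values "08-Apr" and ">" are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "vt_5": ["8-Apr", "8-Apr", "4-Dec"],
    "ex_5": ["above 100", "0-50", "51-100"],
})

# Exact-value replace across every column of the DataFrame.
df2 = df.replace("8-Apr", "08-Apr")

# Regex replace: the matched part ("above " at the start) is substituted.
df3 = df.replace(r"^above\s+", ">", regex=True)

print(df2)
print(df3)
```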
Another way to replace column values in a pandas DataFrame is the Series.replace() method. Syntax:
Replace a single value: df[column_name].replace([old_value], new_value)
Replace multiple values with the same value: df[column_name].replace([old_value1, old_value2, old_value3], new_value)
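The Series.replace() forms above, run on a small hypothetical Series. The dict form, which maps each old value to its own new value, is an additional standard option of the same method:

```python
import pandas as pd

s = pd.Series(["StoreSale", "CleanShop", "StoreSale", "null"])

# Replace one value.
one = s.replace(["null"], "unknown")

# Replace several values with the same new value.
many = s.replace(["StoreSale", "CleanShop"], "Shop")

# A dict maps each old value to its own replacement.
mapped = s.replace({"StoreSale": "Store", "CleanShop": "Clean"})

print(one.tolist(), many.tolist(), mapped.tolist())
```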
To replace values using another DataFrame whose index differs, you can use the pandas merge function to bring in values and columns from that DataFrame. For this you need a reference column shared by both DataFrames, or you can join on the index.
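A minimal sketch of pulling values from a second DataFrame with merge and then keeping them only where they exist. The shared key store_id and both frames are hypothetical; a left join keeps every row of the first frame and leaves NaN where there is no match:

```python
import pandas as pd

# Hypothetical frames sharing the reference column "store_id".
left = pd.DataFrame({"store_id": [1, 2, 3], "v5": ["StoreSale", "StoreSale", "CleanShop"]})
right = pd.DataFrame({"store_id": [3, 1], "pfv": ["BetterUncle", "FatimaStore"]})

# Left join on the reference column; unmatched rows get NaN in "pfv".
merged = left.merge(right, on="store_id", how="left")

# Take pfv where present, otherwise keep the existing v5.
merged["v5"] = merged["pfv"].fillna(merged["v5"])

print(merged)
```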
Pandas makes importing and analyzing data much easier. Sometimes a CSV file contains null values, which are displayed as NaN in the DataFrame. Just as the pandas dropna() method removes null values from a DataFrame, fillna() lets you replace NaN values with a value of your own.
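A short contrast of dropna() and fillna() on a hypothetical frame with one fully missing row; fillna() here receives a dict so each column gets its own fill value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "pfv": ["FatimaStore", np.nan, "BetterUncle"],
    "score": [1.0, np.nan, 3.0],
})

# dropna() removes rows that contain NaN...
dropped = df.dropna()

# ...while fillna() replaces NaN, here with a per-column value.
filled = df.fillna({"pfv": "unknown", "score": 0.0})

print(dropped)
print(filled)
```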
Because the missing values are the string 'null', use:
df.loc[df['pfv'].ne('null'), 'v5'] = df["pfv"]
print (df)
      v_4             v5      s_5   vt_5       ex_5            pfv  pfv_cat
0    0-50      StoreSale  Clothes  8-Apr  above 100           null    Shoes
1    0-50  DiscountWorld  Clothes  8-Apr       0-50  DiscountWorld  Clothes
2  51-100    BetterUncle  Clothes  4-Dec     51-100    BetterUncle    Shoes
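A self-contained sketch of the .loc approach, assuming (as in the output above) that the first row's pfv holds the literal string 'null'. Only the v5 and pfv columns are rebuilt here. Restricting the right-hand side to the same mask is equivalent to assigning df['pfv'], since the assignment aligns on the index:

```python
import pandas as pd

# Cut-down sample; 'null' is a literal string, not a real missing value.
df = pd.DataFrame({
    "v5": ["StoreSale", "StoreSale", "CleanShop"],
    "pfv": ["null", "DiscountWorld", "BetterUncle"],
})

# Rows whose pfv is not the string 'null'.
mask = df["pfv"].ne("null")

# Overwrite v5 only on those rows.
df.loc[mask, "v5"] = df.loc[mask, "pfv"]

print(df)
```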
If the missing values are NaN or None (not strings), use Series.fillna:
df['v5'] = df['pfv'].fillna(df['v5'])
print (df)
      v_4             v5      s_5   vt_5       ex_5            pfv  pfv_cat
0    0-50      StoreSale  Clothes  8-Apr  above 100            NaN    Shoes
1    0-50  DiscountWorld  Clothes  8-Apr       0-50  DiscountWorld  Clothes
2  51-100    BetterUncle  Clothes  4-Dec     51-100    BetterUncle    Shoes
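A minimal check, on a cut-down hypothetical frame, that fillna treats both None and NaN as missing, so either kind of null keeps the existing v5 value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "v5": ["StoreSale", "StoreSale", "CleanShop", "StoreSale"],
    "pfv": [None, "DiscountWorld", "BetterUncle", np.nan],
})

# Where pfv is None or NaN, fall back to the current v5 (aligned on the index).
df["v5"] = df["pfv"].fillna(df["v5"])

print(df)
```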
You should consider using the NumPy where function, which is much faster than row-wise apply approaches.
np.where is basically an if/else for arrays: the first argument is the condition, the second is the value where the condition is true, and the third is the value where it is false. Here is how it would look:
import numpy as np
# Take pfv where it is not null, otherwise keep v5
df['v5'] = np.where(df['pfv'].notna(), df['pfv'], df['v5'])
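A runnable version of the np.where approach on a cut-down frame (only v5 and pfv, with NaN in the first row standing in for a missing pfv). Series.notna() is the same test as ~isnull(). Like fillna, this only treats real NaN/None as missing, not the string 'null':

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "v5": ["StoreSale", "StoreSale", "CleanShop"],
    "pfv": [np.nan, "DiscountWorld", "BetterUncle"],
})

# Element-wise: np.where(condition, value_if_true, value_if_false).
df["v5"] = np.where(df["pfv"].notna(), df["pfv"], df["v5"])

print(df)
```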
Good luck
My solution is the same as jezrael's, but with one more step, based on a test I ran on the null problem. I've added one more row with an empty pfv value.
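import numpy as np
import pandas as pd
data = [['0-50','StoreSale','Clothes','8-Apr','above 100','FatimaStore','Shoes'],
        ['0-50','StoreSale','Clothes','8-Apr','0-50','DiscountWorld','Clothes'],
        ['51-100','CleanShop','Clothes','4-Dec','51-100','BetterUncle','Shoes'],
        ['0-50','StoreSale','Clothes','12-Apr','above 100','','Clothes']]
# Column names as in the question
df = pd.DataFrame(data, columns=['v_4','v5','s_5','vt_5','ex_5','pfv','pfv_cat'])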
The first step is to handle the nulls by converting empty strings to NaN ('df' is the DataFrame built from data):
df = df.replace('', np.nan)
print(df)
      v_4         v5      s_5    vt_5       ex_5            pfv  pfv_cat
0    0-50  StoreSale  Clothes   8-Apr  above 100    FatimaStore    Shoes
1    0-50  StoreSale  Clothes   8-Apr       0-50  DiscountWorld  Clothes
2  51-100  CleanShop  Clothes   4-Dec     51-100    BetterUncle    Shoes
3    0-50  StoreSale  Clothes  12-Apr  above 100            NaN  Clothes
Now let's update the v5 column. This line replaces v5 with pfv, but where pfv is NaN it keeps the current value of v5.
df['v5'] = df['pfv'].fillna(df['v5'])
print(df)
      v_4             v5      s_5    vt_5       ex_5            pfv  pfv_cat
0    0-50    FatimaStore  Clothes   8-Apr  above 100    FatimaStore    Shoes
1    0-50  DiscountWorld  Clothes   8-Apr       0-50  DiscountWorld  Clothes
2  51-100    BetterUncle  Clothes   4-Dec     51-100    BetterUncle    Shoes
3    0-50      StoreSale  Clothes  12-Apr  above 100            NaN  Clothes
Late to the game, but if the missing values are true nulls (not 'null' strings), you could also use:
df['v5'] = df['pfv'].combine_first(df['v5'])
which is equivalent to COALESCE() in SQL.
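A small sketch of combine_first as a COALESCE(pfv, v5) on a cut-down frame (NaN in row 1 stands in for a missing pfv). For two Series that share the same index, it gives the same result as the fillna answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "v5": ["StoreSale", "StoreSale", "CleanShop"],
    "pfv": ["FatimaStore", np.nan, "BetterUncle"],
})

# Per row: the first non-null of pfv, then v5 (like SQL COALESCE(pfv, v5)).
coalesced = df["pfv"].combine_first(df["v5"])

# The fillna answer, for comparison.
via_fillna = df["pfv"].fillna(df["v5"])

df["v5"] = coalesced
print(df)
```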