Take the difference between two columns of a pandas DataFrame based on a condition in Python

I have a dataframe named pricecomp_df. I want to compare the "market price" column with each of the other price columns ("apple price", "mangoes price", "watermelon price"), but take only one difference per row, chosen by priority: watermelon price first, mangoes price second, apple price third. The input dataframe is given below:

   code  apple price  mangoes price  watermelon price  market price
0   101          101            NaN               NaN           122
1   102          123            123               NaN           124
2   103          NaN            NaN               NaN           123
3   105          123            167               NaN           154
4   107          165            NaN               177           176
5   110          123            NaN               NaN           123

In the first row, only the apple price and the market price are present, so I take their difference. In the second row, both the apple and mangoes prices are present, so I should take only the difference between the market price and the mangoes price. The other rows should be handled the same way, following the priority order. Rows where all three prices are NaN should be skipped. Can anyone help with this?

How do you find the difference between two columns in a pandas DataFrame?

The difference between rows or columns of a pandas DataFrame is found using the diff() method. The axis parameter decides whether the difference is calculated between rows or between columns. When the periods parameter is positive, each row has the row that many positions before it subtracted from it.

What does diff() do in pandas?

The diff() method returns a DataFrame with the difference between the values of each row and, by default, the previous row. Which row to compare with can be specified with the periods parameter.
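As a small illustration of the axis and periods parameters described above, here is a sketch on a made-up two-column frame (the column names a and b and their values are only for illustration). Note that diff() always compares neighbouring rows or columns, so on its own it does not apply a priority rule like the one asked about in the question.

```python
import pandas as pd

# Hypothetical toy data, not taken from the question.
df = pd.DataFrame({'a': [1, 4, 9], 'b': [2, 6, 12]})

# Default: each row minus the previous row (the first row becomes NaN).
row_diff = df.diff()

# axis=1: each column minus the column to its left (the first column becomes NaN).
col_diff = df.diff(axis=1)

# Negative periods: each row minus the following row (the last row becomes NaN).
back_diff = df.diff(periods=-1)

print(row_diff)
print(col_diff)
print(back_diff)
```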

Hope I'm not too late. The idea is to calculate the differences and overwrite them according to your priority list.

import numpy as np
import pandas as pd

df = pd.DataFrame({'code': [101, 102, 103, 105, 107, 110],
                   'apple price': [101, 123, np.nan, 123, 165, 123],
                   'mangoes price': [np.nan, 123, np.nan, 167, np.nan, np.nan],
                   'watermelon price': [np.nan, np.nan, np.nan, np.nan, 177, np.nan],
                   'market price': [122, 124, 123, 154, 176, 123]})

# Calculate difference to apple price
df['diff'] = df['market price'] - df['apple price']
# Overwrite with difference to mangoes price
df['diff'] = df.apply(lambda x: x['market price'] - x['mangoes price'] if not np.isnan(x['mangoes price']) else x['diff'], axis=1)
# Overwrite with difference to watermelon price
df['diff'] = df.apply(lambda x: x['market price'] - x['watermelon price'] if not np.isnan(x['watermelon price']) else x['diff'], axis=1)

print(df)
   apple price  code  mangoes price  market price  watermelon price  diff
0          101   101            NaN           122               NaN    21
1          123   102            123           124               NaN     1
2          NaN   103            NaN           123               NaN   NaN
3          123   105            167           154               NaN   -13
4          165   107            NaN           176               177    -1
5          123   110            NaN           123               NaN     0
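For what it's worth, the same priority rule can probably be written without apply. The sketch below is an alternative, not the method used in the answer above: it lists the price columns from highest to lowest priority, back-fills across them so the first column holds the highest-priority price that is present, and then drops the rows where no price exists at all (the "skip" part of the question). The names priority, chosen and result are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'code': [101, 102, 103, 105, 107, 110],
                   'apple price': [101, 123, np.nan, 123, 165, 123],
                   'mangoes price': [np.nan, 123, np.nan, 167, np.nan, np.nan],
                   'watermelon price': [np.nan, np.nan, np.nan, np.nan, 177, np.nan],
                   'market price': [122, 124, 123, 154, 176, 123]})

# Price columns ordered from highest to lowest priority.
priority = ['watermelon price', 'mangoes price', 'apple price']

# Back-fill along each row, so the first column ends up holding the
# highest-priority price that is not NaN (or NaN if all three are missing).
chosen = df[priority].bfill(axis=1).iloc[:, 0]

df['diff'] = df['market price'] - chosen

# Skip the rows where all three prices are NaN.
result = df.dropna(subset=['diff'])

print(result)
```

Adding or reordering a price column then only means editing the priority list.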

Tags:

python

diff

conditional

User1090

1 Answers

MERose
