Take the difference between two columns of a pandas DataFrame based on a condition in Python

I have a dataframe named pricecomp_df. I want to compare the "market price" column against each of the other price columns ("apple price", "mangoes price", "watermelon price"), but take only one difference per row, chosen by priority: watermelon price first, then mangoes price, then apple price. The input dataframe is given below:

   code  apple price  mangoes price  watermelon price  market price
0   101          101            NaN               NaN           122
1   102          123            123               NaN           124
2   103          NaN            NaN               NaN           123
3   105          123            167               NaN           154
4   107          165            NaN               177           176
5   110          123            NaN               NaN           123

In the first row only the apple price and the market price are present, so I take their difference. In the second row both the apple and mangoes prices are present, so I take only the difference between the market price and the mangoes price. The remaining rows follow the same priority rule. Rows where all three fruit prices are NaN should be skipped. Can anyone help with this?

asked Apr 13 '16 at 04:04 by User1090
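The priority rule in the question amounts to picking, for each row, the first non-missing price in the order watermelon, mangoes, apple, and subtracting it from the market price. A minimal sketch of that selection step, assuming the column names shown in the table and using `DataFrame.bfill(axis=1)` over the price columns reordered by priority (a technique swapped in here, not the one used in the answer below):

```python
import numpy as np
import pandas as pd

# Sample data taken from the table in the question
pricecomp_df = pd.DataFrame({
    'code': [101, 102, 103, 105, 107, 110],
    'apple price': [101, 123, np.nan, 123, 165, 123],
    'mangoes price': [np.nan, 123, np.nan, 167, np.nan, np.nan],
    'watermelon price': [np.nan, np.nan, np.nan, np.nan, 177, np.nan],
    'market price': [122, 124, 123, 154, 176, 123],
})

# Price columns ordered from highest to lowest priority
priority = ['watermelon price', 'mangoes price', 'apple price']

# bfill(axis=1) copies each row's next non-NaN value leftwards, so the
# first column ends up holding the highest-priority price that exists
chosen_price = pricecomp_df[priority].bfill(axis=1).iloc[:, 0]

# Difference against market price; NaN where no fruit price exists
pricecomp_df['diff'] = pricecomp_df['market price'] - chosen_price
print(pricecomp_df[['code', 'diff']])
```

The order of the `priority` list is what encodes the rule, so reordering it changes which price wins when several are present.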





1 Answer

Hope I'm not too late. The idea is to compute the difference against the lowest-priority price first and then overwrite it with the difference against each higher-priority price wherever that price is available.

import numpy as np
import pandas as pd

df = pd.DataFrame({'code': [101, 102, 103, 105, 107, 110],
                   'apple price': [101, 123, np.nan, 123, 165, 123],
                   'mangoes price': [np.nan, 123, np.nan, 167, np.nan, np.nan],
                   'watermelon price': [np.nan, np.nan, np.nan, np.nan, 177, np.nan],
                   'market price': [122, 124, 123, 154, 176, 123]})

# Calculate difference to apple price
df['diff'] = df['market price'] - df['apple price']
# Overwrite with difference to mangoes price
df['diff'] = df.apply(lambda x: x['market price'] - x['mangoes price'] if not np.isnan(x['mangoes price']) else x['diff'], axis=1)
# Overwrite with difference to watermelon price
df['diff'] = df.apply(lambda x: x['market price'] - x['watermelon price'] if not np.isnan(x['watermelon price']) else x['diff'], axis=1)

print(df)
   apple price  code  mangoes price  market price  watermelon price  diff
0          101   101            NaN           122               NaN    21
1          123   102            123           124               NaN     1
2          NaN   103            NaN           123               NaN   NaN
3          123   105            167           154               NaN   -13
4          165   107            NaN           176               177    -1
5          123   110            NaN           123               NaN     0
answered Sep 19 '22 at 05:09 by MERose
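The answer's code leaves NaN in `diff` for the row where all three fruit prices are missing, while the question asks to skip such rows. A sketch of the same overwrite idea without `apply`, assuming `Series.fillna` is used to keep a lower-priority difference only where the higher-priority one is NaN, followed by `dropna(subset=..., how='all')` on the three price columns (the name `price_cols` is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'code': [101, 102, 103, 105, 107, 110],
                   'apple price': [101, 123, np.nan, 123, 165, 123],
                   'mangoes price': [np.nan, 123, np.nan, 167, np.nan, np.nan],
                   'watermelon price': [np.nan, np.nan, np.nan, np.nan, 177, np.nan],
                   'market price': [122, 124, 123, 154, 176, 123]})

# Lowest priority first: difference to apple price
diff = df['market price'] - df['apple price']
# fillna(diff) keeps the mangoes difference where it exists and falls
# back to the apple difference where mangoes price is NaN
diff = (df['market price'] - df['mangoes price']).fillna(diff)
# Same step for the highest priority, watermelon price
diff = (df['market price'] - df['watermelon price']).fillna(diff)
df['diff'] = diff

# Skip rows where all three fruit prices are NaN (code 103 here)
price_cols = ['apple price', 'mangoes price', 'watermelon price']
df = df.dropna(subset=price_cols, how='all')

print(df)
```

Computing `diff` before calling `dropna` means no column is assigned on the filtered frame, which sidesteps pandas' chained-assignment warnings.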
