Conditionally fill column with value from another DataFrame based on row match in Pandas

Tags: python, pandas

I'm stuck on the following problem (automating tax paperwork). I have two DataFrames: one with the historical EUR/USD exchange rates for the quarter, and another with my own invoices. For example:

import pandas as pd
import numpy as np

usdeur = [(pd.Timestamp('20170705'),1.1329),
          (pd.Timestamp('20170706'),1.1385),
          (pd.Timestamp('20170707'),1.1412),
          (pd.Timestamp('20170710'),1.1387),
          (pd.Timestamp('20170711'),1.1405),
          (pd.Timestamp('20170712'),1.1449)]
labels = ['Date', 'Rate']
rates = pd.DataFrame.from_records(usdeur, columns=labels)

transactions = [(pd.Timestamp('20170706'), 'PayPal',     'USD', 100, 1),
                (pd.Timestamp('20170706'), 'Fastspring', 'USD', 200, 1),
                (pd.Timestamp('20170709'), 'Fastspring', 'USD', 100, 1),
                (pd.Timestamp('20170710'), 'EU',         'EUR', 100, 1),
                (pd.Timestamp('20170710'), 'PayPal',     'USD', 200, 1)]
labels = ['Date', 'From', 'Currency', 'Amount', 'Rate']
sales = pd.DataFrame.from_records(transactions, columns=labels)

resulting in:

[image: the rates and sales DataFrames]

I need to fill the sales['Rate'] column with the matching exchange rates from rates['Rate'], that is to say:

  • if sales['Currency'] is 'EUR', leave it alone.
  • for each row of sales, find the row in rates with matching 'Date'; grab that very rates['Rate'] value and put it in sales['Rate']
  • bonus: if there's no matching 'Date' (e.g. during holidays, the exchange market is closed), check the previous row until a suitable value is found.

The full result should look like the following (note that row #2 has the rate from 2017-07-07):

[image: processed result]

I've tried several solutions suggested in other questions, but with no luck. Thank you very much in advance.

asked Feb 03 '26 22:02 by Davide Barranca


1 Answer

You can expand your rates DataFrame to include every calendar date and forward-fill the missing rates, add a "Currency" column to it, and then join the two DataFrames on both the date and currency columns.

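# Build a calendar with every day in the period, including weekends and holidays.
idx = pd.DataFrame(pd.date_range('2017-07-05', '2017-07-12'), columns=['Date'])
# Left-join the known rates onto the calendar; days without a quote get NaN.
rates = pd.merge(idx, rates, how="left", on="Date")
rates['Currency'] = 'USD'
# Carry the last known rate forward over the gaps.
rates['Rate'] = rates['Rate'].ffill()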

        Date    Rate Currency
0 2017-07-05  1.1329      USD
1 2017-07-06  1.1385      USD
2 2017-07-07  1.1412      USD
3 2017-07-08  1.1412      USD
4 2017-07-09  1.1412      USD
5 2017-07-10  1.1387      USD
6 2017-07-11  1.1405      USD
7 2017-07-12  1.1449      USD
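
The date range above is hard-coded. As a minimal sketch (not part of the original answer), the same daily calendar could probably be derived from the data itself with asfreq('D'), assuming the 'Date' values in rates are unique. The name daily_rates is only illustrative:

```python
import pandas as pd

# Same sample rates as in the question.
rates = pd.DataFrame({
    'Date': pd.to_datetime(['2017-07-05', '2017-07-06', '2017-07-07',
                            '2017-07-10', '2017-07-11', '2017-07-12']),
    'Rate': [1.1329, 1.1385, 1.1412, 1.1387, 1.1405, 1.1449],
})

daily_rates = (
    rates.set_index('Date')
         .asfreq('D')   # add a row (NaN rate) for every missing calendar day
         .ffill()       # carry the last known rate forward
         .reset_index()
)
daily_rates['Currency'] = 'USD'  # assumption: every quote here is a USD rate
```

This only covers dates between the first and last quote in rates; invoices dated outside that span would still come back without a rate.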

Then do a left join of sales with the filled rates, and set the rate to 1 for EUR rows:

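# Both frames have a 'Rate' column, so the merge yields Rate_x (sales) and Rate_y (rates).
result = pd.merge(sales, rates, how="left", on=["Currency", "Date"])
# EUR rows have no match in rates (Rate_y is NaN); they need no conversion, so use 1.
result['Rate'] = np.where(result['Currency'] == 'EUR', 1, result['Rate_y'])
result = result.drop(['Rate_x', 'Rate_y'], axis=1)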

which gives:

        Date        From Currency  Amount    Rate
0 2017-07-06      PayPal      USD     100  1.1385
1 2017-07-06  Fastspring      USD     200  1.1385
2 2017-07-09  Fastspring      USD     100  1.1412
3 2017-07-10          EU      EUR     100  1.0000
4 2017-07-10      PayPal      USD     200  1.1387
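
As an alternative to expanding the calendar, pd.merge_asof can look up the most recent earlier rate directly. This is a different technique from the join above, shown only as a sketch on the question's sample data; it assumes both 'Date' columns are datetimes and that every non-EUR invoice should take the USD rate:

```python
import pandas as pd

rates = pd.DataFrame({
    'Date': pd.to_datetime(['2017-07-05', '2017-07-06', '2017-07-07',
                            '2017-07-10', '2017-07-11', '2017-07-12']),
    'Rate': [1.1329, 1.1385, 1.1412, 1.1387, 1.1405, 1.1449],
})
sales = pd.DataFrame({
    'Date': pd.to_datetime(['2017-07-06', '2017-07-06', '2017-07-09',
                            '2017-07-10', '2017-07-10']),
    'From': ['PayPal', 'Fastspring', 'Fastspring', 'EU', 'PayPal'],
    'Currency': ['USD', 'USD', 'USD', 'EUR', 'USD'],
    'Amount': [100, 200, 100, 100, 200],
    'Rate': [1, 1, 1, 1, 1],
})

# merge_asof needs both frames sorted by the key; a stable sort keeps
# same-day invoices in their original order.
sales_sorted = sales.drop(columns='Rate').sort_values('Date', kind='stable')
rates_sorted = rates.sort_values('Date')

# The default direction='backward' picks the last rate on or before each invoice date.
result = pd.merge_asof(sales_sorted, rates_sorted, on='Date')

# EUR invoices need no conversion, so their rate is 1.
result.loc[result['Currency'] == 'EUR', 'Rate'] = 1.0
```

On the sample data this should reproduce the table above, with the 2017-07-09 invoice picking up the 2017-07-07 rate. An invoice dated before the first quote would get NaN rather than a rate.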
answered Feb 05 '26 12:02 by Gayatri


