 

In R data.table multiplication by column name based on values of another column

Tags:

r

data.table

I want to convert some prices in different currencies to a specific currency. Suppose I have this:

library(data.table)
set.seed(100)
DT <- data.table(day=1:10, price=runif(10), currency=c("aud","eur"), 
                 aud=runif(10) + 1, eur=runif(10) + 1.5)
DT
    day        price currency      aud      eur
 1:   1   0.30776611      aud 1.624996 2.035811
 2:   2   0.25767250      eur 1.882166 2.210804
 3:   3   0.55232243      aud 1.280354 2.038349
 4:   4   0.05638315      eur 1.398488 2.248972
 5:   5   0.46854928      aud 1.762551 1.920101
 6:   6   0.48377074      eur 1.669022 1.671420
 7:   7   0.81240262      aud 1.204612 2.270302
 8:   8   0.37032054      eur 1.357525 2.381954
 9:   9   0.54655860      aud 1.359475 2.049097
10:  10   0.17026205      eur 1.690291 1.777724

Each day's price is expressed in the currency shown in the currency column. So 0.30776611 on the first day is in AUD (Australian dollars), and 0.25767250 on the second day is in EUR (euros). Columns aud and eur give the exchange rate of each currency in US dollars. How do I create a new column with the price expressed in dollars, the data.table way?

I need to multiply price by the column whose name matches the value in currency, in order to obtain this:

DT
    day        price currency      aud      eur price.in.usd
 1:   1   0.30776611      aud 1.624996 2.035811    0.5001187
 2:   2   0.25767250      eur 1.882166 2.210804    0.5696634
 3:   3   0.55232243      aud 1.280354 2.038349    0.7071682
 4:   4   0.05638315      eur 1.398488 2.248972    0.1268041
 5:   5   0.46854928      aud 1.762551 1.920101    0.825842
 6:   6   0.48377074      eur 1.669022 1.671420    0.8085841
 7:   7   0.81240262      aud 1.204612 2.270302    0.9786299
 8:   8   0.37032054      eur 1.357525 2.381954    0.8820865
 9:   9   0.54655860      aud 1.359475 2.049097    0.7430328
10:  10   0.17026205      eur 1.690291 1.777724    0.3026789

So for the 1st day I multiplied price * aud = 0.30776611 * 1.624996, because the currency column says aud, while for the 2nd day I multiplied price * eur = 0.25767250 * 2.210804 for the same reason.
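The two worked cases can be reproduced directly in R as a quick sanity check. This assumes the example DT from above; it is rebuilt here so the snippet runs on its own:

```r
library(data.table)
set.seed(100)
DT <- data.table(day = 1:10, price = runif(10), currency = c("aud", "eur"),
                 aud = runif(10) + 1, eur = runif(10) + 1.5)

DT[1, price * aud]  # day 1 is priced in aud, so use the aud rate
DT[2, price * eur]  # day 2 is priced in eur, so use the eur rate
```

Both values agree with the first two entries of price.in.usd up to the rounding of the printed rates.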

The real data include around 40 currencies, so chaining multiple nested ifelse() calls (which creates an arrow anti-pattern) is not very convenient.

For the moment, with a subsample of my data, I have this:

DT.all[, price := ifelse(curcdd=="AUD", adj.price * AUD, 
                       ifelse(curcdd=="BEF", adj.price * BEF, 
                              ifelse(curcdd=="BGN", adj.price * BGN, 
                                     ifelse(curcdd=="CHF", adj.price * CHF, 
                                            ifelse(curcdd=="CZK", adj.price * CZK, 
                                                   ifelse(curcdd=="DEM", adj.price * DEM, 
                                                          ifelse(curcdd=="EUR", adj.price * EUR, 
                                                                 ifelse(curcdd=="FRF", adj.price * FRF, 
                                                                        ifelse(curcdd=="GBP", adj.price * GBP, 
                                                                               ifelse(curcdd=="ILS", adj.price * ILS, 
                                                                                      ifelse(curcdd=="JPY", adj.price * JPY, 
                                                                                             ifelse(curcdd=="NLG", adj.price * NLG, 
                                                                                                    ifelse(curcdd=="NOK", adj.price * NOK, 
                                                                                                           ifelse(curcdd=="PLN", adj.price * PLN, 
                                                                                                                  ifelse(curcdd=="SEK", adj.price * SEK,
                                                                                                                         ifelse(curcdd=="SGD", adj.price * SGD,
                                                                                                                                ifelse(curcdd=="USD", adj.price, NA)))))))))))))))))]

which works, but it covers only about 20 currencies, and extending it to all of them (~40) is certainly not elegant...
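For comparison, the nested chain can be replaced by a single lookup: match each row's curcdd against the rate column names, then use two-column matrix indexing to pick one rate per row. This is a sketch on a hypothetical four-row stand-in for DT.all (the values are made up). It assumes the rate columns are named exactly like the codes in curcdd. USD gets a rate of 1 and unlisted codes give NA, mirroring the last ifelse() of the chain:

```r
library(data.table)

# Hypothetical miniature of DT.all (made-up values), with two rate columns
DT.all <- data.table(adj.price = c(10, 20, 30, 40),
                     curcdd    = c("AUD", "EUR", "USD", "XXX"),
                     AUD       = c(0.9, 0.9, 0.9, 0.9),
                     EUR       = c(1.3, 1.3, 1.3, 1.3))

rate.cols <- c("AUD", "EUR")   # in the real data, list all ~40 rate columns here
rates <- as.matrix(DT.all[, rate.cols, with = FALSE])
rates <- cbind(rates, USD = 1) # USD prices pass through unchanged

# Column position of each row's currency; codes with no rate column give NA
idx <- match(DT.all$curcdd, colnames(rates))

# rates[cbind(row, col)] picks one rate per row; an NA index yields NA
DT.all[, price := adj.price * rates[cbind(seq_len(.N), idx)]]
```

Adding a currency then means adding one name to rate.cols rather than another level of nesting.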

Thank you very much!

asked Mar 29 '14 14:03 by Konstantinos


2 Answers

[Edit] Building on the idea of using get() to pull in values referenced by column names, which I saw in an answer from Matthew Dowle, this seems to be effective:

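setkey(DT, currency)  # sorts DT by currency, which is why the rows below are reordered
# within each currency group, get(currency) returns the column named by that group's value
DT[ , cvt := .SD[, get(currency)] * price, by = currency]
DT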

    day      price currency      aud      eur       cvt
 1:   1 0.30776611      aud 1.624996 2.035811 0.5001188
 2:   3 0.55232243      aud 1.280354 2.038349 0.7071681
 3:   5 0.46854928      aud 1.762551 1.920101 0.8258420
 4:   7 0.81240262      aud 1.204612 2.270302 0.9786301
 5:   9 0.54655860      aud 1.359475 2.049097 0.7430328
 6:   2 0.25767250      eur 1.882166 2.210804 0.5696634
 7:   4 0.05638315      eur 1.398488 2.248972 0.1268041
 8:   6 0.48377074      eur 1.669022 1.671420 0.8085842
 9:   8 0.37032054      eur 1.357525 2.381954 0.8820863
10:  10 0.17026205      eur 1.690291 1.777724 0.3026789
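A slightly simpler variant of the same get() idea is sketched below. With by = currency, the grouping value is available inside j as a length-1 currency, so get(currency) already returns the matching column for that group's rows. Neither .SD nor setkey() is needed, and because nothing is sorted, the original row order is kept:

```r
library(data.table)
set.seed(100)
DT <- data.table(day = 1:10, price = runif(10), currency = c("aud", "eur"),
                 aud = runif(10) + 1, eur = runif(10) + 1.5)

# In each group, `currency` is a single string ("aud" or "eur"), and
# get() looks that name up among the group's columns
DT[, cvt := price * get(currency), by = currency]
DT
```

The same line handles any number of currencies, provided each value in currency has a column of the same name.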

Here's one method, although it doesn't generalize well to a larger number of currencies:

DT[ , cvt := ifelse(currency == 'aud', price*aud, price*eur) ]
DT
    day      price currency      aud      eur       cvt
 1:   1 0.30776611      aud 1.624996 2.035811 0.5001188
 2:   2 0.25767250      eur 1.882166 2.210804 0.5696634
 3:   3 0.55232243      aud 1.280354 2.038349 0.7071681
 4:   4 0.05638315      eur 1.398488 2.248972 0.1268041
 5:   5 0.46854928      aud 1.762551 1.920101 0.8258420
 6:   6 0.48377074      eur 1.669022 1.671420 0.8085842
 7:   7 0.81240262      aud 1.204612 2.270302 0.9786301
 8:   8 0.37032054      eur 1.357525 2.381954 0.8820863
 9:   9 0.54655860      aud 1.359475 2.049097 0.7430328
10:  10 0.17026205      eur 1.690291 1.777724 0.3026789

You get a warning, and different (wrong) results, if you try it with if(.){.}else{.}:

DT[ , cvt := if (currency == 'aud'){price*aud}else{price*eur}]

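This is completely analogous to what happens with data.frames: if() looks only at the first element of the condition, so every row gets the same branch (since R 4.2.0, a condition of length greater than one is an error rather than a warning). But using ifelse() in data.table is known to be slow; one reason is that it computes both the yes and the no vectors in full.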
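If the slowness of ifelse() is the concern, data.table itself provides fcase(), added in data.table 1.13.0 (this sketch assumes a version at least that recent). It still lists every currency explicitly, so it shares the scaling problem described in the question; it only replaces the nesting with a flat list of condition/value pairs:

```r
library(data.table)
set.seed(100)
DT <- data.table(day = 1:10, price = runif(10), currency = c("aud", "eur"),
                 aud = runif(10) + 1, eur = runif(10) + 1.5)

# fcase() takes condition/value pairs; each row gets the value of the first
# TRUE condition, and rows matching none get the default (NA)
DT[, cvt := fcase(currency == "aud", price * aud,
                  currency == "eur", price * eur)]
DT
```

Whether it is noticeably faster than ifelse() on the real data is something to measure rather than assume.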

answered Nov 20 '22 20:11 by IRTFM


Have you considered simply looping over the currencies? For each currency, filter the main data frame to keep only the prices in that currency, perform the conversion on that subset, and finally stack all the per-currency data frames (or progressively fill a column in the main data frame).
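The "progressively fill a column" option can be sketched in data.table by updating the matching rows in place, so no stacking is needed. This minimal sketch uses the question's example DT and assumes every value in currency has a rate column of the same name:

```r
library(data.table)
set.seed(100)
DT <- data.table(day = 1:10, price = runif(10), currency = c("aud", "eur"),
                 aud = runif(10) + 1, eur = runif(10) + 1.5)

DT[, cvt := NA_real_]  # start with an empty numeric result column
for (cur in unique(DT$currency)) {
  # i keeps only the rows priced in `cur`; get(cur) fetches the column named `cur`
  DT[currency == cur, cvt := price * get(cur)]
}
DT
```

Because := updates by reference, each pass fills in one currency's rows without copying the table.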

answered Nov 20 '22 21:11 by etna