I'm trying to match 4 variables pairwise and add a column with the lookup value. In base R, I would do merge(df1, df2, by.x=c("lsr","ppr"), by.y=c("li","pro")), where df1 has 9 columns (2 of them being lsr and ppr) and df2 has only 3: li, pro, and the "value" I'm interested in, alpha.
This works fine, but since I'm becoming a huge fan of data.table, I would like to do this the data.table way, also because I have several million rows and base merge is slow. (I saw that the by.x and by.y feature is pending for data.table, but maybe there is a workaround.) See some sample data below:
df2:
       alpha         li        pro
1: 0.5000000 0.01666667 0.01666667
2: 0.3295455 0.03333333 0.01666667
3: 0.2435897 0.05000000 0.01666667
4: 0.1917808 0.06666667 0.01666667
5: 0.1571429 0.08333333 0.01666667

df1:
         demand rtime       mcv         mck        ppr       mlv         mlk        lsr
      1:    0.3     1 357.57700 0.099326944 0.01666667 558.27267 0.155075741 0.01666667
      2:    0.3    10 548.75433 0.152431759 0.01666667 614.30667 0.170640741 0.03333333
      3:    0.3    11 314.55767 0.087377130 0.01666667 636.48100 0.176800278 0.03333333
      4:    0.3     2 312.15033 0.086708426 0.01666667 677.48100 0.188189167 0.06666667
      5:    0.3     3 454.47867 0.126244074 0.01666667 608.92067 0.169144630 0.01666667
     ---
6899196:    0.6     5 537.92673 0.149424093 1.00000000 537.92673 0.149424093 1.00000000
6899197:    0.6     6 277.34732 0.077040923 1.00000000 277.34732 0.077040923 1.00000000
6899198:    0.6     7  73.31484 0.020365235 1.00000000  73.31484 0.020365235 1.00000000
6899199:    0.6     8  32.04197 0.008900546 1.00000000  32.04197 0.008900546 1.00000000
6899200:    0.6     9  14.59008 0.004052799 1.00000000  14.59008 0.004052799 1.00000000
Last, it may be of interest that df2 has unique rows, while df1 has lots of duplicates with respect to lsr and ppr. I also tried setting two keys and joining the tables the data.table way, adding a new column with alpha, but without success.
You can use the statement provided by David Arenburg in the comments:
    setkey(df1, lsr, ppr)
    setkey(df2, li, pro)
    df1[df2, alpha := i.alpha]
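To see what the keyed update join does, here is a minimal sketch on toy data. The values are made up for illustration, and only the column names follow the question. It assumes the data.table package is installed.

```r
library(data.table)

# Toy stand-ins for the question's tables (illustrative values only).
df1 <- data.table(
  demand = c(0.3, 0.3, 0.6, 0.6),
  lsr    = c(0.25, 0.50, 0.25, 0.75),
  ppr    = c(0.25, 0.25, 0.25, 0.25)
)
df2 <- data.table(
  alpha = c(0.9, 0.4),
  li    = c(0.25, 0.50),
  pro   = c(0.25, 0.25)
)

# setkey() sorts each table by its key columns, in place.
setkey(df1, lsr, ppr)
setkey(df2, li, pro)

# Update join: for every row of df2, the matching rows of df1 get
# df2's alpha (referred to as i.alpha). No copy of df1 is made.
df1[df2, alpha := i.alpha]
print(df1)
```

Two things to keep in mind: setkey() reorders the rows of df1, and rows of df1 with no match in df2 (here the one with lsr = 0.75) end up with NA in alpha.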
From the current devel version, 1.9.5, we can perform joins directly, without having to set keys, using the on argument:
df1[df2, alpha := i.alpha, on = c(lsr="li", ppr="pro")]
If you don't want to install the devel version, then you can wait until this is pushed as v1.9.6 on CRAN.
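As a rough illustration of the on variant, the sketch below uses made-up toy data again and assumes a data.table version that supports the on argument. Because no key is set, df1 keeps its original row order.

```r
library(data.table)

# Toy data (illustrative values only); df1 has duplicate (lsr, ppr) pairs,
# df2 has unique (li, pro) pairs, as described in the question.
df1 <- data.table(
  rtime = 1:4,
  lsr   = c(0.25, 0.50, 0.25, 0.75),
  ppr   = c(0.25, 0.25, 0.25, 0.25)
)
df2 <- data.table(
  alpha = c(0.9, 0.4),
  li    = c(0.25, 0.50),
  pro   = c(0.25, 0.25)
)

# Names in `on` are columns of df1, values are the matching columns of df2.
df1[df2, alpha := i.alpha, on = c(lsr = "li", ppr = "pro")]
print(df1)
```

Both duplicate (0.25, 0.25) rows receive the same alpha, and the row without a match in df2 is left as NA.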