I am trying to do some left-join merges with data.tables. The package documentation states that
In all joins the names of the columns are irrelevant; the columns of x's key are joined to in order
I understand that I can use [.data.table and data.table:::merge.data.table.
What I would like is to merge X and Y while specifying the join columns (like by.x and by.y in base merge; why were these taken away?).
Let's suppose I have
DT = data.table(x=rep(c("a","b","c"),each=3),y=c(1,3,6),v=1:9,key="x,y,v")
DT1 = data.table(x1=c("aa","bb","cc"),y1=c(1,3,6),v1=1:3,key="x1,y1,v1")
and I would like this output:
# data.table's merge method masks the base one, and I don't know how to call the base version of merge anymore
R) {base::merge}(DT,DT1,by.x="y",by.y="y1")
  y x v x1 v1
1 1 a 1 aa  1
2 1 c 7 aa  1
3 1 b 4 aa  1
4 3 a 2 bb  2
5 3 b 5 bb  2
6 3 c 8 bb  2
7 6 b 6 cc  3
8 6 a 3 cc  3
9 6 c 9 cc  3
I am very happy to use [ or data.table:::merge, but I would like an option that does not modify DT or DT1 (that is, without renaming the columns, calling merge, and renaming them back).
For comparison, the pandas merge function handles join columns with different names through its "left_on" and "right_on" arguments, which are used instead of the single "on" argument.
Update: Since data.table v1.9.6 (released September 19, 2015), merge.data.table() accepts and nicely handles the arguments by.x= and by.y=. Here's an updated link to the FR (now closed) referenced below.
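As a minimal sketch of what that looks like with the tables from the question (assuming data.table v1.9.6 or later is installed; the keys are left out because they are not needed here, and all.x = TRUE is my addition to make it an explicit left join):

```r
library(data.table)

# Tables from the question, built without keys.
DT  <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
DT1 <- data.table(x1 = c("aa", "bb", "cc"), y1 = c(1, 3, 6), v1 = 1:3)

# Left join of DT with DT1, matching DT$y against DT1$y1.
# merge() returns a new table; DT and DT1 keep their original column names.
res <- merge(DT, DT1, by.x = "y", by.y = "y1", all.x = TRUE)
print(res)
```

The join column in the result takes its name from by.x, so it shows up as y rather than y1.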
Yes, this is a feature request that has not yet been implemented:
FR#2033 Add by.x and by.y to merge.data.table
There isn't anything preventing it. It's just something that wasn't done. I very rarely need merge and was slow to realise its usefulness more generally. We've made good progress in making merge as fast as X[Y], and this feature request is at the highest priority. If you'd like it more quickly, you are more than welcome to add those arguments to merge.data.table and commit the change yourself. We try to keep source code short and together in one function/file, so by looking at the merge.data.table source hopefully you can follow it and see what needs to be done.
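For the X[Y] form mentioned above, later versions of data.table (v1.9.6 onwards) let you name the join columns directly with the on= argument, so no keys or renaming are needed. A rough sketch with the tables from the question, assuming such a version is installed:

```r
library(data.table)

# Tables from the question, built without keys.
DT  <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
DT1 <- data.table(x1 = c("aa", "bb", "cc"), y1 = c(1, 3, 6), v1 = 1:3)

# DT1[DT, ...] looks up each row of DT in DT1, i.e. a left join on DT.
# In on=, the name (y1) is the column of DT1 and the value ("y") is the column of DT.
res <- DT1[DT, on = c(y1 = "y")]
print(res)
```

Here the join column keeps the name from the outer table (y1), and neither DT nor DT1 is modified.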
The arguments by.x and by.y are now available in the development version of data.table. See here. Use devtools::install_github("Rdatatable/data.table", build_vignettes = FALSE) to install the development version of data.table.