Due to time constraints, I've decided to use data tables in my code instead of data frames, as they are much faster. However, I still want the functionality of data frames. I need to merge two data tables, conserving all values (like setting all=TRUE in merge).
Some example code:
> x1 = data.frame(index = 1:10)
> y1 = data.frame(index = c(2,4,6), weight = c(.2, .5, .3))
> x1
   index
1      1
2      2
3      3
4      4
5      5
6      6
7      7
8      8
9      9
10    10
> y1
  index weight
1     2    0.2
2     4    0.5
3     6    0.3
> merge(x1, y1, all=TRUE)
   index weight
1      1     NA
2      2    0.2
3      3     NA
4      4    0.5
5      5     NA
6      6    0.3
7      7     NA
8      8     NA
9      9     NA
10    10     NA
Now, can I do a similar thing with data tables? (The NAs don't necessarily have to stay; I change them to 0s anyway.)
> x2 = data.table(index = 1:10, key ="index")
> y2 = data.table(index = c(2,4,6), weight= c(.3,.5,.2))
I know you can merge, but I also know that there is a faster way.
To join two data frames vertically (stacking rows), use rbind(). Both data frames must have the same variables, though they do not have to be in the same order. If data frame A has variables that data frame B does not, one option is to delete the extra variables from data frame A.
To combine two data frames side by side, use merge(), a base R function (it is not part of dplyr) that joins data frames by common columns or row names, much like a join in a DBMS. The columns being merged on should have compatible types. By default merge() behaves like an inner join and keeps only rows whose keys appear in both inputs; all=TRUE keeps every row from both inputs, filling the gaps with NA. In the result, the key columns come first, followed by the remaining columns of the first data frame and then those of the second.
Following on from "Translating SQL joins on foreign keys to R data.table syntax", you can key both tables and join them directly:
x2 = data.table(index = 1:10, key ="index")
y2 = data.table(index = c(2,4,6), weight= c(.3,.5,.2),key="index")
y2[J(x2$index)]
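As a minimal sketch of what that keyed join returns, and of replacing the NAs with 0 as the question mentions, assuming the data.table package is installed. The names res and res_on are only illustrative and do not come from the thread.

```r
library(data.table)

x2 <- data.table(index = 1:10, key = "index")
y2 <- data.table(index = c(2, 4, 6), weight = c(.3, .5, .2), key = "index")

# Look up every index of x2 in y2; indices missing from y2 get weight = NA
res <- y2[J(x2$index)]

# Replace the NAs with 0, modifying res by reference
res[is.na(weight), weight := 0]

# The same join written with on=, which does not rely on the keys being set
res_on <- y2[x2, on = "index"]
```

The result has one row per value in x2$index, so no separate merge(..., all=TRUE) step is needed when x2 already contains every index of interest.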
I use a function like:
library(data.table)
mergefast <- function(x, y, by.x, by.y, all) {
  x_dt <- data.table(x)
  y2 <- y
  # Rename y's key columns to match x; compare names exactly rather than by regex
  for (i in seq_along(by.y)) names(y2)[names(y2) == by.y[i]] <- by.x[i]
  y_dt <- data.table(y2)
  setkeyv(x_dt, by.x)
  setkeyv(y_dt, by.x)
  as.data.frame(merge(x_dt, y_dt, by = by.x, all = all))
}
which can be used in your example as:
mergefast(x1, y1, by.x="index", by.y="index", all=TRUE)
It's a bit lacking in features that merge has, e.g. by, all.x, all.y, but these can be easily incorporated.
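One way those extra arguments might be added is sketched below. The name mergefast2 is hypothetical, and the defaults are an assumption chosen to mirror base merge(); this is not the original author's version.

```r
library(data.table)

# Hypothetical variant of mergefast that also accepts all.x and all.y
mergefast2 <- function(x, y, by.x, by.y = by.x,
                       all = FALSE, all.x = all, all.y = all) {
  x_dt <- as.data.table(x)
  y_dt <- copy(as.data.table(y))  # copy so the rename never touches the caller's y
  setnames(y_dt, by.y, by.x)      # rename y's key columns to match x (exact names)
  as.data.frame(merge(x_dt, y_dt, by = by.x, all.x = all.x, all.y = all.y))
}

x1 <- data.frame(index = 1:10)
y1 <- data.frame(index = c(2, 4, 6), weight = c(.2, .5, .3))

# Left join: keep every row of x1, with NA where y1 has no match
left <- mergefast2(x1, y1, by.x = "index", all.x = TRUE)

# Default: inner join, only the indices present in both inputs
inner <- mergefast2(x1, y1, by.x = "index")
```

Here all.x = TRUE keeps all ten rows of x1, while the default call returns only the three matching rows.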