Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge data tables like data frames in R

Tags:

r

data.table

Due to time constraints, I've decided to use data tables in my code instead of data frames, as they are much faster. However, I still want the functionality of data frames. I need to merge two data tables, conserving all values (like setting all=TRUE in merge).

Some example code:

> x1 = data.frame(index = 1:10)
> y1 = data.frame(index = c(2,4,6), weight = c(.2, .5, .3))
> x1
   index
1      1
2      2
3      3
4      4
5      5
6      6
7      7
8      8
9      9
10    10
> y1
  index weight
1     2    0.2
2     4    0.5
3     6    0.3

> merge(x,y, all=TRUE)
      index weight
 [1,]     1     NA
 [2,]     2      1
 [3,]     3     NA
 [4,]     4      2
 [5,]     5     NA
 [6,]     6      3
 [7,]     7     NA
 [8,]     8     NA
 [9,]     9     NA
[10,]    10     NA

Now can I do a similar thing with data tables? (The NA's don't necessarily have to stay, I change them to 0's anyways).

> x2 = data.table(index = 1:10, key ="index")
> y2 = data.table(index = c(2,4,6), weight= c(.3,.5,.2))

I know you can merge, but I also know that there is a faster way.

like image 497
Mike Flynn Avatar asked Jul 11 '12 15:07

Mike Flynn


People also ask

How do I merge data tables in R?

To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. If data frameA has variables that data frameB does not, then either: Delete the extra variables in data frameA or.

Can you merge data frames in R?

In R we use merge() function to merge two dataframes in R. This function is present inside join() function of dplyr package. The most important condition for joining two dataframes is that the column type should be the same on which the merging happens. merge() function works similarly like join in DBMS.

How do I combine Dataframes into one in R?

To combine two data frames in R, use the merge() function. The merge() is a built-in R function that merges two data frames by common columns or row names.

How do I merge two data frames in the same column in R?

The merge() function in base R can be used to merge input dataframes by common columns or row names. The merge() function retains all the row names of the dataframes, behaving similarly to the inner join. The dataframes are combined in order of the appearance in the input function call.


2 Answers

so following on from Translating SQL joins on foreign keys to R data.table syntax

x2 = data.table(index = 1:10, key ="index")
y2 = data.table(index = c(2,4,6), weight= c(.3,.5,.2),key="index")
y2[J(x2$index)]
like image 144
shhhhimhuntingrabbits Avatar answered Oct 19 '22 22:10

shhhhimhuntingrabbits


I use a function like:

mergefast<-function(x,y,by.x,by.y,all) {
  x_dt<-data.table(x)
  y2<-y
  for (i in 1:length(by.y)) names(y2)[grep(by.y[i],names(y2))]<-by.x[i]
  y_dt<-data.table(y2)
  setkeyv(x_dt,by.x)
  setkeyv(y_dt,by.x)
  as.data.frame(merge(x_dt,y_dt,by=by.x,all=all))
}

which can be used in your example as:

mergefast(x1,y1,by.x="index",by.y="index",all=T)

It's a bit lacking in features that merge has, e.g. by, all.x, all.y, but these can be easily incorporated.

like image 24
uday Avatar answered Oct 19 '22 22:10

uday