Merge data tables like data frames in R

Tags: r, data.table

Due to time constraints, I've decided to use data tables in my code instead of data frames, as they are much faster. However, I still want the functionality of data frames. I need to merge two data tables, conserving all values (like setting all=TRUE in merge).

Some example code:

> x1 = data.frame(index = 1:10)
> y1 = data.frame(index = c(2,4,6), weight = c(.2, .5, .3))
> x1
   index
1      1
2      2
3      3
4      4
5      5
6      6
7      7
8      8
9      9
10    10
> y1
  index weight
1     2    0.2
2     4    0.5
3     6    0.3

> merge(x1, y1, all=TRUE)
   index weight
1      1     NA
2      2    0.2
3      3     NA
4      4    0.5
5      5     NA
6      6    0.3
7      7     NA
8      8     NA
9      9     NA
10    10     NA

Can I do the same thing with data tables? (The NAs don't have to stay; I change them to 0s anyway.)

> x2 = data.table(index = 1:10, key ="index")
> y2 = data.table(index = c(2,4,6), weight= c(.3,.5,.2))

I know you can merge, but I also know that there is a faster way.

asked Jul 11 '12 15:07

Mike Flynn

2 Answers

Following on from the question "Translating SQL joins on foreign keys to R data.table syntax": key both tables on index, then join y2 against the index values of x2.

x2 = data.table(index = 1:10, key ="index")
y2 = data.table(index = c(2,4,6), weight= c(.3,.5,.2),key="index")
y2[J(x2$index)]
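
Since the question mentions turning the NAs into 0s afterwards, here is a minimal sketch of that extra step. It assumes the index column is stored as an integer in both tables (the `2L, 4L, 6L` literals are a choice made here, not part of the original answer), and the name `res` is only illustrative. Note that the keyed join returns one row per value of `x2$index`, so it matches `merge(..., all=TRUE)` only when every index in `y2` also occurs in `x2`, as it does in this example.

```r
library(data.table)

x2 <- data.table(index = 1:10, key = "index")
y2 <- data.table(index = c(2L, 4L, 6L), weight = c(.3, .5, .2), key = "index")

# Keyed join: one row per value of x2$index; weight is NA where y2 has no match
res <- y2[J(x2$index)]

# Replace the unmatched NAs with 0 by reference (no copy of the table is made)
res[is.na(weight), weight := 0]

print(res)
```

If `y2` could contain indices that are missing from `x2`, `merge(x2, y2, all = TRUE)` would still be needed to keep those rows.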

answered Oct 19 '22 22:10

shhhhimhuntingrabbits

I use a function like:

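library(data.table)

mergefast <- function(x, y, by.x, by.y, all) {
  x_dt <- data.table(x)
  y2 <- y
  # Rename the join columns of y to the names used in x.
  # An exact comparison is used because grep() would also match partial names.
  for (i in seq_along(by.y)) names(y2)[names(y2) == by.y[i]] <- by.x[i]
  y_dt <- data.table(y2)
  # Key both tables on the join columns so the merge uses the keys
  setkeyv(x_dt, by.x)
  setkeyv(y_dt, by.x)
  as.data.frame(merge(x_dt, y_dt, by = by.x, all = all))
}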

which can be used in your example as:

mergefast(x1, y1, by.x="index", by.y="index", all=TRUE)

It lacks some of the arguments that merge has, e.g. by, all.x and all.y, but these can easily be incorporated.
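
As a rough illustration of how all.x and all.y might be incorporated, the sketch below passes them straight through to data.table's merge method. The name `mergefast2` and its defaults are hypothetical, not part of the answer above, and it assumes a data.table version whose merge method accepts `by.x` and `by.y` (1.9.6 or later), which makes the manual renaming step unnecessary.

```r
library(data.table)

# Hypothetical variant of mergefast: all.x and all.y default to `all`,
# mirroring the defaults of base merge()
mergefast2 <- function(x, y, by.x, by.y = by.x,
                       all = FALSE, all.x = all, all.y = all) {
  x_dt <- as.data.table(x)
  y_dt <- as.data.table(y)
  as.data.frame(merge(x_dt, y_dt, by.x = by.x, by.y = by.y,
                      all.x = all.x, all.y = all.y))
}

# Example data; index is stored as integer in both tables so the join types match
x1 <- data.frame(index = 1:10)
y1 <- data.frame(index = c(2L, 4L, 6L), weight = c(.2, .5, .3))

left  <- mergefast2(x1, y1, by.x = "index", all.x = TRUE)  # keep every row of x1
inner <- mergefast2(x1, y1, by.x = "index")                # matching rows only
```

Here `left` keeps all ten rows of x1 with NA weights where y1 has no match, while `inner` keeps only the three matching rows.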

answered Oct 19 '22 22:10

uday
