Merging all columns by reference in a data.table

Tags: r, data.table

I would like to merge two data.tables by reference without having to write out every variable I want to merge. Here is a simple example to illustrate what I need:

library(data.table)
set.seed(20170711)
(a <- data.table(v_key=seq(1, 5), key="v_key"))
#   v_key
#1:     1
#2:     2
#3:     3
#4:     4
#5:     5

a_backup <- copy(a)

(b <- data.table(v_key=seq(1, 5), v1=runif(5), v2=runif(5), v3=runif(5), key="v_key"))
#   v_key          v1        v2          v3
#1:     1 0.141804303 0.1311052 0.354798849
#2:     2 0.425955903 0.3635612 0.950234261
#3:     3 0.001070379 0.4615936 0.359660693
#4:     4 0.453054854 0.5768500 0.008470552
#5:     5 0.951767837 0.1649903 0.565894298

I want to copy every column of b into a by reference, without specifying the column names.

I could do the following, but it would create a copy of the object for no reason, slowing down my program and increasing the RAM it needs:

(a <- a[b])
#   v_key          v1        v2          v3
#1:     1 0.141804303 0.1311052 0.354798849
#2:     2 0.425955903 0.3635612 0.950234261
#3:     3 0.001070379 0.4615936 0.359660693
#4:     4 0.453054854 0.5768500 0.008470552
#5:     5 0.951767837 0.1649903 0.565894298

Another option (without the needless copy) would be to name every column of b, as follows:

a <- copy(a_backup)
a[b, `:=`(v1=v1, v2=v2, v3=v3)][]
#   v_key          v1        v2          v3
#1:     1 0.141804303 0.1311052 0.354798849
#2:     2 0.425955903 0.3635612 0.950234261
#3:     3 0.001070379 0.4615936 0.359660693
#4:     4 0.453054854 0.5768500 0.008470552
#5:     5 0.951767837 0.1649903 0.565894298

In brief, I would like the efficiency of my second example (no needless copy) without having to specify every column name in b.

I guess I could find a way to do it with a combination of the colnames() and get() functions, but I am wondering whether there is a cleaner way, as syntax matters a lot to me.

asked Jul 11 '17 20:07

J.P. Le Cavalier

1 Answer

As you wrote, a combination of colnames and mget could get you there.

Consider this:

# retrieve the column names from b - without the key ('v_key')
thecols = setdiff(colnames(b), key(b))

# assign them to a
a[b, (thecols) := mget(thecols)]

This is not too bad-looking, is it?
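
For anyone who wants to check that the update really happens in place, here is a minimal self-contained sketch. It rebuilds a and b as in the question and compares address(a) before and after the join. The names addr_before and same_object are only illustrative, and the check assumes data.table's default column over-allocation, so that adding three columns does not force a reallocation.

```r
library(data.table)

set.seed(20170711)
a <- data.table(v_key = 1:5, key = "v_key")
b <- data.table(v_key = 1:5, v1 = runif(5), v2 = runif(5), v3 = runif(5),
                key = "v_key")

# memory address of a before the update (illustrative helper variable)
addr_before <- address(a)

# every column of b except its key
thecols <- setdiff(colnames(b), key(b))

# join on the shared key and add b's columns to a in place
a[b, (thecols) := mget(thecols)]

# TRUE if a is still the same object in memory
same_object <- identical(address(a), addr_before)
print(same_object)
print(names(a))
```

By contrast, a <- a[b] builds a new data.table and rebinds the name a to it, which is the copy the question wants to avoid.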

Besides, I don't think data.table currently implements any other syntax for this. But I would be happy to be proven wrong :)
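
One caveat worth sketching: if a already has a column with the same name as one in b, an unprefixed name inside mget() may resolve to a's own column rather than b's. A common way around this, as far as I know, is to prefix the names with i. so they refer to the table passed as i (here b). The data below is made up for illustration and is not from the question.

```r
library(data.table)

# a already has a v1 column, so the name clashes with b's v1
a <- data.table(v_key = 1:3, v1 = c(0, 0, 0), key = "v_key")
b <- data.table(v_key = 1:3, v1 = c(10, 20, 30), v2 = c("x", "y", "z"),
                key = "v_key")

# every column of b except its key
thecols <- setdiff(colnames(b), key(b))

# "i.v1" and "i.v2" point at b's columns; (thecols) names the targets in a
a[b, (thecols) := mget(paste0("i.", thecols))]

print(a)
```

Here v1 in a is overwritten with b's values and v2 is added, still without reassigning a.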

answered Sep 27 '22 22:09

Jealie
