Merging all column by reference in a data.table

Tags:

r

data.table

I would like to merge two data.tables by reference without having to write out every variable I want to merge. Here is a simple example that illustrates what I need:

set.seed(20170711)
(a <- data.table(v_key=seq(1, 5), key="v_key"))
#   v_key
#1:     1
#2:     2
#3:     3
#4:     4
#5:     5

a_backup <- copy(a)

(b <- data.table(v_key=seq(1, 5), v1=runif(5), v2=runif(5), v3=runif(5), key="v_key"))
#   v_key          v1        v2          v3
#1:     1 0.141804303 0.1311052 0.354798849
#2:     2 0.425955903 0.3635612 0.950234261
#3:     3 0.001070379 0.4615936 0.359660693
#4:     4 0.453054854 0.5768500 0.008470552
#5:     5 0.951767837 0.1649903 0.565894298

I want to copy every column of b into a by reference, without specifying the column names.

I could do the following, but it creates a new copy of the object for no reason, which slows down my program and increases the RAM it needs:

(a  <- a[b])
#   v_key          v1        v2          v3
#1:     1 0.141804303 0.1311052 0.354798849
#2:     2 0.425955903 0.3635612 0.950234261
#3:     3 0.001070379 0.4615936 0.359660693
#4:     4 0.453054854 0.5768500 0.008470552
#5:     5 0.951767837 0.1649903 0.565894298

Another option (without the needless copy) is to name every column of b explicitly:

a <- copy(a_backup)
a[b, `:=`(v1=v1, v2=v2, v3=v3)][]
#   v_key          v1        v2          v3
#1:     1 0.141804303 0.1311052 0.354798849
#2:     2 0.425955903 0.3635612 0.950234261
#3:     3 0.001070379 0.4615936 0.359660693
#4:     4 0.453054854 0.5768500 0.008470552
#5:     5 0.951767837 0.1649903 0.565894298

In brief, I would like the efficiency of my second example (no needless copy) without having to name every column of b.
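One way to check the copy-versus-in-place difference is to compare memory addresses with data.table's address() function. The sketch below rebuilds the example's a and b so it is self-contained; the names addr_before and joined are illustrative only. It assumes a freshly created data.table, which data.table over-allocates so that := can add columns without reallocating.

```r
library(data.table)

set.seed(20170711)
a <- data.table(v_key = seq(1, 5), key = "v_key")
b <- data.table(v_key = seq(1, 5), v1 = runif(5), v2 = runif(5), v3 = runif(5),
                key = "v_key")

addr_before <- address(a)

# Plain join: data.table allocates a brand-new table for the result
joined <- a[b]
address(joined) == addr_before   # FALSE

# Join with := : the new columns are added to `a` itself, so its address is unchanged
a[b, `:=`(v1 = v1, v2 = v2, v3 = v3)]
address(a) == addr_before        # TRUE
```

Both routes end with the same values in a; only the := route avoids allocating a second table.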

I guess I could find a way to do it by combining the colnames() and get() functions, but I am wondering whether there is a cleaner way, because syntax matters a lot to me.

asked Jul 11 '17 at 20:07 by J.P. Le Cavalier


1 Answer

As you suggested, a combination of colnames() and mget() can get you there.

Consider this:

# retrieve the column names from b - without the key ('v_key')
thecols = setdiff(colnames(b), key(b))

# assign them to a
a[b, (thecols) := mget(thecols)]
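One caveat, based on data.table's i. prefix for referring to columns of i inside a join: if a already contains columns with the same names as b (for example when the update is re-run after b receives new values), the unprefixed names inside j resolve to a's own columns, so mget(thecols) would just copy a onto itself. Prefixing the names with "i." selects b's columns explicitly. A sketch under that assumption, with made-up data:

```r
library(data.table)

# Here `a` already has v1 and v2 (all zeros), so the names clash with `b`
a <- data.table(v_key = 1:5, v1 = 0, v2 = 0, key = "v_key")
b <- data.table(v_key = 1:5, v1 = runif(5), v2 = runif(5), key = "v_key")

thecols <- setdiff(colnames(b), key(b))

# a[b, (thecols) := mget(thecols)]  # with clashing names this picks up a's own v1, v2

# "i.v1", "i.v2" refer to b's columns inside the join
a[b, (thecols) := mget(paste0("i.", thecols))]
a[]
```

When a does not yet have those columns, the i.-prefixed form should give the same result as the unprefixed one, so it is the safer default.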

This is not too bad-looking, is it?

Besides, I don't think data.table currently implements any other syntax for this, but I would be happy to be proven wrong :)
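If the concern is mainly how the call reads at each use site, the idiom can be wrapped once in a small function. The helper update_by_ref() below is hypothetical (it is not part of data.table); it joins x and i on their keys as in the example and relies on := modifying x in place, so the caller's table is updated without reassignment.

```r
library(data.table)

# Hypothetical helper: copy every non-key column of `i` into `x` by reference.
# The join uses the keys of x and i, as in the question's example.
update_by_ref <- function(x, i, cols = setdiff(colnames(i), key(i))) {
  x[i, (cols) := mget(paste0("i.", cols))]
  invisible(x)  # return x invisibly so the call does not print
}

a <- data.table(v_key = 1:5, key = "v_key")
b <- data.table(v_key = 1:5, v1 = runif(5), v2 = runif(5), key = "v_key")

update_by_ref(a, b)  # `a` now has v1 and v2; no `a <- ...` needed
a[]
```

Passing a different cols vector restricts the update to a subset of b's columns.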

answered Sep 27 '22 at 22:09 by Jealie