
R data.table doing an inner join on a field and operating on another?

Tags: r, data.table

I have the following scenario. I first create a data.table as shown below:

library(data.table)
x = data.table(f1 = c('a','b','c','d'))
x[, rn := .I]  # := adds the column by reference, so no reassignment is needed

This yields

> x
   f1 rn
1:  a  1
2:  b  2
3:  c  3
4:  d  4
>

Here rn is simply the row number. Now I have another data.table, y:

y = data.table(f2=c('b','c','f'))

For each element of y$f2 that also appears in x$f1, I want to subtract 2 from the corresponding value of rn in x. So the expected data.table is

> x
   f1 rn
1:  a  1
2:  b  0
3:  c  1
4:  d  4

How does one get to this? x[y] and y[x] don't help at all as they just do joins.

asked Nov 07 '13 23:11 by broccoli



1 Answer

You can use %chin% in i to select the rows of x whose f1 appears in y$f2, and then run your j expression on just those rows:

x[ f1 %chin% y$f2 , rn := rn - 2L ]
x
#   f1 rn
#1:  a  1
#2:  b  0
#3:  c  1
#4:  d  4

%chin% is a fast version of the %in% operator, specifically for character vectors, that comes with data.table. Note that the 2 is written as 2L to keep the result an "integer" like rn: rn - 2 is a "double", and assigning a double to an integer column can produce a coercion warning, depending on your data.table version. (Obviously, use plain 2 instead if the column you are updating is "numeric".)
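Since the question is phrased in terms of a join, the same update can also be sketched as an update join. This is not part of the original answer; it assumes a data.table version that supports the on= argument (added after this answer was written). The only matching key is f1 in x against f2 in y, and rows of y with no match in x (here 'f') are simply skipped.

```r
library(data.table)

# Rebuild the example tables from the question
x <- data.table(f1 = c("a", "b", "c", "d"))
x[, rn := .I]
y <- data.table(f2 = c("b", "c", "f"))

# Update join: for rows of x whose f1 matches some y$f2,
# subtract 2L from rn by reference; unmatched rows are left untouched
x[y, on = c(f1 = "f2"), rn := rn - 2L]
x
```

With a single character key, the %chin% subset above is arguably the simpler choice; the join form becomes more useful when matching on several columns or when the update needs values from y.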

answered Sep 30 '22 13:09 by Simon O'Hanlon