
What does < stand for in data.table joins with on=?

Joining the data tables:

X <- data.table(A = 1:4, B = c(1,1,1,1)) 
#    A B
# 1: 1 1
# 2: 2 1
# 3: 3 1
# 4: 4 1

Y <- data.table(A = 4)
#    A
# 1: 4

via

X[Y, on = .(A == A)]
#    A B
# 1: 4 1

returns the expected result. However, I would expect the line:

X[Y, on = .(A < A)]
#    A B
# 1: 4 1
# 2: 4 1
# 3: 4 1

to return

   A B
1: 1 1
2: 2 1
3: 3 1

because ?data.table describes the on argument as follows:

Indicate which columns in x should be joined with which columns in i along with the type of binary operator to join with

The documentation does not spell out how the join is performed, and it is clearly not what I guessed. How exactly does < join columns in x with columns in i?

asked Oct 13 '18 12:10 by FOMH



2 Answers

When doing a non-equi join like X[Y, on = .(A < A)], data.table returns the A column from Y (the i data.table) in place of the A column from X.

To get the desired result, you could do:

X[Y, on = .(A < A), .(A = x.A, B)]

which gives:

   A B
1: 1 1
2: 2 1
3: 3 1

In the next release, data.table will return both A columns. See here for the discussion.
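
Until then, one way to see both sides of the join is to ask for them explicitly in j. This is only a minimal sketch: it relies on the x. and i. prefixes that data.table provides for referring to columns of x and i inside a join, and the output names x_A and i_A are made up for illustration.

```r
library(data.table)

X <- data.table(A = 1:4, B = c(1, 1, 1, 1))
Y <- data.table(A = 4)

# x.A is the value taken from X, i.A the value taken from Y.
# x_A and i_A are illustrative names, not required by data.table.
both <- X[Y, on = .(A < A), .(x_A = x.A, i_A = i.A, B)]
print(both)
```

The x_A column should hold the rows of X that matched, and i_A the value from Y they were compared against.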

answered Dec 11 '22 01:12 by Jaap


You're partially correct. The missing piece of the puzzle is that (currently) when you perform any join, including a non-equi join with <, a single column is returned for the join column (A in your example). This column takes its values from the data.table on the right side of the join, in this case the values in A from Y.

Here's an illustrated example:

[Image: illustration of current non-equi join behaviour]
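
The same behaviour can be sketched in code. To make it easier to tell the two sides apart, this example assumes a copy of Y whose column is renamed to C (Y2 and C are hypothetical names, not part of the question). The join result is compared with a plain filter on X that selects the rows satisfying the condition.

```r
library(data.table)

X <- data.table(A = 1:4, B = c(1, 1, 1, 1))
Y2 <- data.table(C = 4)  # same data as Y, hypothetical column name C

# Non-equi join: rows of X where X$A < Y2$C are matched,
# but the join column in the result is filled from Y2$C.
joined <- X[Y2, on = .(A < C)]

# The rows of X that actually satisfied the condition.
matched <- X[A < Y2$C]

print(joined)
print(matched)
```

Both results have the same number of rows. They differ only in the join column, which keeps the name from X but carries the values from the right-hand table.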

We're planning to change this behaviour in a future version of data.table so that both columns will be returned in the case of non-equi joins. See pull requests https://github.com/Rdatatable/data.table/pull/2706 and https://github.com/Rdatatable/data.table/pull/3093.

answered Dec 11 '22 03:12 by Scott Ritchie