The data.table package provides many of the same table handling methods as SQL. If a table has a key, that key consists of one or more columns. But a table can't have more than one key, because it can't be sorted in two different ways at the same time.
In this example, X and Y are data.tables with a single key column "id"; Y also has a non-key column "x_id".
X <- data.table(id = 1:5, a=4:8,key="id")
Y <- data.table(id = c(1,1, 3,5,7), x_id=c(1,4:1), key="id")
The following syntax would join the tables on their keys:
X[Y]
How can I translate the following SQL syntax to data.table code?
select * from X join Y on X.id = Y.x_id;
The closest that I have gotten is:
Y[X,list(id, x_id),by = x_id,nomatch=0]
However, this does not do the same inner join as the SQL statement.
Here is a clearer example in which the foreign key is y_id, and we want the join to look up values of Y2 where X2$y_id = Y2$id.
X2 <- data.table(id = 1:5, y_id = c(1,1,2,2,2), key="id")
Y2 <- data.table(id = 1:5, b = letters[1:5], key="id")
I would like to produce the table:
id y_id b
1 1 "a"
2 1 "a"
3 2 "b"
4 2 "b"
5 2 "b"
similar to what is done by the following kludge:
> merge(data.frame(X2), data.frame(Y2), by.x = "y_id", by.y = "id")
y_id id b
1 1 1 a
2 1 2 a
3 2 3 b
4 2 4 b
5 2 5 b
However, when I do this:
X2[Y2, 1:2,by = y_id]
I do not get the desired result:
y_id V1
[1,] 1 1
[2,] 1 2
[3,] 2 1
[4,] 2 2
Good question. Note the following (admittedly buried) in ?data.table:
When i is a data.table, x must have a key. i is joined to x using the key and the rows in x that match are returned. An equi-join is performed between each column in i to each column in x's key. The match is a binary search in compiled C in O(log n) time. If i has less columns than x's key then many rows of x may match to each row of i. If i has more columns than x's key, the columns of i not involved in the join are included in the result. If i also has a key, it is i's key columns that are used to match to x's key columns and a binary merge of the two tables is carried out.
So, the key here is that i doesn't have to be keyed. Only x must be keyed.
X2 <- data.table(id = 11:15, y_id = c(14,14,11,12,12), key="id")
id y_id
[1,] 11 14
[2,] 12 14
[3,] 13 11
[4,] 14 12
[5,] 15 12
Y2 <- data.table(id = 11:15, b = letters[1:5], key="id")
id b
[1,] 11 a
[2,] 12 b
[3,] 13 c
[4,] 14 d
[5,] 15 e
Y2[J(X2$y_id)] # binary search for each item of (unsorted and unkeyed) i
id b
[1,] 14 d
[2,] 14 d
[3,] 11 a
[4,] 12 b
[5,] 12 b
or,
Y2[SJ(X2$y_id)] # binary merge of keyed i, see ?SJ
id b
[1,] 11 a
[2,] 12 b
[3,] 12 b
[4,] 14 d
[5,] 14 d
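Note that J() (or SJ()) is needed: a bare numeric vector in i is treated as row numbers rather than as key values, so the two calls below give different results.
identical(Y2[J(X2$y_id)], Y2[X2$y_id])
[1] FALSE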
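For readers on a newer release, here is a minimal sketch of two alternatives that avoid setting keys at all. It assumes data.table 1.9.6 or later, where the on= argument and by.x/by.y in data.table's merge method are available. The names joined and merged are only illustrative, and y_id is stored as integer here (an assumption, so that both join columns have the same type).

```r
library(data.table)

# Same data as above, but without keys; y_id is integer by assumption.
X2 <- data.table(id = 11:15, y_id = c(14L, 14L, 11L, 12L, 12L))
Y2 <- data.table(id = 11:15, b = letters[1:5])

# Ad hoc join: in on = c(id = "y_id") the name is the column of x (Y2)
# and the value is the column of i (X2). Rows come back in X2's order.
joined <- Y2[X2, on = c(id = "y_id"), nomatch = 0L]

# merge() dispatches to data.table's own method, so no data.frame
# conversion is needed; the result is sorted by y_id.
merged <- merge(X2, Y2, by.x = "y_id", by.y = "id")
```

The first form keeps the lookup order of X2, like Y2[J(X2$y_id)], while the merge() form returns the columns y_id, id, b sorted by y_id, which is closer to the data.frame kludge in the question.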