Suppose you have data like
fruits <- data.table(FruitID=c(1,2,3), Fruit=c("Apple", "Banana", "Strawberry"))
colors <- data.table(ColorID=c(1,2,3,4,5), FruitID=c(1,1,1,2,3), Color=c("Red","Yellow","Green","Yellow","Red"))
tastes <- data.table(TasteID=c(1,2,3), FruitID=c(1,1,3), Taste=c("Sweeet", "Sour", "Sweet"))
setkey(fruits, "FruitID")
setkey(colors, "ColorID")
setkey(tastes, "TasteID")
fruits
FruitID Fruit
1: 1 Apple
2: 2 Banana
3: 3 Strawberry
colors
ColorID FruitID Color
1: 1 1 Red
2: 2 1 Yellow
3: 3 1 Green
4: 4 2 Yellow
5: 5 3 Red
tastes
TasteID FruitID Taste
1: 1 1 Sweeet
2: 2 1 Sour
3: 3 3 Sweet
I typically need to perform left-outer joins on data like this. For instance, "give me all fruits and their colors" requires me to write (and maybe there's a better way?)
setkey(colors, "FruitID")
result <- colors[fruits, allow.cartesian=TRUE]
setkey(colors, "ColorID")
Three lines of code for such a simple and frequent task seemed excessive, so I wrote a method myLeftJoin
myLeftJoin <- function(tbl1, tbl2){
# Performs a left join using the key in tbl1 (i.e. keeps all rows from tbl1 and only matching rows from tbl2)
oldkey <- key(tbl2)
setkeyv(tbl2, key(tbl1))
result <- tbl2[tbl1, allow.cartesian=TRUE]
setkeyv(tbl2, oldkey)
return(result)
}
which I can use like
myLeftJoin(fruits, colors)
ColorID FruitID Color Fruit
1: 1 1 Red Apple
2: 2 1 Yellow Apple
3: 3 1 Green Apple
4: 4 2 Yellow Banana
5: 5 3 Red Strawberry
How can I extend this method so that I can pass any number of tables to it and get the chained left outer join of all of them? Something like myLeftJoin(tbl1, ...)
For instance, I'd like the result of myleftJoin(fruits, colors, tastes)
to be equivalent to
setkey(colors, "FruitID")
setkey(tastes, "FruitID")
result <- tastes[colors[fruits, allow.cartesian=TRUE], allow.cartesian=TRUE]
setkey(tastes, "TasteID")
setkey(colors, "ColorID")
result
TasteID FruitID Taste ColorID Color Fruit
1: 1 1 Sweeet 1 Red Apple
2: 2 1 Sour 1 Red Apple
3: 1 1 Sweeet 2 Yellow Apple
4: 2 1 Sour 2 Yellow Apple
5: 1 1 Sweeet 3 Green Apple
6: 2 1 Sour 3 Green Apple
7: NA 2 NA 4 Yellow Banana
8: 3 3 Sweet 5 Red Strawberry
Perhaps there's an elegant solution using methods in the data.table package that I missed? Thanks
(EDIT: Fixed a mistake in my data)
LEFT (OUTER) JOIN : Returns all records from the left table, and the matched records from the right table. RIGHT (OUTER) JOIN : Returns all records from the right table, and the matched records from the left table. FULL (OUTER) JOIN : Returns all records when there is a match in either left or right table.
Using OUTER-JOINS, it is possible to join only two tables at a time. In order to join three tables, it is necessary to first select two tables to join and then join the third table to the result of the first join.
Yes, indeed! You can use multiple LEFT JOINs in one query if needed for your analysis.
The join operation is used to combine related rows from two tables into a result set. Join is a binary operation. More than two tables can be combined using multiple join operations.
I just committed a new feature in data.table, v1.9.5
, with which we can join without setting keys (that is, specify the columns to join by directly, without having to use setkey()
first):
With that, this is simply:
require(data.table) # v1.9.5+
fruits[tastes, on="FruitID"][colors, on="FruitID"] # no setkey required
# FruitID Fruit TasteID Taste ColorID Color
# 1: 1 Apple 1 Sweeet 1 Red
# 2: 1 Apple 2 Sour 1 Red
# 3: 1 Apple 1 Sweeet 2 Yellow
# 4: 1 Apple 2 Sour 2 Yellow
# 5: 1 Apple 1 Sweeet 3 Green
# 6: 1 Apple 2 Sour 3 Green
# 7: 2 NA NA NA 4 Yellow
# 8: 3 Strawberry 3 Sweet 5 Red
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With