Occasionally, I need to join a data frame to (usually a modified) version of itself within a dplyr chain. Something like this:
df <- data.frame(
id = c(1,2,3)
, status = c('foo','bar','meh')
, spouseid = c(4,3,2)
)
df %>%
filter( status == 'foo' | status == 'bar') %>%
# join the filtered table to itself using the dot as the right-hand side
left_join(., by = c('id' = 'spouseid'))
When I try that, I get Error in is.data.frame(y) : argument "y" is missing, with no default
.
Joins with dplyr. dplyr uses SQL database syntax for its join functions. A left join means: Include everything on the left (what was the x data frame in merge() ) and all rows that match from the right (y) data frame. If the join columns have the same name, all you need is left_join(x, y) .
Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you were matching on only the fields you desired. You can also use the by. x and by.
All of the dplyr functions take a data frame (or tibble) as the first argument. Rather than forcing the user to either save intermediate objects or nest functions, dplyr provides the %>% operator from magrittr. x %>% f(y) turns into f(x, y) so the result from one step is then “piped” into the next step.
A self-join, also known as an inner join, is a structured query language (SQL) statement where a queried table is joined to itself. The self-join statement is necessary when two sets of data, within the same table, are compared.
The problem is that using the dot just moves around the left hand side, so the way it's written above only passes the lhs into left_join()
. To use the dot for both the left- and right-hand sides, use the dot twice:
df %>%
filter( status == 'foo' | status == 'bar') %>%
# the first dot is x argument and the second dot is the y argument
left_join(
x = .
, y = .
, by = c('id' = 'spouseid')
)
This way, you're passing the lhs to both arguments of left_join()
rather than relying on magrittr's implicit lhs like you normally would.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With