I need to join a table with itself in order to realize inheritance of a value in one column, as follows:
There are two types of rows, base and dep (for "dependent").
All rows have a key, but dep rows also have a basekey referring to a base row. There is a column val and any number of other columns.
My goal: Obtain all dep rows, with their val replaced by the val of the corresponding base row.
Here is an example:
library(tidyverse)
tb = tribble(
~key, ~type, ~basekey, ~val, ~col5, ~colN,
"a", "base", NA, 17, 1, 100,
"b", "base", NA, 21, 4, 130,
"d", "dep", "a", NA, 11, 320,
"g", "dep", "a", NA, 14, 420,
"n", "dep", "a", NA, 13, 230,
"y", "dep", "b", NA, 66, 665
)
My actual data has a few hundred rows and N=20 columns, so runtime or memory performance is not an issue, but operations requiring a complete column list should be avoided.
Previously, I used the sqldf package, which can express my requirement nicely:
sqldf("select tb.*, tb2.val as baseval
from tb, tb as tb2
where tb2.key = tb.basekey")
The result is almost exactly what I want; I only need to map baseval to val or live with the longer name (which is OK for me):
key type basekey val col5 colN baseval
1 d dep a NA 11 320 17
2 g dep a NA 14 420 17
3 n dep a NA 13 230 17
4 y dep b NA 66 665 21
But then I started learning ggplot2 and encountered the tidyverse.
After some reading, I decided to throw myself into its arms and rework my code according to its readable and tidy style.
readr and tibble worked like a charm, but with dplyr I am not yet entirely happy; my use case appears to be one that is not supported well (maybe it's just that my thinking is not taking the best route yet).
I need to do a left join, but although all I want is the val column, there appears to be no method (as of dplyr 0.7.4) for suppressing the others:
tbj = left_join(filter(tb, type=='dep'), tb, by=c(basekey='key'))
which gives me this:
# A tibble: 4 x 11
key type.x basekey val.x col5.x colN.x type.y basekey.y val.y col5.y colN.y
<chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
1 d dep a NA 11 320 base <NA> 17 1 100
2 g dep a NA 14 420 base <NA> 17 1 100
3 n dep a NA 13 230 base <NA> 17 1 100
4 y dep b NA 66 665 base <NA> 21 4 130
Whew! To get what I want, I now need to rename val.y to val, drop the other *.y columns and val.x, and rename the *.x columns to remove the suffix. This sounds laborious.
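Roughly, that cleanup would have to look something like this (assuming a newer dplyr that has rename_with; on 0.7.4 it would be rename_at instead):
# sketch of the manual cleanup after the plain left join
tbj %>%
  rename(val = val.y) %>%                 # keep the base value under the short name
  select(-val.x, -ends_with(".y")) %>%    # drop val.x and the remaining *.y columns
  rename_with(~ sub("\\.x$", "", .x))     # strip the .x suffix from the rest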
What is the best way to do this?
Is this even the right approach or is there a more dplyr-ish way to achieve my goal?
If I understand your question correctly, how about the following approach with dplyr: remove val from the dep subset, since it is empty anyway; then keep only key and val in the base subset, rename key to basekey, and join. That way, no post-join renaming is needed and you don't get any extra columns.
tb %>%
  filter(type == "dep") %>%
  select(-val) %>%                  # val is empty for dep rows, so drop it
  left_join(tb %>% filter(type == "base") %>% select(basekey = key, val))
  key   type  basekey  col5  colN   val
  <chr> <chr> <chr>   <dbl> <dbl> <dbl>
1 d     dep   a          11   320    17
2 g     dep   a          14   420    17
3 n     dep   a          13   230    17
4 y     dep   b          66   665    21
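If you prefer to be explicit about the join key (and silence the "Joining, by = ..." message), you can pass by = "basekey"; the result is the same:
tb %>%
  filter(type == "dep") %>%
  select(-val) %>%
  left_join(tb %>% filter(type == "base") %>% select(basekey = key, val),
            by = "basekey")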
First, you join the two tables and then select the columns you need. You can take advantage of the suffix argument by leaving the column names of the first table unchanged, then select all columns from the first table and add the needed column from the second table, val.y.
Keeping all columns from the first table, plus val.y from the second:
tb %>% filter(type == "dep") %>%
left_join(tb %>% filter(type == "base"),
by = c("basekey" = "key"),
suffix = c("", ".y")) %>%
select(names(tb), "val.y")
Output:
# A tibble: 4 x 7
key type basekey val col5 colN val.y
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 d dep a NA 11 320 17
2 g dep a NA 14 420 17
3 n dep a NA 13 230 17
4 y dep b NA 66 665 21
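Since val is NA for the dep rows in this data, you could also merge the two value columns with coalesce() and drop the suffixed one; a sketch building on the pipeline above:
tb %>% filter(type == "dep") %>%
  left_join(tb %>% filter(type == "base"),
            by = c("basekey" = "key"),
            suffix = c("", ".y")) %>%
  select(names(tb), "val.y") %>%
  mutate(val = coalesce(val, val.y)) %>%   # fill val from the base row
  select(-val.y)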
Or, dropping val from the first table, plus val.y from the second:
tb %>% filter(type == "dep") %>%
left_join(tb %>% filter(type == "base"),
by = c("basekey" = "key"),
suffix = c("", ".y")) %>%
select(names(tb), -val, "val.y")
Output:
# A tibble: 4 x 6
key type basekey col5 colN val.y
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 d dep a 11 320 17
2 g dep a 14 420 17
3 n dep a 13 230 17
4 y dep b 66 665 21
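And if you want the short name back (as mentioned in the question), a final rename can be appended:
tb %>% filter(type == "dep") %>%
  left_join(tb %>% filter(type == "base"),
            by = c("basekey" = "key"),
            suffix = c("", ".y")) %>%
  select(names(tb), -val, "val.y") %>%
  rename(val = val.y)                      # map the inherited value back to val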