I need to join a table with itself in order to realize inheritance of a value in one column, as follows:
There are two types of rows, base
and dep
(for "dependent").
All rows have a key
, but dep
rows also have a basekey
referring to a base
row. There is a column val
and any number of other columns.
My goal: Obtain all dep
rows, with their val
replaced by the val
of the corresponding base
row.
Here is an example:
library(tidyverse)
tb = tribble(
~key, ~type, ~basekey, ~val, ~col5, ~colN,
"a", "base", NA, 17, 1, 100,
"b", "base", NA, 21, 4, 130,
"d", "dep", "a", NA, 11, 320,
"g", "dep", "a", NA, 14, 420,
"n", "dep", "a", NA, 13, 230,
"y", "dep", "b", NA, 66, 665
)
My actual data has a few hundred rows and N=20
columns, so runtime or memory performance is not an issue, but operations requiring a complete column list should be avoided.
Previously, I used the sqldf
package, which can express my requirement nicely:
sqldf("select tb.*, tb2.val as baseval
from tb, tb as tb2
where tb2.key = tb.basekey")
The result is almost exactly what I want; I only need to map baseval
to val
or live with the longer name (which is OK for me):
key type basekey val col5 colN baseval
1 d dep a NA 11 320 17
2 g dep a NA 14 420 17
3 n dep a NA 13 230 17
4 y dep b NA 66 665 21
But then I started learning ggplot2
and encountered the tidyverse.
After some reading, I decided to throw myself into its arms and rework my code according to its readable and tidy style.
readr
and tibble
worked like a charm, but with dplyr
I am not yet entirely happy; my use case appears to be one that is not supported well (maybe it's just that my thinking is not taking the best route yet).
I need to do a left join, but although all I want is the val
column, there appears to be no method (as of dplyr
0.7.4) for suppressing the others:
tbj = left_join(filter(tb, type=='dep'), tb, by=c(basekey='key'))
which gives me this:
# A tibble: 4 x 11
key type.x basekey val.x col5.x colN.x type.y basekey.y val.y col5.y colN.y
<chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
1 d dep a NA 11 320 base <NA> 17 1 100
2 g dep a NA 14 420 base <NA> 17 1 100
3 n dep a NA 13 230 base <NA> 17 1 100
4 y dep b NA 66 665 base <NA> 21 4 130
Whew! To get what I want, I now need to
val.y
to val
.*.y
columnsval.x
*.x
columns to remove the suffixThis sounds laborious.
How to do it best?
Is this even the right approach or is there a more dplyr
-ish way to achieve my goal?
If I understand your question, how about the following approach with dplyr
: Remove val
from the dep
subset, since it's empty anyway. Then, keep only key
and val
in the base
subset, rename key
to basekey
and join. Then post-join renaming isn't needed and you don't get any extra columns.
tb %>%
filter(type=="dep") %>% select(-val) %>%
left_join(tb %>% filter(type=="base") %>% select(basekey=key, val))
key type basekey col5 colN val <chr> <chr> <chr> <dbl> <dbl> <dbl> 1 d dep a 11 320 17 2 g dep a 14 420 17 3 n dep a 13 230 17 4 y dep b 66 665 21
First, you join the two tables and then select the columns needed. You can take advantage of the argument suffix
by leaving the column names of the first table unchanged. Then selecting all columns from the first table and adding the needed column from the second table: val.y
.
val.y
from the second.tb %>% filter(type == "dep") %>%
left_join(tb %>% filter(type == "base"),
by = c("basekey" = "key"),
suffix = c("", ".y")) %>%
select(names(tb), "val.y")
Output:
# A tibble: 4 x 7
key type basekey val col5 colN val.y
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 d dep a NA 11 320 17
2 g dep a NA 14 420 17
3 n dep a NA 13 230 17
4 y dep b NA 66 665 21
val
from the first table , plus val.y
from the second.tb %>% filter(type == "dep") %>%
left_join(tb %>% filter(type == "base"),
by = c("basekey" = "key"),
suffix = c("", ".y")) %>%
select(names(tb), -val, "val.y")
Output:
# A tibble: 4 x 6
key type basekey col5 colN val.y
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 d dep a 11 320 17
2 g dep a 14 420 17
3 n dep a 13 230 17
4 y dep b 66 665 21
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With