I have two sets of data:
The first data frame (small) is smaller than the second data frame (large). Each data frame has an id column with unique identifiers. The smaller data frame has a list column called links, where each element is a vector of ids that link to rows of the larger data frame. The larger data frame has a column of attributes, which we'll call att:
library(tidyverse)
a <- c(3, 3, NA, 5)
b <- c(NA, 3, 4, 5)
small <- tibble(id = c(1, 2),
                links = list(a, b))
large <- tibble(id = c(3, 4, 5),
                att = c("yes", "no", "maybe"))
My goal is to count the number of times each observation in the small data frame has links to observations with the "yes" attribute in the large data frame.
I feel like something like this is on the right track, but it isn't quite counting correctly:
counted <- small %>%
  mutate(count_yes = map_int(links, ~ sum(large$att[large$id %in% .x] == "yes")))
print(counted)
#> # A tibble: 2 × 3
#>      id links     count_yes
#>   <dbl> <list>        <int>
#> 1     1 <dbl [4]>         1
#> 2     2 <dbl [4]>         1
Here, count_yes is 1 for both rows, when it should be 2 for the first row and 1 for the second.
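You are on the right track, but the direction of the match needs to be reversed. Your version tests each row of large against links, so a linked id is counted once even if it appears in links several times. Test each element of links against the ids that have the "yes" attribute instead: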
small %>%
  mutate(count_yes = map_int(links, ~sum(.x %in% large$id[large$att %in% "yes"])))
#     id links     count_yes
#  <dbl> <list>        <int>
#1     1 <dbl [4]>         2
#2     2 <dbl [4]>         1
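To see why the direction of %in% matters, it can help to look at the first links vector on its own. This is a minimal base R sketch using the values from the question; the names links_1, large_id, large_att and yes_ids are only illustrative:

```r
links_1   <- c(3, 3, NA, 5)            # first element of small$links
large_id  <- c(3, 4, 5)
large_att <- c("yes", "no", "maybe")

# Original direction: one result per row of large,
# so the duplicated 3 in links_1 is only seen once
large_id %in% links_1                           # TRUE FALSE TRUE
n_original <- sum(large_att[large_id %in% links_1] == "yes")
n_original                                      # 1

# Reversed direction: one result per element of links_1,
# so each occurrence of a "yes" id is counted
yes_ids <- large_id[large_att %in% "yes"]       # 3
links_1 %in% yes_ids                            # TRUE TRUE FALSE FALSE
n_reversed <- sum(links_1 %in% yes_ids)
n_reversed                                      # 2
```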
Or in base R:
sapply(small$links, \(x) sum(x %in% large$id[large$att %in% "yes"]))
Note the use of %in% instead of ==: %in% returns FALSE for NA values, whereas == would return NA.
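A small sketch of that difference, using a vector shaped like the first links element (x is just an illustrative name):

```r
x <- c(3, 3, NA, 5)

# == propagates NA, so the sum is NA unless NAs are dropped explicitly
eq_res <- x == 3                 # TRUE TRUE NA FALSE
sum(eq_res)                      # NA
sum(eq_res, na.rm = TRUE)        # 2

# %in% never returns NA, so the sum works directly
in_res <- x %in% 3               # TRUE TRUE FALSE FALSE
sum(in_res)                      # 2
```

With ==, you would need na.rm = TRUE (or a similar NA check) wherever NA can appear in the data.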