I am trying to collate results from a simulation study using dplyr and purrr. My results are saved as a list of data frames, each holding the results from several different classification algorithms, and I want to summarize them.
I'm trying to calculate:
- the number of objects assigned to each cluster
- the number of objects in the cluster that actually belong to the cluster
- the number of true positives, false positives, false negatives, and true negatives using 3 different algorithms (KEEP1 - KEEP3)
- for 2 of the algorithms, I have access to a probability of being in the cluster, so I can compare it to alternate choices of alpha and calculate true positives etc. using a different choice of alpha
I found this: https://github.com/tidyverse/dplyr/issues/3101, which I used successfully on a single element of the list to get exactly what I wanted:
library(dplyr)
library(purrr)

# count the positions where both logical vectors are TRUE
f <- function(.x, .y) {
  sum(.x & .y)
}

# one set of columns (.vars) paired with one set of summary functions (.funs)
actions <- list(
  .vars = lst(
    c('correct'),
    c('KEEP1', 'KEEP2', 'KEEP3'),
    c('pval1', 'pval2')
  ),
  .funs = lst(
    funs(Nk = length, N_correct = sum),
    funs(
      TP1 = f(., .y = correct),
      FN1 = f(!(.), .y = correct),
      TN1 = f(!(.), .y = !(correct)),
      FP1 = f(., .y = !(correct))
    ),
    funs(
      TP2 = f((. < alpha0), .y = correct),
      FN2 = f(!(. < alpha0), .y = correct),
      TN2 = f(!(. < alpha0), .y = !(correct)),
      FP2 = f((. < alpha0), .y = !(correct))
    )
  )
)

reproducible_data <- replicate(2,
  data_frame(
    k = factor(rep(1:10, each = 20)), # group/category
    correct = sample(x = c(TRUE, FALSE), 10 * 20, replace = TRUE, prob = c(.8, .2)),
    pval1 = rbeta(10 * 20, 1, 10),
    pval2 = rbeta(10 * 20, 1, 10),
    KEEP1 = pval1 < 0.05,
    KEEP2 = pval2 < 0.05,
    KEEP3 = runif(10 * 20) > .2,
    alpha0 = 0.05,
    alpha = 0.05 / 20 # divided by no. of objects in each group (k)
  ),
  simplify = FALSE)

# works
df1 <- reproducible_data[[1]]
pmap(actions, ~ df1 %>% group_by(k) %>% summarize_at(.x, .y)) %>%
  reduce(inner_join, by = 'k')
Now, I want to use map to do this to the entire list. However, I can no longer access the variable "correct" (the code fails before it ever reaches alpha or alpha0, but presumably the same issue would occur with them). I'm still learning dplyr/purrr, and my experimenting hasn't gotten me anywhere.
# does not work
out_summary <- map(
  reproducible_data,
  pmap(actions, ~ as_tibble(.) %>% group_by("k") %>% summarize_at(.x, .y)) %>%
    reduce(inner_join, by = 'k')
)

# this doesn't either
out_summary <- map(
  reproducible_data,
  pmap(actions, ~ as_tibble(.) %>% group_by("k") %>% summarize_at(.x, .y, alpha = alpha, alpha0 = alpha0, correct = correct)) %>%
    reduce(inner_join, by = 'k')
)
Within map, the variable `k` in `group_by(k)` is not found unless it is quoted, as in `group_by('k')`, but I did not need to quote it when I used pmap on its own. I've tried various ways to pass the correct variables to these functions, but I'm still learning dplyr and purrr, and haven't succeeded yet.
One more note: the actual data is stored as a regular data frame, so I need `as_tibble()` in the pmap function. I ran into some different errors when I removed it in this example, so I added it back to reproduce the same issues. Thanks!
map(
  reproducible_data,
  function(df1) {
    pmap(actions, ~ df1 %>%
      as_tibble() %>%
      group_by(k) %>%
      summarize_at(.x, .y)) %>%
      reduce(inner_join, by = "k")
  }
)
I think your arguments might get mixed up when using `map` and `pmap` at the same time. I used the `function` syntax for `map` to define `df1` to try to fix that. The rest of it looks OK (although I switched to `pmap_df` to return a data frame; the structure of the list was ugly without it, and `pmap_df` was the easiest way to make it pretty). Let me know if it's not the expected output. 👍
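To illustrate the mix-up, here is a minimal sketch on made-up toy data (`dfs` and `cols` are hypothetical names, not from the question). It assumes purrr's documented lambda behaviour, where `.` and `.x` both refer to the first argument of the function that owns the formula. Inside a `pmap()` formula, that is an element of the list given to `pmap()`, not an element of the list given to `map()`.

```r
library(purrr)

# Toy stand-ins for the list of data frames and the column names (hypothetical)
dfs  <- list(data.frame(v = 1:3), data.frame(v = 4:6))
cols <- list("v")

# Inside the pmap() formula, `.` is the first pmap() argument (the string "v"),
# so calling as_tibble(.) there would convert a character vector, not a data frame.
inner <- pmap(list(cols), ~ class(.))

# Naming the data frame in an outer function keeps the two levels apart:
# `df` comes from map(), `.x` comes from pmap().
sums <- map(dfs, function(df) {
  pmap(list(cols), ~ sum(df[[.x]]))
})
```

Note also that in `map(reproducible_data, pmap(...))` the `pmap()` call is an ordinary argument, so it is evaluated once up front rather than once per data frame; wrapping it in `function(df1) { ... }` is what defers it.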
`group_by("k")` vs. `group_by(k)`
Also: writing `group_by("k")` actually creates a variable "k" and fills it with characters "k", then uses that to group. That will get your code to run, but it won't do what you like. Sometimes that kind of problem is really because of an error that occurs a line or two before (or, with `dplyr`, a pipe or two before). In this case, `map` wasn't passing `df1` where you needed it.
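A small sketch of the difference, using a made-up three-row tibble (`toy` is a hypothetical name): grouping by the string yields a single group that covers every row, while grouping by the column yields one group per value of `k`.

```r
library(dplyr)

# Hypothetical toy data: two distinct values of k
toy <- tibble(k = c("a", "a", "b"), x = 1:3)

# The string is treated as a new constant column, so all rows share one group
by_string <- toy %>% group_by("k") %>% summarise(n = n())

# The bare name refers to the existing column, so there is one group per value
by_column <- toy %>% group_by(k) %>% summarise(n = n())
```

Comparing `nrow(by_string)` with `nrow(by_column)` is a quick way to check which of the two a pipeline is actually doing.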