The goal is to create multiple list columns from the data columns of a nested data frame. The following code achieves that, but it is quite long, and I wonder whether it can be shortened with tidyverse tools (dplyr, purrr, etc.). In a non-nested data frame I would use, e.g., dplyr's across().
# R version 3.6.1
library(dplyr) # 1.0.7
library(tidyr) # 1.2.0
df_distribution <- iris %>%
  dplyr::group_by(Species) %>%
  tidyr::nest() %>%
  dplyr::mutate(Sepal.Length = purrr::map(data, ~ dplyr::select(.x, Sepal.Length) %>%
    dplyr::group_by(Sepal.Length) %>%
    dplyr::summarise(n = n()) %>%
    dplyr::mutate(perc = n / sum(n)) %>%
    dplyr::select(-n))) %>%
  dplyr::mutate(Sepal.Width = purrr::map(data, ~ dplyr::select(.x, Sepal.Width) %>%
    dplyr::group_by(Sepal.Width) %>%
    dplyr::summarise(n = n()) %>%
    dplyr::mutate(perc = n / sum(n)) %>%
    dplyr::select(-n))) %>%
  dplyr::mutate(Petal.Length = purrr::map(data, ~ dplyr::select(.x, Petal.Length) %>%
    dplyr::group_by(Petal.Length) %>%
    dplyr::summarise(n = n()) %>%
    dplyr::mutate(perc = n / sum(n)) %>%
    dplyr::select(-n))) %>%
  dplyr::mutate(Petal.Width = purrr::map(data, ~ dplyr::select(.x, Petal.Width) %>%
    dplyr::group_by(Petal.Width) %>%
    dplyr::summarise(n = n()) %>%
    dplyr::mutate(perc = n / sum(n)) %>%
    dplyr::select(-n)))
My ultimate goal is to draw random samples from the resulting empirical distributions. That step is not part of the code above, but I would appreciate any pointers to helpful resources for it, too.
We could make a custom function to use within purrr::map:
library(dplyr)
library(tidyr)
library(purrr)
f <- function(data, col) {
  data %>%
    group_by({{ col }}) %>%
    summarise(n = n()) %>%
    mutate(perc = n / sum(n)) %>%
    select(-n)
}
df_distributionNew <- iris %>%
  group_by(Species) %>%
  nest() %>%
  mutate(
    Sepal.Length = map(data, ~ f(.x, Sepal.Length)),
    Sepal.Width = map(data, ~ f(.x, Sepal.Width)),
    Petal.Length = map(data, ~ f(.x, Petal.Length)),
    Petal.Width = map(data, ~ f(.x, Petal.Width))
  )
identical(df_distribution, df_distributionNew)
# [1] TRUE
There is still repetition within mutate(); I am not sure how to remove it.
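One possible way to avoid listing every column by hand is to loop over the column names with purrr::map and bind the resulting list columns back on. The following is only a sketch under some assumptions: the helper names dist_of and draw_from are made up for illustration, the data is nested with nest(data = -Species), so the result is ungrouped (unlike the group_by() + nest() version above), and the last part shows one way to draw from a distribution using base R's sample.int with the perc column as probabilities.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Columns to summarise (the four measurement columns from the question).
cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Hypothetical helper: relative frequency of each distinct value in `col`,
# where `col` is a column name passed as a string.
dist_of <- function(data, col) {
  data %>%
    group_by(across(all_of(col))) %>%
    summarise(n = n(), .groups = "drop") %>%
    mutate(perc = n / sum(n)) %>%
    select(-n)
}

# One row per species, remaining columns nested in `data` (no grouping kept).
nested <- iris %>% nest(data = -Species)

# For every column name, build a list with one distribution per species.
dist_cols <- cols %>%
  set_names() %>%
  map(function(col) map(nested$data, dist_of, col = col))

# Attach the four list columns to the nested data frame.
result <- bind_cols(nested, as_tibble(dist_cols))

# Hypothetical helper: draw `size` values from one empirical distribution.
# Row indices are sampled (rather than the values themselves) so that a
# distribution with a single distinct value is handled correctly.
draw_from <- function(dist, size) {
  idx <- sample.int(nrow(dist), size = size, replace = TRUE, prob = dist$perc)
  dist[[1]][idx]
}

set.seed(1)
draws <- draw_from(result$Sepal.Length[[1]], size = 10)
```

Passing the column as a string (via all_of) instead of a bare name is what makes the loop over cols straightforward; the trade-off is that dist_of no longer accepts unquoted column names the way f does.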