Consider this example:
library(tidyverse)
mydata <- tibble(ind_1 = c(NA, NA, 3, 4),
                 ind_2 = c(2, 3, 4, 5),
                 ind_3 = c(5, 6, NA, NA),
                 y     = c(28, 34, 25, 12),
                 group = c('a', 'a', 'b', 'b'))
> mydata
# A tibble: 4 x 5
  ind_1 ind_2 ind_3     y group
  <dbl> <dbl> <dbl> <dbl> <chr>
1    NA     2     5    28 a
2    NA     3     6    34 a
3     3     4    NA    25 b
4     4     5    NA    12 b
For each group, I want to regress y on whichever variables are not missing in that group, and store the corresponding lm object in a list-column.
That is, for group a the non-missing predictors are ind_2 and ind_3, and for group b they are ind_1 and ind_2.
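As a quick check of which predictors qualify, here is a minimal sketch that flags, per group, the columns with no missing values. It assumes dplyr >= 1.0 (for across()); complete_cols is just an illustrative name.

```r
library(dplyr)

# The question's data
mydata <- tibble(ind_1 = c(NA, NA, 3, 4),
                 ind_2 = c(2, 3, 4, 5),
                 ind_3 = c(5, 6, NA, NA),
                 y     = c(28, 34, 25, 12),
                 group = c('a', 'a', 'b', 'b'))

# TRUE where a column has no NA within the group
# (across() skips the grouping column automatically)
complete_cols <- mydata %>%
  group_by(group) %>%
  summarise(across(everything(), ~ !anyNA(.x)))

complete_cols
```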
I tried the following, but it does not work:
mydata %>% group_by(group) %>% nest() %>%
do(filtered_df <- . %>% select(which(colMeans(is.na(.)) == 0)),
myreg = lm(y~ names(filtered_df)))
Any ideas? Thanks!
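Among other issues, a formula such as y ~ names(filtered_df) does not expand a character vector of column names into separate model terms. Below is a minimal sketch that builds the formula explicitly from the surviving names with base R's reformulate(). It assumes tidyr >= 1.0 for nest(data = -group); fit_complete() is a hypothetical helper, not part of any package.

```r
library(dplyr)
library(tidyr)
library(purrr)

mydata <- tibble(ind_1 = c(NA, NA, 3, 4),
                 ind_2 = c(2, 3, 4, 5),
                 ind_3 = c(5, 6, NA, NA),
                 y     = c(28, 34, 25, 12),
                 group = c('a', 'a', 'b', 'b'))

# Hypothetical helper: fit y on every NA-free column of one group's data
fit_complete <- function(df) {
  # Keep only the columns without missing values (same test as the question)
  df <- df[, colMeans(is.na(df)) == 0]
  # Turn the remaining column names into y ~ x1 + x2 + ...
  predictors <- setdiff(names(df), "y")
  lm(reformulate(predictors, response = "y"), data = df)
}

models <- mydata %>%
  nest(data = -group) %>%
  mutate(model = map(data, fit_complete))

models$model
```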
To select the rows of an R data frame that contain no NA values, we can use the complete.cases() function with single square brackets. For example, if a data frame called df contains some missing values (NA), the rows without any NA can be selected with df[complete.cases(df), ].
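Note that complete.cases() filters rows, not columns. In the question's data every row has at least one NA, so a row filter would discard everything; the per-group task needs a column filter instead. A base-R sketch contrasting the two (df and group_a are illustrative names):

```r
# The question's numeric columns as a base data.frame
df <- data.frame(ind_1 = c(NA, NA, 3, 4),
                 ind_2 = c(2, 3, 4, 5),
                 ind_3 = c(5, 6, NA, NA),
                 y     = c(28, 34, 25, 12))

# Row filter: keep rows with no NA in any column
complete_rows <- df[complete.cases(df), ]
nrow(complete_rows)  # 0, because every row contains an NA

# Column filter on group a's rows: keep columns with no NA
group_a <- df[1:2, ]
group_a_complete <- group_a[, colSums(is.na(group_a)) == 0]
names(group_a_complete)  # "ind_2" "ind_3" "y"
```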
map() always returns a list; map_lgl(), map_int(), map_dbl() and map_chr() return vectors of the corresponding type (or die trying); map_df() returns a data frame by row-binding the individual results.
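A small sketch of the difference between map() and its typed variants, using two of the question's columns (cols and the result names are illustrative):

```r
library(purrr)

cols <- list(ind_1 = c(NA, NA, 3, 4),
             ind_2 = c(2, 3, 4, 5))

res_list <- map(cols, anyNA)                        # a list of length 2
res_lgl  <- map_lgl(cols, anyNA)                    # named logical vector
sums     <- map_dbl(cols, ~ sum(.x, na.rm = TRUE))  # named double vector

res_lgl  # ind_1 TRUE, ind_2 FALSE
sums     # ind_1 7, ind_2 14
```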
We can use map() inside mutate(). We can either select the columns and fit the model in one step (nestdat1), or use two separate map() calls if you want to keep the filtered data (nestdat2):
library(tidyverse)

# One step: drop the NA-containing columns inside the model call
nestdat1 <- mydata %>%
  group_by(group) %>%
  nest() %>%
  mutate(model = data %>% map(~ select_if(., ~ !any(is.na(.))) %>%
                                lm(y ~ ., data = .)))

# Two steps: overwrite `data` with the filtered columns, then fit
nestdat2 <- mydata %>%
  group_by(group) %>%
  nest() %>%
  mutate(data  = data %>% map(~ select_if(., ~ !any(is.na(.)))),
         model = data %>% map(~ lm(y ~ ., data = .)))
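If you are on dplyr >= 1.0, nest_by() offers a similar pattern without an explicit map(). A sketch under that assumption; nestdat3 is an illustrative name.

```r
library(dplyr)

mydata <- tibble(ind_1 = c(NA, NA, 3, 4),
                 ind_2 = c(2, 3, 4, 5),
                 ind_3 = c(5, 6, NA, NA),
                 y     = c(28, 34, 25, 12),
                 group = c('a', 'a', 'b', 'b'))

# nest_by() gives one row per group with a `data` list-column and makes
# the result rowwise, so inside mutate() `data` is a single tibble
nestdat3 <- mydata %>%
  nest_by(group) %>%
  mutate(model = list(lm(y ~ ., data = select(data, where(~ !anyNA(.x))))))

nestdat3$model
```

Because the result is rowwise, the model has to be wrapped in list() so that it fits into a list-column.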
Output:
They produce different data columns:
> nestdat1 %>% pull(data)
[[1]]
# A tibble: 2 x 4
  ind_1 ind_2 ind_3     y
  <dbl> <dbl> <dbl> <dbl>
1    NA     2     5    28
2    NA     3     6    34

[[2]]
# A tibble: 2 x 4
  ind_1 ind_2 ind_3     y
  <dbl> <dbl> <dbl> <dbl>
1     3     4    NA    25
2     4     5    NA    12

> nestdat2 %>% pull(data)
[[1]]
# A tibble: 2 x 3
  ind_2 ind_3     y
  <dbl> <dbl> <dbl>
1     2     5    28
2     3     6    34

[[2]]
# A tibble: 2 x 3
  ind_1 ind_2     y
  <dbl> <dbl> <dbl>
1     3     4    25
2     4     5    12
But the same model column:
> nestdat1 %>% pull(model)
[[1]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_2 ind_3
16 6 NA
[[2]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_1 ind_2
64 -13 NA
> nestdat2 %>% pull(model)
[[1]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_2 ind_3
16 6 NA
[[2]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_1 ind_2
64 -13 NA
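The NA in each coefficient table is expected rather than a bug: each group has only two rows, so lm() can estimate at most two parameters, and the third term is exactly collinear (ind_3 = ind_2 + 3 in group a, ind_2 = ind_1 + 1 in group b), so it is reported as aliased. A sketch that pulls the coefficients out of the models, assuming dplyr >= 1.0 for where():

```r
library(dplyr)
library(tidyr)
library(purrr)

mydata <- tibble(ind_1 = c(NA, NA, 3, 4),
                 ind_2 = c(2, 3, 4, 5),
                 ind_3 = c(5, 6, NA, NA),
                 y     = c(28, 34, 25, 12),
                 group = c('a', 'a', 'b', 'b'))

# Same two-step pipeline as nestdat2, with where() as the column selector
nestdat2 <- mydata %>%
  group_by(group) %>%
  nest() %>%
  mutate(data  = map(data, ~ select(.x, where(function(col) !anyNA(col)))),
         model = map(data, ~ lm(y ~ ., data = .x)))

# One named coefficient vector per group; the aliased term comes back as NA
coefs <- map(nestdat2$model, coef)
coefs
```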
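Here's another tidyverse option. It returns a plain list of models; if you want to keep them next to the data, store that list in a list-column of the nested tibble (mydata itself has four rows, so the two models cannot be assigned to mydata$model directly):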
library(tidyverse)
mydata %>%
  nest(data = -group) %>%                # one row per group, data in a list-column
  pull(data) %>%
  map(~ lm(y ~ ., discard(., anyNA)))   # drop NA columns, regress y on the rest
# [[1]]
#
# Call:
# lm(formula = y ~ ., data = discard(., anyNA))
#
# Coefficients:
# (Intercept) ind_2 ind_3
# 16 6 NA
#
#
# [[2]]
#
# Call:
# lm(formula = y ~ ., data = discard(., anyNA))
#
# Coefficients:
# (Intercept) ind_1 ind_2
# 64 -13 NA
#
#
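To keep the models next to the data rather than in a bare list, the same discard() idea can go in a list-column of the nested tibble. A sketch assuming tidyr >= 1.0; nested is an illustrative name.

```r
library(dplyr)
library(tidyr)
library(purrr)

mydata <- tibble(ind_1 = c(NA, NA, 3, 4),
                 ind_2 = c(2, 3, 4, 5),
                 ind_3 = c(5, 6, NA, NA),
                 y     = c(28, 34, 25, 12),
                 group = c('a', 'a', 'b', 'b'))

# One row per group; the model sits in a list-column beside its data
nested <- mydata %>%
  nest(data = -group) %>%
  mutate(model = map(data, ~ lm(y ~ ., data = discard(.x, anyNA))))

nested
```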