Consider this example:
library(tibble)
# Two groups; each group has one predictor column that is entirely NA
mydata <- tibble(ind_1 = c(NA, NA, 3, 4),
                 ind_2 = c(2, 3, 4, 5),
                 ind_3 = c(5, 6, NA, NA),
                 y = c(28, 34, 25, 12),
                 group = c('a', 'a', 'b', 'b'))
> mydata
# A tibble: 4 x 5
  ind_1 ind_2 ind_3     y group
  <dbl> <dbl> <dbl> <dbl> <chr>
1    NA     2     5    28 a
2    NA     3     6    34 a
3     3     4    NA    25 b
4     4     5    NA    12 b
Here I want, for each group, to regress y on whichever variables are not missing in that group, and store the corresponding lm object in a list-column.
That is:
for group a, these variables are ind_2 and ind_3
for group b, they are ind_1 and ind_2
I tried the following, but it does not work:
mydata %>% group_by(group) %>% nest() %>%
do(filtered_df <- . %>% select(which(colMeans(is.na(.)) == 0)),
myreg = lm(y~ names(filtered_df)))
Any ideas? Thanks!
To select the rows of an R data frame that contain no NA values, we can use the complete.cases function with single square brackets. For example, if we have a data frame called df that contains some missing values (NA), the rows without any NA can be selected with the command df[complete.cases(df), ].
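As a minimal sketch of the row selection just described, and of the column-wise variant that the question above actually needs, assuming a small hypothetical data frame (the names df, x, y, keep, complete_rows and complete_cols are illustrative, not from the original post):

```r
# Hypothetical data frame: column x has one missing value, column y has none
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", "c"))

# complete.cases() returns TRUE for each row that has no NA in any column
keep <- complete.cases(df)

# Row selection: keep only the rows without missing values
complete_rows <- df[keep, ]

# Column selection: keep only the columns without missing values
complete_cols <- df[, colSums(is.na(df)) == 0, drop = FALSE]
```

Row filtering would discard every row of mydata, since each row has an NA somewhere, which is why the answers below filter columns within each group instead.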
map() returns a list or a data frame; map_lgl() , map_int() , map_dbl() and map_chr() return vectors of the corresponding type (or die trying); map_df() returns a data frame by row-binding the individual elements.
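A short sketch of how the return types differ, assuming the purrr package is installed; the input list nums and the result names are made up for illustration:

```r
library(purrr)

# Hypothetical input: a named list of integer vectors
nums <- list(a = 1:3, b = 4:6)

# map() always returns a list, one element per input element
means_list <- map(nums, mean)

# map_dbl() returns a named double vector instead
means_dbl <- map_dbl(nums, mean)

# map_lgl() returns a logical vector; the formula is shorthand for a function of .x
any_big <- map_lgl(nums, ~ any(.x > 5))

# map_chr() returns a character vector
labels <- map_chr(nums, ~ paste(.x, collapse = "-"))
```

A list-column of lm objects, as asked for in the question, needs plain map(), because a fitted model cannot be stored in an atomic vector.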
We can use map and mutate. We can either select and model in one step (nestdat1), or in separate steps using two maps if you want to preserve the filtered data (nestdat2):
library(tidyverse)
# One step: drop the columns containing NA and fit; `data` keeps all columns
nestdat1 <- mydata %>%
  group_by(group) %>%
  nest() %>%
  mutate(model = data %>% map(~ select_if(., ~ !anyNA(.)) %>%
                                lm(y ~ ., data = .)))
# Two steps: overwrite `data` with the filtered columns, then fit on it
nestdat2 <- mydata %>%
  group_by(group) %>%
  nest() %>%
  mutate(data = data %>% map(~ select_if(., ~ !anyNA(.))),
         model = data %>% map(~ lm(y ~ ., data = .)))
Output:
They produce different data columns:
> nestdat1 %>% pull(data)
[[1]]
# A tibble: 2 x 4
ind_1 ind_2 ind_3 y
<dbl> <dbl> <dbl> <dbl>
1 NA 2 5 28
2 NA 3 6 34
[[2]]
# A tibble: 2 x 4
ind_1 ind_2 ind_3 y
<dbl> <dbl> <dbl> <dbl>
1 3 4 NA 25
2 4 5 NA 12
> nestdat2 %>% pull(data)
[[1]]
# A tibble: 2 x 3
ind_2 ind_3 y
<dbl> <dbl> <dbl>
1 2 5 28
2 3 6 34
[[2]]
# A tibble: 2 x 3
ind_1 ind_2 y
<dbl> <dbl> <dbl>
1 3 4 25
2 4 5 12
But the same model column:
> nestdat1 %>% pull(model)
[[1]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_2 ind_3
16 6 NA
[[2]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_1 ind_2
64 -13 NA
> nestdat2 %>% pull(model)
[[1]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_2 ind_3
16 6 NA
[[2]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_1 ind_2
64 -13 NA
Here's another tidyverse option; assign the result to a model column of the nested tibble if you wish to keep it in your tibble:
library(tidyverse)
# discard(., anyNA) drops every column that contains an NA before fitting
mydata %>%
  nest(data = -group) %>%
  pull(data) %>%
  map(~ lm(y ~ ., discard(., anyNA)))
# [[1]]
#
# Call:
# lm(formula = y ~ ., data = discard(., anyNA))
#
# Coefficients:
# (Intercept) ind_2 ind_3
# 16 6 NA
#
#
# [[2]]
#
# Call:
# lm(formula = y ~ ., data = discard(., anyNA))
#
# Coefficients:
# (Intercept) ind_1 ind_2
# 64 -13 NA
#
#
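For comparison, the same idea can be sketched in base R without the tidyverse. This is an alternative not given in the answers above, and the names by_group, models and used_terms are illustrative:

```r
# Same example data as in the question, built with base R
mydata <- data.frame(ind_1 = c(NA, NA, 3, 4),
                     ind_2 = c(2, 3, 4, 5),
                     ind_3 = c(5, 6, NA, NA),
                     y = c(28, 34, 25, 12),
                     group = c("a", "a", "b", "b"))

# Split every column except `group` into one data frame per group
by_group <- split(mydata[names(mydata) != "group"], mydata$group)

# For each group, keep only the columns without NA, then regress y on the rest
models <- lapply(by_group, function(d) {
  complete <- d[, colSums(is.na(d)) == 0, drop = FALSE]
  lm(y ~ ., data = complete)
})

# Names of the terms that entered each model
used_terms <- lapply(models, function(m) names(coef(m)))
```

The result is a plain named list of lm objects rather than a list-column, so it would still need to be attached to a nested tibble if that structure is required.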