Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

select non-missing variables in a purrr loop

Tags:

r

dplyr

purrr

lm

Consider this example

mydata <- data_frame(ind_1 = c(NA,NA,3,4),
                     ind_2 = c(2,3,4,5),
                     ind_3 = c(5,6,NA,NA),
                     y = c(28,34,25,12),
                     group = c('a','a','b','b'))

> mydata
# A tibble: 4 x 5
  ind_1 ind_2 ind_3     y group
  <dbl> <dbl> <dbl> <dbl> <chr>
1    NA     2     5    28 a    
2    NA     3     6    34 a    
3     3     4    NA    25 b    
4     4     5    NA    12 b 

Here I want, for each group, regress y on whatever variable is not missing in that group, and store the corresponding lm object in a list-column.

That is:

  • for group a, these variables correspond to ind_2 and ind_3
  • for group b, they correspond to ind_1 and ind_2

I tried the following but this does not work

mydata %>% group_by(group) %>% nest() %>% 
  do(filtered_df <- . %>% select(which(colMeans(is.na(.)) == 0)),
     myreg = lm(y~ names(filtered_df)))

Any ideas? Thanks!

like image 989
ℕʘʘḆḽḘ Avatar asked Sep 10 '18 19:09

ℕʘʘḆḽḘ


People also ask

How do I extract non missing values in R?

To select rows of an R data frame that are non-Na, we can use complete. cases function with single square brackets. For example, if we have a data frame called that contains some missing values (NA) then the selection of rows that are non-NA can be done by using the command df[complete. cases(df),].

How do you find non missing values in SAS?

The SAS function N calculates the number of non-blank numeric values across multiple columns. To count the number of missing numeric values, you can use NMISS function. Note - The N(of x--a) is equivalent to N(x, y, z, a).

What does map_ df() do in R?

map() returns a list or a data frame; map_lgl() , map_int() , map_dbl() and map_chr() return vectors of the corresponding type (or die trying); map_df() returns a data frame by row-binding the individual elements.


2 Answers

We can use map and mutate. We can either select and model in one step (nestdat1) or in separate steps using two map's if you want to preserve the filtered data (nestdat2):

library(tidyverse)

nestdat1 <- mydata %>%
  group_by(group) %>%
  nest() %>%
  mutate(model = data %>% map(~ select_if(., funs(!any(is.na(.)))) %>%
                                lm(y ~ ., data = .)))

nestdat2 <- mydata %>%
  group_by(group) %>%
  nest() %>%
  mutate(data = data %>% map(~ select_if(., funs(!any(is.na(.))))),
         model = data %>% map(~ lm(y ~ ., data = .)))

Output:

They produce different data columns:

> nestdat1 %>% pull(data)
[[1]]
# A tibble: 2 x 4
  ind_1 ind_2 ind_3     y
  <dbl> <dbl> <dbl> <dbl>
1    NA     2     5    28
2    NA     3     6    34

[[2]]
# A tibble: 2 x 4
  ind_1 ind_2 ind_3     y
  <dbl> <dbl> <dbl> <dbl>
1     3     4    NA    25
2     4     5    NA    12

> nestdat2 %>% pull(data)
[[1]]
# A tibble: 2 x 3
  ind_2 ind_3     y
  <dbl> <dbl> <dbl>
1     2     5    28
2     3     6    34

[[2]]
# A tibble: 2 x 3
  ind_1 ind_2     y
  <dbl> <dbl> <dbl>
1     3     4    25
2     4     5    12

But the same model column:

> nestdat1 %>% pull(model)
[[1]]

Call:
lm(formula = y ~ ., data = .)

Coefficients:
(Intercept)        ind_2        ind_3  
         16            6           NA  

[[2]]

Call:
lm(formula = y ~ ., data = .)

Coefficients:
(Intercept)        ind_1        ind_2  
         64          -13           NA  


> nestdat2 %>% pull(model)
[[1]]

Call:
lm(formula = y ~ ., data = .)

Coefficients:
(Intercept)        ind_2        ind_3  
         16            6           NA  

[[2]]

Call:
lm(formula = y ~ ., data = .)

Coefficients:
(Intercept)        ind_1        ind_2  
         64          -13           NA 
like image 168
acylam Avatar answered Oct 17 '22 00:10

acylam


Here's another tidyverse option, assign to mydata$model if you wish to keep it in your tibble :

library(tidyverse)
mydata %>%
  nest(-group) %>%
  pull(data) %>%
  map(~lm(y ~., discard(.,anyNA)))
# [[1]]
# 
# Call:
# lm(formula = y ~ ., data = discard(., anyNA))
# 
# Coefficients:
# (Intercept)        ind_2        ind_3  
#          16            6           NA  
# 
# 
# [[2]]
# 
# Call:
# lm(formula = y ~ ., data = discard(., anyNA))
# 
# Coefficients:
# (Intercept)        ind_1        ind_2  
#          64          -13           NA  
# 
# 
like image 20
Moody_Mudskipper Avatar answered Oct 17 '22 00:10

Moody_Mudskipper