
Filter for complete cases in data.frame using dplyr (case-wise deletion)

Tags: r, dplyr, magrittr

Is it possible to filter a data.frame for complete cases using dplyr? complete.cases with a list of all variables works, of course. But that is a) verbose when there are a lot of variables and b) impossible when the variable names are not known (e.g. in a function that processes any data.frame).

library(dplyr)

df = data.frame(
    x1 = c(1,2,3,NA),
    x2 = c(1,2,NA,5)
)

df %>%
  filter(complete.cases(x1,x2))
user2503795 asked Mar 12 '14 13:03



1 Answer

Try this:

df %>% na.omit 

or this:

df %>% filter(complete.cases(.)) 

or this:

library(tidyr)
df %>% drop_na
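To address the case where the variable names are not known in advance, the complete.cases(.) form can be wrapped in a function. This is a minimal sketch only; the name keep_complete is hypothetical and not part of dplyr, and it assumes the magrittr pipe that dplyr re-exports.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): keep only the rows that
# have no NA in any column. `.` stands for the whole piped-in data frame,
# so no column names need to be known in advance.
keep_complete <- function(data) {
  data %>% filter(complete.cases(.))
}

df <- data.frame(
  x1 = c(1, 2, 3, NA),
  x2 = c(1, 2, NA, 5)
)

res <- keep_complete(df)
nrow(res)  # rows 1 and 2 remain, so this is 2
```

Because the helper never mentions x1 or x2, the same call should work on a data frame with any set of columns.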

If you want to filter based on one variable's missingness, use a conditional:

df %>% filter(!is.na(x1)) 

or

df %>% drop_na(x1) 

Other answers indicate that, of the solutions above, na.omit is much slower. That has to be balanced against the fact that it records the row indices of the omitted rows in the na.action attribute of its result, whereas the other solutions do not.

str(df %>% na.omit)
## 'data.frame':   2 obs. of  2 variables:
##  $ x1: num  1 2
##  $ x2: num  1 2
##  - attr(*, "na.action")= 'omit' Named int  3 4
##    ..- attr(*, "names")= chr  "3" "4"
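As a rough illustration of why that attribute can be useful, the sketch below pulls the omitted row indices back out with base R's attr() and uses them to look at the dropped rows. The variable names cleaned and dropped are illustrative only.

```r
df <- data.frame(
  x1 = c(1, 2, 3, NA),
  x2 = c(1, 2, NA, 5)
)

# na.omit() drops incomplete rows and remembers which ones it dropped
cleaned <- na.omit(df)

# The indices of the omitted rows are stored in the "na.action" attribute
dropped <- as.integer(attr(cleaned, "na.action"))
dropped  # 3 4

# Inspect the rows that were removed from the original data frame
df[dropped, ]
```

With filter(complete.cases(.)) or drop_na, the same information would have to be computed separately, for example with which(!complete.cases(df)).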

ADDED Have updated to reflect latest version of dplyr and comments.

ADDED Have updated to reflect latest version of tidyr and comments.

G. Grothendieck answered Oct 17 '22 02:10
