 

Removing NA in dplyr pipe [duplicate]

Tags: r, na, dplyr

I tried to remove NAs from the subset using dplyr piping. Does my output indicate that I missed a step? I'm trying to learn how to write functions using dplyr:

    > outcome.df%>%
    + group_by(Hospital,State)%>%
    + arrange(desc(HeartAttackDeath,na.rm=TRUE))%>%
    + head()
    Source: local data frame [6 x 5]
    Groups: Hospital, State

                               Hospital State HeartAttackDeath
    1     ABBEVILLE AREA MEDICAL CENTER    SC               NA
    2        ABBEVILLE GENERAL HOSPITAL    LA               NA
    3      ABBOTT NORTHWESTERN HOSPITAL    MN             12.3
    4   ABILENE REGIONAL MEDICAL CENTER    TX             17.2
    5        ABINGTON MEMORIAL HOSPITAL    PA             14.3
    6 ABRAHAM LINCOLN MEMORIAL HOSPITAL    IL               NA
    Variables not shown: HeartFailureDeath (dbl), PneumoniaDeath (dbl)
asked Oct 30 '14 at 23:10 by ITCoderWhiz




1 Answer

I don't think desc takes an na.rm argument... I'm actually surprised it doesn't throw an error when you give it one. If you just want to remove NAs, use na.omit (base) or tidyr::drop_na:

    outcome.df %>%
      na.omit() %>%
      group_by(Hospital, State) %>%
      arrange(desc(HeartAttackDeath)) %>%
      head()

    library(tidyr)
    outcome.df %>%
      drop_na() %>%
      group_by(Hospital, State) %>%
      arrange(desc(HeartAttackDeath)) %>%
      head()
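To see what "remove NAs" means here, a small sketch on made-up data may help. The `toy` data frame and its values are hypothetical stand-ins for `outcome.df`, not the real hospital data. Both approaches drop a row as soon as any column in it is NA:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for outcome.df (values are made up)
toy <- data.frame(
  Hospital = c("A", "B", "C", "D"),
  State = c("SC", "LA", "MN", "TX"),
  HeartAttackDeath = c(NA, 14.3, 12.3, 17.2),
  PneumoniaDeath = c(10.1, NA, 11.5, 9.8),
  stringsAsFactors = FALSE
)

# Row A has an NA in HeartAttackDeath, row B has one in PneumoniaDeath,
# so both approaches keep only rows C and D.
by_na_omit <- toy %>% na.omit()
by_drop_na <- toy %>% drop_na()

print(by_na_omit)
print(by_drop_na)
```

Note that row B is dropped even though its HeartAttackDeath value is present, which may or may not be what you want.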

If you only want to remove NAs from the HeartAttackDeath column, filter with is.na, or use tidyr::drop_na:

    outcome.df %>%
      filter(!is.na(HeartAttackDeath)) %>%
      group_by(Hospital, State) %>%
      arrange(desc(HeartAttackDeath)) %>%
      head()

    outcome.df %>%
      drop_na(HeartAttackDeath) %>%
      group_by(Hospital, State) %>%
      arrange(desc(HeartAttackDeath)) %>%
      head()
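The difference from the whole-data-frame version is easiest to see on a small example. This is only a sketch: `toy` and its values are hypothetical, not taken from the question's data. Filtering on one column keeps rows whose NAs are in other columns:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for outcome.df (values are made up)
toy <- data.frame(
  Hospital = c("A", "B", "C", "D"),
  State = c("SC", "LA", "MN", "TX"),
  HeartAttackDeath = c(NA, 14.3, 12.3, 17.2),
  PneumoniaDeath = c(10.1, NA, 11.5, 9.8),
  stringsAsFactors = FALSE
)

# Only row A is dropped: it is the only row with NA in HeartAttackDeath.
# Row B survives even though its PneumoniaDeath is NA.
by_filter <- toy %>%
  filter(!is.na(HeartAttackDeath)) %>%
  arrange(desc(HeartAttackDeath))

by_drop_na <- toy %>%
  drop_na(HeartAttackDeath) %>%
  arrange(desc(HeartAttackDeath))

print(by_filter)   # rows ordered D, B, C
```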

As pointed out in the linked duplicate, complete.cases can also be used, but it's a bit trickier to put in a chain because it takes a data frame as an argument but returns a logical vector (one TRUE/FALSE per row) rather than a data frame. So you could use it like this:

    outcome.df %>%
      filter(complete.cases(.)) %>%
      group_by(Hospital, State) %>%
      arrange(desc(HeartAttackDeath)) %>%
      head()
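If it is unclear why complete.cases needs to be wrapped in filter, looking at what it returns on its own may help. Again a sketch on hypothetical data (`toy` is made up for illustration):

```r
library(dplyr)

# Hypothetical toy data standing in for outcome.df (values are made up)
toy <- data.frame(
  Hospital = c("A", "B", "C", "D"),
  State = c("SC", "LA", "MN", "TX"),
  HeartAttackDeath = c(NA, 14.3, 12.3, 17.2),
  PneumoniaDeath = c(10.1, NA, 11.5, 9.8),
  stringsAsFactors = FALSE
)

# complete.cases() gives one logical value per row, not a data frame
keep <- complete.cases(toy)
print(keep)   # FALSE FALSE TRUE TRUE

# Inside a magrittr chain, `.` stands for the data frame piped in,
# so filter() receives that logical vector as its row condition.
piped <- toy %>% filter(complete.cases(.))

# The same thing without a pipe, using plain row indexing
indexed <- toy[keep, ]
```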
answered Sep 18 '22 at 13:09 by Gregor Thomas