
Searching for greater/less than values with NAs

Tags: r, subset

I have a dataframe for which I've calculated and added a difftime column:

    name   amount   1st_date   2nd_date  days_out
    JEAN  318.5 1971-02-16 1972-11-27  650 days
 GREGORY 1518.5       <NA>       <NA>   NA days
    JOHN  318.5       <NA>       <NA>   NA days
  EDWARD  318.5       <NA>       <NA>   NA days
  WALTER  518.5 1971-07-06 1975-03-14 1347 days
   BARRY 1518.5 1971-11-09 1972-02-09   92 days
   LARRY  518.5 1971-09-08 1972-02-09  154 days
   HARRY  318.5 1971-09-16 1972-02-09  146 days
   GARRY 1018.5 1971-10-26 1972-02-09  106 days

I want to break it out and take subtotals where days_out is 0-60, 61-90, 91-120, 121-180.

For some reason I can't even get bracket notation to work reliably. I would expect members[members$days_out<=120, ] to show just Barry and Garry, but I get a whole lot of lines like:

NA.1095     <NA>     NA       <NA>       <NA>  NA days
NA.1096     <NA>     NA       <NA>       <NA>  NA days
NA.1097     <NA>     NA       <NA>       <NA>  NA days

Those don't exist in the original data. There's no one without a name. What am I doing wrong here?

asked Dec 14 '12 22:12 by Amanda



1 Answer

This is standard behavior for < and other relational operators: when asked to evaluate whether NA is less than (or greater than, or equal to, or ...) some other number, they return NA, rather than TRUE or FALSE.

Here's an example that should make clear what is going on and point to a simple fix.

x <- c(1, 2, NA, 4, 5)
x[x < 3]
# [1]  1  2 NA
x[x < 3 & !is.na(x)]
# [1] 1 2
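The same fix carries over to the data frame in the question. The following is a minimal sketch, assuming a small reconstruction of `members` built from a few rows of the table above; the column names `first_date` and `second_date` are stand-ins chosen for illustration, not the asker's actual names. It shows three equivalent ways to keep NA comparisons from producing all-NA rows.

```r
# Hypothetical reconstruction of a few rows of the question's data.
members <- data.frame(
  name        = c("JEAN", "GREGORY", "WALTER", "BARRY", "GARRY"),
  amount      = c(318.5, 1518.5, 518.5, 1518.5, 1018.5),
  first_date  = as.Date(c("1971-02-16", NA, "1971-07-06", "1971-11-09", "1971-10-26")),
  second_date = as.Date(c("1972-11-27", NA, "1975-03-14", "1972-02-09", "1972-02-09")),
  stringsAsFactors = FALSE
)
# Subtracting Dates gives a difftime in days; it is NA where a date is missing.
members$days_out <- members$second_date - members$first_date

# Option 1: explicitly exclude NAs from the logical index.
keep <- !is.na(members$days_out) & members$days_out <= 120
by_mask <- members[keep, ]

# Option 2: which() returns only the positions that are TRUE, dropping NAs.
by_which <- members[which(members$days_out <= 120), ]

# Option 3: subset() treats NA in the condition as FALSE.
by_subset <- subset(members, days_out <= 120)

by_mask$name
```

All three forms should return only the BARRY and GARRY rows here; which one to use is mostly a matter of taste, though `subset()` is usually recommended for interactive work rather than inside functions.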

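To see why all of those rows indexed by NAs have row names like NA.1095, NA.1096, and so on, try this. Each NA in the row index returns an all-NA row, and R appends a numeric suffix to keep the row names unique: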

data.frame(a=1:2, b=1:2)[rep(NA, 5),]
#       a  b
# NA   NA NA
# NA.1 NA NA
# NA.2 NA NA
# NA.3 NA NA
# NA.4 NA NA
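
For the subtotals by range that the question asks about, one possible approach (not part of the original answer, and only a sketch) is to bin the day counts with `cut()` and sum the amounts per bin with `tapply()`. The vectors below are copied from the table in the question with the NA rows collapsed to one; the bucket labels and the variable names `days`, `amount`, and `bucket` are assumptions made for illustration.

```r
# Day counts and amounts taken from the question's table (one NA row kept).
days   <- c(650, NA, 1347, 92, 154, 146, 106)
amount <- c(318.5, 1518.5, 518.5, 1518.5, 518.5, 318.5, 1018.5)

# cut() assigns each value to a right-closed interval: [0,60], (60,90],
# (90,120], (120,180]. NA and out-of-range values become NA.
bucket <- cut(
  days,
  breaks = c(0, 60, 90, 120, 180),
  labels = c("0-60", "61-90", "91-120", "121-180"),
  include.lowest = TRUE
)

# tapply() ignores observations whose bucket is NA; empty buckets yield NA.
subtotals <- tapply(amount, bucket, sum)
subtotals
```

If the real column is a difftime, converting it first with `as.numeric(members$days_out, units = "days")` should make it usable as the input to `cut()`.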

answered Oct 12 '22 11:10 by Josh O'Brien