Searching for greater/less than values with NAs

Q: How do you omit rows with NA in R?

To remove all rows that contain an NA, use the na.omit function. For example, if a data frame called df contains some NA values, na.omit(df) returns a copy with every row that has at least one NA removed.
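A minimal sketch of that call, assuming a small made-up data frame (the name df and its columns are hypothetical, chosen only for illustration):

```r
# Hypothetical data frame with missing values in two different columns
df <- data.frame(
  id = 1:4,
  x  = c(10, NA, 30, 40),
  y  = c("a", "b", NA, "d")
)

# na.omit() drops every row that contains at least one NA,
# so only rows 1 and 4 survive here
clean <- na.omit(df)
clean
```

Note that na.omit() looks at every column, so a row is dropped even if the NA is in a column you do not care about.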

Q: What does NA mean in R?

A missing value is one whose value is unknown. Missing values are represented in R by the NA symbol.

Q: What does na.omit do in R?

The na.omit function removes all incomplete cases of a data object (typically a data frame, matrix or vector).

Q: How do I exclude missing values in R?

First, you can use the complete.cases() function inside brackets to exclude rows with missing values. Second, you can omit missing values with the na.omit() function.
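As a rough illustration of the bracket approach, assuming the same kind of made-up data frame (df and its columns are hypothetical):

```r
# Hypothetical data frame with missing values in two different columns
df <- data.frame(
  id = 1:4,
  x  = c(10, NA, 30, 40),
  y  = c("a", "b", NA, "d")
)

# complete.cases() returns TRUE for rows with no NA in any column
keep_all <- complete.cases(df)
all_complete <- df[keep_all, ]

# Passing only some columns restricts the check to those columns,
# so a row with an NA in y is kept as long as x is present
x_complete <- df[complete.cases(df["x"]), ]
```

The second form can be useful when only one column matters for the filter, since na.omit() would also drop rows that are missing values elsewhere.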

Tags: r, subset

I have a dataframe for which I've calculated and added a difftime column:

    name amount   1st_date   2nd_date  days_out
    JEAN  318.5 1971-02-16 1972-11-27  650 days
 GREGORY 1518.5       <NA>       <NA>   NA days
    JOHN  318.5       <NA>       <NA>   NA days
  EDWARD  318.5       <NA>       <NA>   NA days
  WALTER  518.5 1971-07-06 1975-03-14 1347 days
   BARRY 1518.5 1971-11-09 1972-02-09   92 days
   LARRY  518.5 1971-09-08 1972-02-09  154 days
   HARRY  318.5 1971-09-16 1972-02-09  146 days
   GARRY 1018.5 1971-10-26 1972-02-09  106 days

I want to break it out and take subtotals where days_out is 0-60, 61-90, 91-120, 121-180.

For some reason I can't even reliably write bracket notation. I would expect members[members$days_out<=120, ] to show just Barry and Garry, but I get a whole lot of lines like:

NA.1095     <NA>     NA       <NA>       <NA>  NA days
NA.1096     <NA>     NA       <NA>       <NA>  NA days
NA.1097     <NA>     NA       <NA>       <NA>  NA days

Those don't exist in the original data. There's no one without a name. What am I doing wrong here?

asked Dec 14 '12 22:12

Amanda

1 Answer

This is standard behavior for < and other relational operators: when asked to evaluate whether NA is less than (or greater than, or equal to, or ...) some other number, they return NA, rather than TRUE or FALSE.

Here's an example that should make clear what is going on and point to a simple fix.

x <- c(1, 2, NA, 4, 5)
x[x < 3]
# [1]  1  2 NA
x[x < 3 & !is.na(x)]
# [1] 1 2
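The same idea carries over to the data frame in the question. The sketch below rebuilds a cut-down version of members from the printed table (the date columns are left out, and the construction with as.difftime is an assumption about how days_out was made), then shows two common ways to drop the NA comparisons:

```r
# Cut-down reconstruction of the data frame from the question (dates omitted)
members <- data.frame(
  name   = c("JEAN", "GREGORY", "JOHN", "EDWARD", "WALTER",
             "BARRY", "LARRY", "HARRY", "GARRY"),
  amount = c(318.5, 1518.5, 318.5, 318.5, 518.5,
             1518.5, 518.5, 318.5, 1018.5),
  days_out = as.difftime(c(650, NA, NA, NA, 1347, 92, 154, 146, 106),
                         units = "days")
)

# which() returns only the positions where the test is TRUE,
# so rows whose comparison evaluates to NA are left out
by_which <- members[which(members$days_out <= 120), ]

# subset() likewise treats an NA condition as FALSE
by_subset <- subset(members, days_out <= 120)

by_which
```

Either form should return only the BARRY and GARRY rows, with no all-NA rows appended.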

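Indexing a data frame with NA returns a row filled with NAs, and R then makes the row names unique by appending a counter. To see why the rows indexed by NA get row names like NA.1095, NA.1096, and so on, try this: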

data.frame(a=1:2, b=1:2)[rep(NA, 5),]
#       a  b
# NA   NA NA
# NA.1 NA NA
# NA.2 NA NA
# NA.3 NA NA
# NA.4 NA NA
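For the original goal of subtotals by days_out range, one possible approach (not part of the answer above, just a sketch) is to bin the values with cut() and sum the amounts per bin with tapply(). The data frame is again a cut-down reconstruction of the question's table, and the bucket labels are taken from the ranges the question lists:

```r
# Cut-down reconstruction of the data frame from the question (dates omitted)
members <- data.frame(
  name   = c("JEAN", "GREGORY", "JOHN", "EDWARD", "WALTER",
             "BARRY", "LARRY", "HARRY", "GARRY"),
  amount = c(318.5, 1518.5, 318.5, 318.5, 518.5,
             1518.5, 518.5, 318.5, 1018.5),
  days_out = as.difftime(c(650, NA, NA, NA, 1347, 92, 154, 146, 106),
                         units = "days")
)

# Convert the difftime to plain numbers of days before binning
days <- as.numeric(members$days_out, units = "days")

# Intervals are closed on the right: [0,60], (60,90], (90,120], (120,180].
# Values outside 0-180, and NA values, get an NA bucket.
members$bucket <- cut(
  days,
  breaks = c(0, 60, 90, 120, 180),
  labels = c("0-60", "61-90", "91-120", "121-180"),
  include.lowest = TRUE
)

# Sum amount within each bucket; rows with an NA bucket are ignored,
# and buckets with no members come back as NA
totals <- tapply(members$amount, members$bucket, sum)
totals
```

With this sample, the 91-120 bucket holds BARRY and GARRY (2537 in total) and the 121-180 bucket holds LARRY and HARRY (837 in total). Because cut() assigns NA to missing values instead of erroring, no separate NA filter is needed before the subtotal step.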

answered Oct 12 '22 11:10

Josh O'Brien
