How to include NA in ifelse?

Tags: r, if-statement

I am trying to create a column ID based on logical conditions on the values of other columns. For example, take the following data frame:

test <- structure(list(time = c(10L, 20L, NA, 30L), type = structure(c(1L,  2L, 3L, NA), .Label = c("A", "B", "C"), class = "factor"), ID = c(NA,  "1", NA, NA)), .Names = c("time", "type", "ID"), row.names = c(NA,  -4L), class = "data.frame")

which looks like

    time    type
1   10      A
2   20      B
3   NA      C
4   30      NA

I want to make a new column ID containing a value of 1 for every row where time is not NA and type is not A. I am using the following code for this:

test$ID <- ifelse(is.na(test$time) | test$type == "A", NA, "1")

This gives the result as

    time    type    ID
1   10      A       NA
2   20      B       1
3   NA      C       NA
4   30      NA      NA

However, this code does not handle the NA in column type the way I want: row 4 ends up with NA in column ID. I need that row to have a value of 1, so the solution should give:

    time    type    ID
1   10      A       NA
2   20      B       1
3   NA      C       NA
4   30      NA      1

Can anyone tell me how I might do this? I could get this to work with my existing code if I could somehow change the result of is.na(test$type) to return FALSE instead of TRUE, but I'm not sure how to do that. Or, maybe the structure of my existing code needs to be entirely changed? I appreciate any help!

829

asked Feb 27 '14 17:02

Thomas

1 Answer

You can't really compare NA with another value: a comparison with == returns NA rather than TRUE or FALSE, so == will not work here. Consider the following:

NA == NA
# [1] NA
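To make the connection to the question explicit, here is a small sketch of how that NA travels through the original expression. The vector `type` and the names `cmp` and `res` are made up for illustration; they mirror the `type` column of `test`.

```r
# Illustrative vector mirroring the type column of the question
type <- factor(c("A", "B", "C", NA))

# A comparison against an NA element yields NA, not FALSE
cmp <- type == "A"
cmp
# [1]  TRUE FALSE FALSE    NA

# In a logical OR, NA survives unless the other side is TRUE
FALSE | NA   # NA
TRUE | NA    # TRUE

# ifelse() returns NA wherever its test is NA,
# which is why row 4 of the question ends up as NA
res <- ifelse(c(FALSE, NA), NA, "1")
res
# [1] "1" NA
```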

You can just change your comparison from == to %in%, which returns FALSE instead of NA when the value on the left is NA:

ifelse(is.na(test$time) | test$type %in% "A", NA, "1")
# [1] NA  "1" NA  "1"
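As a rough illustration of why this works, the sketch below compares `%in%` with an explicit `is.na()` guard on stand-alone vectors. The names `type`, `time`, `in_a`, `eq_a` and `id` are hypothetical stand-ins for the columns of `test`.

```r
# Stand-alone copies of the two columns from the question
type <- factor(c("A", "B", "C", NA))
time <- c(10L, 20L, NA, 30L)

# %in% is built on match(), so it never returns NA
in_a <- type %in% "A"
in_a
# [1]  TRUE FALSE FALSE FALSE

# The same test spelled out: guard the comparison with !is.na()
eq_a <- !is.na(type) & type == "A"
eq_a
# [1]  TRUE FALSE FALSE FALSE

# With no NA left in the test, row 4 now gets "1"
id <- ifelse(is.na(time) | in_a, NA, "1")
id
# [1] NA  "1" NA  "1"
```

Either form should behave the same here; `%in%` is simply the shorter way to write it.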

Regarding your other question,

I could get this to work with my existing code if I could somehow change the result of is.na(test$type) to return FALSE instead of TRUE, but I'm not sure how to do that.

just use ! to negate the results:

!is.na(test$time)
# [1]  TRUE  TRUE FALSE  TRUE
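If you prefer to state the condition positively ("time is present and type is not A"), the negation can be combined with %in% as in the sketch below. The data frame is rebuilt with `data.frame()` only to keep the example self-contained, and `keep` is a hypothetical helper name.

```r
# Rebuild the example data (without the ID column)
test <- data.frame(time = c(10L, 20L, NA, 30L),
                   type = factor(c("A", "B", "C", NA)))

# TRUE where time is present and type is not "A" (NA counts as "not A")
keep <- !is.na(test$time) & !(test$type %in% "A")
keep
# [1] FALSE  TRUE FALSE  TRUE

# Assign "1" to the kept rows and NA to the rest
test$ID <- ifelse(keep, "1", NA)
test$ID
# [1] NA  "1" NA  "1"
```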

150

answered Oct 06 '22 18:10

A5C1D2H2I1M1N2O1R2T1
