I am trying to create a column ID based on logical statements for values of other columns. For example, in the following dataframe
test <- structure(list(time = c(10L, 20L, NA, 30L), type = structure(c(1L, 2L, 3L, NA), .Label = c("A", "B", "C"), class = "factor"), ID = c(NA, "1", NA, NA)), .Names = c("time", "type", "ID"), row.names = c(NA, -4L), class = "data.frame")
which looks like
  time type
1   10    A
2   20    B
3   NA    C
4   30   NA
I want to make a new column ID containing a value of 1 for all time that are not NA and all type that are not A. I am using the following code for this:
test$ID <- ifelse(is.na(test$time) | test$type == "A", NA, "1")
This gives the result as
  time type ID
1   10    A NA
2   20    B  1
3   NA    C NA
4   30   NA NA
However, this code ignores the NA in column type, resulting in a value of NA in column ID. I need this to be a value of 1, so my needed solution should give:
  time type ID
1   10    A NA
2   20    B  1
3   NA    C NA
4   30   NA  1
Can anyone tell me how I might do this? I could get this to work with my existing code if I could somehow change the result of is.na(test$type) to return FALSE instead of TRUE, but I'm not sure how to do that. Or, maybe the structure of my existing code needs to be entirely changed? I appreciate any help!
NA is not treated as FALSE by the R function ifelse: wherever the test is NA, the result is a missing value. Combined with another ifelse statement or a nested ifelse, the results can be even more confusing.
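A minimal sketch of how ifelse treats an NA test, using a made-up logical vector (the names x and res are illustrative only):

```r
# A test vector containing TRUE, FALSE and a missing value
x <- c(TRUE, FALSE, NA)

# ifelse() picks "yes" where the test is TRUE and "no" where it is FALSE;
# where the test is NA, neither branch is chosen and the result stays NA
res <- ifelse(x, "yes", "no")
print(res)
```

The third element comes back as NA rather than "no", which is why a missing value in the test needs to be handled explicitly.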
In R, missing values are represented by the symbol NA (not available). Undefined numeric results (e.g., 0/0) are represented by the symbol NaN (not a number); note that dividing a nonzero number by zero gives Inf rather than NaN. Unlike SAS, R uses the same symbol for character and numeric data.
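As a small illustration of the difference between NA and NaN, here is a sketch using only base R (the variable names are hypothetical):

```r
# NA is the missing-value marker; NaN is an undefined numeric result
nan_val <- 0 / 0      # NaN
inf_val <- 1 / 0      # Inf, not NaN

na_is_na   <- is.na(NA)        # TRUE
na_is_nan  <- is.nan(NA)       # FALSE: NA is missing, not "not a number"
nan_is_nan <- is.nan(nan_val)  # TRUE
nan_is_na  <- is.na(nan_val)   # TRUE: is.na() also flags NaN

# The same NA symbol works in numeric and character vectors
nums  <- c(1.5, NA)
chars <- c("a", NA)
print(is.na(nums))
print(is.na(chars))
```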
To check which values are NA in an R data frame, we can use the is.na function, either directly on the data frame or column by column with apply. This returns a logical matrix of TRUE and FALSE with the same shape as the data frame.
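A short sketch of locating missing values in a data frame. The data frame df below is a hypothetical stand-in that mirrors the time and type columns from the question:

```r
# Hypothetical data frame resembling the question's example
df <- data.frame(
  time = c(10L, 20L, NA, 30L),
  type = factor(c("A", "B", "C", NA))
)

# is.na() on a data frame returns a logical matrix of the same shape
na_mask <- is.na(df)
print(na_mask)

# The same check applied column by column with apply()
via_apply <- apply(df, 2, is.na)

# Count missing values per column and list their row/column positions
na_counts <- colSums(na_mask)
na_pos <- which(na_mask, arr.ind = TRUE)
print(na_counts)
print(na_pos)
```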
You can't really compare NA with another value, so using == would not work. Consider the following:
NA == NA
# [1] NA
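To see why the fourth row of the question's data ends up as NA, here is a small sketch of how NA propagates through == and | (variable names are illustrative):

```r
# Any comparison with NA yields NA, not TRUE or FALSE
cmp <- c(NA == "A", NA == NA)

# Logical OR resolves only when the other operand is TRUE
or_true  <- NA | TRUE    # TRUE: the result is TRUE whatever the NA stands for
or_false <- NA | FALSE   # NA: the result depends on the unknown value

# Row 4 of the question: time is 30 (not missing), type is NA
row4 <- is.na(30L) | (NA_character_ == "A")   # FALSE | NA gives NA
print(row4)
```

Since the test for row 4 is NA, ifelse returns NA for that row instead of "1".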
You can just change your comparison from == to %in%:
ifelse(is.na(test$time) | test$type %in% "A", NA, "1")
# [1] NA  "1" NA  "1"
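Put together on the question's data, this can be sketched as follows. The data frame is rebuilt with data.frame() for brevity, and the second variant with an explicit !is.na() guard is an alternative offered here, not part of the original answer:

```r
# Rebuild the example data (time and type columns from the question)
test <- data.frame(
  time = c(10L, 20L, NA, 30L),
  type = factor(c("A", "B", "C", NA))
)

# %in% never returns NA: a missing type simply does not match "A"
id_in <- ifelse(is.na(test$time) | test$type %in% "A", NA, "1")

# Alternative: keep == but guard it so a missing type counts as "not A"
id_guard <- ifelse(
  is.na(test$time) | (!is.na(test$type) & test$type == "A"),
  NA, "1"
)

test$ID <- id_in
print(test)
```

Both variants give NA for rows 1 and 3 and "1" for rows 2 and 4, matching the desired output.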
Regarding your other question,
"I could get this to work with my existing code if I could somehow change the result of is.na(test$type) to return FALSE instead of TRUE, but I'm not sure how to do that."
just use ! to negate the results:
!is.na(test$time)
# [1]  TRUE  TRUE FALSE  TRUE
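Negation also lets the condition be stated positively, i.e. "time is present and type is not A". A minimal sketch under that reading of the question (the names keep and id_pos are hypothetical):

```r
# Example data with the question's time and type columns
test <- data.frame(
  time = c(10L, 20L, NA, 30L),
  type = factor(c("A", "B", "C", NA))
)

# TRUE where time is not missing AND type is not "A";
# a missing type counts as "not A" because %in% returns FALSE for NA
keep <- !is.na(test$time) & !(test$type %in% "A")

# Assign "1" where the condition holds and NA elsewhere
id_pos <- ifelse(keep, "1", NA)
print(id_pos)
```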