I have quite a simple question which I am currently struggling with. If I have an example dataframe:
a <- c(1:5)
b <- c(1,3,5,9,11)
df1 <- data.frame(a,b)
How do I create a new column ('c') that is populated using if statements on column b? For example: 'cat' for values in b that are 1 or 2, 'dog' for values in b that are between 3 and 5, and 'rabbit' for values in b that are greater than 6.
So column 'c' using dataframe df1 would read: cat, dog, dog, rabbit, rabbit.
Many thanks in advance.
To run an if-then statement in R, use if (condition) { ... }. The construct has two main elements: a logical test in the parentheses and conditional code in the curly braces. The code in the braces is conditional because it is evaluated only if the logical test in the parentheses is TRUE.
The test must produce a single TRUE or FALSE, so on its own an if statement checks one value at a time rather than every element of a column.
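As a rough illustration of the description above, here is a minimal sketch that applies plain if / else if / else logic to column b one value at a time. The helper name classify_one is hypothetical, and the treatment of values between 5 and 6 (which the question leaves undefined) is an assumption: here they fall through to 'rabbit'.

```r
# Example data from the question
df1 <- data.frame(a = 1:5, b = c(1, 3, 5, 9, 11))

# Hypothetical helper: classifies ONE value with if / else if / else.
# Assumption: anything above 5 counts as "rabbit".
classify_one <- function(x) {
  if (x <= 2) {
    "cat"
  } else if (x <= 5) {
    "dog"
  } else {
    "rabbit"
  }
}

# if() tests a single value, so call the helper once per element of b
df1$c <- vapply(df1$b, classify_one, character(1))
df1$c
```

This works, but it calls the helper once per row, which is why the vectorised approaches in the answers are usually preferred for whole columns.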
# Labels are ordered to match the intervals [1, 2.5), [2.5, 5.5), [5.5, Inf)
df1$c <- c("cat", "dog", "rabbit")[findInterval(df1$b, c(1, 2.5, 5.5, Inf))]
The findInterval approach will be much faster than nested ifelse strategies, and I'm guessing very much faster than a function that loops over unnested if statements. Those of us working with bigger data do notice the differences when we pick inefficient algorithms.
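To make the mechanics of the findInterval approach easier to follow, here is a small sketch that splits it into two steps: first the interval index for each value, then the label lookup. The variable names (breaks, idx, labels) are chosen for illustration only; the break points are the ones used in the answer.

```r
b <- c(1, 3, 5, 9, 11)

# Break points from the answer: [1, 2.5), [2.5, 5.5), [5.5, Inf)
breaks <- c(1, 2.5, 5.5, Inf)

# Step 1: findInterval() returns the index of the interval each value falls in
idx <- findInterval(b, breaks)
idx

# Step 2: use those indices to pick the matching label
labels <- c("cat", "dog", "rabbit")
labels[idx]
```

One caveat worth keeping in mind: a value below the first break point gets index 0, and indexing with 0 silently drops that element, so the break points should cover the full range of the data.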
The findInterval answer didn't literally address the request to use if statements, but I don't assume that new users of R will know the most expressive or efficient approach to a problem. A request to "use IF" sounded like an effort to translate coding approaches typical of the two major macro statistical processors, SPSS and SAS. The R if control structure is not generally an efficient way to recode a column, because only the first element of its condition is used (and R 4.2.0 and later raise an error when the condition has more than one element). On its own it doesn't process a column, whereas the ifelse function does. The cut function could have been used here (with appropriate breaks and labels arguments), although it would have returned a factor rather than a character vector. The findInterval approach was chosen for its ability to return multiple levels, which a single ifelse cannot. I think chaining or nesting ifelse calls quickly becomes ugly and confusing after about 2 or 3 levels of nesting.
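For comparison, the cut alternative mentioned above might look roughly like the sketch below. The lower break of 0 is an assumption, needed because cut uses right-closed intervals by default and would otherwise exclude the value 1; the other break points mirror those in the findInterval answer.

```r
b <- c(1, 3, 5, 9, 11)

# Assumed breaks: (0, 2.5], (2.5, 5.5], (5.5, Inf)
f <- cut(b, breaks = c(0, 2.5, 5.5, Inf), labels = c("cat", "dog", "rabbit"))

# cut() returns a factor, not a character vector
is.factor(f)

# Convert explicitly if a character column is wanted
as.character(f)
```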
df1 <- transform(
  df1,
  # Nested ifelse: the inner call handles everything that is not 1 or 2
  c = ifelse(b %in% 1:2, 'cat',
      ifelse(b %in% 3:5, 'dog', 'rabbit'))
)