
Populate a column using if statements in R

Tags:

r

I have quite a simple question which I am currently struggling with. If I have an example dataframe:

a <- c(1:5)  
b <- c(1,3,5,9,11)
df1 <- data.frame(a,b)

How do I create a new column ('c') that is populated using if statements on column b? For example:

'cat' for those values in b which are 1 or 2
'dog' for those values in b which are between 3 and 5
'rabbit' for those values in b which are greater than 6

So column 'c' using dataframe df1 would read: cat, dog, dog, rabbit, rabbit.

Many thanks in advance.

KT_1 asked Dec 02 '12 19:12



2 Answers

df1$c <- c("cat", "dog", "rabbit")[ findInterval(df1$b, c(1, 2.5, 5.5, Inf)) ]

The findInterval approach will be much faster than nested ifelse strategies, and I'm guessing very much faster than a function that loops over unnested if statements. Those of us working with bigger data do notice the differences when we pick inefficient algorithms.
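To see how the one-liner works, it can help to look at the indices that findInterval returns before they are used to subset the label vector. The following is a minimal sketch, assuming the df1 from the question and the label order cat, dog, rabbit that the question asks for; the names breaks, labels and idx are illustrative only.

```r
# Example data from the question
df1 <- data.frame(a = 1:5, b = c(1, 3, 5, 9, 11))

# Break points define the intervals [1, 2.5), [2.5, 5.5) and [5.5, Inf)
breaks <- c(1, 2.5, 5.5, Inf)
labels <- c("cat", "dog", "rabbit")

# For each value of b, findInterval() returns the index of its interval
idx <- findInterval(df1$b, breaks)

# Subsetting the label vector by those indices recodes the whole column at once
df1$c <- labels[idx]
```

Note that values below the first break get index 0, and a zero index is silently dropped when subsetting, so this sketch assumes every value of b is at least 1.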

This didn't literally answer the request to use if, but I don't think new users of R will always know the most expressive or efficient approach to a problem. A request to "use IF" sounded like an effort to translate coding habits typical of the two major macro statistical processors, SPSS and SAS. The R if control structure is not generally an efficient way to recode a column, because its condition is only evaluated for the first element. On its own it doesn't process a column, whereas the ifelse function does. The cut function could also have been used here (with appropriate breaks and labels parameters), although it would have returned a factor rather than a character vector. I chose the findInterval approach for its ability to return multiple levels, which a single ifelse cannot. Chaining or nesting ifelse calls quickly becomes ugly and confusing after about 2 or 3 levels of nesting.
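For comparison, the cut alternative mentioned above might look like the sketch below. The break points are my own assumption (the question leaves values between 5 and 6 unspecified, so here anything above 5 becomes 'rabbit'), and the names f and is_cat are illustrative.

```r
# Example data from the question
df1 <- data.frame(a = 1:5, b = c(1, 3, 5, 9, 11))

# cut() with explicit breaks and labels; with the default right = TRUE
# the intervals are (0, 2], (2, 5] and (5, Inf]
f <- cut(df1$b, breaks = c(0, 2, 5, Inf), labels = c("cat", "dog", "rabbit"))

# cut() returns a factor, so convert it if a character column is wanted
df1$c <- as.character(f)

# ifelse() is vectorised: it tests every element of b, not just the first
is_cat <- ifelse(df1$b <= 2, "cat", "other")
```

By contrast, a bare if (df1$b <= 2) tests at most one value: older versions of R used only the first element and issued a warning, while R 4.2.0 and later raise an error.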

IRTFM answered Oct 13 '22 20:10


df1 <- 
    transform(
        df1 ,
        c =
            ifelse( b %in% 1:2 , 'cat' ,
            ifelse( b %in% 3:5 , 'dog' , 'rabbit' ) ) )
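
One thing worth knowing about this version: %in% tests exact membership, so any value that is not one of the integers 1 to 5 falls through to 'rabbit', including values below 1 and non-integers such as 2.5. A small sketch with made-up data (df2 and its values are hypothetical, not from the question) shows the effect:

```r
# Hypothetical data with values outside the integers 1 to 5
df2 <- data.frame(b = c(0, 2, 2.5, 4, 7))

# Same nested ifelse as above, applied to df2
df2 <- transform(
    df2,
    c = ifelse(b %in% 1:2, "cat",
        ifelse(b %in% 3:5, "dog", "rabbit")))

# 0 and 2.5 match neither set, so they end up as "rabbit" along with 7
```

If b can hold such values, range comparisons like b <= 2 may be a better fit than %in%.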
Anthony Damico answered Oct 13 '22 19:10