
dplyr mutate with conditional values

Tags:

r

dplyr

In a large data frame ("myfile") with four columns, I have to add a fifth column whose values depend conditionally on the first four columns.

I prefer answers using dplyr and mutate, mainly because of their speed on large datasets.

My dataframe looks like this:

      V1 V2 V3 V4
    1  1  2  3  5
    2  2  4  4  1
    3  1  4  1  1
    4  4  5  1  3
    5  5  5  5  4
    ...

The values of the fifth column (V5) are based on some conditional rules:

    if (V1==1 & V2!=4) {
      V5 <- 1
    } else if (V2==4 & V3!=1) {
      V5 <- 2
    } else {
      V5 <- 0
    }

Now I want to use the mutate function to apply these rules to all rows (to avoid slow loops). Something like this (and yes, I know it doesn't work this way!):

    myfile <- mutate(myfile, if (V1==1 & V2!=4){V5 = 1}
                             else if (V2==4 & V3!=1){V5 = 2}
                             else {V5 = 0})

This should be the result:

      V1 V2 V3 V4 V5
    1  1  2  3  5  1
    2  2  4  4  1  2
    3  1  4  1  1  0
    4  4  5  1  3  0
    5  5  5  5  4  0

How to do this in dplyr?

asked Mar 11 '14 at 21:03 by rdatasculptor



People also ask

How do I change column values with dplyr?

Use mutate() and its related verbs mutate_all(), mutate_if() and mutate_at() from the dplyr package to replace or update the values of a column (string, integer, or any other type) in an R data frame (data.frame).

What does mutate in dplyr do?

mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. New variables overwrite existing variables of the same name.
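A minimal sketch of that behaviour, assuming dplyr is installed; the two-column data frame and the names kept and only_new are made up for illustration. mutate() evaluates its arguments in order, so V3 below is computed from the original V1 before V1 is overwritten.

```r
library(dplyr)

# Made-up example data (hypothetical, not from the question)
df <- data.frame(V1 = c(1L, 2L), V2 = c(2L, 4L))

# mutate(): keeps V1 and V2, adds V3, then overwrites V1 in place
kept <- df %>% mutate(V3 = V1 + V2, V1 = V1 * 10L)
print(kept)

# transmute(): keeps only the columns it creates
only_new <- df %>% transmute(V3 = V1 + V2)
print(only_new)
```

In dplyr 1.1.0 and later, transmute() is marked as superseded in favour of mutate(.keep = "none"), but it still works.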

How do you create a new variable in R based on condition?

Often you may want to create a new variable in a data frame in R based on some condition. Fortunately this is easy to do using the mutate() and case_when() functions from the dplyr package.


2 Answers

Try this:

    myfile %>% mutate(V5 = (V1 == 1 & V2 != 4) + 2 * (V2 == 4 & V3 != 1))

giving:

      V1 V2 V3 V4 V5
    1  1  2  3  5  1
    2  2  4  4  1  2
    3  1  4  1  1  0
    4  4  5  1  3  0
    5  5  5  5  4  0
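Why the arithmetic version works, as a small base-R sketch (no dplyr needed; the helper names cond1 and cond2 are only for illustration): each comparison returns a logical vector, and arithmetic on logicals treats TRUE as 1 and FALSE as 0. Because the first rule requires V2 != 4 and the second requires V2 == 4, at most one of them can be TRUE in any row, so the sum is always 0, 1 or 2.

```r
# Sample data from the question
myfile <- data.frame(
  V1 = c(1L, 2L, 1L, 4L, 5L),
  V2 = c(2L, 4L, 4L, 5L, 5L),
  V3 = c(3L, 4L, 1L, 1L, 5L),
  V4 = c(5L, 1L, 1L, 3L, 4L)
)

# Each rule as a logical vector (one TRUE/FALSE per row)
cond1 <- with(myfile, V1 == 1 & V2 != 4)
cond2 <- with(myfile, V2 == 4 & V3 != 1)

# The rules are mutually exclusive: rule 1 needs V2 != 4, rule 2 needs V2 == 4
stopifnot(!any(cond1 & cond2))

# TRUE/FALSE become 1/0, so this yields 1 for rule 1, 2 for rule 2, 0 otherwise
myfile$V5 <- cond1 + 2 * cond2
print(myfile$V5)
```

This trick relies on the rules being mutually exclusive; if they could overlap, the sum would produce a value (3) that the if/else chain never returns.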

Or this, using nested ifelse():

    myfile %>% mutate(V5 = ifelse(V1 == 1 & V2 != 4, 1,
                                  ifelse(V2 == 4 & V3 != 1, 2, 0)))

giving:

      V1 V2 V3 V4 V5
    1  1  2  3  5  1
    2  2  4  4  1  2
    3  1  4  1  1  0
    4  4  5  1  3  0
    5  5  5  5  4  0

Note

I suggest choosing a better name for your data frame; myfile makes it seem as if it holds a file name.

The examples above used this input:

    myfile <- structure(list(V1 = c(1L, 2L, 1L, 4L, 5L),
                             V2 = c(2L, 4L, 4L, 5L, 5L),
                             V3 = c(3L, 4L, 1L, 1L, 5L),
                             V4 = c(5L, 1L, 1L, 3L, 4L)),
                        .Names = c("V1", "V2", "V3", "V4"),
                        class = "data.frame",
                        row.names = c("1", "2", "3", "4", "5"))

Update 1: Since this was originally posted, dplyr has replaced %.% with %>%, so I have modified the answer accordingly.
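For anyone used to the nested style from the question, a minimal sketch (assuming dplyr is installed, which re-exports %>% from magrittr) showing that the piped call is just another way of writing mutate(myfile, ...): %>% passes its left-hand side as the first argument of the function on its right.

```r
library(dplyr)

# Sample data from the question
myfile <- data.frame(
  V1 = c(1L, 2L, 1L, 4L, 5L),
  V2 = c(2L, 4L, 4L, 5L, 5L),
  V3 = c(3L, 4L, 1L, 1L, 5L),
  V4 = c(5L, 1L, 1L, 3L, 4L)
)

# Nested call: the data frame is mutate()'s first argument
nested <- mutate(myfile, V5 = (V1 == 1 & V2 != 4) + 2 * (V2 == 4 & V3 != 1))

# Piped call: %>% inserts myfile as the first argument of mutate()
piped <- myfile %>% mutate(V5 = (V1 == 1 & V2 != 4) + 2 * (V2 == 4 & V3 != 1))

stopifnot(identical(nested, piped))
```

R 4.1.0 and later also ship a native |> pipe that behaves the same way for a simple call like this one.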

Update 2: dplyr now has case_when(), which provides another solution:

    myfile %>%
      mutate(V5 = case_when(V1 == 1 & V2 != 4 ~ 1,
                            V2 == 4 & V3 != 1 ~ 2,
                            TRUE ~ 0))
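A quick sketch, assuming dplyr is installed, that checks the three approaches give the same V5 on the question's data (the column names V5_arith, V5_ifelse and V5_case are only for comparison). case_when() tests its conditions in order and uses the first one that matches, which is exactly the if / else if / else chain from the question.

```r
library(dplyr)

# Sample data from the question
myfile <- data.frame(
  V1 = c(1L, 2L, 1L, 4L, 5L),
  V2 = c(2L, 4L, 4L, 5L, 5L),
  V3 = c(3L, 4L, 1L, 1L, 5L),
  V4 = c(5L, 1L, 1L, 3L, 4L)
)

out <- myfile %>%
  mutate(
    # Arithmetic on logicals (rules are mutually exclusive)
    V5_arith  = (V1 == 1 & V2 != 4) + 2 * (V2 == 4 & V3 != 1),
    # Nested ifelse(): outer test first, inner test only where it is FALSE
    V5_ifelse = ifelse(V1 == 1 & V2 != 4, 1, ifelse(V2 == 4 & V3 != 1, 2, 0)),
    V5_case   = case_when(
      V1 == 1 & V2 != 4 ~ 1,  # checked first
      V2 == 4 & V3 != 1 ~ 2,  # used only where the first rule did not match
      TRUE ~ 0                # fallback, like the final else
    )
  )
print(out)
```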
answered Oct 11 '22 at 15:10 by G. Grothendieck



With dplyr 0.7.2, you can use the very useful case_when() function:

    library(dplyr)

    x = read.table(text = "V1 V2 V3 V4
    1  1  2  3  5
    2  2  4  4  1
    3  1  4  1  1
    4  4  5  1  3
    5  5  5  5  4")
    x$V5 = case_when(x$V1==1 & x$V2!=4 ~ 1,
                     x$V2==4 & x$V3!=1 ~ 2,
                     TRUE ~ 0)

Expressed with dplyr::mutate, it gives:

    x = x %>% mutate(
      V5 = case_when(
        V1==1 & V2!=4 ~ 1,
        V2==4 & V3!=1 ~ 2,
        TRUE ~ 0
      )
    )
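Assuming dplyr 1.1.0 or later is installed (older versions do not have this argument), case_when() also accepts a .default argument that can stand in for the TRUE ~ 0 catch-all. A minimal sketch on the question's data:

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where case_when() has .default

# Sample data from the question
x <- data.frame(
  V1 = c(1, 2, 1, 4, 5),
  V2 = c(2, 4, 4, 5, 5),
  V3 = c(3, 4, 1, 1, 5),
  V4 = c(5, 1, 1, 3, 4)
)

x <- x %>%
  mutate(
    V5 = case_when(
      V1 == 1 & V2 != 4 ~ 1,
      V2 == 4 & V3 != 1 ~ 2,
      .default = 0  # value for rows where no condition matched
    )
  )
print(x)
```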

Please note that NA values are not treated specially, which can be misleading: a condition that evaluates to NA is treated as not matched, so the function returns NA only when no condition matches. If you add a TRUE ~ ... line, as I did in my example, the return value will never be NA.

Therefore, you have to explicitly tell case_when() to put NA where it belongs, by adding a statement like is.na(x$V1) | is.na(x$V2) | is.na(x$V3) ~ NA_real_ before the TRUE ~ 0 line (NA_real_ matches the type of the other results, since 1, 2 and 0 are doubles). Hint: the dplyr::coalesce() function can be really useful here sometimes!

Moreover, note that in dplyr versions before 1.1.0 a bare NA will usually not work, because all right-hand sides must have the same type; you have to use typed NA values: NA_integer_, NA_character_ or NA_real_.
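A minimal sketch of the difference, assuming dplyr is installed; the small data frame with missing values and the column names V5_naive and V5_na are made up for illustration. In the third row, V2 is NA, so both rules evaluate to NA rather than TRUE or FALSE and are skipped.

```r
library(dplyr)

# Hypothetical data with missing values
df <- data.frame(
  V1 = c(1, NA, 1),
  V2 = c(2, 4, NA),
  V3 = c(3, 5, 5)
)

df <- df %>%
  mutate(
    # No NA rule: rows whose conditions are NA fall through to TRUE ~ 0
    V5_naive = case_when(
      V1 == 1 & V2 != 4 ~ 1,
      V2 == 4 & V3 != 1 ~ 2,
      TRUE ~ 0
    ),
    # Explicit NA rule placed before the TRUE ~ 0 fallback
    V5_na = case_when(
      V1 == 1 & V2 != 4 ~ 1,
      V2 == 4 & V3 != 1 ~ 2,
      is.na(V1) | is.na(V2) | is.na(V3) ~ NA_real_,
      TRUE ~ 0
    )
  )
print(df)
```

Row 2 still gets 2 in both columns: NA & FALSE is FALSE in R, so the first rule is cleanly not matched there, and the second rule applies before the NA rule is reached.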

answered Oct 11 '22 at 15:10 by Dan Chaltiel
