Using conditions in dplyr::mutate

Tags: dataframe, r, dplyr

I am working with a large data frame. I'm trying to create a new vector based on the conditions that exist in two current vectors.

Given the size of the dataset (and its general awesomeness) I'm trying to find a solution using dplyr, which has led me to mutate. I feel like I'm not far off, but I'm just not able to get a solution to stick.

My data frame resembles:

   ID  X  Y
1   1 10 12
2   2 10 NA
3   3 11 NA
4   4 10 12
5   5 11 NA
6   6 NA NA
7   7 NA NA
8   8 11 NA
9   9 10 12
10 10 11 NA

To recreate it:

ID <- 1:10
X <- c(10, 10, 11, 10, 11, NA, NA, 11, 10, 11)
Y <- c(12, NA, NA, 12, NA, NA, NA, NA, 12, NA)
data <- data.frame(ID, X, Y)  # combine the vectors into the data frame used below

I'm looking to create a new vector 'Z' from the existing data. If Y > X, then I want it to return the value from Y. If Y is NA, then I'd like it to return the X value. If both are NA, then it should return NA.

My attempt so far, using the code below, creates a new vector that meets the first condition but not the second.

newData <- data %>% 
        mutate(Z =
               ifelse(Y > X, Y,
               ifelse(is.na(Y), X, NA)))

> newData
   ID  X  Y  Z
1   1 10 12 12
2   2 10 NA NA
3   3 11 NA NA
4   4 10 12 12
5   5 11 NA NA
6   6 NA NA NA
7   7 NA NA NA
8   8 11 NA NA
9   9 10 12 12
10 10 11 NA NA

I feel like I'm missing something mind-blowingly simple. Can anyone point me in the right direction?

asked Jan 22 '15 03:01 by vengefulsealion



2 Answers

pmax(X, Y, na.rm = TRUE) is what you are looking for. It takes the element-wise maximum of X and Y and ignores an NA unless both values are NA.

library(dplyr)

# tibble() replaces the deprecated data_frame()
data <- tibble(ID = 1:10,
               X = c(10, 10, 11, 10, 11, NA, NA, 11, 10, 11),
               Y = c(12, NA, NA, 12, NA, NA, NA, NA, 12, NA))
data %>% mutate(Z = pmax(X, Y, na.rm = TRUE))
#   ID  X  Y  Z
#1   1 10 12 12
#2   2 10 NA 10
#3   3 11 NA 11
#4   4 10 12 12
#5   5 11 NA 11
#6   6 NA NA NA
#7   7 NA NA NA
#8   8 11 NA 11
#9   9 10 12 12
#10 10 11 NA 11
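
To see what na.rm changes, here is a minimal base R sketch using the X and Y vectors from the question (no dplyr needed). One caveat: pmax always returns the larger value, so it matches the rule in the question only on the assumption that Y is never smaller than X when both are present, which holds for the sample data.

```r
X <- c(10, 10, 11, 10, 11, NA, NA, 11, 10, 11)
Y <- c(12, NA, NA, 12, NA, NA, NA, NA, 12, NA)

# Default behaviour: an NA in either input makes that position NA
with_na <- pmax(X, Y)

# na.rm = TRUE: an NA is ignored unless both values at that position are NA
Z <- pmax(X, Y, na.rm = TRUE)

print(with_na)
print(Z)
```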
answered Oct 21 '22 01:10 by Khashaa


The ifelse code can be written as:

data %>%
       mutate(Z= ifelse(Y>X & !is.na(Y), Y, X))
#   ID  X  Y  Z
#1   1 10 12 12
#2   2 10 NA 10
#3   3 11 NA 11
#4   4 10 12 12
#5   5 11 NA 11
#6   6 NA NA NA
#7   7 NA NA NA
#8   8 11 NA 11
#9   9 10 12 12
#10 10 11 NA 11
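
The reason the nested ifelse in the question fails is that Y > X evaluates to NA whenever Y is NA, and ifelse returns NA for an NA test without ever reaching the inner call. The sketch below walks through this in base R on a small illustrative pair of vectors (the values are chosen for the example, not taken from the question). It also shows one edge case the sample data does not contain: when X is NA but Y is not, this ifelse version returns NA.

```r
# Illustrative vectors covering: both present, Y missing, both missing, X missing
X <- c(10, 10, NA, NA)
Y <- c(12, NA, NA, 12)

# Comparing with NA yields NA
raw_test <- Y > X

# NA & FALSE is FALSE, so rows with a missing Y fall through to the "no" branch
safe_test <- Y > X & !is.na(Y)

# Nested version from the question: the outer test is NA, so the result is NA
original <- ifelse(Y > X, Y, ifelse(is.na(Y), X, NA))

# Version with the explicit NA check
fixed <- ifelse(Y > X & !is.na(Y), Y, X)

print(raw_test)
print(safe_test)
print(original)
print(fixed)
```

If the X-missing case matters for your data, note that pmax(X, Y, na.rm = TRUE) would return Y there rather than NA.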
answered Oct 21 '22 02:10 by akrun