Using conditions in dplyr::mutate

Tags: dataframe, r, dplyr

I am working with a large data frame. I'm trying to create a new vector based on the conditions that exist in two current vectors.

Given the size of the dataset (and its general awesomeness) I'm trying to find a solution using dplyr, which has led me to mutate. I feel like I'm not far off, but I'm just not able to get a solution to stick.

My data frame resembles:

   ID  X  Y
1   1 10 12
2   2 10 NA
3   3 11 NA
4   4 10 12
5   5 11 NA
6   6 NA NA
7   7 NA NA
8   8 11 NA
9   9 10 12
10 10 11 NA

To recreate it:

ID <- c(1:10)
X <- c(10, 10, 11, 10, 11, NA, NA, 11, 10, 11)
Y <- c(12, NA, NA, 12, NA, NA, NA, NA, 12, NA)
data <- data.frame(ID, X, Y)

I'm looking to create a new vector 'Z' from the existing data. If Y > X, then I want it to return the value from Y. If Y is NA, then I'd like it to return the X value. If both are NA, then it should return NA.

My attempt so far, using the code below, creates a new vector that meets the first condition, but not the second.

newData <- data %>% 
        mutate(Z =
               ifelse(Y > X, Y,
               ifelse(is.na(Y), X, NA)))

> newData
   ID  X  Y  Z
1   1 10 12 12
2   2 10 NA NA
3   3 11 NA NA
4   4 10 12 12
5   5 11 NA NA
6   6 NA NA NA
7   7 NA NA NA
8   8 11 NA NA
9   9 10 12 12
10 10 11 NA NA

I feel like I'm missing something mindblowingly simple. Can someone point me in the right direction?

asked Jan 22 '15 03:01

vengefulsealion

2 Answers

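pmax(..., na.rm = TRUE) is what you are looking for. It takes the element-wise maximum of X and Y and, with na.rm = TRUE, ignores an NA unless both values are NA: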

library(dplyr)

# data.frame() is used here because data_frame() is deprecated
data <- data.frame(ID = c(1:10),
                   X = c(10, 10, 11, 10, 11, NA, NA, 11, 10, 11),
                   Y = c(12, NA, NA, 12, NA, NA, NA, NA, 12, NA))
data %>% mutate(Z = pmax(X, Y, na.rm = TRUE))
#   ID  X  Y  Z
#1   1 10 12 12
#2   2 10 NA 10
#3   3 11 NA 11
#4   4 10 12 12
#5   5 11 NA 11
#6   6 NA NA NA
#7   7 NA NA NA
#8   8 11 NA 11
#9   9 10 12 12
#10 10 11 NA 11
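
To see what na.rm = TRUE changes, here is a minimal base R sketch on a shortened pair of vectors. The values and the names with_na and without_na are made up for illustration and are not taken from the question's data.

```r
# Illustrative vectors: one complete pair, one missing Y, one Y < X, both missing
X <- c(10, 10, 11, NA)
Y <- c(12, NA,  9, NA)

# Default behaviour: any NA in a pair makes the result NA
with_na <- pmax(X, Y)
print(with_na)

# na.rm = TRUE: an NA is skipped if the other value is present;
# the result is NA only when both values are NA
without_na <- pmax(X, Y, na.rm = TRUE)
print(without_na)
```

Note that when both values are present and Y is not larger than X (the third pair above), pmax returns X. The question does not say what Z should be in that case, so treat that behaviour as an assumption to check against your own requirements.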

answered Oct 21 '22 01:10

Khashaa

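The original attempt fails because Y > X evaluates to NA whenever Y is NA, and ifelse returns NA for an NA test, so the inner ifelse is never reached for those rows. Guarding the comparison with !is.na(Y) fixes this, so the ifelse code can be

data %>%
       mutate(Z = ifelse(Y > X & !is.na(Y), Y, X))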
#   ID  X  Y  Z
#1   1 10 12 12
#2   2 10 NA 10
#3   3 11 NA 11
#4   4 10 12 12
#5   5 11 NA 11
#6   6 NA NA NA
#7   7 NA NA NA
#8   8 11 NA 11
#9   9 10 12 12
#10 10 11 NA 11
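
The difference between the two ifelse versions can be checked on plain vectors without dplyr. This is a small sketch using three illustrative pairs; the names test, nested, and guarded are hypothetical and chosen only for this example.

```r
# Three illustrative pairs: complete, Y missing, both missing
X <- c(10, 10, NA)
Y <- c(12, NA, NA)

# The comparison itself is NA wherever Y (or X) is NA
test <- Y > X
print(test)

# Nested version from the question: an NA test yields NA,
# so the inner ifelse(is.na(Y), X, NA) is never used for those rows
nested <- ifelse(Y > X, Y, ifelse(is.na(Y), X, NA))
print(nested)

# Guarded version: NA & FALSE is FALSE, so rows with a missing Y fall through to X
guarded <- ifelse(Y > X & !is.na(Y), Y, X)
print(guarded)
```

One caveat under the same assumptions: if X is NA while Y is present, the guarded test is still NA and Z comes out as NA. That combination does not occur in the question's data.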

answered Oct 21 '22 02:10

akrun
