I have a dataframe with 2 columns: the date and the return.
library(tibble)
df <- tibble(
  date = lubridate::today() + 0:9,
  return = c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2)
)
Now I want to add a third column using an ifelse condition: if the return on day t is higher than 3.5, then the value on the subsequent day t+1 should be NA; otherwise it should be that day's own return.
Here is my desired output:
date return return_subsequent_day
<date> <dbl> <dbl>
1 2019-03-14 1 1
2 2019-03-15 2.5 2.5
3 2019-03-16 2 2
4 2019-03-17 3 3
5 2019-03-18 5 5
6 2019-03-19 6.5 NA
7 2019-03-20 1 NA
8 2019-03-21 9 9
9 2019-03-22 3 NA
10 2019-03-23 2 2
Can someone explain how I can formulate this condition?
The 'ifelse()' function is the vectorized counterpart of the R if-else statement. It evaluates the test on a whole vector at once instead of being called once per element, which is usually faster than looping over the values.
The syntax is ifelse(test_expression, x, y): the i-th element of the result is x[i] if test_expression[i] is TRUE, and y[i] otherwise. The vectors x and y are recycled whenever necessary.
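A minimal sketch of that element-wise behaviour. The vector `x_vals` and the labels are made up for illustration and are not part of the question's data; the length-1 values "high" and "low" are recycled to the length of the test.

```r
# Hypothetical example vector, for illustration only
x_vals <- c(1, 5, 2, 9)

# The test is evaluated for every element at once;
# "high" and "low" are recycled to match its length
labels <- ifelse(x_vals > 3.5, "high", "low")
print(labels)
```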
Instead of a cumbersome nested ifelse statement, use dplyr's mutate and case_when functions.
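As a rough sketch, the condition from the question could be written with case_when as follows. The question only needs a single condition, so this is shown for comparison rather than as the required approach. The name `df_demo` and the fixed start date are assumptions made here so the example is reproducible.

```r
library(dplyr)

# Same values as in the question; the fixed start date is an assumption
df_demo <- tibble(
  date = as.Date("2019-03-14") + 0:9,
  return = c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2)
)

df_demo <- df_demo %>%
  mutate(
    return_subsequent_day = case_when(
      # previous day's return above the threshold: blank out today
      lag(return, default = 0) > 3.5 ~ NA_real_,
      # otherwise keep today's return
      TRUE ~ return
    )
  )

print(df_demo)
```

Further conditions can be added as extra lines inside case_when without nesting; the first matching condition wins.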
Use lag and mutate from dplyr. With lag we compare the return value of the previous row with 3.5: if it is greater, we take NA; otherwise we take the return value of the current row. Setting default = 0 keeps the first row, which has no previous day.
library(dplyr)
df <- df %>% mutate(return_subsequent_day = ifelse(lag(return, default = 0) > 3.5, NA, return))
output:
# A tibble: 10 x 3
date return return_subsequent_day
<date> <dbl> <dbl>
1 2019-03-14 1 1
2 2019-03-15 2.5 2.5
3 2019-03-16 2 2
4 2019-03-17 3 3
5 2019-03-18 5 5
6 2019-03-19 6.5 NA
7 2019-03-20 1 NA
8 2019-03-21 9 9
9 2019-03-22 3 NA
10 2019-03-23 2 2
A base R approach is to copy 'return' into a new column 'return_subsequent_day' and then use a numeric index ('i1') of the rows where the return exceeds 3.5 to set the following rows to NA.
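# rows whose return exceeds the threshold
i1 <- which(df$return > 3.5)
df$return_subsequent_day <- df$return
# set the row after each hit to NA, dropping any index past the last row
nxt <- i1 + 1
df$return_subsequent_day[nxt[nxt <= nrow(df)]] <- NA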
df$return_subsequent_day
#[1] 1.0 2.5 2.0 3.0 5.0 NA NA 9.0 NA 2.0
Simple solution using ifelse
df$return_sub_day <- ifelse(dplyr::lag(df$return) > 3.5, NA ,df$return)
df$return_sub_day[1] <- df$return[1]