Ok this is the over all view of what i'm trying to achieve with dplyr:
Using dplyr I am making calculations to form new columns.
initial.capital -
x.long.shares -
x.end.value -
x.net.profit -
new.initial.capital
The code that does this:
# Calculate Share Prices For Each ETF
# Initialize Start Capital Column
library(dplyr)
library(data.table)
df$inital.capital <- 10000
output <- df %>%
dplyr::mutate(RunID = data.table::rleid(x.long)) %>%
group_by(RunID) %>%
dplyr::mutate(x.long.shares = ifelse(x.long == 0,0,
ifelse(row_number() == n(),
first(inital.capital) / first(close.x),0))) %>%
dplyr::mutate(x.end.value = ifelse(x.long == 0,0,
ifelse(row_number() == n(),
last(x.long.shares) * last(close.x),0))) %>%
dplyr::mutate(x.net.profit = ifelse(x.long == 0,0,
ifelse(row_number() == n(),
last(initial.capital) - last(x.end.value),0))) %>%
dplyr::mutate(new.initial.capital = ifelse(x.long == 0,0,
ifelse(row_number() == n(),
last(x.net.profit) + last(inital.capital),0))) %>%
ungroup() %>%
select(-RunID)
I am grouping per x.long column. And when grouped. Making calculations from different columns using the first/last positions within the group My basic question is:
In the photo, see red highlight under new.initial.capital column. How can I 'save' this value (10185.33)... and insert it on the NEXT group, saving it under initial.capital column, again highlighted in red (it would replace 10,000 Or storing it on the first line of the group)?
What I really need to do is save the final value in the new.initial.capital column into a variable. Then this variable can be used in the next group (see code below) The value here will be used as part of the next groups calculations... and then when the end new.initial.capital is updated, then this values goes into the variable, then it carrys to the start of the next group (see code below).. then all the values will update again.... The variable would be placed here:
output <- df %>%
dplyr::mutate(RunID = data.table::rleid(x.long)) %>%
group_by(RunID) %>%
dplyr::mutate(x.long.shares = ifelse(x.long == 0,0,
ifelse(row_number() == n(),
first(end_of_new.initial.capital_variable_from_previous_group) / first(close.x),0))) %>%
I essentially want to carry over values between dplyr groups. Is this possible? Or can I store it in a variable each time?
Heres some example data that is in the photo: Save to .txt
df <- read.table("your_dir\df.txt",header=TRUE, sep="", stringsAsFactors=FALSE)
close.x x.long y.short x.short y.long inital.capital x.long.shares x.end.value x.net.profit new.initial.capital
37.96 NA NA NA NA 10000 NA NA NA NA
36.52 0 0 0 0 10000 0 0 0 0
38.32 0 0 0 0 10000 0 0 0 0
38.5504 0 0 0 0 10000 0 0 0 0
38.17 0 0 0 0 10000 0 0 0 0
38.85 1 1 0 0 10000 0 0 0 0
38.53 1 1 0 0 10000 0 0 0 0
39.13 1 1 0 0 10000 0 0 0 0
38.13 1 1 0 0 10000 257.4002574 9814.671815 185.3281853 10185.32819
37.01 0 0 1 1 10000 0 0 0 0
36.14 0 0 1 1 10000 0 0 0 0
35.27 0 0 1 1 10000 0 0 0 0
35.13 0 0 1 1 10000 0 0 0 0
32.2 0 0 1 1 10000 0 0 0 0
33.03 1 1 0 0 10000 0 0 0 0
34.94 1 1 0 0 10000 0 0 0 0
34.57 1 1 0 0 10000 0 0 0 0
33.6 1 1 0 0 10000 0 0 0 0
34.34 1 1 0 0 10000 302.7550711 10396.60914 -396.6091432 9603.390857
35.86 0 0 1 1 10000 0 0 0 0
I tried to make a variable:
inital.capital <- 10000
And insert this in the code...
output <- df %>%
dplyr::mutate(RunID = data.table::rleid(x.long)) %>%
group_by(RunID) %>%
dplyr::mutate(x.long.shares = ifelse(x.long == 0,0,
ifelse(row_number() == n(),
initial.capital / first(close.x),0))) %>% # place initial.capital variable.. initialized with 10000
dplyr::mutate(x.end.value = ifelse(x.long == 0,0,
ifelse(row_number() == n(),
last(x.long.shares) * last(close.x),0))) %>%
dplyr::mutate(x.net.profit = ifelse(x.long == 0,0,
ifelse(row_number() == n(),
last(initial.capital) - last(x.end.value),0))) %>%
dplyr::mutate(new.initial.capital = ifelse(x.long == 0,0,
ifelse(row_number() == n(),
last(x.net.profit) + last(inital.capital),0))) %>%
dplyr::mutate(new.initial.capitals = ifelse(x.long == 0,0,
ifelse(row_number() == n(),
inital.capital < - last(new.initial.capital),0))) %>% # update variable with the final balance of new.inital.capital column
ungroup() %>%
select(-RunID)
If I can update the initial.capital variable each time. This then would serve as the 'link' between groups. However, this idea is not currently working in the dplyr setup.
Any assistance appreciated.
This kind of use of first and last is very untidy, so we'll keep it for the latest step.
First we build intermediate data, following your code, but adding some columns to join later at the right places. I'm not sure if you need to keep all columns, you won't need the second join if not.
library(dplyr)
library(tidyr)
df1 <- df0 %>%
dplyr::mutate(RunID = data.table::rleid(x.long)) %>%
group_by(RunID) %>%
mutate(RunID_f = ifelse(row_number()==1,RunID,NA)) %>% # for later merge
mutate(RunID_l = ifelse(row_number()==n(),RunID,NA)) # possibly unneeded
Then we build summarized data, I refactored your code a bit as you see, because these operations "should" be rowwise.
summarized_data <- df1 %>%
filter(x.long !=0) %>%
summarize_at(vars(close.x,inital.capital),c("first","last")) %>%
mutate(x.long.share = inital.capital_first / close.x_first,
x.end.value = x.long.share * close.x_last,
x.net.profit = inital.capital_last - x.end.value,
new.initial.capital = x.net.profit + inital.capital_last,
lagged.new.initial.capital = lag(new.initial.capital,1))
# A tibble: 2 x 10
# RunID close.x_first inital.capital_first close.x_last inital.capital_last x.long.share x.end.value x.net.profit new.initial.capital lagged.new.initial.capital
# <int> <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 3 38.85 10000 38.13 10000 257.4003 9814.672 185.3282 10185.328 NA
# 2 5 33.03 10000 34.34 10000 302.7551 10396.609 -396.6091 9603.391 10185.33
Then we join our summarized table to the original, getting advantage of the trick of the firt step. The first join may be skipped if you don't need all columns.
df2 <- df1 %>% ungroup %>%
left_join(summarized_data %>% select(-lagged.new.initial.capital) ,by=c("RunID_l"="RunID")) %>% # if you want the other variables, if not, skip the line
left_join(summarized_data %>% select(RunID,lagged.new.initial.capital) ,by=c("RunID_f"="RunID")) %>%
mutate(inital.capital = ifelse(is.na(lagged.new.initial.capital),inital.capital,lagged.new.initial.capital)) %>%
select(close.x:inital.capital) # for readability here
# # A tibble: 20 x 6
# close.x x.long y.short x.short y.long inital.capital
# <dbl> <int> <int> <int> <int> <dbl>
# 1 37.9600 NA NA NA NA 10000.00
# 2 36.5200 0 0 0 0 10000.00
# 3 38.3200 0 0 0 0 10000.00
# 4 38.5504 0 0 0 0 10000.00
# 5 38.1700 0 0 0 0 10000.00
# 6 38.8500 1 1 0 0 10000.00
# 7 38.5300 1 1 0 0 10000.00
# 8 39.1300 1 1 0 0 10000.00
# 9 38.1300 1 1 0 0 10000.00
# 10 37.0100 0 0 1 1 10000.00
# 11 36.1400 0 0 1 1 10000.00
# 12 35.2700 0 0 1 1 10000.00
# 13 35.1300 0 0 1 1 10000.00
# 14 32.2000 0 0 1 1 10000.00
# 15 33.0300 1 1 0 0 10185.33
# 16 34.9400 1 1 0 0 10000.00
# 17 34.5700 1 1 0 0 10000.00
# 18 33.6000 1 1 0 0 10000.00
# 19 34.3400 1 1 0 0 10000.00
# 20 35.8600 0 0 1 1 10000.00
data
df<- read.table(text="close.x x.long y.short x.short y.long inital.capital x.long.shares x.end.value x.net.profit new.initial.capital
37.96 NA NA NA NA 10000 NA NA NA NA
36.52 0 0 0 0 10000 0 0 0 0
38.32 0 0 0 0 10000 0 0 0 0
38.5504 0 0 0 0 10000 0 0 0 0
38.17 0 0 0 0 10000 0 0 0 0
38.85 1 1 0 0 10000 0 0 0 0
38.53 1 1 0 0 10000 0 0 0 0
39.13 1 1 0 0 10000 0 0 0 0
38.13 1 1 0 0 10000 257.4002574 9814.671815 185.3281853 10185.32819
37.01 0 0 1 1 10000 0 0 0 0
36.14 0 0 1 1 10000 0 0 0 0
35.27 0 0 1 1 10000 0 0 0 0
35.13 0 0 1 1 10000 0 0 0 0
32.2 0 0 1 1 10000 0 0 0 0
33.03 1 1 0 0 10000 0 0 0 0
34.94 1 1 0 0 10000 0 0 0 0
34.57 1 1 0 0 10000 0 0 0 0
33.6 1 1 0 0 10000 0 0 0 0
34.34 1 1 0 0 10000 302.7550711 10396.60914 -396.6091432 9603.390857
35.86 0 0 1 1 10000 0 0 0 0",stringsAsFactors=FALSE,header=TRUE)
df0 <- df %>% select(close.x:inital.capital)
You're using data.table in the question and have tagged the question data.table, so here is a data.table answer. When j
evaluates, it's in a static scope where local variables retain their values from the previous group.
Using dummy data to demonstrate :
require(data.table)
set.seed(1)
DT = data.table( long = rep(c(0,1,0,1),each=3),
val = sample(5,12,replace=TRUE))
DT
long val
1: 0 2
2: 0 2
3: 0 3
4: 1 5
5: 1 2
6: 1 5
7: 0 5
8: 0 4
9: 0 4
10: 1 1
11: 1 2
12: 1 1
DT[, v1:=sum(val), by=rleid(long)][]
long val v1
1: 0 2 7
2: 0 2 7
3: 0 3 7
4: 1 5 12
5: 1 2 12
6: 1 5 12
7: 0 5 13
8: 0 4 13
9: 0 4 13
10: 1 1 4
11: 1 2 4
12: 1 1 4
So far, simple enough.
prev = NA # initialize previous group value
DT[, v2:={ans<-last(val)/prev; prev<-sum(val); ans}, by=rleid(long)][]
long val v1 v2
1: 0 2 7 NA
2: 0 2 7 NA
3: 0 3 7 NA
4: 1 5 12 0.71428571
5: 1 2 12 0.71428571
6: 1 5 12 0.71428571
7: 0 5 13 0.33333333
8: 0 4 13 0.33333333
9: 0 4 13 0.33333333
10: 1 1 4 0.07692308
11: 1 2 4 0.07692308
12: 1 1 4 0.07692308
> 3/NA
[1] NA
> 5/7
[1] 0.7142857
> 4/12
[1] 0.3333333
> 1/13
[1] 0.07692308
> prev
[1] NA
Notice that the prev
value did not update because prev
and ans
are local variables inside j
's scope that were being updated as each group ran. Just to illustrate, the global prev
can be updated from within each group using R's <<-
operator :
DT[, v2:={ans<-last(val)/prev; prev<<-sum(val); ans}, by=rleid(long)]
prev
[1] 4
But there's no need to use <<-
in data.table as local variables are static (retain their values from previous group). Unless you need to use the final group's value after the query has finished.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With