I want to estimate rolling value-at-risk (VaR) for a dataset of about 22.5 million observations, so I want to use sparklyr for fast computation. Here is what I did (using a sample dataset):
library(PerformanceAnalytics)
library(reshape2)
library(dplyr)
data(managers)
data <- zerofill(managers)
data<-as.data.frame(data)
class(data)
data$date=row.names(data)
lmanagers<-melt(data, id.vars=c('date'))
Now I estimate VaR using dplyr and PerformanceAnalytics packages:
library(zoo) # for rollapply()
var <- lmanagers %>% group_by(variable) %>% arrange(variable, date) %>%
  mutate(var = rollapply(value, 10, FUN = function(x) VaR(x, p = .95, method = "modified", align = "right"), partial = TRUE))
This works fine. Now I do this to make use of sparklyr:
library(sparklyr)
sc <- spark_connect(master = "local")
lmanagers_sp <- copy_to(sc,lmanagers)
src_tbls(sc)
var_sp <- lmanagers_sp %>% group_by(variable) %>% arrange(variable, date) %>%
  mutate(var = rollapply(value, 10, FUN = function(x) VaR(x, p = .95, method = "modified", align = "right"), partial = TRUE)) %>%
  collect()
But this gives the following error:
Error: Unknown input type: pairlist
Can anyone please tell me where the error is and what the correct code would be? Any other solution for estimating rolling VaR faster would also be appreciated.
For custom dplyr backends like sparklyr, mutate does not currently support arbitrary R functions defined in other packages; therefore, rollapply() is currently unsupported.
In order to calculate value-at-risk in sparklyr, one approach is to extend sparklyr using Scala and R and follow an approach similar to: Estimating Financial Risk with Apache Spark.
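If the goal is simply a faster rolling VaR in plain R, the per-window calculation can be written without rollapply() and VaR(). The following is a minimal sketch, not the PerformanceAnalytics implementation: it uses the commonly stated Cornish-Fisher ("modified") VaR formula with population moments, reports VaR as a positive loss, and returns NA for windows with fewer than four observations or zero variance. The names rolling_var and modified_var are hypothetical, and the results may differ from VaR(..., method = "modified") in sign convention and edge-case handling.

```r
# Right-aligned rolling Cornish-Fisher ("modified") VaR for one series.
# `df` needs columns `date` and `value`. The helper is nested so that the
# function carries everything it needs with it.
rolling_var <- function(df, width = 10, p = 0.95) {
  modified_var <- function(x) {
    x <- x[!is.na(x)]
    if (length(x) < 4) return(NA_real_)   # too few points for 4th moments
    m  <- mean(x)
    m2 <- mean((x - m)^2)                 # population central moments
    if (m2 == 0) return(NA_real_)         # constant window: skewness undefined
    skew   <- mean((x - m)^3) / m2^1.5
    exkurt <- mean((x - m)^4) / m2^2 - 3
    z <- qnorm(1 - p)                     # lower-tail normal quantile
    h <- z + (z^2 - 1) * skew / 6 +
      (z^3 - 3 * z) * exkurt / 24 -
      (2 * z^3 - 5 * z) * skew^2 / 36
    -m - h * sqrt(m2)                     # assumption: positive number = loss
  }
  df <- df[order(df$date), , drop = FALSE]
  # Window i covers rows max(1, i - width + 1) .. i (partial at the start).
  df$var <- vapply(seq_len(nrow(df)), function(i) {
    modified_var(df$value[max(1, i - width + 1):i])
  }, numeric(1))
  df
}

# Local demonstration on made-up data shaped like lmanagers.
set.seed(1)
demo <- data.frame(
  date     = rep(format(seq(as.Date("2020-01-01"), by = "month", length.out = 24)), 2),
  variable = rep(c("A", "B"), each = 24),
  value    = rnorm(48, sd = 0.02)
)
res <- do.call(rbind, lapply(split(demo, demo$variable), rolling_var))
head(res)
```

Because rolling_var() takes one group's data frame and returns a data frame, it has the shape that sparklyr's spark_apply() expects when called with group_by = "variable". Whether that works for you depends on your sparklyr version and on R being available on the worker nodes; it is untested here.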