 

rollapply for large data using sparklyr

I want to estimate rolling value-at-risk for a dataset of about 22.5 million observations, so I want to use sparklyr for faster computation. Here is what I did (using a sample dataset):

library(PerformanceAnalytics)
library(reshape2)
library(dplyr)

data(managers)
data <- zerofill(managers)                  # replace NAs with zeros
data <- as.data.frame(data)
class(data)
data$date <- row.names(data)                # keep the dates as a column
lmanagers <- melt(data, id.vars = "date")   # long format: date, variable, value

Now I estimate VaR with the dplyr, zoo and PerformanceAnalytics packages:

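library(zoo) # for rollapply()
# align is an argument of rollapply(), not VaR(): right-aligned windows of
# 10 observations, shorter (partial) windows at the start of each series
var <- lmanagers %>% group_by(variable) %>% arrange(variable, date) %>% 
  mutate(var = rollapply(value, 10, FUN = function(x) VaR(x, p = .95, method = "modified"),
                         align = "right", partial = TRUE))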
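For reference, method = "modified" in PerformanceAnalytics refers to a Cornish-Fisher (modified) VaR. The standard textbook form is sketched below, with μ, σ, S and K the mean, standard deviation, skewness and excess kurtosis of the returns in one window, p the confidence level and Φ the standard normal CDF. PerformanceAnalytics' exact moment estimators and sign convention may differ, so treat this as an outline of what each window computes rather than a specification of VaR(). Note that each window only needs these four moments.

```latex
z = \Phi^{-1}(1 - p), \qquad
z_{cf} = z + \frac{(z^2 - 1)\,S}{6} + \frac{(z^3 - 3z)\,K}{24} - \frac{(2z^3 - 5z)\,S^2}{36}, \qquad
\mathrm{VaR}_p \approx \mu + z_{cf}\,\sigma
```

Here the VaR is expressed as a return quantile, i.e. a negative number for a loss.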

This works fine. Now I do this to make use of sparklyr:

library(sparklyr)
sc <- spark_connect(master = "local")
lmanagers_sp <- copy_to(sc,lmanagers)
src_tbls(sc)

var_sp <- lmanagers_sp %>% group_by(variable) %>% arrange(variable,date) %>% 
  mutate(var=rollapply(value, 10,FUN=function(x) VaR(x, p=.95, method="modified",align = "right"), partial=T)) %>% 
  collect

But this gives the following error:

Error: Unknown input type: pairlist

Can anyone please tell me where the error is and what the correct code would be? Any other solution for estimating rolling VaR faster would also be appreciated.

asked Sep 03 '17 14:09 by Jairaj Gupta


1 Answer

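For database-style dplyr backends like sparklyr, mutate() does not run R code: the expression is translated into Spark SQL, so only functions with a known SQL translation can be used. Arbitrary R functions from other packages, such as rollapply() or VaR(), have no translation, so this code is currently unsupported. The "Unknown input type: pairlist" error most likely comes from the translator reaching the argument list of the anonymous function(x), which it cannot convert to SQL.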

To calculate value-at-risk in sparklyr, one approach is to extend sparklyr using Scala and R, following an approach similar to: Estimating Financial Risk with Apache Spark.
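A different route to speed, not part of the answer above, is to avoid calling VaR() once per window. The sketch below defines a hypothetical helper, roll_cf_var(), in base R. It computes the textbook Cornish-Fisher VaR over a right-aligned window of width k (with partial windows at the start, like rollapply(..., align = "right", partial = TRUE)) from cumulative sums of x, x^2, x^3 and x^4, so each series is handled with a few vectorised passes instead of one function call per window. It assumes population (divide-by-n) moments and input without NAs, and it has not been checked against PerformanceAnalytics::VaR(), which may use different estimators or extra safeguards. Windows with fewer than two observations or zero variance return NA.

```r
# Hypothetical helper (not part of PerformanceAnalytics): rolling Cornish-Fisher
# ("modified") VaR over a right-aligned window of width k, with partial windows
# at the start. Assumes population moments and no NAs in x.
roll_cf_var <- function(x, k = 10, p = 0.95) {
  n <- length(x)
  z <- qnorm(1 - p)                         # left-tail standard normal quantile
  prefix <- function(v) c(0, cumsum(v))     # prefix sums with a leading 0
  s1 <- prefix(x); s2 <- prefix(x^2); s3 <- prefix(x^3); s4 <- prefix(x^4)
  i  <- seq_len(n)
  lo <- pmax(i - k, 0)                      # window i covers x[(lo + 1):i]
  w  <- i - lo                              # window size, min(i, k)
  raw <- function(s) (s[i + 1] - s[lo + 1]) / w   # raw moment of each window
  r1 <- raw(s1); r2 <- raw(s2); r3 <- raw(s3); r4 <- raw(s4)
  # central moments from raw moments
  m2 <- pmax(r2 - r1^2, 0)                  # clamp tiny negative rounding errors
  m3 <- r3 - 3 * r1 * r2 + 2 * r1^3
  m4 <- r4 - 4 * r1 * r3 + 6 * r1^2 * r2 - 3 * r1^4
  skew   <- m3 / m2^1.5
  exkurt <- m4 / m2^2 - 3
  zcf <- z + (z^2 - 1) * skew / 6 + (z^3 - 3 * z) * exkurt / 24 -
    (2 * z^3 - 5 * z) * skew^2 / 36
  out <- r1 + zcf * sqrt(m2)
  out[w < 2 | m2 <= 0] <- NA_real_          # undefined for degenerate windows
  out
}

# Example on long-format data shaped like lmanagers (date, variable, value)
set.seed(42)
demo <- data.frame(
  date     = rep(sprintf("2000-%02d-01", 1:12), times = 2),
  variable = rep(c("A", "B"), each = 12),
  value    = rnorm(24, mean = 0, sd = 0.02)
)
demo <- demo[order(demo$variable, demo$date), ]   # sort by date within each series
demo$var <- ave(demo$value, demo$variable, FUN = roll_cf_var)
head(demo)
```

Cumulative sums of powers can lose precision on very long series, so it is worth comparing the helper against VaR() on a sample before relying on it. The same function could presumably be run on Spark workers with sparklyr::spark_apply(..., group_by = "variable"), but that has not been tested here.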

answered Sep 19 '22 14:09 by Javier Luraschi