Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cumulative measure using R embedded in SQL

Tags:

sql

sql-server

r

Forgive me I'm very new with R and I'm just looking at options currently in our SQL 2016 environment.

We currently have a requirement to provide cumulative performance returns. An example dataset is below:

FundID  Date        FundReturn
ABC     1987-10-31  0
ABC     1987-11-30  -9.28669
ABC     1987-12-31  3.08304
ABC     1988-01-31  -3.00125
ABC     1988-02-29  0.61238
ABC     1988-03-31  4.29258
ABC     1988-04-30  0.13697
ABC     1988-05-31  2.57786
ABC     1988-06-30  2.36947
ABC     1988-07-31  0.57114
ABC     1988-08-31  -1.21550
ABC     1988-09-30  7.09027
ABC     1988-10-31  3.45807
ABC     1988-11-30  1.12679

We would need to take this data set and apply a cumulative performance return measure to it so that the datset would look like the following:

FundID  Date        FundReturn      FundReturnCumu100   FundReturnCumu0
ABC     1987-10-31  0               1                   0
ABC     1987-11-30  -9.28669        0.9071331           -0.0928669
ABC     1987-12-31  3.08304         0.935100376         -0.064899624
ABC     1988-01-31  -3.00125        0.907035676         -0.092964324
ABC     1988-02-29  0.61238         0.912590181         -0.087409819
ABC     1988-03-31  4.29258         0.951763845         -0.048236155
ABC     1988-04-30  0.13697         0.953067476         -0.046932524
ABC     1988-05-31  2.57786         0.977636221         -0.022363779
ABC     1988-06-30  2.36947         1.000801018         0.000801018
ABC     1988-07-31  0.57114         1.006516993         0.006516993
ABC     1988-08-31  -1.2155         0.994282779         -0.005717221
ABC     1988-09-30  7.09027         1.064780113         0.064780113
ABC     1988-10-31  3.45807         1.101600954         0.101600954
ABC     1988-11-30  1.12679         1.114013684         0.114013684

I can create this in SQL with the following code:

SELECT
        FundID          
    ,   [Date]          
    ,   FundReturn      

    ,   ISNULL  (
                    EXP(SUM(LOG(ABS(NULLIF((FundReturn+100)/100, 1))))
                        OVER(ORDER BY FundID, [Date] ROWS UNBOUNDED PRECEDING))
                ,1)                                                     AS FundReturnCumu100        

    ,   ISNULL  (
                    EXP(SUM(LOG(ABS(NULLIF((FundReturn+100)/100, 1))))
                        OVER(ORDER BY FundID, [Date] ROWS UNBOUNDED PRECEDING))
                ,1)-1                                                   AS FundReturnCumu0  

FROM #Worktable
ORDER BY [Date]

I also want to test if this is possible using an R function embedded into a SQL Stored Procedure to end up with the same result? The maths above is basically the product of the performance return over a timeseries, so is there a product function I could use that could create the same result datset?

Edit: I have knocked up the following so far which returns the basis of the dataset using sp_execute_external_script:

EXEC sp_execute_external_script
        @language       =   N'R'
    ,   @script         =   N'OutputDataSet<-InputDataSet'
    ,   @input_data_1   =   N'  SELECT * 
                                FROM [InMemory].[dbo].[CumulativePerformanceTest] 
                                ORDER BY [FundID],[Date]'

WITH RESULT SETS    (
                    (
                            [FundID]            NVARCHAR(50)
                        ,   [Date]              DATE
                        ,   [FundReturn]        NVARCHAR(255)
                    )
                    );


GO

What would I have to do to change the above to apply the FundReturnCumu100 and FundReturnCumu0 calculation in R?

Thanks

like image 521
chrissy p Avatar asked Apr 04 '18 15:04

chrissy p


People also ask

How do you calculate cumulative in SQL?

In SQL server also you can calculate cumulative sum by using sum function. We can use same table as sample table. select dept_no Department_no, count(empno) Employee_Per_Dept, sum(count(*)) over (order by deptno) Cumulative_Total from [DBO].


1 Answers

We can convert it to dplyr code with

library(dplyr)
df1 %>% 
   arrange(FundID, Date) %>%
   mutate(FundReturnCumu100 = exp(cumsum(log(abs((FundReturn + 100)/100)))), 
          FundReturnCumu0 = FundReturnCumu100 - 1)
# FundID       Date FundReturn FundReturnCumu100 FundReturnCumu0
#1     ABC 1987-10-31    0.00000         1.0000000    0.0000000000
#2     ABC 1987-11-30   -9.28669         0.9071331   -0.0928669000
#3     ABC 1987-12-31    3.08304         0.9351004   -0.0648996237
#4     ABC 1988-01-31   -3.00125         0.9070357   -0.0929643237
#5     ABC 1988-02-29    0.61238         0.9125902   -0.0874098186
#6     ABC 1988-03-31    4.29258         0.9517638   -0.0482361550
#7     ABC 1988-04-30    0.13697         0.9530675   -0.0469325241
#8     ABC 1988-05-31    2.57786         0.9776362   -0.0223637789
#9     ABC 1988-06-30    2.36947         1.0008010    0.0008010181
#10    ABC 1988-07-31    0.57114         1.0065170    0.0065169930
#11    ABC 1988-08-31   -1.21550         0.9942828   -0.0057172210
#12    ABC 1988-09-30    7.09027         1.0647801    0.0647801126
#13    ABC 1988-10-31    3.45807         1.1016010    0.1016009542
#14    ABC 1988-11-30    1.12679         1.1140137    0.1140136836
like image 130
akrun Avatar answered Oct 21 '22 08:10

akrun