Fast adding random variables in C++

Tags: c++, performance, random

Short version: how can I most efficiently represent and add two random variables, each given by a list of its realizations?

Mildly longer version: for a work project, I need to add several random variables, each of which is given by a list of values. For example, the realizations of random variable A are {1,2,3} and the realizations of B are {5,6,7}. What I need is the distribution of A+B, i.e. {1+5, 1+6, 1+7, 2+5, 2+6, 2+7, 3+5, 3+6, 3+7}. I need to do this kind of addition several times (let's denote the number of additions as COUNT, where COUNT might reach 720) for different random variables (C, D, ...).

The problem: if I use this naive algorithm of summing each realization of A with each realization of B, the number of sums grows exponentially in COUNT. For the case where each random variable is given by three values, COUNT=720 means 3^720 ≈ 3.4 × 10^343 sums, which would not finish before the end of our days :) Not to mention that in real life, each random variable will have 5000+ realizations.

Solutions: 1/ The first solution uses the fact that I am OK with rounding, i.e. with integer-valued realizations. That way, I can represent each random variable as a vector where the entry at the index corresponding to a realization holds the number of times that realization occurs (1 if it occurs once). So with a vector of realizations indexed from 0 to 10, the vector representing A would be [0,1,1,1,0,0,0,...] and the vector representing B would be [0,0,0,0,0,1,1,1,0,0,0]. I then create A+B by going through these vectors, summing each realization of A with each realization of B and encoding the result in the same vector structure (quadratic complexity in the vector length). The upside of this approach is that the complexity is bounded: it depends on the vector length, not on the number of realizations. The downside is that in real applications, the realizations of A will lie in the interval [-50000, 50000] with a granularity of 1. Hence, after adding two random variables, the span of A+B grows to [-100K, 100K], and after 720 additions the span of SUM(A, B, ...) reaches [-36M, 36M]. Even quadratic complexity (compared to exponential complexity) on arrays this large will take forever.

2/ To get shorter arrays, one could use a hashmap, which would most likely reduce the number of operations (array accesses) involved in A+B, under the assumption that some non-trivial portion of the theoretical span [-50K, 50K] is never a realization. However, as more and more random variables are summed, the number of realizations increases exponentially while the span increases only linearly, so the realizations fill the span more and more densely. That would kill the hashmap's benefit.

So the question is: how can I solve this problem efficiently? The solution is needed for calculating VaR (value at risk) in electricity trading, where all distributions are given empirically and look like no standard distribution, so closed-form formulas are of no use and we can only simulate.

Solving this analytically was considered as the first option, since half of our department are mathematicians. However, the distributions we are going to add are badly behaved, and COUNT=720 is an extreme case. More likely, we will use COUNT=24 for a daily VaR. Given how badly behaved the distributions are, for COUNT=24 the central limit theorem would not hold closely (the distribution of SUM(A1, A2, ..., A24) would not be close to normal). Since we are calculating possible risks, we would like the result to be as precise as possible.

The intended use is this: you have hourly cashflows from some operation. The distribution of the cashflow for one hour is the random variable A; for the next hour it is B, and so on. Your question is: what is the largest loss in 99 percent of cases? So you model the cashflows for each of those 24 hours and add them as random variables to get the distribution of the total cashflow over the whole day. Then you take the 0.01 quantile.


asked Oct 24 '12 08:10

Daniel Bencik

1 Answer

Try to reduce the number of passes required for the whole addition, possibly down to a single pass over every list, including the final one.

I don't think you can cut down on the total number of additions.

In addition, you should look into parallel algorithms and multithreading, if applicable.

At this point, most processors can perform several additions in parallel given the proper instructions (SSE), which will make the additions many times faster (though this is still not a cure for the complexity problem).


answered Oct 22 '22 07:10

jt234

