Is there a statistical difference between generating many random vectors vs a single random matrix?

Is there a statistical difference between generating a series of paths for a Monte Carlo simulation using the following two methods? (By "path" I mean a vector of 350 normally distributed points.)

A)

for path = 1:300000
    Zn(path, :) = randn(1, 350); 
end

or the far more efficient B)

Zn = randn(300000, 350);

I just want to be sure there is no funny added correlation or dependence between the rows in method B that isn't present in method A. Like maybe method B distributes normally over 2 dimensions where A is over 1 dimension, so maybe that makes the two statistically different?

If there is a difference, then I need to know the same for uniform distributions (i.e. rand instead of randn).

Dan asked Jan 30 '13 07:01


2 Answers

Just to add to the answer of @natan (+1), run the following code:

%# Store the seed
Rng1 = rng;

%# Get a matrix of random numbers
X = rand(3, 3);

%# Restore the seed
rng(Rng1);

%# Get a matrix of random numbers one vector at a time
Y = nan(3, 3);
for n = 1:3
    Y(:, n) = rand(3, 1);
end

%# Test for differences
if any(any(X - Y ~= 0)); disp('Error'); end;

You'll note that there is no difference between X and Y. That is, there is no difference between building a matrix in one step, and building a matrix from a sequence of vectors.

However, there is a difference between my code and yours. Note that I am populating the matrix by columns, not rows, since when rand is used to construct a matrix in one step, it populates by column. By the way, I'm not sure if you realize, but as a general rule you should always try to perform vector operations on the columns of matrices, not the rows, because MATLAB stores matrices in column-major order. I explained why in a response to a question on SO the other day; see here for more...
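To tie this back to the row-wise loop in the question, here is a minimal sketch, assuming MATLAB's default generator. The seed of 42, the small sizes, and the names A, B, nPaths and nPoints are illustrative choices, not taken from the question. Filling a matrix row by row should consume the same stream as a single call to randn, so the result should match the transpose of a matrix generated in one step with the dimensions swapped.

```matlab
% Assumptions: default generator; seed 42 and the 4-by-3 size are arbitrary.
nPaths = 4; nPoints = 3;

rng(42);
A = zeros(nPaths, nPoints);          % preallocate
for path = 1:nPaths
    A(path, :) = randn(1, nPoints);  % method A: one row per call
end

rng(42);                             % restore the same generator state
B = randn(nPoints, nPaths).';        % one call, filled by column, then transposed

disp(isequal(A, B));                 % true if both methods draw the same stream
```

If the two matrices match, the only difference between the methods is where each draw lands in the matrix, not which numbers are drawn.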

Regarding the question of independence/dependence, one needs to be careful with the language one uses. The sequence of numbers generated by rand is perfectly dependent: it is pseudorandom, so each number is fully determined by the state of the generator. For the vast majority of statistical tests, the numbers will appear to be independent. Nonetheless, in theory, one could construct a statistical test that would demonstrate the dependency between the numbers in a sequence generated by rand.
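Both halves of that statement can be checked empirically. The following is a rough sketch, assuming the default generator; the seed of 0, the 200 paths, and the variable names are arbitrary choices for illustration. The first part shows that restoring the generator state reproduces the sequence exactly, and the second part looks at the sample correlations between simulated paths, which is one simple test among many and not a proof of independence.

```matlab
% Assumptions: default generator; seed 0 and the sizes are arbitrary.
rng(0);
a = rand(1, 5);
rng(0);                        % restore the same generator state
b = rand(1, 5);
disp(isequal(a, b));           % same state, same numbers: the stream is deterministic

% Sample correlations between simulated paths (rows of Z).
nPaths = 200; nPoints = 350;
Z = randn(nPaths, nPoints);
R = corrcoef(Z.');             % corrcoef treats columns as variables, so transpose
offDiag = R(~eye(nPaths));     % drop the unit diagonal
fprintf('largest |correlation| between paths: %.3f\n', max(abs(offDiag)));
```

With 350 points per path, sample correlations of independent paths would be expected to scatter around zero with a spread on the order of 1/sqrt(350), so small nonzero values here are sampling noise rather than evidence of dependence between rows.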

A final thought: if you have a copy of Greene's "Econometric Analysis", he gives a neat discussion of random number generation in section 17.2.

Colin T Bowers answered Sep 29 '22 23:09


Base R's random number generator behaves the same way: there doesn't appear to be any difference between generating a sequence of random numbers all at once and generating it one number at a time. Thus, the behavior that @Colin T Bowers (+1) describes above also holds in R. Below is an R version of Colin's code:

# set seed
set.seed(1234)
# generate a sequence of 10,000 random numbers at once
X <- rnorm(10000)
# reset the seed
set.seed(1234)
# create a vector of 10,000 zeros
Y <- rep(0, times = 10000)
# generate a sequence of 10,000 random numbers, one at a time
for (i in 1:10000) {
  Y[i] <- rnorm(1)
}
# Test for differences
if (any(X - Y != 0)) {
  print("Error")
}
user3204008 answered Sep 29 '22 23:09