Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

scipy linalg deterministic/non-deterministic code

I'm running this SVD solver from scipy with the below code:

import numpy as np
from scipy.sparse.linalg import svds

features = np.arange(9,dtype=np.float64).reshape((3,3))
for i in range(10):
    _,_,V = svds(features,2)
    print i,np.mean(V)

I expected the printed mean value to be the same each time, however it changes and seems to cycle through a few favourite values. I'm happy to accept that behaviour as a consequence of the low level optimisation/random seeding.

What I don't quite get is why it will output the same values in the same order each time I run that script. To me it seems semi deterministic and semi non-deterministic.

This is problem is affecting some more complicated processing and it would be nice to understand it so I can at least do some hacky workaround.

like image 265
chris Avatar asked Oct 31 '22 04:10

chris


1 Answers

Without testing myself (on a tablet right now without a Python shell handy), I believe this is due to somewhat strange behavior related to the starting point for the initialization used by the approximate eigensolver library ARPACK, which is what svds ends up calling.

If you follow through the Python code from svds, v0 (the starting point in question) is handled only in _ArpackParams, where it's set to zeros and the info parameter set to 0 if v0 is None; otherwise, v0 is kept as its value and info is 1. Then we go into the realm of dragons Fortran, calling (if the matrix is doubles) the function dsaupd, which I didn't fully check but I'm assuming eventually calls cgetv0 when a random starting point is requested. This function appears to initialize the LAPACK RNG seed to 1357 the first time it's called.

So, if you don't make any other calls to ARPACK (or possibly other LAPACK things, not sure how these interact with one another), you're starting the RNG with the same seed every time and thus getting the same initialization points every time; thus, assuming this is the only source of randomness in the algorithm, you'll get the same sequence of answers every time.

You could hack around this by, say, calling eigs on a small matrix a random number of times at the start of your code.

like image 82
Danica Avatar answered Nov 09 '22 15:11

Danica