
Sparse random matrix in Python with different range than [0,1]

I need to generate a sparse random matrix in Python with all values in the range [-1,1] with uniform distribution. What is the most efficient way to do this?

I have a basic sparse random matrix:

from scipy import sparse
from numpy.random import RandomState

p = sparse.rand(10, 10, 0.1, random_state=RandomState(1))

And this gives me values in [0,1]:

print p
  (0, 0)    0.419194514403
  (0, 3)    0.0273875931979
  (1, 4)    0.558689828446
  (2, 7)    0.198101489085
  (3, 5)    0.140386938595
  (4, 1)    0.204452249732
  (4, 3)    0.670467510178
  (8, 1)    0.878117436391
  (9, 0)    0.685219500397
  (9, 3)    0.417304802367

I would prefer an in-place solution, or at least one that doesn't require converting to a dense matrix, since in practice the dimensions will be very large. It surprises me that sparse.rand itself has no parameter for this.

adamconkey asked Jun 02 '15 03:06


2 Answers

Looks like the feature that you want was added about two months ago and will be available in scipy 0.16: https://github.com/scipy/scipy/blob/77af8f44bef43a67cb14c247bc230282022ed0c2/scipy/sparse/construct.py#L671

You will be able to call sparse.random(10, 10, 0.1, random_state=RandomState(1), data_rvs=func), where func "should take a single argument specifying the length of the ndarray that it will return. The structurally nonzero entries of the sparse random matrix will be taken from the array sampled by this function." So you will be able to provide an arbitrary distribution to sample from.
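As a rough sketch of what that call could look like for the [-1,1] case, assuming a SciPy release that ships sparse.random with the data_rvs parameter. The helper name uniform_pm1 is made up for illustration:

```python
import numpy as np
from scipy import sparse

rng = np.random.RandomState(1)

def uniform_pm1(size):
    # Hypothetical helper: draw `size` values uniformly from [-1, 1).
    return rng.uniform(-1.0, 1.0, size=size)

# data_rvs supplies the stored values; the sparsity pattern is still
# chosen by sparse.random itself.
p = sparse.random(10, 10, density=0.1, random_state=rng, data_rvs=uniform_pm1)
print(p)
```

Nothing is densified here: only the stored entries are ever sampled.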

For now, you can at least stretch the uniform distribution to [0,N] by multiplying p by a scalar N:

>>> print 2*p

(0, 0)  0.838389028807
(9, 0)  1.37043900079
(4, 1)  0.408904499463
(8, 1)  1.75623487278
(0, 3)  0.0547751863959
(4, 3)  1.34093502036
(9, 3)  0.834609604734
(1, 4)  1.11737965689
(3, 5)  0.28077387719
(2, 7)  0.39620297817

You can't add a scalar to a sparse matrix, but as a bit of a hack you can build a sparse matrix with ones at the stored positions using p.ceil(), since every stored element of p was generated within [0,1] (a stored value of exactly 0 would stay 0 and not be shifted). Then, to map the uniform distribution to [-1,1], you can do

>>> print 2*p - p.ceil()

(0, 0)  -0.161610971193
(0, 3)  -0.945224813604
(1, 4)  0.117379656892
(2, 7)  -0.60379702183
(3, 5)  -0.71922612281
(4, 1)  -0.591095500537
(4, 3)  0.340935020357
(8, 1)  0.756234872782
(9, 0)  0.370439000794
(9, 3)  -0.165390395266

So in general, if you need some interval [a,b] and p still holds values in [0,1], just perform:

p = (b - a)*p + a*p.ceil()
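A minimal sketch of that formula wrapped in a function, so the range can be checked. The name rescale_with_ceil is mine, and it assumes every stored value of p is in (0,1], as sparse.rand should produce:

```python
import numpy as np
from scipy import sparse

def rescale_with_ceil(p, a, b):
    """Map stored values of p from (0, 1] to (a, b] without densifying."""
    # p.ceil() holds 1.0 wherever p stores a positive value <= 1,
    # so a * p.ceil() adds `a` only at the stored positions.
    return (b - a) * p + a * p.ceil()

p = sparse.rand(10, 10, 0.1, random_state=np.random.RandomState(1))
q = rescale_with_ceil(p, -3.0, 5.0)
print(q.data.min(), q.data.max())
```

Note that this returns a new sparse matrix rather than modifying p in place.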

I can't see much of a better solution at present short of writing your own constructor similar to sparse.rand, but I would be curious to know if anyone at least knows a way to get around the ceil() hack.

Eric Appelt answered Oct 23 '22 01:10


Since sparse.rand produces a COO matrix by default, you can manipulate its .data attribute directly. (A 'csr' matrix can be transformed the same way.)

p = sparse.rand(10, 10, 0.1)
p.data *= 2   # stored values now in [0, 2]
p.data -= 1   # stored values now in [-1, 1]

Before and after values would be:

  (0, 4)    0.758811389117
  (1, 8)    0.703514506105
  (1, 9)    0.640418745353
  (4, 0)    0.896198785835
  (4, 6)    0.511459880587
  (5, 2)    0.580048680358
  (7, 1)    0.739418689993
  (8, 3)    0.506395207688
  (8, 5)    0.900696518461
  (9, 4)    0.474014207942

  (0, 4)    0.517622778234
  (1, 8)    0.40702901221
  (1, 9)    0.280837490706
  (4, 0)    0.79239757167
  (4, 6)    0.0229197611736
  (5, 2)    0.160097360716
  (7, 1)    0.478837379986
  (8, 3)    0.0127904153758
  (8, 5)    0.801393036923
  (9, 4)    -0.051971584115

Same spatial density, just different value distribution.
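The same idea extends to any interval [a,b]. Here is a small sketch (rescale_inplace is a hypothetical name, not a SciPy function), shown on a CSR matrix to illustrate that the index arrays are left untouched:

```python
import numpy as np
from scipy import sparse

def rescale_inplace(p, a, b):
    """Affinely map stored values from [0, 1] to [a, b], in place."""
    p.data *= (b - a)
    p.data += a
    return p

p = sparse.rand(10, 10, 0.1, format='csr',
                random_state=np.random.RandomState(1))
pattern = p.indices.copy()      # remember the column indices
rescale_inplace(p, -1.0, 1.0)
print(p.data)
```

Only the .data array is touched, so the cost scales with the number of stored entries rather than with the matrix dimensions.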

In fact you could generate completely new .data values. The end of sparse.rand is:

....
j = .... # tweak random values
i = ...  # tweak ints
vals = random_state.rand(k).astype(dtype)
return coo_matrix((vals, (i, j)), shape=(m, n)).asformat(format)

The random matrix is built from three arrays: two integer arrays (i and j) holding row and column indices within the matrix shape, and a third (vals) holding the random values.

For example, random values chosen from a list:

In [209]: p.data = np.random.choice(np.arange(20) - 10, len(p.data)) / 10.0

In [210]: print(p.A)
[[ 0.   0.   0.   0.   0.9  0.   0.   0.   0.   0. ]
 [ 0.   0.   0.   0.   0.   0.   0.   0.  -0.1 -0.7]
 [ 0.   0.   0.   0.   0.   0.   0.   0.   0.   0. ]
 [ 0.   0.   0.   0.   0.   0.   0.   0.   0.   0. ]
 [-1.   0.   0.   0.   0.   0.  -0.8  0.   0.   0. ]
 [ 0.   0.   0.5  0.   0.   0.   0.   0.   0.   0. ]
 [ 0.   0.   0.   0.   0.   0.   0.   0.   0.   0. ]
 [ 0.   0.   0.   0.   0.   0.   0.   0.   0.   0. ]
 [ 0.   0.   0.   0.5  0.   0.4  0.   0.   0.   0. ]
 [ 0.   0.   0.   0.  -0.8  0.   0.   0.   0.   0. ]]
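For the uniform [-1,1] case from the question, the same replacement could be sketched like this. It assumes a COO matrix as returned by sparse.rand, and the seed is arbitrary:

```python
import numpy as np
from scipy import sparse

rng = np.random.RandomState(0)
p = sparse.rand(10, 10, 0.1, random_state=rng)

# Keep the sparsity pattern, but draw fresh values uniformly from [-1, 1).
p.data = rng.uniform(-1.0, 1.0, size=p.nnz)
print(p)
```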

The development code just changes the second-to-last line to:

vals = data_rvs(k).astype(dtype)

where data_rvs is a parameter (defaulting to random_state.rand).
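If you would rather not depend on data_rvs, a hand-rolled constructor along the lines described above might look like the following. This is only a sketch, not SciPy's actual implementation: sparse_uniform and its parameters are hypothetical, and the index sampling via Generator.choice may not be the most memory-friendly choice for extremely large shapes.

```python
import numpy as np
from scipy import sparse

def sparse_uniform(m, n, density, low=-1.0, high=1.0, seed=None):
    """Hypothetical helper: m x n COO matrix with uniform [low, high) entries."""
    rng = np.random.default_rng(seed)
    k = int(round(density * m * n))      # number of stored entries
    # Pick k distinct flat positions, then decode them into (row, col).
    flat = rng.choice(m * n, size=k, replace=False)
    i, j = np.divmod(flat, n)
    vals = rng.uniform(low, high, size=k)
    return sparse.coo_matrix((vals, (i, j)), shape=(m, n))

p = sparse_uniform(10, 10, 0.1, seed=1)
print(p)
```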

hpaulj answered Oct 23 '22 02:10