
Python: How to generate a vector with nonzero entries at random positions?

Tags: python, numpy

I want to generate a vector using Numpy that is k-sparse, i.e. it has n entries of which k are nonzero. The positions of the nonzero entries are chosen randomly, and the entries themselves are chosen from a Gaussian distribution with zero mean and unit variance. The test vector is small (256 entries) so I don't think Scipy's sparse matrix interface is necessary here.

My current approach is to generate a random list of k integers in the range 0 to 255, initialize a vector full of zeros, and then use a for loop to draw a random value for each of those positions and write it into the vector, like so:

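import random
import numpy as np

# Assumed setup (not shown in the original post): example values only
k = 15                                   # number of nonzero entries
mu, sigma = 0.0, 1.0                     # mean and standard deviation
nonzeros = random.sample(range(256), k)  # k random positions in 0..255

# Construct data vector x
# Entries of x are ~N(0, 1) and are placed in the positions specified by the
# 'nonzeros' vector
x = np.zeros((256, 1))

# Get a random value ~N(0, 1) and place it in x at the position specified by
# 'nonzeros' vector
for i in range(k):
    x[nonzeros[i]] = np.random.normal(mu, sigma)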

Performance isn't an issue here (it's research-related) so this works, but it feels unpythonic, and I suspect there is a more elegant solution.

Scott asked Feb 12 '23 19:02



2 Answers

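How about this: pick the positions with np.random.choice (replace=False, so no index repeats), then fill them all with a single indexed assignment: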

In [41]: import numpy as np

In [42]: x = np.zeros(10)

In [43]: positions = np.random.choice(np.arange(10), 3, replace=False)

In [44]: x[positions] = np.random.normal(0,1,3)

In [45]: x
Out[45]: 
array([ 0.        ,  0.11197222,  0.        ,  0.09540939, -0.04488175,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ])
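
Since the question mentions research use, reproducibility may matter. Here is a minimal sketch of the same idea wrapped in a function and driven by a seeded Generator from np.random.default_rng. The function name sparse_gaussian_vector and the seed value are my own choices for illustration, not something from the answer above.

```python
import numpy as np

def sparse_gaussian_vector(n, k, rng):
    """Return a length-n vector with k standard-normal entries at random positions."""
    x = np.zeros(n)
    # k distinct indices drawn uniformly from 0..n-1
    positions = rng.choice(n, size=k, replace=False)
    # One indexed assignment fills all chosen positions at once
    x[positions] = rng.standard_normal(k)
    return x

rng = np.random.default_rng(0)  # the seed 0 is an arbitrary choice
x = sparse_gaussian_vector(256, 15, rng)
print(np.count_nonzero(x))  # number of nonzero entries
```

Re-creating the generator with the same seed should give the same vector again, which makes test vectors easy to regenerate. If a column vector like the (256, 1) array in the question is needed, x.reshape(-1, 1) should do it.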
Akavall answered Feb 14 '23 07:02



Here's a recipe: first fill in the first k elements of a zero vector with Gaussian noise, then shuffle to get those k at random positions.

>>> import numpy as np
>>> n = 10
>>> k = 3
>>> a = np.zeros(n)
>>> a[:k] = np.random.randn(k)
>>> np.random.shuffle(a)
>>> a
array([ 1.26611853,  0.        ,  0.        ,  0.        , -0.84272405,
        0.        ,  0.        ,  1.96992445,  0.        ,  0.        ])
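
If you want to convince yourself that shuffling really spreads the nonzeros uniformly, a rough empirical check is to repeat the recipe many times and count how often each position ends up nonzero. Each position should be hit about k/n of the time. The sketch below assumes the newer Generator API; the helper name shuffled_sparse_vector, the seed, and the trial count are arbitrary choices for illustration.

```python
import numpy as np

def shuffled_sparse_vector(n, k, rng):
    """Fill the first k entries with Gaussian noise, then shuffle in place."""
    a = np.zeros(n)
    a[:k] = rng.standard_normal(k)
    rng.shuffle(a)  # shuffles the array in place
    return a

rng = np.random.default_rng(1)  # arbitrary seed
n, k, trials = 10, 3, 2000

# Count, per position, how many trials left a nonzero entry there
hits = np.zeros(n)
for _ in range(trials):
    hits += shuffled_sparse_vector(n, k, rng) != 0

freq = hits / trials
print(freq)   # each entry should be close to k / n
print(k / n)
```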

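The shuffle recipe can be faster than the accepted answer's np.random.choice approach, at least for a small vector like the one in the question (n = 256, k = 15):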

>>> def rand_nz_1(n, k):
...     a = np.zeros(n)
...     pos = np.random.choice(np.arange(n), k, replace=False)
...     a[pos] = np.random.randn(k)
...     return a
... 
>>> def rand_nz_2(n, k):
...     a = np.zeros(n)
...     a[:k] = np.random.randn(k)
...     np.random.shuffle(a)
...     return a
... 
>>> %timeit rand_nz_1(256, 15)
10000 loops, best of 3: 63.8 µs per loop
>>> %timeit rand_nz_2(256, 15)
10000 loops, best of 3: 38.3 µs per loop
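
%timeit is an IPython magic, so it is not available in a plain Python script. As a rough sketch, the same comparison can be run with the standard library's timeit module. The repeat count below is an arbitrary choice, and the numbers will depend on your machine and NumPy version, so they may not match the ones above.

```python
import timeit
import numpy as np

def rand_nz_1(n, k):
    # Choose k distinct positions, then assign Gaussian samples to them
    a = np.zeros(n)
    pos = np.random.choice(np.arange(n), k, replace=False)
    a[pos] = np.random.randn(k)
    return a

def rand_nz_2(n, k):
    # Fill the first k entries, then shuffle them to random positions
    a = np.zeros(n)
    a[:k] = np.random.randn(k)
    np.random.shuffle(a)
    return a

runs = 2000  # arbitrary repeat count
timings = {}
for fn in (rand_nz_1, rand_nz_2):
    total = timeit.timeit(lambda: fn(256, 15), number=runs)
    timings[fn.__name__] = total / runs * 1e6  # microseconds per call

for name, us in timings.items():
    print(f"{name}: {us:.1f} us per call")
```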
Fred Foo answered Feb 14 '23 09:02
