I need to create a 10,000 x 50 array in which each row contains an ascending series of random numbers between 1 and 365, like so:
[[ 4 11 14 ..., 355 360 364]
[ 2 13 15 ..., 356 361 361]
[ 4 12 18 ..., 356 361 365]
...,
[ 6 9 17 ..., 356 362 364]
[ 1 10 19 ..., 352 357 360]
[ 1 9 17 ..., 356 358 364]]
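The only way I've figured out to do this is with a Python loop (a list comprehension over the rows):
import numpy as np
sample_dates = np.array([np.sort(np.random.choice(365, 50, replace=False)) + 1 for _ in range(10000)])  # choice(365) draws 0..364, so +1 gives 1..365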
This works, but it is pretty slow (~0.33 seconds to run), and I'm going to be doing this thousands of times. Is there a faster way to accomplish this?
EDIT: From what I can tell, the most expensive part of this solution is the iteration and the 10,000 individual calls to np.random.choice, not the sorting.
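Before comparing the faster alternatives, it can help to have a quick check that a candidate output actually meets the spec: shape 10,000 x 50, values in 1..365, and every row strictly ascending (which also rules out repeated days). The helper below is a hypothetical sketch of such a check; the name check_sample_dates and its defaults are illustrative, not from the thread.

```python
import numpy as np

def check_sample_dates(arr, n_rows=10_000, n_cols=50, low=1, high=365):
    """Hypothetical helper: True if arr has shape (n_rows, n_cols),
    all values lie in [low, high], and each row is strictly ascending."""
    arr = np.asarray(arr)
    if arr.shape != (n_rows, n_cols):
        return False
    if arr.min() < low or arr.max() > high:
        return False
    # Strictly increasing along each row implies the days in a row are distinct
    return bool(np.all(np.diff(arr, axis=1) > 0))

# The loop-based approach from the question, shifted from 0..364 to 1..365
sample_dates = np.array([np.sort(np.random.choice(365, 50, replace=False)) + 1
                         for _ in range(10000)])
print(check_sample_dates(sample_dates))  # True
```

The same check can be reused on the output of each answer below.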
The choice() function is used to get random elements from a NumPy array. It is part of NumPy's random module. Parameters: a: a one-dimensional array/list (the random sample is drawn from its elements) or an integer (the random sample is drawn from np.arange(a), i.e. 0 to a-1).
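Because an integer a means sampling from np.arange(a), np.random.choice(365, ...) returns days 0 to 364; getting the 1 to 365 range asked for above needs either an offset of 1 or an explicit array such as np.arange(1, 366) as a. A small sketch of both forms:

```python
import numpy as np

# Integer a: samples come from np.arange(365), i.e. 0..364
zero_based = np.random.choice(365, 50, replace=False)
print(zero_based.min() >= 0, zero_based.max() <= 364)  # True True

# Array a: samples come from its elements, here 1..365
days = np.arange(1, 366)
picked = np.random.choice(days, 50, replace=False)
print(picked.min() >= 1, picked.max() <= 365)  # True True

# replace=False guarantees 50 distinct values
print(len(np.unique(picked)))  # 50
```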
Generate Random Number
NumPy offers the random module to work with random numbers.
The random module in NumPy offers a wide range of ways to generate random numbers sampled from a known distribution with a fixed set of parameters.
NumPy's random number routines produce pseudo-random numbers by combining a BitGenerator, an object that generates the underlying random sequences, with a Generator, which uses those sequences to sample from different statistical distributions.
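To make the BitGenerator/Generator split concrete: np.random.default_rng(seed) builds a Generator backed by a PCG64 BitGenerator, so constructing the pair explicitly with the same seed yields the same stream. The seed 12345 below is an arbitrary illustrative value.

```python
import numpy as np

# Explicit pairing: a Generator wrapping a PCG64 BitGenerator
rng = np.random.Generator(np.random.PCG64(12345))
# default_rng builds the same combination (PCG64 is the default BitGenerator)
rng2 = np.random.default_rng(12345)

a = rng.choice(365, 50, replace=False)
b = rng2.choice(365, 50, replace=False)
print(np.array_equal(a, b))  # True: same BitGenerator and seed, same draws

print(type(rng2.bit_generator).__name__)  # PCG64
```

Seeding this way is mainly useful for reproducing a particular set of sample dates when benchmarking the approaches in this thread.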
The following solution does not use sort:
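import numpy as np

l = np.array([True]*50 + [False]*315)  # mask template: pick 50 of the 365 days
total = np.arange(1, 366)              # days 1..365, already in ascending order
# Shuffling the mask selects 50 distinct days; indexing keeps them in ascending order
sample_dates = np.array([total[np.random.permutation(l)] for _ in range(10000)])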
Because it avoids sorting, it seems to be faster than the other suggested solutions: it takes 0.44 seconds on my computer, versus 0.77 seconds for Nils Werner's solution and 0.81 seconds for the OP's solution.
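The mask-permutation idea can also be applied to all 10,000 rows at once instead of one row per loop iteration. The sketch below assumes NumPy 1.20 or later, where Generator.permuted(x, axis=1) shuffles each row independently; it is a variation on the answer above, not code from the thread, and no timing is claimed for it.

```python
import numpy as np

n_rows, n_days, n_pick = 10_000, 365, 50
rng = np.random.default_rng()

# Every row starts as n_pick True values followed by n_days - n_pick False values
mask = np.zeros((n_rows, n_days), dtype=bool)
mask[:, :n_pick] = True
# Shuffle each row independently in a single call (requires NumPy >= 1.20)
mask = rng.permuted(mask, axis=1)

# Read-only view of days 1..365 repeated for every row (no copy made)
days = np.broadcast_to(np.arange(1, n_days + 1), (n_rows, n_days))
# Boolean indexing walks the array in row-major order, so each row's selected
# days come out ascending; every row has exactly n_pick True values
sample_dates = days[mask].reshape(n_rows, n_pick)
```

The reshape is only valid because each row of the mask contains exactly n_pick True entries, which shuffling preserves.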
Given the shape of the array (10,000 rows but only 50 columns), I thought iterating over columns instead of rows might provide some improvement. My idea was to generate 10k numbers with replacement, one per row. Then, in a loop, generate another 10k numbers and check for row-wise duplicates against the columns generated so far. If there are any, replace only those entries with newly drawn random numbers, and repeat until no duplicates remain. If I remember correctly, this is called a hit-and-miss algorithm (a form of rejection sampling).
Here's the working code:
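import numpy as np

arr = np.random.choice(365, 10000)  # first column: one random day per row
for i in range(49):
    arr2 = np.random.choice(365, 10000)  # candidate values for the next column
    comp = (arr2 == arr)  # compare against every column drawn so far
    while comp.any():
        # rows where the candidate repeats an earlier column
        duplicate = comp if i == 0 else comp.any(axis=0)
        arr2[duplicate] = np.random.choice(365, duplicate.sum())  # redraw only those
        comp = (arr2 == arr)
    arr = np.vstack([arr, arr2])
arr = arr.T + 1  # shape (10000, 50); shift days from 0..364 to 1..365
arr.sort(axis=1)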
This takes 93.4 ms to complete. Your attempt takes 590 ms on my computer, so this is roughly a 6x improvement.
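One possible refinement of the column-wise hit-and-miss approach is to preallocate the output instead of calling np.vstack once per column (which copies the growing array on every iteration), and to draw from a Generator. The function name hit_and_miss and its parameters are hypothetical; this is a sketch of the same idea, and it has not been timed against the version above.

```python
import numpy as np

def hit_and_miss(n_rows=10_000, n_cols=50, n_days=365, rng=None):
    """Hypothetical helper: column-wise rejection sampling of distinct days per row."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty((n_rows, n_cols), dtype=np.int64)  # preallocated, no vstack
    out[:, 0] = rng.integers(0, n_days, size=n_rows)   # days 0..n_days-1
    for j in range(1, n_cols):
        cand = rng.integers(0, n_days, size=n_rows)
        # rows where the candidate repeats any of the first j columns
        dup = (out[:, :j] == cand[:, None]).any(axis=1)
        while dup.any():
            cand[dup] = rng.integers(0, n_days, size=dup.sum())  # redraw only those
            dup = (out[:, :j] == cand[:, None]).any(axis=1)
        out[:, j] = cand
    out.sort(axis=1)
    return out + 1  # shift from 0..n_days-1 to 1..n_days

sample_dates = hit_and_miss()
```

With 50 of 365 days taken, even the last column collides in fewer than 14% of rows per draw, so the inner while loop typically finishes in a few passes.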