 

Fastest Way to generate 1,000,000+ random numbers in python


I am currently writing an app in Python that needs to generate a large amount of random numbers, FAST. Currently I have a scheme going that uses numpy to generate all of the numbers in a giant batch (about ~500,000 at a time). While this seems to be faster than Python's built-in implementation, I still need it to go faster. Any ideas? I'm open to writing it in C and embedding it in the program, or doing whatever it takes.

Constraints on the random numbers:

  • A Set of 7 numbers that can all have different bounds:
    • eg: [0-X1, 0-X2, 0-X3, 0-X4, 0-X5, 0-X6, 0-X7]
    • Currently I am generating a list of 7 numbers with random values from [0-1) then multiplying by [X1..X7]
  • A Set of 13 numbers that all add up to 1
    • Currently just generating 13 numbers then dividing by their sum
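The two schemes described above can be sketched with numpy as follows (a minimal sketch; the bounds and batch size are illustrative, not from the question):

```python
import numpy as np

# Hypothetical bounds X1..X7; any positive values work.
bounds = np.array([10, 20, 30, 40, 50, 60, 70], dtype=float)

n = 100000  # illustrative batch size

# Set of 7: scale uniform [0, 1) samples by the per-column bounds.
sevens = np.random.random((n, 7)) * bounds

# Set of 13: divide each row by its sum so every row sums to 1.
thirteens = np.random.random((n, 13))
thirteens /= thirteens.sum(axis=1, keepdims=True)
```

Both operations vectorize over the whole batch, so the per-number cost stays low.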

Any ideas? Would pre-calculating these numbers and storing them in a file make this faster?

Thanks!

Sandro asked Apr 25 '10


2 Answers

You can speed things up a bit over mtrw's answer just by doing what you initially described (generating a batch of random numbers, then multiplying and dividing accordingly)...

Also, you probably already know this, but be sure to do the operations in-place (*=, /=, +=, etc) when working with large-ish numpy arrays. It makes a huge difference in memory usage with large arrays, and will give a considerable speed increase, too.
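The difference between the two forms is easy to see in isolation (a minimal sketch; the array size is illustrative):

```python
import numpy as np

x = np.random.random((1000000, 7))
limits = np.arange(7) + 1.0

# Out-of-place: allocates a brand-new 1,000,000 x 7 result array.
y = x * limits

# In-place: reuses x's existing buffer, so no extra allocation.
x *= limits
```

For arrays this size, skipping the extra allocation saves tens of megabytes per operation and avoids a pass over freshly-zeroed memory.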

In [53]: def rand_row_doubles(row_limits, num):
   ....:     ncols = len(row_limits)
   ....:     x = np.random.random((num, ncols))
   ....:     x *= row_limits
   ....:     return x
   ....:

In [59]: %timeit rand_row_doubles(np.arange(7) + 1, 1000000)
10 loops, best of 3: 187 ms per loop

As compared to:

In [66]: %timeit ManyRandDoubles(np.arange(7) + 1, 1000000)
1 loops, best of 3: 222 ms per loop

It's not a huge difference, but if you're really worried about speed, it's something.

Just to show that it's correct:

In [68]: x.max(0)
Out[68]:
array([ 0.99999991,  1.99999971,  2.99999737,  3.99999569,  4.99999836,
        5.99999114,  6.99999738])

In [69]: x.min(0)
Out[69]:
array([  4.02099599e-07,   4.41729377e-07,   4.33480302e-08,
         7.43497138e-06,   1.28446819e-05,   4.27614385e-07,
         1.34106753e-05])

Likewise, for your "rows sum to one" part...

In [70]: def rand_rows_sum_to_one(nrows, ncols):
   ....:     x = np.random.random((ncols, nrows))
   ....:     y = x.sum(axis=0)
   ....:     x /= y
   ....:     return x.T
   ....:

In [71]: %timeit rand_rows_sum_to_one(1000000, 13)
1 loops, best of 3: 455 ms per loop

In [72]: x = rand_rows_sum_to_one(1000000, 13)

In [73]: x.sum(axis=1)
Out[73]: array([ 1.,  1.,  1., ...,  1.,  1.,  1.])

Honestly, even if you re-implement things in C, I'm not sure you'll be able to beat numpy by much on this one... I could be very wrong, though!

Joe Kington answered Oct 25 '22


EDIT: Created functions that return the full set of numbers, not just one row at a time. EDIT 2: Made the functions more pythonic (and faster), and added a solution for the second question.

For the first set of numbers, you might consider numpy.random.randint or numpy.random.uniform, which take low and high parameters. Generating an array of 7 x 1,000,000 numbers in a specified range seems to take < 0.7 second on my 2 GHz machine:

def LimitedRandInts(XLim, N):
    rowlen = (1, N)
    return [np.random.randint(low=0, high=lim, size=rowlen) for lim in XLim]

def LimitedRandDoubles(XLim, N):
    rowlen = (1, N)
    return [np.random.uniform(low=0, high=lim, size=rowlen) for lim in XLim]

>>> import numpy as np
>>> N = 1000000  # number of randoms in each range
>>> xLim = [x*500 for x in range(1,8)]  # convenient limit generation
>>> fLim = [x/7.0 for x in range(1,8)]
>>> aa = LimitedRandInts(xLim, N)
>>> ff = LimitedRandDoubles(fLim, N)

This returns integers in [0,xLim-1] or floats in [0,fLim). The integer version took ~0.3 seconds, the double ~0.66, on my 2 GHz single-core machine.

For the second set, I used @Joe Kington's suggestion.

def SumToOneRands(NumToSum, N):
    aa = np.random.uniform(low=0, high=1.0, size=(NumToSum, N))  # 13 rows by 1000000 columns, for instance
    s = np.reciprocal(aa.sum(0))
    aa *= s
    return aa.T  # back to column-major order, so aa[k] is the kth set of 13 numbers

>>> ll = SumToOneRands(13, N)

This takes ~1.6 seconds.

In all cases, result[k] gives you the kth set of data.
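For instance, with the sum-to-one generator, each row of the transposed result is one normalized set. A small self-contained check (the function body mirrors SumToOneRands above; the sizes are illustrative):

```python
import numpy as np

def sum_to_one_rands(num_to_sum, n):
    # n columns, each holding num_to_sum uniform samples.
    aa = np.random.uniform(low=0.0, high=1.0, size=(num_to_sum, n))
    # Normalize every column by its own sum, in place.
    aa *= np.reciprocal(aa.sum(0))
    # Transpose so result[k] is the kth set of num_to_sum numbers.
    return aa.T

result = sum_to_one_rands(13, 1000)
k = 42
print(result[k])        # the 43rd set of 13 numbers
print(result[k].sum())  # close to 1.0
```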

mtrw answered Oct 25 '22