
How to speed up numpy array-filling in python?

I'm trying to fill a preallocated NumPy array using the following code:

import numpy
import random

# preallocate a block array; 'u8' = 8-byte (64-bit) unsigned integers
dt = numpy.dtype('u8')
in_memory_blocks = numpy.zeros(_AVAIL_IN_MEMORY_BLOCKS, dt)

...

# write all the blocks out, flushing only as desired
blocks_per_flush_xrange = xrange(0, blocks_per_flush)
for _ in xrange(0, num_flushes):
    # fill the buffer one element at a time, one Python-level call per element
    for block_index in blocks_per_flush_xrange:
        in_memory_blocks[block_index] = random.randint(0, _BLOCK_MAX)

    print('flushing bytes stored in memory...')

    # commented out for SO; exists in actual code
    # removing this doesn't make an order-of-magnitude difference in time
    # m.update(in_memory_blocks[:blocks_per_flush])

    in_memory_blocks[:blocks_per_flush].tofile(f)

Some points:

  • num_flushes is low, at around 4 - 10
  • blocks_per_flush is a large number, on the order of millions
  • in_memory_blocks can be a fairly large buffer (I've set it as low as 1MB and as high as 100MB) but the timing is very consistent...
  • _BLOCK_MAX is the max for an 8-byte unsigned int
  • m is a hashlib.md5()

Generating 1MB using the above code takes ~1s; 500MB takes ~376s. By comparison, my simple C program that uses rand() can create a 500MB file in 8s.

How can I improve the performance in the above loop? I'm pretty sure I'm ignoring something obvious that's causing this massive difference in runtime.

Asked by Allen George on Apr 15 '11



2 Answers

Due to the fact that 0.._BLOCK_MAX covers all the possible values for numpy.uint8 (I assume that numpy.dtype('u8'), i.e. numpy.uint64, is a typo for numpy.uint8), you could use:

import numpy as np

for _ in xrange(0, num_flushes):
    # reinterpret a block of random bytes as uint8 values (no copy)
    in_memory_blocks = np.frombuffer(np.random.bytes(blocks_per_flush),
                                     dtype=np.uint8)

    print('flushing bytes stored in memory...')
    # ...

This variant is ~8 times faster than @hgomersall's:

$ python -mtimeit -s'import numpy as np' '
>     np.uint8(np.random.randint(0,256,20000000))'
10 loops, best of 3: 316 msec per loop

$ python -mtimeit -s'import numpy as np' '
>     np.frombuffer(np.random.bytes(20000000), dtype=np.uint8)'
10 loops, best of 3: 38.6 msec per loop

If numpy.dtype('u8') is not a typo and you indeed require numpy.uint64 then:

a = np.int64(np.random.random_integers(0, _BLOCK_MAX, blocks_per_flush))
in_memory_blocks = a.view(np.uint64) # unsigned

Note: np.int64() doesn't make a copy if the array's dtype is already np.int64. .view(numpy.uint64) forces its interpretation as unsigned (also no copy is performed).
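
For illustration, here's a minimal sketch (with a small placeholder size; the real code uses millions of elements) showing that the .view() reinterpretation shares the underlying buffer rather than copying:

import numpy as np

a = np.int64(np.random.randint(0, 2**8, 10))
b = a.view(np.uint64)   # same buffer, reinterpreted as unsigned

assert b.base is a      # b is a view onto a's memory, not a copy
a[0] = -1               # -1 as int64 is all ones in binary...
print(b[0])             # ...so the uint64 view reads it as 2**64 - 1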

Answered by jfs


Since you're allocating contiguous blocks, you should be able to do the following (getting rid of the inner loop entirely):

for _ in xrange(0, num_flushes):
    in_memory_blocks[:blocks_per_flush] = numpy.random.randint(
            0, _BLOCK_MAX+1, blocks_per_flush)

    print('flushing bytes stored in memory...')

    # commented out for SO; exists in actual code
    # removing this doesn't make an order-of-magnitude difference in time
    # m.update(in_memory_blocks[:blocks_per_flush])

    in_memory_blocks[:blocks_per_flush].tofile(f)

This uses the numpy.random.randint function, which allocates a whole block of memory and fills it with random integers (note J.F. Sebastian's comment about numpy.random.randint versus random.randint). As far as I can see, there's no way to fill a preallocated array using numpy's random routines. The other problem is that numpy's randint returns int64 arrays, so if you need integers of some other size you can use the numpy typing methods, for example numpy.uint8. If you want the random integers to cover the whole range of the type, then @J.F. Sebastian's method above using numpy.random.bytes is going to be the best (in almost any case!).
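
As an aside beyond the era of this answer: newer NumPy versions (1.11+) let numpy.random.randint generate the target dtype directly, avoiding both the int64 intermediate and the converting copy. A minimal sketch of the two approaches:

import numpy

n = 20000000

# original approach: int64 intermediate, then a converting (copying) cast
a = numpy.uint8(numpy.random.randint(0, 256, n))

# newer NumPy (>= 1.11): generate uint8 values directly
b = numpy.random.randint(0, 256, n, dtype=numpy.uint8)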

However, simple tests show reasonable times (of the same order of magnitude as the C code). The following code tests the time to allocate uint8 arrays of 20,000,000 random integers using the numpy method:

from timeit import Timer
t = Timer(stmt='a=numpy.uint8(numpy.random.randint(0, 100, 20000000))',
        setup='import numpy')
test_runs = 50
time = t.timeit(test_runs)/test_runs  # average seconds per allocation
print(time)

I get about 0.7 seconds per allocation of 20,000,000 random uint8 integers on my 4-year-old Core2 laptop (the test runs 50 times, so the whole thing takes longer). At roughly 20 MB per allocation, that's 25 allocations for 500 MB, so I'd expect something around 20 s for the whole lot.

More memory would mean you could allocate bigger chunks at once, but you're still effectively wasting time allocating and writing 64 bits for each int when you only need 8 (I haven't quantified this effect). If it's still not fast enough, you could call your C implementation through numpy's ctypes interface. This is really rather easy to use, and you'd get virtually no slowdown over pure C.
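
A minimal sketch of that ctypes route (the library name libfill and the function fill_random are hypothetical; the C side would need a matching signature, e.g. void fill_random(uint64_t *buf, size_t n)):

import ctypes
import numpy
import numpy.ctypeslib as npct

# load a hypothetical shared library built from your C code
lib = npct.load_library('libfill', '.')

# declare the argument types: a contiguous uint64 buffer and its length
array_1d_u64 = npct.ndpointer(dtype=numpy.uint64, ndim=1,
                              flags='C_CONTIGUOUS')
lib.fill_random.argtypes = [array_1d_u64, ctypes.c_size_t]
lib.fill_random.restype = None

buf = numpy.zeros(20000000, dtype=numpy.uint64)
lib.fill_random(buf, buf.size)  # C fills the preallocated buffer in place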

The general take-home message is that with numpy, you should always use the numpy routines where they exist, remembering that falling back to C with ctypes is not too painful. In general, this methodology allows quite effective use of Python for numerical processing with very little slowdown.

Edit: Something else that just occurred to me: as it's implemented above, I think you'd be making an additional unnecessary copy. If in_memory_blocks is of length blocks_per_flush, then you'd do better just to assign it the return value of numpy.random.randint, rather than assigning into a slice of it (which in the general case must be a copy). So:

in_memory_blocks = numpy.random.randint(0, _BLOCK_MAX+1, blocks_per_flush)

rather than:

in_memory_blocks[:blocks_per_flush] = numpy.random.randint(
        0, _BLOCK_MAX+1, blocks_per_flush)

However, having timed this, the first case doesn't lead to a significant increase in speed (only about 2%), so it's probably not worth worrying about too much. I'd guess the overwhelming amount of time is spent actually generating random numbers (which is what I would expect).
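
For reference, a minimal sketch of how the two variants could be timed against each other (the 20,000,000-element size mirrors the test above; the ~2% figure quoted comes from the original timing, not this sketch):

from timeit import Timer

setup = ('import numpy\n'
         'n = 20000000\n'
         'buf = numpy.zeros(n, numpy.int64)')

# direct assignment: rebind the name to the freshly allocated result
t_direct = Timer('a = numpy.random.randint(0, 256, n)', setup)

# slice assignment: copy the result into the preallocated buffer
t_slice = Timer('buf[:n] = numpy.random.randint(0, 256, n)', setup)

runs = 10
print(t_direct.timeit(runs) / runs)  # seconds per run, direct
print(t_slice.timeit(runs) / runs)   # seconds per run, via slice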

Answered by Henry Gomersall