How to quickly fill a numpy array with values from separate calls to a function

Q: How do I fill an empty NumPy array?

To append rows to an array inside a loop, you can create an empty array with np.empty() and then append new rows to it with numpy.append(). Note that numpy.append() returns a new array on every call rather than modifying the existing one in place.

Q: Is appending to NumPy array faster than list?

List append is faster than array append.

Q: How do you fill an existing array in python?

You can fill an existing array with a specific value using the array's fill() method (ndarray.fill()). Alternatively, you can create a new array filled with a specific value using numpy.full().

I want to fill a numpy array with generated values. Each value is produced by a separate call to a function. The array is not long, usually fewer than 100 elements, but it is generated many times, so I want to know whether this can be optimized with some fancy usage of numpy.

So far I can already do it with vanilla python:

import numpy as np

def generate():
    # generated_data stands for one freshly generated row of values
    return generated_data

array = np.asarray([generate() for _ in range(array_length)])

I've also tried to use np.full(shape, fill_value):

np.full((array_length, generated_data_size), generate())

But this calls the generate() function only once, not once for every index in the array.
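A minimal sketch of that difference follows. The counting `generate` built by `make_counting_generate` is a hypothetical stand-in, not the real function from the question; it only exists so the number of calls can be observed.

```python
import numpy as np

def make_counting_generate():
    # Hypothetical helper: builds a generate() that records how often it runs
    state = {"calls": 0}

    def generate():
        state["calls"] += 1
        n = state["calls"]
        return [n, n * 10]  # a different row on every call

    return generate, state

array_length = 3

# np.full: generate() is evaluated once, and that single row is broadcast
generate, state = make_counting_generate()
filled = np.full((array_length, 2), generate())
calls_for_full = state["calls"]

# List comprehension: generate() is called once per row
generate, state = make_counting_generate()
rows = np.asarray([generate() for _ in range(array_length)])
calls_for_comprehension = state["calls"]
```

Here `filled` repeats the same row three times, while `rows` holds three distinct rows.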

I've also tried np.vectorize(), but I couldn't make it generate an appropriately shaped array.

asked Apr 11 '19 10:04

Maxis

1 Answer

There is nothing NumPy can do to accelerate the process of repeatedly calling a function not designed to interact with NumPy.

The "fancy usage of numpy" way to optimize this is to manually rewrite your generate function to use NumPy operations to generate entire arrays of output instead of only supporting single values. That's how NumPy works, and how NumPy has to work; any solution that involves calling a Python function over and over again for every array cell is going to be limited by Python overhead. NumPy can only accelerate work that actually happens in NumPy.
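As a rough illustration of such a rewrite, assume a hypothetical per-value generator that draws a uniform number in [-1, 1). The names `generate_one` and `generate_all` are invented for this sketch; the real generate function may not vectorize this easily.

```python
import random
import numpy as np

array_length = 100

def generate_one():
    # Hypothetical per-value generator: one Python-level call per array cell
    return random.random() * 2.0 - 1.0

def generate_all(n, rng):
    # Same distribution, but all n values come from a single NumPy call
    return rng.random(n) * 2.0 - 1.0

# Per-cell version: array_length separate Python function calls
slow = np.asarray([generate_one() for _ in range(array_length)])

# Vectorized version: the loop over cells happens inside NumPy
rng = np.random.default_rng()
fast = generate_all(array_length, rng)
```

Both arrays have the same shape and value range; only the second avoids the per-cell Python overhead.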

If NumPy's provided operations are too limited to rewrite generate in terms of them, there are options like rewriting generate with Cython, or using @numba.jit on it. These mostly help with computations that involve complex dependencies from one loop iteration to the next; they don't help with external dependencies you can't rewrite.

If you cannot rewrite generate, all you can do is try to optimize the process of getting the return values into your array. Depending on array size, you may be able to save some time by reusing a single array object:

In [32]: %timeit x = numpy.array([random.random() for _ in range(10)])
The slowest run took 5.13 times longer than the fastest. This could mean that an
 intermediate result is being cached.
100000 loops, best of 5: 5.44 µs per loop
In [33]: %%timeit x = numpy.empty(10)
   ....: for i in range(10):
   ....:     x[i] = random.random()
   ....: 
The slowest run took 4.26 times longer than the fastest. This could mean that an
 intermediate result is being cached.
100000 loops, best of 5: 2.88 µs per loop

but the benefit vanishes for larger arrays:

In [34]: %timeit x = numpy.array([random.random() for _ in range(100)])
10000 loops, best of 5: 21.9 µs per loop
In [35]: %%timeit x = numpy.empty(100)
   ....: for i in range(100):
   ....:     x[i] = random.random()
   ....: 
10000 loops, best of 5: 22.8 µs per loop
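For the two-dimensional case in the question, reusing one array could look like the sketch below. `fill_rows` and the constant-returning `generate` are hypothetical names for illustration, and whether this beats the list comprehension should be measured for the actual array sizes, as in the timings above.

```python
import numpy as np

def fill_rows(out, generate):
    # Overwrite each row of a preallocated array with a fresh generate() result
    for i in range(out.shape[0]):
        out[i] = generate()
    return out

def generate():
    # Hypothetical stand-in for the real function: returns one row of data
    return [1.0, 2.0, 3.0]

array_length, generated_data_size = 5, 3

# Allocate once, then refill the same buffer every time the array is regenerated
buffer = np.empty((array_length, generated_data_size))
result = fill_rows(buffer, generate)
```

Because `fill_rows` writes into `buffer` in place, any earlier contents are lost on each refill; copy the array first if an old result must be kept.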

answered Oct 18 '22 21:10

user2357112 supports Monica

Tags: python, python-3.x, numpy
