numpy float: 10x slower than builtin in arithmetic operations?

Tags:

I am getting really weird timings for the following code:

import numpy as np s = 0 for i in range(10000000):     s += np.float64(1) # replace with np.float32 and built-in float

built-in float: 4.9 s
float64: 10.5 s
float32: 45.0 s

Why is float64 twice slower than float? And why is float32 5 times slower than float64?

Is there any way to avoid the penalty of using np.float64, and have numpy functions return built-in float instead of float64?

I found that using numpy.float64 is much slower than Python's float, and numpy.float32 is even slower (even though I'm on a 32-bit machine).

numpy.float32 on my 32-bit machine. Therefore, every time I use various numpy functions such as numpy.random.uniform, I convert the result to float32 (so that further operations would be performed at 32-bit precision).

Is there any way to set a single variable somewhere in the program or in the command line, and make all numpy functions return float32 instead of float64?

EDIT #1:

numpy.float64 is 10 times slower than float in arithmetic calculations. It's so bad that even converting to float and back before the calculations makes the program run 3 times faster. Why? Is there anything I can do to fix it?

I want to emphasize that my timings are not due to any of the following:

the function calls
the conversion between numpy and python float
the creation of objects

I updated my code to make it clearer where the problem lies. With the new code, it would seem I see a ten-fold performance hit from using numpy data types:

from datetime import datetime import numpy as np  START_TIME = datetime.now()  # one of the following lines is uncommented before execution #s = np.float64(1) #s = np.float32(1) #s = 1.0  for i in range(10000000):     s = (s + 8) * s % 2399232  print(s) print('Runtime:', datetime.now() - START_TIME)

The timings are:

float64: 34.56s
float32: 35.11s
float: 3.53s

Just for the hell of it, I also tried:

from datetime import datetime import numpy as np

START_TIME = datetime.now()  s = np.float64(1) for i in range(10000000):     s = float(s)     s = (s + 8) * s % 2399232     s = np.float64(s)  print(s) print('Runtime:', datetime.now() - START_TIME)

The execution time is 13.28 s; it's actually 3 times faster to convert the float64 to float and back than to use it as is. Still, the conversion takes its toll, so overall it's more than 3 times slower compared to the pure-python float.

My machine is:

Intel Core 2 Duo T9300 (2.5GHz)
WinXP Professional (32-bit)
ActiveState Python 3.1.3.5
Numpy 1.5.1

EDIT #2:

Thank you for the answers, they help me understand how to deal with this problem.

But I still would like to know the precise reason (based on the source code perhaps) why the code below runs 10 times slow with float64 than with float.

EDIT #3:

I rerun the code under the Windows 7 x64 (Intel Core i7 930 @ 3.8GHz).

Again, the code is:

from datetime import datetime import numpy as np  START_TIME = datetime.now()  # one of the following lines is uncommented before execution #s = np.float64(1) #s = np.float32(1) #s = 1.0  for i in range(10000000):     s = (s + 8) * s % 2399232  print(s) print('Runtime:', datetime.now() - START_TIME)

The timings are:

float64: 16.1s
float32: 16.1s
float: 3.2s

Now both np floats (either 64 or 32) are 5 times slower than the built-in float. Still, a significant difference. I'm trying to figure out where it comes from.

END OF EDITS

901

asked May 10 '11 21:05

max

1 Answers

CPython floats are allocated in chunks

The key problem with comparing numpy scalar allocations to the float type is that CPython always allocates the memory for float and int objects in blocks of size N.

Internally, CPython maintains a linked list of blocks each large enough to hold N float objects. When you call float(1) CPython checks if there is space available in the current block; if not it allocates a new block. Once it has space in the current block it simply initializes that space and returns a pointer to it.

On my machine each block can hold 41 float objects, so there is some overhead for the first float(1) call but the next 40 run much faster as the memory is allocated and ready.

Slow numpy.float32 vs. numpy.float64

It appears that numpy has 2 paths it can take when creating a scalar type: fast and slow. This depends on whether the scalar type has a Python base class to which it can defer for argument conversion.

For some reason numpy.float32 is hard-coded to take the slower path (defined by the _WORK0 macro), while numpy.float64 gets a chance to take the faster path (defined by the _WORK1 macro). Note that scalartypes.c.src is a template which generates scalartypes.c at build time.

You can visualize this in Cachegrind. I've included screen captures showing how many more calls are made to construct a float32 vs. float64:

float64 takes the fast path

float64 takes the fast path

float32 takes the slow path

float32 takes the slow path

Updated - Which type takes the slow/fast path may depend on whether the OS is 32-bit vs 64-bit. On my test system, Ubuntu Lucid 64-bit, the float64 type is 10 times faster than float32.

179

answered Sep 19 '22 10:09

samplebias

Related questions
                            
                                How to handle a broken pipe (SIGPIPE) in python?
                            
                                How to unquote a urlencoded unicode string in python?
                            
                                Permission problems when creating a dir with os.makedirs in Python
                            
                                Python pandas: Add a column to my dataframe that counts a variable
                            
                                constants in Python: at the root of the module or in a namespace inside the module?
                            
                                Automatic version number both in setup.py (setuptools) AND source code?
                            
                                Replace value for a selected cell in pandas DataFrame without using index
                            
                                .pyw files in python program
                            
                                python, unittest: is there a way to pass command line options to the app
                            
                                Finding count of distinct elements in DataFrame in each column
                            
                                Generating random dates within a given range in pandas
                            
                                Python3 and hmac . How to handle string not being binary
                            
                                Python ImportError: No module named wx
                            
                                Why was p[:] designed to work differently in these two situations?
                            
                                How to calculate precision and recall in Keras
                            
                                How do I get a thread safe print in Python 2.6?
                            
                                Possible to return two lists from a list comprehension?
                            
                                cython issue: 'bool' is not a type identifier
                            
                                How to prevent python requests from percent encoding my URLs?
                            
                                NameError: name 'raw_input' is not defined [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

numpy float: 10x slower than builtin in arithmetic operations?

Tags:

performance

python

floating-point

numpy

max

People also ask

1 Answers

samplebias

Recent Activity

Donate For Us