
Performance of np.empty, np.zeros and np.ones

I was curious about how much difference it really makes to use np.empty instead of np.zeros, and also about how np.ones compares. I ran this small script to benchmark the time each of them takes to create a large array:

import numpy as np
from timeit import timeit

N = 10_000_000
dtypes = [np.int8, np.int16, np.int32, np.int64,
          np.uint8, np.uint16, np.uint32, np.uint64,
          np.float16, np.float32, np.float64]
rep = 100  # number of calls averaged per measurement
print(f'{"DType":8s} {"Empty":>10s} {"Zeros":>10s} {"Ones":>10s}')
for dtype in dtypes:
    name = dtype.__name__
    time_empty = timeit(lambda: np.empty(N, dtype=dtype), number=rep) / rep
    time_zeros = timeit(lambda: np.zeros(N, dtype=dtype), number=rep) / rep
    time_ones = timeit(lambda: np.ones(N, dtype=dtype), number=rep) / rep
    print(f'{name:8s} {time_empty:10.2e} {time_zeros:10.2e} {time_ones:10.2e}')

And obtained the following table as a result:

DType         Empty      Zeros       Ones
int8       1.39e-04   1.76e-04   5.27e-03
int16      3.72e-04   3.59e-04   1.09e-02
int32      5.85e-04   5.81e-04   2.16e-02
int64      1.28e-03   1.13e-03   3.98e-02
uint8      1.66e-04   1.62e-04   5.22e-03
uint16     2.79e-04   2.82e-04   9.49e-03
uint32     5.65e-04   5.20e-04   1.99e-02
uint64     1.16e-03   1.24e-03   4.18e-02
float16    3.21e-04   2.95e-04   1.06e-02
float32    6.31e-04   6.06e-04   2.32e-02
float64    1.18e-03   1.16e-03   4.85e-02

From this I extract two somewhat surprising conclusions:

  • There is virtually no difference between the performance of np.empty and np.zeros, except perhaps for int8. I don't understand why this is the case. Creating an empty array is supposed to be faster, and I have actually seen reports of that (e.g. Speed of np.empty vs np.zeros).
  • There is a great difference between np.zeros and np.ones. I suspect this has to do with high-performance means for memory zeroing that do not apply to filling a memory area with a constant, but I don't really know how or at what level that works.

What is the explanation for these results?

I am using NumPy 1.15.4 and Python 3.6 Anaconda on Windows 10 (with MKL), and I have an Intel Core i7-7700K CPU.

EDIT: As per a suggestion in the comments, I tried interleaving the individual trials and averaging at the end, but I couldn't see a significant difference in the results. On a related note, I don't know whether NumPy has any mechanism to reuse the memory of a just-deleted array, which would make the measurements unrealistic (although the times do seem to go up with the data type size even for empty arrays).
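
A minimal sketch of what such an interleaved benchmark could look like. The function name `interleaved_benchmark`, the array size and the number of rounds are illustrative choices, not the exact code used for the results above:

```python
import numpy as np
from timeit import default_timer as timer


def interleaved_benchmark(n, dtype, rounds=20):
    """Time np.empty, np.zeros and np.ones in alternating order.

    Returns the mean time per call, in seconds, for each constructor.
    """
    makers = {"empty": np.empty, "zeros": np.zeros, "ones": np.ones}
    totals = {name: 0.0 for name in makers}
    for _ in range(rounds):
        # One call of each constructor per round, so no constructor
        # gets all of its calls in a single uninterrupted batch.
        for name, make in makers.items():
            start = timer()
            arr = make(n, dtype=dtype)
            totals[name] += timer() - start
            del arr  # drop the array before the next constructor runs
    return {name: total / rounds for name, total in totals.items()}


for dtype in (np.int8, np.float64):
    means = interleaved_benchmark(1_000_000, dtype)
    print(dtype.__name__, {k: f"{v:.2e}" for k, v in means.items()})
```

Alternating the calls this way does not rule out memory reuse by the allocator; it only avoids timing each constructor in one long batch.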

asked Oct 27 '22 17:10 by jdehesa


1 Answer

This should really be a comment, but it won't fit. Here is a small extension of your script, with some "hand-made" versions of zeros and ones built from np.full and from np.empty followed by np.copyto.

import numpy as np
from timeit import timeit

N = 10_000_000
dtypes = [np.int8, np.int16, np.int32, np.int64,
          np.uint8, np.uint16, np.uint32, np.uint64,
          np.float16, np.float32, np.float64]
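rep = 100  # number of calls averaged per measurement
# One header label per timed column, aligned with the row format below
print(f'{"DType":8s} {"Empty":>10s} {"Zeros":>10s} {"Ones":>10s} {"Full(0)":>10s} {"Full(1)":>10s}  {"Copyto(0)":>10s} {"Copyto(1)":>10s}')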
for dtype in dtypes:
    name = dtype.__name__
    time_empty = timeit(lambda: np.empty(N, dtype=dtype), number=rep) / rep
    time_zeros = timeit(lambda: np.zeros(N, dtype=dtype), number=rep) / rep
    time_ones = timeit(lambda: np.ones(N, dtype=dtype), number=rep) / rep
    time_full_zeros = timeit(lambda: np.full(N, 0, dtype=dtype), number=rep) / rep
    time_full_ones = timeit(lambda: np.full(N, 1, dtype=dtype), number=rep) / rep
    time_empty_zeros = timeit(lambda: np.copyto(np.empty(N, dtype=dtype), 0), number=rep) / rep
    time_empty_ones = timeit(lambda: np.copyto(np.empty(N, dtype=dtype), 1), number=rep) / rep
    print(f'{name:8s} {time_empty:10.2e} {time_zeros:10.2e} {time_ones:10.2e} {time_full_zeros:10.2e} {time_full_ones:10.2e}  {time_empty_zeros:10.2e} {time_empty_ones:10.2e} ')

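The timings are suggestive: np.ones, np.full and np.empty followed by np.copyto all take essentially the same time, while np.zeros is as fast as np.empty for the 4- and 8-byte dtypes.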

DType         Empty      Zeros       Ones    Full(0)    Full(1)   Copyto(0)  Copyto(1)
int8       1.37e-06   6.33e-04   5.73e-04   5.76e-04   5.73e-04    6.05e-04   5.82e-04 
int16      1.61e-06   1.55e-03   3.54e-03   3.54e-03   3.56e-03    3.54e-03   3.54e-03 
int32      7.22e-06   6.99e-06   1.24e-02   1.20e-02   1.25e-02    1.19e-02   1.21e-02 
int64      8.26e-06   8.06e-06   2.62e-02   2.64e-02   2.61e-02    2.62e-02   2.62e-02 
uint8      1.32e-06   6.30e-04   5.85e-04   5.86e-04   5.77e-04    5.70e-04   5.83e-04 
uint16     1.32e-06   1.63e-03   3.61e-03   3.65e-03   4.08e-03    4.08e-03   3.58e-03 
uint32     7.08e-06   7.20e-06   1.48e-02   1.41e-02   1.63e-02    1.44e-02   1.32e-02 
uint64     7.14e-06   7.13e-06   2.69e-02   2.67e-02   2.82e-02    2.68e-02   2.72e-02 
float16    1.31e-06   1.55e-03   3.56e-03   3.79e-03   3.54e-03    3.53e-03   3.55e-03 
float32    7.11e-06   6.95e-06   1.36e-02   1.35e-02   1.37e-02    1.35e-02   1.37e-02 
float64    7.27e-06   7.33e-06   3.13e-02   3.00e-02   2.75e-02    2.80e-02   2.75e-02 

As for zeros being faster than ones: I seem to remember that, as suggested in the comments, zeros indeed uses calloc. Being a system routine whose sole purpose is to allocate blocks of zeros, calloc is probably good at that.
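
One rough way to probe this, offered as a sketch and not as a definitive test: if the zeroed memory is handed out lazily, the cost that np.zeros avoids at creation time may reappear when the array is first written to. The helper name `time_create_and_touch` and the array size are my own choices for illustration, and whether any effect shows up depends on the platform and allocator.

```python
import numpy as np
from timeit import default_timer as timer


def time_create_and_touch(make, n, dtype=np.float64):
    """Return (creation time, first-write time) in seconds for one array."""
    start = timer()
    arr = make(n, dtype=dtype)
    created = timer()
    arr.fill(1)  # first write touches every element of the buffer
    touched = timer()
    return created - start, touched - created


for make in (np.empty, np.zeros, np.ones):
    create, touch = time_create_and_touch(make, 10_000_000)
    print(f"{make.__name__:6s} create={create:.2e}s first write={touch:.2e}s")
```

If the first write after np.empty or np.zeros is noticeably slower than after np.ones, that would be consistent with the memory only being committed when it is first touched.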

answered Nov 15 '22 07:11 by Paul Panzer