I wrote the code below to test the cache feature of numba:
import numba
import numpy as np
import time

@numba.njit(cache=True)
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i, j]
    return result

a = np.random.random((1000, 100))

print(time.time())
sum2d(a)
print(time.time())

print(time.time())
sum2d(a)
print(time.time())
Although some cache files are generated in the __pycache__ folder, the timing is always the same, for example:
1576855294.8787484
1576855295.5378428
1576855295.5378428
1576855295.5388253
no matter how many times I run this script. The first call of sum2d always takes much longer, which suggests it is still being compiled on every run. So what is the use of the cache files in the __pycache__ folder?
The following script illustrates the point of cache=True. It first calls a non-cached dummy function that absorbs the time it takes to initialize numba. It then calls the sum2d function twice without caching and twice with caching.
import numba
import numpy as np
import time

# Trivial function whose first call absorbs numba's initialization time
@numba.njit
def dummy():
    return None

# Compiled again on every run of the script
@numba.njit
def sum2d_nocache(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i, j]
    return result

# Compiled once, then loaded from the on-disk cache on later runs
@numba.njit(cache=True)
def sum2d_cache(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i, j]
    return result

start = time.time()
dummy()
end = time.time()
print(f'Dummy timing {end - start}')

a = np.random.random((1000, 100))
start = time.time()
sum2d_nocache(a)
end = time.time()
print(f'No cache 1st timing {end - start}')

a = np.random.random((1000, 100))
start = time.time()
sum2d_nocache(a)
end = time.time()
print(f'No cache 2nd timing {end - start}')

a = np.random.random((1000, 100))
start = time.time()
sum2d_cache(a)
end = time.time()
print(f'Cache 1st timing {end - start}')

a = np.random.random((1000, 100))
start = time.time()
sum2d_cache(a)
end = time.time()
print(f'Cache 2nd timing {end - start}')
Output after 1st run:
Dummy timing 0.10361385345458984
No cache 1st timing 0.08893513679504395
No cache 2nd timing 0.00020122528076171875
Cache 1st timing 0.08929300308227539
Cache 2nd timing 0.00015544891357421875
Output after 2nd run:
Dummy timing 0.08973526954650879
No cache 1st timing 0.0809786319732666
No cache 2nd timing 0.0001163482666015625
Cache 1st timing 0.0016787052154541016
Cache 2nd timing 0.0001163482666015625
What does this output tell us?
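The time it takes to initialize numba is not negligible: the dummy call alone takes about 0.1 s in both runs. In the question's script the first timed call also includes this initialization, which is likely why the cache appeared to make no difference.
Without cache, the first call of the function pays the compilation cost on every run of the script (about 0.08 s here), and only the second call within the same run is fast.
With cache, the first call in the second run drops from about 0.09 s to under 0.002 s, because the compiled code is loaded from the cache files instead of being recompiled (this is what cache=True is for).
The point of using cache=True is to avoid repeating the compile time of large and complex functions at each run of a script. In this example the function is simple and the time saving is limited, but for a script with a number of more complex functions, using cache can significantly reduce the run time.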
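The repeated start/end timing boilerplate in the script above can be folded into a small helper. The following is only a minimal sketch: time_calls and sum2d_py are hypothetical names introduced for illustration (they are not part of numba), a plain-Python function on nested lists stands in for the jitted one so the snippet runs without numba installed, and time.perf_counter is swapped in for time.time because it is a monotonic, higher-resolution clock.

```python
import time


def time_calls(fn, *args, repeats=2):
    """Call fn(*args) `repeats` times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - start)
    return durations


def sum2d_py(rows):
    # Plain-Python stand-in (assumption) for the jitted sum2d functions above
    result = 0.0
    for row in rows:
        for value in row:
            result += value
    return result


data = [[1.0, 2.0], [3.0, 4.0]]
first, second = time_calls(sum2d_py, data)
print(f'1st timing {first}')
print(f'2nd timing {second}')
```

With numba installed, the same helper could be called as time_calls(sum2d_cache, a) after the dummy warm-up call. The first duration would then include either compilation or loading from the cache, and the second would reflect only the execution time.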