
Fast way to Hash Numpy objects for Caching


I'm implementing a system in which I want to do as little of the heavy mathematical lifting as possible.

I'm aware that there are issues with memoising numpy objects, so I implemented a lazy-key cache to avoid the whole "premature optimisation" argument:

    def magic(self, numpyarg, intarg):
        # Lazy cache key: the string form of both arguments
        key = str(numpyarg) + str(intarg)
        try:
            return self._cache[key]
        except KeyError:
            pass

        # ... here be dragons ...
        self._cache[key] = value
        return value

But string conversion takes quite a while:

    import timeit
    t = timeit.Timer("str(a)", "import numpy; a = numpy.random.rand(10, 10)")
    t.timeit(number=100000) / 100000   # = 0.00132 s/call
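Speed aside, a string key can also be lossy. A minimal sketch, assuming numpy's default print options: once an array has more than 1000 elements, str() keeps only the leading and trailing items and writes "..." for the rest, so two different arrays can end up with the same key. The arrays `a` and `b` below are made up for illustration.

```python
import numpy

# Two large arrays that differ only in one middle element.
a = numpy.zeros(2000)
b = a.copy()
b[1000] = 1.0

# With default print options, arrays above 1000 elements are summarised:
# only the first and last few items appear in the string.
same_key = str(a) == str(b)          # the string keys are compared
same_data = numpy.array_equal(a, b)  # the actual contents are compared

print("keys equal:", same_key)
print("data equal:", same_data)
```

Here the keys compare equal even though the data does not, so a cache keyed on str() could return a result computed for a different array.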

What do people suggest as being 'the better way' to do it?

asked Mar 22 '11 04:03 by Bolster



1 Answer

Borrowed from this answer... so really I guess this is a duplicate:

    >>> import hashlib
    >>> import timeit
    >>> import numpy
    >>> a = numpy.random.rand(10, 100)
    >>> b = a.view(numpy.uint8)
    >>> hashlib.sha1(b).hexdigest()
    '15c61fba5c969e5ed12cee619551881be908f11b'
    >>> t = timeit.Timer("hashlib.sha1(a.view(numpy.uint8)).hexdigest()",
    ...                  "import hashlib; import numpy; a = numpy.random.rand(10, 10)")
    >>> t.timeit(number=10000) / 10000
    2.5790500640869139e-05
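For what it's worth, here is a rough sketch of how that digest might slot into the cache from the question. Everything named here is an assumption for illustration: `array_key` is a hypothetical helper, `Calculator` is a stand-in for the asker's class, and the arithmetic in `magic` is a placeholder for the elided computation. The key also carries the dtype and shape, since raw bytes alone cannot tell a (2, 3) array from a (3, 2) one.

```python
import hashlib

import numpy


def array_key(arr, intarg):
    """Hypothetical helper: cache key from raw bytes plus dtype and shape."""
    # The uint8 view needs a contiguous buffer; this copies only when
    # arr is not already C-contiguous.
    contiguous = numpy.ascontiguousarray(arr)
    digest = hashlib.sha1(contiguous.view(numpy.uint8)).hexdigest()
    # Include dtype and shape so arrays with equal bytes but a different
    # layout do not share a key.
    return (digest, arr.dtype.str, arr.shape, intarg)


class Calculator:
    """Stand-in for the asker's class; only the caching pattern matters."""

    def __init__(self):
        self._cache = {}
        self.calls = 0  # counts how often the expensive branch runs

    def magic(self, numpyarg, intarg):
        key = array_key(numpyarg, intarg)
        try:
            return self._cache[key]
        except KeyError:
            pass
        # Placeholder for the expensive computation elided in the question.
        self.calls += 1
        value = float(numpyarg.sum()) * intarg
        self._cache[key] = value
        return value


calc = Calculator()
a = numpy.random.rand(10, 10)
first = calc.magic(a, 3)
second = calc.magic(a.copy(), 3)  # equal contents, so this hits the cache
print(calc.calls)
```

SHA-1 is used here purely as a fingerprint, not for security. An accidental collision is extremely unlikely, but if that risk matters you could store the array alongside the value and confirm with numpy.array_equal on a hit.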
answered Oct 12 '22 21:10 by senderle
