I am using Theano/NumPy for some deep learning work, and I have run into a very annoying problem. I have a weight matrix A (say 50×2048) and a feature vector b (2048-dimensional).

A is initialized with

self.alpha = np.random.random((50, 2048)).astype(np.float32) * 2 - 1.0

and b is a 2048-dimensional numpy.ndarray coming from Theano.
The problem is that

X = numpy.dot(A, b)
Y = [numpy.dot(A[i], b) for i in range(50)]

do not agree exactly: some entries of X and Y differ, and the differences are on the order of 1e-6 to 1e-7. Currently I prefer the second form to compute the dot product, since the model seems to learn better weights with it, but the first is much faster. So I am wondering why there is such a difference. Is it caused by different implementations of dot(matrix, vector) and dot(vector, vector)? Thanks a lot!
Edit: As uhoh suggested, here is a script that reproduces the problem.
import numpy as np

test_time = 1000
vector_size = 100
matrix_size = (100, 100)

for i in range(test_time):
    # fresh random float32 inputs for each trial
    a = np.random.random(matrix_size).astype(np.float32) * 2 - 1.0
    b = np.random.random(vector_size).astype(np.float32)
    # full matrix-vector product vs. row-by-row dot products
    x = np.dot(a, b)
    y = [np.dot(a[j], b) for j in range(a.shape[0])]
    for k in range(len(y)):
        epsilon = x[k] - y[k]
        if abs(epsilon) > 1e-7:
            print('Diff: {0}\t{1}\t{2}'.format(x[k], y[k], epsilon))
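For what it's worth, this kind of discrepancy does not require np.dot at all: float32 addition is not associative, so any two routines that accumulate the same products in a different order can legitimately disagree in the last few bits. A standalone sketch (the seed and size here are only illustrative, and the exact magnitude will vary by machine):

import numpy as np

np.random.seed(0)  # only to make the example repeatable
v = np.random.random(2048).astype(np.float32) * 2 - 1.0

forward = v.sum()             # NumPy's built-in (typically pairwise) reduction
sequential = np.float32(0.0)
for value in v:               # naive left-to-right accumulation
    sequential += value

print(forward - sequential)   # usually nonzero, at roughly the 1e-6/1e-7 level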
Well, there is usually a trade-off between performance and precision, and you may have to sacrifice one in favor of the other. That said, I personally do not believe a difference of 0.0000001 is a big deal in most applications. If you need higher precision, you had better go with float64, but note that float64 operations are extremely slow on GPUs, especially the NVIDIA 9xx series.
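If you want to check how much of the gap is explained by float32 rounding, a quick experiment (a sketch along the lines of the reproduction script above; the shapes are illustrative) is to run the same comparison in both precisions and look at the largest disagreement. In float64 it should shrink by many orders of magnitude, to around the 1e-15 level:

import numpy as np

a32 = np.random.random((100, 100)).astype(np.float32) * 2 - 1.0
b32 = np.random.random(100).astype(np.float32)

for dtype in (np.float32, np.float64):
    a = a32.astype(dtype)
    b = b32.astype(dtype)
    x = np.dot(a, b)                                             # matrix-vector product
    y = np.array([np.dot(a[i], b) for i in range(a.shape[0])])   # row-by-row
    print(dtype.__name__, np.abs(x - y).max())                   # max disagreement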
I should also note that the issue seems to depend on your hardware setup, because I do not encounter it on my machine.

You can also use np.allclose(x, y) to check whether the difference is actually significant.
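For example, with the x and y from the reproduction script above (a usage sketch; y is converted to an array for the element-wise comparison):

y = np.asarray(y)
# The default tolerances (rtol=1e-5, atol=1e-8) are looser than the observed
# 1e-6 to 1e-7 differences, so this should print True:
print(np.allclose(x, y))
# A much stricter absolute tolerance would likely flag them:
print(np.allclose(x, y, rtol=0.0, atol=1e-9))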