Suppose we take np.dot of two 'float32' 2D arrays:
res = np.dot(a, b) # see CASE 1
print(list(res[0])) # list shows more digits
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
Numbers. Except, they can change:
CASE 1: slice a
import numpy as np

np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(6, 6).astype('float32')

for i in range(1, len(a)):
    print(list(np.dot(a[:i], b)[0]))  # full shape: (i, 6)
[-0.9044868, -1.1708502, 0.90713596, 3.5594249, 1.1374012, -1.3826287]
[-0.90448684, -1.1708503, 0.9071359, 3.5594249, 1.1374011, -1.3826288]
[-0.90448684, -1.1708503, 0.9071359, 3.5594249, 1.1374011, -1.3826288]
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
Results differ, even though the printed slice derives from the exact same numbers multiplied.
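A minimal sketch to quantify the drift, comparing each sliced float32 result against a float64 reference (not part of the original test code):

import numpy as np

np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(6, 6).astype('float32')

ref = np.dot(a.astype('float64'), b.astype('float64'))[0]  # higher-precision baseline
for i in range(1, len(a)):
    err = np.max(np.abs(np.dot(a[:i], b)[0] - ref))
    print(i, err)  # errors sit near float32 epsilon (~1e-7) and vary with i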
CASE 2: flatten a, take a 1D version of b, then slice a:
np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(1, 6).astype('float32')

for i in range(1, len(a)):
    a_flat = np.expand_dims(a[:i].flatten(), -1)  # keep 2D; shape (i*6, 1)
    print(list(np.dot(a_flat, b)[0]))  # full shape: (i*6, 6)
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
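CASE 2's consistency squares with an even simpler check (not among the original cases): computing each entry of CASE 1's res[0] as an explicit 1D-1D dot is stable by construction, since each value depends only on a[0] and one column of b:

import numpy as np

np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(6, 6).astype('float32')

# each output entry as an explicit 1D-1D dot; slicing a cannot affect these
print([float(np.dot(a[0], b[:, j])) for j in range(b.shape[1])])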
CASE 3: stronger control; set all non-involved entries to zero, i.e. add a[1:] = 0 to the CASE 1 code. Result: discrepancies persist.
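For reference, the modified CASE 1 code reads:

import numpy as np

np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(6, 6).astype('float32')
a[1:] = 0  # zero every row not involved in computing res[0]

for i in range(1, len(a)):
    print(list(np.dot(a[:i], b)[0]))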
CASE 4: check indices other than [0]; as with [0], results begin to stabilize a fixed # of array enlargements from their point of creation. Output
np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(6, 6).astype('float32')

for j in range(len(a) - 2):
    for i in range(1, len(a)):
        res = np.dot(a[:i], b)
        try: print(list(res[j]))
        except IndexError: pass  # row j doesn't exist yet for small slices
    print()
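To make the 'stabilization point' concrete, here is a sketch (mine, not from the original tests) that reports, for each output row j, the largest slice size at which that row still differs bit-for-bit from the full-size result:

import numpy as np

np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(6, 6).astype('float32')

full = np.dot(a, b)
for j in range(len(a)):
    # slice sizes i > j (so row j exists) where row j differs from the full result
    diffs = [i for i in range(j + 1, len(a))
             if not np.array_equal(np.dot(a[:i], b)[j], full[j])]
    print(f'row {j}: last differing slice size:', diffs[-1] if diffs else None)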
Hence, for the 2D * 2D case, results differ - but are consistent for 1D * 1D. From some of my readings, this appears to stem from 1D-1D using simple addition, whereas 2D-2D uses 'fancier', performance-boosting addition that may be less precise (pairwise addition, for example, does the opposite and improves precision). Nonetheless, I'm unable to understand why discrepancies vanish in CASE 1 once a is sliced past a set 'threshold'; the larger a and b, the later this threshold seems to lie, but it always exists.
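To illustrate the precision gap between accumulation strategies (a toy demo, not the actual BLAS code): np.sum uses pairwise summation for float arrays, which typically beats strict left-to-right accumulation:

import numpy as np

np.random.seed(1)
x = np.random.randn(2**16).astype('float32')

naive = np.float32(0)
for v in x:          # strict left-to-right accumulation in float32
    naive += v

pairwise = x.sum()                 # np.sum uses pairwise summation for floats
exact = x.astype('float64').sum()  # higher-precision reference

print('naive error:   ', abs(float(naive) - exact))
print('pairwise error:', abs(float(pairwise) - exact))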
All said: why is np.dot imprecise (and inconsistent) for ND-ND arrays? Relevant Git
Additional info:
Possible culprit library: NumPy MKL - also BLAS libraries; thanks to Bi Rico for noting
Stress-test code: as noted, discrepancies grow in frequency with larger arrays; if the above isn't reproducible, the below should be (if not, try larger dims). My output
np.random.seed(1)
a = (0.01*np.random.randn(9, 9999)).astype('float32')  # scale first, then type-cast
b = (0.01*np.random.randn(9999, 6)).astype('float32')  # *0.01 to bound products to < 1

for i in range(1, len(a)):
    print(list(np.dot(a[:i], b)[0]))
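A summary sketch (mine) for the stress test, counting per slice size how many of res[0]'s six entries differ bitwise from the full-size result, instead of eyeballing digits:

import numpy as np

np.random.seed(1)
a = (0.01*np.random.randn(9, 9999)).astype('float32')
b = (0.01*np.random.randn(9999, 6)).astype('float32')

full = np.dot(a, b)[0]
for i in range(1, len(a)):
    n_diff = int(np.sum(np.dot(a[:i], b)[0] != full))
    print(f'slice size {i}: {n_diff}/6 entries differ from the full result')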
Problem severity: the discrepancies shown are 'small', but they no longer are when operating on a neural network with billions of numbers multiplied over a few seconds, and trillions over the entire runtime; reported model accuracy differs by whole tens of percent, per this thread.
Below is a gif of arrays resulting from feeding to a model what's basically a[0], w/ len(a)==1 vs. len(a)==32:
OTHER PLATFORMS results, per (and with thanks to) Paul's testing:
Case 1 reproduced (partly):
Note: these yield much lower error than shown above; two entries on the first row are off by 1 in the least significant digit from corresponding entries in the other rows.
Case 1 not reproduced:
Notes:
np.show_config() is too long to post, but in summary: IPython envs are BLAS/LAPACK-based; Colab is OpenBLAS-based. In IPython Linux envs, BLAS libraries are system-installed -- in Jupyter and Colab, they come from /opt/conda/lib.

UPDATE: the accepted answer is accurate, but broad and incomplete. The question remains open for anyone who can explain the behavior at the code level - namely, an exact algorithm used by np.dot, and how it explains the 'consistent inconsistencies' observed in the above results (also see comments). Here are some direct implementations beyond my deciphering: sdot.c -- arraytypes.c.src
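None of the linked sources spell out the exact kernel, but the general mechanism can be sketched in plain Python: blocking the reduction axis, as optimized GEMM kernels do (real kernels are far more involved; this is only an illustration), changes the order in which float32 partial sums accumulate:

import numpy as np

def naive_dot(a, b):
    # textbook row-column dot product, strict left-to-right accumulation
    out = np.zeros((a.shape[0], b.shape[1]), dtype='float32')
    for i in range(a.shape[0]):
        for j in range(b.shape[1]):
            s = np.float32(0)
            for k in range(a.shape[1]):
                s += a[i, k] * b[k, j]
            out[i, j] = s
    return out

def blocked_dot(a, b, blk=4):
    # same arithmetic, but the k-loop is split into blocks, so partial
    # sums combine in a different order -> different rounding
    out = np.zeros((a.shape[0], b.shape[1]), dtype='float32')
    for k0 in range(0, a.shape[1], blk):
        out += naive_dot(a[:, k0:k0+blk], b[k0:k0+blk])
    return out

np.random.seed(1)
a = np.random.randn(4, 64).astype('float32')
b = np.random.randn(64, 4).astype('float32')
print(np.max(np.abs(naive_dot(a, b) - blocked_dot(a, b))))  # often nonzero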
This looks like unavoidable numeric imprecision. As explained here, NumPy uses a highly-optimized, carefully-tuned BLAS method for matrix multiplication. This means that the sequence of operations (sums and products) followed to multiply two matrices likely changes when the size of the matrices changes.

To be clearer: we know that, mathematically, each element of the resulting matrix can be calculated as the dot product of two vectors (equal-length sequences of numbers). But this is not how NumPy calculates an element of the resulting matrix. In fact, there are more efficient but complex algorithms, like the Strassen algorithm, that obtain the same result without directly computing the row-column dot product.

When using such algorithms, even if the element C[i,j] of a resulting matrix C = AB is mathematically defined as the dot product of the i-th row of A with the j-th column of B, if you multiply a matrix A2 having the same i-th row as A with a matrix B2 having the same j-th column as B, the element C2[i,j] will actually be computed following a different sequence of operations (one that depends on the whole A2 and B2 matrices), possibly leading to different numerical errors.

That's why, even if mathematically C[i,j] = C2[i,j] (as in your CASE 1), the different sequence of operations followed by the algorithm (due to the change in matrix size) leads to different numerical errors. This also explains the slightly different results across environments, and the fact that, in some cases, for some environments, the numerical error might be absent.
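The core of this argument is that float32 addition is not associative, so reordering a mathematically identical sum can change the result; a minimal demonstration:

import numpy as np

x = np.float32(1e8)
y = np.float32(-1e8)
z = np.float32(0.5)

print((x + y) + z)  # 0.5
print(x + (y + z))  # 0.0 -- y + z rounds back to y, since 0.5 is below 1 ulp of 1e8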