Suppose we take np.dot of two 'float32' 2D arrays:
res = np.dot(a, b) # see CASE 1
print(list(res[0])) # list shows more digits
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
Numbers. Except, they can change:
CASE 1: slice a
import numpy as np

np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(6, 6).astype('float32')

for i in range(1, len(a)):
    print(list(np.dot(a[:i], b)[0]))  # full shape: (i, 6)
[-0.9044868, -1.1708502, 0.90713596, 3.5594249, 1.1374012, -1.3826287]
[-0.90448684, -1.1708503, 0.9071359, 3.5594249, 1.1374011, -1.3826288]
[-0.90448684, -1.1708503, 0.9071359, 3.5594249, 1.1374011, -1.3826288]
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
Results differ, even though the printed slice derives from the exact same numbers multiplied.
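A minimal sketch to quantify the drift, comparing each sliced float32 result against a float64 reference (not part of the original test code):

import numpy as np

np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(6, 6).astype('float32')

ref = np.dot(a.astype('float64'), b.astype('float64'))[0]  # higher-precision baseline
for i in range(1, len(a)):
    err = np.max(np.abs(np.dot(a[:i], b)[0] - ref))
    print(i, err)  # errors sit near float32 epsilon (~1e-7) and vary with i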
CASE 2: flatten a, take a 1D version of b, then slice a:
np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(1, 6).astype('float32')

for i in range(1, len(a)):
    a_flat = np.expand_dims(a[:i].flatten(), -1)  # keep 2D; shape (i*6, 1)
    print(list(np.dot(a_flat, b)[0]))  # full shape: (i*6, 6)
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
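CASE 2's consistency squares with an even simpler check (not among the original cases): computing each entry of CASE 1's res[0] as an explicit 1D-1D dot is stable by construction, since each value depends only on a[0] and one column of b:

import numpy as np

np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(6, 6).astype('float32')

# each output entry as an explicit 1D-1D dot; slicing a cannot affect these
print([float(np.dot(a[0], b[:, j])) for j in range(b.shape[1])])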
CASE 3: stronger control; set all non-involved entries to zero, i.e. add a[1:] = 0 to the CASE 1 code. Result: discrepancies persist.
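For reference, the modified CASE 1 code reads:

import numpy as np

np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(6, 6).astype('float32')
a[1:] = 0  # zero every row not involved in computing res[0]

for i in range(1, len(a)):
    print(list(np.dot(a[:i], b)[0]))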
CASE 4: check indices other than [0]; as with [0], results begin to stabilize a fixed # of array enlargements from their point of creation. Output
np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(6, 6).astype('float32')

for j in range(len(a) - 2):
    for i in range(1, len(a)):
        res = np.dot(a[:i], b)
        try: print(list(res[j]))
        except IndexError: pass  # row j doesn't exist yet for small slices
    print()
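To make the 'stabilization point' concrete, here is a sketch (mine, not from the original tests) that reports, for each output row j, the largest slice size at which that row still differs bit-for-bit from the full-size result:

import numpy as np

np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(6, 6).astype('float32')

full = np.dot(a, b)
for j in range(len(a)):
    # slice sizes i > j (so row j exists) where row j differs from the full result
    diffs = [i for i in range(j + 1, len(a))
             if not np.array_equal(np.dot(a[:i], b)[j], full[j])]
    print(f'row {j}: last differing slice size:', diffs[-1] if diffs else None)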
Hence, for the 2D * 2D case, results differ - but are consistent for 1D * 1D. From some of my readings, this appears to stem from 1D-1D using simple addition, whereas 2D-2D uses 'fancier', performance-boosting addition that may be less precise (pairwise addition, for example, does the opposite and improves precision). Nonetheless, I'm unable to understand why discrepancies vanish in CASE 1 once a is sliced past a set 'threshold'; the larger a and b, the later this threshold seems to lie, but it always exists.
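To illustrate the precision gap between accumulation strategies (a toy demo, not the actual BLAS code): np.sum uses pairwise summation for float arrays, which typically beats strict left-to-right accumulation:

import numpy as np

np.random.seed(1)
x = np.random.randn(2**16).astype('float32')

naive = np.float32(0)
for v in x:          # strict left-to-right accumulation in float32
    naive += v

pairwise = x.sum()                 # np.sum uses pairwise summation for floats
exact = x.astype('float64').sum()  # higher-precision reference

print('naive error:   ', abs(float(naive) - exact))
print('pairwise error:', abs(float(pairwise) - exact))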
All said: why is np.dot imprecise (and inconsistent) for ND-ND arrays? Relevant Git
Additional info:
Possible culprit library: NumPy MKL - also BLAS libraries; thanks to Bi Rico for noting
Stress-test code: as noted, discrepancies grow in frequency with larger arrays; if the above isn't reproducible, the below should be (if not, try larger dims). My output
np.random.seed(1)
a = (0.01*np.random.randn(9, 9999)).astype('float32')  # scale first, then type-cast
b = (0.01*np.random.randn(9999, 6)).astype('float32')  # *0.01 to bound products to < 1

for i in range(1, len(a)):
    print(list(np.dot(a[:i], b)[0]))
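A summary sketch (mine) for the stress test, counting per slice size how many of res[0]'s six entries differ bitwise from the full-size result, instead of eyeballing digits:

import numpy as np

np.random.seed(1)
a = (0.01*np.random.randn(9, 9999)).astype('float32')
b = (0.01*np.random.randn(9999, 6)).astype('float32')

full = np.dot(a, b)[0]
for i in range(1, len(a)):
    n_diff = int(np.sum(np.dot(a[:i], b)[0] != full))
    print(f'slice size {i}: {n_diff}/6 entries differ from the full result')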
Problem severity: the discrepancies shown are 'small', but they no longer are when operating on a neural network with billions of numbers multiplied over a few seconds, and trillions over the entire runtime; reported model accuracy differs by whole tens of percent, per this thread.
Below is a gif of arrays resulting from feeding to a model what's basically a[0], w/ len(a)==1 vs. len(a)==32:
OTHER PLATFORMS results, per (and with thanks to) Paul's testing:
Case 1 reproduced (partly):
Note: these yield much lower error than shown above; two entries on the first row are off by 1 in the least significant digit from corresponding entries in the other rows.
Case 1 not reproduced:
Notes:
np.show_config() is too long to post, but in summary: IPython envs are BLAS/LAPACK-based; Colab is OpenBLAS-based. In IPython Linux envs, BLAS libraries are system-installed -- in Jupyter and Colab, they come from /opt/conda/lib.

UPDATE: the accepted answer is accurate, but broad and incomplete. The question remains open for anyone who can explain the behavior at the code level - namely, an exact algorithm used by np.dot, and how it explains the 'consistent inconsistencies' observed in the above results (also see comments). Here are some direct implementations beyond my deciphering: sdot.c -- arraytypes.c.src
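None of the linked sources spell out the exact kernel, but the general mechanism can be sketched in plain Python: blocking the reduction axis, as optimized GEMM kernels do (real kernels are far more involved; this is only an illustration), changes the order in which float32 partial sums accumulate:

import numpy as np

def naive_dot(a, b):
    # textbook row-column dot product, strict left-to-right accumulation
    out = np.zeros((a.shape[0], b.shape[1]), dtype='float32')
    for i in range(a.shape[0]):
        for j in range(b.shape[1]):
            s = np.float32(0)
            for k in range(a.shape[1]):
                s += a[i, k] * b[k, j]
            out[i, j] = s
    return out

def blocked_dot(a, b, blk=4):
    # same arithmetic, but the k-loop is split into blocks, so partial
    # sums combine in a different order -> different rounding
    out = np.zeros((a.shape[0], b.shape[1]), dtype='float32')
    for k0 in range(0, a.shape[1], blk):
        out += naive_dot(a[:, k0:k0+blk], b[k0:k0+blk])
    return out

np.random.seed(1)
a = np.random.randn(4, 64).astype('float32')
b = np.random.randn(64, 4).astype('float32')
print(np.max(np.abs(naive_dot(a, b) - blocked_dot(a, b))))  # often nonzero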
This looks like unavoidable numeric imprecision. As explained here, NumPy uses a highly-optimized, carefully-tuned BLAS method for matrix multiplication. This means that the sequence of operations (sums and products) followed to multiply two matrices likely changes when the size of the matrices changes.

To be clearer: we know that, mathematically, each element of the resulting matrix can be calculated as the dot product of two vectors (equal-length sequences of numbers). But this is not how NumPy calculates an element of the resulting matrix. In fact, there are more efficient but complex algorithms, like the Strassen algorithm, that obtain the same result without directly computing the row-column dot product.

When using such algorithms, even if the element C[i,j] of a resulting matrix C = AB is mathematically defined as the dot product of the i-th row of A with the j-th column of B, if you multiply a matrix A2 having the same i-th row as A with a matrix B2 having the same j-th column as B, the element C2[i,j] will actually be computed following a different sequence of operations (one that depends on the whole A2 and B2 matrices), possibly leading to different numerical errors.

That's why, even if mathematically C[i,j] = C2[i,j] (as in your CASE 1), the different sequence of operations followed by the algorithm (due to the change in matrix size) leads to different numerical errors. This also explains the slightly different results across environments, and the fact that, in some cases, for some environments, the numerical error might be absent.
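The core of this argument is that float32 addition is not associative, so reordering a mathematically identical sum can change the result; a minimal demonstration:

import numpy as np

x = np.float32(1e8)
y = np.float32(-1e8)
z = np.float32(0.5)

print((x + y) + z)  # 0.5
print(x + (y + z))  # 0.0 -- y + z rounds back to y, since 0.5 is below 1 ulp of 1e8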