
Python matrix multiplication with numpy.dot()

While getting acquainted with CUDA in Python (the numba library), I implemented several matrix multiplication methods:

  • Plain numpy.dot()
  • Strassen algorithm with numpy.dot()
  • Blocks method on GPU
  • Strassen algorithm on GPU

I tested them on two types of data:

  • numpy.random.randint(0, 5, (N, N)) # with int32 elements
  • numpy.random.random((N, N)) # with float64 elements
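The two benchmark inputs above can be generated and timed with a minimal sketch (the size N = 512, the repeat count, and the `time_dot` helper are illustrative choices, not from the original benchmarks):

```python
import numpy as np
import time

N = 512

# Integer matrices, as in the question (dtype is the platform
# default integer; int32 on some platforms).
a_int = np.random.randint(0, 5, (N, N))
b_int = np.random.randint(0, 5, (N, N))

# float64 matrices, as in the question.
a_f64 = np.random.random((N, N))
b_f64 = np.random.random((N, N))

def time_dot(a, b, repeats=3):
    """Best wall-clock time over several runs of np.dot(a, b)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.dot(a, b)
        best = min(best, time.perf_counter() - start)
    return best

print(f"integer: {time_dot(a_int, b_int):.4f} s")
print(f"float64: {time_dot(a_f64, b_f64):.4f} s")
```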

For int32 I obtained the expected result: my GPU algorithms outperformed the CPU with numpy. (benchmark plot: int32 results)

However, on float64 data, numpy.dot() outperformed all my GPU methods. (benchmark plot: float64 results)

So the question is: why is numpy.dot() so fast with float64 arrays, and does numpy use the GPU?

Mikhail asked Apr 29 '15 11:04


1 Answer

A typical installation of numpy will be dynamically linked against a BLAS library, which provides routines for matrix-matrix and matrix-vector multiplication. For example, when you use np.dot() on a pair of float64 arrays, numpy will call the BLAS dgemm routine in the background. Although these library functions run on the CPU rather than the GPU, they are often multithreaded, and are very finely tuned for performance. A good BLAS implementation, such as MKL or OpenBLAS, will probably be hard to beat in terms of performance, even on the GPU*.
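You can check which BLAS implementation your numpy build is linked against:

```python
import numpy as np

# Print build/link information, including the BLAS and LAPACK
# libraries (e.g. OpenBLAS or MKL) that np.dot dispatches to
# for floating-point arrays.
np.show_config()
```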

However, BLAS only supports floating point types. If you call np.dot() on integer arrays, numpy will fall back on using a very simple internal C++ implementation, which is single-threaded and much slower than a BLAS dot on two floating point arrays.
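One practical consequence: casting integer matrices to float64 before multiplying routes the work through BLAS, which is often much faster, and the result stays exact as long as every intermediate value fits in float64's 53-bit integer range. A minimal sketch (the matrix size is arbitrary):

```python
import numpy as np
import time

N = 1024
a = np.random.randint(0, 5, (N, N))
b = np.random.randint(0, 5, (N, N))

# Integer dot: numpy's simple single-threaded fallback.
t0 = time.perf_counter()
c_int = a.dot(b)
t_int = time.perf_counter() - t0

# Cast to float64 so the multiply goes through BLAS dgemm, then
# cast back. Exact here because every accumulated sum is at most
# 16 * N, far below float64's 53-bit integer limit.
t0 = time.perf_counter()
c_blas = a.astype(np.float64).dot(b.astype(np.float64)).astype(a.dtype)
t_blas = time.perf_counter() - t0

assert np.array_equal(c_int, c_blas)
print(f"integer fallback: {t_int:.4f} s, via float64/BLAS: {t_blas:.4f} s")
```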

Without knowing more about how you conducted those benchmarks, I would bet that a plain call to numpy.dot would also comfortably beat your other 3 methods for float32, complex64 and complex128 arrays, which are the other 3 types supported by BLAS.
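A quick way to see which dtypes hit a BLAS routine is to time np.dot across the four BLAS-supported types plus an integer type; the integer case is typically the outlier (the size here is illustrative):

```python
import numpy as np
import time

N = 512

# float32/float64/complex64/complex128 dispatch to BLAS
# (sgemm/dgemm/cgemm/zgemm); int64 falls back to numpy's
# much slower internal loop.
for dtype in (np.float32, np.float64, np.complex64, np.complex128, np.int64):
    a = np.ones((N, N), dtype=dtype)
    t0 = time.perf_counter()
    a.dot(a)
    elapsed = time.perf_counter() - t0
    print(f"{np.dtype(dtype).name:10s} {elapsed:.4f} s")
```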


* One possible way to beat standard BLAS would be to use cuBLAS, which is a BLAS implementation that will run on an NVIDIA GPU. The scikit-cuda library seems to provide Python bindings for it, although I've never used it myself.

ali_m answered Oct 07 '22 14:10