
Improve performance of converting numpy array to MATLAB double

Calling MATLAB from Python inevitably costs some performance, which I could avoid by rewriting (a lot of) code in Python. That isn't a realistic option for me, but it annoys me that a huge loss of efficiency lies in the simple conversion from a numpy array to a MATLAB double.

I'm talking about the following conversion from data1 to data1m:

import numpy as np
import matlab

data1 = np.random.uniform(low=0.0, high=30000.0, size=(1000000,))
data1m = matlab.double(list(data1))

Here matlab.double comes from MathWorks' own MATLAB package / engine. The matlab.double call takes 20 s on my system, which seems like too much for a conversion that does nothing more than make the numbers 'edible' for MATLAB.

So basically I'm looking for the reverse of the trick given here, which converts MATLAB output back to Python.

asked Jul 24 '17 15:07 by 5Ke



1 Answer

Passing numpy arrays efficiently

Take a look at the file mlarray_sequence.py in the folder PYTHONPATH\Lib\site-packages\matlab\_internal. That file constructs the MATLAB array object. The performance problem comes from the generic_flattening function, which copies the data element by element in Python loops.
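
To illustrate why element-by-element copying is costly, here is a minimal sketch that uses only the standard library. It is not the code from mlarray_sequence.py; copy_elementwise and copy_bulk are hypothetical names that contrast a Python-level loop with building an array.array in a single call.

```python
import array
import timeit


def copy_elementwise(values, typecode="d"):
    """Append one value at a time, paying Python-level overhead per element."""
    out = array.array(typecode)
    for v in values:
        out.append(v)
    return out


def copy_bulk(values, typecode="d"):
    """Hand the whole iterable to array.array in one call."""
    return array.array(typecode, values)


if __name__ == "__main__":
    data = [float(i) for i in range(200_000)]
    # Both functions produce the same array; only the construction differs.
    t_loop = timeit.timeit(lambda: copy_elementwise(data), number=5)
    t_bulk = timeit.timeit(lambda: copy_bulk(data), number=5)
    print(f"element-wise: {t_loop:.3f} s, bulk: {t_bulk:.3f} s")
```

The exact timings depend on the machine, but the bulk call avoids one interpreted loop iteration per element, which is the same idea the patch below relies on.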

To avoid this behavior we will edit the file a bit. This fix should work on complex and non-complex datatypes.

  1. Make a backup of the original file in case something goes wrong.

  2. Add import numpy as np to the other imports at the beginning of the file.

  3. In line 38 you should find:

    init_dims = _get_size(initializer)
    

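    Replace this with:

    try:
        # numpy arrays expose their dimensions directly
        init_dims = initializer.shape
    except AttributeError:
        # plain Python sequences fall back to the original size computation
        init_dims = _get_size(initializer)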
    
  4. In line 48 you should find:

    if is_complex:
        complex_array = flat(self, initializer,
                             init_dims, typecode)
        self._real = complex_array['real']
        self._imag = complex_array['imag']
    else:
        self._data = flat(self, initializer, init_dims, typecode)
    

    Replace this with:

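    if is_complex:
        try:
            # fast path: flatten in column-major order without a Python loop
            self._real = array.array(typecode, np.ravel(initializer, order='F').real)
            self._imag = array.array(typecode, np.ravel(initializer, order='F').imag)
        except Exception:
            # fallback: original element-wise flattening
            complex_array = flat(self, initializer, init_dims, typecode)
            self._real = complex_array['real']
            self._imag = complex_array['imag']
    else:
        try:
            # fast path: flatten in column-major order without a Python loop
            self._data = array.array(typecode, np.ravel(initializer, order='F'))
        except Exception:
            # fallback: original element-wise flattening
            self._data = flat(self, initializer, init_dims, typecode)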
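
The order='F' argument in the patch requests column-major flattening, which is the order MATLAB uses to store arrays. As a rough sketch of what that ordering means, here is the same idea with plain nested lists instead of numpy; flatten_column_major is a hypothetical helper for illustration, not part of the MATLAB package.

```python
import array


def flatten_column_major(rows, typecode="d"):
    """Flatten a rectangular 2-D nested list column by column."""
    # zip(*rows) yields the columns of the matrix as tuples
    return array.array(typecode, (v for col in zip(*rows) for v in col))


matrix = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
flat_values = flatten_column_major(matrix)
print(list(flat_values))  # [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
```

For a one-dimensional array such as data1 the ordering makes no difference, but for matrices it determines whether MATLAB sees the rows and columns where you expect them.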
    

Now you can pass a numpy array directly to the MATLAB array creation method.

import numpy as np
import matlab

data1 = np.random.uniform(low=0.0, high=30000.0, size=(1000000,))
# faster: pass the numpy array directly
data1m = matlab.double(data1)
# slower: convert to a Python list first
data1m = matlab.double(data1.tolist())

data2 = np.random.uniform(low=0.0, high=30000.0, size=(1000000,)).astype(np.complex128)
# faster: pass the complex numpy array directly
data2m = matlab.double(data2, is_complex=True)
# slower: convert to a Python list first
data2m = matlab.double(data2.tolist(), is_complex=True)

With this change, MATLAB array creation becomes about 15 times faster, and the interface is easier to use.

answered Sep 22 '22 01:09 by max9111