Calling MATLAB from Python is bound to incur some performance penalty that I could avoid by rewriting (a lot of) code in Python. That isn't a realistic option for me, but it bothers me that a huge loss of efficiency lies in the simple conversion from a NumPy array to a MATLAB double.
I'm talking about the following conversion from data1 to data1m, where
data1 = np.random.uniform(low=0.0, high=30000.0, size=(1000000,))
data1m = matlab.double(list(data1))
Here matlab.double comes from MathWorks' own MATLAB Engine API for Python. The second line of code takes 20 s on my system, which just seems like far too much for a conversion that does nothing more than make the numbers 'edible' for MATLAB.
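For reference, a minimal timing harness to reproduce the measurement (it assumes the MATLAB Engine API for Python is installed and importable; the harness itself is just illustrative):

import time
import numpy as np
import matlab

data1 = np.random.uniform(low=0.0, high=30000.0, size=(1000000,))

t0 = time.perf_counter()
data1m = matlab.double(list(data1))  # element-by-element conversion
print(f"matlab.double(list(...)) took {time.perf_counter() - t0:.1f} s")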
So basically I'm looking for a trick opposite to the one given here, which works for converting MATLAB output back to Python.
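(For reference, the linked trick for the MATLAB-to-Python direction typically looks like the sketch below. It relies on the mlarray internals _data, a flat array.array in column-major order, and size, the dimensions tuple; these are private attributes of the matlab package, so treat this as an assumption that may break between MATLAB releases.)

import numpy as np

def matlab_to_numpy(m):
    # m._data is the flat internal buffer, stored column-major ('F' order);
    # m.size holds the MATLAB dimensions as a tuple
    return np.array(m._data).reshape(m.size, order='F')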
Passing numpy arrays efficiently
Take a look at the file mlarray_sequence.py in the folder PYTHONPATH\Lib\site-packages\matlab\_internal. There you will find the construction of the MATLAB array object. The performance problem comes from copying data with loops within the generic_flattening function.
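You can see the cost of that loop-based copying in isolation. The comparison below is not the engine's exact code path, just a sketch of pushing values through Python objects one by one versus letting NumPy flatten the data in C:

import array
import timeit
import numpy as np

a = np.random.uniform(size=1000000)

# roughly what the stock code does: materialize Python floats, then iterate
t_loop = timeit.timeit(lambda: array.array('d', list(a)), number=3)

# what the patch below does: flatten in C, then initialize in one pass
t_ravel = timeit.timeit(lambda: array.array('d', np.ravel(a, order='F')), number=3)

print(t_loop / 3, t_ravel / 3)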
To avoid this behavior we will edit the file a bit. The fix works for both complex and non-complex datatypes. Make a backup of the original file in case something goes wrong.
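For example (the path is a placeholder; substitute your actual site-packages location):

import shutil

# placeholder path: replace PYTHONPATH with your Python installation directory
src = r"PYTHONPATH\Lib\site-packages\matlab\_internal\mlarray_sequence.py"
shutil.copy2(src, src + ".bak")  # keeps an untouched copy to restore from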
Add import numpy as np to the other imports at the beginning of the file.
In line 38 you should find:

init_dims = _get_size(initializer)

Replace this with:

try:
    # NumPy arrays expose their dimensions directly
    init_dims = initializer.shape
except AttributeError:
    # fall back to the original size detection for lists and other sequences
    init_dims = _get_size(initializer)
In line 48 you should find:

if is_complex:
    complex_array = flat(self, initializer, init_dims, typecode)
    self._real = complex_array['real']
    self._imag = complex_array['imag']
else:
    self._data = flat(self, initializer, init_dims, typecode)
Replace this with:

if is_complex:
    try:
        # fast path: flatten the NumPy array in column-major (MATLAB) order
        self._real = array.array(typecode, np.ravel(initializer, order='F').real)
        self._imag = array.array(typecode, np.ravel(initializer, order='F').imag)
    except Exception:
        # fall back to the original loop-based flattening for non-NumPy input
        complex_array = flat(self, initializer, init_dims, typecode)
        self._real = complex_array['real']
        self._imag = complex_array['imag']
else:
    try:
        self._data = array.array(typecode, np.ravel(initializer, order='F'))
    except Exception:
        self._data = flat(self, initializer, init_dims, typecode)
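The order='F' matters because MATLAB stores arrays column-major, while NumPy defaults to row-major. A quick check of the difference:

import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(np.ravel(m, order='C'))  # [1 2 3 4 5 6] -- NumPy's default row-major order
print(np.ravel(m, order='F'))  # [1 4 2 5 3 6] -- column-major, as MATLAB expects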
Now you can pass a numpy array directly to the MATLAB array creation method.
data1 = np.random.uniform(low=0.0, high=30000.0, size=(1000000,))

# faster
data1m = matlab.double(data1)
# or the slower method
data1m = matlab.double(data1.tolist())

data2 = np.random.uniform(low=0.0, high=30000.0, size=(1000000,)).astype(np.complex128)

# faster
data2m = matlab.double(data2, is_complex=True)
# or the slower method
data2m = matlab.double(data2.tolist(), is_complex=True)
The performance of MATLAB array creation increases by a factor of 15, and the interface is now easier to use.
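To check the speedup on your own machine, a simple benchmark along these lines should do (the numbers will vary, and matlab.double(data1) only works once the patch is applied):

import timeit
import numpy as np
import matlab

data1 = np.random.uniform(low=0.0, high=30000.0, size=(1000000,))

t_fast = timeit.timeit(lambda: matlab.double(data1), number=3)
t_slow = timeit.timeit(lambda: matlab.double(data1.tolist()), number=3)
print(f"patched path: {t_fast / 3:.3f} s/call, list path: {t_slow / 3:.3f} s/call")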