I am currently working on a project where I need to do some processing steps with legacy Matlab code (using the Matlab engine) and the rest in Python (numpy).
I noticed that converting the results from Matlab's matlab.mlarray.double to numpy's numpy.ndarray seems horribly slow.
Here is some example code for creating an ndarray with 1000 elements from another ndarray, a list and an mlarray:
import timeit
setup_range = ("import numpy as np\n"
"x = range(1000)")
setup_arange = ("import numpy as np\n"
"x = np.arange(1000)")
setup_matlab = ("import numpy as np\n"
"import matlab.engine\n"
"eng = matlab.engine.start_matlab()\n"
"x = eng.linspace(0., 1000.-1., 1000.)")
print 'From other array'
print timeit.timeit('np.array(x)', setup=setup_arange, number=1000)
print 'From list'
print timeit.timeit('np.array(x)', setup=setup_range, number=1000)
print 'From matlab'
print timeit.timeit('np.array(x)', setup=setup_matlab, number=1000)
Which takes the following times:
From other array
0.00150722111994
From list
0.0705359556928
From matlab
7.0873282467
The conversion takes about 100 times as long as a conversion from list.
Is there any way to speed up the conversion?
Moments after posting the question I found the solution.
For one-dimensional arrays, access only the _data property of the Matlab array.
import timeit
print 'From list'
print timeit.timeit('np.array(x)', setup=setup_range, number=1000)
print 'From matlab'
print timeit.timeit('np.array(x)', setup=setup_matlab, number=1000)
print 'From matlab_data'
print timeit.timeit('np.array(x._data)', setup=setup_matlab, number=1000)
prints
From list
0.0719847538787
From matlab
7.12802865169
From matlab_data
0.118476275533
For multi-dimensional arrays you need to reshape the array afterwards. In the case of two-dimensional arrays this means calling
np.array(x._data).reshape(x.size[::-1]).T
Tim's answer is great for 2D arrays, but a way to adapt it to N-dimensional arrays is to use the order parameter of np.reshape():
np_x = np.array(x._data).reshape(x.size, order='F')
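To check that the two reshape variants agree without starting a Matlab engine, here is a minimal sketch. FakeMlArray is a hypothetical stand-in, not part of the Matlab engine API. It assumes only what the answers above rely on: a flat, column-major _data buffer and a size tuple. The helper name mlarray_to_ndarray is also made up for illustration. Note that _data is a private attribute, so its type or presence may differ between Matlab releases.

```python
from array import array

import numpy as np


class FakeMlArray(object):
    """Hypothetical stand-in for a Matlab array: flat column-major _data plus size."""

    def __init__(self, data, size):
        self._data = array('d', data)  # assumption: a flat buffer of doubles
        self.size = tuple(size)


def mlarray_to_ndarray(x):
    """Copy the flat buffer and reshape it in column-major (Fortran) order."""
    return np.array(x._data).reshape(x.size, order='F')


# Matlab's [1 2 3; 4 5 6] is stored column by column: 1 4 2 5 3 6
x = FakeMlArray([1, 4, 2, 5, 3, 6], (2, 3))

a = mlarray_to_ndarray(x)                       # N-dimensional variant
b = np.array(x._data).reshape(x.size[::-1]).T   # 2D-only variant

print(a)
print(np.array_equal(a, b))
```

With a real engine result, the same helper would be called as mlarray_to_ndarray(x) on the object returned by eng, provided that object exposes _data and size as assumed here.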