
Reason for huge performance penalty for >4D arrays in MATLAB?

Introduction

I have an algorithm that loops billions (or even trillions) of times and manipulates a matrix stored in 7 dimensions [10x10x10x10x10x10x10]. I found that accessing elements of this 7-dimensional matrix was quite slow, so, curious as I am, I ran some tests to measure the performance of accessing elements of multi-dimensional matrices.

Hypothesis

I was reminded that MATLAB uses linear indexing under the hood, and a friend of mine suggested that the performance penalty was probably due to the conversion from 'normal' (subscript) indexing to linear indexing under the hood (Source).
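
As a quick illustration of what that conversion refers to (this is just MATLAB's documented column-major, 1-based storage model, not a claim about its internals):

A = rand(10,10);
isequal(A(3,2), A(13))   % true: subscript (3,2) sits at linear position 13
sub2ind(size(A), 3, 2)   % ans = 13, i.e. the subscript-to-linear conversion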

Testing method

To test this hypothesis I timed element access using linear indexing and normal (subscript) indexing for 2D to 7D matrices. I varied both the element being accessed and the matrix size (i.e. the length of each dimension), but that did not change the results significantly. The script I used for testing is included below. The hardware used is an Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70 GHz with 16 GB RAM; the MATLAB version is R2013b.

Results

Normal indexing 2D: 0.073659s
Linear indexing 2D: 0.064026s
Normal indexing 3D: 0.050719s
Linear indexing 3D: 0.064096s
Normal indexing 4D: 0.055674s
Linear indexing 4D: 0.062112s
Normal indexing 5D: 15.689907s
Linear indexing 5D: 5.265076s
Normal indexing 6D: 16.660496s
Linear indexing 6D: 5.295958s
Normal indexing 7D: 17.029072s
Linear indexing 7D: 5.291139s

It appears that normal indexing performs very similarly to linear indexing up to 4D matrices; for 3D and 4D matrices normal indexing even appears slightly preferable. Above 4D, linear indexing is clearly preferable to normal indexing, but both approaches take a huge performance hit (roughly two orders of magnitude).

Conclusion

The moral of the story: carefully consider whether you really need more than four dimensions in a matrix when aiming for high performance in MATLAB (apart from the obvious fact that, e.g., C++ is much faster than MATLAB for most applications, but that is another discussion).
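
One workaround sketch (illustrative only, and not something I measured): do the subscript-to-linear conversion once outside the hot loop and index a flattened view of the data; whether this actually recovers the low-dimensional timings would need the same kind of measurement as below.

% Illustrative workaround: convert the subscripts once, index a 1-D view in the loop
F  = rand(10,10,10,10,10,10,10);
Fv = F(:);                               % same data, viewed as a vector
k  = sub2ind(size(F), 3,2,1,10,1,2,3);   % one-off conversion
x  = Fv(k);                              % equivalent to F(3,2,1,10,1,2,3)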

Question(s)

As in the title: what is the (potential) cause for this huge hit in performance when exceeding four dimensions in a matrix? And do other languages (such as C++) demonstrate this behaviour as well?

Code used for testing

clear all
clc

% Number of iterations
n=10000000;

% 2D: subscript access A(3,2) vs. precomputed linear index k1
A = rand(10,10);
k1=sub2ind(size(A),3,2);
fprintf('\n')
tic
for ii=1:n
    A1=A(3,2);
end
a=toc;
fprintf('Normal indexing 2D: %fs\n',a)


tic
for ii=1:n
    A2=A(k1);
end
a=toc;
fprintf('Linear indexing 2D: %fs\n',a)

% 3D
B = rand(10,10,10);
k2=sub2ind(size(B),3,2,1);

tic
for ii=1:n
    B1=B(3,2,1);
end
a=toc;
fprintf('Normal indexing 3D: %fs\n',a)
tic
for ii=1:n
    B2=B(k2);
end
a=toc;
fprintf('Linear indexing 3D: %fs\n',a)

% 4D
C = rand(10,10,10,10);
k3=sub2ind(size(C),3,2,1,10);

tic
for ii=1:n
    C1=C(3,2,1,10);
end
a=toc;
fprintf('Normal indexing 4D: %fs\n',a)
tic
for ii=1:n
    C2=C(k3);
end
a=toc;
fprintf('Linear indexing 4D: %fs\n',a)

% 5D
D = rand(10,10,10,10,10);
k4=sub2ind(size(D),3,2,1,10,1);

tic
for ii=1:n
    D1=D(3,2,1,10,1);
end
a=toc;
fprintf('Normal indexing 5D: %fs\n',a)
tic
for ii=1:n
    D2=D(k4);
end
a=toc;
fprintf('Linear indexing 5D: %fs\n',a)

% 6D
E = rand(10,10,10,10,10,10);
k5=sub2ind(size(E),3,2,1,10,1,2);

tic
for ii=1:n
    E1=E(3,2,1,10,1,2);
end
a=toc;
fprintf('Normal indexing 6D: %fs\n',a)
tic
for ii=1:n
    E2=E(k5);
end
a=toc;
fprintf('Linear indexing 6D: %fs\n',a)

% 7D
F = rand(10,10,10,10,10,10,10);
k6=sub2ind(size(F),3,2,1,10,1,2,3);

tic
for ii=1:n
    F1=F(3,2,1,10,1,2,3);
end
a=toc;
fprintf('Normal indexing 7D: %fs\n',a)
tic
for ii=1:n
    F2=F(k6);
end
a=toc;
fprintf('Linear indexing 7D: %fs\n',a)
asked Jul 25 '14 by EJG89


1 Answer

At some point there must be a linear address for each value, because that is how most processors work: they are built to fetch data from a single location.

Throughout this answer, the terms matrix and array will be used interchangeably.

Single Dimension Matrix
The address calculation for a single dimension array:

  address = starting_array_address + (index * sizeof(object_type));

The object_type is the type of the object, such as unsigned char or int.

2 Dimensional Matrix
The calculation for the index position of a value in a 2D array represented as a 1D array is:

  index = Dimension2_value * Dimension1_capacity + Dimension1_value;

The address of the value can be calculated using the index and the equation for the Single Dimensional object.
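
A quick MATLAB check of the same arithmetic (MATLAB subscripts are 1-based, so every term except the first gets a -1), using the 10x10 case from the question:

A   = rand(10,10);
d1  = size(A,1);                       % Dimension1_capacity
idx = 3 + (2 - 1) * d1;                % hand-rolled index for subscript (3,2), i.e. 13
isequal(idx, sub2ind(size(A), 3, 2))   % true
isequal(A(idx), A(3,2))                % true: both reach the same element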

3 Dimensional Matrix
The Dimension3_value can be thought of as selecting which plane of the cube the value sits in, so each step along it skips an entire plane of Dimension1_capacity * Dimension2_capacity elements.
So, this extends the equation by another term:

index = Dimension3_value * (Dimension2_capacity * Dimension1_capacity)
      + Dimension2_value * Dimension1_capacity
      + Dimension1_value;
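
The same check for the 3D formula, with the 10x10x10 case and subscripts (3,2,1) from the test script:

B   = rand(10,10,10);
[d1, d2, ~] = size(B);
idx = 3 + (2 - 1) * d1 + (1 - 1) * d1 * d2;   % = 13, since the third subscript is 1
isequal(idx, sub2ind(size(B), 3, 2, 1))       % true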

4D & above Matrices
The calculation for 4D and above matrices can be extended in a similar manner: each additional dimension contributes one more term, multiplied by the product of the capacities of all lower dimensions.

The bottleneck in the index calculation is the multiplications: one per additional dimension.

Some platforms can calculate the terms in parallel, thus giving some speed advantages.
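
To make "extended in a similar manner" concrete, here is a minimal MATLAB sketch of the general column-major, 1-based conversion (my_sub2ind is a made-up helper name, and this is not MATLAB's actual implementation of the built-in sub2ind): one extra multiplication and addition per additional dimension.

function idx = my_sub2ind(sz, varargin)
% Hypothetical helper: subscripts -> linear index for a column-major array.
    subs   = [varargin{:}];            % one subscript per dimension
    idx    = subs(1);
    stride = 1;
    for d = 2:numel(subs)
        stride = stride * sz(d-1);     % elements skipped per step along dimension d
        idx    = idx + (subs(d) - 1) * stride;
    end
end

% e.g. my_sub2ind(size(F), 3,2,1,10,1,2,3) == sub2ind(size(F), 3,2,1,10,1,2,3)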

Speed speculations
I guess that Matlab has optimized its index calculations for 2D and 3D matrices but not larger ones. The 2D and 3D matrices are more widely used than 4D and larger.

A typical optimization would be to compute each term in a separate register and then sum the registers, keeping the values as close to the processor as possible. Some processors may not have enough registers for the 4D-and-above calculation, so the compiler may have to spill values to temporary memory (outside the processor), slowing down performance.

Fetching the value
For 1D, 2D and 3D matrices, the processor may be able to keep the data in its data cache, speeding up access. As the number of dimensions grows, the matrix must shrink for it to still fit in the processor's data cache. Any access to a value that is not in the cache is a cache miss, and the processor must then load the value from memory external to the processor. Depending on how the data is reloaded, it may fetch a "strip" of data (a cache line) or much more. Either way, performance drops.

Summary
Calculating the index takes more processing time for each additional dimension of a matrix, and the multiplications are the bottleneck of that calculation (even if each is a single instruction, a multiplication takes longer than an addition). Although the terms of the index calculation could in principle be computed in parallel, there may not be enough processing units to handle all the dimensions at once. There may also not be enough registers or data cache for the optimal calculation, forcing the processor to fetch from external memory. And if the data item itself is not in the processor's data cache, the processor wastes time reloading all or part of the cache, time it could otherwise spend on calculations.

In general, operating on smaller sub-matrices (slices of the full array) usually results in better performance than operating on a large multi-dimensional matrix.
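
A rough illustration of that suggestion (hypothetical usage, assuming the trailing subscripts stay fixed across the inner loop): extract the low-dimensional slice once, then index only the slice.

F     = rand(10,10,10,10,10,10,10);
plane = F(:,:,1,10,1,2,3);   % 10x10 slice; the last five subscripts are fixed
x     = plane(3,2);          % same value as F(3,2,1,10,1,2,3)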

answered by Thomas Matthews