Vectorization for meshgrid in Matlab (or Octave)

Question

Vectorized code in Matlab runs much faster than a for loop (see Parallel computing in Octave on a single machine -- package and example for concrete results in Octave)

With that said, is there a way to vectorize the code shown next in Matlab or Octave?

x = -2:0.01:2;
y = -2:0.01:2;
[xx,yy] = meshgrid(x,y);
z = sin(xx.^2-yy.^2);

Edric · Accepted Answer

As pointed out by @Jonas, there are a few options available in MATLAB, and which works best depends on a few factors such as:

How large is your problem
How many machines you have available
Do you have a GPU
Does MATLAB already multithread the operations

Many elementwise operations are multithreaded in MATLAB now - in which case, there's generally little point using PARFOR (unless you have multiple machines and MATLAB Distributed Computing Server licences available).

Truly huge problems that need the memory of multiple machines can benefit from distributed arrays.

Using the GPU can beat the multithreaded performance of a single machine if your problem is of a suitable size and type for GPU computation. Vectorized code tends to be the most natural fit for parallelization via the GPU. For example, you could write your code using gpuArrays from Parallel Computing Toolbox like so and have everything run on the GPU.

x = parallel.gpu.GPUArray.colon(-2,0.01,2);
y = x;
[xx,yy] = meshgrid(x,y); % xx and yy are on the GPU
z = arrayfun( @(u, v) sin(u.*u-v.*v), xx, yy );

I converted the final line there into an arrayfun call as that is more efficient when using gpuArrays.

Divakar · Answer

Vectorization for `meshgrid` and `ndgrid`

If you are still interested in finding a vectorized implementation to make the meshgrid based code in the problem faster, let me suggest a vectorized method with bsxfun and it's GPU ported version. I strongly believe that people must look into vectorization with GPUs as a promising option to speed up MATLAB codes. Codes that employ meshgrid or ndgrid and whose outputs are to be operated with some elementwise operation setup a perfect ground to employ bsxfun into those codes. To add to that, the use of GPU with bsxfun, that lets it work on the elements independently with hundreds and thousands of CUDA cores available, makes it just perfect for GPU implementation.

For your specific problem, the inputs were -

x = -2:0.01:2;
y = -2:0.01:2;

Next, you had -

[xx,yy] = meshgrid(x,y);
z = sin(xx.^2-yy.^2);

With bsxfun, this becomes a one-liner -

z = sin(bsxfun(@minus,x.^2,y.^2.'));

Benchmarking

GPU benchmarking tips were taken from Measure and Improve GPU Performance.

%// Warm up GPU call with insignificant small scalar inputs
temp1 = sin_sqdiff_vect2(0,0);

N_arr = [50 100 200 500 1000 2000 3000]; %// array elements for N (datasize)
timeall = zeros(3,numel(N_arr));

for k = 1:numel(N_arr)
    N = N_arr(k);
    x = linspace(-20,20,N);
    y = linspace(-20,20,N);

    f = @() sin_sqdiff_org(x,y);%// Original CPU code
    timeall(1,k) = timeit(f);
    clear f

    f = @() sin_sqdiff_vect1(x,y);%// Vectorized CPU code
    timeall(2,k) = timeit(f);
    clear f

    f = @() sin_sqdiff_vect2(x,y);%// Vectorized GPU(GTX 750Ti) code
    timeall(3,k) = gputimeit(f);
    clear f
end

%// Display benchmark results
figure,hold on, grid on
plot(N_arr,timeall(1,:),'-b.')
plot(N_arr,timeall(2,:),'-ro')
plot(N_arr,timeall(3,:),'-kx')
legend('Original CPU','Vectorized CPU','Vectorized GPU (GTX 750 Ti)')
xlabel('Datasize (N) ->'),ylabel('Time(sec) ->')

Associated functions

%// Original code
function z = sin_sqdiff_org(x,y)
[xx,yy] = meshgrid(x,y);
z = sin(xx.^2-yy.^2);
return;

%// Vectorized CPU code
function z = sin_sqdiff_vect1(x,y)
z = sin(bsxfun(@minus,x.^2,y.^2.')); %//'
return;

%// Vectorized GPU code
function z = sin_sqdiff_vect2(x,y)
gx = gpuArray(x);
gy = gpuArray(y);
gz = sin(bsxfun(@minus,gx.^2,gy.^2.')); %//'
z =  gather(gz);
return;

Results

enter image description here

Conclusions

As the results show, vectorized method with GPU shows good performance improvement which is about 4.3x against the vectorized CPU code and 6x against the original code. Please keep in mind that GPU has to overcome a minimum overhead that is required with it's setting up, so at least a decent sized input is needed to see the improvement. Hopefully, people would explore more of vectorization with GPUs, as it could not be stressed enough!

Jonas · Answer

In Matlab, the only way to get built-in vectorized functions to multithread is to wait for MathWorks to implement them as such.

Alternatively, you can write the vectorized computation as a loop, and run them in parallel using parfor.

Finally, a number of functions are GPU-enabled, so with access to the parallel processing toolbox you can parallelize these operations, including the subtraction and the element-wise power.

Vectorization for meshgrid in Matlab (or Octave)

Tags:

vectorization

parallel-processing

matlab

octave

db1234

3 Answers

Edric

Vectorization for `meshgrid` and `ndgrid`

Benchmarking

Results

Conclusions

Divakar

Jonas

Recent Activity

Donate For Us

Vectorization for meshgrid in Matlab (or Octave)

Tags:

vectorization

parallel-processing

matlab

octave

db1234

3 Answers

Edric

Vectorization for meshgrid and ndgrid

Benchmarking

Results

Conclusions

Divakar

Jonas

Related questions

Recent Activity

Donate For Us

Vectorization for `meshgrid` and `ndgrid`