
Passing arrays, without overhead (preferably "by reference"), to avoid duplicating complex code blocks, in matlab?

Tags:

matlab

I have complex code blocks, in a Matlab script, that act on large, non-sparse arrays. The code performs many write operations to random elements in the arrays, as well as read operations. The identical code must execute against different (large) arrays (i.e., the same code blocks, except for different array variable names).

I do not want to have long, duplicated code blocks that differ only in the array names.

Unfortunately, when I create a function to perform the operations, so that the code block appears only once, the performance slows down by a factor of 10 or more (presumably due to the copying of the array). However, I do not need the array copied. I would prefer to "pass by reference", so that the purpose of the function call is ONLY to avoid having duplicated code blocks. There seems to be no way to avoid the copy-on-write semantics, however.

Also, it is impossible (so far as I understand) to create a script (not a function) to achieve this, because the script must use the same variable names as the calling script. I would therefore need a different script for every array on which I wish to run the code, which gains nothing (I would still have duplicated code blocks).

I have looked into creating an alias variable name to "substitute" for the array variable name of interest, in which case I could call a script and avoid duplicated code. However, I cannot find any way to create an alias in Matlab.

Finally, I have tried writing a function that uses evalin() and passing it the name of the array variable as a string. This works, but it is also vastly slower: about the same as passing the arrays by value to a function (a slowdown of at least a factor of 10).

I am coming to the conclusion that it is impossible in Matlab to avoid duplicating code blocks when performing complex operations on non-sparse arrays, because every technique for avoiding the duplication seems to carry this ghastly overhead.

I find this hard to believe, but I cannot find a way around it.

Does anybody know of a way to avoid duplicated code blocks when performing identical intricate operations on multiple non-sparse arrays in Matlab?

Dan Nissenbaum asked Oct 25 '12 22:10


3 Answers

As noted by Loren on her blog, MATLAB does support in-place operations on matrices, which essentially covers passing arrays by reference, modifying them in a function, and returning the result. You seem to know that, but you erroneously conclude that it cannot help you, because a script must contain the same variable names as the calling script. Here is a code example that shows this is wrong. When testing, please copy it verbatim and save it as a function:

function inplace_test
y = zeros(1,1e8);
x = zeros(1,1e8);

tic; x = compute(x); toc
tic; y = compute(y); toc
tic; x = computeIP(x); toc
tic; y = computeIP(y); toc
tic; x = x+1; toc
end

function x=computeIP(x)
x = x+1;
end

function y=compute(x)
y = x+1;
end

Time results on my computer:

Elapsed time is 0.243335 seconds.
Elapsed time is 0.251495 seconds.
Elapsed time is 0.090949 seconds.
Elapsed time is 0.088894 seconds.
Elapsed time is 0.090638 seconds.

As you can see, the two calls that use the in-place function (computeIP) are equally fast for both input arrays x and y. They are also as fast as running x = x+1 without a function. The only important thing is that, inside the function, the input and output parameters have the same name. And there is one more thing...

If I had to guess what is wrong with your code, I'd say you wrote nested functions and expected them to operate in place. They do not. So the code below will not run in place:

function inplace_test
y = zeros(1,1e8);
x = zeros(1,1e8);

tic; x = compute(x); toc
tic; y = compute(y); toc
tic; x = computeIP(x); toc
tic; y = computeIP(y); toc
tic; x = x+1; toc

    function x=computeIP(x)
        x = x+1;
    end

    function y=compute(x)
        y = x+1;
    end
end

Elapsed time is 0.247798 seconds.
Elapsed time is 0.257521 seconds.
Elapsed time is 0.229774 seconds.
Elapsed time is 0.237215 seconds.
Elapsed time is 0.090446 seconds.

The bottom line: be careful with those nested functions.
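
Applied to the kind of scattered writes described in the question, the same pattern might look like the sketch below. The names inplaceScatterDemo and addAt are hypothetical and only serve as illustration. The helper is a top-level local function (not a nested one), it uses the same name for its input and output, and every call assigns the result back to the variable that was passed in. Whether MATLAB actually avoids the copy can depend on the version, so timing it with tic/toc as above is the way to check.

```matlab
function total = inplaceScatterDemo()
% Hypothetical demo; all names here are illustrative only.
% Save as inplaceScatterDemo.m so that the calls are made from a function.
a = zeros(1, 1e6);
b = zeros(1, 1e6);
idx = [1 10 100];          % positions to update

% Same variable on both sides of each call, as in x = computeIP(x).
a = addAt(a, idx, 1);
b = addAt(b, idx, 2);

total = sum(a) + sum(b);   % three elements set to 1 plus three set to 2
end

function A = addAt(A, idx, value)
% Input and output share the name A, following the computeIP pattern.
A(idx) = A(idx) + value;
end
```

The shared code block lives once in addAt, and the caller chooses which array it acts on.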

angainor answered Nov 02 '22 15:11


You could put all of your arrays into a single cell array and index into it, instead of referring to the arrays by name. A function will still copy the arrays, but a script can do the job.
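
A minimal sketch of this idea, assuming the arrays can be collected in one cell array (the names arrays, k, n, and idx are made up for the example):

```matlab
% Hypothetical sketch: all variable names are illustrative only.
arrays = {zeros(1000), zeros(500), zeros(200, 300)};

for k = 1:numel(arrays)
    % The shared code block appears once and refers to arrays{k}
    % instead of a specific variable name.
    n = numel(arrays{k});
    idx = randi(n, 1, 10);                 % random element positions
    arrays{k}(idx) = arrays{k}(idx) + 1;   % read and write through the cell index
end
```

The loop body is written only once, and the index k selects which array it operates on. It is worth timing this against the duplicated version, since the cost of indexing through the cell array may vary.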

VBel answered Nov 02 '22 16:11


The handle solution suggested by Brian L does work, although the first call that modifies the wrapped data takes a long time (because it has to make a copy of the original data).

Try this:

SomeData.m

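classdef SomeData < handle
    % Handle class wrapping an array so that it is passed by reference.
    properties
        X   % the wrapped data
    end
    methods
        function obj = SomeData(x)
            % Store x if given; otherwise start with an empty array.
            if nargin > 0
                obj.X = x;
            else
                obj.X = [];
            end
        end
    end
end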

LargeOp.m

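function directArray = LargeOp( someData, directArray )
    if nargin > 1
        % Array passed by value: write to it and return it.
        directArray(1,1) = rand(1);
    else
        % Only the handle object was passed: write to the wrapped array.
        someData.X(1,1) = rand(1);
        directArray = [];
    end
end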

Script to test performance

large = zeros(10000,10000);

data = SomeData(large);

tic
LargeOp(data);
toc

tic
large = LargeOp(data,large);
toc

tic
LargeOp(data);
toc

tic
large = LargeOp(data,large);
toc

Results

Elapsed time is 0.364589 seconds.
Elapsed time is 0.450668 seconds.
Elapsed time is 0.001073 seconds.
Elapsed time is 0.443150 seconds.

grantnz answered Nov 02 '22 17:11