I am trying to create a piece of parallel code to speed up the processing of a very large (couple of hundred million rows) array. In order to parallelise this, I chopped my data into 8 (my number of cores) pieces and tried sending each worker 1 piece. Looking at my RAM usage, however, it seems each piece is sent to each worker, effectively multiplying my RAM usage by 8. A minimal working example:
A = 1:16;
data = cell(1,8); % preallocate one cell per piece
for ii = 1:8
    data{ii} = A(2*ii-1:2*ii);
end
Now, when I send this data to workers using parfor, it seems to send the full cell instead of just the desired piece:
output = cell(1,8);
parfor ii = 1:8
    output{ii} = data{ii};
end
I actually use some function within the parfor loop, but this illustrates the case. Does MATLAB actually send the full cell data to each worker, and if so, how can I make it send only the desired piece?
In my personal experience, parfeval is better than parfor regarding memory usage. In addition, your problem seems easy to break into smaller pieces, so you can use parfeval to submit a larger number of smaller jobs to the MATLAB workers.
Let's say that you have workerCnt MATLAB workers to which you are going to hand jobCnt jobs. Let data be a cell array of size jobCnt x 1, where each element is the input for one call to a function getOutput that does the analysis on the data. The results are then stored in a cell array output of size jobCnt x 1.
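As a minimal sketch of how such a data cell array could be built, the snippet below splits a column vector into jobCnt nearly equal chunks with mat2cell. The vector A, the value of jobCnt, and the name chunkSizes are assumptions for illustration only; the real array and a sensible number of jobs will differ.

```matlab
A = (1:16)';                 % stand-in for the large array (assumption)
jobCnt = 4;                  % number of jobs to create (assumption)

% Chunk boundaries: jobCnt contiguous blocks that together cover every row of A.
edges = round(linspace(0, numel(A), jobCnt + 1));
chunkSizes = diff(edges);    % rows per chunk, here [4 4 4 4]

% data is a jobCnt x 1 cell array; data{k} holds the rows for job k.
data = mat2cell(A, chunkSizes, 1);
```

Choosing jobCnt larger than the number of workers keeps each submitted piece small, at the cost of a little more scheduling overhead.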
In the following code, jobs are assigned in the first for loop and the results are retrieved in the second while loop. The boolean variable doneJobs indicates which jobs are done.
poolObj = parpool(workerCnt);
jobCnt = length(data); % number of jobs
output = cell(jobCnt,1);
for jobNo = 1:jobCnt
    future(jobNo) = parfeval(poolObj,@getOutput,...
        nargout('getOutput'),data{jobNo});
end
doneJobs = false(jobCnt,1);
while ~all(doneJobs)
    [idx,result] = fetchnext(future);
    output{idx} = result;
    doneJobs(idx) = true;
end
You can take this approach one step further to save more memory: after fetching the results of a finished job, delete the corresponding element of future. This object stores all the input and output data of the getOutput function, which is probably going to be huge. Be careful, though: deleting elements of future shifts the indices of the remaining elements.
The following is the code I wrote for this purpose.
poolObj = parpool(workerCnt);
jobCnt = length(data); % number of jobs
output = cell(jobCnt,1);
for jobNo = 1:jobCnt
    future(jobNo) = parfeval(poolObj,@getOutput,...
        nargout('getOutput'),data{jobNo});
end
doneJobs = false(jobCnt,1);
while ~all(doneJobs)
    [idx,result] = fetchnext(future);
    future(idx) = []; % remove the done future object
    oldIdx = 0;
    % find the index offset and correct index accordingly
    while oldIdx ~= idx
        doneJobsInIdxRange = sum(doneJobs((oldIdx + 1):idx));
        oldIdx = idx;
        idx = idx + doneJobsInIdxRange;
    end
    output{idx} = result;
    doneJobs(idx) = true;
end
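The index correction in the inner while loop could probably also be written as a single lookup. The sketch below is not part of the code above and runs on made-up values rather than a real pool: doneJobs and idx are filled in by hand, and the names remaining and jobNo are introduced only for illustration.

```matlab
% Pretend jobs 1 and 3 were already fetched and removed from future,
% so the shrunken future array now holds jobs 2, 4 and 5 (made-up state).
doneJobs = logical([1 0 1 0 0]);

% Original job numbers of the futures that are still in the array.
remaining = find(~doneJobs);

% Suppose fetchnext reports position 2 in the shrunken array.
idx = 2;

% Map the position back to the original job number.
jobNo = remaining(idx);
```

Inside the loop, jobNo would then take the place of the corrected idx when writing to output and doneJobs.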
The comment from @m.s is correct: when parfor slices an array, each worker is sent only the slice necessary for the loop iterations that it is working on. However, you might well see the RAM usage increase beyond what you originally expect, because copies of the data are unfortunately required as it is passed from the client to the workers via the parfor communication mechanism.
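To make the slicing behaviour concrete, here is a small sketch based on the example from the question. In the first loop data is indexed only by the loop variable, which, as far as I understand the parfor variable classification rules, makes it a sliced variable. In the second loop it is indexed through a lookup vector, which makes it a broadcast variable that is sent in full to every worker. The vector perm and the use of sum are assumptions added for illustration.

```matlab
A = 1:16;
data = cell(1, 8);
for ii = 1:8
    data{ii} = A(2*ii-1:2*ii);
end

% Sliced: data is indexed by the loop variable alone, so each worker
% receives only the cells for the iterations it runs.
slicedOut = cell(1, 8);
parfor ii = 1:8
    slicedOut{ii} = sum(data{ii});
end

% Broadcast: indexing through perm (a hypothetical lookup vector) means
% the whole of data is sent to every worker.
perm = 8:-1:1;
broadcastOut = cell(1, 8);
parfor ii = 1:8
    broadcastOut{ii} = sum(data{perm(ii)});
end
```

Both loops compute the same kind of result; only the way data is indexed, and therefore how much of it travels to each worker, differs.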
If you need the data only on the workers, then the best solution is to create/load/access it only on the workers if possible. It sounds like you're after data parallelism rather than task parallelism, for which spmd is indeed a better fit (as @Kostas suggests).
I would suggest using the spmd command of MATLAB. You can write code almost as you would for a non-parallel implementation, and you also have access to the index of the current worker through the labindex "system" variable.
Have a look here:
http://www.mathworks.com/help/distcomp/spmd.html
And also at this SO question about spmd vs parfor: SPMD vs. Parfor
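A minimal sketch of the spmd idea, assuming the data can be generated (or loaded) on the workers so that nothing large is sent from the client. N, edges, piece and partialSum are names made up for this example, and sum stands in for the real processing. Note that, as far as I know, newer MATLAB releases offer spmdIndex and spmdSize as replacements for labindex and numlabs.

```matlab
N = 16;  % total number of elements (assumption; stands in for the real size)

spmd
    % Split 1:N into numlabs contiguous chunks; this worker takes chunk labindex.
    edges = round(linspace(0, N, numlabs + 1));
    piece = (edges(labindex) + 1):edges(labindex + 1);  % exists on this worker only
    partialSum = sum(piece);                            % placeholder for the real work
end

% On the client, partialSum is a Composite with one entry per worker.
total = 0;
for w = 1:length(partialSum)
    total = total + partialSum{w};
end
```

Only the small partialSum values travel back to the client; each piece stays on the worker that created it.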