I am doing parallel computations with MATALB parfor
. The code structure looks pretty much like
%%% assess fitness %%%
% save communication overheads
bitmaps = pop(1, new_indi_idices);
porosities = pop(2, new_indi_idices);
mid_fitnesses = zeros(1, numel(new_indi_idices));
right_fitnesses = zeros(1, numel(new_indi_idices));
% parallelization starts
parfor idx = 1:numel(new_indi_idices) % only assess the necessary
bitmap = bitmaps{idx};
if porosities{idx}>POROSITY_MIN && porosities{idx}<POROSITY_MAX
[mid_dsp, right_dsp] = compute_displacement(bitmap, ['1/' num2str(PIX_NO_PER_SIDE)]);
mid_fitness = 100+mid_dsp;
right_fitness = 100+right_dsp;
else % porosity not even qualified
mid_fitness = 0;
right_fitness = 0;
end
mid_fitnesses(idx) = mid_fitness;
right_fitnesses(idx) = right_fitness;
fprintf('Done.\n');
pause(0.01); % for break
end
I encountered the following weird error.
Error using parallel.internal.pool.deserialize (line 9)
Bad version or endian-key
Error in distcomp.remoteparfor/getCompleteIntervals (line 141)
origErr =
parallel.internal.pool.deserialize(intervalError);
Error in nsga2 (line 57)
parfor idx = 1:numel(new_indi_idices) % only assess the necessary
How should I fix it? A quick Google search returns no solution.
The weirder thing is the following snippet works perfectly under the exactly same settings and the same HPC. I think there might be some subtle differences between them two, causing one to work and the other to fail. The working snippet:
%%% assess fitness %%%
% save communication overheads
bitmaps = pop(1, new_indi_idices);
porosities = pop(2, new_indi_idices);
fitnesses = zeros(1, numel(new_indi_idices));
% parallelization starts
parfor idx = 1:numel(new_indi_idices) % only assess the necessary
bitmap = bitmaps{idx};
if porosities{idx}>POROSITY_MIN && porosities{idx}<POROSITY_MAX
displacement = compute_displacement(bitmap, ['1/' num2str(PIX_NO_PER_SIDE)]);
fitness = 100+displacement;
else % porosity not even qualified
fitness = 0;
end
fitnesses(idx) = fitness;
%fprintf('Done.\n', gen, idx);
pause(0.01); % for break
end
pop(3, new_indi_idices) = num2cell(fitnesses);
Suspecting [mid_dsp, right_dsp] = compute_displacement(bitmap, ['1/' num2str(PIX_NO_PER_SIDE)]);
causes me trouble, I replace it with
mid_dsp = rand();
right_dsp = rand();
Then, it works! This proves that this is indeed caused by this particular line. However, I do have tested the function, and it returns two numbers correctly! Since the function returns value just as rand()
does, I can't see any difference. This confuses me more.
I had the same issue and it came out that Matlab 2015 is reserving all necessary memory resources for each of the loops in the parfor resulting in memory break shortage. The error message is tricky. After fine tuning the code in the loop and providing 120GB of RAM from the SSD through system setting in Pagefile in Windows 10, the parfor executed beautifully.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With