Would be grateful for some pointers. I am reading about 1M rows of data and it is taking almost 24 hours with the following code. How can I improve the execution time?
The array day contains the value of the nth day from the start, and there is more than one record for a particular day. The program checks whether a particular id (stored in unique_id) is repeated within 180 days.
%// calculating the number of repeats within 180 days
fid2 = 'data_050913/Unique_id_repeat_count1.xlsx';
fid1 = 'data_050913/data_050913_2000.csv';
fid_data = fopen(fid1);
data = fgetl(fid_data); %// the first line, title line
ep = 0; %// position point number
while 1
    data = fgetl(fid_data);
    if(length(data)<10)
        break;
    end
    ep = ep+1;
    id = find(data == ',');
    unique_id(ep) = str2num(data(1:id(1)-1));
    day(ep) = str2num(data(id(8)+1:id(9)-1));
end
repeat = zeros(ep,1);
tic
i = 1;
count = 0;
while i <= ep
    j = i+1;
    while ( (j<=ep) && (day(j)<= day(i)+179) )
        if unique_id(i) == unique_id(j)
            count = 1;
            break;
        end
        j = j+1;
    end
    repeat(i,1) = count;
    count = 0;
    i = i+1;
end
toc
i = 1;
k = 1;
while i<=ep
    count = repeat(i,1);
    j=i;
    while (day(j) == day(i))
        count = repeat(j,1)+count;
        j = j+1;
        if j > ep
            break;
        end
    end
    day_final(k,1)= day(i);
    repeat_final(k,1) = count;
    k = k+1;
    i = j;
end
xlswrite(fid2,day_final,'Repeat_Count','A2');
xlswrite(fid2,repeat_final,'Repeat_Count','B2');
Thanks
MATLAB may be running slowly because you have a limited amount of RAM (e.g. under 128 MB). The RAM used by MATLAB at runtime is between 40 MB and 60 MB. The HELP browser can take up another 12 MB. If you have limited memory (RAM), your system may start using virtual memory (from your hard drive), which is much slower.
The primary objective of MCC, the new MATLAB compiler, is to make MATLAB® programs run faster. Without the compiler, MATLAB is an interpreted computing environment with dynamic storage allocation. Compiling programs eliminates the interpretive overhead and, more importantly, provides faster storage management.
MATLAB automatically runs calculations on the GPU when a function's inputs are gpuArray objects. For more information, see Run MATLAB Functions on a GPU. For example, with gpuX a gpuArray, use diag, expm, mod, round, abs, and fliplr together:
gpuE = expm(diag(gpuX,-1)) * expm(diag(gpuX,1));
gpuM = mod(round(abs(gpuE)),2);
gpuF = gpuM + fliplr(gpuM);
A slow startup is often caused by issues with the license search path. You probably need to adjust one of the license environment variables (LM_LICENSE_FILE or MLM_LICENSE_FILE) used by MATLAB, or else bypass them altogether.
If you are not already doing so, allocate all memory up front where possible. I've seen MATLAB scripts go from 24 hours to 8 minutes by doing this.
Use the zeros function to preallocate memory for all growing arrays (day, unique_id, repeat, day_final and repeat_final).
x = zeros(1000,1); %// Creates a 1000-element column vector of all zeros (zeros(1000) alone would create a 1000-by-1000 matrix)
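Here is a minimal sketch of how preallocation might look for the reading loop in the question, where the number of rows is not known in advance. It starts from a capacity guess, doubles the arrays when they fill up, and trims them at the end. The sample lines, the cap variable and the use of str2double in place of str2num are assumptions made for illustration; they are not part of the original code.

```matlab
%// Hypothetical sample rows standing in for the CSV file:
%// the id is field 1 and the day is field 9, as in the question.
lines = {'101,a,b,c,d,e,f,g,1,x', ...
         '102,a,b,c,d,e,f,g,1,x', ...
         '101,a,b,c,d,e,f,g,5,x'};

cap = 2;                       %// initial capacity guess (kept tiny here to show growth)
unique_id = zeros(cap,1);      %// preallocate instead of growing one element at a time
day       = zeros(cap,1);
ep = 0;                        %// number of rows actually read

for n = 1:numel(lines)
    data = lines{n};
    ep = ep + 1;
    if ep > cap                %// out of room: double the capacity in one step
        unique_id(2*cap,1) = 0;    %// assigning past the end pads with zeros
        day(2*cap,1)       = 0;
        cap = 2*cap;
    end
    id = find(data == ',');    %// positions of the commas
    unique_id(ep) = str2double(data(1:id(1)-1));
    day(ep)       = str2double(data(id(8)+1:id(9)-1));
end

unique_id = unique_id(1:ep);   %// trim the unused tail
day       = day(1:ep);
```

With roughly 1M rows expected, starting with cap = 1e6 would mean the arrays rarely or never need to grow. The same idea applies to day_final and repeat_final: they can never hold more than ep entries, so they could be created with zeros(ep,1) and trimmed to k-1 elements after the last loop.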