
Why does my MATLAB program use so much memory?

Tags: memory, matlab

I am writing a MATLAB program that reads about 500 files. Each file has about 20,000 lines, with one number on each line. The program builds a 20,000 * 500 matrix from these numbers. The numbers are stored as doubles, so 8 bytes per number, and I would expect the matrix to take 20,000 * 500 * 8 bytes = 8E7 bytes, i.e. roughly 80 MB. Yet the program exhausts my 16 GB of memory: as it runs, I see memory use climb steadily, GB by GB. I am using MATLAB R2015b on Ubuntu 14.04.
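For reference, the estimate can be reproduced with a short sketch. The dimensions are the round figures quoted above, not the exact file and line counts, and the variable names are only illustrative.

```matlab
% Rough size of a dense double matrix, using the round figures from the question.
nRows = 20000;          % lines per file (approximate)
nCols = 500;            % number of files (approximate)
bytesPerDouble = 8;     % a MATLAB double is 8 bytes

expectedMB = nRows * nCols * bytesPerDouble / 1e6;
fprintf('Expected matrix size: %.0f MB\n', expectedMB);
```

Anything far beyond that figure has to come from somewhere other than the matrix itself.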

What's happening? Many thanks for your attention.

Here is the full code:

clear all;
% number of rna bits in the file
filesize = 20532

maxFiles = 480;
rnaCounts = NaN(filesize,maxFiles);

myFolder = '~/_STATS/data3/RNASeqV2/UNC__IlluminaHiSeq_RNASeqV2/Level_3';
filePattern = fullfile(myFolder, '*genes.normalized_results');

theFiles = dir(filePattern);

rnaCounts = NaN(filesize,length(theFiles));


for k = 1 : length(theFiles) 
    mrnaFilename = strtrim(theFiles(k).name);
    fprintf(1, 'Now reading mrnaFile %d %s  \n', k, mrnaFilename);

    % read rna file
    fullFileName = fullfile(myFolder, mrnaFilename);
    rnafid = fopen(fullFileName);

    if rnafid < 0 
       fprintf('====ERROR OPENING RNA FILE =====================');
    end
    rnaline = fgets(rnafid);

    lc = 1;  % line counter
    while ischar(rnaline) && feof(rnafid) ~= 1
       rnaline = fgets(rnafid);
       rnaSplit = strsplit(rnaline);

       % write to the matrix
       rnaCounts(lc,k) = str2num(rnaSplit{2});

       lc = lc + 1;
    end
    fclose(rnafid);

end
Old_Mortality asked Feb 19 '16 08:02



2 Answers

As verified by the OP, the str2num function in the Linux version of MATLAB R2015b has a memory leak. str2num is a poor fit here anyway: it is built on eval and is designed to parse strings representing entire matrices (1 2; 3 4), not the typical case of a single number (1.234). Use str2double for simple number parsing; it is faster even when str2num isn't broken.

Using a different version of MATLAB would probably also work around the problem, because in my experience these kinds of memory bugs don't usually persist from one release to the next.
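A minimal sketch of the substitution, assuming each data line holds an identifier and a value separated by whitespace, as in the question's loop. The sample line is the one quoted elsewhere on this page, with a tab assumed as the separator.

```matlab
% Sample line in the assumed "identifier<TAB>value" layout.
rnaline = sprintf('A2LD1|87769\t135.5735');

% Split on whitespace, as the question's loop does.
rnaSplit = strsplit(strtrim(rnaline));

% str2double parses a single scalar and does not go through eval.
value = str2double(rnaSplit{2});

% Unparseable text yields NaN instead of an error.
bad = str2double('not a number');
```

In the question's loop this amounts to replacing str2num(rnaSplit{2}) with str2double(rnaSplit{2}). Because str2double returns NaN for text it cannot parse, malformed lines show up as NaN entries in the matrix.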

drhagen answered Sep 25 '22 21:09


High-level I/O functions such as dlmread or textscan are often the easiest way to read text formats like this. Use dlmread if you have only numeric data, and textscan for more complex formats.

The sample data you provided is:

A2LD1|87769 135.5735

Since you only need the number in the second column and can discard the identifier in the first, the data you actually read is purely numeric, so you can use dlmread.

data = dlmread(fullFileName, '\t', 1, 1);

The '\t' specifies that the delimiter (column separator) is a tab. The two 1s are the row offset and the column offset, both zero-based: they tell dlmread to skip the first row (the header) and the first column (the identifier) of the file.
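For comparison, here is a sketch of the textscan route, which skips the identifier column explicitly. It writes a small temporary file so that it is self-contained. The header text and the second data row are made up for illustration, and the tab-separated layout is an assumption based on the sample line above.

```matlab
% Write a tiny tab-separated file in the assumed layout (header + id<TAB>value).
tmpFile = [tempname, '.txt'];
fid = fopen(tmpFile, 'w');
fprintf(fid, 'gene_id\tnormalized_count\n');   % hypothetical header row
fprintf(fid, 'A2LD1|87769\t135.5735\n');       % sample row from the answer
fprintf(fid, 'GENE2|1\t0\n');                  % made-up second row
fclose(fid);

% '%*s' reads and discards the identifier; '%f' keeps the numeric column.
fid = fopen(tmpFile, 'r');
cols = textscan(fid, '%*s %f', 'Delimiter', '\t', 'HeaderLines', 1);
fclose(fid);
delete(tmpFile);

values = cols{1};   % column vector with one value per data row
```

In the question's loop, the whole per-line while loop could then be replaced by a single assignment such as rnaCounts(:,k) = values, provided every file has exactly filesize data rows.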

hbaderts answered Sep 23 '22 21:09