
How to read large matrix from a csv efficiently in Octave

There are many reports of slow performance of Octave's dlmread. I was hoping that this had been fixed in 3.2.4, but when I tried to load a CSV file with roughly 8 × 4 million entries (32 million in total), it still took a very long time. I searched the web but could not find a workaround. Does anybody know a good one?

Enno Shioji asked Sep 26 '11 08:09


2 Answers

I experienced the same problem and had R handy, so my solution was to use "read.csv" in R, and then use the R package "R.matlab" to write a ".mat" file, and then load that in Octave.

"read.csv" can be pretty slow too, but this worked very well in my case.
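For the Octave end of this workflow, here is a minimal sketch. It assumes the R side wrote a MATLAB-format file containing one matrix under the variable name x; that name, and the use of a temporary file, are only illustrative. To keep the snippet self-contained, the file is produced by Octave's own save as a stand-in for the R step.

```octave
% Stand-in for the R step (assumption): write a small matrix named x
% to a MATLAB-format file. In the workflow above, R would produce this file.
matfile = [tempname() '.mat'];
x = magic(4);
save('-v6', matfile, 'x');

% The Octave side: load the file into a struct and pull out the matrix.
s = load(matfile);      % load detects the file format on its own
data = s.x;             % field name matches the variable name in the file
delete(matfile);

disp(size(data));
```

Loading into a struct rather than straight into the workspace avoids silently overwriting a variable that happens to share the name.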

DavidC answered Dec 15 '22 14:12


The reason, as far as I can tell, is that in Octave adding data to a very large matrix takes more time than adding the same amount of data to a small matrix: every time rows are appended, the whole matrix built so far is copied.

Below is my attempt. I chose to save the data every 50000 lines, so that I could already take a look at it in the meantime instead of being forced to wait. It is slower for small files, but much faster for larger files.

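function alldata = load_data(filename)
    # Reads lines of the form "hh:mm:ss:ms value" into rows [time_in_ms, value].
    fid = fopen(filename,'r');
    s=0;          # number of lines read so far
    data=[];      # small block collected since the last save
    alldata=[];   # everything collected so far
    save "temp.mat" alldata;
    if fid == -1
        printf("Couldn't open file %s\n", filename);
    else
        while (~feof(fid))
            txt = fgetl(fid);
            if (~ischar(txt)) # fgetl returns -1 when nothing is left to read
                break;
            end
            # time as hh:mm:ss:ms and data as float; %d reads the fields as decimal
            [t1,t2,t3,t4,d] = sscanf(txt,'%d:%d:%d:%d %f', "C");
            s++;
            t = (t1 * 3600000 + t2 * 60000 + t3 * 1000 + t4);
            data = [data; t, d];
            if (mod(s,10000) == 0)
                disp(s); # progress indicator
                fflush(stdout);
            end
            if (mod(s,50000) == 0)
                # append the small block to the big matrix and save it to disk
                load "temp.mat";
                alldata=[alldata; data];
                data=[];
                save "temp.mat" alldata;
                disp("data saved");
                fflush(stdout);
            end
        end
        disp(s);
        # append whatever is left after the last full block
        load "temp.mat";
        alldata=[alldata; data];
        save "temp.mat" alldata;
        disp("data saved");
        fflush(stdout);
        fclose(fid); # only close the file if it was actually opened
    end
end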
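If the intermediate saves are not needed, the same idea (never grow one huge matrix row by row) can be sketched with preallocated blocks that are concatenated once at the end. This is only an illustration, not a tuned implementation: load_data_chunked and its chunk argument are hypothetical names, and it assumes the same "hh:mm:ss:ms value" line format as above.

```octave
1;  % marks this file as a script so the function below can be defined inline

% Hypothetical helper: fills fixed-size blocks and joins them once at the end.
function alldata = load_data_chunked(filename, chunk)
  if nargin < 2
    chunk = 50000;              % rows per block
  end
  fid = fopen(filename, 'r');
  if fid == -1
    error('could not open %s', filename);
  end
  blocks = {};                  % completed blocks
  buf = zeros(chunk, 2);        % preallocated block being filled
  n = 0;                        % rows used in buf
  while true
    txt = fgetl(fid);
    if ~ischar(txt)             % fgetl returns -1 at end of file
      break;
    end
    v = sscanf(txt, '%d:%d:%d:%d %f');
    if numel(v) ~= 5
      continue;                 % skip blank or malformed lines
    end
    n = n + 1;
    buf(n, :) = [v(1)*3600000 + v(2)*60000 + v(3)*1000 + v(4), v(5)];
    if n == chunk
      blocks{end+1} = buf;      % store the full block, then refill buf
      n = 0;
    end
  end
  fclose(fid);
  blocks{end+1} = buf(1:n, :);  % rows left over in the last partial block
  alldata = vertcat(blocks{:}); % a single concatenation at the end
end

% Usage on a small temporary file with three lines.
tmp = tempname();
fid = fopen(tmp, 'w');
fprintf(fid, '00:00:01:000 1.5\n00:00:02:500 2.5\n00:01:00:000 3.5\n');
fclose(fid);
result = load_data_chunked(tmp, 2);
delete(tmp);
disp(result);
```

A small chunk of 2 is used in the usage lines only to exercise the block boundary; for a real file something in the tens of thousands, as in the function above, seems a more sensible starting point.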
Vincent Hindriksen answered Dec 15 '22 15:12