 

How do I read a delimited file with strings/numbers with Octave?


I am trying to read a text file containing digits and strings using Octave. The file format is something like this:

A B C
a 10 100
b 20 200
c 30 300
d 40 400
e 50 500

but the delimiter can be space, tab, comma or semicolon. The textread function works fine if the delimiter is space/tab:

[A,B,C] = textread ('test.dat','%s %d %d','headerlines',1)

However, it does not work if the delimiter is a comma or semicolon. I tried to use dlmread:

dlmread ('test.dat',';',1,0)

but it does not work because the first column is a string. Basically, with textread I can't specify the delimiter and with dlmread I can't specify the format of the first column. Not with the versions of these functions in Octave, at least. Has anybody ever had this problem before?

rs028 asked Mar 14 '11 16:03


2 Answers

textread does allow you to specify the delimiter: it honors the property arguments of strread. The following code worked for me:

[A,B,C] = textread('test.dat', '%s %d %d', 'delimiter', ',', 'headerlines', 1)
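
For comparison, here is a minimal sketch of the same read done with textscan instead of textread. This is a different function from the one used above, swapped in only as an illustration. The temporary file, its contents, and the variable names are made up for the example, and it assumes a semicolon-delimited file with one header line:

```octave
% Write a small semicolon-delimited sample file (hypothetical data)
fname = [tempname() '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 'A;B;C\na;10;100\nb;20;200\nc;30;300\n');
fclose(fid);

% Read it back: one string column and two numeric columns,
% skipping the header line and splitting on ';'
fid = fopen(fname, 'r');
cols = textscan(fid, '%s %f %f', 'Delimiter', ';', 'HeaderLines', 1);
fclose(fid);
delete(fname);

names = cols{1};   % cell array of strings
B = cols{2};       % numeric column vector
C = cols{3};       % numeric column vector
```

textscan returns a single cell array with one cell per column, so the columns have to be unpacked by hand, unlike textread's multiple return values.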

Jordan answered Nov 05 '22 04:11


I couldn't find an easy way to do this in Octave currently. You could open the file with fopen(), loop through it line by line with fgetl(), and extract the data manually. I wrote a function that does this for arbitrary data:

function varargout = coltextread(fname, delim)

    % Initialize the variable output argument
    varargout = cell(nargout, 1);

    % Initialize elements of the cell array to nested cell arrays
    % This syntax is due to {:} producing a comma-separated list
    [varargout{:}] = deal(cell());

    fid = fopen(fname, 'r');
    if fid < 0
        error('coltextread: could not open file %s', fname);
    endif

    while true
        % Get the current line
        ln = fgetl(fid);

        % Stop if EOF (fgetl returns the number -1 instead of a string)
        if ~ischar(ln)
            break;
        endif

        % Split the line string into components and parse numbers
        elems = strsplit(ln, delim);
        nums = str2double(elems);

        nans = isnan(nums);

        % Special case of all strings (header line)
        if all(nans)
            continue;
        endif

        % Find the indices of the NaNs 
        % (i.e. the indices of the strings in the original data)
        idxnans = find(nans);

        % Assign each corresponding element in the current line
        % into the corresponding cell array of varargout
        for i = 1:nargout
            % Detect if the current index is a string or a num
            if any(ismember(idxnans, i))
                varargout{i}{end+1} = elems{i};
            else
                varargout{i}{end+1} = nums(i);
            endif
        endfor
    endwhile

    % Close the file now that every line has been read
    fclose(fid);

endfunction

It accepts two arguments: the file name and the delimiter. The function is governed by the number of return variables that are specified, so, for example, [A B C] = coltextread('data.txt', ';'); will try to parse three different data elements from each row in the file, while A = coltextread('data.txt', ';'); will only parse the first element of each row. If no return variable is given, then the function won't return anything.

The function ignores rows that have all-strings (e.g. the 'A B C' header). Just remove the if all(nans)... section if you want everything.
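
To illustrate how the string/number detection works on a single line, here is a small sketch using the same strsplit and str2double calls as the function. The sample lines and variable names are invented for the example:

```octave
delim = ';';
data_line = 'a;10;100';
header_line = 'A;B;C';

% Split a data row and try to convert every field to a number
elems = strsplit(data_line, delim);
nums = str2double(elems);

% Fields that fail to convert come back as NaN, which marks them as strings
is_string = isnan(nums);

% A header row has no convertible field at all, so it is skipped
header_is_all_strings = all(isnan(str2double(strsplit(header_line, delim))));
```

One consequence of this approach: a field that literally contains NaN is treated as a string, since it cannot be told apart from a failed conversion.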

By default, the 'columns' are returned as cell arrays, although the numbers within those arrays are actually converted numbers, not strings. If you know that a cell array contains only numbers, then you can easily convert it to a column vector with: cell2mat(A)'.
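
As a quick sketch of that conversion, the cell array below is built the same way the function builds its columns (appending with end+1), with made-up values:

```octave
% Appending with end+1 to an empty cell array grows a 1xN row
B = {};
B{end+1} = 10;
B{end+1} = 20;
B{end+1} = 30;

% cell2mat gives a 1x3 row vector; the transpose turns it into a column
b = cell2mat(B)';
```

This only works when every cell holds a number. A column that mixes strings and numbers has to stay a cell array.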

voithos answered Nov 05 '22 03:11