
Substrings from a Cell Array in Matlab

I'm new to MATLAB and I'm struggling to understand the differences between array-wise and element-wise operations. I'm working with a large dataset and I've found that the simplest methods aren't always the fastest. I have a very large cell array of strings, like in this simplified example:

% A vertical array of same-length strings
CellArrayOfStrings = {'aaa123'; 'bbb123'; 'ccc123'; 'ddd123'};

I'm trying to extract an array of substrings, for example:

'a1'
'b1'
'c1'
'd1'

I'm happy enough with an element-wise reference like this:

% Simple element-wise substring operation
MySubString = CellArrayOfStrings{2}(3:4);  % Expected result is 'b1'

But I can't work out the notation to reference them all in one go, like this:

% Desired result is 'a1','b1','c1','d1'
MyArrayOfSubStrings = CellArrayOfStrings{:}(3:4); % Incorrect notation!

I know that MATLAB is capable of performing very fast array-wise operations, such as strcat, so I was hoping for a technique that works at a similar speed:

% An array-wise operation which works quickly
tic
speedTest = strcat(CellArrayOfStrings,'hello');
toc   % About 2 seconds on my machine with >500K array elements

All the for loops and functions which use behind-the-scenes iteration I have tried run too slowly with my dataset. Is there some array-wise notation that would do this? Would somebody be able to correct my understanding of element-wise and array-wise operations?! Many thanks!

fodfish asked Oct 10 '13 15:10


1 Answer

I can't work out the notation to reference them all in one go, like this:

MyArrayOfSubStrings = CellArrayOfStrings{:}(3:4); % Incorrect notation!

This is because curly braces ({}) return a comma-separated list, which is equivalent to writing the contents of these cells in the following way:

c{1}, c{2}, and so on....

When the subscript index refers to only one element, MATLAB's syntax allows you to use parentheses (()) after the curly braces to extract a sub-array (a substring in your case). However, this syntax is not allowed when the comma-separated list contains multiple items.
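A small sketch of the difference, using the cell array from the question. The names `oneSub`, `s1` to `s4`, `stacked` and `failed` are only for illustration:

```matlab
CellArrayOfStrings = {'aaa123'; 'bbb123'; 'ccc123'; 'ddd123'};

% A single index yields one char array, so chained indexing is allowed
oneSub = CellArrayOfStrings{2}(3:4);

% {:} expands to a comma-separated list of four separate values.
% The list can be captured in as many outputs as it has items ...
[s1, s2, s3, s4] = CellArrayOfStrings{:};

% ... or passed as arguments to a function such as vertcat
stacked = vertcat(CellArrayOfStrings{:});   % 4-by-6 char matrix

% Indexing into the whole list at once raises an error at run time
failed = false;
try
    bad = CellArrayOfStrings{:}(3:4); %#ok<NASGU>
catch
    failed = true;
end
```

The exact error message differs between MATLAB releases, so the sketch only records that an error occurred.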

So what are the alternatives?

  1. Use a for loop:

    MyArrayOfSubStrings = char(zeros(numel(CellArrayOfStrings), 2));
    for k = 1:size(MyArrayOfSubStrings, 1)
        MyArrayOfSubStrings(k, :) = CellArrayOfStrings{k}(3:4);
    end
    
  2. Use cellfun (a slight variant of Dang Khoa's answer):

    MyArrayOfSubStrings = cellfun(@(x){x(3:4)}, CellArrayOfStrings);
    MyArrayOfSubStrings = vertcat(MyArrayOfSubStrings{:});
    
  3. If your original cell array contains strings of a fixed length, you can follow Dan's suggestion and convert the cell array into an array of strings (a matrix of characters) by stacking the strings vertically, then extract the desired columns:

    MyArrayOfSubStrings = vertcat(CellArrayOfStrings{:});
    MyArrayOfSubStrings = MyArrayOfSubStrings(:, 3:4);
    
  4. Employ more complicated methods, such as regular expressions:

    MyArrayOfSubStrings = regexprep(CellArrayOfStrings, '^..(..).*', '$1');
    MyArrayOfSubStrings = vertcat(MyArrayOfSubStrings{:});
    

There are plenty of solutions to choose from, so just pick the one that suits you best :) I think that with MATLAB's JIT acceleration, a simple loop would be sufficient in most cases.
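If you want to check this on your own machine, here is a rough timing sketch comparing the loop with the vertcat approach. The synthetic data (`N` copies of one fixed-length string) and the variable names are assumptions made for the example, and the timings will depend on your MATLAB version and data:

```matlab
% Synthetic test data (assumption): N identical fixed-length strings
N = 1e5;
C = repmat({'aaa123'}, N, 1);

% Preallocated for loop
tic
out1 = char(zeros(N, 2));
for k = 1:N
    out1(k, :) = C{k}(3:4);
end
tLoop = toc;

% Stack into a character matrix, then slice the columns
tic
M = vertcat(C{:});
out3 = M(:, 3:4);
tStack = toc;

fprintf('loop: %.3f s, vertcat: %.3f s\n', tLoop, tStack);
```

Both variants should produce the same N-by-2 character matrix, so `isequal(out1, out3)` is a quick sanity check before comparing the times.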

Also note that in all my suggestions the result is an array of strings (a character matrix) rather than a cell array of substrings. This is just for the sake of the example; you can of course keep the substrings in a cell array if you prefer.
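For instance, a minimal sketch of two ways to end up with a cell array instead (the names `subsAsCell` and `subsFromMatrix` are just illustrative):

```matlab
CellArrayOfStrings = {'aaa123'; 'bbb123'; 'ccc123'; 'ddd123'};

% cellfun with 'UniformOutput', false returns a cell array of the same size
subsAsCell = cellfun(@(x) x(3:4), CellArrayOfStrings, 'UniformOutput', false);

% Starting from a character matrix, cellstr turns each row into one cell
% (note that cellstr strips trailing whitespace from each row)
M = vertcat(CellArrayOfStrings{:});
subsFromMatrix = cellstr(M(:, 3:4));
```

In the regular expression variant, regexprep already returns a cell array when given one, so there it is enough to leave out the vertcat step.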

Eitan T answered Sep 29 '22 09:09