
Strcmp for cell arrays of unequal length in MATLAB

Is there an easy way to find a smaller cell array of strings within a larger one? I've got two lists, one with unique elements, and one with repeating elements. I want to find whole occurrences of the specific pattern of the smaller array within the larger. I'm aware that strcmp will compare two cell arrays, but only if they're equal in length. My first thought was to step through subsets of the larger array using a loop, but there's got to be a better solution.

For example, in the following:

smallcellarray={'string1',...
                'string2',...
                'string3'};
largecellarray={'string1',...
                'string2',...
                'string3',...
                'string1',...
                'string2',...
                'string1',...
                'string2',...
                'string3'};

index=myfunction(largecellarray,smallcellarray)

would return

index=[1 1 1 0 0 1 1 1]
Doresoom asked Jun 30 '10 19:06



2 Answers

You could actually use the function ISMEMBER to get an index vector for where the cells in largecellarray occur in the smaller array smallcellarray, then use the function STRFIND (which works for both strings and numeric arrays) to find the starting indices of the smaller array within the larger:

>> nSmall = numel(smallcellarray);
>> [~, matchIndex] = ismember(largecellarray,...  %# Find the index of the 
                                smallcellarray);    %#   smallcellarray entry
                                                    %#   that each entry of
                                                    %#   largecellarray matches
>> startIndices = strfind(matchIndex,1:nSmall)  %# Starting indices where the
                                                %#   vector [1 2 3] occurs in
startIndices =                                  %#   matchIndex

     1     6
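
To see what the second output of ISMEMBER holds, here is a minimal sketch on a shortened input. The entry 'other' is made up for illustration and is not part of the question's data; it shows that entries missing from smallcellarray map to 0, so they can never be part of a run matching 1:nSmall.

```matlab
smallcellarray = {'string1','string2','string3'};
% 'other' is a made-up entry, added only to show the no-match case
largecellarray = {'string1','string2','other','string3','string1'};

% The second output of ismember gives, for each entry of largecellarray,
% its position in smallcellarray, or 0 when the entry is absent
[~, matchIndex] = ismember(largecellarray, smallcellarray);
% matchIndex is [1 2 0 3 1]
```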

Then it's a matter of building the vector index from these starting indices. Here's one way you could create this vector:

>> nLarge = numel(largecellarray);
>> endIndices = startIndices+nSmall;  %# Get the indices immediately after
                                      %#   where the vector [1 2 3] ends
>> index = zeros(1,nLarge+1);         %# Initialize index to zero, with one
                                      %#   extra slot for an end past nLarge
>> index(startIndices) = 1;           %# Mark the start index with a 1
>> index(endIndices) = index(endIndices)-1;  %# Subtract 1 one index after the
                                             %#   end, so a match starting
                                             %#   right there is not erased
>> index = cumsum(index(1:nLarge))    %# Take the cumulative sum, dropping
                                      %#   the extra entry
index =

     1     1     1     0     0     1     1     1

Another way to create it using the function BSXFUN is given by Amro. Yet another way to create it is:

index = cumsum([startIndices; ones(nSmall-1,numel(startIndices))]);
index = ismember(1:numel(largecellarray),index);
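
As a sanity check for any of these vectorized constructions, a plain loop over windows of the larger array (the approach the question hoped to avoid) can serve as a reference. This is only a sketch under the question's example data; the names refIndex and window are hypothetical.

```matlab
smallcellarray = {'string1','string2','string3'};
largecellarray = {'string1','string2','string3','string1','string2', ...
                  'string1','string2','string3'};
nSmall = numel(smallcellarray);
nLarge = numel(largecellarray);

refIndex = zeros(1, nLarge);
for k = 1:(nLarge - nSmall + 1)
    % Take the nSmall entries starting at position k
    window = largecellarray(k:k+nSmall-1);
    % strcmp compares equal-length cell arrays element by element
    if all(strcmp(window, smallcellarray))
        refIndex(k:k+nSmall-1) = 1;
    end
end
% refIndex is [1 1 1 0 0 1 1 1]
```

Comparing refIndex against a vectorized result with isequal is a quick way to catch edge cases such as back-to-back occurrences.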
gnovice answered Sep 18 '22 14:09


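Here's my version (based on the answers of both @yuk and @gnovice), where S is the small cell array and L is the large one, both row vectors. Note that GRP2IDX requires the Statistics Toolbox: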

g = grp2idx([S L])';
idx = strfind(g(numel(S)+1:end),g(1:numel(S)));
idx = bsxfun(@plus,idx',0:numel(S)-1);

index = zeros(size(L));
index(idx(:)) = 1;
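
The BSXFUN step can be tried on its own, without GRP2IDX, by starting from known start positions. The values below are assumed for illustration (two matches of a 3-element pattern in an 8-element array), and the variable names follow the first answer rather than this one.

```matlab
% Assumed inputs for illustration: matches start at positions 1 and 6
startIndices = [1 6];
nSmall = 3;
nLarge = 8;

% A column of start positions plus a row of offsets 0..nSmall-1 gives one
% row of covered positions per match
idx = bsxfun(@plus, startIndices', 0:nSmall-1);
% idx is [1 2 3; 6 7 8]

index = zeros(1, nLarge);
index(idx(:)) = 1;   % mark every covered position
% index is [1 1 1 0 0 1 1 1]
```

In MATLAB R2016b and later, implicit expansion allows the same matrix to be written as startIndices' + (0:nSmall-1).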
Amro answered Sep 19 '22 14:09