I'm working with a series of long strings (e.g., 'ABAABBFGGBHHSFAFDAFDAFDBB'
), that have different lengths. For each string, I'd like to find the length of the longest consecutive occurrences of a specific substring (for instance 'AFD'
, for which the answer in the above example is 3
). Any elegant way of achieving this with MATLAB ?
Let the input and search strings be -
in_str = 'ABAAAFDAFDBBFGGBHHSFAFDAFDAFDBB'
search_str = 'AFD'
We can use strfind
to get starting indices for search string in input string and from those detect consecutive groups of search string -
idx = strfind(in_str,search_str)
grp_matches = diff(idx)==numel(search_str)
So, now we have "islands" of zeros and ones, where the islands of ones represent the presence of consecutive grouped search stings. Next up, we need to find the islands lengths and the maximum island length would be the desired output -
df1 = diff([0 grp_matches 0]) %// Perform differentiation of matches
The ending of islands would be indicated by "-1" in differentiation result and "1" would denote the starting of those islands. So, (find(df1==-1) - find(df1==1))+1
would be the island lengths. The final output would be the max
of it -
out = max(find(df1==-1) - find(df1==1))+1
Summing up the discussion, the entire code could be made into a compact version like so -
df1 = diff([0 diff(strfind(in_str,search_str))==numel(search_str) 0])
out = max(find(df1==-1) - find(df1==1))+1
Use a regular expression:
str = 'AFDABAABBFGGBHHSFAFDAFDAFDBB'; %// note: two occurrences of repeated 'AFD'
substr = 'AFD'; %// sought substring
r = regexp(str, ['(' substr ')+'], 'match');
lengths = cellfun(@numel, r)/numel(substr);
result = max(lengths);
You can increase speed using 'length'
instead of @numel
, as suggested by Divakar:
lengths = cellfun('length', r)/numel(substr);
In this example,
lengths =
1 3
result =
3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With