Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the shortest repetitive pattern in a string

I was wondering if there was a way to do pattern matching in Octave / matlab? I know Maple 10 has commands to do this but not sure what I need to do in Octave / Matlab. So if a number was 12341234123412341234 the pattern match would be 1234. I'm trying to find the shortest pattern that upon repetiton generates the whole string.

Please note: the numbers (only numbers will be used) won't be this simple. Also, I won't know the pattern ahead of time (that's what I'm trying to find). Please see the Maple 10 example below which shows that the pattern isn't known ahead of time but the command finds the pattern.

Example of Maple 10 pattern matching:

ns:=convert(12341234123412341234,string);

             ns := "12341234123412341234"

StringTools:-PrimitiveRoot(ns);

             "1234"

How can I do this in Octave / Matlab? Ps: I'm using Octave 3.8.1

like image 867
Rick T Avatar asked Mar 10 '15 12:03

Rick T


2 Answers

To find the shortest pattern that upon repetition generates the whole string, you can use regular expressions as follows:

result = regexp(str, '^(.+?)(?=\1*$)', 'match');

Some examples:

>> str = '12341234123412341234';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    '1234'

>> str = '1234123412341234123';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    '1234123412341234123'

>> str = 'lullabylullaby';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    'lullaby'

>> str = 'lullaby1lullaby2lullaby1lullaby2';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    'lullaby1lullaby2'
like image 134
Luis Mendo Avatar answered Oct 06 '22 07:10

Luis Mendo


I'm not sure if this can be accomplished with regular expressions. Here is a script that will do what you need in the case of a repeated word called pattern.

It loops through the characters of a string called str, trying to match against another string called pattern. If matching fails, the pattern string is extended as needed.

EDIT: I made the code more compact.

str = 'lullabylullabylullaby';

pattern = str(1);
matchingState = false;
sPtr = 1;
pPtr = 1;

while sPtr <= length(str)
     if str(sPtr) == pattern(pPtr) %// if match succeeds, keep looping through pattern string
            matchingState = true;
            pPtr = pPtr + 1;
            pPtr = mod(pPtr-1,length(pattern)) + 1;
     else                          %// if match fails, extend pattern string and start again
            if matchingState
                sPtr = sPtr - 1;   %// don't change str index when transitioning out of matching state
            end  
            matchingState = false;
            pattern = str(1:sPtr);
            pPtr = 1;
     end

     sPtr = sPtr + 1;

end

display(pattern);

The output is:

pattern =

lullaby

Note:

This doesn't allow arbitrary delimiters between occurrences of the pattern string. For example, if str = 'lullaby1lullaby2lullaby1lullaby2';, then

pattern =

lullaby1lullaby2

This also allows the pattern to end mid-way through a cycle without changing the result. For example, str = 'lullaby1lullaby2lullaby1'; would still result in

pattern =

lullaby1lullaby2

To fix this you could add the lines

if pPtr ~= length(pattern)
    pattern = str;
end
like image 26
eigenchris Avatar answered Oct 06 '22 07:10

eigenchris