Rabin Karp string matching algorithm

Tags:

string

c#

algorithm

rabin-karp

I've seen this Rabin-Karp string matching algorithm in the forums on the website and I'm interested in trying to implement it, but I was wondering if anyone could tell me why the variables ulong Q and ulong D are 100007 and 256 respectively? What significance do these values carry?

static void Main(string[] args)
{
    string A = "String that contains a pattern.";
    string B = "pattern";
    ulong siga = 0;
    ulong sigb = 0;
    ulong Q = 100007;
    ulong D = 256;
    for (int i = 0; i < B.Length; i++)
    {
        siga = (siga * D + (ulong)A[i]) % Q;
        sigb = (sigb * D + (ulong)B[i]) % Q;
    }
    if (siga == sigb)
    {
        Console.WriteLine(string.Format(">>{0}<<{1}", A.Substring(0, B.Length), A.Substring(B.Length)));
        return;
    }
    ulong pow = 1;
    for (int k = 1; k <= B.Length - 1; k++)
        pow = (pow * D) % Q;

    for (int j = 1; j <= A.Length - B.Length; j++)
    {
        siga = (siga + Q - pow * (ulong)A[j - 1] % Q) % Q;
        siga = (siga * D + (ulong)A[j + B.Length - 1]) % Q;
        if (siga == sigb)
        {
            if (A.Substring(j, B.Length) == B)
            {
                Console.WriteLine(string.Format("{0}>>{1}<<{2}", A.Substring(0, j),
                                                                    A.Substring(j, B.Length),
                                                                    A.Substring(j + B.Length)));
                return;
            }
        }
    }
    Console.WriteLine("Not copied!");
}

750

asked Apr 26 '12 17:04

c grum

1 Answer

About the magic numbers, Paul's answer is pretty clear.

As far as the code is concerned, Rabin-Karp's principal idea is to compare the hash of a sliding window of the string with the hash of the pattern.

The hash cannot be recomputed from scratch for every window: for a text of length n and a pattern of length m that would cost O(n·m), which is quadratic O(n^2) in the worst case, instead of the linear O(n) the algorithm achieves on average.

Therefore a rolling hash function is used, so that at each iteration the hash of the window is updated in constant time from the one character that leaves the window and the one character that enters it.
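To make the rolling idea concrete, here is a minimal sketch that isolates the update step from the question's code. The class and method names (RollingHashSketch, DirectHash, LeadWeight, Roll) are made up for illustration; Q and D are the values from the question. The point is only that Roll gives the same value as hashing the next window from scratch.

```csharp
using System;

public static class RollingHashSketch
{
    const ulong Q = 100007; // modulus, as in the question
    const ulong D = 256;    // radix, as in the question

    // Hash of the window text[start .. start + length - 1], computed from scratch in O(length).
    public static ulong DirectHash(string text, int start, int length)
    {
        ulong h = 0;
        for (int i = 0; i < length; i++)
            h = (h * D + (ulong)text[start + i]) % Q;
        return h;
    }

    // D^(length - 1) mod Q: the weight of the leftmost character of a window.
    public static ulong LeadWeight(int length)
    {
        ulong pow = 1;
        for (int k = 1; k < length; k++)
            pow = (pow * D) % Q;
        return pow;
    }

    // Slide the window one position to the right in O(1):
    // remove 'outgoing' on the left, append 'incoming' on the right.
    public static ulong Roll(ulong hash, char outgoing, char incoming, ulong leadWeight)
    {
        // Adding Q before subtracting keeps the unsigned value from wrapping around.
        hash = (hash + Q - leadWeight * (ulong)outgoing % Q) % Q;
        return (hash * D + (ulong)incoming) % Q;
    }
}
```

For any window start j >= 1 and window length m, Roll(DirectHash(text, j - 1, m), text[j - 1], text[j + m - 1], LeadWeight(m)) should equal DirectHash(text, j, m).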

So, let's comment your code:

for (int i = 0; i < B.Length; i++)
{
    siga = (siga * D + (ulong)A[i]) % Q;
    sigb = (sigb * D + (ulong)B[i]) % Q;
}
if (siga == sigb)
{
    Console.WriteLine(string.Format(">>{0}<<{1}", A.Substring(0, B.Length), A.Substring(B.Length)));
    return;
}

^ This piece computes the hash of the pattern B (sigb) and the hash of the initial substring of A that has the same length as B (siga). As written it's not completely correct, because hashes can collide¹, so the if statement needs to be changed to: if (siga == sigb && A.Substring(0, B.Length) == B).
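The collision problem in that first check is easy to reproduce. As a small sketch (the class and method names are hypothetical, and it assumes the text is at least as long as the pattern): with Q = 100007 and D = 256, the strings "aa!" and "dno" both hash to 81416, so the unverified check reports a match at position 0 that isn't there.

```csharp
using System;

public static class CollisionSketch
{
    const ulong Q = 100007; // modulus from the question
    const ulong D = 256;    // radix from the question

    // Same polynomial hash as the first loop in the question.
    public static ulong Hash(string s)
    {
        ulong h = 0;
        foreach (char c in s)
            h = (h * D + (ulong)c) % Q;
        return h;
    }

    // The question's first check: trusts the hash alone.
    public static bool PrefixMatchesUnverified(string text, string pattern)
    {
        return Hash(text.Substring(0, pattern.Length)) == Hash(pattern);
    }

    // The corrected check: a hash hit is confirmed with a real string comparison.
    public static bool PrefixMatchesVerified(string text, string pattern)
    {
        return PrefixMatchesUnverified(text, pattern)
            && text.Substring(0, pattern.Length) == pattern;
    }
}
```

Calling PrefixMatchesUnverified("aa! and more", "dno") returns true, while PrefixMatchesVerified with the same arguments returns false.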

ulong pow = 1;
for (int k = 1; k <= B.Length - 1; k++)
    pow = (pow * D) % Q;

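^ Here pow is computed. It equals D^(B.Length - 1) mod Q, the weight of the leading character of the window, and it is needed to remove that character when the hash is rolled.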
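Written out as a formula, this is why pow has that particular value. The notation below (m for B.Length, H_j for the hash of the window of A starting at index j) is introduced here only for illustration:

```latex
H_j = \Bigl(\sum_{i=0}^{m-1} A[j+i]\, D^{\,m-1-i}\Bigr) \bmod Q,
\qquad
\mathrm{pow} = D^{\,m-1} \bmod Q

H_j = \Bigl(\bigl(H_{j-1} - A[j-1]\cdot \mathrm{pow}\bigr)\, D + A[j+m-1]\Bigr) \bmod Q
```

The leading character A[j-1] carries the weight D^(m-1) in H_{j-1}, so subtracting A[j-1]·pow removes it; multiplying by D shifts the remaining characters up by one power, and A[j+m-1] enters with weight D^0. In the code, Q is added before the subtraction so the unsigned intermediate value never goes below zero.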

for (int j = 1; j <= A.Length - B.Length; j++)
{
    siga = (siga + Q - pow * (ulong)A[j - 1] % Q) % Q;
    siga = (siga * D + (ulong)A[j + B.Length - 1]) % Q;
    if (siga == sigb)
    {
        if (A.Substring(j, B.Length) == B)
        {
            Console.WriteLine(string.Format("{0}>>{1}<<{2}", A.Substring(0, j),
                                                                A.Substring(j, B.Length),
                                                                A.Substring(j + B.Length)));
            return;
        }
    }
}

^ Finally, the rest of the string (i.e. from the second character to the end) is scanned: at each step the hash of the current window of A is updated and compared with the hash of B (computed at the beginning).

If the two hashes are equal, the substring and the pattern themselves are compared¹, and if they really are equal the match is printed and the method returns.

¹ Hash values can collide: if two strings have different hash values they are definitely different, but if the two hashes are equal the strings may or may not be equal.
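Putting the pieces together, a possible version of the search with the verification applied to every window (including the first one) could look like the sketch below. It is not the original poster's code: the class name, the method name IndexOf, and the choice to return an index (or -1) instead of printing are assumptions made for the example. Q and D are unchanged from the question.

```csharp
using System;

public static class RabinKarpSketch
{
    const ulong Q = 100007; // modulus from the question
    const ulong D = 256;    // radix from the question

    // Returns the index of the first occurrence of pattern in text, or -1 if there is none.
    public static int IndexOf(string text, string pattern)
    {
        int n = text.Length, m = pattern.Length;
        if (m == 0) return 0;
        if (m > n) return -1; // the question's code would index past the end of A here

        ulong sigText = 0, sigPattern = 0, pow = 1;
        for (int i = 0; i < m; i++)
        {
            sigText = (sigText * D + (ulong)text[i]) % Q;
            sigPattern = (sigPattern * D + (ulong)pattern[i]) % Q;
            if (i > 0) pow = (pow * D) % Q; // ends as D^(m-1) mod Q
        }

        for (int j = 0; ; j++)
        {
            // A hash hit is only a candidate; confirm it character by character.
            if (sigText == sigPattern &&
                string.CompareOrdinal(text, j, pattern, 0, m) == 0)
                return j;

            if (j + m >= n) return -1; // that was the last window

            // Roll the window from position j to position j + 1.
            sigText = (sigText + Q - pow * (ulong)text[j] % Q) % Q;
            sigText = (sigText * D + (ulong)text[j + m]) % Q;
        }
    }
}
```

With the strings from the question, RabinKarpSketch.IndexOf("String that contains a pattern.", "pattern") returns 23. All intermediate products stay far below the ulong range, since pow and the hashes are below Q and a char is below 65536.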

181

answered Sep 24 '22 04:09

digEmAll
