Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex Match Performance Issue For Spintax Parser

Tags:

c#

regex

spintax

I'm writing an application that is intended to process thousands of articles/entries with large number of spintax in the following format:

{Hello|Hi} {World|There!}, how are you?

However when I run the application with profiler I noticed the part where Regex is handled is hogging a lot of resource and my application eventually crash due to out of memory issue. Can anyone suggest a way to improve my code or a better way to parse the spintax?

public static String Spin(String text)
{
        Regex reg = new Regex(@"\{[^\{\}]*\}");
        Random rand = new Random((int)DateTime.Now.Ticks);
        while (true)
        {
            Match m = reg.Match(text);
            if (!m.Success) break;
            String[] parts = m.Value.TrimStart('{').TrimEnd('}').Split('|');
            int i = rand.Next(parts.Length);
            text = text.Substring(0, m.Index) + parts[i] + text.Substring(m.Index + m.Length);
        }
        return text;
  }
like image 521
user1928346 Avatar asked Dec 16 '22 15:12

user1928346


2 Answers

I have implemented my fast version (no Regex, no Split, no Substring, no Replace and other string manipulation methods). To copy string I'm using String.CopyTo which copies symbols to a plain char array.

This code fully supports nested Spintaxes (potentially unlimited depth). One restriction is maximum number of options per Spintax, currently it is 100, but can be changed to 1000 or more... Another restriction is maximum length of input string, it is 100000 now, but can be increased too.

Concerning performance - my tests showed that this code is >15 times faster than any optimised Regex solution (including Jim Mischel's one) and ~5 times faster than versions which are using Substring and other String manipulation methods. I have tested this in Release mode with Optimize Code setting in VS 2012.

    static int[] partIndices = new int[100];
    static int[] depth = new int[100];
    static char[] symbolsOfTextProcessed = new char[100000];

    public static String SpinEvenMoreFaster(String text)
    {
        int cur = SpinEvenMoreFasterInner(text, 0, text.Length, 0);
        return new String(symbolsOfTextProcessed, 0, cur);
    }

    public static int SpinEvenMoreFasterInner(String text, int start, int end, int symbolIndex)
    {
        int last = start;
        for (int i = start; i < end; i++)
        {
            if (text[i] == '{')
            {
                int k = 1;
                int j = i + 1;
                int index = 0;
                partIndices[0] = i;
                depth[0] = 1;
                for (; j < end && k > 0; j++)
                {
                    if (text[j] == '{')
                        k++;
                    else if (text[j] == '}')
                        k--;
                    else if (text[j] == '|')
                    {
                        if (k == 1)
                        {
                            partIndices[++index] = j;
                            depth[index] = 1;
                        }
                        else
                            depth[index] = k;
                    }
                }
                if (k == 0)
                {
                    partIndices[++index] = j - 1;
                    int part = rand.Next(index);
                    text.CopyTo(last, symbolsOfTextProcessed, symbolIndex, i - last);
                    symbolIndex += i - last;
                    if (depth[part] == 1)
                    {
                        text.CopyTo(partIndices[part] + 1, symbolsOfTextProcessed, symbolIndex, partIndices[part + 1] - partIndices[part] - 1);
                        symbolIndex += partIndices[part + 1] - partIndices[part] - 1;
                    }
                    else
                    {
                        symbolIndex = SpinEvenMoreFasterInner(text, partIndices[part] + 1, partIndices[part + 1], symbolIndex);
                    }
                    i = j - 1;
                    last = j;
                }
            }
        }
        text.CopyTo(last, symbolsOfTextProcessed, symbolIndex, end - last);
        return symbolIndex + end - last;
    }
like image 198
SergeyS Avatar answered Dec 29 '22 15:12

SergeyS


Here's a non-Regex alternative.

Update 2012-12-27 (see new ideone demo)

  1. Optimized OP's code to use static members instead of declaring variables within the loop, use RegexOptions.Compiled, and use Substring instead of TrimLeft and TrimRight. These optimizations cut down execution time of OP's code by nearly 33%.
  2. Updated SpinNoRE to handle arbitrarily nested spintaxes, optimized code, and added comments.
  3. Renamed Spin and SpinFaster to SpinRE and SpinNoRE respectively, for clarity.
  4. Updated test case with nested example. OP's code was much, much slower to handle nested spintaxes (understandably, since every level of nesting forces an additional regex match).

New ideone demo available; code below (comments available in demo; see link):

public static String SpinNoRE(String text)
{
    int i, j, e = -1;
    char[] curls = new char[] {'{', '}'};
    text += '~';

    do
    {
        i =  e;
        e = -1;
        while ((i = text.IndexOf('{', i+1)) != -1)
        {
            j = i;
            while ((j = text.IndexOfAny(curls, j+1)) != -1 && text[j] != '}')
            {
                if (e == -1) e = i;
                i = j;
            }
            if (j != -1)
            {
                parts = text.Substring(i+1, (j-1)-(i+1-1)).Split('|');
                text = text.Remove(i, j-(i-1)).Insert(i, parts[rand.Next(parts.Length)]);
            }
        }
    }
    while (e-- != -1);

    return text.Remove(text.Length-1);
}

Result:

Input Text:       Oh! {{I'm|You're} here!|How are you{ doing{|, {buddy|pal|guy}}|}?}
Testing SpinRE:   Oh! You're here!
Testing SpinRE:   Oh! How are you doing?
Testing SpinRE:   Oh! How are you?
Testing SpinRE:   Oh! How are you doing, buddy?
Testing SpinRE:   Oh! I'm here!
Testing SpinRE:   Oh! How are you doing, guy?
Testing SpinRE:   Oh! How are you doing?
Testing SpinRE:   Oh! I'm here!
Testing SpinRE:   Oh! I'm here!
Testing SpinRE:   Oh! How are you doing?
Testing SpinNoRE: Oh! How are you doing, buddy?
Testing SpinNoRE: Oh! You're here!
Testing SpinNoRE: Oh! How are you?
Testing SpinNoRE: Oh! How are you?
Testing SpinNoRE: Oh! You're here!
Testing SpinNoRE: Oh! I'm here!
Testing SpinNoRE: Oh! How are you doing?
Testing SpinNoRE: Oh! How are you?
Testing SpinNoRE: Oh! How are you doing, buddy?
Testing SpinNoRE: Oh! I'm here!

Time elapsed over 100,000 runs of each in alternation:

SpinRE:           03.686s
SpinNoRE:         00.921s

(It's been more than 6 years since I touched C#. Please forgive and point out any mistakes.)

like image 32
slackwing Avatar answered Dec 29 '22 16:12

slackwing