I'm writing an application that is intended to process thousands of articles/entries with large number of spintax in the following format:
{Hello|Hi} {World|There!}, how are you?
However when I run the application with profiler I noticed the part where Regex is handled is hogging a lot of resource and my application eventually crash due to out of memory issue. Can anyone suggest a way to improve my code or a better way to parse the spintax?
public static String Spin(String text)
{
Regex reg = new Regex(@"\{[^\{\}]*\}");
Random rand = new Random((int)DateTime.Now.Ticks);
while (true)
{
Match m = reg.Match(text);
if (!m.Success) break;
String[] parts = m.Value.TrimStart('{').TrimEnd('}').Split('|');
int i = rand.Next(parts.Length);
text = text.Substring(0, m.Index) + parts[i] + text.Substring(m.Index + m.Length);
}
return text;
}
I have implemented my fast version (no Regex
, no Split
, no Substring
, no Replace
and other string manipulation methods). To copy string I'm using String.CopyTo
which copies symbols to a plain char
array.
This code fully supports nested Spintaxes (potentially unlimited depth). One restriction is maximum number of options per Spintax, currently it is 100, but can be changed to 1000 or more... Another restriction is maximum length of input string, it is 100000 now, but can be increased too.
Concerning performance - my tests showed that this code is >15 times faster than any optimised Regex solution (including Jim Mischel's one) and ~5 times faster than versions which are using Substring and other String manipulation methods. I have tested this in Release mode with Optimize Code setting in VS 2012.
static int[] partIndices = new int[100];
static int[] depth = new int[100];
static char[] symbolsOfTextProcessed = new char[100000];
public static String SpinEvenMoreFaster(String text)
{
int cur = SpinEvenMoreFasterInner(text, 0, text.Length, 0);
return new String(symbolsOfTextProcessed, 0, cur);
}
public static int SpinEvenMoreFasterInner(String text, int start, int end, int symbolIndex)
{
int last = start;
for (int i = start; i < end; i++)
{
if (text[i] == '{')
{
int k = 1;
int j = i + 1;
int index = 0;
partIndices[0] = i;
depth[0] = 1;
for (; j < end && k > 0; j++)
{
if (text[j] == '{')
k++;
else if (text[j] == '}')
k--;
else if (text[j] == '|')
{
if (k == 1)
{
partIndices[++index] = j;
depth[index] = 1;
}
else
depth[index] = k;
}
}
if (k == 0)
{
partIndices[++index] = j - 1;
int part = rand.Next(index);
text.CopyTo(last, symbolsOfTextProcessed, symbolIndex, i - last);
symbolIndex += i - last;
if (depth[part] == 1)
{
text.CopyTo(partIndices[part] + 1, symbolsOfTextProcessed, symbolIndex, partIndices[part + 1] - partIndices[part] - 1);
symbolIndex += partIndices[part + 1] - partIndices[part] - 1;
}
else
{
symbolIndex = SpinEvenMoreFasterInner(text, partIndices[part] + 1, partIndices[part + 1], symbolIndex);
}
i = j - 1;
last = j;
}
}
}
text.CopyTo(last, symbolsOfTextProcessed, symbolIndex, end - last);
return symbolIndex + end - last;
}
Here's a non-Regex alternative.
Update 2012-12-27 (see new ideone demo)
RegexOptions.Compiled
, and use Substring
instead of TrimLeft
and TrimRight
. These optimizations cut down execution time of OP's code by nearly 33%.SpinNoRE
to handle arbitrarily nested spintaxes, optimized code, and added comments.Spin
and SpinFaster
to SpinRE
and SpinNoRE
respectively, for clarity.New ideone demo available; code below (comments available in demo; see link):
public static String SpinNoRE(String text)
{
int i, j, e = -1;
char[] curls = new char[] {'{', '}'};
text += '~';
do
{
i = e;
e = -1;
while ((i = text.IndexOf('{', i+1)) != -1)
{
j = i;
while ((j = text.IndexOfAny(curls, j+1)) != -1 && text[j] != '}')
{
if (e == -1) e = i;
i = j;
}
if (j != -1)
{
parts = text.Substring(i+1, (j-1)-(i+1-1)).Split('|');
text = text.Remove(i, j-(i-1)).Insert(i, parts[rand.Next(parts.Length)]);
}
}
}
while (e-- != -1);
return text.Remove(text.Length-1);
}
Result:
Input Text: Oh! {{I'm|You're} here!|How are you{ doing{|, {buddy|pal|guy}}|}?}
Testing SpinRE: Oh! You're here!
Testing SpinRE: Oh! How are you doing?
Testing SpinRE: Oh! How are you?
Testing SpinRE: Oh! How are you doing, buddy?
Testing SpinRE: Oh! I'm here!
Testing SpinRE: Oh! How are you doing, guy?
Testing SpinRE: Oh! How are you doing?
Testing SpinRE: Oh! I'm here!
Testing SpinRE: Oh! I'm here!
Testing SpinRE: Oh! How are you doing?
Testing SpinNoRE: Oh! How are you doing, buddy?
Testing SpinNoRE: Oh! You're here!
Testing SpinNoRE: Oh! How are you?
Testing SpinNoRE: Oh! How are you?
Testing SpinNoRE: Oh! You're here!
Testing SpinNoRE: Oh! I'm here!
Testing SpinNoRE: Oh! How are you doing?
Testing SpinNoRE: Oh! How are you?
Testing SpinNoRE: Oh! How are you doing, buddy?
Testing SpinNoRE: Oh! I'm here!
Time elapsed over 100,000 runs of each in alternation:
SpinRE: 03.686s
SpinNoRE: 00.921s
(It's been more than 6 years since I touched C#. Please forgive and point out any mistakes.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With