How can I run lots of RegExes (to find matches) in big strings without causing LOH fragmentation?
It's .NET Framework 4.0, so I'm using StringBuilder to keep the data out of the LOH. However, as soon as I need to run a RegEx on it I have to call StringBuilder.ToString(), which means the resulting string will be in the LOH.
Is there any solution to this problem? It's virtually impossible to have a long-running application that deals with big strings and RegExes like this.
An Idea to Solve this problem:
While thinking about this problem, I think I found a dirty solution.
At any given time I only have 5 strings, and these 5 strings (each bigger than 85KB) will be passed to RegEx.Match.
Since the fragmentation occurs because new objects won't fit into the empty spaces in the LOH, this should solve the problem: PadRight all strings to a maximum accepted size, let's say 1024KB (I might need to do this with StringBuilder).
I suppose the biggest problem with this design is what happens if other big objects take this location in the LOH, which would cause the application to allocate lots of 1024KB strings, maybe with even worse fragmentation. The fixed statement might help; however, how can I send a fixed string to RegEx without actually creating a new string that is not located at a fixed memory address?
Any ideas about this theory? (Unfortunately I can't reproduce the problem easily, I'm generally trying to use a memory profiler to observe the changes and not sure what kind of isolated test case I can write for this)
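One small isolated check that may help when building a test case: the runtime reports objects on the LOH as belonging to the highest generation as soon as they are allocated, so GC.GetGeneration can be used to confirm whether a given string ended up there. This is only a sketch; the LohProbe class and IsOnLargeObjectHeap method are hypothetical names made up for illustration, and the check only tells you where an object lives, not whether the heap is fragmented.

```csharp
using System;

// Hypothetical helper, not part of the original code.
internal static class LohProbe
{
    // Returns true when the runtime reports the object in the highest
    // generation. For a freshly allocated object this indicates an LOH
    // allocation. A small object that has survived several collections
    // is also promoted to that generation, so only call this right
    // after allocating.
    public static bool IsOnLargeObjectHeap(object obj)
    {
        return GC.GetGeneration(obj) == GC.MaxGeneration;
    }
}
```

For example, LohProbe.IsOnLargeObjectHeap(new string('z', 50000)) should report true, because 50,000 chars take about 100,000 bytes, which is over the roughly 85,000-byte LOH threshold, while a 1,000-char string should report false.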
OK, here is my attempt to solve this problem in a fairly generic way, but with some obvious limitations. Since I haven't seen this advice anywhere and everyone is whining about LOH fragmentation, I wanted to share the code to confirm that my design and assumptions are correct.
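Theory:
- Create one shared, large StringBuilder to hold the big strings read from streams: new StringBuilder(ChunkSize * 5);
- Create one large string, bigger than the maximum accepted size and initialized with spaces: new string(' ', ChunkSize * 10);
- Pin that string so the GC leaves it alone: GCHandle.Alloc(pinnedText, GCHandleType.Pinned). Even though the GC does not normally move LOH objects, pinning seems to improve the performance. Maybe because of the unsafe code.
- Fill the shared StringBuilder, then copy it into the pinned string with unsafe code.
- Pass the pinned string to the RegEx.
With this implementation the code below behaves as if there were no LOH allocations at all. If I switch to new string(' ') allocations instead of using a static StringBuilder, or use StringBuilder.ToString(), the code can allocate 300% less memory before crashing with an OutOfMemoryException.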
I also confirmed with a memory profiler that there is no LOH fragmentation in this implementation. I still don't understand why RegEx doesn't cause any unexpected problems. I also tested with different, more expensive RegEx patterns and the results are the same: no fragmentation.
Code:
http://pastebin.com/ZuuBUXk3
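using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;
using System.Text.RegularExpressions;

// Uses unsafe code, so it must be compiled with /unsafe
namespace LOH_RegEx
{
    internal class Program
    {
        private static List<string> storage = new List<string>();
        private const int ChunkSize = 100000;

        // Shared builder, allocated once and reused for every input
        private static StringBuilder _sb = new StringBuilder(ChunkSize * 5);

        private static void Main(string[] args)
        {
            // Single large string, allocated once and overwritten in place
            var pinnedText = new string(' ', ChunkSize * 10);
            var sourceCodePin = GCHandle.Alloc(pinnedText, GCHandleType.Pinned);
            var rgx = new Regex("A", RegexOptions.CultureInvariant | RegexOptions.Compiled);

            try
            {
                for (var i = 0; i < 30000; i++)
                {
                    //Simulate that we read data from stream to SB
                    UpdateSB(i);
                    CopyInto(pinnedText);

                    var rgxMatch = rgx.Match(pinnedText);
                    if (!rgxMatch.Success)
                    {
                        Console.WriteLine("RegEx failed!");
                        Console.ReadLine();
                    }

                    //Extra buffer to fragment LoH
                    storage.Add(new string('z', 50000));

                    if ((i % 100) == 0)
                    {
                        Console.Write(i + ",");
                    }
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
                Console.WriteLine("OOM Crash!");
                Console.ReadLine();
            }
            finally
            {
                // Release the pin once we are done with the buffer
                sourceCodePin.Free();
            }
        }

        private static unsafe void CopyInto(string text)
        {
            // Leave room for the terminator inside the buffer
            if (_sb.Length >= text.Length)
            {
                throw new ArgumentException("Pinned buffer is too small for the data.");
            }

            fixed (char* pChar = text)
            {
                int i;
                for (i = 0; i < _sb.Length; i++)
                {
                    pChar[i] = _sb[i];
                }

                // i is already one past the last copied char, so terminate here.
                // The string's Length is unchanged, so Regex still scans the whole
                // buffer; the '\0' only marks where the fresh data ends.
                pChar[i] = '\0';
            }
        }

        private static void UpdateSB(int extraSize)
        {
            // Empty the shared builder without allocating a new one
            _sb.Remove(0, _sb.Length);

            var rnd = new Random();
            for (var i = 0; i < ChunkSize + extraSize; i++)
            {
                // Random chars in the range 60..79, which includes 'A' (65)
                _sb.Append((char)rnd.Next(60, 80));
            }
        }
    }
}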