I have an application that keeps log strings in circular buffers. Once a buffer is full, every new insert releases an old string for garbage collection, and by then that string has already been promoted to generation 2 memory. Thus, eventually a generation 2 GC will happen, which I would like to avoid.
I tried to marshal the string into a struct. Surprisingly, I still get generation 2 GCs. It seems the struct still keeps a reference to the string. A complete console app is below. Any help appreciated.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication
{
    class Program
    {
        [StructLayout(LayoutKind.Sequential)]
        public struct FixedString
        {
            [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 256)]
            private string str;

            public FixedString(string str)
            {
                this.str = str;
            }
        }

        [StructLayout(LayoutKind.Sequential)]
        public struct UTF8PackedString
        {
            private int length;

            [MarshalAs(UnmanagedType.ByValArray, SizeConst = 256)]
            private byte[] str;

            public UTF8PackedString(int length)
            {
                this.length = length;
                str = new byte[length];
            }

            public static implicit operator UTF8PackedString(string str)
            {
                var obj = new UTF8PackedString(Encoding.UTF8.GetByteCount(str));
                var bytes = Encoding.UTF8.GetBytes(str);
                Array.Copy(bytes, obj.str, obj.length);
                return obj;
            }
        }

        const int BufferSize = 1000000;
        const int LoopCount = 10000000;

        static void Main(string[] args)
        {
            Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}",
                "Type".PadRight(20), "Time", "GC(0)", "GC(1)", "GC(2)");
            Console.WriteLine();
            for (int i = 0; i < 5; i++)
            {
                TestPerformance<string>(s => s);
                TestPerformance<FixedString>(s => new FixedString(s));
                TestPerformance<UTF8PackedString>(s => s);
                Console.WriteLine();
            }
            Console.ReadKey();
        }

        private static void TestPerformance<T>(Func<string, T> func)
        {
            var buffer = new T[BufferSize];
            GC.Collect(2);
            Stopwatch stopWatch = new Stopwatch();
            var initialCollectionCounts = new int[] { GC.CollectionCount(0), GC.CollectionCount(1), GC.CollectionCount(2) };
            stopWatch.Reset();
            stopWatch.Start();
            for (int i = 0; i < LoopCount; i++)
                buffer[i % BufferSize] = func(i.ToString());
            stopWatch.Stop();
            Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}",
                typeof(T).Name.PadRight(20),
                stopWatch.ElapsedMilliseconds,
                (GC.CollectionCount(0) - initialCollectionCounts[0]),
                (GC.CollectionCount(1) - initialCollectionCounts[1]),
                (GC.CollectionCount(2) - initialCollectionCounts[2])
            );
        }
    }
}
Edit: Updated code with UnsafeFixedString that does the required work:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication
{
    class Program
    {
        public unsafe struct UnsafeFixedString
        {
            private int length;
            private fixed char str[256];

            public UnsafeFixedString(int length)
            {
                this.length = length;
            }

            public static implicit operator UnsafeFixedString(string str)
            {
                var obj = new UnsafeFixedString(str.Length);
                for (int i = 0; i < str.Length; i++)
                    obj.str[i] = str[i];
                return obj;
            }
        }

        const int BufferSize = 1000000;
        const int LoopCount = 10000000;

        static void Main(string[] args)
        {
            Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}",
                "Type".PadRight(20), "Time", "GC(0)", "GC(1)", "GC(2)");
            Console.WriteLine();
            for (int i = 0; i < 5; i++)
            {
                TestPerformance(s => s);
                TestPerformance<UnsafeFixedString>(s => s);
                Console.WriteLine();
            }
            Console.ReadKey();
        }

        private static void TestPerformance<T>(Func<string, T> func)
        {
            var buffer = new T[BufferSize];
            GC.Collect(2);
            Stopwatch stopWatch = new Stopwatch();
            var initialCollectionCounts = new int[] { GC.CollectionCount(0), GC.CollectionCount(1), GC.CollectionCount(2) };
            stopWatch.Reset();
            stopWatch.Start();
            for (int i = 0; i < LoopCount; i++)
                buffer[i % BufferSize] = func(String.Format("{0}", i));
            stopWatch.Stop();
            Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}",
                typeof(T).Name.PadRight(20),
                stopWatch.ElapsedMilliseconds,
                (GC.CollectionCount(0) - initialCollectionCounts[0]),
                (GC.CollectionCount(1) - initialCollectionCounts[1]),
                (GC.CollectionCount(2) - initialCollectionCounts[2])
            );
        }
    }
}
Output on my computer is:
Type                 Time    GC(0)   GC(1)   GC(2)
String               5746    160     71      19
UnsafeFixedString    5345    418     0       0
It should not be a surprise that a struct with a string field makes no difference here: a string field is always simply a reference to an object on the managed heap - specifically, a string object somewhere. The string will still exist and still cause GC2 eventually.
The only way to "fix" this is to not have it as an object at all; and the only way to do that (without going completely outside of managed memory) is to use a fixed buffer:
public unsafe struct FixedString
{
    private fixed char str[100];
}
Here, every FixedString instance has 200 bytes reserved for the data. str is simply a relative offset to the char* that marks the start of this reservation. However, working with this is tricky - and requires unsafe code throughout. Also note that every FixedString reserves the same amount of space regardless of whether you actually want to store 3 characters or the full 100. To avoid memory issues, you would need to either use null-terminators or store the payload length separately.
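As a rough sketch of the "store the payload length separately" option, something like the following could work. The names InlineString, Capacity, From and Demo are hypothetical and not part of the original answer; truncating over-long input is just one possible policy, and the code needs to be compiled with unsafe code enabled (/unsafe).

```csharp
using System;

// Hypothetical illustration type: text stored inline in the struct, no string reference.
public unsafe struct InlineString
{
    public const int Capacity = 100;     // chars reserved per instance (200 bytes)

    private int length;                  // number of chars actually in use
    private fixed char chars[Capacity];  // inline storage, not a heap object

    public int Length { get { return length; } }

    // Copies at most Capacity chars from source; longer input is truncated (assumed policy).
    public static InlineString From(string source)
    {
        InlineString result = default(InlineString);
        int count = Math.Min(source.Length, Capacity);
        for (int i = 0; i < count; i++)
            result.chars[i] = source[i];
        result.length = count;
        return result;
    }

    // Allocates a new managed string, so call this only when the text is really needed.
    public override string ToString()
    {
        fixed (char* p = chars)
            return new string(p, 0, length);
    }
}

public static class Demo
{
    public static void Main()
    {
        InlineString s = InlineString.From("hello");
        Console.WriteLine("{0} ({1} chars)", s.ToString(), s.Length);
    }
}
```

Because the length is stored next to the data, reading the value back never has to scan for a terminator, and input longer than the reservation cannot overrun the buffer.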
Note that in .NET 4.5, the <gcAllowVeryLargeObjects> support makes it possible to have a decent-sized array of such values (a FixedString[], for example) - but note that you don't want to copy the data very often. To avoid that, you would want to always allow spare space in the array (so you don't copy the entire array just to add one item), and work with individual items via ref, i.e.
FixedString[] data = ...
int index = ...
ProcessItem(ref data[index]);
void ProcessItem(ref FixedString item) {
    // ...
}
Here item is talking directly to the element in the array - we have not copied the data out at any point.
Now we only have one object - the array itself.
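To make the difference between copying an element and working on it via ref concrete, here is a small sketch. Entry, Touch and RefDemo are hypothetical names used only for illustration, and Entry is a plain stand-in for a struct like FixedString so that the example does not need unsafe code.

```csharp
using System;

// Hypothetical stand-in for a larger value type such as FixedString.
public struct Entry
{
    public int Length;
}

public static class RefDemo
{
    // Works on the caller's storage directly; no copy of the struct is made.
    public static void Touch(ref Entry item)
    {
        item.Length = 42;
    }

    public static void Main()
    {
        Entry[] data = new Entry[4];

        Entry copy = data[1];   // copies the whole struct out of the array
        copy.Length = 7;        // changes only the local copy
        Console.WriteLine(data[1].Length);

        Touch(ref data[1]);     // passes a reference to the array element itself
        Console.WriteLine(data[1].Length);
    }
}
```

With a 200-byte struct, the copy on the `Entry copy = data[1]` line is the cost that the ref approach avoids.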
const int BufferSize = 1000000;
Your buffer is simply too large: it can hold each string reference for so long that the string is promoted past gen#1 before it is released. Experimenting with the buffer size provides this solution:
const int BufferSize = 180000;
No more GC(2) collections.
You could infer the gen#1 heap size from this, although that is difficult to do for this test program because the string sizes are too variable. Hand-tuning would be required in a real app anyway.
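The experiment could be automated along these lines. This is only a sketch: BufferSizeProbe and CountGen2 are hypothetical names, the candidate sizes are arbitrary apart from the two values mentioned above, and the counts will vary with runtime version, GC mode and machine.

```csharp
using System;

public static class BufferSizeProbe
{
    // Returns how many gen 2 collections happened while cycling strings through the buffer.
    public static int CountGen2(int bufferSize, int loopCount)
    {
        string[] buffer = new string[bufferSize];
        GC.Collect();                          // start from a clean heap
        int before = GC.CollectionCount(2);
        for (int i = 0; i < loopCount; i++)
            buffer[i % bufferSize] = i.ToString();
        GC.KeepAlive(buffer);                  // keep the buffer reachable for the whole loop
        return GC.CollectionCount(2) - before;
    }

    public static void Main()
    {
        int[] sizes = { 1000000, 500000, 180000, 50000 };
        Console.WriteLine("BufferSize\tGC(2)");
        foreach (int size in sizes)
            Console.WriteLine("{0}\t{1}", size, CountGen2(size, 10000000));
    }
}
```

The largest size that still reports zero gen 2 collections gives a starting point for hand-tuning, not a guarantee.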