How to avoid long-lived strings causing generation 2 garbage collections

I have an application where I keep log strings in circular buffers. Once a buffer is full, every new insert releases the oldest string for garbage collection, and by then that string has already been promoted to generation 2. As a result, a generation 2 GC eventually happens, which I would like to avoid.

I tried to marshal the string into a struct. Surprisingly, I still get generation 2 GCs. It seems the struct still keeps some reference to the string. The complete console app is below. Any help is appreciated.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication
{
    class Program
    {

        [StructLayout(LayoutKind.Sequential)]
        public struct FixedString
        {
            [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 256)]
            private string str;

            public FixedString(string str)
            {
                this.str = str;
            }
        }

        [StructLayout(LayoutKind.Sequential)]
        public struct UTF8PackedString
        {
            private int length;

            [MarshalAs(UnmanagedType.ByValArray, SizeConst = 256)]
            private byte[] str;

            public UTF8PackedString(int length)
            {
                this.length = length;
                str = new byte[length];
            }

            public static implicit operator UTF8PackedString(string str)
            {
                var obj = new UTF8PackedString(Encoding.UTF8.GetByteCount(str));
                var bytes = Encoding.UTF8.GetBytes(str);
                Array.Copy(bytes, obj.str, obj.length);
                return obj;
            }
        }

        const int BufferSize = 1000000;
        const int LoopCount = 10000000;

        static void Main(string[] args)
        {
            Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}",
                "Type".PadRight(20), "Time", "GC(0)", "GC(1)", "GC(2)");
            Console.WriteLine();
            for (int i = 0; i < 5; i++)
            {
                TestPerformance<string>(s => s);
                TestPerformance<FixedString>(s => new FixedString(s));
                TestPerformance<UTF8PackedString>(s => s);
                Console.WriteLine();
            }
            Console.ReadKey();
        }

        private static void TestPerformance<T>(Func<string, T> func)
        {
            var buffer = new T[BufferSize];
            GC.Collect(2);
            Stopwatch stopWatch = new Stopwatch();
            var initialCollectionCounts = new int[] { GC.CollectionCount(0), GC.CollectionCount(1), GC.CollectionCount(2) };
            stopWatch.Reset();
            stopWatch.Start();
            for (int i = 0; i < LoopCount; i++)
                buffer[i % BufferSize] = func(i.ToString());
            stopWatch.Stop();
            Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}",
                typeof(T).Name.PadRight(20),
                stopWatch.ElapsedMilliseconds,
                (GC.CollectionCount(0) - initialCollectionCounts[0]),
                (GC.CollectionCount(1) - initialCollectionCounts[1]),
                (GC.CollectionCount(2) - initialCollectionCounts[2])
            );
        }
    }
}

Edit: Updated code with an UnsafeFixedString that does the required work (it must be compiled with unsafe code enabled):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication
{
    class Program
    {
        public unsafe struct UnsafeFixedString
        {
            private int length;

            private fixed char str[256];

            public UnsafeFixedString(int length)
            {
                this.length = length;
            }

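            public static implicit operator UnsafeFixedString(string str)
            {
                // Fixed buffers are not bounds-checked: copy at most 256 chars
                // so longer strings cannot overwrite memory past the struct.
                int count = Math.Min(str.Length, 256);
                var obj = new UnsafeFixedString(count);
                for (int i = 0; i < count; i++)
                    obj.str[i] = str[i];
                return obj;
            }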
        }

        const int BufferSize = 1000000;
        const int LoopCount = 10000000;

        static void Main(string[] args)
        {
            Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}",
                "Type".PadRight(20), "Time", "GC(0)", "GC(1)", "GC(2)");
            Console.WriteLine();
            for (int i = 0; i < 5; i++)
            {
                TestPerformance(s => s);
                TestPerformance<UnsafeFixedString>(s => s);
                Console.WriteLine();
            }
            Console.ReadKey();
        }

        private static void TestPerformance<T>(Func<string, T> func)
        {
            var buffer = new T[BufferSize];
            GC.Collect(2);
            Stopwatch stopWatch = new Stopwatch();
            var initialCollectionCounts = new int[] { GC.CollectionCount(0), GC.CollectionCount(1), GC.CollectionCount(2) };
            stopWatch.Reset();
            stopWatch.Start();
            for (int i = 0; i < LoopCount; i++)
                buffer[i % BufferSize] = func(String.Format("{0}", i));
            stopWatch.Stop();
            Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}",
                typeof(T).Name.PadRight(20),
                stopWatch.ElapsedMilliseconds,
                (GC.CollectionCount(0) - initialCollectionCounts[0]),
                (GC.CollectionCount(1) - initialCollectionCounts[1]),
                (GC.CollectionCount(2) - initialCollectionCounts[2])
            );
        }
    }
}

Output on my computer is:

Type                    Time    GC(0)   GC(1)   GC(2)

String                  5746    160     71      19
UnsafeFixedString       5345    418     0       0
Johan Nilsson, asked Sep 23 '13 10:09


2 Answers

It should not be a surprise that a struct with a string field makes no difference here: a string field is always simply a reference to an object on the managed heap - specifically, a string object somewhere. The string will still exist and will still eventually cause a GC2.

The only way to "fix" this is to not have it as an object at all; and the only way to do that (without going completely outside of managed memory) is to use a fixed buffer:

public unsafe struct FixedString
{
    private fixed char str[100];
}

Here, every FixedString struct instance has 200 bytes (100 chars at 2 bytes each) reserved for the data. str is simply a relative offset to the char* that marks the start of this reservation. However, working with this is tricky and requires unsafe code throughout. Also note that every FixedString reserves the same amount of space regardless of whether you actually want to store 3 characters or 100. To avoid memory issues, you would need to either use null terminators or store the payload length separately.
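A minimal sketch of the "store the payload length separately" option follows. The `Set`/`ToString` members and the `Capacity` constant are hypothetical additions for illustration, not part of the answer above. It assumes the project is compiled with unsafe code enabled (`<AllowUnsafeBlocks>true</AllowUnsafeBlocks>` or `/unsafe`), and it truncates over-long input because fixed buffers are not bounds-checked:

```csharp
using System;

// Hypothetical fixed-capacity string stored inline in the struct,
// with the payload length kept in its own field (no null terminator).
public unsafe struct FixedString
{
    public const int Capacity = 100;   // 100 chars = 200 bytes of inline storage

    private int length;                // number of chars actually in use
    private fixed char str[Capacity];  // inline storage, not a separate heap object

    public int Length => length;

    // Copies up to Capacity chars; longer input is truncated so we never
    // write past the end of the fixed buffer.
    public void Set(string value)
    {
        int count = Math.Min(value.Length, Capacity);
        fixed (char* dst = str)        // pin 'this', which may live inside an array
        {
            for (int i = 0; i < count; i++)
                dst[i] = value[i];
        }
        length = count;
    }

    // Allocates a managed string only when the entry is actually read.
    public override string ToString()
    {
        fixed (char* src = str)
            return new string(src, 0, length);
    }
}
```

Because the characters live inside the struct, a `FixedString[]` is a single heap object; in this sketch only `ToString` allocates, and only when an entry is read back.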

Note that in .NET 4.5, the <gcAllowVeryLargeObjects> support makes it possible to have a decent-sized array of such values (a FixedString[], for example), but you don't want to copy the data very often. To avoid that, you would want to always allow spare space in the array (so you don't copy the entire array just to add one item), and work with individual items via ref, i.e.

FixedString[] data = ...
int index = ...
ProcessItem(ref data[index]);

void ProcessItem(ref FixedString item) {
    // ...
}

Here item is talking directly to the element in the array - we have not copied the data out at any point.

Now we only have one object - the array itself.
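To illustrate the ref point, here is a sketch that uses a hypothetical one-field `Entry` struct as a stand-in for a large fixed-buffer struct. The names `Entry`, `RefDemo` and the slot arithmetic are illustrative only. It also uses a C# 7 ref local, which postdates this answer, to write into a ring-buffer slot without copying the struct out:

```csharp
using System;

// Hypothetical stand-in for a large struct such as FixedString; one field is
// enough to show whether a write reached the array slot or only a copy.
public struct Entry
{
    public int Value;
}

public static class RefDemo
{
    // Receives a reference to the array element: the write lands in the array.
    public static void ProcessByRef(ref Entry item) => item.Value = 42;

    // Receives a copy of the element: the write is lost when the method returns.
    public static void ProcessByValue(Entry item) => item.Value = 42;

    public static string Run()
    {
        var data = new Entry[8];
        ProcessByRef(ref data[3]);
        ProcessByValue(data[5]);

        // Ref local aliasing a ring-buffer slot (10 % 8 == 2), no copy made.
        int sequence = 10;
        ref Entry slot = ref data[sequence % data.Length];
        slot.Value = 7;

        return $"{data[3].Value} {data[5].Value} {data[2].Value}";
    }
}
```

`RefDemo.Run()` returns `"42 0 7"`: the by-value call changed only its own copy, while the ref parameter and the ref local both wrote straight into the array.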

Marc Gravell, answered Sep 24 '22 09:09


    const int BufferSize = 1000000;

Your buffer is simply too large: it holds each string reference long enough for the string to survive gen #1 collections and be promoted to gen #2. Experimenting with the buffer size gives this solution:

    const int BufferSize = 180000;

No more GC(2) collections.

You could infer the gen #1 heap size from this, although that is difficult to do with this test program because the string sizes vary too much. Hand-tuning would be required in a real app anyway.
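One way to watch the promotion described above is `GC.GetGeneration`. Below is a minimal sketch with a hypothetical `PromotionDemo.GenerationAfter` helper. Forced `GC.Collect()` calls stand in for the natural gen 0 and gen 1 collections that occur while a large buffer keeps each string alive, and exact results may vary with the GC configuration:

```csharp
using System;

// Hypothetical helper: shows how a string that stays referenced (e.g. by a
// slot in a large ring buffer) moves up a generation each time it survives a GC.
public static class PromotionDemo
{
    public static int GenerationAfter(int collections)
    {
        // One-slot "buffer" keeps the string reachable across the collections.
        var buffer = new string[1];
        buffer[0] = 12345.ToString();   // freshly allocated, normally gen 0

        for (int i = 0; i < collections; i++)
            GC.Collect();               // full, blocking collection

        // The result never exceeds GC.MaxGeneration.
        return GC.GetGeneration(buffer[0]);
    }
}
```

Typically this yields 0, 1 and 2 for 0, 1 and 2 collections. A smaller buffer drops each reference before the string has survived enough collections to reach gen 2, which is the effect of reducing BufferSize.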

Hans Passant, answered Sep 23 '22 09:09