Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is Where on an Array (of a struct type) optimized to avoid needless copying of struct values?

For memory performance reasons I have an array of structures since the number of items is large and the items get tossed regularly and hence thrashing the GC heap. This is not a question of whether I should use large structures; I have already determined GC trashing is causing performance problems. My question is when I need to process this array of structures, should I avoid using LINQ? Since the structure is not small it is not wise to pass it around by value, and I have no idea if the LINQ code generator is smart enough to do this or not. The structure looks like this:

public struct ManufacturerValue
{
    public int ManufacturerID;
    public string Name;
    public string CustomSlug;
    public string Title;
    public string Description;
    public string Image;
    public string SearchFilters;
    public int TopZoneProduction;
    public int TopZoneTesting;
    public int ActiveProducts;
}

So let's say we have an array of these values and I want to extract a dictionary of custom slugs to manufacturers ID's. Before I changed this to a structure it was a class, so the original code was written with a simple LINQ query:

ManufacturerValue[] = GetManufacturerValues();
var dict = values.Where(p => !string.IsNullOrEmpty(p.CustomSlug))
                 .ToDictionary(p => p.CustomSlug, p => p.ManufacturerID);

My concern is I want to understand how LINQ is going to generate the actual code to build this dictionary. My suspicion is that internally the LINQ code is going to end up something like this naive implementation:

var dict = new Dictionary<string, int>();
for (var i = 0; i < values.Length; i++) {
    var value = values[i];
    if (!string.IsNullOrEmpty(value.CustomSlug)) {
        dict.Add(value.CustomSlug, value.ManufacturerID);
    }
}

which would be bad, because the third line is going to create a local copy of the structure, which will be slow because the structure is large and will instead thrash the memory bus. We also do not need anything but the ID and custom slug from it so it will copy a lot of useless information on every iteration. Rather if I coded it efficiently myself, I would write it like this:

var dict = new Dictionary<string, int>();
for (var i = 0; i < values.Length; i++) {
    if (!string.IsNullOrEmpty(values[i].CustomSlug)) {
        dict.Add(values[i].CustomSlug, values[i].ManufacturerID);
    }
}

So does anyone know if the code generator is smart enough to use simple array indexing like the second example when generator code to run over arrays of structures, or will it implement the more naive but slower first implementation?

What is the best way to decompile this kind of code to find out what the code generator would actually do for this?

UPDATE

These changes are now in production. As it turns out in the process of re-writing the code and using the Dot Memory profiler to identify how much memory was being used and where, I found two memory leaks in the Phalanger PHP compiler code. That was one of the reasons the amount of memory our processes were using kept growing, and one of the memory leaks was really nasty and actually caused by the Microsoft Async code (probably worth a blog or a stack overflow question/answer to help others avoid it).

Anyway, once I found the memory leaks and fixed them I pushed that code live without any of the memory optimizations to convert from classes to structures, and oddly enough this actually caused the GC to thrash even more. I was seeing periods of time when the GC would be using up to 27% of the CPU according to the performance counters. Most likely these big blocks were previously not getting GC'ed due to the memory leaks, so they simply hung around. Once the code was fixed the GC started behaving even worse than before.

Finally we finished up the code to convert these classes to structures using the feedback in this question, and now our total memory usage at peak is about 50% of what it was, it rapidly drops down when the load on the server goes away and more importantly we are seeing only 0.05% of the CPU being used for GC, if even that. So if anyone is wondering whether these changes can have an impact on the real world, they really can, especially if you have objects that normally hang around for a while so get stuck in the 2nd gen heap and then need to get tossed and garbage collected.

like image 906
Kendall Bennett Avatar asked Mar 19 '16 22:03

Kendall Bennett


People also ask

How do you make a struct immutable?

Declare immutable structs as readonly Declare a readonly struct to indicate that a type is immutable. The readonly modifier informs the compiler that your intent is to create an immutable type.

How is an array of structs stored in memory?

An array is a contiguous block of memory in c, so an array of structs will have a size of sizeof(struct b)*number_of_structs . All of the structs are allocated within the array.

Which is faster struct or class?

Classes are reference types, and structs are value types. If class inheritance is not needed, structs are faster and more memory efficient.


1 Answers

What is the best way to decompile this kind of code to find out what the code generator would actually do for this?

There is no need to decompile the code. All LINQ to Objects method implementation can be seen at Reference Source.

Regarding your concrete question. You can expect a lot of struct copy operations when using LINQ (and in general IEnumerable<T> and Func<T, ..> based methods).

For instance, the current element of IEnumerator<T> is accessed via property Current defined as follows

T Current { get; }

so accessing at least involves one copy. But enumerator implementations usually store the current element into a field during the MoveNext method, so I would say you can safely count 2 copy operations.

And of course, every Func<T, ...> will cause another copy because T is input argument.

So in general you should avoid LINQ in such scenarios.

Or, you can use the old school technique of simulating reference via array and index. So instead of this:

var dict = values
    .Where(p => !string.IsNullOrEmpty(p.CustomSlug))
    .ToDictionary(p => p.CustomSlug, p => p.ManufacturerID);

you can avoid struct copy by using this:

var dict = Enumerable.Range(0, values.Length)
    .Where(i => !string.IsNullOrEmpty(values[i].CustomSlug))
    .ToDictionary(i => values[i].CustomSlug, i => values[i].ManufacturerID);

UPDATE: Since seems like there is an interest to the subject, I'll provide you a variation of the last technique which can make your life easier still avoiding the excessive struct copy.

Let say your ManufacturerValue was a class and you have used a lot of LINQ queries like the one in the example. Then you switched to a struct.

You can also create a wrapper struct and helper extension method like this

public struct ManufacturerValue
{
    public int ManufacturerID;
    public string Name;
    public string CustomSlug;
    public string Title;
    public string Description;
    public string Image;
    public string SearchFilters;
    public int TopZoneProduction;
    public int TopZoneTesting;
    public int ActiveProducts;
}

public struct ManufacturerValueRef
{
    public readonly ManufacturerValue[] Source;
    public readonly int Index;
    public ManufacturerValueRef(ManufacturerValue[] source, int index) { Source = source; Index = index; }
    public int ManufacturerID => Source[Index].ManufacturerID;
    public string Name => Source[Index].Name;
    public string CustomSlug => Source[Index].CustomSlug;
    public string Title => Source[Index].Title;
    public string Description => Source[Index].Description;
    public string Image => Source[Index].Image;
    public string SearchFilters => Source[Index].SearchFilters;
    public int TopZoneProduction => Source[Index].TopZoneProduction;
    public int TopZoneTesting => Source[Index].TopZoneTesting;
    public int ActiveProducts => Source[Index].ActiveProducts;
}

public static partial class Utils
{
    public static IEnumerable<ManufacturerValueRef> AsRef(this ManufacturerValue[] values)
    {
        for (int i = 0; i < values.Length; i++)
            yield return new ManufacturerValueRef(values, i);
    }
}

It's additional (one time) effort, but with the following benefits:

(1) It's a struct, but with a fixed size, so the copy overhead will be negligible compared to normal reference (one additional int).
(2) You can extend the actual data struct size w/o worry.
(3) All you need to do with your LINQ queries is to add .AsRef()

Sample:

var dict = values.AsRef()
    .Where(p => !string.IsNullOrEmpty(p.CustomSlug))
    .ToDictionary(p => p.CustomSlug, p => p.ManufacturerID);
like image 99
Ivan Stoev Avatar answered Oct 15 '22 10:10

Ivan Stoev