C# .NET Core 2.1 Span and Memory Performance Considerations

Question

using System;
using System.Buffers;
using System.IO;
using System.Threading.Tasks;

const byte carriageReturn = (byte)'\r';
const int arbitrarySliceStart = 5;

// using Memory<T>
async Task<int> ReadAsyncWithMemory(Stream sourceStream, int bufferSize)
{
    var buffer = ArrayPool<byte>.Shared.Rent(bufferSize);
    var bytesRead = await sourceStream.ReadAsync(buffer);
    var memory = buffer.AsMemory(arbitrarySliceStart, bytesRead);
    var endOfNumberIndex = memory.Span.IndexOf(carriageReturn);
    var memoryChunk = memory.Slice(0, endOfNumberIndex);
    var number = BitConverter.ToInt32(memoryChunk.Span);
    ArrayPool<byte>.Shared.Return(buffer);
    return number;
}

// using Span<T> without assigning to variable
async Task<int> ReadAsyncWithSpan(Stream sourceStream, int bufferSize)
{
    var buffer = ArrayPool<byte>.Shared.Rent(bufferSize);
    var bytesRead = await sourceStream.ReadAsync(buffer);
    var endOfNumberIndex = buffer.AsSpan(arbitrarySliceStart, bytesRead).IndexOf(carriageReturn);
    var number = BitConverter.ToInt32(buffer.AsSpan(arbitrarySliceStart, bytesRead).Slice(0, endOfNumberIndex));
    ArrayPool<byte>.Shared.Return(buffer);
    return number;
}

// using Span<T> with additional local or private function
async Task<int> ReadAsyncWithSpanAndAdditionalFunction(Stream sourceStream, int bufferSize)
{
    var buffer = ArrayPool<byte>.Shared.Rent(bufferSize);
    var bytesRead = await sourceStream.ReadAsync(buffer);

    var number = SliceNumber();
    ArrayPool<byte>.Shared.Return(buffer);
    return number;

    int SliceNumber()
    {
        var span = buffer.AsSpan(arbitrarySliceStart, bytesRead);
        var endOfNumberIndex = span.IndexOf(carriageReturn);
        var numberSlice = span.Slice(0, endOfNumberIndex);
        return BitConverter.ToInt32(numberSlice);
    }
}

I read the MSDN and CodeMag articles about Span<T>, but I still had a question about their performance.

I understand that Span<T> is more performant than Memory<T>, but I'd like to know to what degree. I've posted three example methods and would like to know which is the best approach.

1. Memory<T> only

The first function, ReadAsyncWithMemory, only uses Memory<T> to handle the work, pretty straightforward.

2. Span<T> with no local variables

In the second function, ReadAsyncWithSpan, Span<T> is used instead, but no Span<T> local variables are created, so the call buffer.AsSpan(arbitrarySliceStart, bytesRead) is made twice, which seems clunky. However, if Span<T> is more performant than Memory<T>, is it worth the double call?

3. Span<T> with additional function

In the third function, ReadAsyncWithSpanAndAdditionalFunction, a local function is introduced so that Span<T> can be used for memory operations. Now the question is, is calling a new function and introducing a new stack frame worth the performance gains of using Span<T> over Memory<T>?

Final Questions

Does adding a local variable for a span cause additional overhead?
- Is it worth losing readability to just inline the Span<T> without assigning it to a variable?
Is calling an additional function in order to use Span<T> over Memory<T> worth the overhead of the new function and stack frame?
Is Memory<T> significantly less performant than Span<T> when it is constrained to just a stack frame and not allocated to the heap?

SensorSmith · Accepted Answer

Bugs: There are some bugs/distractions in your example (if edited out of the question remove this section).

AsMemory/AsSpan take a start index and a length, so buffer.AsSpan(arbitrarySliceStart, bytesRead) is a bug: it ends arbitrarySliceStart bytes past the data actually read (or throws if that runs past the end of the rented array). It could just be buffer.AsSpan(0, bytesRead). If you intended to skip the first arbitrarySliceStart bytes read, it should have been buffer.AsSpan(arbitrarySliceStart, bytesRead - arbitrarySliceStart), perhaps with a check for (bytesRead > arbitrarySliceStart).
A full example expecting to read an integer text field starting at a fixed offset into a stream and terminated by a carriage return would need a loop to ensure "enough" data is read (...and handle if "too much" was read, etc.), but that is outside the topic at hand.
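A minimal sketch of such a loop, under stated assumptions: the field is ASCII decimal text starting at a hypothetical fixed offset fieldStart, it ends with '\r', and it fits in the rented buffer. ReadTextNumberAsync, FindTerminator and ParseField are illustrative names, not library APIs. It slices with a start index and a length (filled - fieldStart), and it parses text with Utf8Parser.TryParse (System.Buffers.Text) instead of BitConverter.ToInt32, which reinterprets four raw bytes. The Span work sits in non-async helpers, since the compiler versions discussed here reject Span locals in async methods.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: reads until a '\r' appears at or after fieldStart, then
// parses the ASCII digits between fieldStart and that terminator.
static async Task<int> ReadTextNumberAsync(Stream source, int fieldStart, int bufferSize)
{
    byte[] buffer = ArrayPool<byte>.Shared.Rent(bufferSize);
    try
    {
        int filled = 0; // total bytes read into buffer so far
        while (true)
        {
            // Only search once some bytes past the fixed offset have arrived.
            if (filled > fieldStart)
            {
                int fieldLength = FindTerminator(buffer, fieldStart, filled - fieldStart);
                if (fieldLength >= 0)
                    return ParseField(buffer, fieldStart, fieldLength);
            }

            if (filled == buffer.Length)
                throw new InvalidDataException("Field does not fit in the buffer.");

            // "Not enough" data yet: read more into the unused tail of the buffer.
            int read = await source.ReadAsync(buffer.AsMemory(filled));
            if (read == 0)
                throw new EndOfStreamException("Stream ended before the terminator.");
            filled += read;
        }
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }
}

// Non-async helpers, so Span values are allowed here.
// Slices are (start, length), not (start, end).
static int FindTerminator(byte[] buffer, int start, int length) =>
    buffer.AsSpan(start, length).IndexOf((byte)'\r');

static int ParseField(byte[] buffer, int start, int length)
{
    bool ok = Utf8Parser.TryParse(buffer.AsSpan(start, length), out int value, out int consumed);
    if (!ok || consumed != length)
        throw new FormatException("Field is not a plain decimal integer.");
    return value;
}
```

Bytes after the terminator (the "too much" case) are simply ignored in this sketch; a real protocol reader would keep them for the next field.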

This question seems to be about working around the compiler disallowing Span local variables in async functions. Hopefully, future versions will not enforce this limitation when the Span variables' usage/lifetime does not cross an await.

Does adding a local variable for a span cause additional overhead?

No.

Well, it could cause an extra assignment/copy of the underlying pointer and length fields that make up the Span (though not of the memory range they refer to). But even that should be optimized away, and the same copy could happen with the unnamed intermediates/temporaries anyway.
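A small illustration of that point, using made-up data: assigning a Span to another variable copies only its pointer and length, so both variables view the same bytes and a write through one is visible through the other.

```csharp
using System;

byte[] data = { 1, 2, 3, 4 };

Span<byte> original = data.AsSpan(1, 2); // views data[1] and data[2]
Span<byte> copy = original;              // copies the pointer and length, not the bytes
copy[0] = 42;

Console.WriteLine(original[0]); // 42: both spans see the same element
Console.WriteLine(data[1]);     // 42: the write went straight to the array
```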

This isn't why the compiler "does not like" Span variables. Span variables have to stay on the stack, or the referenced memory might get collected out from under them; i.e., as long as they stay on the stack, SOMETHING ELSE that references the memory must still be "below them" on the stack. Async/await "functions" return at the point of each await and then resume as continuations/state machine calls when the awaited Task completes, so any local that lives across an await has to be moved into the state machine, which is heap-allocated once the method suspends: exactly where a Span is not allowed to go.
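The stack-only rule is visible at runtime: Span<T> is a by-ref-like type (a ref struct), while Memory<T> is an ordinary struct, which is why only Memory<T> may end up inside a heap object such as an async state machine. A minimal check using Type.IsByRefLike (available since .NET Core 2.1):

```csharp
using System;

// Span<T> is declared as a "ref struct": it may only live on the stack.
Console.WriteLine(typeof(Span<byte>).IsByRefLike);   // True
// Memory<T> is a plain struct, so it can be boxed, stored in fields,
// or hoisted into an async method's state machine.
Console.WriteLine(typeof(Memory<byte>).IsByRefLike); // False
```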

NOTE: This isn't just about managed memory and the GC otherwise having to inspect Spans for references to GC-tracked objects. Spans can also refer to unmanaged memory, or into chunks of tracked objects.
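A short sketch of Spans over different kinds of memory: stack memory from stackalloc (never tracked by the GC), a chunk in the middle of a GC-tracked array, and a chunk of a string. Native heap memory would need pointers and unsafe code, so it is left out of this sketch.

```csharp
using System;

// Stack memory that the GC never tracks.
Span<int> onStack = stackalloc int[4];
onStack.Fill(7);

// A chunk in the middle of a GC-tracked array.
int[] array = { 10, 20, 30, 40, 50 };
Span<int> middle = array.AsSpan(1, 3);                // views 20, 30, 40

// A chunk of a string (read-only, because strings are immutable).
ReadOnlySpan<char> word = "hello world".AsSpan(6, 5); // views "world"

Console.WriteLine(onStack[3]);      // 7
Console.WriteLine(middle[0]);       // 20
Console.WriteLine(word.ToString()); // world
```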

Is it worth losing readability to just inline the Span without assigning it to a variable?

Well, that is purely a style/opinion question. However, "recreating" a Span means a function call but no allocations (just stack manipulation and copying a few integer-sized values), and the call itself is a good candidate for JIT inlining.
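To illustrate with a made-up buffer and a start index of 0 (the corrected form of the question's slice): recreating the span inline, as ReadAsyncWithSpan does, yields a span equal to one held in a local, meaning the same starting reference and the same length, so both styles compute the same result.

```csharp
using System;

byte[] buffer = { (byte)'4', (byte)'2', (byte)'\r', 0, 0 };
int bytesRead = 3;

// Inline: the span is recreated for each use, as in ReadAsyncWithSpan.
int endInline = buffer.AsSpan(0, bytesRead).IndexOf((byte)'\r');

// Local variable: the span is created once and reused.
Span<byte> span = buffer.AsSpan(0, bytesRead);
int endLocal = span.IndexOf((byte)'\r');

Console.WriteLine(endInline == endLocal);               // True
// Span equality means same starting reference and same length.
Console.WriteLine(buffer.AsSpan(0, bytesRead) == span); // True
```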

Is calling an additional function in order to use Span over Memory worth the overhead of the new function and stack frame?

Well, getting that Memory will also require a function call and stack frame (though no heap allocation of its own, since Memory<T> is itself a struct). So it depends on how much you reuse that Memory. And, as usual, if it isn't buried in a hot loop, and especially when it sits next to I/O, performance is likely a non-issue.

HOWEVER, be careful how you form that extra function. If it closes over variables (as your example does with buffer and bytesRead), the compiler might emit a heap allocation to make that call.
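One way to sidestep captures, sketched under assumptions: pass everything the helper needs as parameters. Here SliceNumber is a static local function (C# 8 or later; with the C# 7.3 compiler that shipped alongside .NET Core 2.1, a private static method does the same job), ReadAsyncWithStaticHelper is a hypothetical name, and, like the question's code, it assumes a '\r' follows at least four raw bytes. The sketch also returns the rented buffer in a finally block and slices from index 0.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading.Tasks;

static async Task<int> ReadAsyncWithStaticHelper(Stream sourceStream, int bufferSize)
{
    byte[] buffer = ArrayPool<byte>.Shared.Rent(bufferSize);
    try
    {
        int bytesRead = await sourceStream.ReadAsync(buffer.AsMemory(0, bufferSize));
        // Everything the helper needs is passed in explicitly; nothing is captured.
        return SliceNumber(buffer, bytesRead);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }

    // 'static' makes the compiler reject any accidental capture of outer locals.
    static int SliceNumber(byte[] data, int length)
    {
        Span<byte> span = data.AsSpan(0, length); // fine: this function is not async
        int endOfNumberIndex = span.IndexOf((byte)'\r');
        // Like the question's code: reinterprets the first four raw bytes as an int.
        return BitConverter.ToInt32(span.Slice(0, endOfNumberIndex));
    }
}
```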

Is Memory significantly less performant than Span when it is constrained to just a stack frame and not allocated to the heap?

Well, I don't think you CAN stackalloc a Memory<T> (itself), so what does this mean?

However, Span avoids one offset adjustment on indexing compared to Memory, so if you loop through a LOT of indexing, creating the Span outside that loop will pay a dividend. This is probably why methods like IndexOf were provided on Span, but not Memory.
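A minimal sketch of that advice; SumBytes is a hypothetical helper. Memory<T> exposes no indexer at all, so element access has to go through its Span property; resolving it once, before the loop, keeps that work out of the hot path.

```csharp
using System;

// Hypothetical helper: sums the bytes viewed by a ReadOnlyMemory<byte>.
static int SumBytes(ReadOnlyMemory<byte> memory)
{
    // Resolve Memory<T> -> Span<T> once, outside the loop, instead of
    // calling memory.Span on every iteration.
    ReadOnlySpan<byte> span = memory.Span;
    int sum = 0;
    for (int i = 0; i < span.Length; i++)
        sum += span[i];
    return sum;
}

byte[] data = { 1, 2, 3, 4, 5, 6 };
Console.WriteLine(SumBytes(data.AsMemory(2, 3))); // 12 (3 + 4 + 5)
```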

Original Question: Which is best: Memory<T>, no local variables, additional function(s)?

Again, this is a style/opinion question (unless you actually profile an under-performing application).

My opinion: Only use Span<T>s at function boundaries. Only use Memory<T>s for member variables. For "interior" code, just use start/length or start/end indexing variables AND NAME THEM CLEARLY. Clear names will help avoid more bugs than making lots of Spans/"Slices". If the function is so long that it is no longer clear what the variables mean, it is time to factor into sub-functions anyway.
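A small sketch of the function-boundary and naming parts of that advice, assuming a made-up "key=value" ASCII format: the hypothetical LocateKeyValue takes a ReadOnlySpan<byte> at its boundary, and the interior works with plainly named start/length integers instead of a chain of Slices.

```csharp
using System;
using System.Text;

// Hypothetical format: ASCII "key=value". Span<T> appears only at the boundary;
// inside, clearly named integers describe where each field lives.
static (int keyLength, int valueStart, int valueLength) LocateKeyValue(ReadOnlySpan<byte> line)
{
    int separatorIndex = line.IndexOf((byte)'=');
    if (separatorIndex < 0)
        throw new FormatException("Missing '=' separator.");

    int keyLength = separatorIndex;             // key occupies [0, separatorIndex)
    int valueStart = separatorIndex + 1;        // value begins just after '='
    int valueLength = line.Length - valueStart; // value runs to the end of the line
    return (keyLength, valueStart, valueLength);
}

byte[] input = Encoding.ASCII.GetBytes("port=8080");
var fields = LocateKeyValue(input);

string key = Encoding.ASCII.GetString(input, 0, fields.keyLength);
string value = Encoding.ASCII.GetString(input, fields.valueStart, fields.valueLength);
Console.WriteLine($"{key} -> {value}"); // port -> 8080
```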


Tags: memory-management, c#, .net, memory, .net-core

Sharpiro
