
Buffer.BlockCopy vs Array.Copy curiosity

Tags:

c#

I've been toying around with some .NET features (namely Pipelines, Memory, and Array Pools) for high-speed file reading/parsing. I came across something interesting while playing around with Array.Copy, Buffer.BlockCopy and ReadOnlySequence.CopyTo. The IO pipeline reads data as bytes and I'm attempting to efficiently turn them into chars.

While playing around with Array.Copy I found that I am able to copy from byte[] to char[] and the compiler (and runtime) are more than happy to do it.

char[] outputBuffer = ArrayPool<char>.Shared.Rent(buffer.Length);
Array.Copy(buffer, 0, outputBuffer, 0, buffer.Length);

This code runs as expected, though I'm sure there are some UTF edge cases not properly handled here.

My curiosity comes with Buffer.BlockCopy

char[] outputBuffer = ArrayPool<char>.Shared.Rent(buffer.Length);
Buffer.BlockCopy(buffer, 0, outputBuffer, 0, buffer.Length);

The resulting contents of outputBuffer are garbage. For example, with buffer starting with

{ 50, 48, 49, 56, 45 }

The contents of outputBuffer after the copy are

{ 12338, 14385, 12333, 11575, 14385 }

I'm just curious what is happening "under the hood" inside the CLR that is causing these 2 commands to output such different results.

asked Aug 12 '18 15:08 by Pete Garafano


1 Answer

Array.Copy() is smarter about the element type. It uses the memmove() CRT function when it can, but falls back to a loop that copies each element when it can't, converting elements as necessary; it considers boxing and primitive type conversions. So one element in the source array becomes one element in the destination array.
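As a minimal sketch of the element-by-element behaviour described above, the helper below copies a byte[] into a char[] with Array.Copy. The class and method names (WideningCopyDemo, Widen) are made up for illustration, and a plain array is allocated instead of renting from ArrayPool to keep the example short.

```csharp
using System;

public static class WideningCopyDemo
{
    // Hypothetical helper: Array.Copy widens each 8-bit byte to one
    // 16-bit char, so the destination has as many elements as the source.
    public static char[] Widen(byte[] source)
    {
        var destination = new char[source.Length];
        Array.Copy(source, 0, destination, 0, source.Length);
        return destination;
    }
}
```

Calling Widen on the bytes from the question, { 50, 48, 49, 56, 45 }, gives the five chars '2', '0', '1', '8', '-'. Each byte is treated as a code point on its own, which matches ASCII text but is not UTF-8 decoding for multi-byte sequences.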

Buffer.BlockCopy() skips all that and blasts the bytes across with memmove(). No conversions are considered, which is why it can be slightly faster, and also why it can more easily mislead you about the array content. Do note that the UTF-8 encoded character data is still visible in that array: each char holds two source bytes in little-endian order, so 12338 == 0x3032 is "20", 14385 == 0x3831 is "18", etc. This is easier to see with Debug > Windows > Memory > Memory 1.
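The raw reinterpretation can be reproduced with a short sketch. The names BlockCopyDemo and Reinterpret are hypothetical. The ten input bytes used in the usage note (the ASCII text "2018-07-18") are an assumption inferred from the five char values in the question, which lists only the first five bytes. The values also assume a little-endian machine.

```csharp
using System;

public static class BlockCopyDemo
{
    // Hypothetical helper: copies the raw bytes into a char[] without any
    // conversion, so every char ends up holding two consecutive source bytes.
    public static char[] Reinterpret(byte[] source)
    {
        // Two bytes per char; round up so an odd byte count still fits.
        var destination = new char[(source.Length + 1) / 2];

        // The last argument of BlockCopy is a byte count, not an element count.
        Buffer.BlockCopy(source, 0, destination, 0, source.Length);
        return destination;
    }
}
```

On a little-endian machine, passing { 50, 48, 49, 56, 45, 48, 55, 45, 49, 56 } yields the char values 12338, 14385, 12333, 11575, 14385, the same numbers reported in the question. Ten bytes fill only five chars, which is why the output looks shorter and scrambled compared with the Array.Copy result.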

Noteworthy perhaps is that this type coercion is a feature, say when you receive an int[] through a socket or pipe but have the data in a byte[] buffer. BlockCopy is by far the fastest way to convert it.
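A rough sketch of that use case follows. The names IntBufferDemo, ToInt32Array and ToByteArray are hypothetical, and the code assumes the sender and receiver share the same byte order, since BlockCopy does no byte swapping.

```csharp
using System;

public static class IntBufferDemo
{
    // Hypothetical helper: reinterprets a received byte[] as an int[].
    public static int[] ToInt32Array(byte[] raw)
    {
        if (raw.Length % sizeof(int) != 0)
            throw new ArgumentException("Length must be a multiple of 4 bytes.", nameof(raw));

        var values = new int[raw.Length / sizeof(int)];
        Buffer.BlockCopy(raw, 0, values, 0, raw.Length); // count is in bytes
        return values;
    }

    // The reverse direction, e.g. for preparing data to send.
    public static byte[] ToByteArray(int[] values)
    {
        var raw = new byte[values.Length * sizeof(int)];
        Buffer.BlockCopy(values, 0, raw, 0, raw.Length);
        return raw;
    }
}
```

Round-tripping an int[] through ToByteArray and ToInt32Array returns the original values on the same machine. If the data comes from a host with a different byte order, each element would still need to be swapped after the copy.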

answered Oct 03 '22 01:10 by Hans Passant