
Complete semantics of Cpblk opcode in MSIL

Tags: .net, clr

The MSDN documentation for cpblk is a bit sparse:

The cpblk instruction copies a number (type unsigned int32) of bytes from a source address (of type *, native int, or &) to a destination address (of type *, native int, or &). The behavior of cpblk is unspecified if the source and destination areas overlap.

cpblk assumes that both the source and destination addresses are aligned to the natural size of the machine. The cpblk instruction can be immediately preceded by the unaligned. instruction to indicate that either the source or the destination is unaligned.

Ok, compared to other bulk copy operations such as Array.Copy, Marshal.Copy, and Buffer.BlockCopy, we know that:

  • The size is measured in bytes
  • The pointers should be aligned
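To make those two points concrete, here is a minimal C++ sketch; it is an illustration, not something taken from the documentation. The helper names `byte_count` and `is_naturally_aligned` are hypothetical, and the sketch assumes that "natural size of the machine" means the pointer size, which the quoted text does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct s { int a; int b; };

// Hypothetical helper: cpblk takes a byte count (unsigned int32), not an
// element count, so copying `count` elements of T means count * sizeof(T).
template <typename T>
inline std::uint32_t byte_count(std::size_t count) {
    // Guard against a total that does not fit in an unsigned 32-bit size.
    assert(count <= UINT32_MAX / sizeof(T));
    return static_cast<std::uint32_t>(count * sizeof(T));
}

// Hypothetical helper. ASSUMPTION: "natural size of the machine" is read
// here as the size of a pointer (4 bytes on 32-bit, 8 bytes on 64-bit).
inline bool is_naturally_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % sizeof(void*) == 0;
}
```

Under that assumption, a copy of four `s` values would pass `byte_count<s>(4)` as the size, and an address one byte past an aligned buffer would be the case the unaligned. prefix is meant for.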

This leaves me with some questions:

  • Should the buffers be pinned first? Does it matter whether the operand type is native int, "unmanaged pointer" or "managed pointer (&)"?
  • Are there restrictions on the type? (for example, Buffer.BlockCopy only works on primitive types, not structures even if they contain only primitive types)

According to https://stackoverflow.com/a/26380105/103167 pinning is unnecessary, but the supporting explanation is just wrong. (I suspect it is an overgeneralization from the fact that the Large Object Heap isn't compacted)

ECMA-335 isn't very helpful either. The instruction description there contains the same verbiage and adds

[Rationale: cpblk is intended for copying structures (rather than arbitrary byte-runs). All such structures, allocated by the CLI, are naturally aligned for the current platform. Therefore, there is no need for the compiler that generates cpblk instructions to be aware of whether the code will eventually execute on a 32-bit or 64-bit platform. end rationale]

Ok, this sounds like it should accept more types than Buffer.BlockCopy. But still not arbitrary types.

Perhaps the newly released .NET Core source code will hold some answers.

Asked by Ben Voigt, Dec 03 '14 19:12


1 Answer

cpblk and its companion, initblk, map directly to the intrinsics that any native language compiler depends on to initialize and copy structures. There is no need to wait for the .NET Core source; you can see their semantics in SSCLI20, in clr/src/fjit/fjitdef.h. It is a simple jitter: it converts cpblk directly to a call to memcpy() and initblk to a call to memset(), the same intrinsics that a C compiler uses.
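As a rough sketch of that mapping in plain C++ (not the SSCLI20 source): the wrapper names `initblk_like`, `cpblk_like`, and `zero_then_copy` are hypothetical and exist only to show the operand order of each opcode next to the C library call it is described as turning into.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct s { int a; int b; };

// Hypothetical stand-in for initblk: (address, fill value, byte count).
inline void initblk_like(void* addr, std::uint8_t value, std::uint32_t size) {
    std::memset(addr, value, size);
}

// Hypothetical stand-in for cpblk: (destination, source, byte count).
// Like memcpy, the result is undefined if the two ranges overlap.
inline void cpblk_like(void* dest, const void* src, std::uint32_t size) {
    // Debug-only sanity check: identical pointers are the simplest overlap.
    assert(dest != src);
    std::memcpy(dest, src, size);
}

// Zero a small struct, then copy it, using only the two wrappers.
inline s zero_then_copy() {
    s var;
    initblk_like(&var, 0, sizeof(s));   // equivalent of: s var = {};
    s cpy;
    cpblk_like(&cpy, &var, sizeof(s));  // equivalent of: s cpy = var;
    return cpy;
}
```

Neither wrapper knows anything about object movement or pinning; they operate on raw addresses for the duration of the call.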

There is no regard for the GC, of course; the C# and VB.NET compilers don't use these opcodes at all. The C++/CLI compiler does, however. A simple example:

using namespace System;

struct s { int a; int b;  };

int main(array<System::String ^> ^args)
{
    s var = {};        // initblk
    s cpy = var;       // cpblk
    return 0;
}

Optimized MSIL:

.method assembly static int32  main(string[] args) cil managed
{
  // Code size       34 (0x22)
  .maxstack  3
  .locals ([0] valuetype s cpy,
           [1] valuetype s var)
  IL_0000:  ldloca.s   var
  IL_0002:  ldc.i4.0
  IL_0003:  ldc.i4.8
  IL_0004:  initblk
  IL_0006:  ldloca.s   cpy
  IL_0008:  ldloca.s   var
  IL_000a:  ldc.i4.8
  IL_000b:  cpblk
  ...
}

The current .NET jitters generate inline code, with simple register moves for small structures and REP STOS/MOVS for large ones. This is very similar to what Buffer.Memcpy() does.

Answered by Hans Passant, Nov 15 '22 16:11