Facts:

1. The correct encoding for the CIL instruction rethrow's op-code is the two-byte sequence FE 1A.
2. OpCodes.Rethrow.Value (which has type short) has the value 0xFE1A on my little-endian machine.
3. BitConverter honours the machine's endianness when converting to/from byte sequences.
4. On my little-endian machine, BitConverter.GetBytes(OpCodes.Rethrow.Value) results in the byte sequence 1A FE.
5. That means serializing an OpCode.Value on a little-endian machine using BitConverter does not produce the correct encoding for the op-code; the byte order is reversed.
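For reference, a minimal snippet reproducing observations 2 to 4 (output shown for a little-endian machine; OpCodes and BitConverter are the standard framework types):

using System;
using System.Reflection.Emit;

class Repro
{
    static void Main()
    {
        short value = OpCodes.Rethrow.Value;              // 0xFE1A (== -486 as a signed short)
        byte[] bytes = BitConverter.GetBytes(value);
        Console.WriteLine(BitConverter.ToString(bytes));  // prints "1A-FE" on x86/x64
        Console.WriteLine(BitConverter.IsLittleEndian);   // True on x86/x64
    }
}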
Questions:
1. Is the byte ordering of OpCode.Value documented (and if so, where?), or is it an "implementation detail"?
2. Does step 4 above on a big-endian machine also result in the wrong byte ordering? That is, would OpCodes.Rethrow.Value be 0x1AFE on a big-endian machine?
The Value property looks like this in the Reference Source:
public short Value
{
    get
    {
        if (m_size == 2)
            return (short) (m_s1 << 8 | m_s2);
        return (short) m_s2;
    }
}
That looks entirely sane, of course; m_s2 is always the least significant byte. Looking at ILGenerator:
internal void InternalEmit(OpCode opcode)
{
    if (opcode.m_size == 1)
    {
        m_ILStream[m_length++] = opcode.m_s2;
    }
    else
    {
        m_ILStream[m_length++] = opcode.m_s1;
        m_ILStream[m_length++] = opcode.m_s2;
    }
    UpdateStackSize(opcode, opcode.StackChange());
}
Which is what you expected: the 0xFE byte gets emitted first.
So the framework code carefully avoids taking a dependency on endianness. CIL doesn't have an endianness dependency; no variable-length data ever does. That is true for text files, UTF-8 encoding, x86 machine code instructions, and CIL. So if you convert variable-length data to a single value, like the Value property getter does, then that code inevitably makes a conversion from endianness-free data to endianness-dependent data. Which inevitably upsets half of the world, because they think it went the wrong way around. And 100% of the programmers that run into it.
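To make that concrete, here is a minimal reader sketch (not framework code; it assumes 0xFE is the only two-byte prefix, which holds for the op-codes exposed by OpCodes): reading an op-code back from a raw IL stream is purely positional, so it behaves the same on any machine.

// Reading an op-code from raw IL is positional: the 0xFE prefix byte, if present,
// is simply the first byte encountered, regardless of machine endianness.
static short ReadOpCodeValue(byte[] il, ref int pos)
{
    byte first = il[pos++];
    if (first == 0xFE)                          // two-byte op-code: prefix + second byte
        return (short)(first << 8 | il[pos++]);
    return first;                               // one-byte op-code
}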
Probably the best way is to do it like the framework does and recover m_s1 and m_s2 as quickly as you can, using your own version of the OpCode type. Easy to do with:

foo.m_s1 = (byte)((opc.Value >> 8) & 0xff);
foo.m_s2 = (byte)(opc.Value & 0xff);
foo.m_size = opc.Size;

Which has no endianness dependency.
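Spelled out as a self-contained sketch (the OpCodeEncoder name is illustrative, not a framework type), that approach turns an OpCode back into its CIL byte sequence:

using System.Reflection.Emit;

static class OpCodeEncoder
{
    // Returns the op-code bytes in CIL stream order, independent of machine endianness.
    public static byte[] GetEncoding(OpCode opc)
    {
        if (opc.Size == 2)
            return new[] { (byte)((opc.Value >> 8) & 0xff), (byte)(opc.Value & 0xff) };
        return new[] { (byte)(opc.Value & 0xff) };
    }
}

For example, OpCodeEncoder.GetEncoding(OpCodes.Rethrow) yields { 0xFE, 0x1A }.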
I've reached the conclusion that serializing an op-code representation based on the OpCode.Value property, i.e.:

OpCode someOpCode = …;
byte[] someOpCodeEncoding = BitConverter.GetBytes(someOpCode.Value);

is a bad idea, but not because of the use of BitConverter.GetBytes(short), whose behaviour is well-documented. The main culprit is the OpCode.Value property, whose documentation is vague in two respects:

1. It states that this property contains "the value of the immediate operand", which may or may not refer to the op-code's encoding; that term doesn't appear anywhere in the CLI specification.
2. Even when we assume that it does in fact contain an op-code's encoding, the documentation says nothing about byte order. (Byte order comes into play when converting between byte[] and short.)
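To make the second point concrete: BitConverter's behaviour is machine-dependent by design, so any fixed byte order would have to be forced explicitly (an illustrative snippet only, not a recommendation given the conclusion below):

// GetBytes reflects the machine's endianness, so pinning down CIL (big-endian) order
// requires an explicit check and reversal.
short value = unchecked((short)0xFE1A);
byte[] bytes = BitConverter.GetBytes(value);   // 1A FE on little-endian, FE 1A on big-endian
if (BitConverter.IsLittleEndian)
    Array.Reverse(bytes);                      // now FE 1A on either architecture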
Why am I basing my argument on the MSDN documentation, and not on the CLI standard? Because System.Reflection.Emit is not part of the Reflection Library as defined by the CLI standard. For this reason, I think it's fairly safe to say that the MSDN reference documentation for this namespace is as close as it gets to an official specification. (But unlike @Hans Passant's answer, I would not go one step further and claim that the reference source is in any way a specification.)
Conclusion:
There are two ways to output the op-code encoding for a given OpCode object:

1. Stay with System.Reflection.Emit functionality and use ILGenerator.Emit(someOpCode). This may be too restrictive in some situations.
2. Create your own mapping between op-code encodings (i.e. byte[] sequences) and the various OpCode objects; a sketch of such a table follows below.
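For option 2, one possible starting point is a lookup table built by reflecting over the public OpCode fields of OpCodes (a sketch only; the OpCodeTable name is illustrative, and it keys on the numeric Value, which can be computed positionally from the encoding bytes as in the reader sketch above):

using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

static class OpCodeTable
{
    // Maps OpCode.Value to the corresponding OpCode object by enumerating
    // the public static OpCode fields of System.Reflection.Emit.OpCodes.
    public static Dictionary<short, OpCode> Build()
    {
        var table = new Dictionary<short, OpCode>();
        foreach (FieldInfo field in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
        {
            if (field.FieldType == typeof(OpCode))
            {
                var opc = (OpCode)field.GetValue(null);
                table[opc.Value] = opc;
            }
        }
        return table;
    }
}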