
Why does `OpCode.Value` have the "wrong" endianness?

Facts:

  1. The correct encoding for the CIL instruction rethrow's op-code is the two-byte sequence FE 1A.

  2. OpCodes.Rethrow.Value (which has type short) has value 0xFE1A on my little-endian machine.

  3. BitConverter honours the machine's endianness when converting to/from byte sequences.

  4. On my little-endian machine, BitConverter.GetBytes(OpCodes.Rethrow.Value) results in the byte sequence 1A FE.

That means, serializing an OpCode.Value on a little-endian machine using BitConverter does not produce the correct encoding for the op-code; the byte order is reversed.

Questions:

  • Is the byte ordering of OpCode.Value documented (and if so, where?), or is it an "implementation detail"?

  • Does step 4 above on a big-endian machine also result in the wrong byte ordering? That is, would OpCodes.Rethrow.Value be 0x1AFE on a big-endian machine?

stakx - no longer contributing asked Aug 18 '12 00:08


2 Answers

The Value property looks like this in the Reference Source:

public short Value
{
    get
    {
        if (m_size == 2)
            return (short) (m_s1 << 8 | m_s2);
        return (short) m_s2;
    }
}

That looks entirely sane, of course: m_s2 is always the least significant byte. Now look at ILGenerator:

    internal void InternalEmit(OpCode opcode)
    {
        if (opcode.m_size == 1)
        {
            m_ILStream[m_length++] = opcode.m_s2;
        }
        else
        {
            m_ILStream[m_length++] = opcode.m_s1;
            m_ILStream[m_length++] = opcode.m_s2;
        }

        UpdateStackSize(opcode, opcode.StackChange());

    }

Which is what you expected: the 0xFE byte gets emitted first.

So the framework code carefully avoids taking a dependency on endianness. CIL doesn't have an endianness dependency; no variable-length data ever does. The same is true for text files, UTF-8 encoding and x86 core machine code instructions. And CIL. So if you convert variable-length data to a single value, like the Value property getter does, then that code inevitably converts data that has no endianness into data that does. Which inevitably gets half of the world upset, because they think it was done the wrong way around. And 100% of all programmers that run into it.
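A minimal sketch of that distinction, assuming only the facts stated in the question. EndianDemo and its two methods are hypothetical names made up for illustration; BitConverter.GetBytes, BitConverter.ToString and BitConverter.IsLittleEndian are the real framework members.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical helper class, for illustration only.
static class EndianDemo
{
    // The numeric value as four hex digits. This is pure arithmetic,
    // so it does not depend on the machine's byte order.
    public static string Numeric(short value)
    {
        return value.ToString("X4");
    }

    // The bytes as BitConverter lays them out in memory.
    // Their order follows BitConverter.IsLittleEndian.
    public static string MemoryLayout(short value)
    {
        return BitConverter.ToString(BitConverter.GetBytes(value));
    }
}
```

Numeric(OpCodes.Rethrow.Value) yields "FE1A" regardless of the machine, because the getter builds the value with a shift and an OR. MemoryLayout(OpCodes.Rethrow.Value) yields "1A-FE" when BitConverter.IsLittleEndian is true and "FE-1A" otherwise; only the second form is affected by endianness.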

Probably the best way is to do it like the framework does and recover m_s1 and m_s2 as quickly as you can, using your own version of the OpCode type. Easy to do with:

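// Value is a short, so mask after shifting (0xFE1A is negative) and cast to byte.
foo.m_s1 = (byte)((opc.Value >> 8) & 0xff);  // first byte in the IL stream (two-byte op-codes only)
foo.m_s2 = (byte)(opc.Value & 0xff);         // last (or only) byte
foo.m_size = opc.Size;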

Which has no endian-ness dependency.
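The same idea can be sketched as a small helper that returns the bytes in IL stream order. OpCodeBytes and Encode are hypothetical names; the sketch relies only on the public OpCode.Value and OpCode.Size properties, and it assumes Value keeps the layout shown in the reference source above (first byte in the high 8 bits).

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical helper, not part of System.Reflection.Emit.
static class OpCodeBytes
{
    // Returns the op-code's bytes in the order InternalEmit writes them.
    // Assumes Value == (m_s1 << 8 | m_s2) for two-byte op-codes.
    public static byte[] Encode(OpCode opcode)
    {
        // Reinterpret the short as unsigned so the shift cannot sign-extend.
        ushort value = unchecked((ushort)opcode.Value);
        byte high = (byte)(value >> 8);
        byte low = (byte)(value & 0xFF);

        return opcode.Size == 2
            ? new byte[] { high, low }
            : new byte[] { low };
    }
}
```

With this, Encode(OpCodes.Rethrow) gives FE 1A and Encode(OpCodes.Nop) gives the single byte 00, on little-endian and big-endian machines alike, since only shifts and masks are involved.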

Hans Passant answered Oct 04 '22 22:10


I've reached the conclusion that serializing an op-code representation based on the OpCode.Value property, i.e.:

OpCode someOpCode = …;
byte[] someOpCodeEncoding = BitConverter.GetBytes(someOpCode.Value);

is a bad idea, but not because of the use of BitConverter.GetBytes(short), whose behaviour is well-documented. The main culprit is the OpCode.Value property, whose documentation is vague in two respects:

  1. It states that this property contains "the value of the immediate operand", which may or may not refer to the op-code's encoding; that term doesn't appear anywhere in the CLI specification.

  2. Even when we assume that it does in fact contain an op-code's encoding, the documentation says nothing about byte order. (Byte order comes into play when converting between byte[] and short.)

Why am I basing my argument on MSDN documentation, and not on the CLI standard? Because System.Reflection.Emit is not part of the Reflection Library as defined by the CLI standard. For this reason, I think it's fairly safe to say that the MSDN reference documentation for this namespace is as close as it gets to an official specification. (But unlike @Hans Passant's answer, I would not take one step further and claim that the reference source is in any way a specification.)

Conclusion:

There are two ways to output the op-code encoding for a given OpCode object:

  • Stay with System.Reflection.Emit functionality and use ILGenerator.Emit(someOpCode). This may be too restrictive in some situations.

  • Create your own mapping between op-code encodings (i.e. byte[] sequences) and the various OpCode objects.
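The second option might look roughly like the sketch below. OpCodeEncodings and TryGetEncoding are hypothetical names, and the table is deliberately partial: it lists only four op-codes, with byte sequences as given in the CLI specification's instruction set (Partition III). A real table would have to cover every op-code you intend to emit.

```csharp
using System.Collections.Generic;
using System.Reflection.Emit;

// Hypothetical, hand-written mapping; it never consults OpCode.Value.
static class OpCodeEncodings
{
    // Partial table, for illustration only.
    static readonly Dictionary<OpCode, byte[]> Table = new Dictionary<OpCode, byte[]>
    {
        { OpCodes.Nop,     new byte[] { 0x00 } },
        { OpCodes.Ret,     new byte[] { 0x2A } },
        { OpCodes.Ceq,     new byte[] { 0xFE, 0x01 } },
        { OpCodes.Rethrow, new byte[] { 0xFE, 0x1A } },
    };

    // Returns false for op-codes that are missing from the table.
    public static bool TryGetEncoding(OpCode opcode, out byte[] encoding)
    {
        return Table.TryGetValue(opcode, out encoding);
    }
}
```

Because the byte sequences are written out explicitly, neither the machine's endianness nor the undocumented layout of OpCode.Value plays any role; the cost is that the table has to be maintained by hand.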

stakx - no longer contributing answered Oct 04 '22 22:10