Is vec_sld endian sensitive?

Tags: c, simd, endianness, powerpc, altivec

I'm working on a PowerPC machine with in-core crypto. I'm having trouble porting AES key expansion from big endian to little endian using built-ins. Big endian works, but little endian does not.

The algorithm below is the snippet presented in an IBM blog article. I think I have isolated the issue to line 2:

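#include <altivec.h>

typedef __vector unsigned char  uint8x16_p8;
uint8x16_p8 r0 = {0};            /* all-zero vector shifted in by vec_sld */
/* r1 (key), r4 and r5 (permute mask) are parameters; r3 and r6 are
   uint8x16_p8 temporaries; vcipherlast() presumably wraps the POWER8
   crypto built-in. */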

r3 = vec_perm(r1, r1, r5);       /* line  1 */
r6 = vec_sld(r0, r1, 12);        /* line  2 */
r3 = vcipherlast(r3, r4);        /* line  3 */

r1 = vec_xor(r1, r6);            /* line  4 */
r6 = vec_sld(r0, r6, 12);        /* line  5 */
r1 = vec_xor(r1, r6);            /* line  6 */
r6 = vec_sld(r0, r6, 12);        /* line  7 */
r1 = vec_xor(r1, r6);            /* line  8 */
r4 = vec_add(r4, r4);            /* line  9 */

// r1 is ready for next round
r1 = vec_xor(r1, r3);            /* line 10 */
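To pin down what line 2 is supposed to produce, here is a scalar sketch (plain C, no AltiVec) of the big-endian element-order semantics the snippet relies on. vec_sld(a, b, 12) returns bytes 12..27 of a||b, so with an all-zero r0 it puts a zero word in front of the first three words of r1. Repeating that with XOR (lines 4 to 8) turns the key words w0..w3 into the running XOR {w0, w0^w1, w0^w1^w2, w0^w1^w2^w3}, which is the form AES-128 key expansion needs before r3 is XORed into every word at line 10. The helper names sld_model and prefix_xor_words are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of vec_sld(a, b, sh) in big-endian element order:
 * the result is bytes sh..sh+15 of the 32-byte concatenation a||b. */
static void sld_model(uint8_t out[16], const uint8_t a[16],
                      const uint8_t b[16], unsigned sh)
{
    uint8_t cat[32];

    assert(sh <= 15);            /* vec_sld takes a shift of 0..15 */
    memcpy(cat, a, 16);
    memcpy(cat + 16, b, 16);
    memcpy(out, cat + sh, 16);   /* out may alias a or b: cat is a copy */
}

/* Lines 2 and 4-8 of the snippet.  Each vec_sld(r0, x, 12) puts a zero
 * word in front of the first three words of x; XORing that in three
 * times turns key words {w0, w1, w2, w3} into
 * {w0, w0^w1, w0^w1^w2, w0^w1^w2^w3}. */
static void prefix_xor_words(uint8_t r1[16])
{
    static const uint8_t r0[16] = {0};
    uint8_t r6[16];
    int i;

    sld_model(r6, r0, r1, 12);                /* line 2 */
    for (i = 0; i < 16; i++) r1[i] ^= r6[i];  /* line 4 */
    sld_model(r6, r0, r6, 12);                /* line 5 */
    for (i = 0; i < 16; i++) r1[i] ^= r6[i];  /* line 6 */
    sld_model(r6, r0, r6, 12);                /* line 7 */
    for (i = 0; i < 16; i++) r1[i] ^= r6[i];  /* line 8 */
}
```

With the r1 shown below, sld_model(r6, r0, r1, 12) yields {0x0, 0x0, 0x0, 0x0, 0x2b, ..., 0x88}, which is exactly what the big endian machine prints after line 2.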

On entry to the function, the parameters are the same on both the big endian and little endian machines:

(gdb) p r1
$1 = {0x2b, 0x7e, 0x15, 0x16, 0x28, 0xae, 0xd2, 0xa6, 0xab, 0xf7, 0x15, 0x88,
  0x9, 0xcf, 0x4f, 0x3c}
(gdb) p r5
$2 = {0xd, 0xe, 0xf, 0xc, 0xd, 0xe, 0xf, 0xc, 0xd, 0xe, 0xf, 0xc, 0xd, 0xe,
  0xf, 0xc}
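Line 1 can be read the same way. Below is a scalar sketch of vec_perm in big-endian element order (perm_model is a hypothetical name, not an AltiVec API): output byte i is byte sel[i] & 0x1f of a||b. With r5 = {0xd, 0xe, 0xf, 0xc, ...}, every 32-bit word of r3 receives bytes 13, 14, 15 and 12 of r1, i.e. the last key word rotated left by one byte (AES RotWord), copied into all four words.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of vec_perm(a, b, sel) in big-endian element order:
 * output byte i is byte (sel[i] & 0x1f) of the 32-byte concatenation
 * a||b (only the low five bits of each selector byte are used). */
static void perm_model(uint8_t out[16], const uint8_t a[16],
                       const uint8_t b[16], const uint8_t sel[16])
{
    uint8_t cat[32];
    int i;

    memcpy(cat, a, 16);
    memcpy(cat + 16, b, 16);
    for (i = 0; i < 16; i++)
        out[i] = cat[sel[i] & 0x1f];
}
```

With the r1 and r5 above, perm_model(r3, r1, r1, r5) gives 0xcf, 0x4f, 0x3c, 0x09 in each of the four words.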

However, after executing line 2, r6 has the value:

Little endian machine:

(gdb) p r6
$3 = {0x28, 0xae, 0xd2, 0xa6, 0xab, 0xf7, 0x15, 0x88, 0x9, 0xcf, 0x4f, 0x3c,
  0x0, 0x0, 0x0, 0x0}

(gdb) p $vs0
$3 = {uint128 = 0x8815f7aba6d2ae28000000003c4fcf09, v2_double = {
    4.9992689728788323e-315, -1.0395462025288474e-269}, v4_float = {
    0.0126836384, 0, -1.46188823e-15, -4.51291888e-34}, v4_int32 = {
    0x3c4fcf09, 0x0, 0xa6d2ae28, 0x8815f7ab}, v8_int16 = {0xcf09, 0x3c4f, 0x0,
    0x0, 0xae28, 0xa6d2, 0xf7ab, 0x8815}, v16_int8 = {0x9, 0xcf, 0x4f, 0x3c,
    0x0, 0x0, 0x0, 0x0, 0x28, 0xae, 0xd2, 0xa6, 0xab, 0xf7, 0x15, 0x88}}

Big endian machine:

(gdb) p r6
$4 = {0x0, 0x0, 0x0, 0x0, 0x2b, 0x7e, 0x15, 0x16, 0x28, 0xae, 0xd2, 0xa6,
  0xab, 0xf7, 0x15, 0x88}

Notice the odd rotation on the little endian machine.

When I disassemble on the little endian machine after line 2 executes:

 (gdb) disass $pc
 <skip multiple pages>

    0x0000000010000dc8 <+168>:   lxvd2x  vs12,r31,r9
    0x0000000010000dcc <+172>:   xxswapd vs12,vs12
    0x0000000010000dd0 <+176>:   xxlor   vs32,vs0,vs0
    0x0000000010000dd4 <+180>:   xxlor   vs33,vs12,vs12
    0x0000000010000dd8 <+184>:   vsldoi  v0,v0,v1,12
    0x0000000010000ddc <+188>:   xxlor   vs0,vs32,vs32
    0x0000000010000de0 <+192>:   xxswapd vs0,vs0
    0x0000000010000de4 <+196>:   li      r9,64
    0x0000000010000de8 <+200>:   stxvd2x vs0,r31,r9
 => 0x0000000010000dec <+204>:   li      r9,48
    0x0000000010000df0 <+208>:   lxvd2x  vs0,r31,r9
    0x0000000010000df4 <+212>:   xxswapd vs34,vs0

(gdb) p $v0
$5 = void

(gdb) p $vs0
$4 = {uint128 = 0x8815f7aba6d2ae28000000003c4fcf09, v2_double = {
    4.9992689728788323e-315, -1.0395462025288474e-269}, v4_float = {
    0.0126836384, 0, -1.46188823e-15, -4.51291888e-34}, v4_int32 = {
    0x3c4fcf09, 0x0, 0xa6d2ae28, 0x8815f7ab}, v8_int16 = {0xcf09, 0x3c4f, 0x0,
    0x0, 0xae28, 0xa6d2, 0xf7ab, 0x8815}, v16_int8 = {0x9, 0xcf, 0x4f, 0x3c,
    0x0, 0x0, 0x0, 0x0, 0x28, 0xae, 0xd2, 0xa6, 0xab, 0xf7, 0x15, 0x88}}

I have no idea why r6 is not the expected value. Ideally I would examine the VSX registers on both machines. Unfortunately, GDB is also problematic on both machines, so I can't do things like disassemble and print vector registers.

Is vec_sld endian sensitive? Or is there something else wrong?


asked Sep 21 '17 10:09

jww

1 Answer

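Yes. As your disassembly shows, vec_sld(r0, r1, 12) compiles to a plain vsldoi, which shifts in the register's big-endian byte order; on little endian the vector elements are held in reverse order within the register, so the shift comes out mirrored. Little endian with PowerPC/AltiVec can be mind-bending at times. If you need your code to work on both big and little endian, it helps to define some portability macros, e.g. for vec_sld: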

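/* Swap the operands and complement the shift on little endian.
   shift must be 1..15: vec_sld(va, vb, 0) is just va, and 16 is not
   a valid vec_sld shift. */
#ifdef __BIG_ENDIAN__
  #define VEC_SLD(va, vb, shift) vec_sld(va, vb, shift)
#else
  #define VEC_SLD(va, vb, shift) vec_sld(vb, va, 16 - (shift))
#endif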

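You'll probably find this approach helpful for any intrinsic that involves horizontal/positional operations or narrowing/widening, e.g. vec_mergeh/vec_mergel, vec_pack et al, vec_unpackh/vec_unpackl, vec_perm, vec_mule/vec_mulo, vec_splat, vec_lvsl/vec_lvsr, etc. Check what your compiler actually does for each one, since some compilers already adjust some of these for little endian.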
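To see why the vec_sld macro works, here is a scalar sketch (plain C, runs anywhere; all function names are hypothetical) under one assumption: that, as the question's disassembly suggests, the compiler lowers vec_sld to a bare vsldoi on little endian. vsldoi works in register byte numbering, while on little endian element i sits in register byte 15 - i, so the shift appears mirrored. The model reproduces the little endian r6 from the question and checks that swapping the operands and using 16 - shift gives the big endian result for every shift from 1 to 15.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of vsldoi in register byte numbering (byte 0 = most
 * significant), which does not depend on endianness: take bytes
 * sh..sh+15 of the 32-byte concatenation va||vb. */
static void vsldoi_model(uint8_t vd[16], const uint8_t va[16],
                         const uint8_t vb[16], unsigned sh)
{
    uint8_t cat[32];

    assert(sh <= 15);            /* 4-bit shift field */
    memcpy(cat, va, 16);
    memcpy(cat + 16, vb, 16);
    memcpy(vd, cat + sh, 16);
}

/* On little endian, element i lives in register byte 15 - i.  The
 * reversal is its own inverse, so it converts in both directions. */
static void reverse16(uint8_t out[16], const uint8_t in[16])
{
    int i;
    for (i = 0; i < 16; i++)
        out[i] = in[15 - i];
}

/* vec_sld in element order on big endian: a plain vsldoi. */
static void model_sld_be(uint8_t out[16], const uint8_t a[16],
                         const uint8_t b[16], unsigned sh)
{
    vsldoi_model(out, a, b, sh);
}

/* vec_sld in element order on little endian, assuming a plain vsldoi
 * with no endian fix-up. */
static void model_sld_le(uint8_t out[16], const uint8_t a[16],
                         const uint8_t b[16], unsigned sh)
{
    uint8_t ra[16], rb[16], rd[16];

    reverse16(ra, a);
    reverse16(rb, b);
    vsldoi_model(rd, ra, rb, sh);
    reverse16(out, rd);
}

/* Little endian branch of the answer's VEC_SLD macro. */
static void model_VEC_SLD_le(uint8_t out[16], const uint8_t va[16],
                             const uint8_t vb[16], unsigned shift)
{
    model_sld_le(out, vb, va, 16 - shift);
}

/* Returns 1 if the macro's little endian form matches big endian
 * vec_sld for every shift 1..15 on the given inputs. */
static int macro_matches_be(const uint8_t a[16], const uint8_t b[16])
{
    unsigned sh;

    for (sh = 1; sh <= 15; sh++) {
        uint8_t be[16], le[16];
        model_sld_be(be, a, b, sh);
        model_VEC_SLD_le(le, a, b, sh);
        if (memcmp(be, le, 16) != 0)
            return 0;
    }
    return 1;
}
```

Applied to the question's snippet, lines 2, 5 and 7 would become r6 = VEC_SLD(r0, r1, 12); and r6 = VEC_SLD(r0, r6, 12);.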


answered Oct 15 '22 01:10

Paul R
