Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

128-bit rotation using ARM Neon intrinsics

I'm trying to optimize my code using Neon intrinsics. I have a 24-bit rotation over a 128-bit array (8 each uint16_t).

Here is my c code:

uint16_t rotated[8];
uint16_t temp[8];
uint16_t j;
for(j = 0; j < 8; j++)
{
     //Rotation <<< 24  over 128 bits (x << shift) | (x >> (16 - shift)
     rotated[j] = ((temp[(j+1) % 8] << 8) & 0xffff) | ((temp[(j+2) % 8] >> 8) & 0x00ff);
}

I've checked the gcc documentation about Neon Intrinsics and it doesn't have instruction for vector rotations. Moreover, I've tried to do this using vshlq_n_u16(temp, 8) but all the bits shifted outside a uint16_t word are lost.

How to achieve this using neon intrinsics ? By the way is there a better documentation about GCC Neon Intrinsics ?

like image 893
Kami Avatar asked Jun 29 '12 09:06

Kami


2 Answers

After some reading on Arm Community Blogs, I've found this :

Neon Arm Bitwise Rotation

VEXT: Extract VEXT extracts a new vector of bytes from a pair of existing vectors. The bytes in the new vector are from the top of the first operand, and the bottom of the second operand. This allows you to produce a new vector containing elements that straddle a pair of existing vectors. VEXT can be used to implement a moving window on data from two vectors, useful in FIR filters. For permutation, it can also be used to simulate a byte-wise rotate operation, when using the same vector for both input operands.

The following Neon GCC Intrinsic does the same as the assembly provided in the picture :

uint16x8_t vextq_u16 (uint16x8_t, uint16x8_t, const int)

So the the 24bit rotation over a full 128bit vector (not over each element) could be done by the following:

uint16x8_t input;
uint16x8_t t0;
uint16x8_t t1;
uint16x8_t rotated;

t0 = vextq_u16(input, input, 1);
t0 = vshlq_n_u16(t0, 8);
t1 = vextq_u16(input, input, 2);
t1 = vshrq_n_u16(t1, 8);
rotated = vorrq_u16(t0, t1);
like image 103
Kami Avatar answered Sep 23 '22 07:09

Kami


I'm not 100% sure but I don't think NEON has rotate instructions.

You can compose the rotation operation you require with a left shift, a right shit and an or, e.g.:

uint8_t ror(uint8_t in, int rotation)
{
    return (in >> rotation) | (in << (8-rotation));
}

Just do the same with the Neon intrinsics for left shift, right shit and or.

uint16x8_t temp;
uint8_t rot;

uint16x8_t rotated =  vorrq_u16 ( vshlq_n_u16(temp, rot) , vshrq_n_u16(temp, 16 - rot) );

See http://en.wikipedia.org/wiki/Circular_shift "Implementing circular shifts."

This will rotate the values inside the lanes. If you want to rotate the lanes themselves use VEXT as described in the other answer.

like image 32
Pete Fordham Avatar answered Sep 19 '22 07:09

Pete Fordham