Arm NEON and poly8_t and poly16_t

Question

I've been looking into neon optimisation with intrinsics recently and I have come across the poly8_t and poly16_t data types. I'm then left wondering what on earth they are.

I've searched all across the net but so far have been unable to find ANY explanation of what they are.

Can anyone explain them to me?

Edit: Thanks for those answers but why, if it is just a different way of multiplying etc, does it have a totally different data type?

Aki Suihkonen · Accepted Answer

Left = regular multiplication, Right = carryless multiplication

        1 1 0 1                              1 1 0 1
     *  1 0 0 1                              1 0 0 1
   ------------        -->              --------------
     (1)1 1 0 1  <-- (1) is carry            1 1 0 1
      0 0 0 0                              0 0 0 0 
    0 0 0 0                              0 0 0 0
  1 1 0 1        +                     1 1 0 1         + GF(2) or XOR
  -------------                        ---------------
  1 1 1 0 1 0 1                        1 1 0 0 1 0 1

Each 1 or 0 in the diagonally descending matrix represents partial product of one source bit from the vector '1101' and one source bit from the other vector '1001'.

The applications of the right one are in CRC, (ECC) Error Correction Code calculations (Reed Solomon, BCH) and cryptography (elliptic curves, internals of AES).

Illustrating the connection to polynomial multiplication, the operation above can summarized as

 1101 == x^3 + x^2 + 0 + 1;
 1001 == x^3 + 0   + 0 + 1;

Regular polynomial multiplication being: p(x) * (x^3 + 1) == p(x)*x^3 + p(x) ==

 (x^3 + x^2 + 1)(x^3 + 1) == x^6+x^5+x^3 + x^3+x^2+1 
                          == 1x^6 + 1x^5 + 0x^4 + 2x^3 + 1^x2 + 0x + 1
                          == "1102101"

In GF(2) each coefficient is simply calculated modulo 2, making 1100101b.

The datatype in GF looks just like uint8_t, uint16_t or perhaps upto 128_t in respect that the datatype for GF(2^8) holds 256 unique bitpatterns. However e.g. the bitpattern '00010001' e.g. has no traditional interpretation. (It's not 17 decimal, but perhaps 123th power of "unity" modulo some other polynomial.) Multiplying this number with the same "unity" modulo the generator polynomial g(x) leads to 124th power and so on. Then the properties (identities) of the finite fields have just interesting applications -- such that one can (remotely) easily calculate what 32-bit number to append to a file to make it's 32-bit CRC to match; or one can use the properties to parallelize crc calculation, or to implement bignum multiplication with Fourier-like transform in Finite fields (Number Theoretic Transform).

Marat Dukhan · Answer

These types are used for carry-less multiplication. It is useful for cryptographic algorithms and CRC hash sums. Here are some white papers about applications (they explore x86 PCLMULQDQ instruction, but the same ideas apply to carry-less multiplication on ARM processors):

Intel Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode
Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
Intel Polynomial Multiplication Instruction and its Usage for Elliptic Curve Cryptography

Richard · Answer

You did not describe PMUL vs PMULL.

As I understand it (probably incorrectly) each per element PMUL works on two 8 bit source elements and generates one 8-bit result element.

Each per element PMUL generates 8 partial products and each PP are shifted respectively before XORed. So from the lsb of the first PP to the msb of the last PP. There seem to be 15-bits of result. PMUL can only store an 8-bit result.

Are the most significant 7-bits of 15-bit result discarded?

Arm NEON and poly8_t and poly16_t

Tags:

c++

c

arm

intrinsics

neon

Goz

3 Answers

Aki Suihkonen

Marat Dukhan

Richard

Recent Activity

Donate For Us

Arm NEON and poly8_t and poly16_t

Tags:

c++

c

arm

intrinsics

neon

Goz

3 Answers

Aki Suihkonen

Marat Dukhan

Richard

Related questions

Recent Activity

Donate For Us