Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Arm NEON and poly8_t and poly16_t

I've been looking into neon optimisation with intrinsics recently and I have come across the poly8_t and poly16_t data types. I'm then left wondering what on earth they are.

I've searched all across the net but so far have been unable to find ANY explanation of what they are.

Can anyone explain them to me?

Edit: Thanks for those answers but why, if it is just a different way of multiplying etc, does it have a totally different data type?

like image 719
Goz Avatar asked Mar 06 '14 12:03

Goz


3 Answers

Left = regular multiplication, Right = carryless multiplication

        1 1 0 1                              1 1 0 1
     *  1 0 0 1                              1 0 0 1
   ------------        -->              --------------
     (1)1 1 0 1  <-- (1) is carry            1 1 0 1
      0 0 0 0                              0 0 0 0 
    0 0 0 0                              0 0 0 0
  1 1 0 1        +                     1 1 0 1         + GF(2) or XOR
  -------------                        ---------------
  1 1 1 0 1 0 1                        1 1 0 0 1 0 1

Each 1 or 0 in the diagonally descending matrix represents partial product of one source bit from the vector '1101' and one source bit from the other vector '1001'.

The applications of the right one are in CRC, (ECC) Error Correction Code calculations (Reed Solomon, BCH) and cryptography (elliptic curves, internals of AES).

Illustrating the connection to polynomial multiplication, the operation above can summarized as

 1101 == x^3 + x^2 + 0 + 1;
 1001 == x^3 + 0   + 0 + 1;

Regular polynomial multiplication being: p(x) * (x^3 + 1) == p(x)*x^3 + p(x) ==

 (x^3 + x^2 + 1)(x^3 + 1) == x^6+x^5+x^3 + x^3+x^2+1 
                          == 1x^6 + 1x^5 + 0x^4 + 2x^3 + 1^x2 + 0x + 1
                          == "1102101"

In GF(2) each coefficient is simply calculated modulo 2, making 1100101b.

The datatype in GF looks just like uint8_t, uint16_t or perhaps upto 128_t in respect that the datatype for GF(2^8) holds 256 unique bitpatterns. However e.g. the bitpattern '00010001' e.g. has no traditional interpretation. (It's not 17 decimal, but perhaps 123th power of "unity" modulo some other polynomial.) Multiplying this number with the same "unity" modulo the generator polynomial g(x) leads to 124th power and so on. Then the properties (identities) of the finite fields have just interesting applications -- such that one can (remotely) easily calculate what 32-bit number to append to a file to make it's 32-bit CRC to match; or one can use the properties to parallelize crc calculation, or to implement bignum multiplication with Fourier-like transform in Finite fields (Number Theoretic Transform).

like image 88
Aki Suihkonen Avatar answered Sep 21 '22 12:09

Aki Suihkonen


These types are used for carry-less multiplication. It is useful for cryptographic algorithms and CRC hash sums. Here are some white papers about applications (they explore x86 PCLMULQDQ instruction, but the same ideas apply to carry-less multiplication on ARM processors):

  • Intel Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode
  • Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
  • Intel Polynomial Multiplication Instruction and its Usage for Elliptic Curve Cryptography
like image 26
Marat Dukhan Avatar answered Sep 20 '22 12:09

Marat Dukhan


You did not describe PMUL vs PMULL.

As I understand it (probably incorrectly) each per element PMUL works on two 8 bit source elements and generates one 8-bit result element.

Each per element PMUL generates 8 partial products and each PP are shifted respectively before XORed. So from the lsb of the first PP to the msb of the last PP. There seem to be 15-bits of result. PMUL can only store an 8-bit result.

Are the most significant 7-bits of 15-bit result discarded?

like image 41
Richard Avatar answered Sep 21 '22 12:09

Richard