OS: Linux (Debian 10)
CC: GCC 8.3
CPU: i7-5775C
There is a unsigned __int128
/__int128
in GCC, but is there any way to have a uint256_t
/int256_t
in GCC?
I have read of a __m256i
which seems to be from Intel. Is there any header that I can include to get it?
Is it as usable as a hypothetic unsigned __int256
? I mean if you can assign from/to it, compare them, bitwise operations, etc.
What is its signed equivalent (if any)?
EDIT 1:
I achieved this:
#include <immintrin.h>
typedef __m256i uint256_t;
and compiled. If I can do some operations with it, I'll update it here.
EDIT 2:
Issues found:
uint256_t m;
int l = 5;
m = ~((uint256_t)1 << l);
ouput:
error: can’t convert a value of type ‘int’ to vector type ‘__vector(4) long long int’ which has different size
m = ~((uint256_t)1 << l);
A 256-bit private key will have 115,792,089,237,316,195,423,570,985,008,687,907,853,269, 984,665,640,564,039,457,584,007,913,129,639,936 (that's 78 digits) possible combinations.
int is always 32 bits wide. sizeof(T) represents the number of 8-bit bytes (octets) needed to store a variable of type T .
The EVM itself operates on 256 bit words, which means each stack and each storage space is 256 bit wide and usually you can address them that way (there are a few byte addressing methods). The EVM itself doesn't really care how the data is stored then.
The 128-bit data type can handle up to 31 significant digits (compared to 17 handled by the 64-bit long double). However, while this data type can store numbers with more precision than the 64-bit data type, it does not store numbers of greater magnitude.
I did need "uint256_t" only while computing "f(x) = (x^2+a) mod n" in Pollard Rho algorithm. All variables outside function "f" are of builtin type __uint128_t.
I implemented uint256_t for that purpose simply as:
typedef __uint128_t uint256_t[2];
And then I implemented the functions needed for computing "f()":
__uint128_t set_128(unsigned long h, unsigned long l);
void set_256(uint256_t d, __uint128_t l, __uint128_t h);
void add_128(uint256_t d, uint256_t x, __uint128_t a);
void add_256(uint256_t d, uint256_t x, uint256_t a);
void shl_256(uint256_t d, long s);
void sqr_128(uint256_t d, __uint128_t x);
several print functions and macros for printing 128bit and 256bit numbers
__uint128_t mod_256(uint256_t x, __uint128_t n);
__uint128_t f(__uint128_t x);
Find the implementation in this gist:
https://gist.github.com/Hermann-SW/a20af17ee6666467fe0b5c573dae701d
I did benchmark my code against gmplib functions and achieved speedup over gmplib for all (after a lot of work), for details:
https://www.raspberrypi.org/forums/viewtopic.php?f=33&t=311893&p=1873552#p1873552
Runtimes in nanoseconds for 1 million executions of a function:
Clang has _ExtInt
extended integers that supports operations other than division, but SIMD isn't useful for that because of carry between elements1. Other mainstream x86-64 compilers don't even have that; you need a library or something to define a custom type and use the same add-with-carry instructions clang will use. (Or a less efficient emulation in pure C2).
__m256i
is AVX2 SIMD 4x uint64_t
(or a narrower element size like 8x uint32_t
). It's not a 256-bit scalar integer type, you can't use it for scalar operations, __m256i var = 1
won't even compile. There is no x86 SIMD support for integers wider than 64-bit, and the Intel intrinsic types like __m128i
and __m256i
are purely for SIMD.
GCC's __int128
/ unsigned __int128
typically uses scalar add/adc
, and/or scalar mul
/ imul
, because AVX2 is generally not helpful for extended precision. (Only for stuff like bitwise AND/OR/XOR where element boundaries are irrelevant.)
Footnote 1: There actually is some scope for using SIMD for BigInteger types, but only with a specialized format. And more importantly, you have to manually choose when to re-normalize (propagate carry) so your calculations have to be designed around it; it's not a drop-in replacement. See Mysticial's answer on Can long integer routines benefit from SSE?
Footnote 2: Unfortunately C does not provide carry-out from addition / subtraction, so it's not even convenient to write in C. sum = a+b
/ carry = sum<a
works for carry out when there's no carry in, but it's much harder to write a full adder in C. And compiler typically make crap asm that doesn't just use native add-with-carry instructions on machines where they're available. Extended-precision libraries for very big integers, like GMP, are typically written in asm.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With