I work on AVX2 and need to calculate 64-bit x64-bit -> 128-bit widening multiplication and got 64-bit high part in the fastest manner. Since AVX2 has not such an instruction, is it reasonable for me to use Karatsuba algorithm for efficiency and gaining speed?
single-digit products. For example, the traditional algorithm requires (210)2 = 1,048,576 single-digit multiplications to multiply two 1024-digit numbers (n = 1024 = 210), whereas the Karatsuba algorithm requires 310 = 59,049 thus being ~17.758 times faster.
In Karatsuba algorithm, 2-digit multiplication operation can be performed with three multiplications (Dwivedi, 2013). Karatsuba algorithm is used by Intel and other companies to perform faster multiplication, because it requires less number of steps (Madke and Zafar, 2014).
The traditional Karatsuba algorithm for the multiplication of polynomials and multi-precision integers has a time complexity of O(n^{1.59}) and a space complexity of O(n).
This algorithm takes O(n^2) time.
No. On modern architectures the crossover at which Karatsuba beats schoolbook multiplication is usually somewhere between 8 and 24 machine words (e.g. between 512 and 1536 bits on x86_64). For fixed sizes, the threshold is at the smaller end of that range, and the new ADCX/ADOX instructions likely bring it in somewhat further for scalar code, but 64x64 is still too small to benefit from Karatsuba.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With