This question was asked in an interview.
You have an array of small integers. You have to multiply all of them. You need not worry about overflow you have ample support for that. What can you do to speed up the multiplication on your machine?
Would multiple additions be better in this case?
I suggested multiplying using a divide and conquer approach but the interviewer was not impressed. What could be the best possible solution for this?
Multiplying large numbers together is performed by multiplying the ones digit of the bottom number to the entire top number. After the ones place value has finished multiplying, move to the tens place value in the bottom number and multiply that digit by the top number.
Booth's Array Multiplier: Booth's algorithm is a powerful technique to achieve fast multiplication.
The Box/Window Method. To use this strategy, we draw a box (the number of columns and rows depend on the number of digits in the factors), and then write the expanded forms of the factors along the top and the side. Then we multiply each part, and add the parts together when we are finished.
Here are some thoughts:
Divide-and-Conquer with Multithreading: Split the input apart into n different blocks of size b and recursively multiply all the numbers in each block together. Then, recursively multiply all n / b blocks back together. If you have multiple cores and can run parts of this in parallel, you could save a lot of time overall.
Word-Level Parallelism: Let's suppose that your numbers are all bounded from above by some number U, which happens to be a power of two. Now, suppose that you want to multiply together a, b, c, and d. Start off by computing (4U2a + b) × (4U2c + d) = 16U4ac + 4U2ad + 4U2bc + bd. Now, notice that this expression mod U2 is just bd. (Since bd < U2, we don't need to worry about the mod U2 step messing it up). This means that if we compute this product and take it mod U2, we get back bd. Since U2 is a power of two, this can be done with a bitmask.
Next, notice that
4U2ad + 4U2bc + bd < 4U4 + 4U4 + U2 < 9U4 < 16U4
This means that if we divide the entire expression by 16U4 and round down, we will end up getting back just ad. This division can be done with a bitshift, since 16U4 is a power of two.
Consequently, with one multiplication, you can get back the values of both ac and bd by applying a subsequent bitshift and bitmask. Once you have ac and bd, you can directly multiply them together to get back the value of abcd. Assuming that bitmasks and bitshifts are faster than multiplies, this reduces the number of multiplications necessary by 33% (two instead of three here).
Hope this helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With