The Intel Software Developer's Manual, Volume 1, Section 7.3.1.2, states that the BSWAP instruction "speeds execution of decimal arithmetic". It doesn't explain how, and Google isn't helping either.
Can someone enlighten me on this?
It is a bit of a throw-away comment, isn't it?
The BSWAP (byte swap) instruction reverses the byte order in a 32-bit register operand. Bit positions 0 through 7 are exchanged with 24 through 31, and bit positions 8 through 15 are exchanged with 16 through 23. Executing this instruction twice in a row leaves the register with the same value as before. The BSWAP instruction is useful for converting between “big-endian” and “little-endian” data formats. This instruction also speeds execution of decimal arithmetic. (The XCHG instruction can be used to swap the bytes in a word.)
And, on top of that, the parenthetical last sentence has nothing to do with the statement it seems to apply to. And if it's not meant to apply to it, why have the parentheses at all?
My conclusion is that the doco is still being written by Intel engineers rather than tech writers who would have caught those torturous crimes against the English language :-)
But, as to how it could speed up decimal operations, only one possibility leaps to mind (and, honestly, it was more of a painful crawl than a leap).
If you have large numbers made up of decimal digits, one per character, laid out sequentially in memory, there may be a performance improvement if you can handle them four digits at a time. However, on a little-endian machine, loading the (sequential-in-memory) bytes {0x01, 0x02, 0x03, 0x04} as a 32-bit value (dword) would give you 0x04030201. Doing a bswap on that could make it easier to manipulate as a dword, with another bswap done before writing it back.
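To make the byte-order point concrete, here is a minimal sketch in C. The helper names (bswap32, load_le32, load_digits) are made up for illustration, and bswap32 is a portable stand-in for the instruction rather than the instruction itself. load_le32 mimics what a dword load does on x86, so the example behaves the same on any host.

```c
#include <stdint.h>

/* Portable stand-in for x86 BSWAP: reverse the four bytes of a dword. */
static uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Mimics a little-endian dword load: the byte at the lowest
   address lands in bits 0..7, the next in bits 8..15, and so on. */
static uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Load four digit bytes so that the first digit in memory ends up
   in the most significant byte of the register. */
static uint32_t load_digits(const unsigned char *p) {
    return bswap32(load_le32(p));
}
```

With the bytes {0x01, 0x02, 0x03, 0x04} in memory, load_le32 yields 0x04030201 and load_digits yields 0x01020304, so the digits sit in the register in the same most-significant-first order in which they are written. Applying bswap32 again restores the layout needed for the store.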
Granted, that's speculation (though I'd like to consider it educated speculation (a)), but Intel aren't giving us much to go on in the document you refer to. Searching through volumes 1, 2a, 2b, 2c, 3a, 3b and 3c for all occurrences of bswap doesn't appear to clarify it either.
(a) Intel have had some "sneaky" instructions for doing stuff like this for ages, the earliest of which I can remember being the daa/das instructions for adjusting BCD values (two per byte) after using byte-based, non-BCD addition. It's not beyond the realms of possibility to think they may have something similar for numeric characters within dwords, though I haven't done an exhaustive search.
As a first step, you could look into Intel's Decimal Floating-Point Math Library, which implements the decimal side of the IEEE 754 standard; you may find BSWAP used there. If not, it may be used in other high-performance decimal libraries.
Usually, decimal arithmetic is performed on variable-length integer or fixed-point strings of packed decimal digits, and it's probably more convenient to store these strings in big-endian order. The data could be loaded 4 bytes at a time into a register, with BSWAP then used to swap the bytes. However, as mentioned, the decimal adjustment instructions are byte-oriented, so if you use them, the register could be rotated 8 bits at a time to perform the byte-oriented math. BSWAP could be used instead of the first rotate, but I don't see much advantage there. I don't know if there's some clever trick to implement a 32-bit packed decimal add; if there is, then BSWAP would help.
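For what it's worth, a trick of that kind does circulate in bit-twiddling circles, independent of anything in Intel's manuals. The sketch below is one such approach, not a claim about what Intel had in mind. The name bcd_add7 is made up, and it assumes both operands are valid packed BCD with at most 7 digits (top nibble zero), so the carry out of the seventh digit has somewhere to go.

```c
#include <stdint.h>

/* Add two packed-BCD values of up to 7 digits each (assumption:
   every nibble is 0..9 and the top nibble of each operand is 0). */
static uint32_t bcd_add7(uint32_t a, uint32_t b) {
    /* Bias the low 7 digits by 6 so that a decimal carry (digit sum
       >= 10) becomes a binary carry out of the nibble (sum >= 16). */
    uint32_t t1 = a + 0x06666666u;
    uint32_t t2 = t1 + b;            /* plain binary sum of biased a and b */
    uint32_t t4 = t2 ^ t1 ^ b;       /* carry-in bit at every bit position */
    /* Bits 4, 8, ..., 28 that are clear in t4 mark digits that did NOT
       carry out; those digits still contain the bias of 6. */
    uint32_t t5 = ~t4 & 0x11111110u;
    /* Turn each marker bit into a 6 in the digit just below it. */
    uint32_t t6 = (t5 >> 2) | (t5 >> 3);
    return t2 - t6;                  /* remove the leftover bias */
}
```

For example, bcd_add7(0x1234, 0x5678) gives 0x6912. A routine like this wants the most significant digit in the most significant nibble of the register, which is where a BSWAP after loading a big-endian digit string would come in.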