How does the BSWAP instruction "speed execution of decimal arithmetic"?

Question

The Intel Software Developer's Manual Volume 1, Section 7.3.1.2 states that the BSWAP instruction "speeds execution of decimal arithmetic". It doesn't explain how this is so, and Google isn't helping either.

Can someone enlighten me on this?

paxdiablo · Accepted Answer

It is a bit of a throw-away comment, isn't it?

The full paragraph from the manual reads:

The BSWAP (byte swap) instruction reverses the byte order in a 32-bit register operand. Bit positions 0 through 7 are exchanged with 24 through 31, and bit positions 8 through 15 are exchanged with 16 through 23. Executing this instruction twice in a row leaves the register with the same value as before. The BSWAP instruction is useful for converting between “big-endian” and “little-endian” data formats. This instruction also speeds execution of decimal arithmetic. (The XCHG instruction can be used to swap the bytes in a word.)

And, on top of that, the final parenthetical sentence has nothing to do with the statement it seems to apply to. And, if it's not meant to apply to it, why have the parentheses at all?

My conclusion is that the doco is still being written by Intel engineers rather than tech writers who would have caught those torturous crimes against the English language :-)

But, as to how it could speed up decimal operations, only one possibility leaps to mind (and, honestly, it was more of a painful crawl than a leap).

If you have large numbers made up of decimal digits, one per character, laid out sequentially in memory, there may be a performance improvement if you can handle them four digits at a time. However, on a little-endian machine, loading the (sequential-in-memory) bytes {0x01, 0x02, 0x03, 0x04} as a 32-bit value (dword) would give you 0x04030201.

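Doing a bswap on that gives 0x01020304, with the first (most significant) digit in the most significant byte, which could make it easier to manipulate as a dword. A second bswap would then restore the memory order before writing it back.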
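To make the byte-order point concrete, here is a minimal C sketch of the load-then-swap idea. It is only an illustration of the speculation above, not anything Intel documents: `bswap32`, `load_le32` and `load_digits` are hypothetical helper names, and `bswap32` is a portable stand-in for the BSWAP instruction rather than the instruction itself.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for BSWAP: reverse the four bytes of a 32-bit value. */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Read four bytes the way a little-endian dword load would:
   the first byte in memory ends up in the lowest 8 bits. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Load four decimal digits (one per byte, most significant first in memory)
   and reorder them so the first digit sits in the top byte of the dword. */
uint32_t load_digits(const unsigned char *p)
{
    for (int i = 0; i < 4; i++)
        assert(p[i] <= 9);   /* assumed precondition: raw digit values 0..9 */
    return bswap32(load_le32(p));
}
```

With the bytes {0x01, 0x02, 0x03, 0x04} in memory, `load_le32` yields 0x04030201 and `load_digits` yields 0x01020304. Applying `bswap32` a second time returns the value to memory order, ready to be stored.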

Granted, that's speculation (though I'd like to consider it educated speculation ^(a)), but Intel aren't giving us much to go on in the document you refer to. Searching through volumes 1, 2a, 2b, 2c, 3a, 3b and 3c for all occurrences of bswap doesn't appear to clarify it either.

^(a) Intel have had some "sneaky" instructions for doing stuff like this for ages, the earliest of which I can remember were the daa/das instructions for adjusting packed BCD values (two digits per byte) after using byte-based, non-BCD addition or subtraction. It's not beyond the realms of possibility to think they may have something similar for numeric characters within dwords, though I haven't done an exhaustive search.
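For readers who haven't met daa, the following C sketch emulates the effect of a byte-wide binary add followed by a decimal adjust on one packed BCD byte. It is a simplified model under stated assumptions: `bcd_add_byte` is a hypothetical name, the carry is passed explicitly instead of living in the flags register, and the exact AF/CF flag semantics of the real instruction are not reproduced.

```c
#include <assert.h>
#include <stdint.h>

/* Add two packed BCD bytes (two digits each) plus an incoming carry.
   *carry is 0 or 1 on entry and receives the decimal carry out.
   This mimics "binary add, then decimal adjust", not the real flag logic. */
uint8_t bcd_add_byte(uint8_t a, uint8_t b, int *carry)
{
    assert(*carry == 0 || *carry == 1);
    assert((a & 0x0F) <= 9 && (a >> 4) <= 9);   /* inputs must be valid BCD */
    assert((b & 0x0F) <= 9 && (b >> 4) <= 9);

    unsigned sum = (unsigned)a + b + (unsigned)*carry;   /* plain binary add */

    /* Low digit went past 9: add 6 to skip the unused codes A..F. */
    if ((a & 0x0Fu) + (b & 0x0Fu) + (unsigned)*carry > 9)
        sum += 0x06;

    /* High digit went past 9: add 0x60 and report a decimal carry. */
    if (sum > 0x99) {
        sum += 0x60;
        *carry = 1;
    } else {
        *carry = 0;
    }
    return (uint8_t)sum;
}
```

For example, adding 0x19 and 0x28 gives the binary sum 0x41, which the low-digit adjustment turns into 0x47, the BCD encoding of 19 + 28.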

As a first step, you could probably look into Intel's IDFPL, their library that implements the decimal side of the IEEE 754 standard; you may find bswap used there. If not, it may be used in other high-performance decimal libraries.

rcgldr · Answer

Usually decimal arithmetic is performed on variable-length integer or fixed-point strings of packed decimal digits, and it's probably more convenient to store these strings in big-endian order. The data could be loaded 4 bytes at a time into a register, then BSWAP used to put the bytes back into most-significant-first order within the register. However, as mentioned, the decimal adjustment instructions are byte oriented, so if you are using them, the registers could be rotated 8 bits at a time to perform the byte-oriented math. BSWAP could be used instead of the first rotate, but I don't see much advantage here. I don't know if there's some clever trick to implement a 32-bit packed decimal add; if there is, then BSWAP would help.
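As a hedged illustration of where BSWAP could fit, here is a C sketch that combines it with a commonly cited carry-detection bit trick for adding eight packed BCD digits in one 32-bit word. This is not something the Intel manual describes, and it is not claimed to be what Intel had in mind. All function names are hypothetical, `bswap32` is a portable stand-in for the instruction, and the add assumes both inputs are valid BCD and that the sum fits in eight digits.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for BSWAP: reverse the four bytes of a 32-bit value. */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Little-endian dword load and store, written out byte by byte. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Return 1 if every nibble of x is a decimal digit. */
int is_packed_bcd(uint32_t x)
{
    for (int i = 0; i < 8; i++, x >>= 4)
        if ((x & 0xFu) > 9)
            return 0;
    return 1;
}

/* Add two 8-digit packed BCD values held in 32-bit words.
   Assumes valid BCD inputs and a sum that still fits in 8 digits. */
uint32_t bcd_add32(uint32_t a, uint32_t b)
{
    assert(is_packed_bcd(a) && is_packed_bcd(b));
    uint32_t t1 = a + 0x06666666u;      /* bias lower digits so decimal carries become binary carries */
    uint32_t t2 = t1 + b;               /* binary sum of the biased value */
    uint32_t t3 = t1 ^ b;               /* the same sum without any carries */
    uint32_t t4 = t2 ^ t3;              /* bits where a carry came in */
    uint32_t t5 = ~t4 & 0x11111110u;    /* digit boundaries with NO carry out */
    uint32_t t6 = (t5 >> 2) | (t5 >> 3);/* a 6 in each digit that did not carry */
    return t2 - t6;                     /* remove the bias where it was not used up */
}

/* Add two 8-digit BCD numbers stored big-endian in memory (4 bytes each). */
void bcd_add_be(const unsigned char *a, const unsigned char *b,
                unsigned char *out)
{
    uint32_t x = bswap32(load_le32(a)); /* most significant digits to the top */
    uint32_t y = bswap32(load_le32(b));
    store_le32(out, bswap32(bcd_add32(x, y)));
}
```

With the bytes {0x12, 0x34, 0x56, 0x78} and {0x00, 0x00, 0x04, 0x22} as inputs, `bcd_add_be` writes {0x12, 0x34, 0x61, 0x00}, the packed encoding of 12345678 + 422 = 12346100. The two byte swaps are the only part that depends on the storage order; the add itself works on ordinary 32-bit values.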

Tags: x86, assembly, endianness

Alex D
