Common wisdom is that rep movsb is much slower than rep movsd (or, on 64-bit, rep movsq) when performing identical operations. However, I've been testing on a few modern machines, and the run times come out identical (up to measurement noise) across a huge range of buffer sizes (10 bytes to 2 MB). So far I have only tested on 2 machines (32-bit Intel Atom D510 and 64-bit AMD FX 8120).
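A measurement like the one described above can be reproduced with a rough microbenchmark. This is a minimal sketch, assuming GCC or Clang inline assembly on x86/x86-64 and POSIX clock_gettime; the names copy_movsb, copy_movsd, time_copy and run_benchmark are hypothetical. GNU as spells REP MOVSD as rep movsl in AT&T syntax, and the direction flag is assumed clear, as the x86 ABIs guarantee on function entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copy n bytes with a single REP MOVSB (memcpy fallback off x86/GCC). */
static void copy_movsb(void *dst, const void *src, size_t n) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile("rep movsb" : "+D"(dst), "+S"(src), "+c"(n) : : "memory");
#else
    memcpy(dst, src, n);
#endif
}

/* Copy n/4 dwords with REP MOVSD (movsl), then the 0..3 leftover bytes
   with REP MOVSB. No alignment handling at all. */
static void copy_movsd(void *dst, const void *src, size_t n) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    size_t words = n / 4, rest = n % 4;
    __asm__ volatile("rep movsl" : "+D"(dst), "+S"(src), "+c"(words) : : "memory");
    __asm__ volatile("rep movsb" : "+D"(dst), "+S"(src), "+c"(rest) : : "memory");
#else
    memcpy(dst, src, n);
#endif
}

typedef void (*copy_fn)(void *, const void *, size_t);

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Average wall-clock seconds per call over `reps` back-to-back copies. */
static double time_copy(copy_fn fn, void *dst, const void *src, size_t n, long reps) {
    double start = now_seconds();
    for (long i = 0; i < reps; i++)
        fn(dst, src, n);
    return (now_seconds() - start) / (double)reps;
}

/* Sizes from 16 bytes up to max_bytes, doubling each step. */
static int run_benchmark(size_t max_bytes) {
    unsigned char *src = malloc(max_bytes), *dst = malloc(max_bytes);
    if (!src || !dst) { free(src); free(dst); return -1; }
    memset(src, 0xAB, max_bytes);
    memset(dst, 0, max_bytes);              /* touch pages before timing */
    for (size_t n = 16; n <= max_bytes; n *= 2) {
        long reps = (long)((64u << 20) / n) + 1;  /* ~64 MiB copied per size */
        double tb = time_copy(copy_movsb, dst, src, n, reps);
        double td = time_copy(copy_movsd, dst, src, n, reps);
        assert(memcmp(dst, src, n) == 0);   /* sanity: the copy really happened */
        printf("%9zu bytes: rep movsb %10.1f ns   rep movsd %10.1f ns\n",
               n, tb * 1e9, td * 1e9);
    }
    free(src);
    free(dst);
    return 0;
}
```

Calling run_benchmark((size_t)2 << 20) from main covers roughly the 10 bytes to 2 MB range from the question. For the smallest sizes the call overhead dominates, so treat those numbers as an upper bound rather than the instruction cost itself.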
Are there any modern x86 (32- or 64-bit) machines where rep movsb is slower than rep movsd (or rep movsq)?
If not, what was the last machine where the difference was significant, and how significant was it?
I'm asking this because I want to avoid cargo-culting a bunch of tests that break memory up into an unaligned head/tail and an aligned middle for the sake of using rep movsd or rep movsq, if there's no actual benefit to doing so.
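For reference, the head/middle/tail split the question wants to avoid might look like the following. This is a minimal sketch, assuming GCC or Clang inline assembly on x86/x86-64 (with a plain memcpy fallback elsewhere). It aligns only the destination, which is one common choice, so the source may stay misaligned. The names copy_split, split_copy and copy_aligned_dwords are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* How a copy of n bytes to dst splits into head, aligned middle and tail. */
typedef struct {
    size_t head;   /* bytes copied singly until dst is 4-byte aligned */
    size_t words;  /* 4-byte dwords in the aligned middle */
    size_t tail;   /* 0..3 bytes left after the middle */
} copy_split;

static copy_split split_copy(const void *dst, size_t n) {
    copy_split s;
    size_t misalign = (size_t)((uintptr_t)dst & 3u);
    s.head = misalign ? 4 - misalign : 0;
    if (s.head > n)
        s.head = n;                 /* short copy: everything is "head" */
    s.words = (n - s.head) / 4;
    s.tail = (n - s.head) % 4;
    assert(s.head + 4 * s.words + s.tail == n);
    return s;
}

/* REP MOVSB for the head, REP MOVSD (movsl) for the middle, REP MOVSB for the tail. */
static void copy_aligned_dwords(void *dst, const void *src, size_t n) {
    copy_split sp = split_copy(dst, n);
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    size_t count = sp.head;
    __asm__ volatile("rep movsb" : "+D"(dst), "+S"(src), "+c"(count) : : "memory");
    count = sp.words;
    __asm__ volatile("rep movsl" : "+D"(dst), "+S"(src), "+c"(count) : : "memory");
    count = sp.tail;
    __asm__ volatile("rep movsb" : "+D"(dst), "+S"(src), "+c"(count) : : "memory");
#else
    (void)sp;
    memcpy(dst, src, n);            /* portable fallback so the sketch still compiles */
#endif
}
```

If REP MOVSB turns out to be as fast as REP MOVSD on the target machines, this whole function collapses into a single rep movsb, which is exactly the simplification the question is asking about.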
The string instructions operate on strings of bytes, words, doublewords or, in 64-bit mode, quadwords. Operations include storing strings in memory, loading strings from memory, comparing strings, and scanning strings for a value. Note: the Solaris mnemonics for certain instructions differ slightly from the Intel/AMD mnemonics.
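To make the difference between the byte and doubleword forms concrete, here is a reference model in portable C of what REP MOVSB and REP MOVSD do architecturally with the direction flag clear: each iteration moves one element and advances RSI/RDI by the element size, and RCX counts elements, not bytes. The struct and function names are hypothetical, the buffers are assumed not to overlap, and the model says nothing about how real CPUs implement fast strings internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified register state for a string move (DF assumed clear). */
typedef struct {
    unsigned char *rdi;        /* destination pointer */
    const unsigned char *rsi;  /* source pointer */
    size_t rcx;                /* remaining element count */
} string_regs;

/* REP MOVSB: move rcx single bytes, advancing both pointers by 1. */
static void model_rep_movsb(string_regs *r) {
    assert(r != NULL);
    while (r->rcx != 0) {
        *r->rdi++ = *r->rsi++;
        r->rcx--;
    }
}

/* REP MOVSD: move rcx 4-byte doublewords, advancing both pointers by 4. */
static void model_rep_movsd(string_regs *r) {
    assert(r != NULL);
    while (r->rcx != 0) {
        memcpy(r->rdi, r->rsi, 4);   /* non-overlapping buffers assumed */
        r->rdi += 4;
        r->rsi += 4;
        r->rcx--;
    }
}
```

So REP MOVSD with RCX = n/4 moves the same bytes as REP MOVSB with RCX = n; the two differ only in how many iterations the hardware is nominally asked to perform.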
A string is stored as consecutive characters in memory. If it's ASCII (rather than UTF-8 text containing multi-byte characters), each character is a single byte, so you can access them one at a time with byte loads/stores, like movzbl 2(%rsi), %eax to get the third character if rsi points to the start of the string.
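The same byte access in C might look like this. It is a small sketch, and the name third_char is hypothetical. Casting to unsigned char before widening asks for zero-extension, which x86 compilers typically emit as movzbl; without the cast, a plain char (signed on the x86 ABIs) would be sign-extended instead.

```c
#include <assert.h>
#include <stddef.h>

/* Return the third character (index 2) of s, zero-extended to unsigned int,
   like movzbl 2(%rsi), %eax with rsi == s. Assumes s has at least 3 bytes. */
static unsigned int third_char(const char *s) {
    assert(s != NULL);
    return (unsigned char)s[2];
}
```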
Lots of benchmarks here: instlatx64.atw.hu
For example (Intel Core 2 Duo E6700):
REP MOVSB BW in L1D:13.04 B/c 34829MiB/s
REP MOVSW BW in L1D:13.29 B/c 35493MiB/s
REP MOVSD BW in L1D:13.40 B/c 35783MiB/s
So there is a difference, but it's tiny: REP MOVSD is under 3% faster than REP MOVSB here.
This one for Sandy Bridge is a little weird, because REP MOVSW is roughly 30% slower than REP MOVSB, while REP MOVSD is only about 8% faster:
REP MOVSB BW in L1D:25.50 B/c 86986MiB/s
REP MOVSW BW in L1D:18.09 B/c 61721MiB/s
REP MOVSD BW in L1D:27.47 B/c 93693MiB/s
There does seem to be a big difference on some Atoms, where REP MOVSD is about 7 times faster than REP MOVSB (it seems to have disappeared with the D5xx, so you just missed it):
REP MOVSB BW in L1D: 0.53 B/c 990MiB/s
REP MOVSW BW in L1D: 1.93 B/c 3598MiB/s
REP MOVSD BW in L1D: 3.74 B/c 6960MiB/s
I haven't found such a big difference on anything else that can be considered new.
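To put "tiny" and "big" on the same scale, the quoted L1D bandwidths can be turned into speed ratios relative to REP MOVSB. This sketch uses only the bytes/cycle figures quoted above; the struct, table and function names are hypothetical, and the "some Atoms" label follows the wording above rather than naming a specific model.

```c
#include <assert.h>
#include <stdio.h>

/* L1D bandwidth in bytes/cycle, copied from the instlatx64 figures quoted above. */
typedef struct {
    const char *cpu;
    double movsb, movsw, movsd;
} l1d_bw;

static const l1d_bw results[] = {
    {"Core 2 Duo E6700", 13.04, 13.29, 13.40},
    {"Sandy Bridge",     25.50, 18.09, 27.47},
    {"some Atoms",        0.53,  1.93,  3.74},
};

/* Speed of an instruction relative to REP MOVSB (1.0 = same speed). */
static double relative_to_movsb(double bw, double movsb_bw) {
    assert(movsb_bw > 0.0);
    return bw / movsb_bw;
}

static void print_ratios(void) {
    for (size_t i = 0; i < sizeof results / sizeof results[0]; i++) {
        const l1d_bw *r = &results[i];
        printf("%-18s movsw/movsb = %.2f  movsd/movsb = %.2f\n", r->cpu,
               relative_to_movsb(r->movsw, r->movsb),
               relative_to_movsb(r->movsd, r->movsb));
    }
}
```

Running print_ratios() gives about 1.02 and 1.03 for the Core 2, 0.71 and 1.08 for Sandy Bridge, and 3.64 and 7.06 for the Atom, which is where the "under 3%", "roughly 30% slower" and "about 7 times" figures come from.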