Reliable information about x86 string instruction performance?

Common wisdom is that rep movsb is much slower than rep movsd (or on 64-bit, rep movsq) when performing identical operations. However, I've been testing on a few modern machines, and the run times are coming out identical (up to measurement noise) across a huge range of buffer sizes (10 bytes to 2 MB). So far I have tested on just 2 machines (32-bit Intel Atom D510 and 64-bit AMD FX 8120).

  • Are there any modern x86 (32- or 64-bit) machines where rep movsb is slower than rep movsd (or rep movsq)?

  • If not, what was the last machine where the difference was significant, and how significant was it?

I'm asking this question from a standpoint of wanting to avoid cargo-culting a bunch of tests to break memory up into unaligned head/tail and aligned middle for the sake of using rep movsd or rep movsq if there's no actual benefit to doing this...
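To make the two approaches concrete, here is a minimal sketch in C, assuming GCC or Clang inline asm on x86. The names `copy_movsb` and `copy_split` are hypothetical, not from any library. `rep movsl` is the AT&T spelling of `rep movsd`. On other targets the sketch simply falls back to `memcpy`. It aligns only the destination to 4 bytes, which is one possible choice, not necessarily the best one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X86_STRING_OPS 1
#else
#define HAVE_X86_STRING_OPS 0
#endif

/* Hypothetical helper: copy n bytes with a single rep movsb.
   Relies on the ABI guarantee that the direction flag is clear. */
static void copy_movsb(void *dst, const void *src, size_t n)
{
#if HAVE_X86_STRING_OPS
    __asm__ volatile ("rep movsb"
                      : "+D"(dst), "+S"(src), "+c"(n)
                      :
                      : "memory");
#else
    memcpy(dst, src, n);   /* portable fallback, not a string instruction */
#endif
}

/* Hypothetical helper: unaligned head with rep movsb, 4-byte-aligned
   middle with rep movsd, leftover tail with rep movsb. */
static void copy_split(void *dst, const void *src, size_t n)
{
    /* Bytes needed to bring dst up to a 4-byte boundary (0..3). */
    size_t head = (size_t)(-(uintptr_t)dst & 3u);
    if (head > n)
        head = n;
    size_t dwords = (n - head) / 4;
    size_t tail = (n - head) % 4;
    assert(head + dwords * 4 + tail == n);

#if HAVE_X86_STRING_OPS
    /* Each asm statement advances dst and src and leaves the count at 0. */
    size_t c = head;
    __asm__ volatile ("rep movsb" : "+D"(dst), "+S"(src), "+c"(c) : : "memory");
    c = dwords;
    __asm__ volatile ("rep movsl" : "+D"(dst), "+S"(src), "+c"(c) : : "memory");
    c = tail;
    __asm__ volatile ("rep movsb" : "+D"(dst), "+S"(src), "+c"(c) : : "memory");
#else
    memcpy(dst, src, n);   /* portable fallback, not a string instruction */
#endif
}
```

Timing `copy_movsb` against `copy_split` over the same buffers is the comparison the question is about; the extra arithmetic and the two additional `rep` setups in `copy_split` are the cost that is only worth paying if `rep movsd` is actually faster on the target machine.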

R.. GitHub STOP HELPING ICE asked Sep 10 '12 20:09


1 Answer

Lots of benchmarks here: instlatx64.atw.hu

For example (Intel Core 2 Duo E6700):

REP MOVSB   BW in L1D:13.04 B/c  34829MiB/s
REP MOVSW   BW in L1D:13.29 B/c  35493MiB/s
REP MOVSD   BW in L1D:13.40 B/c  35783MiB/s

This shows that there is a difference, but it's tiny: MOVSD comes out less than 3% ahead of MOVSB.

This one for Sandy Bridge is a little weird, with MOVSW well behind both MOVSB and MOVSD:

REP MOVSB   BW in L1D:25.50 B/c  86986MiB/s
REP MOVSW   BW in L1D:18.09 B/c  61721MiB/s
REP MOVSD   BW in L1D:27.47 B/c  93693MiB/s

There seems to be a big difference on some Atoms, where MOVSD is roughly 7 times faster than MOVSB (it seems to have disappeared with the D5xx, so you just missed it):

REP MOVSB   BW in L1D: 0.53 B/c    990MiB/s
REP MOVSW   BW in L1D: 1.93 B/c   3598MiB/s
REP MOVSD   BW in L1D: 3.74 B/c   6960MiB/s

I haven't found such a big difference on anything else that can be considered new.

harold answered Sep 21 '22 00:09