
SSE program takes a lot longer on AMD than on Intel

I am optimizing an algorithm using SSE2 instructions, but I ran into this problem while testing its performance:

I) Intel E6750

  1. Running the non-SSE2 algorithm 4 times takes 14.85 seconds
  2. Running the SSE2 algorithm once (it processes the same data) takes 6.89 seconds

II) Phenom II X4 2.8 GHz

  1. Running the non-SSE2 algorithm 4 times takes 11.43 seconds
  2. Running the SSE2 algorithm once (it processes the same data) takes 12.15 seconds

Can anyone help me understand why this is happening? I'm really confused by the results.

In both cases I'm compiling with g++ using the -O3 flag.

PS: The algorithm doesn't use floating-point math; it uses SSE2 integer instructions.

Santiago Alessandri asked Jun 19 '11 16:06



1 Answer

Intel has made big improvements to their SSE implementation over the last 5 years or so, which AMD has not really kept up with. Originally both had only 64-bit execution units, and each 128-bit operation was broken down into 2 micro-ops. Ever since Core and Core 2 were introduced, though, Intel CPUs have had a full 128-bit SSE implementation, which means that 128-bit operations effectively got a 2x throughput boost (1 micro-op versus 2). More recent Intel CPUs also have multiple SSE execution units, which means you can get more than 1 instruction per clock of throughput for 128-bit SIMD instructions.

Paul R answered Oct 19 '22 02:10
