I am working on the optimization of an algorithm using SSE2 instructions, but I ran into a problem while testing its performance on two CPUs:
I) Intel E6750
II) Phenom II X4 2.8 GHz
Can anyone help me understand why this is happening? I'm really confused by the results.
In both cases I'm compiling with g++ using the -O3 flag.
PS: The algorithm doesn't use floating-point math; it uses SSE's integer instructions.
Intel has made big improvements to its SSE implementation over the last 5 years or so, which AMD has not really kept up with. Originally, both vendors' SSE execution units were only 64 bits wide, so 128-bit operations were broken down into 2 micro-ops. Ever since Core and Core 2 were introduced, though, Intel CPUs have had a full 128-bit SSE implementation, which means 128-bit operations effectively got a 2x throughput boost (1 micro-op versus 2). More recent Intel CPUs also have multiple SSE execution units, which means you can get more than 1 instruction per clock of throughput for 128-bit SIMD instructions.
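To make the throughput point concrete, here is a minimal, self-contained sketch of the kind of 128-bit integer SSE2 kernel being discussed. The element-wise array add is a hypothetical stand-in for the asker's algorithm, not their actual code. On a full 128-bit implementation such as Core 2 (the E6750), each _mm_add_epi32 below issues as a single micro-op; on a CPU whose SSE units are only 64 bits wide, it is split into two.

```cpp
#include <emmintrin.h>   // SSE2 integer intrinsics
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Element-wise 32-bit integer add: 4 lanes per 128-bit instruction.
// Assumes n is a multiple of 4 and the pointers are 16-byte aligned.
void add_i32(const int32_t* a, const int32_t* b, int32_t* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; i += 4) {
        __m128i va = _mm_load_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_load_si128(reinterpret_cast<const __m128i*>(b + i));
        __m128i vs = _mm_add_epi32(va, vb);   // one 128-bit integer add
        _mm_store_si128(reinterpret_cast<__m128i*>(out + i), vs);
    }
}

int main()
{
    alignas(16) int32_t a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    alignas(16) int32_t b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    alignas(16) int32_t c[8];
    add_i32(a, b, c, 8);
    for (int i = 0; i < 8; ++i)
        printf("%d ", c[i]);   // expected: 11 22 33 44 55 66 77 88
    printf("\n");
    return 0;
}
```

Built as in the question (g++ -O3; SSE2 is the baseline on x86-64, add -msse2 for 32-bit targets), the same binary runs this loop with half as many SSE micro-ops per iteration on a full-width implementation, which is one plausible source of the performance gap.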