I am working on the optimization of an algorithm using SSE2 instructions, but I ran into a problem while testing its performance on two CPUs:
I) Intel E6750
II) Phenom II X4 2.8 GHz
Can anyone help me understand why this is happening? I'm really confused by the results.
In both cases I'm compiling with g++ using the -O3 flag.
PS: The algorithm doesn't use floating-point math; it uses SSE's integer instructions.
Intel has made big improvements to its SSE implementation over the last 5 years or so, which AMD has not really kept up with. Originally, both vendors' SSE execution units were only 64 bits wide, so 128-bit operations were broken down into 2 micro-ops. Ever since Core and Core 2 were introduced, though, Intel CPUs have had a full 128-bit SSE implementation, which means 128-bit operations effectively got a 2x throughput boost (1 micro-op versus 2). More recent Intel CPUs also have multiple SSE execution units, which means you can get more than 1 instruction per clock of throughput for 128-bit SIMD instructions.
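To make the throughput point concrete, here is a minimal, self-contained sketch of the kind of 128-bit integer SSE2 kernel being discussed. The element-wise array add is a hypothetical stand-in for the asker's algorithm, not their actual code. On a full 128-bit implementation such as Core 2 (the E6750), each _mm_add_epi32 below issues as a single micro-op; on a CPU whose SSE units are only 64 bits wide, it is split into two.

```cpp
#include <emmintrin.h>   // SSE2 integer intrinsics
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Element-wise 32-bit integer add: 4 lanes per 128-bit instruction.
// Assumes n is a multiple of 4 and the pointers are 16-byte aligned.
void add_i32(const int32_t* a, const int32_t* b, int32_t* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; i += 4) {
        __m128i va = _mm_load_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_load_si128(reinterpret_cast<const __m128i*>(b + i));
        __m128i vs = _mm_add_epi32(va, vb);   // one 128-bit integer add
        _mm_store_si128(reinterpret_cast<__m128i*>(out + i), vs);
    }
}

int main()
{
    alignas(16) int32_t a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    alignas(16) int32_t b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    alignas(16) int32_t c[8];
    add_i32(a, b, c, 8);
    for (int i = 0; i < 8; ++i)
        printf("%d ", c[i]);   // expected: 11 22 33 44 55 66 77 88
    printf("\n");
    return 0;
}
```

Built as in the question (g++ -O3; SSE2 is the baseline on x86-64, add -msse2 for 32-bit targets), the same binary runs this loop with half as many SSE micro-ops per iteration on a full-width implementation, which is one plausible source of the performance gap.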