Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Xeon Phi coprocessor vs Xeon Phi host processor?

What is the difference between a host processor and coprocessor? Specifically Xeon Phi coprocessor and Xeon Phi host processor?

I have some performance results on these machines (a parallelized OpenMP code of diffusion equation was being run) which shows that the host processor works much faster when the same number of threads are working. I would like to know differences and relate them to my results.

like image 961
Amir Avatar asked Oct 28 '15 03:10

Amir


1 Answers

Just to re-iterate what Jeff said in the comments, you have a Xeon host with an attached Xeon Phi coprocessor. The current generation of Xeon Phi (Knight's Corner) is only available as a coprocessor, not as a standalone Xeon Phi host (which should be available next generation with Knight's Landing).

When you run your program without offloading from your host Xeon, from this website, it looks like you'll be able to run with up to 16 threads. Note that the speed of each of your cores is about 2.2 GHz.

When you run your program in native execution mode on your Xeon Phi coprocessor, you should be able to run with a lot more threads. The optimal number of threads to use depends on the model of Xeon Phi you have (some work best with 56, others with 60). But note that each Xeon Phi core (roughly 1.2 GHz) is noticeably weaker than a single Xeon core (roughly 2.2 GHz). The benefit of the many-core Xeon Phi technology is exactly that: you can run across many cores.

The last very important thing to consider is that the Xeon Phi has a 512-bit wide SIMD instruction set. Thus, you can support much better SIMD vectorization running on the Xeon Phi coprocessor than on the host. In your case, I believe your Xeon host only has a 256-bit SIMD vector processing unit. Therefore, if you haven't already, you can improve your performance (up to x16 if you're dealing in single-precision) on your Xeon Phi taking advantage of SIMD vectorization. Your Xeon host will only give up to x8 performance. Just to start you on a google trek, OpenMP 4.0 allows you to write things like #pragma omp simd in order to tell the compiler when to vectorize lower-level loops throughout your code. If you really want maximum performance from the Xeon Phi, adding SIMD vectorization is a necessity.

So to directly answer your question: comparing the performance results between your Xeon host and Xeon Phi coprocessor using the same number of cores is useless. We already know that each Xeon Phi core is slower than each Xeon core. You should be comparing the results using the maximum number of cores each allows (60, and 16 respectively) and taking maximum advantage of the vector processing unit if you want a direct comparison.

like image 191
NoseKnowsAll Avatar answered Oct 03 '22 01:10

NoseKnowsAll