Which AVX and march should be specified on a cluster with different architectures?

I'm currently trying to compile software for use on an HPC cluster with the Intel compilers. The login node, where I compile and prepare the computations, uses Intel Xeon Gold 6148 processors, while the compute nodes use either Haswell processors (Intel Xeon E5-2660 v3 / Intel Xeon E5-2680 v3) or Skylake processors (Intel Xeon Gold 6138).

As far as I understand from the linked processor specifications, my login node supports Intel SSE4.2, Intel AVX, Intel AVX2 and Intel AVX-512, but my compute nodes only support either Intel AVX2 (Haswell) or Intel AVX-512 (Skylake).

If I compile with the option -xHost on the login node, it should automatically use the highest instruction set available. But which one is the highest? And how can I ensure that my program runs on both kinds of compute node with the best performance? Do I have to compile two versions? Bonus question: which -march do I have to specify in this case?

Wulle asked Jun 05 '20 12:06

2 Answers

Since you are using the Intel compiler, you can use its "Automatic Processor Dispatch" capability to create "fat" generic binaries that contain SSE-compatible, AVX-compatible and further versions of the code side by side. When you run the fat binary on an SSE-only machine, only the SSE-optimized code path is executed. When you run the same fat binary on an AVX machine, the AVX-optimized code path is executed. This is a very powerful and not very well-known feature.

You can enable it with a combination of the -ax and -x Intel compiler flags. The idea is that you specify the highest ISA(s) via -ax and the default ("lowest") ISA via -x.

The "-ax" fat-binary technique is briefly described at https://www.chpc.utah.edu/documentation/software/single-executable.php#submit

More details can be found on page 9 of this slide deck: https://www.alcf.anl.gov/files/ken_intel_compiler_optimization.pdf


Finally, I should mention that your description slightly confuses how the ISAs relate to each other. Intel x86 processors with AVX-512 always support AVX2 as well, and AVX2 machines always support SSE. As a heavily oversimplified explanation: AVX-512 is more or less a superset of AVX/AVX2, while AVX/AVX2 can be treated as a superset of SSE (strictly speaking it is not, but SSE is always available on AVX machines, while the reverse does not hold).

Either way, you mentioned Haswell (an AVX2 machine, so SSE is on board, but naturally no AVX-512) and Skylake (an AVX-512 machine, so AVX2 and SSE are on board). You therefore probably need something like -axCORE-AVX512 -xCORE-AVX2, since your list seems to contain only Skylake and Haswell servers and no machines below AVX2 (that is, no SSE-only or AVX(1)-only machines).
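To make the idea of a dispatched code path more concrete, here is a minimal hand-written sketch in C of roughly what such a fat binary does at run time. It is not what the Intel compiler generates: it assumes GCC or Clang on x86 (for the target attribute and the __builtin_cpu_supports built-in), and the function names sum_generic, sum_avx2, sum_avx512 and sum_dispatch are made up for illustration.

```c
#include <stddef.h>

/* Baseline variant, compiled for whatever ISA the command line allows. */
static double sum_generic(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

/* Same source, but the compiler may use AVX2 instructions here. */
__attribute__((target("avx2")))
static double sum_avx2(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

/* Same source, but the compiler may use AVX-512 instructions here. */
__attribute__((target("avx512f")))
static double sum_avx512(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

/* Hypothetical dispatcher: checks the CPU at run time and takes the
   highest code path it supports (Skylake server -> AVX-512,
   Haswell -> AVX2, anything older -> baseline). */
double sum_dispatch(const double *x, size_t n) {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return sum_avx512(x, n);
    if (__builtin_cpu_supports("avx2"))    return sum_avx2(x, n);
    return sum_generic(x, n);
}
```

Calling sum_dispatch(v, n) from the same executable would then take the AVX-512 path on the Skylake nodes and the AVX2 path on the Haswell nodes. With -ax and -x the Intel compiler takes care of this kind of branching for you, so a sketch like this is only meant to show the principle.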

zam answered Nov 14 '22 22:11

Take a look at Function Multiversioning. Although it is not a perfect solution for your problem, it seems like a good candidate...
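As a rough illustration of function multiversioning, the sketch below uses the target_clones attribute. It assumes GCC (or a recent Clang) on Linux with glibc rather than the Intel compiler from the question, and the function name scale is hypothetical.

```c
#include <stddef.h>

/* The compiler emits one clone of this function per listed target, plus a
   resolver that selects the best clone for the CPU the program runs on.
   "default" is the fallback built for the baseline ISA. */
__attribute__((target_clones("avx512f", "avx2", "default")))
void scale(double *x, size_t n, double a) {
    for (size_t i = 0; i < n; ++i) x[i] *= a;
}
```

Callers simply call scale(x, n, a); the selection happens once, when the program is loaded. Only functions marked this way get multiple versions, so hot loops have to be annotated one by one, which may be part of why this is not a perfect fit for the whole-program case.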

Malkocoglu answered Nov 14 '22 22:11
