
How to configure high performance BLAS/LAPACK for Breeze on Amazon EMR, EC2

I am trying to set up an environment to support exploratory data analytics on a cluster. Based on an initial survey of what's out there, my plan is to use Scala/Spark, with Amazon EMR to provision the cluster.

Currently I'm just trying to get some basic examples up and running to validate that I've got everything configured properly. The problem is that I'm not seeing the performance I expect from the ATLAS BLAS libraries on the Amazon machine instance.

Below is a code snippet of my simple benchmark. It's just a square matrix multiply followed by a short, fat multiply and a tall, thin multiply, which yields a small matrix that can be printed (I wanted to be sure Scala would not skip any part of the computation due to lazy evaluation).

I'm using Breeze as the linear algebra library and netlib-java to pull in the local native libraries for BLAS/LAPACK:

    import breeze.linalg.{DenseMatrix, DenseVector}
    import org.apache.spark.annotation.DeveloperApi
    import org.apache.spark.rdd.RDD
    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.SparkConf

    import com.github.fommil.netlib.BLAS.{getInstance => blas}

    import scala.reflect.ClassTag

    object App {

      def NaiveMultiplication(n: Int) : Unit = {

        val vl = java.text.NumberFormat.getIntegerInstance.format(n)
        println(f"Naive Multipication with vector length " + vl)

        println(blas.getClass().getName())

        val sm: DenseMatrix[Double] = DenseMatrix.rand(n, n)
        val a: DenseMatrix[Double] = DenseMatrix.rand(2,n)
        val b: DenseMatrix[Double] = DenseMatrix.rand(n,3)

        val c: DenseMatrix[Double] = sm * sm
        val cNormal: DenseMatrix[Double] = (a *  c)  * b

        println(s"Dot product of a and b is \n$cNormal")
      }

Based on a web survey of benchmarks I'm expecting a 3000x3000 matrix multiply to take approx. 2-4s using a native, optimized BLAS library. When I run locally on my MacBook Air this benchmark completes in 1.8s. When I run this on EMR it completes in approx. 11s (using a g2.2xlarge instance, though similar results were obtained on a m3.xlarge instance). As another cross check I ran a prebuilt EC2 AMI from the BIDMach project on the same EC2 instance type, g2.2xlarge, and got 2.2s (note, the GPU benchmark for the same calculation yielded 0.047s).
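To compare these timings on a common scale, they can be converted to an approximate throughput. This is a minimal sketch, assuming a dense n x n multiply costs about 2*n^3 floating-point operations and that the measured wall-clock time is dominated by that multiply (the quoted times also include matrix generation and any JVM overhead, so the figures are lower bounds). The function name `gflops` is made up for illustration.

```shell
#!/bin/sh
# Rough throughput estimate for an n x n dense matrix multiply.
# Assumption: the multiply costs about 2*n^3 floating-point operations
# and dominates the measured wall-clock time.
gflops() {
    # $1 = matrix dimension n, $2 = elapsed time in seconds
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 2 * n * n * n / t / 1e9 }'
}

# Times quoted above for n = 3000:
# gflops 3000 1.8    # MacBook Air               -> 30.0
# gflops 3000 11     # EMR, g2.2xlarge           -> 4.9
# gflops 3000 2.2    # BIDMach AMI, g2.2xlarge   -> 24.5
```

On this scale the EMR run is several times slower than the other two CPU results, which is a larger gap than a difference in clock speed alone would normally explain.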

At this point I suspect that netlib-java is not loading the correct library, but this is where I am stuck. I've gone through the netlib-java README many times, and it seems the ATLAS libraries are already installed as required (see below):

    [hadoop@ip-172-31-3-69 ~]$ ls /usr/lib64/atlas/
    libatlas.a       libcblas.a       libclapack.so      libf77blas.so      liblapack.so      libptcblas.so      libptf77blas.so
    libatlas.so      libcblas.so      libclapack.so.3    libf77blas.so.3    liblapack.so.3    libptcblas.so.3    libptf77blas.so.3
    libatlas.so.3    libcblas.so.3    libclapack.so.3.0  libf77blas.so.3.0  liblapack.so.3.0  libptcblas.so.3.0  libptf77blas.so.3.0
    libatlas.so.3.0  libcblas.so.3.0  libf77blas.a       liblapack.a        libptcblas.a      libptf77blas.a
    [hadoop@ip-172-31-3-69 ~]$ cat /etc/ld.so.conf
    include ld.so.conf.d/*.conf
    [hadoop@ip-172-31-3-69 ~]$ ls /etc/ld.so.conf.d
    atlas-x86_64.conf  kernel-4.4.11-23.53.amzn1.x86_64.conf  kernel-4.4.8-20.46.amzn1.x86_64.conf  mysql55-x86_64.conf  R-x86_64.conf
    [hadoop@ip-172-31-3-69 ~]$ cat /etc/ld.so.conf.d/atlas-x86_64.conf
    /usr/lib64/atlas

Below I've shown two examples of running the benchmark on an Amazon EMR instance. The first shows the case where the native system BLAS supposedly loads correctly. The second shows the case where the native BLAS does not load and the package falls back to the pure-JVM F2jBLAS implementation. So, based on the messages and the timing, it does appear to be loading a native BLAS. Compared to running locally on my Mac, the no-native-BLAS case runs in approximately the same time, but the native BLAS case runs in 1.8s on my Mac compared to 15s in the case below. The info messages are the same on my Mac as on EMR (other than specific directory and file names, etc.).

    [hadoop@ip-172-31-3-69 ~]$ spark-submit --class "com.cyberatomics.simplespark.App" --conf "spark.driver.extraClassPath=/home/hadoop/simplespark-0.0.1-SNAPSHOT-jar-with-dependencies.jar"   --master local[4] simplespark-0.0.1-SNAPSHOT-jar-with-dependencies.jar  3000 naive
    Naive Multipication with vector length 3,000
    Jun 16, 2016 12:30:39 AM com.github.fommil.jni.JniLoader liberalLoad
    INFO: successfully loaded /tmp/jniloader2856061049061057802netlib-native_system-linux-x86_64.so
    com.github.fommil.netlib.NativeSystemBLAS
    Dot product of a and b is
    1.677332076284315E9   1.6768329748988206E9  1.692150656424957E9
    1.6999000993276503E9  1.6993872020220244E9  1.7149145239563465E9
    Elapsed run time:  15.1s
    [hadoop@ip-172-31-3-69 ~]$

    [hadoop@ip-172-31-3-69 ~]$ spark-submit --class "com.cyberatomics.simplespark.App"  --master local[4] simplespark-0.0.1-SNAPSHOT-jar-with-dependencies.jar  3000 naive
    Naive Multipication with vector length 3,000
    Jun 16, 2016 12:31:32 AM com.github.fommil.netlib.BLAS <clinit>
    WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
    Jun 16, 2016 12:31:32 AM com.github.fommil.netlib.BLAS <clinit>
    WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
    com.github.fommil.netlib.F2jBLAS
    Dot product of a and b is
    1.6640545115052865E9  1.6814609592261212E9  1.7062846398842275E9
    1.64471099826913E9    1.6619129531594608E9  1.6864479674870768E9
    Elapsed run time:  28.7s
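For a controlled comparison between the two cases, the implementation can also be selected explicitly rather than relying on which one happens to load. The netlib-java README describes system properties named after the `BLAS` and `LAPACK` classes for this purpose. The sketch below only builds the option string. The helper name `netlib_opts` is hypothetical, and passing the result through `--driver-java-options` is an assumption that fits the `local[4]` runs shown above.

```shell
#!/bin/sh
# Build JVM options that pin netlib-java to one implementation family.
# $1 is the class-name prefix: NativeSystem, NativeRef or F2j.
# (netlib_opts is a hypothetical helper, not part of netlib-java.)
netlib_opts() {
    pkg=com.github.fommil.netlib
    printf '%s\n' "-D${pkg}.BLAS=${pkg}.${1}BLAS -D${pkg}.LAPACK=${pkg}.${1}LAPACK"
}

# Possible use with the benchmark above:
# spark-submit --driver-java-options "$(netlib_opts F2j)" \
#     --class "com.cyberatomics.simplespark.App" --master local[4] \
#     simplespark-0.0.1-SNAPSHOT-jar-with-dependencies.jar 3000 naive
```

The class name the benchmark prints (`blas.getClass().getName()`) should then match the requested prefix, which makes it easier to attribute a timing to a specific implementation.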

At this point my best guess is that it is actually loading a native library, but a generic one. Any suggestions on how I can verify which shared library it is picking up at run time? I tried 'ldd', but that does not seem to work with spark-submit. Or maybe my expectations for ATLAS are wrong, but it seems hard to believe AWS would pre-install the libraries if they weren't running at reasonably competitive speeds.
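One way to check this on Linux, sketched here under assumptions, is to look at the memory map of the running JVM: `/proc/<pid>/maps` lists every file mapped into the process, including shared libraries loaded through JNI. The function name `mapped_blas_libs` and the `pgrep` pattern in the usage comment are made up for illustration. The benchmark has to be running while the file is read.

```shell
#!/bin/sh
# List the distinct BLAS/LAPACK-related shared objects mapped into a process.
# $1 is the path to a maps file, e.g. /proc/<pid>/maps on Linux.
# The sixth whitespace-separated field of each line is the mapped file path.
mapped_blas_libs() {
    awk '{ print $6 }' "$1" | grep -iE 'blas|lapack|atlas|netlib' | sort -u
}

# Usage while the benchmark is running (the pgrep pattern is an assumption
# about how the driver JVM appears in the process list):
# pid=$(pgrep -f simplespark | head -n 1)
# mapped_blas_libs "/proc/$pid/maps"
```

If the output shows a `libblas.so.3` outside `/usr/lib64/atlas`, the JVM is not using the ATLAS build, regardless of what is installed.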

If you see that the libs are not linked up correctly on EMR, please provide guidance on what I need to do in order for the Atlas libs to get picked up by netlib-java.

thanks tim

Tim Ryan asked Jun 16 '16 01:06


1 Answer

Follow up:

My tentative conclusion is that the ATLAS libraries installed by default on the Amazon EMR instance are simply slow. Either they are a generic build that has not been optimized for the specific machine type, or ATLAS is fundamentally slower than other libraries. Using this script as a guide, I built and installed OpenBLAS for the specific machine type where I was running the benchmarks (I also found some helpful info here). Once OpenBLAS was installed, my 3000x3000 matrix multiply benchmark completed in 3.9s (compared to the 15.1s listed above when using the default ATLAS libraries). This is still slower than the same benchmark on my Mac by a factor of 2, but that difference falls in a range that could credibly be due to the underlying hardware.

Here is a complete listing of the commands I used to install the OpenBLAS libraries on an Amazon EMR Spark instance:

    sudo yum install git
    git clone https://github.com/xianyi/OpenBlas.git
    cd OpenBlas/
    make clean
    make -j4
    sudo mkdir /usr/lib64/OpenBLAS
    sudo chmod o+w,g+w /usr/lib64/OpenBLAS/
    make PREFIX=/usr/lib64/OpenBLAS install
    sudo rm /etc/ld.so.conf.d/atlas-x86_64.conf
    sudo ldconfig
    sudo ln -sf /usr/lib64/OpenBLAS/lib/libopenblas.so /usr/lib64/libblas.so
    sudo ln -sf /usr/lib64/OpenBLAS/lib/libopenblas.so /usr/lib64/libblas.so.3
    sudo ln -sf /usr/lib64/OpenBLAS/lib/libopenblas.so /usr/lib64/libblas.so.3.5
    sudo ln -sf /usr/lib64/OpenBLAS/lib/libopenblas.so /usr/lib64/liblapack.so
    sudo ln -sf /usr/lib64/OpenBLAS/lib/libopenblas.so /usr/lib64/liblapack.so.3
    sudo ln -sf /usr/lib64/OpenBLAS/lib/libopenblas.so /usr/lib64/liblapack.so.3.5
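After creating symlinks like these, it may be worth confirming that each name really resolves to the OpenBLAS build before rerunning the benchmark. This is a small sketch. The helper name `show_targets` is made up, and `readlink -f` is assumed to be the GNU coreutils version found on Amazon Linux.

```shell
#!/bin/sh
# Print each library path followed by the file it finally resolves to,
# following any chain of symlinks.
show_targets() {
    for lib in "$@"; do
        printf '%s -> %s\n' "$lib" "$(readlink -f "$lib")"
    done
}

# Possible checks on the instance:
# show_targets /usr/lib64/libblas.so.3 /usr/lib64/liblapack.so.3
# ldconfig -p | grep -E 'libblas|liblapack'
```

Each line printed by `show_targets` should end in the `libopenblas` file under `/usr/lib64/OpenBLAS/lib`. If it points anywhere else, a link was not replaced.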
Tim Ryan answered Oct 06 '22 10:10