How to configure Executor in Spark Local Mode

In Short

I want to configure my application to use lz4 compression instead of snappy. Here is what I did:

SparkSession session = SparkSession.builder()
        .master(SPARK_MASTER) // local[1]
        .appName(SPARK_APP_NAME)
        .config("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
        .getOrCreate();

But looking at the console output, it is still using snappy in the executor:

org.apache.parquet.hadoop.codec.CodecConfig: Compression: SNAPPY

and

[Executor task launch worker-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.snappy]

According to this post, what I did here only configures the driver, not the executor. The solution in the post is to change the spark-defaults.conf file, but since I'm running Spark in local mode, I don't have that file anywhere.

Some more detail:

I need to run the application in local mode (for unit testing). The tests work fine locally on my machine, but when I submit them to a build engine (RHEL5_64), I get the error

snappy-1.0.5-libsnappyjava.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found

I did some research, and it seems the simplest fix is to use lz4 instead of snappy as the codec, so I tried the solution above.

I have been stuck on this issue for several hours; any help is appreciated. Thank you.

asked Nov 07 '22 by Ning Lin


1 Answer

what I did here only configures the driver, not the executor.

In local mode there is only one JVM, which hosts both the driver and the executor threads.
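Because of that, any option passed to the builder is visible to the executor threads as well. A minimal sketch illustrating this (the class and app name are illustrative; "lz4" is the short alias Spark also accepts for the codec):

import org.apache.spark.sql.SparkSession;

public class CodecCheck {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("codec-check")
                .config("spark.io.compression.codec", "lz4")
                .getOrCreate();

        // Driver and executor threads share this one JVM and its
        // configuration, so this prints "lz4".
        System.out.println(spark.conf().get("spark.io.compression.codec"));

        spark.stop();
    }
}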

the spark-defaults.conf file, but since I'm running Spark in local mode, I don't have that file anywhere.

Mode is not relevant here. Spark in local mode uses the same configuration files. If you go to the directory where you keep the Spark binaries, you should see a conf directory:

spark-2.2.0-bin-hadoop2.7 $ ls
bin  conf  data  examples  jars  LICENSE  licenses  NOTICE  python  R  README.md  RELEASE  sbin  yarn

This directory contains a number of template files:

spark-2.2.0-bin-hadoop2.7 $ ls conf
docker.properties.template   log4j.properties.template    slaves.template               spark-env.sh.template
fairscheduler.xml.template   metrics.properties.template  spark-defaults.conf.template

If you want to set a configuration option, copy spark-defaults.conf.template to spark-defaults.conf and edit it according to your requirements.
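For example, to switch the codec to lz4 (paths assume the directory layout shown above; "lz4" is the short alias Spark accepts for the full codec class name):

spark-2.2.0-bin-hadoop2.7 $ cp conf/spark-defaults.conf.template conf/spark-defaults.conf
spark-2.2.0-bin-hadoop2.7 $ echo "spark.io.compression.codec lz4" >> conf/spark-defaults.conf

Spark reads this file on startup regardless of the master URL, so the setting applies in local mode too.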

answered Nov 15 '22 by Alper t. Turker