I want to configure my application to use lz4 compression instead of snappy. What I did is:
session = SparkSession.builder()
.master(SPARK_MASTER) //local[1]
.appName(SPARK_APP_NAME)
.config("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
.getOrCreate();
but looking at the console output, it's still using snappy in the executor
org.apache.parquet.hadoop.codec.CodecConfig: Compression: SNAPPY
and
[Executor task launch worker-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.snappy]
According to this post, what I did here only configures the driver, but not the executor. The solution in that post is to change the spark-defaults.conf file, but I'm running Spark in local mode and I don't have that file anywhere.
I need to run the application in local mode (for the purpose of unit tests). The tests work fine locally on my machine, but when I submit them to a build engine (RHEL5_64), I get the error
snappy-1.0.5-libsnappyjava.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found
I did some research and it seems the simplest fix is to use lz4 instead of snappy for the codec, so I tried the above solution.
I have been stuck on this issue for several hours; any help is appreciated, thank you.
what I did here only configures the driver, but not the executor.
In local mode there is only one JVM, which hosts both the driver and the executor threads.
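For example (a minimal sketch, assuming Spark 2.x on the classpath; the class name, master and app name are just placeholders), you can confirm from inside that single JVM that the setting took effect:

import org.apache.spark.sql.SparkSession;

public class CodecCheck {
    public static void main(String[] args) {
        // Build a local-mode session; driver and executor threads share this one JVM.
        SparkSession session = SparkSession.builder()
                .master("local[1]")
                .appName("codec-check")
                // The short name "lz4" works as well as the full codec class name.
                .config("spark.io.compression.codec", "lz4")
                .getOrCreate();

        // Prints the codec value that the executor threads will also use.
        System.out.println(session.sparkContext().getConf()
                .get("spark.io.compression.codec"));

        session.stop();
    }
}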
the spark-defaults.conf file, but I'm running Spark in local mode and I don't have that file anywhere.
Mode is not relevant here: Spark in local mode uses the same configuration files. If you go to the directory where you keep the Spark binaries, you should see a conf directory:
spark-2.2.0-bin-hadoop2.7 $ ls
bin conf data examples jars LICENSE licenses NOTICE python R README.md RELEASE sbin yarn
In this directory there are a bunch of template files:
spark-2.2.0-bin-hadoop2.7 $ ls conf
docker.properties.template log4j.properties.template slaves.template spark-env.sh.template
fairscheduler.xml.template metrics.properties.template spark-defaults.conf.template
If you want to set a configuration option, copy spark-defaults.conf.template to spark-defaults.conf and edit it according to your requirements.
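For example (illustrative values only; adjust to your own setup), the copied spark-defaults.conf could contain a line like:

# spark-defaults.conf (copied from spark-defaults.conf.template)
spark.io.compression.codec        lz4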