 

Custom log4j.properties on AWS EMR

I am unable to override Spark's logging with a custom log4j.properties on Amazon EMR. I am running Spark on EMR (YARN) and have tried the following combinations in spark-submit to make it use the custom file:

--driver-java-options "-Dlog4j.configuration=hdfs://host:port/user/hadoop/log4j.properties"

--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=hdfs://host:port/user/hadoop/log4j.properties"

I have also tried loading the file from the local filesystem with a file:// URI instead of hdfs://. None of these seem to work. However, I can get this working when running on my local YARN setup.

Any ideas?

asked Feb 25 '17 at 06:02 by Kaptrain




2 Answers

After talking with AWS support and reading the documentation, I found two ways to do this:

1 - Supply the log4j.properties settings through the configuration passed when creating the EMR cluster. Jonathan mentions this in his answer.

2 - Add --files /path/to/log4j.properties to your spark-submit command. This distributes log4j.properties to the working directory of each YARN container (the executors, and also the driver when running in cluster deploy mode). Then change -Dlog4j.configuration to point to the file name only: "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties". To apply the same file to the executors, set the same option in spark.executor.extraJavaOptions.
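To make option 2 concrete, here is a minimal Python sketch that builds an EMR step running spark-submit through command-runner.jar, with the log4j option set for both the driver and the executors. The path /home/hadoop/log4j.properties, the application s3://my-bucket/jobs/my_app.py, the step name and the helper build_spark_step are hypothetical placeholders. It also assumes cluster deploy mode, so that the driver runs inside a YARN container that receives the --files copy.

```python
import json

# Hypothetical locations: adjust to wherever your file and application live.
LOG4J_LOCAL_PATH = "/home/hadoop/log4j.properties"  # file on the master node
APP_PATH = "s3://my-bucket/jobs/my_app.py"          # hypothetical application


def build_spark_step(log4j_path: str, app_path: str) -> dict:
    """Build an EMR step definition that runs spark-submit via command-runner.jar.

    --files copies the file into each YARN container's working directory under
    its base name, so -Dlog4j.configuration only needs "log4j.properties".
    This assumes the file passed in is actually named log4j.properties.
    """
    log4j_opt = "-Dlog4j.configuration=log4j.properties"
    args = [
        "spark-submit",
        "--deploy-mode", "cluster",  # assumption: driver also runs in a YARN container
        "--files", log4j_path,
        "--conf", f"spark.driver.extraJavaOptions={log4j_opt}",
        "--conf", f"spark.executor.extraJavaOptions={log4j_opt}",
        app_path,
    ]
    return {
        "Name": "spark-app-with-custom-log4j",  # hypothetical step name
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


if __name__ == "__main__":
    step = build_spark_step(LOG4J_LOCAL_PATH, APP_PATH)
    print(json.dumps(step, indent=2))
```

The resulting dictionary has the shape that boto3's EMR client accepts in add_job_flow_steps(JobFlowId=..., Steps=[step]); the same argument list can also be typed directly as a spark-submit command on the master node.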

answered Oct 27 '22 at 13:10 by Kaptrain


log4j knows nothing about HDFS, so it can't accept an hdfs:// path as its configuration file. See the log4j documentation for more information about configuring log4j in general.

To configure log4j on EMR, you may use the Configuration API to add key-value pairs to the log4j.properties file that is loaded by the driver and executors. Specifically, you want to add your Properties to the spark-log4j configuration classification.
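As a sketch of the Configuration API approach, the following Python builds a spark-log4j classification and writes it to configurations.json. The property values and the com.example.myapp logger name are illustrative assumptions, and the helper spark_log4j_configuration is hypothetical.

```python
import json


def spark_log4j_configuration(properties: dict) -> list:
    """Return an EMR Configurations list that adds entries to Spark's log4j.properties."""
    return [
        {
            "Classification": "spark-log4j",
            "Properties": dict(properties),  # copy so the caller's dict is not shared
        }
    ]


if __name__ == "__main__":
    config = spark_log4j_configuration({
        # Illustrative: lower the root logger to WARN ...
        "log4j.rootCategory": "WARN, console",
        # ... but keep DEBUG output for a hypothetical application package.
        "log4j.logger.com.example.myapp": "DEBUG",
    })
    with open("configurations.json", "w") as f:
        json.dump(config, f, indent=2)
```

The file can then be passed when creating the cluster, for example with aws emr create-cluster ... --configurations file://./configurations.json, or the list can be passed as the Configurations argument of boto3's run_job_flow.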

answered Oct 27 '22 at 11:10 by Jonathan Kelly