How to specify the location of custom log4j.configuration when spark-submit to Amazon EMR?

I am trying to run a spark job in EMR cluster.

In my spark-submit I have added the following configuration to read from log4j.properties:

--files log4j.properties --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/log4j.properties"

I have also added the following to my log4j configuration:

log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/log/test.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %5p %c{7} - %m%n


I do see the logs in the console, but the log file is not generated. What am I doing wrong here?

asked May 19 '17 by nnc



1 Answer

Quoting spark-submit --help:

--files FILES Comma-separated list of files to be placed in the working directory of each executor. File paths of these files in executors can be accessed via SparkFiles.get(fileName).

That doesn't say much about what to do with the FILES when you cannot use SparkFiles.get(fileName) (which you cannot for log4j, since log4j loads its configuration before a SparkContext is available).

Quoting SparkFiles.get's scaladoc:

Get the absolute path of a file added through SparkContext.addFile().

That does not give you much either, but it suggests having a look at the source code of SparkFiles.get:

def get(filename: String): String =
  new File(getRootDirectory(), filename).getAbsolutePath()

The nice thing about it is that getRootDirectory() uses an optional property or just the current working directory:

def getRootDirectory(): String =
  SparkEnv.get.driverTmpDir.getOrElse(".")

That gives us something to work on, doesn't it?

On the driver, the so-called driverTmpDir directory should be easy to find in the Environment tab of the web UI (under Spark Properties as the spark.files property, or under Classpath Entries marked as "Added By User" in the Source column).

On executors, I'd assume a local directory, so rather than using file:/log4j.properties I'd use

-Dlog4j.configuration=file://./log4j.properties

or

-Dlog4j.configuration=file:log4j.properties

Note the dot that points at the local working directory (in the first option) and the absence of a leading / (in the latter).
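
Put together, a minimal spark-submit sketch might look like the following (a sketch I have not verified on EMR; com.example.MyApp and my-app.jar are hypothetical placeholders for your own main class and jar):

# sketch only; com.example.MyApp and my-app.jar are placeholders
spark-submit \
  --files log4j.properties \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --class com.example.MyApp \
  my-app.jar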

Don't forget about spark.driver.extraJavaOptions to set the Java options for the driver if that's something you haven't thought about yet. You've been focusing on executors only so far.
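
For example, the driver-side counterpart of the executor option above (same hedged sketch; add it alongside the executor option in your spark-submit):

--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties"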

You may want to add -Dlog4j.debug=true to spark.executor.extraJavaOptions, which is supposed to print the locations log4j tries when looking for log4j.properties.
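
For instance (again just a sketch, appended to whatever executor options you already pass):

--conf "spark.executor.extraJavaOptions=-Dlog4j.debug=true -Dlog4j.configuration=file:log4j.properties"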


I have not checked this answer on an EMR or YARN cluster myself, but I believe it may have given you some hints on where to find the answer. Fingers crossed!

answered Oct 06 '22 by Jacek Laskowski