I am trying to run a Spark job on an EMR cluster.
In my spark-submit I have added configs to read from log4j.properties:
--files log4j.properties --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/log4j.properties"
Also I have added
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/log/test.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %5p %c{7} - %m%n
to my log4j configuration.
Anyhow, I see the logs in the console, but I don't see the log file being generated. What am I doing wrong here?
Quoting spark-submit --help:

--files FILES  Comma-separated list of files to be placed in the working directory of each executor. File paths of these files in executors can be accessed via SparkFiles.get(fileName).
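For context, this is how files shipped with --files are normally resolved on executors. A minimal sketch, assuming the job was submitted with --files data.txt (data.txt being a hypothetical placeholder):

import org.apache.spark.SparkFiles

// Resolve the executor-local copy of a file shipped via --files.
// "data.txt" is a hypothetical file name used for illustration.
val localPath: String = SparkFiles.get("data.txt")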
That doesn't say much about what to do with the FILES when you cannot use SparkFiles.get(fileName) (which you cannot for log4j).
Quoting SparkFiles.get's scaladoc:

Get the absolute path of a file added through SparkContext.addFile().
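As an aside, the addFile/get pair is typically used like this. A sketch, assuming sc is a live SparkContext (as in spark-shell) and the file path is a hypothetical example:

import org.apache.spark.SparkFiles

// Driver side: ship a local file to every executor.
sc.addFile("/tmp/settings.conf") // hypothetical path

// Executor side: resolve the local copy by its file name.
sc.parallelize(1 to 2).foreach { _ =>
  println(SparkFiles.get("settings.conf"))
}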
That does not give you much either, but it suggests having a look at the source code of SparkFiles.get:
def get(filename: String): String =
new File(getRootDirectory(), filename).getAbsolutePath()
The nice thing about it is that getRootDirectory() uses an optional property or just falls back to the current working directory:
def getRootDirectory(): String =
SparkEnv.get.driverTmpDir.getOrElse(".")
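If you want to see that directory for yourself, here is a quick debugging sketch, assuming sc is a live SparkContext (e.g. in spark-shell). The printed lines end up in each executor's stdout log, not on the driver console:

import org.apache.spark.SparkFiles

// Print the SparkFiles root directory from each executor JVM.
sc.parallelize(0 until 4, 4).foreach { _ =>
  println(s"SparkFiles root: ${SparkFiles.getRootDirectory()}")
}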
That gives us something to work on, doesn't it?
On the driver, the so-called driverTmpDir directory should be easy to find in the Environment tab of the web UI (under Spark Properties for the spark.files property, or under Classpath Entries marked as "Added By User" in the Source column).
On executors, I'd assume a local directory, so rather than using file:/log4j.properties I'd use

-Dlog4j.configuration=file://./log4j.properties

or

-Dlog4j.configuration=file:log4j.properties

Note the dot that specifies the local working directory (in the first option) and the missing leading / (in the latter).
Don't forget about spark.driver.extraJavaOptions to set the Java options for the driver, if that's something you haven't thought about yet. You've been focusing on executors only so far.
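Putting the pieces together, the submit command could look something like this. A sketch only, not verified on EMR; it assumes log4j.properties sits in the directory you submit from, and com.example.MyApp / my-app.jar are placeholders for your application:

spark-submit \
  --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --class com.example.MyApp \
  my-app.jar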
You may want to add -Dlog4j.debug=true to spark.executor.extraJavaOptions, which is supposed to print the locations log4j uses to find log4j.properties.
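With that flag added, the executor option would read (again, just a sketch):

spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties -Dlog4j.debug=true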
I have not checked this answer on an EMR or YARN cluster myself, but I believe it may have given you some hints on where to find the answer. Fingers crossed!