I am trying to run a Spark job on an EMR cluster.
In my spark-submit I have added configs to read from log4j.properties:
--files log4j.properties --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/log4j.properties"
Also I have added
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/log/test.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %5p %c{7} - %m%n
to my log4j configuration.
Anyhow, I see the logs in the console, but I don't see the log file being generated. What am I doing wrong here?
Quoting spark-submit --help:

--files FILES  Comma-separated list of files to be placed in the working directory of each executor. File paths of these files in executors can be accessed via SparkFiles.get(fileName).
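For context, this is how files shipped with --files are normally resolved on executors. A minimal sketch, assuming the job was submitted with --files data.txt (data.txt being a hypothetical placeholder):

import org.apache.spark.SparkFiles

// Resolve the executor-local copy of a file shipped via --files.
// "data.txt" is a hypothetical file name used for illustration.
val localPath: String = SparkFiles.get("data.txt")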
That doesn't say much about what to do with the FILES when you cannot use SparkFiles.get(fileName) (which you cannot for log4j).
Quoting SparkFiles.get's scaladoc:

Get the absolute path of a file added through SparkContext.addFile().
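As an aside, the addFile/get pair is typically used like this. A sketch, assuming sc is a live SparkContext (as in spark-shell) and the file path is a hypothetical example:

import org.apache.spark.SparkFiles

// Driver side: ship a local file to every executor.
sc.addFile("/tmp/settings.conf") // hypothetical path

// Executor side: resolve the local copy by its file name.
sc.parallelize(1 to 2).foreach { _ =>
  println(SparkFiles.get("settings.conf"))
}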
That does not give you much either, but it suggests having a look at the source code of SparkFiles.get:
def get(filename: String): String =
new File(getRootDirectory(), filename).getAbsolutePath()
The nice thing about it is that getRootDirectory() uses an optional property or just falls back to the current working directory:
def getRootDirectory(): String =
SparkEnv.get.driverTmpDir.getOrElse(".")
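If you want to see that directory for yourself, here is a quick debugging sketch, assuming sc is a live SparkContext (e.g. in spark-shell). The printed lines end up in each executor's stdout log, not on the driver console:

import org.apache.spark.SparkFiles

// Print the SparkFiles root directory from each executor JVM.
sc.parallelize(0 until 4, 4).foreach { _ =>
  println(s"SparkFiles root: ${SparkFiles.getRootDirectory()}")
}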
That gives us something to work on, doesn't it?
On the driver, the so-called driverTmpDir directory should be easy to find in the Environment tab of the web UI (under Spark Properties for the spark.files property, or under Classpath Entries marked as "Added By User" in the Source column).
On executors, I'd assume a local directory, so rather than using file:/log4j.properties I'd use

-Dlog4j.configuration=file://./log4j.properties

or

-Dlog4j.configuration=file:log4j.properties

Note the dot that specifies the local working directory (in the first option) and the missing leading / (in the latter).
Don't forget about spark.driver.extraJavaOptions to set the Java options for the driver, if that's something you haven't thought about yet. You've been focusing on executors only so far.
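Putting the pieces together, the submit command could look something like this. A sketch only, not verified on EMR; it assumes log4j.properties sits in the directory you submit from, and com.example.MyApp / my-app.jar are placeholders for your application:

spark-submit \
  --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --class com.example.MyApp \
  my-app.jar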
You may want to add -Dlog4j.debug=true to spark.executor.extraJavaOptions, which is supposed to print the locations log4j uses to find log4j.properties.
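With that flag added, the executor option would read (again, just a sketch):

spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties -Dlog4j.debug=true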
I have not checked this answer on an EMR or YARN cluster myself, but I believe it may have given you some hints on where to find the answer. Fingers crossed!