As requested, I would like to keep a trace of the Spark master's logs so that errors are preserved when they happen. I know the worker logs are available on the web UI, but I'm not sure they show the same kind of errors as the master.
I found that I have to modify conf/log4j.properties, but my attempts don't work.
Default configuration, plus the file appender I added:
# Set everything to be logged to the console and the file
log4j.rootCategory=INFO, console, file
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up
# nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
My attempt to set up the file appender:
### Custom log file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/data/log/MasterLogs/master.log
log4j.appender.file.ImmediateFlush=true
## Set Append to false so the file is overwritten on restart
log4j.appender.file.Append=false
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
## Define the layout for the file appender
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
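Note that in log4j 1.x the path property of a file appender is File, not fileName (fileName is the log4j 2 name), which is a common reason this kind of config is silently ignored. The standalone master JVM reads conf/log4j.properties from SPARK_CONF_DIR when it starts, so the file above should take effect after a restart. As a sketch (assuming a standalone deployment and a hypothetical /opt/spark install path), you can also point the master at the file explicitly through SPARK_MASTER_OPTS in conf/spark-env.sh:
# conf/spark-env.sh -- /opt/spark is a hypothetical install path
export SPARK_MASTER_OPTS="-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties"
Then restart the daemon with sbin/stop-master.sh followed by sbin/start-master.sh so it re-reads its options.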
You need to create two log4j.properties files, one for the driver and one for the executor, and pass them in the Java options of the driver and executor when submitting your application with spark-submit, as below:
spark-submit --class MAIN_CLASS --driver-java-options "-Dlog4j.configuration=file:PATH_OF_LOG4J_PROPERTIES_FOR_DRIVER" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:PATH_OF_LOG4J_PROPERTIES_FOR_EXECUTOR" --master spark://MASTER_IP:PORT JAR_PATH
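For example, with hypothetical paths and names filled in, the call might look like:
spark-submit \
  --class com.example.MyApp \
  --driver-java-options "-Dlog4j.configuration=file:/opt/spark/conf/log4j-driver.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j-executor.properties" \
  --master spark://192.168.1.1:7077 \
  /opt/jobs/my-app.jar
Note that the executor properties file has to exist at that path on every worker node, because each executor JVM resolves the file: URL locally.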
Here is an example of a log4j.properties you might specify:
# Set everything to be logged to a file
log4j.rootCategory=INFO,FILE
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File={Enter path of the file}
log4j.appender.FILE.MaxFileSize=10MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
You can also check this blog for more details: https://blog.knoldus.com/2016/02/23/logging-spark-application-on-standalone-cluster/
Or follow this command. It will write both the output and the console log into a file:
hadoop@osboxes:~/spark-2.0.1-bin-hadoop2.7/bin$ ./spark-submit test.py > tempoutfile.txt 2>&1
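Here 2>&1 merges stderr (where the default console appender writes) into stdout, so both streams end up in tempoutfile.txt. If you also want to keep seeing the output on screen while it is saved, a plain shell variant (nothing Spark-specific) is:
./spark-submit test.py 2>&1 | tee tempoutfile.txt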