
Redirect Spark console logs into a file

Tags:

apache-spark

As the title says, I would like to keep a trace of the Spark master's logs so that I still have the error logs after they happen. I know the worker logs are available on the web UI, but I'm not sure they show the same kind of errors as the master.

I found that I have to modify conf/log4j.properties, but my attempts don't work.

Default configuration, plus the file appender added to the root category:

# Set everything to be logged to the console
log4j.rootCategory=INFO, console, file
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

My attempt at setting up the file appender:

###Custom log file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.fileName=/var/data/log/MasterLogs/master.log
log4j.appender.file.ImmediateFlush=true
## Set Append to false to overwrite the file on restart
log4j.appender.file.Append=false
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
##Define the layout for file appender
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
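(For context, one way to make the standalone master daemon pick up a custom log4j configuration is to point its JVM at the file via SPARK_MASTER_OPTS in conf/spark-env.sh. This is a sketch, and the path is hypothetical — adjust it to wherever your log4j.properties actually lives.)

```shell
# conf/spark-env.sh -- hypothetical path to the custom log4j config
export SPARK_MASTER_OPTS="-Dlog4j.configuration=file:/var/data/log/MasterLogs/log4j.properties"
```

The master then has to be restarted (sbin/stop-master.sh followed by sbin/start-master.sh) for the setting to take effect.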
Asked Apr 04 '17 by KyBe


2 Answers

You need to create two log4j.properties files, one for the driver and one for the executors, and pass their paths in the Java options of the driver and executors when submitting your application with spark-submit, as below:

spark-submit --class MAIN_CLASS --driver-java-options "-Dlog4j.configuration=file:PATH_OF_LOG4J_PROPERTIES_FOR_DRIVER" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:PATH_OF_LOG4J_PROPERTIES_FOR_EXECUTOR" --master MASTER_IP:PORT JAR_PATH
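Filled in with concrete values, the command might look like this (every name here — the class, the paths, the master URL, the jar — is a hypothetical placeholder for your own setup):

```shell
# Hypothetical example: separate log4j configs for driver and executors
spark-submit \
  --class com.example.MyApp \
  --driver-java-options "-Dlog4j.configuration=file:/etc/spark/log4j-driver.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/log4j-executor.properties" \
  --master spark://192.168.1.10:7077 \
  /opt/jobs/my-app.jar
```

Note that the executor-side path must exist on every worker node, since each executor JVM resolves it locally.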

Here is an example of a log4j.properties you might specify:

# Log everything at INFO level to the file appender
log4j.rootCategory=INFO,FILE
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File={Enter path of the file}
log4j.appender.FILE.MaxFileSize=10MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

You can also check this blog for more details: https://blog.knoldus.com/2016/02/23/logging-spark-application-on-standalone-cluster/

Answered Oct 26 '22 by Sandeep Purohit

Alternatively, use shell redirection. The following command writes both the output and the console log into a file:

hadoop@osboxes:~/spark-2.0.1-bin-hadoop2.7/bin$ ./spark-submit test.py > tempoutfile.txt 2>&1
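If you also want to keep watching the output on the console while it is saved, a tee variant of the same redirection works (assuming a POSIX shell; the script name is from the example above):

```shell
# 2>&1 merges stderr into stdout; tee writes the stream to the file
# and echoes it to the console at the same time
./spark-submit test.py 2>&1 | tee tempoutfile.txt
```

The 2>&1 must come before the pipe so that error messages travel through tee along with regular output.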

Answered Oct 25 '22 by y durga prasad