As requested, I would like to keep a trace of the Spark master's logs so that errors are preserved when they happen. I know the worker logs are available on the web UI, but I'm not sure they show the same kind of errors as the master.
I found that I have to modify conf/log4j.properties, but my attempts don't work.
Default configuration, plus the file appender I added:
# Set everything to be logged to the console and the file
log4j.rootCategory=INFO, console, file
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up
# nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
My attempt to set up the file appender:
### Custom log file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/data/log/MasterLogs/master.log
log4j.appender.file.ImmediateFlush=true
## Set Append to false so the file is overwritten on restart
log4j.appender.file.Append=false
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
## Define the layout for the file appender
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
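Note that in log4j 1.x the path property of a file appender is File, not fileName (fileName is the log4j 2 name), which is a common reason this kind of config is silently ignored. The standalone master JVM reads conf/log4j.properties from SPARK_CONF_DIR when it starts, so the file above should take effect after a restart. As a sketch (assuming a standalone deployment and a hypothetical /opt/spark install path), you can also point the master at the file explicitly through SPARK_MASTER_OPTS in conf/spark-env.sh:
# conf/spark-env.sh -- /opt/spark is a hypothetical install path
export SPARK_MASTER_OPTS="-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties"
Then restart the daemon with sbin/stop-master.sh followed by sbin/start-master.sh so it re-reads its options.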
You need to create two log4j.properties files, one for the driver and one for the executor, and pass them in the Java options of the driver and executor when submitting your application with spark-submit, as below:
spark-submit --class MAIN_CLASS --driver-java-options "-Dlog4j.configuration=file:PATH_OF_LOG4J_PROPERTIES_FOR_DRIVER" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:PATH_OF_LOG4J_PROPERTIES_FOR_EXECUTOR" --master spark://MASTER_IP:PORT JAR_PATH
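For example, with hypothetical paths and names filled in, the call might look like:
spark-submit \
  --class com.example.MyApp \
  --driver-java-options "-Dlog4j.configuration=file:/opt/spark/conf/log4j-driver.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j-executor.properties" \
  --master spark://192.168.1.1:7077 \
  /opt/jobs/my-app.jar
Note that the executor properties file has to exist at that path on every worker node, because each executor JVM resolves the file: URL locally.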
Here is an example of a log4j.properties you might specify:
# Set everything to be logged to a file
log4j.rootCategory=INFO,FILE
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File={Enter path of the file}
log4j.appender.FILE.MaxFileSize=10MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
You can also check this blog for more details: https://blog.knoldus.com/2016/02/23/logging-spark-application-on-standalone-cluster/
Or follow this command. It will write both the output and the console log into a file:
hadoop@osboxes:~/spark-2.0.1-bin-hadoop2.7/bin$ ./spark-submit test.py > tempoutfile.txt 2>&1
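Here 2>&1 merges stderr (where the default console appender writes) into stdout, so both streams end up in tempoutfile.txt. If you also want to keep seeing the output on screen while it is saved, a plain shell variant (nothing Spark-specific) is:
./spark-submit test.py 2>&1 | tee tempoutfile.txt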