Apache Spark Stderr and Stdout

Tags:

apache-spark

I am running Spark 1.0.0, connected to a Spark standalone cluster with one master and two slaves. I ran wordcount.py with spark-submit; it reads its input from HDFS and writes the results back to HDFS. So far everything is fine and the results are written correctly. What concerns me is that when I check stdout for each worker, it is empty. Is it supposed to be empty? And I got the following in stderr:

stderr log page for Some(app-20140704174955-0002)

Spark Executor Command: "java" "-cp" "::/usr/local/spark-1.0.0/conf:/usr/local/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.2.1.jar:/usr/local/hadoop/conf" "-XX:MaxPermSize=128m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@master:54477/user/CoarseGrainedScheduler" "0" "slave2" "1" "akka.tcp://sparkWorker@slave2:41483/user/Worker" "app-20140704174955-0002"
========================================


14/07/04 17:50:14 ERROR CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave2:33758] -> [akka.tcp://spark@master:54477] disassociated! Shutting down.

asked Jul 04 '14 by user3789843


2 Answers

Spark writes everything, even INFO-level messages, to stderr. This is commonly done to avoid stdout's buffering, which makes log messages appear less predictably. It's an accepted practice when an application's stdout is never going to be consumed by shell scripts, which is why it is especially common for logging.
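
As a minimal sketch of the buffering argument (plain Python, not Spark's actual code): redirect both streams to files and kill the process mid-sleep, and the stderr line is typically already on disk while the stdout line may still be sitting in a buffer.

import sys
import time

# stdout is block-buffered when redirected to a file, so a message can
# sit in the buffer and be lost if the process dies abruptly.
print("INFO step 1 done")

# stderr is not block-buffered, so the line reaches the file promptly.
print("INFO step 1 done", file=sys.stderr)

time.sleep(30)  # kill -9 here: usually only the stderr line survives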

answered Nov 13 '22 by samthebest


Try this in a log4j.properties file passed to Spark (or modify the default configuration under $SPARK_HOME/conf):

# Log to stdout and stderr
log4j.rootLogger=INFO, stdout, stderr

# Send TRACE - INFO level to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Threshold=TRACE
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.filter.filter1=org.apache.log4j.varia.LevelRangeFilter
log4j.appender.stdout.filter.filter1.levelMin=TRACE
log4j.appender.stdout.filter.filter1.levelMax=INFO
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# Send WARN or higher to stderr
log4j.appender.stderr=org.apache.log4j.ConsoleAppender
log4j.appender.stderr.Threshold=WARN
log4j.appender.stderr.Target=System.err
log4j.appender.stderr.layout=org.apache.log4j.PatternLayout
log4j.appender.stderr.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# Change this to set Spark log level
log4j.logger.org.apache.spark=WARN
log4j.logger.org.apache.spark.util=ERROR
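
One way to make executors pick up this file (a sketch, assuming log4j.properties sits in the directory you submit from) is to ship it with the job and point log4j 1.x at the copy in each executor's working directory:

from pyspark import SparkConf, SparkContext

# Ship log4j.properties to every executor's working directory and tell
# log4j 1.x to load its configuration from there.
conf = (SparkConf()
        .set("spark.files", "log4j.properties")
        .set("spark.executor.extraJavaOptions",
             "-Dlog4j.configuration=file:log4j.properties"))
sc = SparkContext(conf=conf)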

Also, the console progress bars shown at INFO level are written to stderr. Disable them with:

spark.ui.showConsoleProgress=false
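
That line belongs in conf/spark-defaults.conf; it can also be passed per job with --conf spark.ui.showConsoleProgress=false on the spark-submit command line, or set on a SparkConf as in the sketch above.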

answered Nov 12 '22 by AssHat_