
How to turn off INFO logging in Spark?

People also ask

What is SparkContext in spark?

A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster. Only one SparkContext should be active per JVM. You must stop() the active SparkContext before creating a new one.
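For example, a minimal PySpark sketch of creating and stopping a SparkContext (the master URL and app name here are just placeholders):

from pyspark import SparkConf, SparkContext

# Build a local SparkContext; only one may be active per JVM
conf = SparkConf().setMaster("local[*]").setAppName("example")
sc = SparkContext(conf=conf)

rdd = sc.parallelize(range(10))  # create an RDD on this context
print(rdd.sum())

sc.stop()  # stop it before creating a new SparkContext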

What is logging in spark?

Spark uses log4j as the standard library for its own logging. Everything that happens inside Spark gets logged to the shell console and to the configured underlying storage.

How do I enable debug mode in spark submit?

You can also set up debug mode in spark-shell: pass the path of your own "log4j.properties" file to the spark-shell command, then run spark-shell as usual and you should see DEBUG messages.
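For example, something along these lines (the properties path is just a placeholder; adjust it to wherever your file lives):

spark-shell --driver-java-options "-Dlog4j.configuration=file:/path/to/log4j.properties"

With log4j.rootCategory=DEBUG, console set in that file, spark-shell should start up printing DEBUG messages.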


Just execute this command in the spark directory:

cp conf/log4j.properties.template conf/log4j.properties

Edit log4j.properties:

# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

Replace the first line:

log4j.rootCategory=INFO, console

by:

log4j.rootCategory=WARN, console

Save the file and restart your shell. This works for me with Spark 1.1.0 and Spark 1.5.1 on OS X.


In Spark 2.0 you can also configure it dynamically for your application using setLogLevel:

    from pyspark.sql import SparkSession

    # Chained builder calls, no backslash continuations needed
    spark = (SparkSession.builder
             .master('local')
             .appName('foo')
             .getOrCreate())
    spark.sparkContext.setLogLevel('WARN')

In the pyspark console, a default spark session will already be available.


Inspired by pyspark/tests.py, I did:

def quiet_logs(sc):
    # Reach log4j through the JVM gateway and silence the noisiest loggers
    logger = sc._jvm.org.apache.log4j
    logger.LogManager.getLogger("org").setLevel(logger.Level.ERROR)
    logger.LogManager.getLogger("akka").setLevel(logger.Level.ERROR)

Calling this just after creating the SparkContext reduced the stderr lines logged for my test from 2647 to 163. However, creating the SparkContext itself still produces those 163 lines, up to

15/08/25 10:14:16 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0

and it's not clear to me how to adjust those programmatically.
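Those lines are written while the SparkContext is still starting up, so one option is to hand the driver JVM a log4j.properties file at launch instead of adjusting levels from inside the application. A rough sketch (the properties path and script name are placeholders):

spark-submit --driver-java-options "-Dlog4j.configuration=file:/path/to/log4j.properties" my_app.py

With log4j.rootCategory=WARN, console in that file, the startup INFO messages should already be suppressed before any user code runs.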


Edit your conf/log4j.properties file and change the following line:

    log4j.rootCategory=INFO, console

to:

    log4j.rootCategory=ERROR, console

Another approach would be to fire up spark-shell and type in the following:

import org.apache.log4j.Logger
import org.apache.log4j.Level

Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)

You won't see any logs after that.


>>> log4j = sc._jvm.org.apache.log4j
>>> log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)

For PySpark, you can also set the log level in your scripts with sc.setLogLevel("FATAL"). From the docs:

Control our logLevel. This overrides any user-defined log settings. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
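So in a script, something like the following would silence everything below FATAL (the master URL and app name are placeholders):

from pyspark import SparkContext

sc = SparkContext(master="local[*]", appName="quiet-example")
sc.setLogLevel("FATAL")  # from here on, only FATAL messages are printed

# ... run your job ...
sc.stop()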