A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster. Only one SparkContext should be active per JVM. You must stop() the active SparkContext before creating a new one.
Spark uses log4j as the standard library for its own logging. Everything that happens inside Spark gets logged to the shell console and to the configured underlying storage.
Here is also how to set up debug mode in spark-shell. You can create your own log4j.properties file with the log levels you want and pass its path to your spark-shell command. Run spark-shell as shown below and you should see DEBUG messages.
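For example, with the log4j 1.x that these Spark versions use, the file can be handed to the driver JVM as a system property (the path below is just a placeholder):
spark-shell --driver-java-options "-Dlog4j.configuration=file:/path/to/log4j.properties"
If that file sets log4j.rootCategory=DEBUG, console, the DEBUG messages will appear in the shell.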
Just execute this command in the spark directory:
cp conf/log4j.properties.template conf/log4j.properties
Edit log4j.properties:
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
Replace the first line:
log4j.rootCategory=INFO, console
with:
log4j.rootCategory=WARN, console
Save and restart your shell. It works for me for Spark 1.1.0 and Spark 1.5.1 on OS X.
In Spark 2.0 you can also configure it dynamically for your application using setLogLevel:
from pyspark.sql import SparkSession
spark = SparkSession.builder.\
    master('local').\
    appName('foo').\
    getOrCreate()
spark.sparkContext.setLogLevel('WARN')
In the pyspark console, a default SparkSession will already be available.
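Assuming a standard Spark 2.x pyspark shell, that session is exposed as the spark variable, so the same call works directly:
>>> spark.sparkContext.setLogLevel('WARN')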
Inspired by pyspark/tests.py, I did:
def quiet_logs(sc):
    # reach the JVM-side log4j classes through the py4j gateway
    logger = sc._jvm.org.apache.log4j
    logger.LogManager.getLogger("org").setLevel(logger.Level.ERROR)
    logger.LogManager.getLogger("akka").setLevel(logger.Level.ERROR)
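A call site might look like this (the master URL and app name are only placeholders for illustration):
from pyspark import SparkContext

sc = SparkContext("local", "quiet-logs-demo")  # placeholder master and app name
quiet_logs(sc)  # silence org.* and akka.* messages below ERROR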
Calling this just after creating the SparkContext reduced the stderr lines logged for my test from 2647 to 163. However, creating the SparkContext itself logs 163 lines, up to
15/08/25 10:14:16 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
and it's not clear to me how to adjust those programmatically.
Edit your conf/log4j.properties file and change the following line:
log4j.rootCategory=INFO, console
to
log4j.rootCategory=ERROR, console
Another approach would be to fire up spark-shell and type in the following:
import org.apache.log4j.Logger
import org.apache.log4j.Level
Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)
You won't see any logs after that.
The same can be done from PySpark through the JVM gateway:
>>> log4j = sc._jvm.org.apache.log4j
>>> log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)
For PySpark, you can also set the log level in your scripts with sc.setLogLevel("FATAL"). From the docs:
Control our logLevel. This overrides any user-defined log settings. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
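For example, a minimal standalone script might look like this (the master URL and app name are placeholders):
from pyspark import SparkContext

sc = SparkContext("local", "log-level-demo")  # placeholder master and app name
sc.setLogLevel("FATAL")  # only FATAL messages remain in the driver output
# ... your job code here ...
sc.stop()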