How to turn off INFO from logs in PySpark with no changes to log4j.properties?

I'm working in a cluster where I do not have permission to change the log4j.properties file to stop the INFO logging while using pyspark (as explained in the first answer here). The following solution, as explained in that question's first answer, works for spark-shell (Scala):

import org.apache.log4j.Logger
import org.apache.log4j.Level

But for Spark with Python (i.e. pyspark), it didn't work, nor did the following:

Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)

How can I stop the verbose printing of INFO messages in pyspark WITHOUT changing the log4j.properties file?

asked Sep 10 '15 by hmi2015


2 Answers

I used sc.setLogLevel("ERROR") because I didn't have write access to our cluster's log4j.properties file. From the docs:

Control our logLevel. This overrides any user-defined log settings. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
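
For example, a minimal sketch (the app name is illustrative; setLogLevel has been available on SparkContext since Spark 1.4):

from pyspark import SparkContext

# Create the context as usual, then raise the log level at runtime;
# no changes to log4j.properties are needed.
sc = SparkContext(appName="quiet-logs")  # app name is an assumption
sc.setLogLevel("ERROR")  # suppress INFO and WARN output from here on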

answered Sep 28 '22 by Galen Long


This works for me:

import logging
from pyspark import SparkContext

# Silence py4j's Python-side gateway logger before creating the context
s_logger = logging.getLogger('py4j.java_gateway')
s_logger.setLevel(logging.ERROR)
spark_context = SparkContext()
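
Note that this quiets the py4j.java_gateway logger on the Python side; the JVM's own log4j INFO output is controlled separately (e.g. with sc.setLogLevel("ERROR") as in the other answer).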
answered Sep 28 '22 by Oleg Ladygin