I am working on a cluster where I do not have permission to change the log4j.properties file to stop the INFO logging while using pyspark (as explained in the first answer here). The following solution, as explained in that question's first answer, works for spark-shell (Scala):
import org.apache.log4j.Logger
import org.apache.log4j.Level
But for Spark with Python (i.e. pyspark), neither that nor the following worked:
Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)
How can I stop the verbose printing of INFO messages in pyspark WITHOUT changing the log4j.properties file?
Updating the configuration of Log4j: add a file named log4j2.properties to $SPARK_HOME/conf that configures an appender logging to stderr; anything written to stdout and stderr is appended to the Docker container logs. The layout can also be set to JSON.
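A rough sketch of such a log4j2.properties (assuming Spark 3.3+, which ships Log4j 2; the appender name and pattern below are illustrative, not Spark's exact template):

# Root logger: only WARN and above, routed to the console appender below
rootLogger.level = warn
rootLogger.appenderRef.stderr.ref = console

# Console appender writing to stderr
appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n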
I used sc.setLogLevel("ERROR")
because I didn't have write access to our cluster's log4j.properties file. From the docs:
Control our logLevel. This overrides any user-defined log settings. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
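A minimal PySpark sketch of that approach (the app name is only illustrative):

from pyspark import SparkContext, SparkConf

# Build a SparkContext and raise the JVM-side log threshold to ERROR
conf = SparkConf().setAppName("quiet-app")
sc = SparkContext(conf=conf)
sc.setLogLevel("ERROR")  # suppresses INFO and WARN output from Spark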
This works for me:
import logging
from pyspark import SparkContext

# Silence py4j gateway chatter on the Python side before creating the context
s_logger = logging.getLogger('py4j.java_gateway')
s_logger.setLevel(logging.ERROR)

spark_context = SparkContext()