 

Disable Ivy Logging when using Spark-submit

Calling spark-submit causes Ivy's default logs to be printed for every fetched package. That output is relevant on first launch, but once the packages are cached, seeing the same cache-hit logging on every run is not useful.

What is the best way to disable the logs?

Don't want to see things like:

Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/usr/local/spark-2.0.2-bin-hadoop2.4/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.amazonaws#aws-java-sdk added as a dependency
org.apache.hadoop#hadoop-aws added as a dependency
...
deepelement asked Oct 19 '25 12:10


1 Answer

Solution

On Spark 3.0+ the original answer doesn't work. I have spent an unreasonable amount of time trying to hide the Ivy startup messages, and this is the only thing that worked:

    import os
    from subprocess import Popen
    from unittest.mock import patch

    from pyspark.sql import SparkSession

    # Redirect the JVM subprocess's stdout/stderr to /dev/null so Ivy's
    # resolution report never reaches the console.
    with patch(
            "pyspark.java_gateway.Popen",
            side_effect=lambda *args, **kwargs: Popen(
                *args, **kwargs,
                stdout=open(os.devnull, 'wb'),
                stderr=open(os.devnull, 'wb'),
            ),
    ):
        spark: SparkSession = SparkSession.builder.getOrCreate()

This is a very blunt, brittle instrument - it intercepts everything the JVM subprocess writes to stdout and stderr during Spark startup. Ivy writes most of its report output to stderr, so you can't get away with suppressing stdout alone.

Background on Failed Attempts

Newer versions of Spark use log4j2. I was not able to make any perceptible impact by restricting the root logger, so I dug into the Ivy source code to find out what logging engine it uses. It turns out that when you drill down far enough, it uses System.out.println() and a custom MessageLogger class - there are no references to log4j that I could find.

Looking at the Spark docs, there is a way to override the ivysettings.xml file, which contains references to the "Report" writer (the source of the startup messages). However, doing so effectively breaks Ivy unless you know exactly what to put in there, and there was little information on how to change the report output anyway.
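For context, this is roughly what that override attempt looks like. spark.jars.ivySettings is the documented Spark option for pointing at a custom settings file; the file path and package below are placeholders, and as noted above, a settings file that omits the right resolvers will break dependency resolution entirely:

    # Sketch only: point spark-submit at a custom ivysettings.xml.
    # /path/to/ivysettings.xml and the package coordinates are placeholders.
    spark-submit \
      --conf spark.jars.ivySettings=/path/to/ivysettings.xml \
      --packages org.apache.hadoop:hadoop-aws:2.7.3 \
      my_app.py
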

Moving on, I next tried to suppress stderr and stdout from the Python side. This had no impact - my assumption is that this is because Spark runs in a JVM subprocess, outside of the Python output streams. Thus, patching the Popen call directly was the only way to go.
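For reference, this is the kind of Python-level redirection that does not help here. It only captures output written by the Python interpreter itself; a subprocess such as the Spark JVM inherits the original file descriptors when it is launched, so its output bypasses the redirect entirely:

    import contextlib
    import io

    # Captures output from Python code only: redirect_stdout/redirect_stderr
    # swap sys.stdout/sys.stderr at the interpreter level, not the OS level.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf), contextlib.redirect_stderr(buf):
        print("captured")  # lands in buf, not the console
    # A JVM subprocess started inside this block would still write straight
    # to the terminal, because it inherits the real file descriptors.
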

Danten answered Oct 21 '25 13:10


