
Separating application logs in Logback from Spark Logs in log4j

I have a Scala Maven project that uses Spark, and I am trying to implement logging with Logback. I compile my application into a jar and deploy it to an EC2 instance where the Spark distribution is installed. My pom.xml includes dependencies for Spark and Logback as follows:

        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.1.7</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>log4j-over-slf4j</artifactId>
            <version>1.7.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

When I submit my Spark application, I print the slf4j binding on the command line. If I execute the jar's code with plain java, the binding is to Logback. If I run it through Spark (i.e. spark-submit), however, the binding is to log4j.

  import org.apache.spark.SparkContext
  import org.slf4j.{Logger, LoggerFactory}
  import org.slf4j.impl.StaticLoggerBinder

  val logger: Logger = LoggerFactory.getLogger(this.getClass)
  val sc: SparkContext = new SparkContext()
  val rdd = sc.textFile("myFile.txt")

  val slb: StaticLoggerBinder = StaticLoggerBinder.getSingleton
  System.out.println("Logger Instance: " + slb.getLoggerFactory)
  System.out.println("Logger Class Type: " + slb.getLoggerFactoryClassStr)

yields

Logger Instance: org.slf4j.impl.Log4jLoggerFactory@a64e035
Logger Class Type: org.slf4j.impl.Log4jLoggerFactory

I understand that both log4j-1.2.17.jar and slf4j-log4j12-1.7.16.jar are in /usr/local/spark/jars, and that Spark is most likely referencing these jars despite the exclusions in my pom.xml, because if I delete them I get a ClassNotFoundException when running spark-submit.

My question is: is there a way to use Logback for my application's own logging while preserving Spark's internal logging capabilities? Ideally, I'd like to write my Logback application logs to a file and still let Spark's logs go to STDOUT.
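For reference, the kind of logback.xml I have in mind is just a file appender for my own logs (the file path and pattern below are placeholders):

    <configuration>
        <!-- Write application logs to a file; Spark's log4j output
             should keep going to STDOUT untouched -->
        <appender name="FILE" class="ch.qos.logback.core.FileAppender">
            <file>/var/log/myapp/application.log</file>
            <encoder>
                <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
            </encoder>
        </appender>
        <root level="INFO">
            <appender-ref ref="FILE" />
        </root>
    </configuration>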

asked Feb 09 '17 by sbrannon

1 Answer

After much struggle I've found another solution: library shading.

After shading org.slf4j, my application logs are separated from the Spark logs. Furthermore, the logback.xml in my application jar is honored.

You can find information on library shading in the sbt-assembly documentation; in this case it comes down to putting:

assemblyShadeRules in assembly += ShadeRule.rename("org.slf4j.**" -> "your_favourite_prefix.@0").inAll

in your build.sbt settings.
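If you build with Maven, as the question does, the equivalent should be a relocation in the maven-shade-plugin. A sketch along these lines (the prefix is just a placeholder, as above):

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.1.0</version>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <relocations>
                        <!-- Rename org.slf4j classes inside the fat jar so they
                             cannot clash with the slf4j binding shipped with Spark -->
                        <relocation>
                            <pattern>org.slf4j</pattern>
                            <shadedPattern>your_favourite_prefix.org.slf4j</shadedPattern>
                        </relocation>
                    </relocations>
                </configuration>
            </execution>
        </executions>
    </plugin>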


Side note: if you are not sure whether shading actually happened, open your jar in an archive browser (or list its contents with jar tf) and check whether the directory structure reflects the shaded packages. In this case your jar should contain the path /your_favourite_prefix/org/slf4j, but not /org/slf4j.

answered Sep 30 '22 by matemaciek