I am very confused about setting up logging with Apache Spark. Apache Spark uses Log4j for logging, and it generates a huge amount of log data. Is there a way to set up Log4j for the Spark logs and use Logback for the application logs? I am quite conversant with Logback, but it seems Spark only supports Log4j. The piece of code below was working fine until I introduced Apache Spark. Any help in this regard will be appreciated.
import com.typesafe.scalalogging.LazyLogging
import scala.util.{Failure, Success}
import scala.xml.{Elem, XML}
object MainApp extends App with LazyLogging {
  val currency = new YahooCurrencyLoader() with CurrencyParameters
  val ccy = currency.getXML(currency.ccyUrl) match {
    case Success(v)  => XML.save("PreviousRun.xml", v); logger.info("XML has been saved for use")
    // note the missing `s` interpolator in the original; without it ${ex.getMessage} is logged literally
    case Failure(ex) => logger.error(s"XML extraction failed. Look at Yahoo extraction class. ${ex.getMessage}")
  }

  val xmllocation: String = "./PreviousRun.xml"
  val loadxml: Elem = XML.loadFile(xmllocation)
  //print(loadxml)
  //print(currency.findCurrency(loadxml,"GBP"))
  logger.info("USD CAD Cross is " + currency.findCurrency(loadxml, "CAD").head)
}
I do not know whether you use sbt or Maven, but that is where it all starts. I use sbt myself, so I will give you an example of how we have solved this problem.
It is true that Spark ships with Log4j, and that is really problematic if you do not want to use the same logging implementation. But there is help!
First, exclude the following libs from the Spark dependencies:

- log4j
- slf4j-log4j12
For sbt (using sbt-assembly) it looks like this:
lazy val spark16 = Seq("spark-core", "spark-sql", "spark-hive")
  .map("org.apache.spark" %% _ % "1.6.1")
  .map(_.excludeAll(
    ExclusionRule(name = "log4j"),
    ExclusionRule(name = "slf4j-log4j12")
  ))
A detailed description can be found here: https://www.slf4j.org/legacy.html The module of interest to us is log4j-over-slf4j:
The log4j-over-slf4j module contains replacements of the most widely used log4j classes, namely org.apache.log4j.Category, org.apache.log4j.Logger, org.apache.log4j.Priority, org.apache.log4j.Level, org.apache.log4j.MDC, and org.apache.log4j.BasicConfigurator. These replacement classes redirect all work to their corresponding SLF4J classes.
So we can have all the logs redirected back to SLF4J, from where some other logging implementation can pick them up.
Easy: simply add this dependency to your application:
"org.slf4j" % "log4j-over-slf4j" % "1.7.25"
In our case it was (like yours) Logback, so we added it as a dependency:
"ch.qos.logback" % "logback-classic" % "1.2.3"
Add a logback.xml configuration to your classpath, for example in src/main/resources, and enjoy!
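For reference, a minimal logback.xml could look like the sketch below (the appender name, pattern, and levels are just examples, not anything Spark requires):

```xml
<configuration>
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <!-- Tame Spark's chatty internals while keeping your own logs verbose -->
  <logger name="org.apache.spark" level="WARN"/>

  <root level="INFO">
    <appender-ref ref="CONSOLE"/>
  </root>
</configuration>
```

The per-logger level on org.apache.spark is the part that addresses the "huge amount of log data" complaint: Spark's output is filtered down to warnings while your application loggers stay at INFO.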
If you need help using Logback while deploying your app with spark-submit, please follow this answer: https://stackoverflow.com/a/45480145/1549135
I have used the following imports:
import org.slf4j.Logger
import org.slf4j.LoggerFactory
Sample code is shown below:
object SparkCode {
  val logger = LoggerFactory.getLogger(this.getClass.getName)

  def main(args: Array[String]): Unit = {
    logger.info("Connection Started.")
    // Create the Spark context and go on..
  }
}
And you are sorted.