Apache Spark logging within Scala

I am looking for a way to log additional data when executing code on Apache Spark nodes, so that issues that appear during execution can be investigated later. Using a traditional solution such as com.typesafe.scalalogging.LazyLogging fails because the logger instance cannot be serialized in a distributed environment like Apache Spark.
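
For illustration, here is a minimal sketch of the failing pattern (the class and method names are hypothetical, and whether this actually fails depends on the scalalogging version, since the logger field may or may not be marked transient):

    import com.typesafe.scalalogging.LazyLogging
    import org.apache.spark.rdd.RDD

    class FailingExample extends LazyLogging {
      // The logger is backed by a field of this class, so the map closure
      // below drags the whole instance, logger included, into Spark's
      // closure serialization, and serialization fails.
      def process(rdd: RDD[Int]): RDD[Int] =
        rdd.map { element =>
          logger.info(s"processing $element")
          element + 1
        }
    }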

I've investigated this problem, and for now the solution I found is to use the org.apache.spark.Logging trait, like this:

    class SparkExample extends Logging {
      val someRDD = ...
      someRDD.map { rddElement =>
        logInfo(s"$rddElement will be processed.")
        doSomething(rddElement)
      }
    }

However, it looks like the Logging trait is not a permanent solution for Apache Spark, because it is marked @DeveloperApi and the class documentation states:

This will likely be changed or removed in future releases.

I am wondering: is there any known logging solution that I can use that will allow me to log data when the RDDs are processed on Apache Spark nodes?

Later edit: Some of the comments below suggest using Log4j. I've tried it, but I'm still having issues when using the logger from a Scala class (rather than a Scala object). Here is my full code:

    import org.apache.log4j.Logger
    import org.apache.spark._

    object Main {
      def main(args: Array[String]) {
        new LoggingTestWithRDD().doTest()
      }
    }

    class LoggingTestWithRDD extends Serializable {

      val log = Logger.getLogger(getClass.getName)

      def doTest(): Unit = {
        val conf = new SparkConf().setMaster("local[4]").setAppName("LogTest")
        val spark = new SparkContext(conf)
        val someRdd = spark.parallelize(List(1, 2, 3))
        someRdd.map { element =>
          log.info(s"$element will be processed")
          element + 1
        }
        spark.stop()
      }
    }

The exception that I'm seeing is:

    Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    Caused by: java.io.NotSerializableException: org.apache.log4j.Logger

asked Mar 23 '15 by Bogdan N




1 Answer

You can use Akhil Das's solution proposed in https://www.mail-archive.com/[email protected]/msg29010.html. I have used it myself and it works.

The root cause of the exception is that the map closure references the instance field log, which forces Spark to serialize the whole enclosing LoggingTestWithRDD instance, including the non-serializable Logger. Moving the logger into a standalone object and marking it @transient lazy val keeps it out of the serialized closure; each executor then re-creates the logger lazily on first use.

Akhil Das Mon, 25 May 2015 08:20:40 -0700
Try this way:

    import org.apache.log4j.Logger

    object Holder extends Serializable {
      @transient lazy val log = Logger.getLogger(getClass.getName)
    }

    // assuming `spark` is an existing SparkContext
    val someRdd = spark.parallelize(List(1, 2, 3))
    someRdd.foreach { element =>
      Holder.log.info(element)
    }
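
For completeness, here is a sketch of how the question's LoggingTestWithRDD example could be adapted to this pattern (the LogHolder name is ours, assuming local mode and the Log4j 1.x API used above):

    import org.apache.log4j.Logger
    import org.apache.spark._

    // LogHolder is Serializable, but its logger field is @transient, so the
    // logger itself is never written into the serialized closure; each
    // executor re-initializes the lazy val on first use.
    object LogHolder extends Serializable {
      @transient lazy val log = Logger.getLogger(getClass.getName)
    }

    class LoggingTestWithRDD extends Serializable {
      def doTest(): Unit = {
        val conf = new SparkConf().setMaster("local[4]").setAppName("LogTest")
        val spark = new SparkContext(conf)
        val someRdd = spark.parallelize(List(1, 2, 3))
        // The closure now captures no non-serializable state: logging goes
        // through an object whose only field is transient.
        val result = someRdd.map { element =>
          LogHolder.log.info(s"$element will be processed")
          element + 1
        }.collect()
        spark.stop()
      }
    }

The key design point is that all logging on the executors goes through a serializable holder object rather than through a field of the class that defines the closure.
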
answered Nov 06 '22 by florins