Log from Spark UDF to driver

Tags: scala, apache-spark, databricks

I have a simple UDF in Databricks used in Spark. I can't use println or log4j or anything like that, because the output ends up on the executors and I need it on the driver. I have a very simple log setup:


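var logMessage = ""

// Appends a line to the in-memory log buffer
def log(msg: String): Unit = {
  logMessage += msg + "\n"
}

// Prints everything collected so far (the file parameter is currently unused)
def writeLog(file: String): Unit = {
  println("start write")
  println(logMessage)
  println("end write")
}

def warning(msg: String): Unit = {
  log("*WARNING* " + msg)
}

// On a cluster this closure runs on the executors, so log() updates
// executor-side state and the driver's logMessage never changes
val CleanText = (s: Int) => {
  log("I am in this UDF")
  s + 2
}

sqlContext.udf.register("CleanText", CleanText)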
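Why the driver-side logMessage stays empty can be modelled without Spark at all. Spark serializes the UDF function together with whatever it captures, and each executor runs a deserialized copy, so log() appends to that copy. The pure-Scala sketch below imitates this with a Java-serialization round trip; DriverLog, CleanTextFn and roundTrip are hypothetical names, and the round trip is only a simplified model of how Spark ships tasks (local mode and notebook wrappers can behave slightly differently).

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for the question's driver-side `logMessage` buffer.
class DriverLog extends Serializable {
  var logMessage = ""
  def log(msg: String): Unit = logMessage += msg + "\n"
}

// Hypothetical stand-in for the UDF closure; it captures a DriverLog instance.
class CleanTextFn(val logger: DriverLog) extends (Int => Int) with Serializable {
  def apply(s: Int): Int = { logger.log("I am in this UDF"); s + 2 }
}

object ClosureCopyDemo {
  // Serialize and deserialize a value: a simplified model of a task closure
  // travelling from the driver to an executor.
  def roundTrip[T](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Returns (UDF result, driver-side log, executor-side log).
  def run(): (Int, String, String) = {
    val driverLog = new DriverLog
    val udfOnDriver = new CleanTextFn(driverLog)
    val udfOnExecutor = roundTrip(udfOnDriver) // the "executor" gets a copy
    val result = udfOnExecutor(1)               // logs into the copy's DriverLog
    (result, driverLog.logMessage, udfOnExecutor.logger.logMessage)
  }

  def main(args: Array[String]): Unit = {
    val (result, onDriver, onExecutor) = run()
    println(s"result = $result")                  // result = 3
    println(s"driver log empty? ${onDriver.isEmpty}") // driver log empty? true
    print(s"executor log: $onExecutor")           // executor log: I am in this UDF
  }
}
```

The UDF still returns the right value; only the side effect is lost, because it happened on a different object than the one the driver prints.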

How can I get this to work properly and log to the driver?


asked Aug 28 '18 18:08

test acc

1 Answer

The closest mechanism in Apache Spark to what you're trying to do is an accumulator. You can accumulate the log lines on the executors and read the result on the driver:


import org.apache.spark.sql.functions.udf
import org.apache.spark.util.CollectionAccumulator
import spark.implicits._ // toDF and $"col"; `spark` and `sc` are predefined in Databricks notebooks

// create a collection accumulator using the spark context:
val logLines: CollectionAccumulator[String] = sc.collectionAccumulator[String]("log")

// log function adds a line to the accumulator (called on the executors)
def log(msg: String): Unit = logLines.add(msg)

// driver-side function can print the log using the accumulator's *value*
def writeLog(): Unit = {
  import scala.collection.JavaConverters._
  println("start write")
  logLines.value.asScala.foreach(println)
  println("end write")
}

val CleanText = udf((s: Int) => {
  log(s"I am in this UDF, got: $s")
  s + 2
})

// use UDF in some transformation; show() is the action that runs the tasks:
Seq(1, 2).toDF("a").select(CleanText($"a")).show()

writeLog()
// prints: 
// start write
// I am in this UDF, got: 1
// I am in this UDF, got: 2
// end write

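BUT: this isn't really recommended, especially not for logging purposes. If you log on every record, every line is sent back to and kept on the driver, so this accumulator would eventually crash your driver with an OutOfMemoryError or just slow you down horribly. Also, Spark only guarantees exactly-once accumulator updates inside actions; updates made in transformations such as a UDF may be applied more than once if a task is re-executed, so you can get duplicate lines.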
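If you still want the accumulator approach, one way to bound driver memory is a custom accumulator that keeps only the first N lines and merely counts the rest. This is a sketch assuming Spark's org.apache.spark.util.AccumulatorV2 API (Spark 2.x/3.x) with spark-core on the classpath; BoundedLogAccumulator, maxLines and droppedCount are hypothetical names, not part of Spark. Which lines survive depends on the order in which task results are merged, so treat the result as a sample rather than a complete log.

```scala
import java.util.{ArrayList => JArrayList, Collections, List => JList}
import org.apache.spark.util.AccumulatorV2

// Hypothetical accumulator: keeps at most `maxLines` log lines and counts the
// overflow, so logging every record cannot grow driver memory without bound.
class BoundedLogAccumulator(maxLines: Int)
    extends AccumulatorV2[String, JList[String]] {

  private val lines = new JArrayList[String]()
  private var dropped = 0L

  // Number of lines discarded because the cap was reached.
  def droppedCount: Long = synchronized(dropped)

  override def isZero: Boolean = synchronized(lines.isEmpty && dropped == 0L)

  override def copy(): BoundedLogAccumulator = synchronized {
    val c = new BoundedLogAccumulator(maxLines)
    c.lines.addAll(lines)
    c.dropped = dropped
    c
  }

  override def reset(): Unit = synchronized {
    lines.clear()
    dropped = 0L
  }

  // Keep the line if there is room, otherwise just count it.
  override def add(v: String): Unit = synchronized {
    if (lines.size < maxLines) lines.add(v) else dropped += 1
  }

  override def merge(other: AccumulatorV2[String, JList[String]]): Unit =
    other match {
      case o: BoundedLogAccumulator =>
        // Snapshot the other side first so we never hold both locks at once.
        val (otherLines, otherDropped) =
          o.synchronized((new JArrayList[String](o.lines), o.dropped))
        synchronized {
          val it = otherLines.iterator()
          while (it.hasNext) add(it.next()) // still respects the cap
          dropped += otherDropped
        }
      case _ =>
        throw new UnsupportedOperationException(
          s"Cannot merge ${getClass.getName} with ${other.getClass.getName}")
    }

  // Defensive, read-only snapshot of the kept lines.
  override def value: JList[String] = synchronized {
    Collections.unmodifiableList(new JArrayList[String](lines))
  }
}
```

On the driver you would create it with something like `val logAcc = new BoundedLogAccumulator(1000)`, register it with `sc.register(logAcc, "log")`, call `logAcc.add(...)` from the UDF the same way `log` calls `logLines.add`, and read `logAcc.value` and `logAcc.droppedCount` after an action. The duplicate-update caveat for transformations still applies.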

Since you're using Databricks, I would check what options they support for log aggregation, or simply use the Spark UI to view the executor logs.

answered Oct 03 '22 03:10

Tzach Zohar
