how to redirect Scala Spark Dataset.show to log4j logger

The Spark API docs show how to get a pretty-printed snippet from a Dataset or DataFrame, sent to stdout.

Can this output be directed to a log4j logger? Alternatively, can someone share code that will create output formatted like df.show()?

Is there a way to do this that still allows stdout to go to the console both before and after sending the .show() output to the logger?

http://spark.apache.org/docs/latest/sql-programming-guide.htm

val df = spark.read.json("examples/src/main/resources/people.json")

// Displays the content of the DataFrame to stdout
df.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+
asked Jan 11 '17 by teserecter

1 Answer

The showString() function mentioned by teserecter comes from Spark's own code (Dataset.scala); it is what df.show() uses internally to build the formatted table.

You can't call that function directly from your code because it is package-private. However, you can place the following snippet in a file DatasetShims.scala in your source tree and mix the trait into your classes to access it.

// Must live in Spark's package so it can reach the package-private showString()
package org.apache.spark.sql

trait DatasetShims {
  implicit class DatasetHelper[T](ds: Dataset[T]) {
    // Returns the same table that ds.show() would print, as a String
    // (leading newline keeps the table aligned in log output)
    def toShowString(numRows: Int = 20, truncate: Int = 20, vertical: Boolean = false): String =
      "\n" + ds.showString(numRows, truncate, vertical)
  }
}
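For example, you can then route the table to log4j while leaving normal stdout output untouched. A minimal sketch (the logger name, app name, and input path here are illustrative, not fixed by the original question):

```scala
import org.apache.log4j.Logger
import org.apache.spark.sql.{DatasetShims, SparkSession}

object ShowToLog extends DatasetShims {
  // Hypothetical logger name; use whatever logger your application already configures
  private val logger = Logger.getLogger("ShowToLog")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("show-to-log").master("local[*]").getOrCreate()
    val df = spark.read.json("examples/src/main/resources/people.json")

    println("before")              // still goes to the console
    logger.info(df.toShowString()) // the .show()-style table goes to the logger
    println("after")               // still goes to the console

    spark.stop()
  }
}
```

Because toShowString() just returns a String instead of printing, stdout is never redirected, which answers the "console before and after" part of the question.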
answered Sep 25 '22 by redsk