The Spark API docs show how to get a pretty-printed snippet of a Dataset or DataFrame sent to stdout.
Can this output be directed to a log4j logger? Alternatively, can someone share code that creates output formatted like df.show()?
Is there a way to do this that allows stdout to go to the console both before and after pushing the .show() output to the logger?
http://spark.apache.org/docs/latest/sql-programming-guide.htm
val df = spark.read.json("examples/src/main/resources/people.json")
// Displays the content of the DataFrame to stdout
df.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+
The showString() function mentioned by teserecter comes from the Spark source code (Dataset.scala).
You can't call that function directly from your own code because it is package-private (private[sql]). You can, however, place the following snippet in a file named DatasetShims.scala in your source tree and mix the trait into your classes to access it. This works because the file declares itself to be in the org.apache.spark.sql package.
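package org.apache.spark.sql

trait DatasetShims {
  implicit class DatasetHelper[T](ds: Dataset[T]) {
    // Same table that show() prints, returned as a String.
    // The leading newline starts the table on its own line after the log prefix.
    def toShowString(numRows: Int = 20, truncate: Int = 20, vertical: Boolean = false): String =
      "\n" + ds.showString(numRows, truncate, vertical)
  }
}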
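With the trait mixed in, a call such as logger.info(df.toShowString()) would send the table to the logger (logger here stands for whatever log4j or slf4j logger your class already has). If you would rather not put a file into Spark's package, another option is to capture what show() prints. The following is only a sketch, and it assumes that show() writes through Scala's println (that is, Console.out), which is how the Spark versions I have looked at behave. The names StdoutCapture and capture are made up for this example, and a plain println stands in for df.show() so the snippet runs without Spark.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

// Hypothetical helper, not part of Spark or Scala.
object StdoutCapture {
  // Runs `body` with Console.out redirected to an in-memory buffer and
  // returns everything that was printed. Console.withOut restores the
  // previous Console.out as soon as the block finishes.
  def capture(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    val stream = new PrintStream(buffer, true, "UTF-8")
    Console.withOut(stream) {
      body
    }
    stream.flush()
    buffer.toString("UTF-8")
  }
}

object StdoutCaptureDemo {
  def main(args: Array[String]): Unit = {
    println("before")                  // goes to the console as usual
    val table = StdoutCapture.capture {
      println("+----+-------+")        // stand-in for df.show()
    }
    println("after")                   // console output works again
    // With a real logger you would now call something like:
    // logger.info("\n" + table)
    print(table)
  }
}
```

Console.withOut only redirects output written through Scala's Console on the calling thread, so anything that writes to System.out directly, or prints from another thread, will not end up in the captured string.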