The Spark API docs show how to get a pretty-printed snippet from a Dataset or DataFrame, sent to stdout.
Can this output be directed to a log4j logger instead? Alternatively, can someone share code that produces output formatted like df.show()?
Is there a way to do this that still allows stdout to go to the console both before and after pushing the .show() output to the logger?
http://spark.apache.org/docs/latest/sql-programming-guide.htm
val df = spark.read.json("examples/src/main/resources/people.json")
// Displays the content of the DataFrame to stdout
df.show()
// +----+-------+
// | age| name|
// +----+-------+
// |null|Michael|
// | 30| Andy|
// | 19| Justin|
// +----+-------+
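For the "code which will create output formatted similarly" part of the question, here is a minimal, hypothetical sketch (not Spark's actual implementation): it builds a show()-style table from plain data, so the result is a String you can send to stdout, a logger, or both. The `ShowFormat` object and its `showString` signature are my own invention for illustration.

```scala
// Hypothetical sketch: format rows the way DataFrame.show() renders them,
// without any Spark dependency, so the result is a plain String.
object ShowFormat {
  def showString(header: Seq[String], rows: Seq[Seq[String]]): String = {
    // Each column is as wide as its longest cell (header included).
    val widths = header.indices.map(i => (header +: rows).map(_(i).length).max)
    val sep = widths.map("-" * _).mkString("+", "+", "+")
    // show() right-aligns cells by default.
    def line(cells: Seq[String]): String =
      cells.zip(widths).map { case (c, w) => " " * (w - c.length) + c }
        .mkString("|", "|", "|")
    ((Seq(sep, line(header), sep) ++ rows.map(line)) :+ sep).mkString("\n")
  }

  def main(args: Array[String]): Unit =
    println(showString(
      Seq("age", "name"),
      Seq(Seq("null", "Michael"), Seq("30", "Andy"), Seq("19", "Justin"))))
}
```

Running this prints the same table as the df.show() example above, including the "+----+-------+" borders and right-aligned cells.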
The showString() function from teserecter's answer comes from Spark's own code (Dataset.scala).
You can't call that function from your own code because it is package private, but you can place the following snippet in a file DatasetShims.scala in your source tree and mix the trait into your classes to access it.
package org.apache.spark.sql

// Declaring the trait inside org.apache.spark.sql gives it access to the
// package-private Dataset.showString method.
trait DatasetShims {
  implicit class DatasetHelper[T](ds: Dataset[T]) {
    // Returns the same table that ds.show(numRows, truncate, vertical) prints
    def toShowString(numRows: Int = 20, truncate: Int = 20, vertical: Boolean = false): String =
      "\n" + ds.showString(numRows, truncate, vertical)
  }
}
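Because toShowString returns a String instead of printing, stdout is left untouched: you can println before and after, and hand the table to a logger in between. A runnable sketch of that flow, with the Spark call stubbed out as a literal string (the `LogShowDemo` object and the use of java.util.logging instead of log4j are my assumptions; with Spark on the classpath you would mix in DatasetShims and pass `df.toShowString()` to the logger instead):

```scala
import java.util.logging.Logger

// Sketch of the logging pattern; the table string stands in for
// df.toShowString() so this runs without a SparkSession.
object LogShowDemo {
  val logger: Logger = Logger.getLogger("LogShowDemo")

  def main(args: Array[String]): Unit = {
    println("before")                              // stdout, untouched
    val table = "\n+----+-------+\n| age|   name|\n+----+-------+"
    logger.info(table)                             // table goes to the logger
    println("after")                               // stdout again
  }
}
```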