I need a useful string representation of a Spark DataFrame. The one I get with df.show is great, but I can't get that output as a string because the internal showString method called by show is private. Is there some way I can get a similar output without writing a method to duplicate this same functionality?
showString is simply private[sql], which means that any code accessing it has to be in the same package, i.e. org.apache.spark.sql.
The trick is to create a helper object that does belong to the org.apache.spark.sql package, but whose single method is not private (at any level).
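To see why this works, here is a toy sketch that does not need Spark at all. Every name in it (lib.internal, Table, render, AccessRender) is hypothetical and only stands in for org.apache.spark.sql, Dataset and showString. It assumes nothing beyond Scala's package-qualified private modifier.

```scala
// Hypothetical library code, standing in for org.apache.spark.sql.
package lib.internal {
  class Table(rows: Seq[Int]) {
    // Visible only to code inside package lib.internal.
    private[internal] def render(limit: Int): String =
      rows.take(limit).mkString("|", "|", "|")
  }

  // The helper lives in the same package, so it may call render.
  // The helper's own method is public, so anyone may call it.
  object AccessRender {
    def render(t: Table, limit: Int): String = t.render(limit)
  }
}

// Client code in an unrelated package.
package app {
  object Main {
    def main(args: Array[String]): Unit = {
      val t = new lib.internal.Table(Seq(1, 2, 3))
      // t.render(2) would not compile here: render is private[internal].
      println(lib.internal.AccessRender.render(t, 2)) // prints |1|2|
    }
  }
}
```

The compiler only checks which package the calling code is declared in, not which jar or project the file comes from, which is why declaring your own object inside the library's package is enough.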
I usually mirror the instance method: the very first parameter is the target object, and the remaining parameters match those of the target method.
package org.apache.spark.sql

object AccessShowString {
  // Public wrapper around the private[sql] Dataset.showString
  def showString[T](
      df: Dataset[T],
      _numRows: Int,
      truncate: Int = 20,
      vertical: Boolean = false): String = {
    df.showString(_numRows, truncate, vertical)
  }
}
TIP: Use :paste -raw to copy and paste the code in spark-shell, since a package declaration is not accepted at the regular prompt.
Let's use showString then.
import org.apache.spark.sql.AccessShowString.showString
val df = spark.range(10)
scala> println(showString(df, 10))
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
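If you would rather not put code into Spark's package, another option is to capture what show prints. This is a minimal sketch, not part of the approach above: it assumes that show writes through Scala's Console (which, as far as I know, it does via println), and the names CaptureOutput and capture are made up for illustration.

```scala
import java.io.ByteArrayOutputStream

object CaptureOutput {
  // Runs `body` and returns everything it printed via scala.Console as a String.
  // Output written directly to System.out is NOT captured.
  def capture(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    Console.withOut(buffer) {
      body
    }
    // Decode with the platform default charset, matching how it was encoded.
    buffer.toString
  }

  def main(args: Array[String]): Unit = {
    val text = capture { println("+---+") }
    print(text)
    // With a Dataset in scope, the assumed usage would be:
    //   val shown: String = capture { df.show(10) }
  }
}
```

The trade-off is that this depends on how show happens to print, whereas the helper object calls showString directly and gets the string without any redirection.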