
How to get a string representation of DataFrame (as does Dataset.show)?

I need a useful string representation of a Spark DataFrame. The one printed by df.show is great, but I can't get that output as a string because the internal showString method that show calls is private. Is there some way to get similar output without writing a method that duplicates this functionality?

Sasgorilla asked Jul 06 '18 22:07



1 Answer

showString is declared private[sql], which means that any code calling it has to live inside the same package, i.e. org.apache.spark.sql.

The trick is to create a helper object that does belong to the org.apache.spark.sql package and exposes a single method that is not private at any level, so it can be called from anywhere.

I usually mirror the instance method: the helper takes the target object as its very first parameter, and the remaining parameters match those of the target method.

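package org.apache.spark.sql

// Lives in org.apache.spark.sql, so it is allowed to call the private[sql] method.
object AccessShowString {
  // Public wrapper: the Dataset comes first, the rest mirrors Dataset.showString.
  def showString[T](df: Dataset[T],
      _numRows: Int, truncate: Int = 20, vertical: Boolean = false): String = {
    df.showString(_numRows, truncate, vertical)
  }
}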

TIP: Use :paste -raw to paste the AccessShowString code into spark-shell. Raw mode is needed because the snippet starts with a package declaration.

Let's use showString then.

import org.apache.spark.sql.AccessShowString.showString
val df = spark.range(10)
println(showString(df, 10))
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
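The same-package trick is not specific to Spark. Here is a minimal sketch in plain Scala of the visibility rule it relies on. The packages demo.lib and demo.app, the Table class and the AccessRender helper are all hypothetical names made up for illustration; none of them are part of Spark. It is meant to be compiled as an ordinary source file.

```scala
// Hypothetical library code: `render` is visible only inside package demo.lib.
package demo.lib {
  class Table(rows: Seq[Int]) {
    private[lib] def render(limit: Int): String =
      rows.take(limit).mkString("\n")
  }

  // Helper placed in the same package, so it may call the private[lib] method.
  // The helper's own method is public, which makes the functionality reachable.
  object AccessRender {
    def render(table: Table, limit: Int): String = table.render(limit)
  }
}

// Client code in a different package goes through the helper;
// calling `new Table(...).render(2)` directly here would not compile.
package demo.app {
  import demo.lib.{AccessRender, Table}

  object Main {
    def main(args: Array[String]): Unit =
      println(AccessRender.render(new Table(Seq(1, 2, 3)), limit = 2))
  }
}
```

The Spark helper above has the same shape: AccessShowString plays the role of AccessRender, and Dataset.showString plays the role of Table.render. Keep in mind that a helper like this depends on an internal method, so it may need adjusting if that method's signature changes between versions.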
Jacek Laskowski answered Oct 12 '22 01:10
