
Saving contents of df.show() as a string in spark-scala app

I need to save the output of df.show() as a string so that I can email it directly.

For example, take the snippet below from the official Spark docs:

val df = spark.read.json("examples/src/main/resources/people.json")

// Displays the content of the DataFrame to stdout
df.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+

I need to save the table above, exactly as it is printed to the console, as a string. I looked at log4j for printing the log, but couldn't find any information on logging only this output.

Can someone help me with it?

Omkar asked Jan 31 '18 16:01


2 Answers

A workaround is to redirect standard output to a variable:

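// Buffer that collects everything printed while the redirect is active
val baos = new java.io.ByteArrayOutputStream()
val ps = new java.io.PrintStream(baos)

val oldPs = Console.out      // remember the current output stream
Console.setOut(ps)           // redirect scala.Console output to the buffer
df.show()
ps.flush()                   // make sure buffered text reaches baos
val content = baos.toString()
Console.setOut(oldPs)        // restore the original output stream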

Note that I get one deprecation warning here (from Console.setOut).

You can also re-implement the method Dataset.showString, which generates the table text. It uses take in the background. Maybe it's also a good moment to create a PR to make showString public? :)
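As a rough illustration of the re-implementation idea, here is a minimal sketch that builds a show()-style table from a header and already-stringified rows. It is not Spark's actual showString: the names TableRender and renderTable are hypothetical, it assumes every row has as many cells as the header, and it skips truncation and the other options that show() supports.

```scala
object TableRender {
  /** Renders header and rows as a bordered table with right-aligned cells. */
  def renderTable(header: Seq[String], rows: Seq[Seq[String]]): String = {
    val all = header +: rows
    // Width of each column = length of its longest cell (header included)
    val widths = header.indices.map(i => all.map(_(i).length).max)
    // Border line such as +----+-------+
    val sep = widths.map("-" * _).mkString("+", "+", "+")
    // Left-pad each cell with spaces up to its column width
    def line(cells: Seq[String]): String =
      cells.zip(widths)
        .map { case (cell, w) => (" " * (w - cell.length)) + cell }
        .mkString("|", "|", "|")
    ((Seq(sep, line(header), sep) ++ rows.map(line)) :+ sep).mkString("\n")
  }
}
```

With a DataFrame, the inputs could plausibly come from df.columns for the header and from df.take(n), mapping each Row's toSeq to strings (rendering null values as "null") for the rows. That wiring is an assumption about how you would connect it and is not shown in the sketch.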

T. Gawęda answered Nov 01 '22 19:11


scala.Console has a withOut method for this kind of thing:

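import java.io.ByteArrayOutputStream

val outCapture = new ByteArrayOutputStream
// Everything printed via scala.Console inside the block goes to outCapture
Console.withOut(outCapture) {
  df.show()
}
val result = new String(outCapture.toByteArray)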
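
If you need this in more than one place, the same withOut pattern can be wrapped in a small helper. The sketch below is only one way to do it: the names Capture and captureOut are made up for illustration, and the explicit UTF-8 encoding is a choice made here so that writing and reading use the same charset.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

object Capture {
  /** Runs `body` and returns the text it printed through scala.Console. */
  def captureOut(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    // Encode explicitly as UTF-8 so decoding below uses the same charset
    val ps = new PrintStream(buffer, true, "UTF-8")
    // withOut restores the previous output stream when the block finishes
    Console.withOut(ps) { body }
    ps.flush()
    buffer.toString("UTF-8")
  }
}
```

Usage would then look like val table = Capture.captureOut { df.show() }. Keep in mind that this only captures output written through scala.Console (for example println); code that writes to System.out directly is not redirected.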

Joe K answered Nov 01 '22 19:11