Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I pretty print a data frame in Zeppelin/Spark/Scala?

Tags:

I am using Spark 2 and Scala 2.11 in a Zeppelin 0.7 notebook. I have a dataframe that I can print like this:

dfLemma.select("text", "lemma").show(20,false) 

and the output looks like:

+---------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |text                                                                                                                       |lemma                                                                                                                                                                  | +---------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |RT @Dope_Promo: When you and your crew beat your high scores on FUGLY FROG 😍🔥 https://time.com/Sxp3Onz1w8                    |[rt, @dope_promo, :, when, you, and, you, crew, beat, you, high, score, on, FUGLY, FROG, https://time.com/sxp3onz1w8]                                                      | |RT @axolROSE: Did yall just call Kermit the frog a lizard?  https://time.com/wDAEAEr1Ay                                        |[rt, @axolrose, :, do, yall, just, call, Kermit, the, frog, a, lizard, ?, https://time.com/wdaeaer1ay]                                                                     | 

I am trying to make the output nicer in Zeppelin, by:

val printcols= dfLemma.select("text", "lemma") println("%table " + printcols) 

which gives this output:

printcols: org.apache.spark.sql.DataFrame = [text: string, lemma: array<string>] 

and a new blank Zeppelin paragraph headed

[text: string, lemma: array] 

Is there a way of getting the dataframe to show as a nicely formatted table? TIA!

like image 770
schoon Avatar asked Jul 06 '17 10:07

schoon


People also ask

How do I print a Spark size data frame?

Similar to Python Pandas you can get the Size and Shape of the PySpark (Spark with Python) DataFrame by running count() action to get the number of rows on DataFrame and len(df. columns()) to get the number of columns.

How do I Preview Spark data frame?

You can visualize a Spark dataframe in Jupyter notebooks by using the display(<dataframe-name>) function. The display() function is supported only on PySpark kernels. The Qviz framework supports 1000 rows and 100 columns. By default, the dataframe is visualized as a table.

How do I add a filter to my Spark data frame?

The where() filter can be used on array collection column using array_contains(), Spark SQL function that checks if the array contains a value if present it returns true else false. The filter condition is applied on the dataframe consist of nested struct columns to filter the rows based on a nested column.

What does Spark DataFrame look like?

In Spark, DataFrames are the distributed collections of data, organized into rows and columns. Each column in a DataFrame has a name and an associated type. DataFrames are similar to traditional database tables, which are structured and concise.


1 Answers

In Zeppelin you can use z.show(df) to show a pretty table. Here's an example:

val df = Seq(   (1,1,1), (2,2,2), (3,3,3) ).toDF("first_column", "second_column", "third_column")  z.show(df) 

enter image description here

like image 192
Daniel de Paula Avatar answered Sep 22 '22 16:09

Daniel de Paula