
Display PySpark Dataframe as HTML Table in Jupyter Notebook

I'm trying to display a PySpark dataframe as an HTML table in a Jupyter Notebook, but all methods seem to be failing.

Using this method displays a text-formatted table:

import pandas
df.toPandas()

Using this method displays the HTML table as a string:

df.toPandas().to_html()

This prints the raw HTML in a more readable form, but it still doesn't render as a table:

print(df.toPandas().to_html())

And, all of these

from IPython.display import display, HTML

HTML(df.toPandas().to_html())
print(HTML(df.toPandas().to_html()))
display(HTML(df.toPandas().to_html()))

Simply print this object description:

<IPython.core.display.HTML object>

Any other ideas I can try?

nxl4 asked Feb 15 '19 15:02



2 Answers

I ran into this issue using PySpark kernels within JupyterLab notebooks on AWS EMR clusters. I found that the sparkmagic command %%display solved the issue. For instance, my Jupyter cell would look like this:

%%display
some_spark_df

It is also worth pointing out that this raised an error if there were empty lines between %%display and the variable.

However, I'm not sure how to do the same with a pandas DataFrame. That still returns the object description when using the PySpark kernel (as opposed to a pure Python 3 kernel).

mkirzon answered Sep 27 '22 23:09



df.toPandas() really does render the dataframe as an HTML object, so my assumption is that you are looking for something else, or are trying to get rid of the ellipses (...).

You can configure pandas beforehand to get rid of those. This is what I use to remove truncation at the column, row, and field levels:

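import pandas as pd

# None disables cell-width truncation (the older -1 value is deprecated since pandas 1.0)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)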
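
If you would rather not change these settings for the whole notebook, pandas also offers pd.option_context, which applies options only inside a with block. A minimal sketch, assuming pandas 1.0 or later and using a small made-up dataframe (pdf) in place of df.toPandas():

```python
import pandas as pd

# Hypothetical sample data standing in for df.toPandas()
pdf = pd.DataFrame({"id": [1, 2], "text": ["short", "x" * 200]})

# Remember the current setting so we can confirm it is restored afterwards
default_rows = pd.get_option("display.max_rows")

with pd.option_context(
    "display.max_colwidth", None,   # no truncation inside cells
    "display.max_rows", 500,
    "display.max_columns", 500,
):
    rows_inside = pd.get_option("display.max_rows")
    html_table = pdf.to_html()      # HTML string built while the options are active

# Outside the block the previous option values are back in effect
rows_after = pd.get_option("display.max_rows")
```

This keeps the wider display limits local to the cell that needs them, so other cells in the notebook keep the default truncation.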

You can also use the HTML approach from your question, with the calls in a slightly different order. Here is a small helper function that I use:

from IPython.display import HTML

def printDf(sprkDF, records):
    # Limit the rows on the Spark side before collecting to pandas,
    # then wrap the HTML string so the notebook can render it
    return HTML(sprkDF.limit(records).toPandas().to_html())

# printDf(df, 10)
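
For context on why print(HTML(...)) shows only an object description in a standard IPython kernel: print() uses the object's plain text representation, while the notebook asks a displayed object for rich output through its _repr_html_ hook. The sketch below uses a hypothetical HtmlTable class (not part of IPython or pandas), built only on the standard library, to show the difference. It does not cover the PySpark/sparkmagic kernel, where the code runs remotely and rendering may behave differently.

```python
from html import escape


class HtmlTable:
    """Hypothetical stand-in for a rich display object, for illustration only."""

    def __init__(self, header, rows):
        self.header = header
        self.rows = rows

    def _repr_html_(self):
        # Jupyter front ends call this hook to obtain rich HTML output
        head = "".join(f"<th>{escape(str(h))}</th>" for h in self.header)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"


table = HtmlTable(["id", "name"], [(1, "a"), (2, "b <c>")])

# What print(table) would show: the default object description
as_text = str(table)

# What a notebook would render when the object is the cell's result
as_html = table._repr_html_()
```

In other words, returning the object as the last expression of a cell (as printDf does) lets the notebook pick the HTML representation, whereas print() never looks at it.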

Hope this helps.

Travis Pfrommer answered Sep 27 '22 22:09
