I'm trying to display a PySpark DataFrame as an HTML table in a Jupyter Notebook, but every method I've tried has failed.
This method displays a text-formatted table:
import pandas
df.toPandas()
This one displays the HTML table as a string:
df.toPandas().to_html()
This prints the raw HTML more readably, but it doesn't render as a table:
print(df.toPandas().to_html())
And all of these
from IPython.display import display, HTML
HTML(df.toPandas().to_html())
print(HTML(df.toPandas().to_html()))
display(HTML(df.toPandas().to_html()))
just print this object description instead of a table:
<IPython.core.display.HTML object>
Any other ideas I can try?
I ran into this issue using PySpark kernels in JupyterLab notebooks on AWS EMR clusters. I found that the sparkmagic command %%display solved it. For instance, my Jupyter cell would look like this:
%%display
some_spark_df
It's also worth pointing out that this errored if there were any blank lines between %%display and the variable.
However, I'm not sure how to do the same with a pandas DataFrame. That still returns the object description when using the PySpark kernel (as opposed to a pure Python 3 kernel).
So df.toPandas() really does render the DataFrame as an HTML object, but my assumption is that you are looking for something else, or are trying to get rid of the ellipses (...).
You can configure pandas beforehand to get rid of those. This is what I use to remove truncation at the column, row, and field levels:
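import pandas as pd
# None means "no limit" (passing -1 for max_colwidth was deprecated in pandas 1.0)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)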
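If you'd rather not change these options for the whole session, pandas also provides pd.option_context, which applies settings only inside a with-block and restores the previous values afterwards. A minimal sketch, assuming a plain pandas DataFrame (the sample data here is made up purely for illustration):

```python
import pandas as pd

# Illustrative sample data: one column holds strings longer than the
# default display.max_colwidth of 50 characters.
df = pd.DataFrame({"id": range(3), "text": ["x" * 200] * 3})

# These settings apply only inside the with-block; pandas restores the
# previous values automatically when the block exits.
with pd.option_context(
    "display.max_colwidth", None,
    "display.max_rows", 500,
    "display.max_columns", 500,
):
    html = df.to_html()  # long strings are not cut off with "..."

# Outside the block the global defaults are back in effect.
```

This keeps the "no truncation" behaviour scoped to the one table you want to see in full, so other outputs in the notebook keep the default limits.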
You can also use the HTML approach from the question, but you have things a little out of order. Here is a quick little helper function I use:
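from IPython.display import HTML

def printDf(sprkDF, records):
    # Take only the first `records` rows, convert them to pandas on the
    # driver, and wrap the generated HTML so Jupyter renders it as a table.
    return HTML(sprkDF.limit(records).toPandas().to_html())

# printDf(df, 10)  -- make this the last expression in the cell so it renders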
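A variant of the same helper, as a minimal sketch assuming a regular IPython kernel: show_df is a hypothetical name, not a library API. Calling display() explicitly renders the table even when the call isn't the last expression in the cell, whereas returning the HTML object relies on it being the cell's final value. Note that in the sparkmagic PySpark kernel described in the question, display(HTML(...)) only printed the object description, so this does not help in that case; %%display is the route there.

```python
from IPython.display import HTML, display


def show_df(spark_df, records=20):
    """Hypothetical helper: render the first `records` rows as an HTML table.

    Calls display() itself instead of returning the HTML object, so the
    table renders even when this is not the last expression in the cell.
    """
    # limit() keeps the result small; toPandas() pulls those rows to the driver.
    pdf = spark_df.limit(records).toPandas()
    display(HTML(pdf.to_html()))
```

Usage would be show_df(df, 10) anywhere in a cell. The function returns None on purpose: if it also returned the HTML object, the table would render twice when the call happens to be the cell's last line.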
Hope this helps.