I'm in the process of migrating existing Databricks Spark notebooks to Jupyter notebooks. Databricks provides a convenient, nicely rendered display(data_frame) function for visualizing Spark DataFrames and RDDs, but there is no direct equivalent in Jupyter (I'm not certain, but I think it's a Databricks-specific function). I tried:
dataframe.show()
But that is a plain-text rendering, and when there are many columns the layout breaks. So I'm trying to find an alternative to display() that renders Spark DataFrames better than show() does. Is there any equivalent or alternative to this?
Notebooks in Azure Databricks are similar to Jupyter notebooks, but Databricks has enhanced them quite a bit, which makes exploring your data much easier. To create a notebook, click "Workspace" in the left navigation.
You can visualize a Spark dataframe in Jupyter notebooks by using the display(<dataframe-name>) function. The display() function is supported only on PySpark kernels. The Qviz framework supports 1000 rows and 100 columns. By default, the dataframe is visualized as a table.
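As a rough, illustrative sketch (the app name, DataFrame contents, and column names are made up here), usage looks like this in an environment whose PySpark kernel provides display():
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("display-example").getOrCreate()
sample_df = spark.createDataFrame(
    [(1, "alice", 34.0), (2, "bob", 45.5)],
    ["id", "name", "score"],
)
# Renders the DataFrame as a table (with chart options) in kernels that expose display();
# in plain Jupyter, where no such helper is injected, this line would raise a NameError.
display(sample_df)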
In Jupyter, instead of using df.show(), use myDF.limit(10).toPandas().head(). And because pandas truncates the view when you are working with many columns, also set the pandas column display option to the maximum.
# Alternative to the Databricks display function.
import pandas as pd

pd.set_option('display.max_columns', None)  # show all columns instead of truncating the view
myDF.limit(10).toPandas().head()            # collect a small sample and render it as a pandas table
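Note that toPandas() collects the selected rows to the driver, which is why the limit(10) matters; and when the resulting pandas DataFrame is the last expression in a Jupyter cell, it is rendered automatically as an HTML table.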
First Recommendation: in Jupyter, don't use df.show();
instead use df.limit(10).toPandas().head(),
which gives a clean display, arguably even nicer than Databricks' display().
Second Recommendation:
Zeppelin Notebook. Just use z.show(df.limit(10))
Additionally, in Zeppelin you can register the DataFrame as a temp view:
df.createOrReplaceTempView('tableName')
%sql
and then query your table with a rich, interactive display (see the sketch below).
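To make the Zeppelin recommendation concrete, here is a rough sketch of how the two paragraphs might look; the DataFrame, column names, table name, and query are illustrative, not taken from the answer above:

%pyspark
# z is Zeppelin's built-in context object; z.show() renders an interactive table with chart options.
df = spark.createDataFrame([(1, "alice", 34.0), (2, "bob", 45.5)], ["id", "name", "score"])
z.show(df.limit(10))
# Register the DataFrame so the SQL interpreter can query it in the next paragraph.
df.createOrReplaceTempView('tableName')

And then, in a separate paragraph:

%sql
SELECT name, score FROM tableName ORDER BY score DESC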