
Databricks display() function equivalent or alternative to Jupyter

I'm in the process of migrating my current Databricks Spark notebooks to Jupyter notebooks. Databricks provides the convenient and beautiful display(data_frame) function for visualizing Spark dataframes and RDDs, but there's no direct equivalent in Jupyter (I'm not sure, but I think it's a Databricks-specific function). I tried:

dataframe.show()

But that is only a plain-text rendering, and it breaks when you have many columns. So I'm trying to find an alternative to display() that renders Spark dataframes better than show() does. Is there any equivalent or alternative to this?

Luis Leal asked Sep 08 '17 23:09



2 Answers

When you use Jupyter, use myDF.limit(10).toPandas().head() instead of df.show(). Because pandas truncates the view when a dataframe has many columns, also set the pandas column display option so that all columns are shown.

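# Alternative to the Databricks display() function.
import pandas as pd
# Show every column instead of truncating wide dataframes.
pd.set_option('display.max_columns', None)

# Pull only 10 rows to the driver, then render them as a pandas table.
myDF.limit(10).toPandas().head()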
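
If this pattern is used often, it could be wrapped in a small helper. The following is only a sketch: the function name `preview` and its parameters are hypothetical, not part of Spark, pandas, or Databricks. It assumes the argument is a Spark DataFrame (anything exposing `limit()` and `toPandas()`).

```python
import pandas as pd


def preview(spark_df, n=10, max_columns=None):
    """Return the first n rows of a Spark DataFrame as a pandas DataFrame.

    `preview` is a hypothetical helper name, not a Spark or Databricks API.
    """
    # None means "no limit": pandas will not elide columns of wide frames.
    pd.set_option("display.max_columns", max_columns)
    # limit() runs on the cluster, so only n rows are collected to the driver.
    return spark_df.limit(n).toPandas()
```

In a Jupyter cell, ending the cell with `preview(myDF)` lets the notebook render the returned pandas DataFrame as an HTML table. Note that `.head()` on its own returns only the first 5 rows, so returning the limited frame directly shows all `n` rows.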

AP-Big Data answered Sep 17 '22 13:09


First Recommendation: When you use Jupyter, don't use df.show(); instead use df.limit(10).toPandas().head(), which gives a clean tabular display, arguably even better than the Databricks display().

Second Recommendation: Zeppelin Notebook. Just use z.show(df.limit(10))

Additionally, in Zeppelin:

  1. Register your dataframe as a SQL table: df.createOrReplaceTempView('tableName')
  2. Insert a new paragraph beginning with %sql, then query your table to get a rich, interactive display.
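
Outside a %sql paragraph, the same temp view can also be queried from Python through spark.sql(). A minimal sketch, assuming `spark` is an existing SparkSession and `df` a Spark DataFrame; the helper name `query_temp_view` is hypothetical:

```python
def query_temp_view(spark, df, table_name="tableName", limit=10):
    """Register df as a temp view and return a limited SQL query over it.

    `query_temp_view` is a hypothetical helper name, not a Spark API.
    """
    # Make the DataFrame addressable by name from SQL.
    df.createOrReplaceTempView(table_name)
    # The kind of query a %sql paragraph would run; returns a Spark DataFrame.
    return spark.sql(f"SELECT * FROM {table_name} LIMIT {int(limit)}")
```

The result is still a Spark DataFrame, so it can be passed to z.show() in Zeppelin or converted with toPandas() in Jupyter.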

Erkan Şirin answered Sep 16 '22 13:09