I'm in the process of migrating existing Databricks Spark notebooks to Jupyter notebooks. Databricks provides a convenient, nicely rendered display(data_frame) function for visualizing Spark DataFrames and RDDs, but there is no direct equivalent in Jupyter (I'm not certain, but I think it's a Databricks-specific function). I tried:
dataframe.show()
But that is a plain-text rendering, and when there are many columns the layout breaks. So I'm trying to find an alternative to display() that renders Spark DataFrames better than show() does. Is there any equivalent or alternative to this?
Notebooks in Azure Databricks are similar to Jupyter notebooks, but Databricks has enhanced them quite a bit, which makes exploring your data much easier. To create a notebook, click "Workspace" in the left navigation.
You can visualize a Spark dataframe in Jupyter notebooks by using the display(<dataframe-name>) function. The display() function is supported only on PySpark kernels. The Qviz framework supports 1000 rows and 100 columns. By default, the dataframe is visualized as a table.
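As a rough, illustrative sketch (the app name, DataFrame contents, and column names are made up here), usage looks like this in an environment whose PySpark kernel provides display():
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("display-example").getOrCreate()
sample_df = spark.createDataFrame(
    [(1, "alice", 34.0), (2, "bob", 45.5)],
    ["id", "name", "score"],
)
# Renders the DataFrame as a table (with chart options) in kernels that expose display();
# in plain Jupyter, where no such helper is injected, this line would raise a NameError.
display(sample_df)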
In Jupyter, instead of using df.show(), use myDF.limit(10).toPandas().head(). And because pandas truncates the view when you are working with many columns, also set the pandas column display option to the maximum.
# Alternative to the Databricks display function.
import pandas as pd

pd.set_option('display.max_columns', None)  # show all columns instead of truncating the view
myDF.limit(10).toPandas().head()            # collect a small sample and render it as a pandas table
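Note that toPandas() collects the selected rows to the driver, which is why the limit(10) matters; and when the resulting pandas DataFrame is the last expression in a Jupyter cell, it is rendered automatically as an HTML table.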
First Recommendation: in Jupyter, don't use df.show();
instead use df.limit(10).toPandas().head(),
which gives a clean display, arguably even nicer than Databricks' display().
Second Recommendation:
Zeppelin Notebook. Just use z.show(df.limit(10))
Additionally, in Zeppelin you can register the DataFrame as a temp view:
df.createOrReplaceTempView('tableName')
%sql
and then query your table with a rich, interactive display (see the sketch below).
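To make the Zeppelin recommendation concrete, here is a rough sketch of how the two paragraphs might look; the DataFrame, column names, table name, and query are illustrative, not taken from the answer above:

%pyspark
# z is Zeppelin's built-in context object; z.show() renders an interactive table with chart options.
df = spark.createDataFrame([(1, "alice", 34.0), (2, "bob", 45.5)], ["id", "name", "score"])
z.show(df.limit(10))
# Register the DataFrame so the SQL interpreter can query it in the next paragraph.
df.createOrReplaceTempView('tableName')

And then, in a separate paragraph:

%sql
SELECT name, score FROM tableName ORDER BY score DESC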