Using PySpark in a Jupyter notebook, the output of Spark's DataFrame.show
is low-tech compared to how Pandas DataFrames are displayed. I thought "Well, it does the job", until I got this:
The output is not adjusted to the width of the notebook, so that the lines wrap in an ugly way. Is there a way to customize this? Even better, is there a way to get output Pandas-style (without converting to pandas.DataFrame
obviously)?
To show the full data without any hiding, you can use pd. set_option('display. max_rows', 500) and pd.
The only way to show the full column content we are using show() function. show(): Function is used to show the Dataframe. n: Number of rows to display. truncate: Through this parameter we can tell the Output sink to display the full column content by setting truncate option to false, by default this value is true.
To obtain the shape of a data frame in PySpark, you can obtain the number of rows through "DF. count()" and the number of columns through "len(DF. columns)".
This is now possible natively as of Spark 2.4.0 by setting spark.sql.repl.eagerEval.enabled
to True
:
After playing around with my table which has a lot of columns I decided the best thing to do to get a feel for the data is to use:
df.show(n=5, truncate=False, vertical=True)
This displays it vertically without truncation and is the cleanest viewing I can come up with.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With