
pyspark show dataframe as table with horizontal scroll in ipython notebook

A pyspark.sql.DataFrame displays messily with DataFrame.show(): long lines wrap instead of scrolling horizontally.

The same data displays cleanly with pandas.DataFrame.head.

I tried these options:

import IPython
IPython.auto_scroll_threshold = 9999

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import display

but no luck. The horizontal scroll does work, however, when the notebook is used within the Atom editor with the jupyter plugin.

muon asked Apr 15 '17 14:04


5 Answers

This is a workaround:

spark_df.limit(5).toPandas().head() 

However, I do not know the computational cost of this query. I believe limit() is not expensive; corrections welcome.
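On the cost question, here is a minimal sketch of the same workaround wrapped in a helper. `preview` is a hypothetical name, not a PySpark API; it only assumes the object passed in has the `limit()` and `toPandas()` methods that `pyspark.sql.DataFrame` provides. Because `limit(n)` is applied before `toPandas()`, at most n rows are collected to the driver, although how much work Spark does to produce those rows depends on the upstream query.

```python
def preview(spark_df, n=5):
    """Return the first n rows of a Spark DataFrame as a pandas DataFrame.

    Hypothetical helper: it relies only on DataFrame.limit() and
    DataFrame.toPandas(), so at most n rows reach the driver.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # limit() caps the row count on the Spark side before anything is collected.
    return spark_df.limit(n).toPandas()
```

Ending a notebook cell with `preview(spark_df)` renders the result with the pandas table display. The trailing `.head()` in the original line is not needed, since `limit(5)` has already capped the row count.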

muon answered Sep 20 '22 22:09


Just add (and execute)

from IPython.display import HTML, display
display(HTML("<style>pre { white-space: pre !important; }</style>"))

And you'll get the df.show() output with a horizontal scrollbar instead of wrapped lines.
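A small sketch that wraps the snippet above in a reusable function. `NO_WRAP_CSS` and `enable_horizontal_scroll` are names made up for illustration; the sketch assumes the notebook front end renders `show()` output inside `<pre>` elements, as this answer implies.

```python
# The same CSS override as in the answer, kept as a constant.
NO_WRAP_CSS = "<style>pre { white-space: pre !important; }</style>"


def enable_horizontal_scroll():
    """Inject the CSS override into the current notebook page.

    Returns True if the style was displayed, False if IPython is not
    installed (for example when run as a plain script).
    """
    try:
        from IPython.display import HTML, display
    except ImportError:
        return False
    display(HTML(NO_WRAP_CSS))
    return True
```

Calling `enable_horizontal_scroll()` once in a cell before `df.show()` should be enough. Since the style is emitted as cell output, it presumably stops applying if that output is cleared.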

jmPicaza answered Sep 19 '22 22:09


I'm not sure if anyone is still facing this issue, but it can be resolved by tweaking the page's CSS with the browser's developer tools.

When the show() output wraps, open the developer tools (F12), choose inspect element (Ctrl+Shift+C), click on the output, and uncheck the white-space attribute in the styles pane.

You only need to do this once per page load; refreshing the page resets it.

This shows the data exactly as show() prints it, with no need to convert to pandas.

Vijay Jangir answered Sep 17 '22 22:09


Just edit the CSS file and you are good to go.

  1. Open the Jupyter Notebook stylesheet at ../site-packages/notebook/static/style/style.min.css.

  2. Search for white-space: pre-wrap; and remove it.

  3. Save the file and restart jupyter-notebook.

Problem fixed. :)
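The three steps above could be scripted roughly as follows. This is a sketch only: `strip_pre_wrap` and `patch_stylesheet` are hypothetical helper names, the path must be supplied by the caller (the `../site-packages/...` prefix depends on the environment), and a `.bak` copy is written first because the edit changes an installed package file.

```python
import re
import shutil
from pathlib import Path

# Matches the declaration from step 2, tolerating optional spaces
# and a missing trailing semicolon.
PRE_WRAP = re.compile(r"white-space:\s*pre-wrap\s*;?")


def strip_pre_wrap(css_text):
    """Return css_text with every 'white-space: pre-wrap;' removed."""
    return PRE_WRAP.sub("", css_text)


def patch_stylesheet(css_path):
    """Back up css_path to '<name>.bak', then rewrite it without pre-wrap.

    Returns the number of declarations removed; the file is left
    untouched (and no backup is made) when nothing matches.
    """
    path = Path(css_path)
    text = path.read_text(encoding="utf-8")
    patched, count = PRE_WRAP.subn("", text)
    if count:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(patched, encoding="utf-8")
    return count
```

After running `patch_stylesheet(...)` on the stylesheet, restart jupyter-notebook as in step 3.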

Karan Singla answered Sep 17 '22 22:09


Adding to the answers given above by @karan-singla and @vijay-jangir, here is a handy one-liner that comments out the white-space: pre-wrap styling:

$ awk -i inplace '/pre-wrap/ {$0="/*"$0"*/"}1' $(dirname `python -c "import notebook as nb;print(nb.__file__)"`)/static/style/style.min.css

This translates as: use awk to update, in place, every line that contains pre-wrap so that it is surrounded by /* and */ (i.e. commented out) in the style.min.css file found in your working Python environment. Note that the -i inplace option requires GNU awk 4.1 or later.

This, in theory, can then be used as an alias if one uses multiple environments, say with Anaconda.
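For readers without GNU awk, roughly the same effect can be sketched in Python. `comment_out_pre_wrap` and `find_stylesheet` are hypothetical names; the path layout is the one used in the command above and is assumed to exist only in installations that ship `static/style/style.min.css` inside the `notebook` package.

```python
import importlib.util
from pathlib import Path


def comment_out_pre_wrap(css_text):
    """Wrap every line containing 'pre-wrap' in /* ... */, like the awk rule."""
    return "\n".join(
        "/*" + line + "*/" if "pre-wrap" in line else line
        for line in css_text.split("\n")
    )


def find_stylesheet():
    """Return the stylesheet path for the active environment, or None.

    Looks the package up without importing it; returns None when the
    'notebook' package or the expected file is not present.
    """
    spec = importlib.util.find_spec("notebook")
    if spec is None or spec.origin is None:
        return None
    path = Path(spec.origin).parent / "static" / "style" / "style.min.css"
    return path if path.is_file() else None
```

As with the awk command, this is not idempotent: running it twice on the same file wraps the matching lines in a second pair of comment markers, so it is worth keeping a backup of the original file.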

REFs:

  • https://stackoverflow.com/a/24884616/4521950

  • Save modifications in place with awk

tallamjr answered Sep 18 '22 22:09