I create a pyspark dataframe and i want to see it in the SciView tab in PyCharm when i debug my code (like I used to do when i have worked with pandas). It says "Nothing to show" (the dataframe exists, I can see it when I use the show() command).
someone knows how to do it or maybe there is no integration between pycharm and pyspark dataframe in this case?
In the Variables tab of the Debug tool window, select an array or a DataFrame. Click a link View as Array/View as DataFrame to the right. Alternatively, you can choose View as Array or View as DataFrame from the context menu.
You can visualize a Spark dataframe in Jupyter notebooks by using the display(<dataframe-name>) function. The display() function is supported only on PySpark kernels. The Qviz framework supports 1000 rows and 100 columns. By default, the dataframe is visualized as a table.
To be able to run PySpark in PyCharm, you need to go into “Preferences” and “Project Structure” to “add Content Root”, where you specify the location of the python executable of apache-spark. Press “Apply” and “OK” after you are done. should be able to run within the PyCharm console.
PySpark users can access the full PySpark APIs by calling DataFrame. to_spark() . pandas-on-Spark DataFrame and Spark DataFrame are virtually interchangeable. However, note that a new default index is created when pandas-on-Spark DataFrame is created from Spark DataFrame.
Pycharm does not support spark dataframes, you should call the toPandas()
method on the dataframe. As @abhiieor mentioned in a comment, be aware that you can potentially collect a lot of data, you should first limit()
the number of rows returned.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With