I am using pyspark to read a parquet file like below:
my_df = sqlContext.read.parquet('hdfs://myPath/myDB.db/myTable/**')
Then when I do my_df.take(5), it shows [Row(...)] instead of a table format like we get with a pandas data frame.
Is it possible to display the data frame in a table format like pandas data frame? Thanks!
Spark DataFrame show() displays the contents of a DataFrame in a table (row-and-column) format. By default it shows only 20 rows, and column values are truncated at 20 characters.
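For example, a rough sketch against the my_df from the question (the parameter values are just illustrative):

my_df.show()                      # first 20 rows, values truncated at 20 characters
my_df.show(n=5, truncate=False)   # first 5 rows, untruncated column values
my_df.show(n=5, vertical=True)    # one block per row (Spark 2.3+), handy for wide tables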
You can visualize a Spark dataframe in Jupyter notebooks by using the display(<dataframe-name>) function. The display() function is supported only on PySpark kernels. The Qviz framework supports up to 1000 rows and 100 columns. By default, the dataframe is visualized as a table.
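As a minimal sketch, assuming a notebook kernel that actually provides display() (e.g. Databricks or Qubole notebooks; it is not defined in a plain pyspark shell):

display(my_df.limit(100))   # renders an interactive table instead of [Row(...)]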
The show method does what you're looking for.
For example, given the following dataframe of 3 rows, I can print just the first two rows like this:
df = sqlContext.createDataFrame([("foo", 1), ("bar", 2), ("baz", 3)], ('k', 'v'))
df.show(n=2)
which yields:
+---+---+
|  k|  v|
+---+---+
|foo|  1|
|bar|  2|
+---+---+
only showing top 2 rows
As mentioned by @Brent in the comment on @maxymoo's answer, you can try
df.limit(10).toPandas()
to get a prettier table in Jupyter. But this can take some time to run if you are not caching the Spark dataframe, and .limit() will not keep the order of the original Spark dataframe. See the sketch below.
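If that cost matters, a minimal sketch of the caching idea, assuming the my_df from the question (the sort column below is a placeholder, not from the original post):

my_df.cache()                        # avoid re-reading the parquet files from HDFS on each preview
my_df.limit(10).toPandas()           # renders as a pandas table in Jupyter
# If a stable preview order matters, sort before limiting:
# my_df.orderBy('some_column').limit(10).toPandas()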