
Pyspark: display a spark data frame in a table format

I am using pyspark to read a parquet file like below:

my_df = sqlContext.read.parquet('hdfs://myPath/myDB.db/myTable/**') 

Then when I do my_df.take(5), it shows [Row(...)] instead of a table format like a pandas DataFrame.

Is it possible to display the data frame in a table format like a pandas data frame? Thanks!

asked Aug 21 '16 by Edamame

People also ask

How do you show the DataFrame in PySpark?

Spark DataFrame show() is used to display the contents of the DataFrame in a table (row-and-column) format. By default, it shows only 20 rows, and column values are truncated at 20 characters.

How do I Preview Spark data frame?

You can visualize a Spark dataframe in Jupyter notebooks by using the display(<dataframe-name>) function. The display() function is supported only on PySpark kernels. The Qviz framework supports 1000 rows and 100 columns. By default, the dataframe is visualized as a table.


2 Answers

The show method does what you're looking for.

For example, given the following dataframe of 3 rows, I can print just the first two rows like this:

df = sqlContext.createDataFrame([("foo", 1), ("bar", 2), ("baz", 3)], ('k', 'v'))
df.show(n=2)

which yields:

+---+---+
|  k|  v|
+---+---+
|foo|  1|
|bar|  2|
+---+---+
only showing top 2 rows
answered Sep 21 '22 by eddies


As mentioned by @Brent in a comment on @maxymoo's answer, you can try

df.limit(10).toPandas() 

to get a prettier table in Jupyter. But this can take some time to run if the Spark DataFrame is not cached. Also, .limit() will not preserve the order of the original Spark DataFrame.

answered Sep 23 '22 by Louis Yang