Viewing the content of a Spark Dataframe Column

Tags: python, dataframe, apache-spark, pyspark

I'm using Spark 1.3.1.

I am trying to view the values of a Spark dataframe column in Python. With a Spark dataframe, I can do df.collect() to view the contents of the dataframe, but there is no such method for a Spark dataframe column as best as I can see.

For example, the dataframe df contains a column named 'zip_code'. So I can do df['zip_code'] and it returns a pyspark.sql.dataframe.Column object, but I can't find a way to view the values in df['zip_code'].

asked Jun 29 '15 19:06

John Lin

2 Answers

You can access the underlying RDD and map over it:

df.rdd.map(lambda r: r.zip_code).collect()

You can also use select if you don't mind the results being wrapped in Row objects:

df.select('zip_code').collect()

Finally, if you simply want to inspect the content, the show method should be enough:

df.select('zip_code').show()
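
If the Row wrapping gets in the way, one possible way to get plain values without going through the RDD is to unwrap each Row after collecting. This is only a sketch: column_values and its max_rows parameter are hypothetical names, not part of PySpark, and it assumes the column is small enough to bring back to the driver.

```python
def column_values(df, column_name, max_rows=None):
    """Return the values of one DataFrame column as a plain Python list.

    Hypothetical helper, not part of PySpark. `df` is assumed to be a
    Spark DataFrame; `max_rows` optionally caps how many rows are
    brought back to the driver.
    """
    # Narrow the DataFrame to the single column of interest.
    selected = df.select(column_name)
    if max_rows is not None:
        # limit() keeps collect() from pulling an entire large column.
        selected = selected.limit(max_rows)
    # collect() returns Row objects; a Row can be indexed by field name.
    return [row[column_name] for row in selected.collect()]
```

With the dataframe from the question, column_values(df, 'zip_code') should give a list of the zip codes themselves rather than a list of Row objects, and column_values(df, 'zip_code', max_rows=10) would fetch at most ten of them.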

answered Sep 20 '22 17:09

zero323

You can simply write:

df.select('your_column_name').show()

In your case, that would be:

df.select('zip_code').show()

answered Sep 19 '22 17:09

Navid Roohani
