I'm using Spark 1.3.1.

I am trying to view the values of a Spark dataframe column in Python. With a Spark dataframe, I can do df.collect() to view the contents of the whole dataframe, but there is no such method for a Spark dataframe column as best as I can see.

For example, the dataframe df contains a column named 'zip_code'. I can do df['zip_code'] and it returns a pyspark.sql.dataframe.Column type, but I can't find a way to view the values in df['zip_code'].
show() displays the DataFrame. Its n parameter sets the number of rows to display, and truncate controls column width: by default truncate is true and long values are shortened, so set it to false to display the full column content.

You can select one or more columns of a Spark DataFrame by passing the column names you want to the select() function. Since DataFrames are immutable, this creates a new DataFrame with the selected columns; show() then displays its contents.
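To make the truncate behavior concrete, here is a rough pure-Python sketch of the default shortening rule. The cutoff of 20 characters (keeping the first 17 plus "...") is an assumption based on how Spark's showString renders cells, not code from this thread:

```python
# Sketch of show()'s default cell truncation (assumed rule: with
# truncate=True, values longer than 20 chars keep 17 chars plus "...").
def truncate_cell(value, truncate=20):
    s = str(value)
    if truncate <= 0 or len(s) <= truncate:
        return s  # short enough, shown as-is
    if truncate < 4:
        return s[:truncate]
    return s[: truncate - 3] + "..."

print(truncate_cell("94103"))       # short value, unchanged
print(truncate_cell("a" * 30))      # long value, shortened to 20 chars
```

With truncate=False, show() would skip this step and print each value whole.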
You can access the underlying RDD and map over it:

df.rdd.map(lambda r: r.zip_code).collect()

You can also use select if you don't mind results wrapped in Row objects:

df.select('zip_code').collect()

Finally, if you simply want to inspect the content, the show method should be enough:

df.select('zip_code').show()
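Both collect() calls above bring plain Python objects to the driver: the rdd.map version yields the raw values, while select().collect() yields Row objects whose fields are reachable by attribute or by position. A minimal sketch of unwrapping them, using collections.namedtuple to stand in for pyspark.sql.Row so it runs without a Spark session (real Rows support the same two access styles):

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row; this sketch only relies on
# attribute and positional access, which real Rows also support.
Row = namedtuple("Row", ["zip_code"])

# Shaped like the result of df.select('zip_code').collect()
collected = [Row("94103"), Row("10001"), Row("60601")]

zip_codes = [r.zip_code for r in collected]   # attribute access
same_codes = [r[0] for r in collected]        # positional access

print(zip_codes)
```

The list comprehension plays the same role as the lambda in df.rdd.map(lambda r: r.zip_code): it turns wrapped rows into a flat list of column values.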
You can simply write:

df.select("your column's name").show()

In your case here, it will be:

df.select('zip_code').show()