Viewing the content of a Spark Dataframe Column

I'm using Spark 1.3.1.

I am trying to view the values of a Spark dataframe column in Python. With a Spark dataframe, I can call df.collect() to view the contents of the dataframe, but as far as I can see there is no such method for a Spark dataframe column.

For example, the dataframe df contains a column named 'zip_code'. I can do df['zip_code'], and it returns a pyspark.sql.dataframe.Column object, but I can't find a way to view the values in df['zip_code'].

asked Jun 29 '15 at 19:06 by John Lin



People also ask

How do I display the contents of a DataFrame in Spark?

show(): displays the rows of the DataFrame as a table. n: the number of rows to display (20 by default). truncate: whether to shorten long values; it defaults to true, which cuts off strings longer than 20 characters. Set truncate to False to display the full column content.

How do I read a column from Spark data frame?

You can select one or more columns of a Spark DataFrame by passing their names to the select() function. Because DataFrames are immutable, this creates a new DataFrame containing only the selected columns. The show() function then displays the contents of that DataFrame.

How do I view the content of a data frame?

In a notebook, put the DataFrame variable (data_pandas in the original answer) in a cell by itself and run that cell; the notebook displays its contents in the output.


2 Answers

You can access the underlying RDD and map over it:

df.rdd.map(lambda r: r.zip_code).collect() 

You can also use select if you don't mind the results being wrapped in Row objects:

df.select('zip_code').collect() 

Finally, if you simply want to inspect the content, the show method should be enough:

df.select('zip_code').show() 
answered Sep 20 '22 at 17:09 by zero323



You can simply write:

df.select("your column's name").show() 

In your case, it would be:

df.select('zip_code').show() 
answered Sep 19 '22 at 17:09 by Navid Roohani
