Is there a way to convert a Spark DataFrame (not an RDD) to a pandas DataFrame?
I tried the following (some_df was created in a Scala cell):
var some_df = Seq(
("A", "no"),
("B", "yes"),
("B", "yes"),
("B", "no")
).toDF(
"user_id", "phone_number")
Code:
%pyspark
pandas_df = some_df.toPandas()
Error:
NameError: name 'some_df' is not defined
Any suggestions?
Note that this describes the reverse direction (pandas to Spark): import the pandas library and create a pandas DataFrame with the DataFrame() constructor. Create a Spark session by importing SparkSession from the pyspark library. Pass the pandas DataFrame to the createDataFrame() method of the SparkSession object. Print the resulting DataFrame.
A Spark DataFrame is distributed, so processing large amounts of data is faster. A pandas DataFrame is not distributed, so processing large amounts of data will be slower.
Converting a Spark RDD to a DataFrame can be done using toDF(), createDataFrame(), or by transforming an RDD of Row objects into a DataFrame.
pandas-on-Spark DataFrame and Spark DataFrame are virtually interchangeable. However, note that a new default index is created when pandas-on-Spark DataFrame is created from Spark DataFrame. See Default Index Type. In order to avoid this overhead, specify the column to use as an index when possible.
The following should work. Your some_df was defined in Scala, so it is not visible to the %pyspark interpreter; recreate it in PySpark first.
Sample DataFrame
some_df = sc.parallelize([
("A", "no"),
("B", "yes"),
("B", "yes"),
("B", "no")]
).toDF(["user_id", "phone_number"])
Converting DataFrame to Pandas DataFrame
pandas_df = some_df.toPandas()
In my case, the following conversion from a Spark DataFrame to a pandas DataFrame worked:
pandas_df = spark_df.select("*").toPandas()