PySpark -- Convert List of Rows to Data Frame

Question

The problem I'm actually trying to solve is to take the first/last N rows of a PySpark dataframe and have the result be a dataframe. Specifically, I want to be able to do something like this:

 my_df.head(20).toPandas()

However, because head() returns a list of rows, I get this error:

AttributeError: 'list' object has no attribute 'toPandas'

So I'm looking either for a method that returns the first N rows of a PySpark dataframe as a dataframe, or for a way to convert these lists of rows into a dataframe. Any ideas?

user6022341 · Accepted Answer

With limit, which returns a DataFrame instead of a list of rows:

>>> df = sc.parallelize((("a", 1), ("b", 2))).toDF()
>>> df.limit(1).toPandas()
  _1  _2
0  a   1

With pd.DataFrame, built directly from the rows that head() returns:

>>> import pandas as pd
>>> pd.DataFrame(df.head(1), columns=df.columns)
  _1  _2
0  a   1
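
The question also asks about the last N rows and about turning a list of rows back into a dataframe, which the snippets above do not show. Here is a minimal sketch, assuming Spark 3.0 or later (where DataFrame.tail is available) and an active SparkSession passed in as `spark`. The helper names `rows_to_pandas` and `rows_to_spark` are hypothetical, not part of any library:

```python
import pandas as pd


def rows_to_pandas(rows, columns):
    """Build a pandas DataFrame from a list of Row objects (or plain tuples).

    pyspark.sql.Row subclasses tuple, so pandas treats each Row as one record.
    """
    return pd.DataFrame(list(rows), columns=list(columns))


def rows_to_spark(spark, rows, schema):
    """Rebuild a Spark DataFrame from a list of Rows, reusing a known schema.

    `spark` is assumed to be an active SparkSession. Passing the schema
    explicitly avoids type inference, which fails on an empty list.
    """
    return spark.createDataFrame(rows, schema=schema)
```

For example, `rows_to_pandas(my_df.tail(20), my_df.columns)` should give the last 20 rows as a pandas DataFrame, and `rows_to_spark(spark, my_df.tail(20), my_df.schema)` the same rows as a Spark DataFrame. Both head() and tail() collect the rows to the driver, so N should stay small.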

Tags: python, apache-spark, pyspark, pyspark-sql

TuringMachin
