The problem I'm actually trying to solve is to take the first/last N rows of a PySpark dataframe and have the result be a dataframe. Specifically, I want to be able to do something like this:
my_df.head(20).toPandas()
However, because head() returns a list of rows, I get this error:
AttributeError: 'list' object has no attribute 'toPandas'
So, I'm looking for either a method that returns the first N rows of a PySpark DataFrame as a DataFrame, or a method for converting this list of rows into a DataFrame. Any ideas?
With limit:
>>> df = sc.parallelize((("a", 1), ("b", 2))).toDF()
>>> df.limit(1).toPandas()
_1 _2
0 a 1
With pd.DataFrame:
>>> import pandas as pd
>>> pd.DataFrame(df.head(1), columns=df.columns)
_1 _2
0 a 1