I am trying to find out the size/shape of a DataFrame in PySpark. I do not see a single function that can do this.
In Python (pandas), I can do this:
data.shape
Is there a similar function in PySpark? This is my current solution, but I am looking for a better one:
row_number = data.count()
column_number = len(data.dtypes)
The computation of the number of columns is not ideal...
To obtain the shape of a DataFrame in PySpark, get the number of rows with df.count() and the number of columns with len(df.columns).
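For example, something along these lines should work (an untested sketch; the SparkSession setup and the sample data are purely illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "x"), (2, "y"), (3, "z")], ["id", "value"])

rows = df.count()        # an action: runs a Spark job over the data
cols = len(df.columns)   # read from the schema, no job needed
print((rows, cols))      # (3, 2)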
To get the shape of a pandas DataFrame, use DataFrame.shape. The shape property returns a tuple representing the dimensionality of the DataFrame, in the form (rows, columns).
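For instance (a small illustrative example; the data are made up):

import pandas as pd

pdf = pd.DataFrame({"id": [1, 2, 3], "value": ["x", "y", "z"]})
print(pdf.shape)   # (3, 2) -- note that shape is a property, not a method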
You can get its shape with:
print((df.count(), len(df.columns)))
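If you want a pandas-style shape on Spark DataFrames, one option is to attach a small helper yourself. This is only a sketch of that idea (the helper name and the monkey-patching approach are my own convention, not part of PySpark's API):

import pyspark.sql.dataframe
from pyspark.sql import SparkSession

def spark_shape(self):
    # Return (row_count, column_count), mirroring pandas' DataFrame.shape
    return (self.count(), len(self.columns))

# Attach it to the DataFrame class; note it behaves as a method here, not a property
pyspark.sql.dataframe.DataFrame.shape = spark_shape

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
print(df.shape())   # (2, 2)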
Use df.count() to get the number of rows.