 

How to find the size or shape of a DataFrame in PySpark?

I am trying to find out the size/shape of a DataFrame in PySpark. I do not see a single function that can do this.

In pandas, I can do this:

data.shape

Is there a similar function in PySpark? This is my current solution, but I am looking for a more elegant one:

row_number = data.count()
column_number = len(data.dtypes)

The computation of the number of columns is not ideal...

asked Sep 23 '16 by Xi Liang


People also ask

How do you find the shape of a PySpark DataFrame?

To obtain the shape of a DataFrame in PySpark, you can get the number of rows with df.count() and the number of columns with len(df.columns).
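For example, a minimal sketch (it assumes a running SparkSession; the sample DataFrame here is made up for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])

# Rows come from an action, columns from the schema
print((df.count(), len(df.columns)))  # (2, 2)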

How do you find DF shape?

To get the shape of a pandas DataFrame, use DataFrame.shape. The shape property returns a tuple representing the dimensionality of the DataFrame, in the form (rows, columns).
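For comparison, in pandas (a small made-up frame for illustration):

import pandas as pd

pdf = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
print(pdf.shape)  # (3, 2) -- shape is a property, so no parentheses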


2 Answers

You can get its shape with:

print((df.count(), len(df.columns))) 
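If you need this in several places, one option is to wrap it in a small helper; this is just a sketch, and the name spark_shape is made up:

def spark_shape(df):
    # (row_count, column_count), mirroring pandas' .shape tuple
    return (df.count(), len(df.columns))

print(spark_shape(df))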
answered Oct 13 '22 by George Fisher


Use df.count() to get the number of rows.
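Keep in mind that count() is an action, so Spark evaluates the whole lineage to produce the number. If you will keep working with the DataFrame afterwards, caching it first avoids recomputing it (a sketch):

df.cache()  # persist in memory after the first action
row_number = df.count()  # triggers evaluation and fills the cache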

answered Oct 13 '22 by VMEscoli