Getting the count of records in a data frame quickly

I have a dataframe with as many as 10 million records. How can I get a count quickly? df.count is taking a very long time.

thunderhemu asked Sep 06 '16 20:09




1 Answer

The count is going to take a long time regardless, at least the first time.

One option is to cache the dataframe, so that you can do more with it afterwards than just count.

E.g.

    df.cache()
    df.count()

Subsequent operations don't take much time.
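
To see why only the first count is slow, here is a minimal pure-Python sketch of the idea. It is a toy model, not Spark's API: `LazyFrame`, `expensive_source` and the `scans` counter are hypothetical names made up for illustration. It assumes the behaviour described above, namely that `cache()` only marks the data for caching and the first action (here `count()`) does the real work.

```python
from typing import Callable, Iterable, List, Optional


class LazyFrame:
    """Toy stand-in for a lazily evaluated data frame (NOT Spark's API)."""

    def __init__(self, source: Callable[[], Iterable[int]]) -> None:
        self._source = source                      # recipe for producing the rows
        self._cached: Optional[List[int]] = None   # rows kept in memory, if any
        self._cache_requested = False
        self.scans = 0                             # how many times the source was read

    def cache(self) -> "LazyFrame":
        # Only marks the data for caching; nothing is computed yet.
        self._cache_requested = True
        return self

    def _rows(self) -> Iterable[int]:
        if self._cached is not None:
            return self._cached                    # served from memory, no rescan
        self.scans += 1
        rows = self._source()
        if self._cache_requested:
            self._cached = list(rows)              # materialized by the first action
            return self._cached
        return rows

    def count(self) -> int:
        # An action: forces a scan of the source, or reads the cache.
        return sum(1 for _ in self._rows())


def expensive_source() -> Iterable[int]:
    # Hypothetical stand-in for reading a large data set.
    return (i for i in range(100_000))


# Without caching, every count rescans the source.
uncached = LazyFrame(expensive_source)
uncached.count()
uncached.count()

# With caching, the first count pays the full cost and the second reuses it.
cached = LazyFrame(expensive_source).cache()
first = cached.count()
second = cached.count()
```

In this sketch `uncached.scans` ends up at 2 while `cached.scans` stays at 1. The first count still pays the full cost, so caching only helps if you go on to use the dataframe again.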

Ravi answered Oct 13 '22 22:10
