I have a DataFrame with as many as 10 million records. How can I get a count quickly? df.count() is taking a very long time.
To count the number of distinct rows, we use distinct().count(), which returns the number of distinct rows in the DataFrame; the result is stored in a variable named 'row'. For counting the number of columns we are using df.
Pandas DataFrame count() method: count() counts the non-null values in each column by default, or in each row if you specify the axis parameter as axis='columns', and returns a Series object with one result per column (or per row).
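In pandas, the built-in len() function is used to get the number of rows and the number of columns individually: len(df) gives the number of rows and len(df.columns) gives the number of columns.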
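As a rough illustration of the difference between count() and len() in pandas, here is a minimal sketch. The DataFrame and its column names ("a", "b") are made up for the example and do not come from the question.

```python
import pandas as pd

# Hypothetical sample data: two columns with some missing values.
df = pd.DataFrame({"a": [1, 2, None], "b": ["x", None, None]})

# count() skips missing values; the default axis counts per column.
per_column = df.count()

# axis="columns" counts the non-null values in each row instead.
per_row = df.count(axis="columns")

# len() ignores missing values entirely and just measures the shape.
n_rows = len(df)
n_cols = len(df.columns)

print(per_column)
print(per_row)
print(n_rows, n_cols)
```

Note that count() and len() answer different questions: len(df) is the total number of rows, while count() reports how many values in each column (or row) are not null.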
It's going to take a long time regardless, at least the first time, because the count forces Spark to evaluate the whole DataFrame.
One option is to cache the DataFrame, so that you can do more with it than just count.
E.g.

    df.cache()
    df.count()
Subsequent operations don't take much time.