I have a DataFrame with as many as 10 million records. How can I get a count quickly? df.count is taking a very long time.


Getting the count of records in a data frame quickly

1 Answer

It's going to take a long time regardless, at least the first time.

One option is to cache the DataFrame, so that you can do more with it than just count.

For example:

df.cache()
df.count()

Subsequent operations on the cached DataFrame take much less time.
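To see the effect, here is a minimal sketch that times two counts on a cached DataFrame. The local SparkSession, the app name, and the spark.range stand-in for the 10 million records are assumptions for illustration, not part of the original question. Note that cache() is lazy, so the first count() is what actually fills the cache.

```scala
import org.apache.spark.sql.SparkSession

object CachedCountSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; a real job would use its own cluster config.
    val spark = SparkSession.builder()
      .appName("cached-count-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical stand-in for the 10-million-record DataFrame from the question.
    val df = spark.range(0L, 10000000L).toDF("id")

    // cache() is lazy: it only marks the DataFrame for caching.
    df.cache()

    val t0 = System.nanoTime()
    val first = df.count()   // first action computes the data and fills the cache
    val t1 = System.nanoTime()
    val second = df.count()  // reads the cached partitions
    val t2 = System.nanoTime()

    println(s"first count:  $first in ${(t1 - t0) / 1e6} ms")
    println(s"second count: $second in ${(t2 - t1) / 1e6} ms")

    // Release the cached data once it is no longer needed.
    df.unpersist()
    spark.stop()
  }
}
```

How much faster the second count is depends on how expensive the DataFrame is to compute and whether it fits in memory, so treat the timings as something to measure on your own data.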

189

answered Oct 13 '22 22:10

Ravi


Tags:

scala

apache-spark

hadoop-streaming

thunderhemu
