I need to measure the execution time of a query on Apache Spark (Bluemix). What I tried:
import time
startTimeQuery = time.clock()
df = sqlContext.sql(query)
df.show()
endTimeQuery = time.clock()
runTimeQuery = endTimeQuery - startTimeQuery
Is this a good way? The time I get looks too small compared with how long it takes for the table to appear.
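One likely reason the number looks too small is that on Unix, time.clock() returned the CPU time of the current (Python driver) process, not elapsed wall-clock time. The query itself runs in the JVM and on the executors, so almost none of that work is charged to the Python process. time.clock() was also deprecated in Python 3.3 and removed in 3.8. The following minimal sketch contrasts the two kinds of clock. It uses time.sleep as a stand-in for the driver waiting on work done elsewhere, which is an assumption made for illustration, and the helper name measure is hypothetical:

```python
import time

def measure(work):
    """Run work() and return (wall_seconds, cpu_seconds) for this process."""
    wall_start = time.perf_counter()  # wall-clock time, monotonic
    cpu_start = time.process_time()   # CPU time of this Python process only
    work()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start

# time.sleep stands in for waiting on work done outside this process (JVM/executors)
wall, cpu = measure(lambda: time.sleep(0.5))
print(f"wall: {wall:.3f}s, cpu: {cpu:.3f}s")
```

With PySpark you would pass something like `lambda: sqlContext.sql(query).show()`. The wall figure is the one that corresponds to what you observe.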
I use System.nanoTime inside a small helper function, like this:
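// Runs the by-name block f, prints the elapsed wall-clock time in ms, and returns f's result
def time[A](f: => A): A = {
  val s = System.nanoTime
  val ret = f
  println("time: " + (System.nanoTime - s) / 1e6 + "ms")
  ret
}
time {
  val df = sqlContext.sql(query)
  df.show() // show() is an action, so the query actually runs inside the timed block
}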
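Because the question uses PySpark, here is a rough Python counterpart of the Scala helper above. It is a sketch that assumes you pass the work as a zero-argument callable, and the name time_it is hypothetical. Like the Scala version, it measures wall-clock time and returns the block's result:

```python
import time

def time_it(f):
    """Call f(), print the elapsed wall-clock time in ms, and return f()'s result."""
    start = time.perf_counter()
    result = f()
    print("time: " + str((time.perf_counter() - start) * 1e3) + "ms")
    return result

# Stand-in workload so the sketch runs without Spark; with PySpark you would write
# something like: time_it(lambda: sqlContext.sql(query).show())
total = time_it(lambda: sum(range(1_000_000)))
```

Keep in mind that show() only fetches the first 20 rows by default, so Spark may not need to compute the whole result. To get a fuller picture, time an action that consumes every row.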
In spark-shell (Scala), you can use spark.time(), which is available since Spark 2.1.
See another response by me: https://stackoverflow.com/a/50289329/3397114
val df = sqlContext.sql(query)
spark.time(df.show())
The output would be:
+----+----+
|col1|col2|
+----+----+
|val1|val2|
+----+----+
Time taken: xxx ms
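As far as I know, PySpark's SparkSession does not expose time(), so a small context manager can play the same role from Python. This is a sketch, and the name spark_time is hypothetical. It prints in the same "Time taken: ... ms" format. Unlike a callable wrapper, it can wrap several statements, much like Scala's by-name block:

```python
import time
from contextlib import contextmanager

@contextmanager
def spark_time():
    """Time the with-block on the wall clock and print it like Scala's spark.time()."""
    timing = {}
    start = time.perf_counter()
    try:
        yield timing  # the caller can read timing["ms"] after the block ends
    finally:
        timing["ms"] = (time.perf_counter() - start) * 1e3
        print(f"Time taken: {timing['ms']:.0f} ms")

with spark_time() as t:
    # With PySpark this would be, e.g.: df = sqlContext.sql(query); df.show()
    time.sleep(0.2)  # stand-in workload so the sketch runs without Spark
```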
Related: On Measuring Apache Spark Workload Metrics for Performance Troubleshooting.