How to measure the execution time of a query on Spark

Question

I need to measure the execution time of a query on Apache Spark (Bluemix). This is what I tried:

import time

startTimeQuery = time.clock()
df = sqlContext.sql(query)
df.show()
endTimeQuery = time.clock()
runTimeQuery = endTimeQuery - startTimeQuery

Is this a good way to do it? The time I get looks too small compared to how long it takes for the table to appear.

shridharama · Accepted Answer

I use System.nanoTime wrapped in a helper function, like this:

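// Evaluates the by-name block f, prints the elapsed wall-clock time in ms, and returns f's result.
def time[A](f: => A): A = {
  val s = System.nanoTime
  val ret = f
  println("time: " + (System.nanoTime - s) / 1e6 + "ms")
  ret
}

time {
  val df = sqlContext.sql(query)
  df.show()
}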
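
Since the question is written in Python, a rough PySpark-side equivalent of this helper could look like the sketch below. It is only a sketch: the name `timed` is made up for illustration, and a stand-in workload replaces the real `sqlContext.sql(query).show()` call so that the snippet runs without a cluster. It uses `time.perf_counter()`, which measures wall-clock time. On Unix, `time.clock()` reports the CPU time of the driver process (and it was removed in Python 3.8), which is a likely reason the number in the question looked too small.

```python
import time

def timed(f):
    """Call f(), print the elapsed wall-clock time in ms, and return (result, elapsed_ms)."""
    start = time.perf_counter()
    result = f()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    print("time: %.3f ms" % elapsed_ms)
    return result, elapsed_ms

# Stand-in workload (hypothetical). With Spark this would be something like:
#   timed(lambda: sqlContext.sql(query).show())
result, elapsed_ms = timed(lambda: sum(range(100000)))
```

Passing a lambda plays the role of the by-name parameter in the Scala version: the work is not started until the timer is already running.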

Tyrone321 · Answer

In spark-shell (Scala), you can use spark.time(), which is available on SparkSession since Spark 2.1.

See also another answer of mine: https://stackoverflow.com/a/50289329/3397114

val df = spark.sql(query)
spark.time(df.show())

The output would be:

+----+----+
|col1|col2|
+----+----+
|val1|val2|
+----+----+
Time taken: xxx ms
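
spark.time() belongs to the Scala SparkSession. As far as I know, PySpark does not offer an equivalent, so from Python something similar has to be written by hand. Below is a minimal sketch using a context manager; the name `time_taken` is hypothetical, and `time.sleep` stands in for the real `df.show()` call so that the snippet runs without Spark.

```python
import time
from contextlib import contextmanager

@contextmanager
def time_taken(label="Time taken"):
    """Print the wall-clock time spent inside the with-block, in whole milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = int((time.perf_counter() - start) * 1000)
        print("%s: %d ms" % (label, elapsed_ms))

with time_taken():
    # Stand-in for the Spark action, e.g. df.show()
    time.sleep(0.05)
```

Keep in mind that show() only fetches the first rows (20 by default), so either approach times that particular action and not necessarily a full evaluation of the query.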

Related: On Measuring Apache Spark Workload Metrics for Performance Troubleshooting.

Tags: sql, time, apache-spark, ibm-cloud

Yakov
