I need to get the ID of the current task in Spark. I have been searching on Google and in the official API, but the only IDs I can find are the executor ID and the ID of the RDD. Does anyone know how to get the unique ID of a task? I have seen that the class TaskInfo has exactly what I am looking for, but I do not know how to get an instance of this class.
You can use sc.getConf().get("spark.executor.id") to find out where the code is executing — on the driver or on an executor.
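A minimal sketch of that check (the class name WhereAmI and the local[2] master are illustrative assumptions): on the driver, Spark sets spark.executor.id to the literal string "driver", while inside a running task it holds that executor's numeric ID.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class WhereAmI {
    public static String executorId() {
        SparkConf conf = new SparkConf().setAppName("where-demo").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // On the driver, Spark sets spark.executor.id to "driver";
        // inside executor-side code it is that executor's numeric ID.
        String id = sc.getConf().get("spark.executor.id");
        sc.stop();
        return id;
    }

    public static void main(String[] args) {
        System.out.println(executorId());
    }
}
```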
A task is the smallest execution unit in Spark. A task executes a series of instructions; for example, reading data, filtering it, and applying map() to it can be combined into a single task. Tasks are executed inside an executor.
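To illustrate that pipelining, here is a small sketch (the class name TaskPipelineDemo, the sample data, and the local[2] master are assumptions for the example): filter and map are narrow transformations, so Spark runs them together in one task per partition.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import java.util.Arrays;
import java.util.List;

public class TaskPipelineDemo {
    // filter + map are narrow transformations, so Spark pipelines them
    // into a single task per partition -- there is no shuffle in between.
    public static List<Integer> run() {
        SparkConf conf = new SparkConf().setAppName("task-demo").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        List<Integer> out = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5), 2)
                .filter(x -> x % 2 == 1)   // keep odd numbers
                .map(x -> x * 10)          // runs in the same task as the filter
                .collect();
        sc.stop();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [10, 30, 50]
    }
}
```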
A task in Spark is represented by the Task abstract class, which has two concrete implementations: ShuffleMapTask, which executes the task and divides its output into multiple buckets (based on the task's partitioner), and ResultTask, which executes the task and sends its output back to the driver application.
In order to get the specific task ID you can use the TaskContext:

import org.apache.spark.TaskContext;

textFile.map(x -> {
    TaskContext tc = TaskContext.get();
    System.out.println(tc.taskAttemptId());
    return x; // map requires a return value in Java
});
// note: map is lazy -- an action (e.g. collect()) must run for this to execute
Bear in mind that this println output appears in the logs of the executor where the task runs, not on the driver's console.
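If you want to see the IDs on the driver instead, one pattern is to tag each record with its task attempt ID and collect the pairs back. A sketch under assumed names (TaskIdDemo, the sample data, and the local[2] master are illustrative):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.TaskContext;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.Arrays;
import java.util.List;

public class TaskIdDemo {
    public static List<Tuple2<Long, String>> run() {
        SparkConf conf = new SparkConf().setAppName("taskid-demo").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Tag each record with the attempt ID of the task that processed it,
        // then collect the pairs back to the driver.
        List<Tuple2<Long, String>> tagged = sc
                .parallelize(Arrays.asList("a", "b", "c", "d"), 2)
                .map(x -> new Tuple2<Long, String>(TaskContext.get().taskAttemptId(), x))
                .collect();
        sc.stop();
        return tagged;
    }

    public static void main(String[] args) {
        for (Tuple2<Long, String> t : run()) {
            // This println runs on the driver, so it shows up on the console.
            System.out.println("task " + t._1() + " processed " + t._2());
        }
    }
}
```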