
Get current task ID in Spark in Java

I need to get the ID of the current task in Spark. I have been searching in Google and in the official API but the only IDs I can find are the executor ID and the ID of the RDD. Does anyone know how to get the unique ID of a task? I have seen that the class TaskInfo has exactly what I am looking for, but I do not know how to get an instance of this class.

Balduz asked Oct 13 '14 13:10


People also ask

How do I find my executor ID on Spark?

Use sc.getConf.get("spark.executor.id") to find out where the code is running: it returns "driver" on the driver and the executor's ID on an executor.
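This check can be sketched without a cluster. The whereAmI helper and the Map standing in for SparkConf below are illustrative inventions, but the "driver" value is the identifier Spark really assigns on the driver side:

```java
import java.util.Map;

public class ExecutorIdSketch {
    // Illustrative helper (not a Spark API): interprets the value that
    // sc.getConf.get("spark.executor.id") would return. Spark sets it to
    // "driver" on the driver and to the executor's numeric ID on executors.
    static String whereAmI(Map<String, String> conf) {
        String id = conf.getOrDefault("spark.executor.id", "unknown");
        return id.equals("driver") ? "driver" : "executor " + id;
    }

    public static void main(String[] args) {
        System.out.println(whereAmI(Map.of("spark.executor.id", "driver")));
        System.out.println(whereAmI(Map.of("spark.executor.id", "3")));
    }
}
```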

What is task in Apache spark?

Task is the smallest execution unit in Spark. A task executes a series of instructions; for example, reading data, filtering, and applying map() can be combined into a single task. Tasks are executed inside an executor.

How the tasks are created in spark?

A task in Spark is represented by the abstract class Task, which has two concrete implementations: ShuffleMapTask, which executes a task and divides its output into multiple buckets (based on the task's partitioner), and ResultTask, which executes a task and sends its output back to the driver application.
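The bucketing a ShuffleMapTask performs can be sketched in plain Java. This is a toy hash partitioner, not Spark's implementation; the point is that equal keys always land in the same bucket, so the downstream task that reads one bucket sees all records for its keys:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch {
    // Toy hash partitioner (illustrative only): assigns each record to a
    // bucket based on the key's hash, as a ShuffleMapTask's partitioner does.
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) buckets.add(new ArrayList<>());

        for (String key : List.of("apple", "banana", "cherry", "apple")) {
            buckets.get(partitionFor(key, numPartitions)).add(key);
        }

        // Both "apple" records must be in the same bucket.
        long apples = buckets.get(partitionFor("apple", numPartitions))
                             .stream().filter("apple"::equals).count();
        System.out.println(apples);
    }
}
```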


1 Answer

To get the ID of the current task you can use the TaskContext:

import org.apache.spark.TaskContext;

textFile.map(x -> {
    TaskContext tc = TaskContext.get();      // context of the task processing this element
    System.out.println(tc.taskAttemptId());  // unique attempt ID of the current task
    return x;                                // map must return a value
});

Bear in mind that this println output appears in the logs of the executor running the task, not in the driver's console.
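TaskContext.get() can return the right context inside each task because the context is stored per thread on the executor. A minimal plain-Java sketch of that thread-local pattern (class and field names here are made up, not Spark internals):

```java
// Illustrative sketch of the thread-local mechanism behind TaskContext.get():
// each worker thread sees only the context set on that same thread.
public class TaskContextSketch {
    static final ThreadLocal<Long> CURRENT_TASK_ID = new ThreadLocal<>();

    // Analogous to TaskContext.get(): reads this thread's own value.
    static long currentTaskId() {
        return CURRENT_TASK_ID.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable body = () -> System.out.println("task " + currentTaskId());

        Thread worker0 = new Thread(() -> { CURRENT_TASK_ID.set(0L); body.run(); });
        Thread worker1 = new Thread(() -> { CURRENT_TASK_ID.set(1L); body.run(); });

        worker0.start(); worker0.join();  // run sequentially so output order is fixed
        worker1.start(); worker1.join();
    }
}
```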

MitsakosGR answered Oct 06 '22 00:10