How to get ID of a map task in Spark?

1 Answer

I am not sure what you mean by the ID of a map task, but you can access task information using TaskContext:

import org.apache.spark.TaskContext

// An empty RDD with 4 partitions, so one task runs per partition.
sc.parallelize(Seq[Int](), 4).mapPartitions(_ => {
    // TaskContext.get returns the context of the task running this closure.
    val ctx = TaskContext.get
    val stageId = ctx.stageId
    val partId = ctx.partitionId
    val hostname = java.net.InetAddress.getLocalHost().getHostName()
    Iterator(s"Stage: $stageId, Partition: $partId, Host: $hostname")
}).collect.foreach(println)

Similar functionality was added to PySpark in Spark 2.2.0 (SPARK-18576):

from pyspark import TaskContext
import socket

def task_info(*_):
    ctx = TaskContext.get()  # context of the currently running task
    return ["Stage: {0}, Partition: {1}, Host: {2}".format(
        ctx.stageId(), ctx.partitionId(), socket.gethostname())]

for x in sc.parallelize([], 4).mapPartitions(task_info).collect():
    print(x)
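
If by "ID of a map task" you mean the identifier Spark assigns to each task attempt, TaskContext also exposes taskAttemptId() and attemptNumber(). Below is a minimal sketch, assuming PySpark 2.2.0 or later and an existing SparkContext named sc. The helper names describe_task and task_ids are made up for illustration and are not part of Spark:

```python
def describe_task(ctx):
    """Format the identifiers carried by a TaskContext-like object.

    Hypothetical helper: it only relies on the stageId(), partitionId(),
    attemptNumber() and taskAttemptId() methods of pyspark.TaskContext.
    """
    return "Stage: {0}, Partition: {1}, Attempt: {2}, Task attempt ID: {3}".format(
        ctx.stageId(), ctx.partitionId(), ctx.attemptNumber(), ctx.taskAttemptId())


def task_ids(*_):
    # Imported lazily; TaskContext.get() is only meaningful inside a running task.
    from pyspark import TaskContext
    return [describe_task(TaskContext.get())]
```

With a live SparkContext, sc.parallelize([], 4).mapPartitions(task_ids).collect() should return one string per partition. taskAttemptId() is unique per task attempt within a SparkContext, while attemptNumber() starts at 0 and increases when a task is retried.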

answered Sep 26 '22 03:09

zero323

Tags:

scala

apache-spark

hadoop

hadoop-yarn

MetallicPriest
