
Hadoop YARN: How to force a Node to be Marked "LOST" instead of "SHUTDOWN"?

I'm troubleshooting YARN application failures that happen when nodes are LOST, so I'm trying to recreate this scenario. But I'm only able to force nodes to be SHUTDOWN instead of LOST. I'm using AWS EMR, and I've tried:

  • logging into a node and doing a shutdown -h now
  • logging into a node and doing sudo stop hadoop-yarn-nodemanager and sudo stop hadoop-hdfs-datanode
  • killing the NodeManager with a kill -9 <pid>

Those result in SHUTDOWN nodes but not LOST nodes.

How do I create a LOST node in AWS EMR?

asked Feb 10 '21 at 21:02 by gallamine




2 Answers

A NodeManager is marked LOST when the ResourceManager hasn't received a heartbeat from it for yarn.nm.liveness-monitor.expiry-interval-ms milliseconds (default 600000, i.e. 10 minutes). You could try blocking outbound traffic from the NM node to the RM's IP (or only to the RM's resource-tracker port, yarn.resourcemanager.resource-tracker.address, 8031 by default, if the RM node runs multiple services), but I'm not sure exactly how best to do that in AWS. Maybe use iptables, for example:

iptables -A OUTPUT -p tcp -d <RM's IP> --dport <RM's port> -j DROP
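A minimal sketch of the iptables approach as a sourceable bash script, assuming it runs with sudo rights on the core/task node you want to lose. RM_IP is a hypothetical placeholder; RM_PORT defaults to 8031, the stock Hadoop port for yarn.resourcemanager.resource-tracker.address (check your EMR cluster's yarn-site.xml); EXPIRY_MS mirrors the 10-minute default of yarn.nm.liveness-monitor.expiry-interval-ms, and the 60-second margin is an arbitrary choice. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: make this NodeManager go LOST by dropping its heartbeats to the RM.
# Hypothetical defaults; override them via the environment for your cluster.
RM_IP="${RM_IP:-10.0.0.1}"          # placeholder ResourceManager IP
RM_PORT="${RM_PORT:-8031}"          # Hadoop default resource-tracker port
EXPIRY_MS="${EXPIRY_MS:-600000}"    # default yarn.nm.liveness-monitor.expiry-interval-ms
MARGIN_S="${MARGIN_S:-60}"          # arbitrary extra wait past the expiry

# Print the command when DRY_RUN=1, otherwise run it as root.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    sudo "$@"
  fi
}

# Append a rule dropping outbound TCP traffic to the RM's tracker port.
block_heartbeats() {
  run iptables -A OUTPUT -p tcp -d "$RM_IP" --dport "$RM_PORT" -j DROP
}

# Delete the same rule again (iptables -D removes the first matching rule).
unblock_heartbeats() {
  run iptables -D OUTPUT -p tcp -d "$RM_IP" --dport "$RM_PORT" -j DROP
}

# Seconds to keep the rule in place: expiry interval plus the margin.
wait_seconds() {
  echo $(( EXPIRY_MS / 1000 + MARGIN_S ))
}
```

Usage sketch: source the file, run block_heartbeats, wait at least $(wait_seconds) seconds, check from the master node with yarn node -list -states LOST, and only then run unblock_heartbeats. Because only traffic to the tracker port is dropped, your SSH session stays up. Once heartbeats resume, the NodeManager may re-register, so the node may not stay LOST after that.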
answered Oct 27 '22 at 00:10 by mazaneicha


As I suggested in the comments, bringing the node's network interface down induces the LOST scenario, e.g.:

ifconfig eth0 down
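Taking eth0 down over SSH also cuts off your own session, so here is a minimal sketch, assuming bash, sudo, and that eth0 is the node's primary interface, that schedules the restore in a detached job before the interface goes down. DOWN_SECONDS (660 here, a hypothetical value) only needs to exceed the RM's liveness expiry interval.

```shell
#!/usr/bin/env bash
# Sketch: take the interface down long enough for the RM to mark this node
# LOST, then bring it back up without relying on the SSH session.
IFACE="${IFACE:-eth0}"                # assumed interface name
DOWN_SECONDS="${DOWN_SECONDS:-660}"   # hypothetical; must exceed the expiry interval

# The whole outage as a single shell command line.
build_outage_cmd() {
  echo "ifconfig $IFACE down; sleep $DOWN_SECONDS; ifconfig $IFACE up"
}

# Print the command when DRY_RUN=1; otherwise run it as root in a new
# session (setsid) with SIGHUP ignored (nohup) so it survives the SSH drop.
start_outage() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    build_outage_cmd
  else
    sudo setsid nohup sh -c "$(build_outage_cmd)" >/dev/null 2>&1 &
  fi
}
```

Usage sketch: source the file, call start_outage, and watch yarn node -list -states LOST from the master node while the interface is down.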

answered Oct 27 '22 at 00:10 by Chris