I'm troubleshooting YARN application failures that happen when nodes are LOST, so I'm trying to recreate this scenario. But I'm only able to force nodes to be SHUTDOWN instead of LOST. I'm using AWS EMR, and I've tried:
shutdown -h now
sudo stop hadoop-yarn-nodemanager and sudo stop hadoop-hdfs-datanode
kill -9 <pid>
Those result in SHUTDOWN nodes but not LOST nodes.
How do I create a LOST node in AWS EMR?
YARN (“Yet Another Resource Negotiator”) is an open source Apache project. It is the Hadoop cluster manager, responsible for allocating resources (such as CPU, memory, disk, and network) and for scheduling and monitoring jobs across the cluster.
yarn.nodemanager.local-dirs: This setting specifies the base directories for containers run within YARN. For each application and container created in YARN, a set of directories is created underneath these local directories. They are cleaned up when the application completes.
The Hadoop YARN NodeManager is the per-machine framework agent that is responsible for containers, monitoring their resource usage, and reporting it to the ResourceManager.
A NodeManager is LOST when the ResourceManager hasn't received heartbeats from it for yarn.nm.liveness-monitor.expiry-interval-ms milliseconds (the default is 600000, i.e. 10 minutes). You may want to try blocking outbound traffic from the NM node to the RM's IP (or just the port, if the RM node runs multiple services), but I'm not sure how exactly that can be accomplished in AWS. Maybe use iptables, for example:
iptables -A OUTPUT -p tcp -d <RM's IP> --dport <RM's port> -j DROP
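To confirm that the node actually moves to LOST after the rule is in place, something like the following could be used. It is only a sketch: node_state is a hypothetical helper name, and it assumes the usual `yarn node -list -all` layout in which the first column is host:port and the second column is the node state.

```shell
# Hypothetical helper: read `yarn node -list -all` output on stdin and
# print the state column for the node whose Node-Id starts with "<host>:".
node_state() {
  awk -v host="$1" 'index($1, host ":") == 1 { print $2 }'
}

# Example usage on the master node (the host name is a placeholder):
# yarn node -list -all | node_state ip-10-0-0-12.ec2.internal
```

With the default expiry interval, expect the state to change only after about 10 minutes without heartbeats.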
As I suggested in the comments, bringing the network interface down on the node induces the LOST scenario, e.g.:
ifconfig eth0 down
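Since the node only becomes LOST once the expiry interval has passed, the interface has to stay down at least that long. A rough way to work out the outage length is sketched below; outage_seconds is a hypothetical helper, the 60-second margin is an arbitrary assumption, and eth0 is taken from the command above and may differ on your instance.

```shell
# Hypothetical helper: seconds to keep the link down, given the expiry
# interval in milliseconds (default 600000) plus a safety margin in seconds.
outage_seconds() {
  expiry_ms=${1:-600000}
  margin_s=${2:-60}
  echo $(( expiry_ms / 1000 + margin_s ))
}

# Example (as root, detached, because an SSH session over eth0 will drop):
# nohup sh -c "ifconfig eth0 down; sleep $(outage_seconds); ifconfig eth0 up" >/dev/null 2>&1 &
```

Bringing the interface back up afterwards is optional; it is included here only so the instance stays reachable for inspection.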