Hadoop YARN: How to force a Node to be Marked "LOST" instead of "SHUTDOWN"?

Tags: hadoop, hadoop-yarn, amazon-emr

I'm troubleshooting YARN application failures that happen when nodes are LOST, so I'm trying to recreate this scenario. But I'm only able to force nodes to be SHUTDOWN instead of LOST. I'm using AWS EMR, and I've tried:

logging into a node and doing a shutdown -h now
logging into a node and doing sudo stop hadoop-yarn-nodemanager and sudo stop hadoop-hdfs-datanode
killing the NodeManager with a kill -9 <pid>

Those result in SHUTDOWN nodes but not LOST nodes.

How do I create a LOST node in AWS EMR?


asked Feb 10 '21 21:02

gallamine

2 Answers

A NodeManager is marked LOST when the ResourceManager has not received a heartbeat from it for yarn.nm.liveness-monitor.expiry-interval-ms milliseconds (default 600000, i.e. 10 minutes). You could try blocking outbound traffic from the NM node to the RM's IP (or just to the relevant port, if the RM node runs multiple services), but I'm not sure exactly how that is best accomplished in AWS. iptables might work, for example:

iptables -A OUTPUT -p tcp -d <RM's IP> --dport <RM's port> -j DROP
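A minimal sketch of the rule above wrapped so it can be reviewed and undone. It assumes the NodeManager heartbeats go to the ResourceManager's resource-tracker port (yarn.resourcemanager.resource-tracker.address, 8031 in stock Hadoop; check the value on your EMR release). RM_IP, RM_PORT and DRY_RUN are placeholder names introduced here, not values from the question. By default the functions only print the commands.

```shell
#!/bin/sh
# Sketch: block NodeManager -> ResourceManager heartbeats, then undo the block.
# RM_IP and RM_PORT are placeholders (assumptions); replace them with your own values.
RM_IP="${RM_IP:-10.0.0.1}"
RM_PORT="${RM_PORT:-8031}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # In dry-run mode print the command; otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

block_heartbeats() {
    # Append a rule that drops outbound TCP traffic to the RM port.
    run sudo iptables -A OUTPUT -p tcp -d "$RM_IP" --dport "$RM_PORT" -j DROP
}

unblock_heartbeats() {
    # -D deletes the exact rule that -A appended.
    run sudo iptables -D OUTPUT -p tcp -d "$RM_IP" --dport "$RM_PORT" -j DROP
}
```

Call block_heartbeats on the NodeManager host with DRY_RUN=0, wait longer than the expiry interval, then run yarn node -list -all on the master to see whether the node is reported as LOST. Call unblock_heartbeats afterwards to restore traffic.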


answered Oct 27 '22 00:10

mazaneicha

As I suggested in the comments, bringing the network interface down on the node induces the LOST scenario, e.g.:

ifconfig eth0 down
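Bringing the interface down also cuts off an SSH session on that interface, so it may help to schedule the restore in the same command. The sketch below only builds and prints such a command. It assumes the default 10 minute expiry interval; MARGIN_S is a hypothetical safety margin, and IFACE, EXPIRY_MS and the function names are illustrative.

```shell
#!/bin/sh
# Sketch: keep the interface down slightly longer than the RM expiry interval,
# then bring it back up. All variable names here are illustrative assumptions.
IFACE="${IFACE:-eth0}"            # interface name used in the answer above
EXPIRY_MS="${EXPIRY_MS:-600000}"  # yarn.nm.liveness-monitor.expiry-interval-ms default
MARGIN_S="${MARGIN_S:-120}"       # hypothetical extra wait, in seconds

outage_seconds() {
    # Expiry interval converted to seconds, plus the safety margin.
    echo $(( EXPIRY_MS / 1000 + MARGIN_S ))
}

outage_command() {
    # Build the down / wait / up sequence as a single command string.
    echo "ifconfig $IFACE down; sleep $(outage_seconds); ifconfig $IFACE up"
}

# Print the command for review before running anything.
outage_command
```

To actually run it detached from the SSH session, something like sudo nohup sh -c "$(outage_command)" >/dev/null 2>&1 & should work, though whether the process survives the dropped session depends on your setup.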

answered Oct 27 '22 00:10

Chris
