From what I understand, high availability in Hadoop requires one NameNode and one standby NameNode, network shared storage (shared between the two NameNodes), and at least two DataNodes to run the cluster.
Can we run the DataNode service on the same machine that is running the NameNode?
Can YARN run on the machine that is running the NameNode or a DataNode?
Please tell me if I am missing any other service that is necessary for a production Hadoop environment.
What should the system requirements be for the NameNode, given that it only handles metadata? Is it I/O-intensive or CPU-intensive? The data we are crunching is mostly I/O-intensive.
For Hadoop HA you need at least two separate machines, one running the active NameNode and one running the standby NameNode. So in theory you can build a Hadoop HA cluster with as few as two machines, but that is not very useful in practice.
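As a rough illustration of the two-NameNode setup with shared storage described in the question, here is a minimal Python sketch that builds the core HA entries for hdfs-site.xml. The property names are the standard HDFS HA ones; the nameservice name, NameNode IDs, host names, port and shared directory are all made-up placeholders. It is not a complete configuration: a real HA setup also needs fencing settings (dfs.ha.fencing.methods) and, for automatic failover, ZooKeeper.

```python
from xml.sax.saxutils import escape


def ha_hdfs_properties(nameservice, namenodes, shared_edits_dir):
    """Build the core HDFS HA properties.

    namenodes maps a NameNode ID (e.g. "nn1") to its RPC address "host:port".
    """
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # Shared edits location: an NFS mount here, as in the question.
        "dfs.namenode.shared.edits.dir": shared_edits_dir,
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
    }
    for nn_id, address in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = address
    return props


def to_hdfs_site_xml(props):
    """Render a property dict as hdfs-site.xml text."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# All values below are hypothetical placeholders.
props = ha_hdfs_properties(
    "mycluster",
    {"nn1": "master1.example.com:8020", "nn2": "master2.example.com:8020"},
    "file:///mnt/shared/dfs/ha-edits",
)
print(to_hdfs_site_xml(props))
```

Generating the file from a small dict keeps the per-NameNode entries consistent with the list in dfs.ha.namenodes.<nameservice>, which is easy to get wrong when editing the XML by hand.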
To answer your other questions: you can run the DataNode service on the machine that runs the NameNode service. This is common in PoC clusters, which are small (roughly 3-7 nodes). NOTE: as a best practice, you should use dedicated machines for master services such as the NameNode in production.
The NameNode mostly needs RAM; how much depends on the amount of data in your cluster and the number of blocks you have or expect to have. Generally, whether your queries are CPU-intensive or I/O-intensive does not affect the NameNode's system requirements.
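To make the dependence on block count concrete, here is a back-of-the-envelope sketch in Python. It assumes a 128 MB block size and the commonly quoted rule of thumb of about 1 GB of NameNode heap per million blocks; both figures are assumptions, not measurements, and many small files will push the real block count well above this estimate.

```python
import math


def estimate_namenode_heap_gb(data_tb, block_size_mb=128,
                              gb_per_million_blocks=1.0):
    """Rough NameNode heap estimate from logical data size (before replication).

    Assumes files fill their blocks, so the block count is a lower bound.
    gb_per_million_blocks is a rule-of-thumb value, not an exact figure.
    """
    data_mb = data_tb * 1024 * 1024
    blocks = math.ceil(data_mb / block_size_mb)
    heap_gb = blocks / 1_000_000 * gb_per_million_blocks
    return blocks, heap_gb


# Example: 500 TB of data in 128 MB blocks.
blocks, heap_gb = estimate_namenode_heap_gb(500)
print(f"{blocks} blocks, about {heap_gb:.1f} GB of heap")
```

Treat the result as a starting point for sizing and leave generous headroom, since the file and directory counts also consume NameNode memory.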
For more details on these services, refer to:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html