Assume there is a Hadoop cluster with 20 machines. Of those 20 machines, 18 are slaves, machine 19 runs the NameNode, and machine 20 runs the JobTracker.
Now I know that the Hadoop software has to be installed on all 20 of those machines.
But my question is: which machine is used to load a file xyz.txt into the Hadoop cluster? Is that client machine a separate machine? Do we need to install the Hadoop software on that client machine as well? How does the client machine identify the Hadoop cluster?
Client nodes are in charge of loading the data into the cluster. They first submit MapReduce jobs describing how the data needs to be processed and then fetch the results once the processing is finished.
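As for how the client machine identifies the Hadoop cluster: it does so through its copy of the Hadoop configuration files, chiefly core-site.xml, which names the NameNode. A minimal sketch (the hostname namenode-host and port 9000 are placeholders; older Hadoop 1.x releases use fs.default.name instead of fs.defaultFS):

    <configuration>
      <property>
        <!-- tells clients and daemons where the NameNode lives -->
        <name>fs.defaultFS</name>
        <value>hdfs://namenode-host:9000</value>
      </property>
    </configuration>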
Configuration files are the files located in the etc/hadoop/ directory of the extracted tar.gz. The configuration files in Hadoop are listed below: 1) hadoop-env.sh: specifies the environment variables that affect the JDK used by the Hadoop daemons (bin/hadoop).
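For example, the variable almost always set in hadoop-env.sh is JAVA_HOME; a minimal sketch (the JDK path below is an assumption, adjust it to your installation):

    # hadoop-env.sh -- environment used by the Hadoop daemons
    # the JDK path is a placeholder; point it at your own installation
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64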
I am new to Hadoop, so here is what I understood:
If your data upload is not an actual service of the cluster (one that should run on an edge node of the cluster), then you can configure your own computer to work as an edge node.
An edge node does not need to be known by the cluster (except for security purposes), as it neither stores data nor runs compute jobs. That is basically what it means to be an edge node: it is connected to the Hadoop cluster but does not participate in it.
In case it can help someone, here is what I have done to connect to a cluster that I don't administer:
- get an account on the cluster, say myaccount
- create an account with the same name, myaccount, on your own computer
- get a Hadoop distribution and unpack it locally, for example into /home/myaccount/hadoop-x.x
- set the environment variables JAVA_HOME and HADOOP_HOME (/home/me/hadoop-x.x)
- add the Hadoop binaries to your path: export PATH=$HADOOP_HOME/bin:$PATH
- copy the cluster's configuration files into $HADOOP_HOME/etc/hadoop
- check that $JAVA_HOME is properly defined in the conf files. To find them use: grep -r "export.*JAVA_HOME"
Then do hadoop fs -ls /, which should list the root directory of the cluster HDFS.
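Once the client/edge node is set up like this, loading the xyz.txt file from the question is just a put from that machine; a minimal sketch (the target directory /data is an assumption):

    # run on the configured client/edge node; /data is a hypothetical target directory
    hadoop fs -mkdir -p /data
    hadoop fs -put xyz.txt /data/
    hadoop fs -ls /data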