
Hadoop Client Node Configuration

Tags: hadoop

Assume that there is a Hadoop cluster with 20 machines. Of those 20 machines, 18 are slaves, machine 19 runs the NameNode, and machine 20 runs the JobTracker.

Now I know that the Hadoop software has to be installed on all 20 of those machines.

But my question is: which machine is used to load a file xyz.txt into the Hadoop cluster? Is that client machine a separate machine? Do we need to install the Hadoop software on that client machine as well? How does the client machine identify the Hadoop cluster?

Asked Mar 07 '14 by Surender Raja


People also ask

What is client node in Hadoop?

Client nodes are in charge of loading data into the cluster. They submit MapReduce jobs describing how that data should be processed, and then fetch the results once the processing is finished.
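
For instance (a sketch, assuming a Hadoop 2.x layout and an HDFS home directory of /user/myaccount, neither of which comes from the question), loading a file and submitting a job from a client node looks like this:

    # Copy a local file into HDFS from the client node.
    hadoop fs -put xyz.txt /user/myaccount/xyz.txt

    # Submit a MapReduce job (the wordcount example shipped with Hadoop),
    # then fetch the result once processing is finished.
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount /user/myaccount/xyz.txt /user/myaccount/wc-out
    hadoop fs -cat /user/myaccount/wc-out/part-r-00000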

What is Hadoop configuration?

Configuration files are located in the etc/hadoop/ directory of the extracted tar.gz archive. For example, hadoop-env.sh specifies the environment variables that affect the JDK used by the Hadoop daemons (bin/hadoop).
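
As a concrete illustration (the hostname and port below are placeholders, not values from this thread), a minimal core-site.xml in that directory is what tells a client where the NameNode is:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode.example.com:8020</value>
      </property>
    </configuration>

and hadoop-env.sh in the same directory is where the JDK location is set (the path is an assumption; use wherever your JDK lives):

    export JAVA_HOME=/usr/lib/jvm/default-java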


1 Answer

I am new to Hadoop, so this is what I understood:

If your data upload is not an actual service of the cluster (which would be running on an edge node of the cluster), then you can configure your own computer to work as an edge node.

An edge node doesn't need to be known by the cluster (except for security purposes), as it neither stores data nor computes jobs. That is basically what it means to be an edge node: it is connected to the Hadoop cluster but does not participate in it.

In case it can help someone, here is what I have done to connect to a cluster that I don't administer (a consolidated sketch of these steps follows the list):

  • get an account on the cluster, say myaccount
  • create an account on your computer with the same name: myaccount
  • configure your computer to access the cluster machines (SSH without a passphrase, registered IPs, ...)
  • get the Hadoop configuration files from an edge node of the cluster
  • get a Hadoop distribution (e.g. from here)
  • uncompress it where you want, say /home/myaccount/hadoop-x.x
  • add the following environment variables: JAVA_HOME, HADOOP_HOME (/home/myaccount/hadoop-x.x)
  • (if you'd like) add the hadoop bin directory to your PATH: export PATH=$HADOOP_HOME/bin:$PATH
  • replace your Hadoop configuration files with those you got from the edge node. With Hadoop 2.5.2, this is the folder $HADOOP_HOME/etc/hadoop
  • also, I had to change the value of a couple of $JAVA_HOME entries defined in the conf files. To find them, use: grep -r "export.*JAVA_HOME"
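
Put together, the steps above amount to something like the following shell session (the hostname, JDK path, and configuration path on the edge node are placeholders; adapt them to your cluster):

    # Passwordless SSH to the cluster machines (run once).
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    ssh-copy-id myaccount@edge-node.example.com

    # Unpack the distribution and point the environment at it.
    tar -xzf hadoop-x.x.tar.gz -C /home/myaccount
    export JAVA_HOME=/usr/lib/jvm/default-java
    export HADOOP_HOME=/home/myaccount/hadoop-x.x
    export PATH=$HADOOP_HOME/bin:$PATH

    # Overwrite the stock configuration with the files taken from the
    # edge node (the source path varies by distribution).
    scp -r myaccount@edge-node.example.com:/etc/hadoop/conf/* $HADOOP_HOME/etc/hadoop/

    # Find any hard-coded JAVA_HOME values that still need fixing.
    grep -r "export.*JAVA_HOME" $HADOOP_HOME/etc/hadoop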

Then run hadoop fs -ls /, which should list the root directory of the cluster's HDFS.
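
If that works, a quick round trip confirms the edge node can both write to and read from HDFS (the /tmp path is just an example):

    hadoop fs -put xyz.txt /tmp/xyz.txt   # upload from the edge node
    hadoop fs -cat /tmp/xyz.txt           # read it back
    hadoop fs -rm /tmp/xyz.txt            # clean up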

Answered Oct 04 '22 by Juh_