I want to set up a Hadoop cluster in pseudo-distributed mode. I managed to perform all the setup steps, including starting a NameNode, DataNode, JobTracker and TaskTracker on my machine.
Then I tried to run some example programs and got a java.net.ConnectException: Connection refused error. I went back to the very first steps, running some operations in standalone mode, and faced the same problem.
I have triple-checked all the installation steps and have no idea how to fix it. (I am new to Hadoop and a beginner Ubuntu user, so please take that into account when offering any guidance or tips.)
This is the error output I keep receiving:
hduser@marta-komputer:/usr/local/hadoop$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'
15/02/22 18:23:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/22 18:23:04 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
java.net.ConnectException: Call From marta-komputer/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy9.delete(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:521)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy10.delete(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1929)
    at org.apache.hadoop.hdfs.DistributedFileSystem$12.doCall(DistributedFileSystem.java:638)
    at org.apache.hadoop.hdfs.DistributedFileSystem$12.doCall(DistributedFileSystem.java:634)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:634)
    at org.apache.hadoop.examples.Grep.run(Grep.java:95)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.examples.Grep.main(Grep.java:101)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    ... 32 more
etc/hadoop/hadoop-env.sh file:
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-8-oracle

# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol. Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done

# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"

# On secure datanodes, user to run the datanode as after dropping privileges.
# This **MUST** be uncommented to enable secure HDFS if using privileged ports
# to provide authentication of data transfer protocol. This **MUST NOT** be
# defined if SASL is configured for authentication of data transfer protocol
# using non-privileged ports.
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}

# Where log files are stored. $HADOOP_HOME/logs by default.
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}

# HDFS Mover specific parameters
###
# Specify the JVM options to be used when starting the HDFS Mover.
# These options will be appended to the options specified as HADOOP_OPTS
# and therefore may override any similar flags set in HADOOP_OPTS
#
# export HADOOP_MOVER_OPTS=""

###
# Advanced Users Only!
###

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
# the user that will run the hadoop daemons. Otherwise there is the
# potential for a symlink attack.
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
.bashrc file Hadoop-related fragment:
# -- HADOOP ENVIRONMENT VARIABLES START -- #
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# -- HADOOP ENVIRONMENT VARIABLES END -- #
/usr/local/hadoop/etc/hadoop/core-site.xml file:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop_tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
/usr/local/hadoop/etc/hadoop/hdfs-site.xml file:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
  </property>
</configuration>
/usr/local/hadoop/etc/hadoop/yarn-site.xml file:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
/usr/local/hadoop/etc/hadoop/mapred-site.xml file:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Running hduser@marta-komputer:/usr/local/hadoop$ bin/hdfs namenode -format results in the following output (I replaced part of it with (...)):
hduser@marta-komputer:/usr/local/hadoop$ bin/hdfs namenode -format
15/02/22 18:50:47 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = marta-komputer/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/htrace-core-3.0.4.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-cli (...)2.6.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.6.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.6.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.6.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.6.0.jar:/usr/local/hadoop/contrib/capacity-scheduler/*.jar
STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1; compiled by 'jenkins' on 2014-11-13T21:10Z
STARTUP_MSG: java = 1.8.0_31
************************************************************/
15/02/22 18:50:47 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/02/22 18:50:47 INFO namenode.NameNode: createNameNode [-format]
15/02/22 18:50:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-0b65621a-eab3-47a4-bfd0-62b5596a940c
15/02/22 18:50:48 INFO namenode.FSNamesystem: No KeyProvider found.
15/02/22 18:50:48 INFO namenode.FSNamesystem: fsLock is fair:true
15/02/22 18:50:48 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/02/22 18:50:48 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/02/22 18:50:48 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/02/22 18:50:48 INFO blockmanagement.BlockManager: The block deletion will start around 2015 Feb 22 18:50:48
15/02/22 18:50:48 INFO util.GSet: Computing capacity for map BlocksMap
15/02/22 18:50:48 INFO util.GSet: VM type = 64-bit
15/02/22 18:50:48 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
15/02/22 18:50:48 INFO util.GSet: capacity = 2^21 = 2097152 entries
15/02/22 18:50:48 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/02/22 18:50:48 INFO blockmanagement.BlockManager: defaultReplication = 1
15/02/22 18:50:48 INFO blockmanagement.BlockManager: maxReplication = 512
15/02/22 18:50:48 INFO blockmanagement.BlockManager: minReplication = 1
15/02/22 18:50:48 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
15/02/22 18:50:48 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
15/02/22 18:50:48 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/02/22 18:50:48 INFO blockmanagement.BlockManager: encryptDataTransfer = false
15/02/22 18:50:48 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
15/02/22 18:50:48 INFO namenode.FSNamesystem: fsOwner = hduser (auth:SIMPLE)
15/02/22 18:50:48 INFO namenode.FSNamesystem: supergroup = supergroup
15/02/22 18:50:48 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/02/22 18:50:48 INFO namenode.FSNamesystem: HA Enabled: false
15/02/22 18:50:48 INFO namenode.FSNamesystem: Append Enabled: true
15/02/22 18:50:48 INFO util.GSet: Computing capacity for map INodeMap
15/02/22 18:50:48 INFO util.GSet: VM type = 64-bit
15/02/22 18:50:48 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
15/02/22 18:50:48 INFO util.GSet: capacity = 2^20 = 1048576 entries
15/02/22 18:50:48 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/02/22 18:50:48 INFO util.GSet: Computing capacity for map cachedBlocks
15/02/22 18:50:48 INFO util.GSet: VM type = 64-bit
15/02/22 18:50:48 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
15/02/22 18:50:48 INFO util.GSet: capacity = 2^18 = 262144 entries
15/02/22 18:50:48 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/02/22 18:50:48 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/02/22 18:50:48 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
15/02/22 18:50:48 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/02/22 18:50:48 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/02/22 18:50:48 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/02/22 18:50:48 INFO util.GSet: VM type = 64-bit
15/02/22 18:50:48 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
15/02/22 18:50:48 INFO util.GSet: capacity = 2^15 = 32768 entries
15/02/22 18:50:48 INFO namenode.NNConf: ACLs enabled? false
15/02/22 18:50:48 INFO namenode.NNConf: XAttrs enabled? true
15/02/22 18:50:48 INFO namenode.NNConf: Maximum size of an xattr: 16384
Re-format filesystem in Storage Directory /usr/local/hadoop_tmp/hdfs/namenode ? (Y or N) Y
15/02/22 18:50:50 INFO namenode.FSImage: Allocated new BlockPoolId: BP-948369552-127.0.1.1-1424627450316
15/02/22 18:50:50 INFO common.Storage: Storage directory /usr/local/hadoop_tmp/hdfs/namenode has been successfully formatted.
15/02/22 18:50:50 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/02/22 18:50:50 INFO util.ExitUtil: Exiting with status 0
15/02/22 18:50:50 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at marta-komputer/127.0.1.1
************************************************************/
Starting dfs and yarn results in the following output:
hduser@marta-komputer:/usr/local/hadoop$ start-dfs.sh
15/02/22 18:53:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-marta-komputer.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-marta-komputer.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-marta-komputer.out
15/02/22 18:53:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hduser@marta-komputer:/usr/local/hadoop$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-marta-komputer.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-marta-komputer.out
Calling jps shortly after that gives:
hduser@marta-komputer:/usr/local/hadoop$ jps
11696 ResourceManager
11842 NodeManager
11171 NameNode
11523 SecondaryNameNode
12167 Jps
netstat output:
hduser@marta-komputer:/usr/local/hadoop$ sudo netstat -lpten | grep java
tcp   0   0 0.0.0.0:8088     0.0.0.0:*   LISTEN   1001   690283   11696/java
tcp   0   0 0.0.0.0:42745    0.0.0.0:*   LISTEN   1001   684574   11842/java
tcp   0   0 0.0.0.0:13562    0.0.0.0:*   LISTEN   1001   680955   11842/java
tcp   0   0 0.0.0.0:8030     0.0.0.0:*   LISTEN   1001   684531   11696/java
tcp   0   0 0.0.0.0:8031     0.0.0.0:*   LISTEN   1001   684524   11696/java
tcp   0   0 0.0.0.0:8032     0.0.0.0:*   LISTEN   1001   680879   11696/java
tcp   0   0 0.0.0.0:8033     0.0.0.0:*   LISTEN   1001   687392   11696/java
tcp   0   0 0.0.0.0:8040     0.0.0.0:*   LISTEN   1001   680951   11842/java
tcp   0   0 127.0.0.1:9000   0.0.0.0:*   LISTEN   1001   687242   11171/java
tcp   0   0 0.0.0.0:8042     0.0.0.0:*   LISTEN   1001   680956   11842/java
tcp   0   0 0.0.0.0:50090    0.0.0.0:*   LISTEN   1001   690252   11523/java
tcp   0   0 0.0.0.0:50070    0.0.0.0:*   LISTEN   1001   687239   11171/java
/etc/hosts file:
127.0.0.1 localhost
127.0.1.1 marta-komputer

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
UPDATE 1.
I updated the core-site.xml and now I have:
<property>
  <name>fs.default.name</name>
  <value>hdfs://marta-komputer:9000</value>
</property>
but I keep receiving the error, which now starts with:
15/03/01 00:59:34 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
java.net.ConnectException: Call From marta-komputer.home/192.168.1.8 to marta-komputer:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
I also noticed that telnet localhost 9000 does not work:
hduser@marta-komputer:~$ telnet localhost 9000
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
To troubleshoot ConnectException: Connection refused:
1) First ping the destination host. If it responds, the client and server machines can reach each other over the network.
2) Then try connecting to the server host and port using telnet. If that connection succeeds, the server is listening and the problem is in your client code or configuration. If it is refused, nothing is accepting connections on that address and port.
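The telnet check in step 2 can also be scripted. Below is a minimal Python sketch of that idea; port_open is a hypothetical helper name, and the host and port in the usage line are simply the ones from the error message in the question. It only tells you whether something accepts TCP connections, not whether it is really a NameNode.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Address taken from the question's error message.
    print(port_open("localhost", 9000))
```

If this prints False while jps lists a NameNode, comparing the address in core-site.xml with the address shown by netstat is a reasonable next step.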
The DataNode daemon can be started manually using the $HADOOP_HOME/bin/hadoop-daemon.sh script (in Hadoop 2.x the script is located under $HADOOP_HOME/sbin). It then automatically contacts the master (NameNode) and joins the cluster. A new node should also be added to the configuration/slaves file on the master server so that the script-based commands recognize it.
The NameNode is the master node in the Apache Hadoop HDFS architecture; it maintains and manages the blocks present on the DataNodes (slave nodes). It is a highly available server that manages the file system namespace and controls access to files by clients.
For me, these steps worked:
stop-all.sh
hadoop namenode -format
start-all.sh
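After a restart like this, it helps to confirm that every daemon actually came up. Here is a small Python sketch that reads jps output and reports what is missing. It assumes the usual five daemons of a Hadoop 2.x pseudo-distributed setup with YARN; EXPECTED_DAEMONS and missing_daemons are illustrative names, not part of Hadoop. The sample is the jps listing from the question.

```python
# Daemons assumed for a Hadoop 2.x pseudo-distributed setup with YARN.
EXPECTED_DAEMONS = {
    "NameNode",
    "DataNode",
    "SecondaryNameNode",
    "ResourceManager",
    "NodeManager",
}


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # jps prints "<pid> <main class>"; ignore anything else.
        if len(parts) == 2 and parts[0].isdigit():
            running.add(parts[1])
    return sorted(set(expected) - running)


# jps listing copied from the question.
SAMPLE = """\
11696 ResourceManager
11842 NodeManager
11171 NameNode
11523 SecondaryNameNode
12167 Jps
"""

print(missing_daemons(SAMPLE))  # ['DataNode']
```

For the listing in the question this reports DataNode as missing, so the DataNode log under the Hadoop logs directory would be the place to look.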
Edit your conf/core-site.xml and change localhost to 0.0.0.0, using the configuration below. That should work.
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9000</value>
  </property>
</configuration>
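To double-check which address the configuration actually points at after such an edit, the file can be parsed. This is a rough Python sketch using only the standard library. namenode_address is a hypothetical helper, and it also looks at fs.defaultFS, which is the newer name for the deprecated fs.default.name key in Hadoop 2.x.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def namenode_address(core_site_xml):
    """Return (host, port) of the default file system URI, or None if unset."""
    root = ET.fromstring(core_site_xml)
    props = {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }
    # Prefer the newer key, fall back to the deprecated one.
    uri = props.get("fs.defaultFS") or props.get("fs.default.name")
    if uri is None:
        return None
    parsed = urlparse(uri.strip())
    return parsed.hostname, parsed.port


# Configuration text as suggested in the answer.
SAMPLE = """
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9000</value>
  </property>
</configuration>
"""

print(namenode_address(SAMPLE))  # ('0.0.0.0', 9000)
```

The host and port returned here should match an address that netstat shows the NameNode process listening on.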