Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hadoop Pseudo-Distributed Operation error: Protocol message tag had invalid wire type

Tags:

java

hadoop

I am setting up a Hadoop 2.6.0 Single Node Cluster. I follow the hadoop-common/SingleCluster documentation. I work on Ubuntu 14.04. So far I have managed to run Standalone Operation successfully.

I face an error when trying to perform Pseudo-Distributed Operation. I managed to start NameNode daemon and DataNode daemon. jps oputut:

martakarass@marta-komputer:/usr/local/hadoop$ jps
4963 SecondaryNameNode
4785 DataNode
8400 Jps
martakarass@marta-komputer:/usr/local/hadoop$ 

But when I try to make the HDFS directories required to execute MapReduce jobs, I receive the following error:

martakarass@marta-komputer:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user
15/05/01 20:36:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
mkdir: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "marta-komputer/127.0.0.1"; destination host is: "localhost":9000; 
martakarass@marta-komputer:/usr/local/hadoop$ 

(I believe I can ignore the WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... warning at this point.)


When it comes to Hadoop config files, I changed only the files mentioned in the documentation. I have:

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

I managed to connect to localhost:

martakarass@marta-komputer:~$ ssh localhost
martakarass@localhost's password: 
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-45-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

Last login: Fri May  1 20:28:58 2015 from localhost

I formatted the filesystem:

martakarass@marta-komputer:/usr/local/hadoop$  bin/hdfs namenode -format
15/05/01 20:30:21 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = marta-komputer/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.0
(...)
15/05/01 20:30:24 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at marta-komputer/127.0.0.1
************************************************************/

/etc/hosts:

127.0.0.1       localhost
127.0.0.1       marta-komputer

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

etc/hostname:

marta-komputer
like image 423
Marta Karas Avatar asked May 01 '15 18:05

Marta Karas


1 Answers

This is a set of steps I followed on Ubuntu when facing exactly the same problem but with 2.7.1, the steps shouldn't differ much for previous and future version (I'd believe).

1) Format of my /etc/hosts folder:

    127.0.0.1    localhost   <computer-name>
    # 127.0.1.1    <computer-name>
    <ip-address>    <computer-name>

    # Rest of file with no changes

2) *.xml configuration files (displaying contents inside <configuration> tag):

  • For core-site.xml:

        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost/</value>
        </property>
        <!-- set value to a directory you want with an absolute path -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>"set/a/directory/on/your/machine/"</value>
            <description>A base for other temporary directories</description>
        </property>
    
  • For hdfs-site.xml:

        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    
  • For yarn-site.xml:

        <property>
            <name>yarn.recourcemanager.hostname</name>
            <value>localhost</value>
        </property>
    
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    
  • For mapred-site.xml:

        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    

3) Verify $HADOOP_CONF_DIR:

This is a good opportunity to verify that you are indeed using this configuration. In the folder where your .xml files reside, view contents of script hadoop_env.sh and make sure $HADOOP_CONF_DIR is pointing at the right directory.

4) Check your PORTS:

NameNode binds ports 50070 and 8020 on my standard distribution and DataNode binds ports 50010, 50020, 50075 and 43758. Run sudo lsof -i to be certain no other services are using them for some reason.

5) Format if necessary:

At this point, if you have changed the value hadoop.tmp.dir you should reformat the NameNode by hdfs namenode -format. If not remove the temporary files already present in the tmp directory you are using (default /tmp/):

6) Start Nodes and Yarn:

In /sbin/ start the name and data node by using the start-dfs.sh script and yarn with start-yarn.sh and evaluate the output of jps:

    ./start-dfs.sh   
    ./start-yarn.sh

At this point if NameNode, DataNode, NodeManager and ResourceManager are all running you should be set to go!

If any of these hasn't started, share the log output for us to re-evaluate.

like image 131
Dimitris Fasarakis Hilliard Avatar answered Sep 21 '22 16:09

Dimitris Fasarakis Hilliard