
HBase master keeps dying, claims hbase:namespace already exists

Tags:

hadoop

hbase

In today's episode of "HBase is bringing me to my wits' end", the HBase master starts and then very quickly dies. My master log looks like this:

2014-06-20 12:52:40,469 FATAL [master:hdev01:60000] master.HMaster: Master server abort: loaded coprocessors are: []
2014-06-20 12:52:40,470 FATAL [master:hdev01:60000] master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.TableExistsException: hbase:namespace
        at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:120)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:232)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1062)
        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:926)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:615)
        at java.lang.Thread.run(Thread.java:662)
2014-06-20 12:52:40,473 INFO  [master:hdev01:60000] master.HMaster: Aborting
2014-06-20 12:52:40,473 DEBUG [master:hdev01:60000] master.HMaster: Stopping service threads
2014-06-20 12:52:40,473 INFO  [master:hdev01:60000] ipc.RpcServer: Stopping server on 60000
2014-06-20 12:52:40,473 INFO  [CatalogJanitor-hdev01:60000] master.CatalogJanitor: CatalogJanitor-hdev01:60000 exiting
2014-06-20 12:52:40,473 INFO  [hdev01,60000,1403283149823-BalancerChore] balancer.BalancerChore: hdev01,60000,1403283149823-BalancerChore exiting
2014-06-20 12:52:40,474 INFO  [RpcServer.listener,port=60000] ipc.RpcServer: RpcServer.listener,port=60000: stopping
2014-06-20 12:52:40,474 INFO  [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2014-06-20 12:52:40,474 INFO  [master:hdev01:60000] master.HMaster: Stopping infoServer
2014-06-20 12:52:40,474 INFO  [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2014-06-20 12:52:40,474 INFO  [master:hdev01:60000.oldLogCleaner] cleaner.LogCleaner: master:hdev01:60000.oldLogCleaner exiting
2014-06-20 12:52:40,475 INFO  [hdev01,60000,1403283149823-ClusterStatusChore] balancer.ClusterStatusChore: hdev01,60000,1403283149823-ClusterStatusChore exiting
2014-06-20 12:52:40,476 INFO  [master:hdev01:60000.oldLogCleaner] master.ReplicationLogCleaner: Stopping replicationLogCleaner-0x246ba2ab1e4001c, quorum=hdev02:5181,hdev01:5181,hdev03:5181, baseZNode=/hbase
2014-06-20 12:52:40,479 INFO  [master:hdev01:60000] mortbay.log: Stopped SelectC[email protected]:16010
2014-06-20 12:52:40,478 INFO  [master:hdev01:60000.archivedHFileCleaner] cleaner.HFileCleaner: master:hdev01:60000.archivedHFileCleaner exiting
2014-06-20 12:52:40,483 INFO  [master:hdev01:60000.oldLogCleaner] zookeeper.ZooKeeper: Session: 0x246ba2ab1e4001c closed
2014-06-20 12:52:40,484 INFO  [master:hdev01:60000-EventThread] zookeeper.ClientCnxn: EventThread shut down
2014-06-20 12:52:40,589 DEBUG [master:hdev01:60000] catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@f3f348b
2014-06-20 12:52:40,591 INFO  [master:hdev01:60000] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x246ba2ab1e4001b
2014-06-20 12:52:40,592 INFO  [master:hdev01:60000] zookeeper.ZooKeeper: Session: 0x246ba2ab1e4001b closed
2014-06-20 12:52:40,592 INFO  [master:hdev01:60000-EventThread] zookeeper.ClientCnxn: EventThread shut down
2014-06-20 12:52:40,695 INFO  [hdev01,60000,1403283149823.splitLogManagerTimeoutMonitor] master.SplitLogManager$TimeoutMonitor: hdev01,60000,1403283149823.splitLogManagerTimeoutMonitor exiting
2014-06-20 12:52:40,696 INFO  [master:hdev01:60000] zookeeper.ZooKeeper: Session: 0x246ba2ab1e4001a closed
2014-06-20 12:52:40,696 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2014-06-20 12:52:40,696 INFO  [master:hdev01:60000] master.HMaster: HMaster main thread exiting
2014-06-20 12:52:40,697 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: HMaster Aborted
        at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:194)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:135)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2803)

I thought this might be some remnant of an old run, so I deleted the files in HBase's data directory, ZooKeeper's data directory, and my HDFS. I still got the same error. Strangely, my HMaster popped back up again temporarily when I ran stop-hbase.sh, although there wasn't much I could do with it.

My HBase version is 0.98.3 and my Hadoop is 2.2.0. My hbase-site.xml is:

<configuration>
<property>
  <name>hbase.master</name>
  <value>hdev01:60000</value>
  <description>The host and port that the HBase master runs at.
    A value of 'local' runs the master and a regionserver
    in a single process.
  </description>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://hdev01:9000/hbase</value>
  <description>The directory shared by region servers.</description>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
  <description>The mode the cluster will be in. Possible values are
    false: standalone and pseudo-distributed setups with managed Zookeeper
    true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
  </description>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>5181</value>
  <description>Property from ZooKeeper's config zoo.cfg.
    The port at which the clients will connect.
    </description>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>10000</value>
  <description></description>
</property>
<property>
  <name>hbase.client.retries.number</name>
  <value>10</value>
  <description></description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hdev01,hdev02,hdev03</value>
  <description>Comma separated list of servers in the ZooKeeper Quorum.
    For example, "host1.mydomain.com,host2.mydomain.com".
    By default this is set to localhost for local and pseudo-distributed modes
    of operation. For a fully-distributed setup, this should be set to a full
    list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
    this is the list of servers which we will start/stop ZooKeeper on.
  </description>
</property>
</configuration>

EDIT: I attempted hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair; my error now is: HBase file layout needs to be upgraded. You have version null and I want version 8. Is your hbase.rootdir valid? If so, you may need to run 'hbase hbck -fixVersionFile'. This is unhelpful, since without a master hbck will not actually run.

EDIT 2: I nuked and restarted my DFS, then tried repairing and starting things again. I am now back where I started.
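
The "version null" in the message quoted above most likely means that no hbase.version file exists under hbase.rootdir, which would be expected right after HDFS has been wiped. A minimal sketch of checking for that file, assuming the hdfs CLI is on the PATH and the rootdir from the config above. ROOTDIR, HDFS_CMD and check_version_file are hypothetical names used only for illustration:

```shell
#!/bin/sh
# Sketch only. ROOTDIR mirrors hbase.rootdir from the config above.
# HDFS_CMD is a hypothetical override so the check can be tried without a cluster.
ROOTDIR="${ROOTDIR:-hdfs://hdev01:9000/hbase}"
HDFS_CMD="${HDFS_CMD:-hdfs}"

check_version_file() {
    # Bail out quietly if the HDFS command line tool is not available.
    if ! command -v "$HDFS_CMD" >/dev/null 2>&1; then
        echo "cannot check: $HDFS_CMD not found on PATH"
        return 0
    fi
    # `hdfs dfs -test -e PATH` exits 0 when PATH exists.
    if "$HDFS_CMD" dfs -test -e "$ROOTDIR/hbase.version"; then
        echo "hbase.version present under $ROOTDIR"
    else
        echo "hbase.version missing under $ROOTDIR"
    fi
}

check_version_file
```

If the file turns out to be missing while region data is still present under the rootdir, that would be consistent with the error text suggesting 'hbase hbck -fixVersionFile'.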

chenab asked Jun 20 '14 18:06


1 Answer

The hbase namespace is the internal namespace HBase uses for its own management tables. Try running the offline repair tool from the $HBASE_HOME directory:

 ./bin/hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
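
OfflineMetaRepair assumes that HBase is offline, so one way to run it is between a stop and a start of the cluster. The following is a rough sketch under that assumption, not a verified procedure for this cluster. The /opt/hbase default for HBASE_HOME, the DRY_RUN switch, and the run and repair_meta_offline helpers are hypothetical and only there for illustration; by default the script just prints what it would do:

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
# The HBASE_HOME default is a hypothetical install path.
HBASE_HOME="${HBASE_HOME:-/opt/hbase}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Echo the command in dry-run mode, otherwise execute it as given.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

repair_meta_offline() {
    # Make sure nothing from HBase is still running before the repair.
    run "$HBASE_HOME/bin/stop-hbase.sh"
    # Run the offline repair tool named in the answer above.
    run "$HBASE_HOME/bin/hbase" org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
    # Bring the cluster back up and watch the master log again.
    run "$HBASE_HOME/bin/start-hbase.sh"
}

repair_meta_offline
```

Running it with DRY_RUN=0 executes the three steps for real. It stops at nothing on failure, so check the output of the repair step before relying on the restart.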
Arnon Rotem-Gal-Oz answered Oct 29 '22 05:10