 

Best practice for handling HBase connections and tables in Java?

Tags:

hbase

I'm using hbase-client 1.2.3, and I noticed this comment on the Connection.getTable() method:

  • Retrieve a Table implementation for accessing a table.

  • The returned Table is not thread safe, a new instance should be created for each using thread.

  • This is a lightweight operation, pooling or caching of the returned Table is neither required nor desired.

So I started wondering what the best practice is for handling the connection and the tables.

For example, I have a main class that starts several threads; let's call them A, B, C, and so on.

Right now I call "Connection connection = ConnectionFactory.createConnection();" in the main method and pass the connection to each thread as a parameter. Each thread then creates its own Table instance.

Is this the best approach? Will it cause thread-safety, efficiency, or other problems?

Mobility asked Apr 25 '17 02:04




1 Answer

According to the documentation:

Connection creation is a heavy-weight operation. Connection implementations are thread-safe, so that the client can create a connection once, and share it with different threads. Table and Admin instances, on the other hand, are light-weight and are not thread-safe. Typically, a single connection per client application is instantiated and every thread will obtain its own Table instance. Caching or pooling of Table and Admin is not recommended.

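So I would suggest creating the connection once for the whole application, sharing it across threads, and obtaining a new Table from it each time one is needed. Close each Table when you are done with it, and close the connection only at application shutdown.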

Sample Code:

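import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

// Create the connection once (heavy-weight, thread-safe); close it at application shutdown
Configuration hBaseConfig = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(hBaseConfig);

// Obtain a Table per use (light-weight, not thread-safe); try-with-resources closes it
try (Table table = connection.getTable(TableName.valueOf("table-name"))) {
    // use table
}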
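
The setup described in the question (one connection created in main, shared by threads A, B, C, each opening its own Table) can be illustrated without a running cluster. The sketch below shows only the ownership pattern. FakeConnection and FakeTable are hypothetical stand-ins invented for this example and are not HBase classes; with the real client, ConnectionFactory.createConnection(...) and connection.getTable(...) would presumably sit in the same places.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: FakeConnection/FakeTable are hypothetical stand-ins, NOT the HBase API.
class SharedConnectionSketch {

    /** Stand-in for Connection: created once, thread-safe, shared by every worker. */
    static final class FakeConnection implements AutoCloseable {
        private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();
        final AtomicInteger tablesOpened = new AtomicInteger();

        FakeTable getTable(String name) {
            tablesOpened.incrementAndGet();
            return new FakeTable(name, store);
        }

        @Override
        public void close() {
            // A real connection would release sockets and thread pools here.
        }
    }

    /** Stand-in for Table: cheap to create, used by a single thread, closed after use. */
    static final class FakeTable implements AutoCloseable {
        private final String name;
        private final ConcurrentHashMap<String, String> store;

        FakeTable(String name, ConcurrentHashMap<String, String> store) {
            this.name = name;
            this.store = store;
        }

        void put(String row, String value) {
            store.put(name + "/" + row, value);
        }

        @Override
        public void close() {
            // Nothing to release in this stand-in.
        }
    }

    /** Runs `threads` workers that share one connection; returns how many tables were opened. */
    static int runWorkers(int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // One connection for the whole run, closed only after all workers finish.
        try (FakeConnection connection = new FakeConnection()) {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                final String row = "row-" + i;
                pending.add(pool.submit(() -> {
                    // Each worker opens its own table handle and closes it when done.
                    try (FakeTable table = connection.getTable("table-name")) {
                        table.put(row, "written by " + Thread.currentThread().getName());
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the worker and surface any failure
            }
            return connection.tablesOpened.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("tables opened: " + runWorkers(3));
    }
}
```

Each worker gets its own short-lived table handle, while the single connection outlives all of them and is closed exactly once.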

aprousas answered Jan 22 '23 05:01