SocketTimeoutException issue from HBase Client

Tags: hbase

We are working on a scenario where we need to check for the existence of a record before inserting it. If the record already exists, we don't insert it again. We do this in batches: first we create a batch of Gets to check the existence of the records we want to insert. The issue does not occur when the table is small, and it is very intermittent. What is the recommended batch size for Get, and what is the best approach to check the existence of records before inserting? I appreciate your responses.
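
For context, a simplified sketch of the existence-check pattern described above (the column family, qualifier, and key/value lists are placeholder names, and it assumes the 0.94-era HTable client API visible in the stack trace):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;

public class ExistenceCheckSketch {

    // Insert only the rows that do not already exist, using one batched Get
    // to check existence first. family/qualifier and the key/value lists are
    // placeholders, not the real schema.
    static void insertIfAbsent(HTableInterface table, List<byte[]> rowKeys,
                               List<byte[]> values, byte[] family,
                               byte[] qualifier) throws IOException {
        List<Get> gets = new ArrayList<Get>(rowKeys.size());
        for (byte[] rowKey : rowKeys) {
            gets.add(new Get(rowKey));
        }

        // One batched round trip; this is the call that times out for us.
        Result[] results = table.get(gets);

        List<Put> puts = new ArrayList<Put>();
        for (int i = 0; i < results.length; i++) {
            if (results[i] == null || results[i].isEmpty()) {
                // Row not found, so schedule it for insertion.
                Put put = new Put(rowKeys.get(i));
                put.add(family, qualifier, values.get(i));
                puts.add(put);
            }
        }
        if (!puts.isEmpty()) {
            table.put(puts);
        }
    }
}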

Here's the stack trace:

java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to b16-pf-dv-093.abc.com/10.106.8.103:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.106.8.133:41903 remote=b16-pf-dv-093.abc.com/10.106.8.103:60020] 
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) 
        at java.util.concurrent.FutureTask.get(FutureTask.java:83) 
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1604) 
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1456) 
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:757) 
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:726) 
        at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:367) 
        at com.abc.psp.core.metering.util.HBaseClient.get(HBaseClient.java:263) 
        at com.abc.psp.core.metering.dao.MeteringHBaseDAOImpl.addMeteredRecords(MeteringHBaseDAOImpl.java:374) 
        at com.abc.psp.core.metering.dao.MeteringHBaseDAOImpl.addMeteredRecords(MeteringHBaseDAOImpl.java:342) 
        at HBaseTest.main(HBaseTest.java:32) 
Caused by: java.net.SocketTimeoutException: Call to b16-pf-dv-093.abc.com/10.106.8.103:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.106.8.133:41903 remote=b16-pf-dv-093.abc.com/10.106.8.103:60020] 
        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1026) 
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:999) 
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) 
        at $Proxy6.multi(Unknown Source) 
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1433) 
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1431) 
        at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:215) 
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1440) 
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1428) 
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
        at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
        at java.lang.Thread.run(Thread.java:662) 
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.106.8.133:41903 remote=b16-pf-dv-093.abc.com/10.106.8.103:60020] 
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) 
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) 
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) 
        at java.io.FilterInputStream.read(FilterInputStream.java:116) 
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:373) 
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) 
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237) 
        at java.io.DataInputStream.readInt(DataInputStream.java:370) 
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:646) 
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:580)
Asked Jul 29 '13 by Naresh Reddy

2 Answers

The solution provided here is not 100% correct. I faced SocketTimeoutException on both reads and writes under high load. Increasing hbase.rpc.timeout is not the solution unless the scans or writes on the HBase server are genuinely very large.

Here is my problem:

I was scanning rows that HBase returned within a few milliseconds. Everything was fine until I increased the number of concurrent scan threads from 10 to 50. At that point I started seeing SocketTimeoutException (the same exception as in this thread), which became an obstacle to scaling HBase reads and writes from a single process.

To get to the right solution, you first need to understand the cause.

Causes of the SocketTimeoutException

a. The HBase server is slow to return the read or write.

b. The client cannot get a response from the server in time, for example because its connections are congested.

If you are experiencing (a), increasing hbase.rpc.timeout might be your solution, but you will most probably still end up hitting (b) as well.

I noticed that the HBase client by default creates only one connection per RegionServer. To validate this, run the following command on the client host that issues the reads, while the load is running:

netstat -an | grep 60020 | grep EST

To my surprise, the process had only one connection to each RegionServer, which explained the timeouts. Only one connection/socket? This seems to be the default HBase client behavior; I am not yet sure why.

Solution:

Add these two properties to the HBase configuration on the client (e.g. hbase-site.xml) and restart the client:

<property>
   <name>hbase.client.ipc.pool.type</name>
   <value>RoundRobinPool</value>
</property>
<property>
   <name>hbase.client.ipc.pool.size</name>
   <value>10</value>
</property>

This creates 10 sockets to each RegionServer from every client. With this change you should see a major improvement on the client side. I have not seen a SocketTimeoutException since making it.
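
If your client builds its Configuration in code instead of reading hbase-site.xml, the same two properties can be set programmatically. A minimal sketch (the table name is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class PooledClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Same settings as the hbase-site.xml snippet above.
        conf.set("hbase.client.ipc.pool.type", "RoundRobinPool");
        conf.setInt("hbase.client.ipc.pool.size", 10);

        // "my_table" is a placeholder; use the table as before.
        HTable table = new HTable(conf, "my_table");
        table.close();
    }
}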

Answered Nov 17 '22 by jaskirat Bhatia


You are getting this error because your gets take longer than the default time an HBase client application is allowed to wait for a remote call, which is 60 seconds. When your table is big (which means you have more data to fetch), gets will take longer. You can raise this limit by setting hbase.rpc.timeout to a higher value in your hbase-site.xml file.
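
If you prefer to override it per client process rather than editing hbase-site.xml, a minimal sketch (the 120000 ms value is only an example; pick one that fits your gets):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RpcTimeoutSketch {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Raise the client RPC timeout from the default 60000 ms to 120000 ms.
        conf.setInt("hbase.rpc.timeout", 120000);
        // Create your HTable / HTablePool instances from this conf as usual.
    }
}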

What is the recommended batch size for Get?

It depends on your design, configuration, hardware specs, data, and access pattern.

What is the best approach to check the existence of the records before inserting?

If you want to know whether a record exists, checking for it is the only option. It would help if you could elaborate on your use case a bit more; that will help me come up with a proper suggestion.

Answered Nov 17 '22 by Tariq