
Bigtable scan/get response time (latency) with low frequency calls is very high

I have one small table (about 100 MB) in Bigtable with 10 nodes. When I scan/get a row once every minute, the latency of the call is more than 300 ms. If I hit it with more frequent calls, like one per second, the latency drops to 50-60 ms. I am not sure how I can improve performance for low-frequency calls. Is this expected behavior, or am I doing something wrong?

Here is my test code. I created a single executor shared by two HBase client connections to Bigtable, but the low-frequency connection's responses are much slower than those of the connection that makes more frequent calls.

Any suggestions?

package com.bids;

import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.util.Bytes;
import org.fusesource.jansi.AnsiConsole;

public class BTConnectTest {
    public static void main(String[] args) throws IOException, InterruptedException {

        Configuration hBaseConfig = HBaseConfiguration.create();
        hBaseConfig.set("google.bigtable.project.id", "xxxxxxx");
        hBaseConfig.set("google.bigtable.cluster.name", "hbase-test1");
        hBaseConfig.set("google.bigtable.zone.name", "us-central1-b");
        hBaseConfig.set("hbase.client.connection.impl", "com.google.cloud.bigtable.hbase1_1.BigtableConnection");

        ExecutorService executor = Executors.newSingleThreadExecutor();

        final Connection bigTableConnection1 = ConnectionFactory.createConnection(hBaseConfig, executor);

        final Connection bigTableConnection2 = ConnectionFactory.createConnection(hBaseConfig, executor);

        Thread t = new Thread(new Runnable() {

            @Override
            public void run() {
                while (true) {
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e1) {
                        e1.printStackTrace();
                    }
                    long before = System.nanoTime();
                    try {
                        makeACall2Bigtable(bigTableConnection2);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                    // bigTableConnection.close();
                    long after = System.nanoTime();

                    long diff = after - before;

                    System.out.println("\t\t\t\t\t\t connection: " + 1 + " diff: " + diff / (1000 * 1000));
                }
            }
        });
        t.start();

        long sum = 0;
        int n = 0;
        while (true) {
            if (n > 60) {
                // After 60 warm-up calls, drop to one call per minute.
                Thread.sleep(60000);
            }

            long before = System.nanoTime();

            Connection bigTableConnection = bigTableConnection1;

            makeACall2Bigtable(bigTableConnection);
            long after = System.nanoTime();

            long diff = after - before;
            n = n + 1;
            sum += diff;
            long avg = sum / (n * 1000 * 1000);

            System.out.println("connection: " + 0 + " diff: " + diff / (1000 * 1000) + " avg: " + avg);

        }
        // bigTableConnection.close();

    }

    private static void makeACall2Bigtable(Connection bigTableConnection) throws IOException {

        Table table = bigTableConnection.getTable(TableName.valueOf("customer"));
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("101"));
        scan.setStopRow(Bytes.toBytes("102"));
        List<String> cols = new ArrayList<String>(3);
        cols.add("name");
        cols.add("age");
        cols.add("weight");
        String keyName = "id";
        final String DEFAULT_COLFAM = "z";
        for (String col : cols) {
            scan.addColumn(Bytes.toBytes(DEFAULT_COLFAM), Bytes.toBytes(col));
        }
        ResultScanner resultScanner = table.getScanner(scan);

        for (Result result : resultScanner) {
            Map<String, String> columnValueMap = new LinkedHashMap<String, String>();
            for (String col : cols) {
                if (result.containsColumn(Bytes.toBytes(DEFAULT_COLFAM), Bytes.toBytes(col))) {
                    columnValueMap.put(col, new String(CellUtil.cloneValue(
                            result.getColumnLatestCell(Bytes.toBytes(DEFAULT_COLFAM), Bytes.toBytes(col)))));
                } else {
                    if (cols.contains(keyName)) {
                        columnValueMap.put(col, null);
                    }

                }
            }

        }
        resultScanner.close();
        table.close();

    }

}
asked Oct 19 '25 by PraveenK


2 Answers

  • The first few calls are slower due to a known issue: there's some setup that happens on the server side for each "Channel", and we have multiple channels.
  • You shouldn't need finalFilterList.
  • You should cache your Scan, TableName, and column-family bytes. You can reuse them across calls.
  • If you're getting a single row, do a Get instead of a scan.
  • Do you need executor?
  • Your scan should probably use setMaxVersions(1) just to be safe.
  • Maybe try scan.setStartRow(Bytes.toBytes("101")) and scan.setStopRow(Bytes.toBytes("102")) instead of a row prefix to see if that helps?
  • Make sure that your code is run in the same zone as your cluster.

I hope that helps.
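Putting the Get and caching suggestions together, a sketch might look like this (assuming the hbase1_1 client from the question; the `customer` table, `z` column family, and column names are taken from the question's code, while the class and method names are illustrative):

```java
import java.io.IOException;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleRowFetch {
    // Cached and reused across calls, per the advice above.
    private static final TableName TABLE = TableName.valueOf("customer");
    private static final byte[] COLFAM = Bytes.toBytes("z");
    private static final byte[][] COLS = {
            Bytes.toBytes("name"), Bytes.toBytes("age"), Bytes.toBytes("weight") };

    static Result fetchRow(Connection conn, String rowKey) throws IOException {
        try (Table table = conn.getTable(TABLE)) {
            // A Get for a single known row instead of a Scan with start/stop rows.
            Get get = new Get(Bytes.toBytes(rowKey));
            get.setMaxVersions(1);          // only the latest cell per column
            for (byte[] col : COLS) {
                get.addColumn(COLFAM, col); // restrict to the needed columns
            }
            return table.get(get);
        }
    }
}
```

The `Connection` itself should be long-lived and shared; only the lightweight `Table` handle is opened and closed per call.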

answered Oct 22 '25 by Solomon Duskis


If you really are going to make low-frequency requests in production, you might want to run a background thread that makes a random request to your table every few seconds.

Bigtable is really optimized for large amounts of data with frequent access. The first request after a quiet period may require the data to be read in again; periodic requests keep it warm.
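A minimal sketch of such a keep-alive thread, using only `java.util.concurrent` (the probe body, e.g. a cheap Get against a known row, is left as a caller-supplied `Runnable`; the class and method names are illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class KeepAlive {
    // Schedules a lightweight probe at a fixed period so the channel
    // (and any server-side state) stays warm between real requests.
    public static ScheduledExecutorService start(Runnable probe, long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "bigtable-keepalive");
            t.setDaemon(true); // don't block JVM shutdown
            return t;
        });
        ses.scheduleAtFixedRate(probe, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

Call `start(() -> fetchSomeKnownRow(), 5)` once at startup and keep the returned executor around so it can be shut down cleanly.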

answered Oct 22 '25 by Les Vogel - Google DevRel