For Example for hbase table 'test_table', Values inserted are:
Row1 - Val1 => t
Row1 - Val2 => t + 3
Row1 - Val3 => t + 5
Row2 - Val1 => t
Row2 - Val2 => t + 3
Row2 - Val3 => t + 5
on scan 'test_table' where version = t + 4 should return
Row1 - Val1 => t + 3
Row2 - Val2 => t + 3
How do i achieve time stamp based scans (Based on latest available value less than or equal to the timestamp) in HBase?
Timestamp can be set in write request to HBase. By default HBase sets timestamp at server to the current value of epoch time in milliseconds. Versions count stored by HBase can be set per column family.
Scaning using HBase Shell The scan command is used to view the data in HTable. Using the scan command, you can get the table data. Its syntax is as follows: scan '<table name>'
When you compare a partial key scan and a get, remember that the row key you use for Get can be a much longer string than the partial key you use for the scan. In that case, for the Get, HBase has to do a deterministic lookup to ascertain the exact location of the row key that it needs to match and fetch it.
Consider this table:
hbase(main):009:0> create 't1', { NAME => 'f1', VERSIONS => 100 }
hbase(main):010:0> put 't1', 'key1', 'f1:a', 'value1'
hbase(main):011:0> put 't1', 'key1', 'f1:a', 'value2'
hbase(main):012:0> put 't1', 'key1', 'f1:a', 'value3'
hbase(main):013:0> put 't1', 'key2', 'f1:a', 'value4'
hbase(main):014:0> put 't1', 'key2', 'f1:a', 'value5'
hbase(main):015:0> put 't1', 'key1', 'f1:a', 'value6'
Here's its scan in shell with all the versions:
hbase(main):003:0> scan 't1', {VERSIONS => 100 }
ROW COLUMN+CELL
key1 column=f1:a, timestamp=1416083314098, value=value6
key1 column=f1:a, timestamp=1416083294981, value=value3
key1 column=f1:a, timestamp=1416083293273, value=value2
key1 column=f1:a, timestamp=1416083291009, value=value1
key2 column=f1:a, timestamp=1416083305050, value=value5
key2 column=f1:a, timestamp=1416083299840, value=value4
Here's the scan limited to a specific timestamp, as you requested:
hbase(main):002:0> scan 't1', { TIMERANGE => [0, 1416083300000] }
ROW COLUMN+CELL
key1 column=f1:a, timestamp=1416083294981, value=value3
key2 column=f1:a, timestamp=1416083299840, value=value4
Here's the same in Java code:
package org.example.test;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
public class test {
public static void main (String[] args) throws IOException {
HTable table = new HTable(HBaseConfiguration.create(), "t1");
Scan s = new Scan();
s.setMaxVersions(1);
s.setTimeRange (0L, 1416083300000L);
ResultScanner scanner = table.getScanner(s);
for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
System.out.println(Bytes.toString(rr.getRow()) + " => " +
Bytes.toString(rr.getValue(Bytes.toBytes("f1"), Bytes.toBytes("a"))));
}
}
}
Be aware that specifying the time range maximal value is excluded, which means that if you want to get the last value for all the keys with maximum timestamp T, you should specify upper bound of the range to T+1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With