Bigtable row key scenario to avoid hotspotting?

Question

A company needs you to create a schema in Google Bigtable that will allow for the historical analysis of the last 2 years of records. A record arrives every 15 minutes and contains the unique identifier of the device and a data record. The most common query is for all the data for a given device for a given day. Which schema should you use?

A. Rowkey: date#device_id, Column data: data_point
B. Rowkey: date, Column data: device_id, data_point
C. Rowkey: device_id, Column data: date, data_point
D. Rowkey: data_point, Column data: device_id, date
E. Rowkey: date#data_point, Column data: device_id

Which of the options above would be best?

rohanphadte · Accepted Answer

According to the Bigtable schema documentation:

Rows are sorted lexicographically by row key.

This means that, to avoid hotspotting, common queries should return rows that are sequential, i.e. stored next to each other in row key order.
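To make the ordering concrete, here is a minimal Python sketch. It is not Bigtable client code; it only sorts a handful of made-up row keys as byte strings, which is what lexicographic ordering means. The dates and device IDs are hypothetical.

```python
# Hypothetical row keys in the form date#device_id (illustrative values only).
row_keys = [
    b"2023-01-02#device-b",
    b"2023-01-01#device-b",
    b"2023-01-10#device-a",
    b"2023-01-01#device-a",
    b"2023-01-02#device-a",
]

# Sorting bytes compares them byte by byte, i.e. lexicographically.
sorted_keys = sorted(row_keys)

for key in sorted_keys:
    print(key.decode())

# Keys that share the prefix "2023-01-01#" end up next to each other.
same_day = [k for k in sorted_keys if k.startswith(b"2023-01-01#")]
first = sorted_keys.index(same_day[0])
assert sorted_keys[first:first + len(same_day)] == same_day
```

Note that the sketch uses zero-padded ISO dates. With unpadded values the byte order no longer matches the calendar order: "2023-1-10" would sort before "2023-1-2".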

Essentially, you want to query rows for a given date and device ID. Google Cloud Bigtable allows you to query rows by row key prefix. Since the most common query is for all the data for a given device and date, the device and the date need to be part of the row key prefix, so they must be the first two entries in the row key. Of the options listed, only A (date#device_id) puts both in the row key.
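The prefix idea can be sketched in plain Python by simulating a sorted key space with a list and `bisect`. This is an illustration under assumptions, not the Bigtable client API: the helper names (`prefix_end`, `scan_prefix`, `make_key`) are hypothetical, and the time suffix appended to each key is an assumption added only so that one device and day spans several rows.

```python
from bisect import bisect_left


def prefix_end(prefix: bytes) -> bytes:
    """Return the smallest key that sorts after every key starting with prefix."""
    # Drop trailing 0xff bytes (they cannot be incremented), then bump the last byte.
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        raise ValueError("prefix has no upper bound")
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def scan_prefix(sorted_keys, prefix: bytes):
    """Return the contiguous slice of sorted_keys whose keys start with prefix."""
    start = bisect_left(sorted_keys, prefix)
    end = bisect_left(sorted_keys, prefix_end(prefix))
    return sorted_keys[start:end]


def make_key(date: str, device_id: str, time: str) -> bytes:
    # Assumed layout: date#device_id#time (the time suffix is hypothetical).
    return f"{date}#{device_id}#{time}".encode()


# A tiny simulated table: keys kept in sorted order, as Bigtable stores rows.
table = sorted(
    make_key(d, dev, t)
    for d in ("2023-01-01", "2023-01-02")
    for dev in ("device-a", "device-b")
    for t in ("00:00", "00:15", "00:30")
)

# All rows for one device on one day form a single contiguous range.
result = scan_prefix(table, b"2023-01-01#device-a#")
for key in result:
    print(key.decode())
```

The trailing `#` in the prefix matters in this sketch: without it, a prefix ending in `device-a` would also match a hypothetical `device-ab`.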

Tags:

bigtable

google-cloud-bigtable

Roshan Fernando