
Bigtable row key scenario to avoid hotspotting?

A company needs you to create a schema in Google Bigtable that will allow for the historical analysis of the last 2 years of records. Each record that comes in is sent every 15 minutes, and contains a unique identifier of the device and a data record. The most common query is for all the data for a given device for a given day. Which schema should you use?

  • A. Rowkey: date#device_id, Column data: data_point
  • B. Rowkey: date, Column data: device_id, data_point
  • C. Rowkey: device_id, Column data: date, data_point
  • D. Rowkey: data_point, Column data: device_id, date
  • E. Rowkey: date#data_point, Column data: device_id

Which of the options above would be the best?

Roshan Fernando asked Jan 31 '26 13:01


1 Answer

According to the Bigtable schema documentation:

Rows are sorted lexicographically by row key.

This means that the rows returned by a common query should be sequential in that sort order, so that the query reads one contiguous range of row keys.

Essentially, you want to query rows by a given date and device id. Google Cloud Bigtable allows you to query rows by row key prefix. Since the most common query asks for all the data for a given device and date, the device and date need to be part of the prefix, so they must be the first two entries in the row key. Of the options listed, only A (date#device_id) puts both in the row key.
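To make the prefix idea concrete, here is a minimal sketch in plain Python. It does not use any Bigtable client: a sorted list stands in for the table, and the helper names (`row_key`, `prefix_range`, `scan_prefix`) are made up for illustration. It also assumes a time-of-day suffix on the key (date#device_id#HHMM), which is not one of the options in the question, purely so that one device-day spans several rows.

```python
from bisect import bisect_left


def row_key(date: str, device_id: str, time: str) -> bytes:
    # Hypothetical layout: date#device_id#HHMM. The time suffix is an
    # assumption added for this sketch, not part of the question's options.
    return f"{date}#{device_id}#{time}".encode()


def prefix_range(prefix: bytes) -> tuple[bytes, bytes]:
    """Return [start, end) covering every key that begins with prefix."""
    # Assumes a non-empty prefix whose last byte is not 0xFF.
    end = prefix[:-1] + bytes([prefix[-1] + 1])
    return prefix, end


def scan_prefix(sorted_keys: list[bytes], prefix: bytes) -> list[bytes]:
    # Because keys are sorted lexicographically, all matches form one
    # contiguous slice, found with two binary searches.
    start, end = prefix_range(prefix)
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, end)
    return sorted_keys[lo:hi]


# A tiny stand-in for the table: two days, two devices, three readings each.
keys = sorted(
    row_key(d, dev, t)
    for d in ("20260130", "20260131")
    for dev in ("dev-a", "dev-b")
    for t in ("0000", "0015", "0030")
)

# "All the data for a given device for a given day" becomes one prefix scan.
day_rows = scan_prefix(keys, b"20260131#dev-a#")
```

With the date and device id at the front of the key, the whole query is a single contiguous slice. If either one lived only in a column, answering the same query would mean reading and filtering many unrelated rows.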

rohanphadte answered Feb 03 '26 11:02