Bigtable row key scenario to avoid hotspotting?
A company needs you to create a schema in Google Bigtable that will allow for the historical analysis of the last 2 years of records. Each record that comes in is sent every 15 minutes, and contains a unique identifier of the device and a data record. The most common query is for all the data for a given device for a given day. Which schema should you use?
What would be the best option in above?
According to the Bigtable schema documentation:
Rows are sorted lexicographically by row key.
This means that in order to avoid hotspotting, common queries should return row results that sequential.
Essentially, you want to be querying rows with a given date and device id. Google Cloud Bigtable allows you query rows by a certain row key prefix. Since the most common queries all the data for a given device and date, the device and date need to be part of the row prefix query, and must be the first two entries in a row key.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With