
How to estimate BigTable storage utilization?

How does one estimate how much space a BigTable table will actually use?

Let's say I have 1B rows with one column family. The qualifier is a 10-character string. The value is a 5-character string. GC policy: only most recent version.

The raw data is 15 GB, but of course there is lots of overhead such as storing lengths and timestamps. How much storage utilization should one expect?

What if I have 2 such families? Does it simply multiply?

Adam asked Mar 07 '26 22:03



1 Answer

Unfortunately there isn't a very precise rule of thumb here, but you should expect somewhere on the same order of magnitude as the logical data size.

Things can get substantially smaller if your data compresses well, but shouldn't get substantially larger modulo the obvious sources of overhead you mentioned. If they do, let us know!

For example, some naive math on your example would expect 8 bytes/timestamp * 1 billion values = 8 GB of extra space for timestamps, but consider that all your timestamps are likely to be close together, and so might reasonably compress to half that. If you have rows or row ranges that contain multiple values with identical or near-identical timestamps, the compression may be even better.
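The naive math above can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not how Bigtable actually accounts for storage: the function name and the `timestamp_compression` knob are made up for illustration, and row keys, length prefixes, and block compression of the data itself are deliberately left out.

```python
def estimate_bytes(num_cells, qualifier_bytes, value_bytes,
                   timestamp_bytes=8, timestamp_compression=1.0):
    """Return (raw_bytes, timestamp_bytes_total) for a naive estimate.

    timestamp_compression is a hypothetical ratio: 1.0 means timestamps
    are stored uncompressed, 0.5 means they compress to half their size.
    """
    raw = num_cells * (qualifier_bytes + value_bytes)
    timestamps = int(num_cells * timestamp_bytes * timestamp_compression)
    return raw, timestamps


# The example from the question: 1 billion cells, 10-byte qualifier,
# 5-byte value, one version per cell.
raw, ts = estimate_bytes(1_000_000_000, 10, 5)
print(f"raw data:   {raw / 1e9:.0f} GB")
print(f"timestamps: {ts / 1e9:.0f} GB uncompressed")

# Same table, assuming timestamps compress to roughly half.
_, ts_half = estimate_bytes(1_000_000_000, 10, 5, timestamp_compression=0.5)
print(f"timestamps: {ts_half / 1e9:.0f} GB at an assumed 2:1 compression")
```

Under this model a second family with the same shape simply doubles `num_cells`, so both numbers scale linearly; whether real usage scales that cleanly depends on how well the combined data compresses.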

Also bear in mind that this is constant overhead per value, so with larger values it will contribute a smaller fraction of the overall cost. And, of course, the list price for even 8GB of extra SSD space is < $2/month (https://cloud.google.com/products/calculator/#id=996764ef-d4a4-4043-8016-177c8100a35f)
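To see how the constant per-value overhead shrinks as values grow, here is a small sketch. It assumes an uncompressed 8-byte timestamp per value and ignores every other source of overhead, so treat the fractions as rough upper-level intuition rather than measured numbers; the helper name is made up.

```python
def timestamp_overhead_fraction(qualifier_bytes, value_bytes, timestamp_bytes=8):
    """Fraction of the per-cell bytes taken up by the timestamp alone."""
    total = qualifier_bytes + value_bytes + timestamp_bytes
    return timestamp_bytes / total


# 10-byte qualifier throughout; only the value size changes.
for value_bytes in (5, 100, 1_000, 10_000):
    frac = timestamp_overhead_fraction(10, value_bytes)
    print(f"value = {value_bytes:>6} bytes -> timestamp share = {frac:.1%}")
```

For the 5-byte values in the question the timestamp is about a third of each cell, while for kilobyte-sized values it drops below one percent.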

Douglas McErlean answered Mar 10 '26 10:03



