Let's assume a table named test:
                 cf:a       cf:b    yy:a    kk:cat
"com.cnn.news"   zubrava10  sobaka  foobar
"ch.main.users"  -          -       -       purrpurr
The first cell ("zubrava") has 10 versions (10 timestamps): "zubrava1", "zubrava2", ...
How will the data of this table be stored on disk?
I mean, is the primary index always
("row", "column_family:column", timestamp)?
So will the 10 versions of the same row for the 10 timestamps be stored together? How is the entire table stored?
Is a scan for all values of a given column as fast as in column-oriented models?
SELECT cf:a from test
So will the 10 versions of the same row for the 10 timestamps be stored together? How is the entire table stored?
Yes. Bigtable is a row-oriented database, so all data for a single row is stored together, organized by column family and then by column. Within each column, the versions are stored in reverse-timestamp order, which means it's easy and fast to ask for the latest value, but harder to ask for the oldest value.
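To make that ordering concrete, here is a minimal sketch that models it as a sort key. It is only a logical illustration, not Bigtable's actual on-disk format: the Cell type, the storage_order and latest helpers, and the integer timestamps 1 to 10 are all assumptions made up for this example, using the values from the question.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    # Hypothetical logical cell; real storage formats differ.
    row: str
    family: str
    qualifier: str
    timestamp: int
    value: str


def storage_order(cell: Cell):
    # Row key, then column family, then column, then newest version first.
    return (cell.row, cell.family, cell.qualifier, -cell.timestamp)


# Assumed timestamps 1..10 for the ten versions of cf:a.
cells = [Cell("com.cnn.news", "cf", "a", ts, f"zubrava{ts}") for ts in range(1, 11)]
cells += [
    Cell("com.cnn.news", "cf", "b", 1, "sobaka"),
    Cell("com.cnn.news", "yy", "a", 1, "foobar"),
    Cell("ch.main.users", "kk", "cat", 1, "purrpurr"),
]

ordered = sorted(cells, key=storage_order)


def latest(ordered_cells, row: str, family: str, qualifier: str) -> Optional[str]:
    # The first matching cell is the newest one because of the reversed timestamp.
    for c in ordered_cells:
        if (c.row, c.family, c.qualifier) == (row, family, qualifier):
            return c.value
    return None
```

In this model all ten versions of cf:a sit next to each other, newest first, so reading the latest value stops at the first match, while reaching the oldest one means walking past the other nine.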
Is a scan for all values of a given column as fast as in column-oriented models?
SELECT cf:a from test
No. A column-oriented storage model stores all the data for a single column together, across all rows. Thus, a full-table scan of one column in a column-oriented system (such as Google BigQuery) is faster than in a row-oriented storage system. In exchange, a row-oriented system provides row-based mutations, including atomic ones, that a column-oriented storage system typically cannot.
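As a rough illustration of why the column scan differs, the toy model below counts how many cells each layout has to touch to answer SELECT cf:a from test. The dictionaries and the helpers scan_row_store, to_column_store and scan_column_store are hypothetical and ignore any optimizations a real system may apply; they only show the difference in data layout.

```python
# Row-oriented toy layout: each row keeps all of its columns together.
rows = {
    "ch.main.users": {"kk:cat": "purrpurr"},
    "com.cnn.news": {"cf:a": "zubrava10", "cf:b": "sobaka", "yy:a": "foobar"},
}


def scan_row_store(row_store, column):
    # Walk every row and every cell, keeping only the requested column.
    touched = 0
    values = []
    for key in sorted(row_store):
        for col, val in row_store[key].items():
            touched += 1
            if col == column:
                values.append(val)
    return values, touched


def to_column_store(row_store):
    # Column-oriented toy layout: each column keeps its values from all rows.
    columns = {}
    for key in sorted(row_store):
        for col, val in row_store[key].items():
            columns.setdefault(col, []).append((key, val))
    return columns


def scan_column_store(column_store, column):
    # Only the requested column's entries are read.
    entries = column_store.get(column, [])
    return [val for _, val in entries], len(entries)


row_result = scan_row_store(rows, "cf:a")
column_result = scan_column_store(to_column_store(rows), "cf:a")
```

Both scans return the same values, but the row-oriented walk touches all four cells of the table while the column-oriented one reads a single entry. The flip side is that updating one row in the column layout means touching several separate column lists.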
On top of this, Bigtable keeps all row keys sorted in lexicographic order; column-oriented storage systems typically make no such guarantee.
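One practical consequence of sorted row keys is that rows sharing a key prefix are adjacent, so they can be read as one contiguous range. The sketch below shows the idea with Python's bisect module on an in-memory list; the prefix_scan helper and the two extra row keys are assumptions added for illustration and are not part of any Bigtable API.

```python
import bisect

# Two keys from the question plus two hypothetical ones, kept in lexicographic order.
row_keys = sorted([
    "com.cnn.news",
    "ch.main.users",
    "com.cnn.sports",
    "org.example.www",
])


def prefix_scan(sorted_keys, prefix):
    # Binary-search to the first key >= prefix, then read while the prefix matches.
    start = bisect.bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


cnn_rows = prefix_scan(row_keys, "com.cnn.")
```

Here cnn_rows contains only the two "com.cnn." keys, found without looking at the rest of the key space.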