
What's the difference between BigQuery and Bigtable? [closed]

Is there any reason why someone would use Bigtable instead of BigQuery? Both seem to support read and write operations, with the latter also offering advanced 'query' operations.

I need to develop an affiliate network (thus I need to track clicks and 'sales') so I'm quite confused by the difference because BigQuery seems to be just Bigtable with a better API.

The user with no hat asked Oct 07 '16 14:10



2 Answers

The difference is basically this:

BigQuery is a query engine for datasets that don't change much, or that change only by appending. It's a great choice when your queries require a "table scan", that is, when they need to look across the entire dataset. Think sums, averages, counts, groupings. BigQuery is what you use when you have collected a large amount of data and need to ask questions about it.

Bigtable is a database. It is designed to be the foundation for a large, scalable application. Use Bigtable when you are building any kind of app that needs to read and write data and where scale is a potential issue.
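To make the two access patterns concrete, here is a minimal sketch for the click-tracking case in the question. It does not call BigQuery or Bigtable at all: `sqlite3` stands in for the "scan everything and aggregate" style of query, and a toy `SortedKeyStore` stands in for a store that is read and written by row key. The event data, the `affiliate_id#timestamp` key layout, and all function names are assumptions made up for illustration.

```python
import bisect
import sqlite3

# Hypothetical click events: (affiliate_id, timestamp, sale_amount)
EVENTS = [
    ("aff-1", 1000, 0.0),
    ("aff-2", 1001, 25.0),
    ("aff-1", 1002, 10.0),
    ("aff-1", 1003, 0.0),
]


def analytic_totals(events):
    """Warehouse-style question: scan every row and aggregate per affiliate."""
    conn = sqlite3.connect(":memory:")  # local stand-in, not BigQuery
    conn.execute("CREATE TABLE clicks (affiliate_id TEXT, ts INTEGER, sale REAL)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", events)
    rows = conn.execute(
        "SELECT affiliate_id, COUNT(*), SUM(sale) FROM clicks "
        "GROUP BY affiliate_id ORDER BY affiliate_id"
    ).fetchall()
    conn.close()
    return rows


class SortedKeyStore:
    """Toy sorted key/value store: single-row reads/writes and prefix scans."""

    def __init__(self):
        self._keys = []     # row keys kept in lexicographic order
        self._values = {}   # row key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan_prefix(self, prefix):
        # Jump to the first key >= prefix, then read until the prefix ends.
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            out.append((key, self._values[key]))
        return out


def row_key(affiliate_id, ts):
    # Zero-padded timestamp keeps lexicographic order equal to time order.
    return f"{affiliate_id}#{ts:010d}"


def load_store(events):
    store = SortedKeyStore()
    for affiliate_id, ts, sale in events:
        store.put(row_key(affiliate_id, ts), sale)
    return store
```

`analytic_totals(EVENTS)` touches every row to answer a question about the whole dataset, while `load_store(EVENTS).scan_prefix("aff-1#")` only reads the contiguous range of keys for one affiliate. That difference in how data is reached is the practical reason to pick one service over the other.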

Michael Manoochehri answered Dec 20 '22 22:12


Google Cloud - GCP database options decision flowchart

This may help a bit in deciding between the different storage solutions that Google Cloud offers (disclaimer: copied from the Google Cloud page).

If your requirement is a live database, Bigtable is what you need (it is not really an OLTP system, though). If your purpose is more analytical, BigQuery is what you need.

Think of OLTP vs. OLAP. Or, if you are familiar with Cassandra vs. Hadoop, Bigtable roughly equates to Cassandra and BigQuery roughly equates to Hadoop (agreed, it's not a fair comparison, but you get the idea).

https://cloud.google.com/images/storage-options/flowchart.svg

Note

Please keep in mind that Bigtable is not a relational database: it does not support SQL queries or JOINs, nor does it support multi-row transactions. It is also not a good solution for small amounts of data. If you want an OLTP RDBMS, you might need to look at Cloud SQL (MySQL/PostgreSQL) or Cloud Spanner.

Cost Perspective

https://stackoverflow.com/a/34845073/6785908. Quoting the relevant parts here.

The overall cost boils down to how often you will 'query' the data. If it's a backup and you don't replay events too often, it'll be dirt cheap. However, if you need to replay it once a day, you will start triggering the $5/TB scanned very easily. We were surprised too by how cheap inserts and storage were, but this is of course because Google expects you to run expensive queries on them at some point. You'll have to design around a few things, though. E.g. AFAIK streaming inserts have no guarantee of being written to the table, and you have to poll the tail of the list frequently to see whether they were really written. Tailing can be done efficiently with a time range table decorator, though (so you don't pay for scanning the whole dataset).

If you don't care about order, you can even list a table for free. No need to run a 'query' then.
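As a rough back-of-the-envelope sketch of the point above, the query bill scales with how much data is scanned and how often. The $5/TB figure is the one quoted in this answer and may not match current pricing; the function name and the 30-day month are assumptions for illustration only.

```python
# Price per TB scanned as quoted in the answer above; treat as an assumption,
# since actual BigQuery pricing may differ.
PRICE_PER_TB_SCANNED_USD = 5.0


def monthly_query_cost(tb_scanned_per_query, queries_per_day, days=30,
                       price_per_tb=PRICE_PER_TB_SCANNED_USD):
    """Estimate query cost: TB scanned per query x queries per day x days x price."""
    return tb_scanned_per_query * queries_per_day * days * price_per_tb


# Replaying a hypothetical 2 TB dataset once a day for 30 days.
daily_replay = monthly_query_cost(2, 1)
# The same dataset kept as a backup and never queried.
backup_only = monthly_query_cost(2, 0)
```

Under these assumptions the daily replay comes to 300 USD a month in query charges, while the untouched backup incurs none, which is why the cost "boils down to how often you will query the data".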

Edit 1

Cloud Spanner is relatively young, but it is powerful and promising. At least, Google marketing claims that its features are the best of both worlds (traditional RDBMS and NoSQL).


so-random-dude answered Dec 20 '22 23:12