Comparing Google BigQuery vs. Amazon Redshift shows that both can answer same set of requirements, differ mostly by cost plans. It seems that Redshift is more complex to configure (defining keys and optimization work) vs. Google BigQuery that perhaps has an issue with joining tables.
Is there a pros & cons list of Google BigQuery vs. Amazon Redshift?
adverb. in favor of a proposition, opinion, etc. noun, plural pros. a proponent of an issue; a person who upholds the affirmative in a debate. an argument, consideration, vote, etc., for something.
Definition of pros and cons 1 : arguments for and against —often + of Congress weighed the pros and cons of the new tax plan. 2 : good points and bad points Each technology has its pros and cons.
When weighing options, we use “pros” to describe positives while using cons to describe negatives. The idiom “pro and con” compares the advantages and disadvantages of something with the intention to aid in the decision-making process.
The pros and cons of something are its advantages and disadvantages, which you consider carefully so that you can make a sensible decision.
I posted this comparison on reddit. Quickly enough a long term RedShift practitioner came to comment on my statements. Please see https://www.reddit.com/r/bigdata/comments/3jnam1/whats_your_preference_for_running_jobs_in_the_aws/cur518e for the full conversation.
Sizing your cluster:
Hourly costs when doing nothing:
Speed of queries:
Indexing:
Vacuuming:
Data partitioning and distributing:
Streaming live data:
Growing your cluster:
Multi zone:
To try BigQuery you don't need a credit card or any setup time. Just try it (quick instructions to try BigQuery).
When you are ready to put your own data into BigQuery, just copy your JSON new-line separated logs from to Google Cloud Storage and import them.
See this in depth guide to data warehouse pricing on the cloud: Understanding Cloud Pricing Part 3.2 - More Data Warehouses
Amazon Redshift is a standard SQL database (based on Postgres) with MPP features that allow it to scale. These features also require you to conform your data model somewhat to get the best performance. It supports a large amount of the SQL standard and most tools that can speak to Postgres can use it unchanged.
BigQuery is not a database, in the sense that there it doesn't use standard SQL and doesn't provide JDBC/ODBC connectivity. It's a unique service with it's own API and interfaces. It provides limited support for SQL queries but most users interact with via custom code (Java, Python, etc.). Some 3rd party tools have added support for BigQuery but existing tools will not work without modification.
tl;dr - Redshift is better for interacting with existing tools and using complex SQL. BigQuery is better for custom coded interactions and teams who dislike SQL.
UPDATE 2017-04-17 - Here's a much more up to date summary of the cost and speed differences (wrapped in a sales pitch so YMMV). TL;DR - Redshift is usually faster and will be cheaper if you query the data somewhat regularly. http://blog.panoply.io/a-full-comparison-of-redshift-and-bigquery
UPDATE - Since I keep getting down votes on this (🤷‍♂️) here's an up-to-date response to the items in the other answer:
Sizing your cluster:
Hourly costs when doing nothing:
Speed of queries:
Indexing:
Vacuuming:
Data partitioning and distributing:
Streaming live data:
Growing your cluster:
Multi zone:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With