Pros & cons of BigQuery vs. Amazon Redshift [closed]

Tags: google-bigquery, amazon-redshift

A comparison of Google BigQuery and Amazon Redshift suggests that both can meet the same set of requirements and differ mostly in their pricing models. Redshift seems more complex to configure (defining keys and doing optimization work), while Google BigQuery perhaps has issues with joining tables.

Is there a pros & cons list of Google BigQuery vs. Amazon Redshift?

262

asked Oct 13 '14 12:10

user2339344

2 Answers

I posted this comparison on reddit. Quickly enough, a long-term Redshift practitioner came to comment on my statements. Please see https://www.reddit.com/r/bigdata/comments/3jnam1/whats_your_preference_for_running_jobs_in_the_aws/cur518e for the full conversation.

Sizing your cluster:

Redshift will ask you to choose a number of CPUs, RAM, HD, etc. and to turn them on.
BigQuery doesn't care. Use it whenever you want, no provisioning needed.

Hourly costs when doing nothing:

Redshift will ask you to pay per hour of each of these servers running, even when you are doing nothing.
When idle BigQuery only charges you $0.02 per month per GB stored. 2 cents per month per GB, that's it.
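
To make the idle-cost figure concrete, here is a minimal sketch of the arithmetic in Python. It uses only the $0.02 per GB per month storage rate quoted above; the function name and the 500 GB dataset are made up for illustration, and the rate should be treated as a snapshot rather than current pricing.

```python
# Storage rate quoted in this answer; a snapshot, not necessarily current pricing.
STORAGE_USD_PER_GB_MONTH = 0.02


def idle_monthly_cost(stored_gb: float, rate: float = STORAGE_USD_PER_GB_MONTH) -> float:
    """Monthly bill for a dataset that is stored but never queried."""
    if stored_gb < 0:
        raise ValueError("stored_gb must be non-negative")
    return round(stored_gb * rate, 2)


if __name__ == "__main__":
    # Hypothetical 500 GB dataset sitting idle for a month.
    print(idle_monthly_cost(500))
```

A provisioned cluster has no equivalent formula based on stored volume alone, since its bill depends on node-hours rather than on how much data is sitting there.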

Speed of queries:

Redshift performance is limited by the number of CPUs you are paying for.
BigQuery transparently brings in as many resources as needed to run your query in seconds.

Indexing:

Redshift will ask you to index (correction: distribute) your data under certain criteria, and you'll only be able to run fast queries based on this index.
BigQuery has no indexes. Every operation is fast.

Vacuuming:

Redshift requires periodic maintenance and 'vacuum' operations that last hours. You are paying for each of these server hours.
BigQuery does not. Forget about 'vacuuming'.

Data partitioning and distributing:

Redshift requires you to think about how to distribute data within your servers to keep performance up - optimization that works only for certain queries.
BigQuery does not. Just run whatever query you want.
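
The point about distribution helping only certain queries can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Redshift's actual mechanism: `zlib.crc32` stands in for whatever hash the engine uses internally, and the `orders`/`users` tables, column names, and function names are hypothetical. It counts how many matching row pairs in a join sit on different slices and would therefore need data movement.

```python
import zlib


def slice_for(value, num_slices: int) -> int:
    """Map a distribution-key value to a slice (crc32 is a stand-in hash)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_slices


def count_remote_matches(left, left_dist_col, right, right_dist_col, join_col, num_slices):
    """Count joined row pairs whose two rows live on different slices."""
    remote = 0
    for l_row in left:
        l_slice = slice_for(l_row[left_dist_col], num_slices)
        for r_row in right:
            if l_row[join_col] != r_row[join_col]:
                continue
            if slice_for(r_row[right_dist_col], num_slices) != l_slice:
                remote += 1
    return remote


if __name__ == "__main__":
    users = [{"user_id": u} for u in range(1, 6)]
    orders = [{"order_id": 100 + i, "user_id": 1 + i % 5} for i in range(20)]

    # Both tables distributed on the join column: every match is co-located.
    print(count_remote_matches(orders, "user_id", users, "user_id", "user_id", 4))

    # Orders distributed on order_id instead: matches may land on other slices.
    print(count_remote_matches(orders, "order_id", users, "user_id", "user_id", 4))
```

When both tables are distributed on the join column, the first count is always zero because equal keys hash to the same slice. A query that joins on any other column gets no such guarantee, which is the trade-off described above.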

Streaming live data:

Impossible(?) with Redshift.
BigQuery easily handles ingesting up to 100,000 rows per second per table.

Growing your cluster:

If you have more data or more concurrent users, scaling up will be painful with Redshift.
BigQuery will just work.

Multi zone:

You want a multi-zone Redshift for availability and data integrity? Painful.
BigQuery is multi-zoned by default.

To try BigQuery you don't need a credit card or any setup time. Just try it (quick instructions to try BigQuery).

When you are ready to put your own data into BigQuery, just copy your newline-delimited JSON logs to Google Cloud Storage and import them.
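
"Newline-delimited JSON" here means one complete JSON object per line. A minimal sketch of producing that format with the Python standard library follows; the log fields and the function name are hypothetical, and the upload to Google Cloud Storage and the import itself are not shown.

```python
import json


def to_ndjson(records) -> str:
    """Serialize records as one compact JSON object per line."""
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    return "".join(json.dumps(rec, separators=(",", ":")) + "\n" for rec in records)


if __name__ == "__main__":
    # Hypothetical log entries, purely for illustration.
    logs = [
        {"ts": "2015-09-04T12:00:00Z", "path": "/home", "status": 200},
        {"ts": "2015-09-04T12:00:01Z", "path": "/login", "status": 302},
    ]
    print(to_ndjson(logs), end="")
```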

See this in-depth guide to data warehouse pricing on the cloud: Understanding Cloud Pricing Part 3.2 - More Data Warehouses.

186

answered Oct 21 '22 02:10

Felipe Hoffa

Amazon Redshift is a standard SQL database (based on Postgres) with MPP features that allow it to scale. These features also require you to conform your data model somewhat to get the best performance. It supports a large amount of the SQL standard and most tools that can speak to Postgres can use it unchanged.

BigQuery is not a database, in the sense that it doesn't use standard SQL and doesn't provide JDBC/ODBC connectivity. It's a unique service with its own API and interfaces. It provides limited support for SQL queries, but most users interact with it via custom code (Java, Python, etc.). Some 3rd party tools have added support for BigQuery, but existing tools will not work without modification.

tl;dr - Redshift is better for interacting with existing tools and using complex SQL. BigQuery is better for custom coded interactions and teams who dislike SQL.

UPDATE 2017-04-17 - Here's a much more up to date summary of the cost and speed differences (wrapped in a sales pitch so YMMV). TL;DR - Redshift is usually faster and will be cheaper if you query the data somewhat regularly. http://blog.panoply.io/a-full-comparison-of-redshift-and-bigquery

UPDATE - Since I keep getting down votes on this (🤷‍♂️) here's an up-to-date response to the items in the other answer:

Sizing your cluster:

Redshift allows you to tailor your costs to your usage. If you want the fastest possible queries choose SSD nodes and if you want the lowest possible cost per GB choose HDD nodes. Start small and add nodes whenever you want.

Hourly costs when doing nothing:

Redshift keeps your cluster ready for queries, can respond in milliseconds (result cache), and provides a simple, predictable monthly bill.
For example, even if some script accidentally runs 10,000 giant queries over the weekend your Redshift bill will not increase at all.
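
The trade-off between a flat bill and a usage-based one can be sketched as a break-even calculation. Everything numeric below is a placeholder apart from the $0.02 per GB per month storage rate quoted in the other answer: the node price, node count, and per-TB scan rate are assumptions chosen only to make the arithmetic visible, and the function names are hypothetical.

```python
def flat_monthly_cost(node_count: int, usd_per_node_hour: float, hours: float = 730.0) -> float:
    """Provisioned model: pay for every node-hour, regardless of query volume."""
    return node_count * usd_per_node_hour * hours


def usage_monthly_cost(stored_gb: float, usd_per_gb_month: float,
                       tb_scanned: float, usd_per_tb_scanned: float) -> float:
    """Usage model: pay for storage plus the data scanned by queries."""
    return stored_gb * usd_per_gb_month + tb_scanned * usd_per_tb_scanned


def breakeven_tb_scanned(flat_cost: float, stored_gb: float,
                         usd_per_gb_month: float, usd_per_tb_scanned: float) -> float:
    """TB scanned per month at which both models cost the same."""
    remaining = flat_cost - stored_gb * usd_per_gb_month
    return max(remaining, 0.0) / usd_per_tb_scanned


if __name__ == "__main__":
    # Placeholder rates, NOT real prices: 2 nodes at $0.25/hour, $5 per TB scanned.
    flat = flat_monthly_cost(node_count=2, usd_per_node_hour=0.25)
    print(flat)
    print(breakeven_tb_scanned(flat, stored_gb=1000, usd_per_gb_month=0.02,
                               usd_per_tb_scanned=5.0))
```

Below the break-even scan volume the usage model is cheaper; above it the flat bill wins, and a runaway script only affects the usage-based bill.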

Speed of queries:

Redshift performance is absolutely best in class and gets faster all the time. 3-5x faster in the last 6 months.

Indexing:

Redshift has no indexes. It allows you to define sort keys to optimize performance from fast to insanely fast.

Vacuuming:

Redshift now automatically runs routine maintenance such as ANALYZE and VACUUM DELETE when your cluster has free resources.

Data partitioning and distributing:

Redshift never requires distribution. It allows you to define distribution keys which can make even huge joins very fast.
{Ask competitors about join performance…}
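
For readers who have not seen them, distribution keys and sort keys are declared as table attributes in the CREATE TABLE statement. The sketch below is a small Python helper that assembles such a statement as a string; the helper itself, the `orders` table, and its columns are hypothetical, and it does not connect to a cluster or validate column types.

```python
def create_table_ddl(table, columns, dist_key=None, sort_keys=()):
    """Build a CREATE TABLE statement with optional DISTKEY and SORTKEY attributes.

    columns is a list of (name, sql_type) pairs.
    """
    col_names = [name for name, _ in columns]
    for key in ([dist_key] if dist_key else []) + list(sort_keys):
        if key not in col_names:
            raise ValueError(f"unknown column: {key}")

    cols = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    ddl = f"CREATE TABLE {table} (\n{cols}\n)"
    if dist_key:
        # Rows with the same dist_key value are stored together.
        ddl += f"\nDISTSTYLE KEY\nDISTKEY ({dist_key})"
    if sort_keys:
        # Rows are kept ordered on disk by these columns.
        ddl += f"\nSORTKEY ({', '.join(sort_keys)})"
    return ddl + ";"


if __name__ == "__main__":
    # Hypothetical fact table: distribute on the join column, sort on the filter column.
    print(create_table_ddl(
        "orders",
        [("order_id", "BIGINT"), ("user_id", "BIGINT"), ("created_at", "TIMESTAMP")],
        dist_key="user_id",
        sort_keys=("created_at",),
    ))
```

Leaving both arguments out produces a plain CREATE TABLE, which matches the point that neither key is required.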

Streaming live data:

Redshift has two choices:
- Stream real-time data into Redshift using Amazon Kinesis Firehose.
- Skip ingestion altogether by querying your real-time data on S3 as soon as it lands (and at high speeds) using Redshift Spectrum external tables.

Growing your cluster:

Redshift can elastically resize most clusters in a few minutes.

Multi zone:

Redshift seamlessly replaces any failed hardware and continuously backs up your data, including across regions if desired.

answered Oct 21 '22 03:10

Joe Harris
