
Google BigQuery pricing

I'm a PhD student from Singapore Management University, currently working at Carnegie Mellon University on a research project that needs the historical events from GitHub Archive (http://www.githubarchive.org/). I noticed that Google BigQuery has the GitHub Archive data, so I ran a program to crawl data using the Google BigQuery service.

I just found out that the price Google BigQuery shows on the console is not updated in real time... After I had been running the program for a few hours, the fee was only a little over $4, so I thought the price was reasonable and I kept the program running. After 1~2 days, I checked the price again on Sep 13, 2013, and it had become $1,388... I therefore immediately stopped using the Google BigQuery service. And just now I checked the price again; it turns out I need to pay $4,179...

It is my fault that I didn't realize I would need to pay such a large amount of money for executing queries and obtaining data from Google BigQuery.

This project is only for research, not for commercial purposes. I would like to know whether it is possible to waive the fee. I really need the [Google BigQuery team]'s kind help.

Thank you very much & Best Regards, Lisa

asked Sep 16 '13 by dodoro

People also ask

Is Google BigQuery free?

BigQuery's free tier has two components: one for storage (10 GB) and one for analysis (1 TB/month). If you keep your usage under those limits, you'll never be charged.

Is BigQuery cheap?

There are two components to BigQuery pricing: storage and queries. BigQuery's storage charges are incredibly cheap. It costs two cents per gigabyte per month, which is the same price as Cloud Storage Standard.

Is Google BigQuery open source?

As with all other Google Cloud public datasets, users can query up to 1 TB/month and store up to 10 GB/month at no charge through BigQuery's free tier.


1 Answer

Update, a year later:

Please note some big developments since this situation:

  • Query prices are down 85%.
  • GitHub Archive now publishes daily and yearly tables, so while developing your queries, always test them on these smaller datasets (see the sketch below).
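
For instance, with the current Python client you might point a development query at a single day's table. A minimal sketch, assuming a day-sharded layout named githubarchive.day.YYYYMMDD (that naming is an assumption; check githubarchive.org for the exact dataset names):

from google.cloud import bigquery

client = bigquery.Client()
# Assumed day-sharded table name (githubarchive.day.YYYYMMDD); verify the
# exact dataset layout on githubarchive.org before relying on it.
sql = """
    SELECT type, COUNT(*) AS events
    FROM `githubarchive.day.20150101`
    GROUP BY type
    ORDER BY events DESC
"""
for row in client.query(sql).result():
    print(row.type, row.events)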

BigQuery pricing is based on the amount of data queried. One of its highlights is how easily it scales, going from scanning a few gigabytes to terabytes in seconds.

Pricing that scales linearly is a feature: most (or all?) other databases I know of would require exponentially more expensive resources to handle these amounts of data, or simply couldn't handle them at all - at least not in a reasonable time frame.

That said, linear scaling means that a query over a terabyte is 1,000 times more expensive than a query over a gigabyte. BigQuery users need to be aware of this and plan accordingly. For this purpose BigQuery offers the "dry run" flag, which shows exactly how much data a query will scan before you run it - so you can adjust accordingly.
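
With today's Python client, a dry run looks roughly like this (a sketch; this client library postdates the original answer, and the timeline table reference is the one from this answer, which may no longer exist):

from google.cloud import bigquery

client = bigquery.Client()
# dry_run=True asks BigQuery to estimate how many bytes the query would
# scan, without actually running it and without charging for it.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(
    "SELECT repository_url FROM `githubarchive.github.timeline`",
    job_config=job_config,
)
print(f"Would scan {job.total_bytes_processed / 1e9:.2f} GB")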

In this case WeiGong was querying a 105 GB table. Just ten SELECT * LIMIT 10 queries will amount to over a terabyte of data scanned (10 × 105 GB ≈ 1 TB - LIMIT does not reduce the amount of data a query reads), and so on.

There are ways to make these same queries consume much less data:

  • Instead of querying SELECT * LIMIT 10, select only the columns you are looking for. BigQuery charges based on the columns you scan, so unnecessary columns add unnecessary cost.

For example, SELECT * queries 105 GB, while the following query only goes through 8.72 GB, making it more than ten times less expensive:

SELECT repository_url, repository_name, payload_ref_type, payload_pull_request_deletions
FROM [githubarchive:github.timeline]

  • Instead of "SELECT *" use tabledata.list when looking to download the whole table. It's free.

  • The GitHub Archive table contains data for all time. Partition it if you only want to look at one month of data.

For example, extracting all of the January data with a query leaves a new table of only 91.7 MB. Querying this table is a thousand times less expensive than querying the big one!

SELECT *
FROM [githubarchive:github.timeline]
WHERE created_at >= '2014-01-01' AND created_at < '2014-02-01'
-- save the result into a new table 'timeline_201401'
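
To actually materialize that result as a table, one option with the modern Python client is a destination-table query. A sketch, where the destination project and dataset are placeholders and the legacy timeline table may since have been replaced:

from google.cloud import bigquery

client = bigquery.Client()
# "my-project.my_dataset" is a placeholder; write to a dataset you own.
job_config = bigquery.QueryJobConfig(
    destination="my-project.my_dataset.timeline_201401",
    write_disposition="WRITE_TRUNCATE",  # overwrite the table if it exists
)
sql = """
    SELECT *
    FROM `githubarchive.github.timeline`
    WHERE created_at >= '2014-01-01' AND created_at < '2014-02-01'
"""
client.query(sql, job_config=job_config).result()  # wait for completion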

Combining these methods you can go from a $4,000 bill to a $4 one, for the same amount of quick and insightful results.

(I'm working with GitHub Archive's owner to get them to store monthly data instead of one monolithic table, to make this even easier.)

answered Sep 28 '22 by Felipe Hoffa