
BigQuery pricing variations?

We have been running BigQuery for about 4 months now, and we have some observations and questions regarding pricing:

  1. The pricing model is not well explained; it needs clarification for joins, unions, and so on.
  2. If I run a join against a small table, does my data processed increase for each copy of the small table that the BigQuery engine multiplies internally?
  3. If I reuse the same subquery inside a query, does that count as new data processing even though it is the same subquery? Can this be optimized in any way? Doesn't BigQuery cache its data?
  4. Is there a minimum size that data processing is rounded up to? Our observations suggest it is 4 MB; is this correct?
  5. In Google Console there is a problem with geo locations: it says most of the queries come from CA, which is incorrect since my company is in Europe and all development is done there. I checked all IP addresses and there is no CA IP. Is it possible to get a more detailed report?
  6. Regarding 5, we would like to get a history of queries and the price of each one. How can we do this? Must our own software do this, or is there a built-in way?
  7. Regarding 3, I have a huge problem with data processing (this is why I asked all these questions). My entire dataset is about 150 MB, yet my data-processed size per query is 3 times larger. Even if BigQuery read my entire dataset, it would not reach the sizes shown as data processed. In which cases is data re-read, causing new data processing, and how is it counted?

Thank you.

Regards, Tomislav

lord.fist asked Oct 22 '22 11:10


1 Answer

  1. The pricing is pretty simple for joins, unions, etc. Whatever source data needs to be read, gets charged exactly once. If you do a self-join, for example, you still only get charged for a single read of the data.
  2. See #1. Regardless of how BigQuery works internally, you should only get charged once.
  3. BigQuery does cache data on a best-effort basis. If you re-run a query, the cache is checked. If there is a cache hit, you will not get charged for re-running the query. Note that only full query results are cached; if you modify the query or use it in a subquery, the cache will not be used.
  4. BigQuery rounds up to 1MB increments, with a 10 MB minimum.
  5. I wasn't aware of any geo-location in Google Console. Where do you see this information?
  6. You can get a history of queries and the amount of data processed by each one via the jobs.list api.
  7. Can you send a job ID of a query you ran that was larger than expected? The calculation that is done is 8 bytes * (# of rows of integer and float columns read) + 2 bytes * (total length of all strings in string columns read).
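Putting #4 and #7 together, here is a rough back-of-the-envelope sketch. The 8-byte/2-byte weights and the 1 MB rounding with a 10 MB minimum come from the answers above; the column counts, row counts, and string lengths in the example are hypothetical.

```python
import math

MB = 1024 * 1024

def estimated_bytes_read(numeric_cols, rows, total_string_chars):
    """#7: 8 bytes per value in integer/float columns read,
    plus 2 bytes per character across all string columns read."""
    return 8 * numeric_cols * rows + 2 * total_string_chars

def billed_bytes(bytes_read):
    """#4: round up to 1 MB increments, with a 10 MB minimum per query."""
    return max(10 * MB, math.ceil(bytes_read / MB) * MB)

# Hypothetical query: 3 integer columns over 1M rows,
# plus 4,000,000 characters of string data.
raw = estimated_bytes_read(numeric_cols=3, rows=1_000_000,
                           total_string_chars=4_000_000)
print(raw, billed_bytes(raw))  # 32000000 bytes read, 32505856 billed
```

Note that under this rounding, a tiny query over a few kilobytes still bills as 10 MB, which may explain observations of a fixed minimum per query.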

Let me know if this answers your questions.
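For #6, a minimal sketch of extracting per-query data-processed figures from a jobs.list response. The field names (`jobReference.jobId`, `statistics.totalBytesProcessed`) follow the BigQuery REST API; the sample payload and job IDs below are made up for illustration, and a real response would come from calling the jobs.list method with your project ID.

```python
# Illustrative jobs.list-style payload; a real one is returned by the
# BigQuery REST API's jobs.list method for your project.
sample_response = {
    "jobs": [
        {
            "jobReference": {"jobId": "job_abc123"},
            "statistics": {"totalBytesProcessed": "31457280"},  # API returns strings
        },
        {
            "jobReference": {"jobId": "job_def456"},
            "statistics": {"totalBytesProcessed": "10485760"},
        },
    ]
}

def bytes_processed_by_job(response):
    """Map jobId -> bytes processed, skipping jobs without that statistic."""
    out = {}
    for job in response.get("jobs", []):
        job_id = job["jobReference"]["jobId"]
        stats = job.get("statistics", {})
        if "totalBytesProcessed" in stats:
            out[job_id] = int(stats["totalBytesProcessed"])
    return out

print(bytes_processed_by_job(sample_response))
# {'job_abc123': 31457280, 'job_def456': 10485760}
```

Multiplying each job's bytes processed by the per-TB rate gives the per-query price history the questioner asked for.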

Jordan Tigani answered Nov 11 '22 18:11