
Creating a public dataset (or: split storage costs and compute costs across two projects)

I would like to use BigQuery to host datasets that others can query without incurring processing charges against my project. I understand that when I upload a dataset to a project, the storage costs are associated with the project. I want others to be able to discover my dataset, access it via their project/account (preferably without my intervention), and run as many queries on it as they choose to pay for. So, storage costs would go to me, but compute costs would go to those who run the queries.

Is there a way to do this in BigQuery? I asked this via the Google Cloud enterprise sales web form but did not get a response.

loren asked Feb 19 '13 21:02




1 Answer

Yes. You can make a dataset public so that it can be queried from other projects, or share it only with a specific domain, group, or user.
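As a rough illustration of what such sharing looks like, the sketch below builds the `access` list of a dataset resource in the shape used by the BigQuery REST API (entries keyed by `specialGroup`, `domain`, `groupByEmail`, or `userByEmail`). It only constructs the request body and sends nothing. The helper names `share_entry` and `add_access`, and the `example.com` addresses, are made up for this example. As far as I know, updating a dataset replaces the whole `access` list, so the existing entries are kept and the new ones appended.

```python
import json

# Maps a friendly grantee kind to the key used in a dataset `access` entry.
GRANTEE_KEYS = {
    "user": "userByEmail",
    "group": "groupByEmail",
    "domain": "domain",
    "public": "specialGroup",
}


def share_entry(kind, value, role="READER"):
    """Build one entry of a dataset's `access` list (hypothetical helper)."""
    if kind not in GRANTEE_KEYS:
        raise ValueError(f"unknown grantee kind: {kind}")
    return {"role": role, GRANTEE_KEYS[kind]: value}


def add_access(existing, entry):
    """Return a new list with `entry` appended unless it is already present."""
    return list(existing) if entry in existing else list(existing) + [entry]


# Assumed starting ACL: only the dataset owner has access.
access = [{"role": "OWNER", "userByEmail": "owner@example.com"}]

# Public read access for any signed-in Google account.
access = add_access(access, share_entry("public", "allAuthenticatedUsers"))
# Read access restricted to one domain instead (or in addition).
access = add_access(access, share_entry("domain", "example.com"))

# Body that would be sent when updating the dataset's access list.
patch_body = {"access": access}
print(json.dumps(patch_body, indent=2))
```

The `READER` role only lets others read and query the tables; it does not let them modify the dataset.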

In this model, queries are billed to the project in which each user runs them, while your project pays for storing the dataset. If users in another project save their query results as tables, they pay for that storage themselves.
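The cost split comes from where the query job runs, not from where the table lives. A minimal sketch, assuming the REST `jobs.query` endpoint: the project ID in the URL is the one the job runs in and is billed to, while the table is addressed by its fully qualified name in the owner's project. The function `build_query_request` and all project, dataset, and table names below are hypothetical, and the request is only constructed, not sent.

```python
def build_query_request(billing_project, owner_project, dataset, table):
    """Build the URL and JSON body for a query billed to `billing_project`.

    The table is read from `owner_project`, which pays only for storage.
    """
    # The project in the URL path is where the query job runs (and is billed).
    url = (
        "https://bigquery.googleapis.com/bigquery/v2/"
        f"projects/{billing_project}/queries"
    )
    # Standard SQL addresses a table in another project as `project.dataset.table`.
    sql = f"SELECT COUNT(*) AS n FROM `{owner_project}.{dataset}.{table}`"
    body = {"query": sql, "useLegacySql": False}
    return url, body


# Hypothetical names: the querying user's project and the data owner's project.
url, body = build_query_request(
    billing_project="my-billing-project",
    owner_project="data-owner-project",
    dataset="shared_dataset",
    table="events",
)
print(url)
print(body["query"])
```

An actual call would also need an OAuth access token for an account that has read access to the shared dataset.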

BigQuery currently doesn't provide a mechanism for public dataset discovery. You would have to share the details of your project's public dataset(s) yourself. The GitHub Archive project has a good example of this.

Michael Manoochehri answered Oct 04 '22 04:10