Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to structure views in bigquery for efficient access management

In BigQuery you give users/roles (or authorized views) access on dataset-level and not views/table-level. The challenge I want to address is how to manage access control in bigquery when I have hundreds of tables and views and many different roles/departments that should have access to both views shared across all departments and views only for a particular role/department?

Example: let's say I have a source dataset with source tables A->D and three views for each table exposing different fields based on sensitivity of data 1->3. Also, I have three roles (Blue, Green, Red). If I could manage access on table-level it would look like this:

View: roles

  • A1: Blue, Red

  • A2: Red

  • A3: Red

  • B1: Blue, Green, Red

  • B2: Green, Red

  • B3: Red

  • C1: Green, Red

  • C2: Green, Red

  • C3: Red

  • D1: Red

  • D2: Red

  • D3: Red

Given these requirements, I can't create datasets based only on sensitivity (1-3) or source (A-D) and manage access based on that. The only solution I can see that meet this is generating a dataset per role. This could be done manually if the number of roles and views are few, but when managing 10+ roles and 50+ views it becomes more challenging.

The only solution I can come up with is a CI/CD setup (cloud build) with file/s defining datasets (i.e. roles), dependencies and DDL-statement/s. Letting a script/program iterate through the file/s and generate views and give access (authorized view) to source. Example file:

{"roles":["crm_analyst", "admin", "customer_service_agent"],
"ddl":"CREATE VIEW `myproject.'{role}'.newview` AS SELECT column_1, column_2, column_3 FROM myproject.mydataset.myview",
"dependencies":"myproject.mydataset.myview"}

How do other companies solve this? There are large banks that have migrated to bigquery that must have loads of departments and different sensitivity of data sets.

like image 640
nDakota Avatar asked Jan 01 '26 22:01

nDakota


1 Answers

I ended up writing a python script that reads view definitions from json-files and then generate datasets and views and give correct access rights. The solution is a bit rough and could make use of dependency mapping (when a view queries another view) instead of the current solution iterating views until all views are generated or the script can't generate anymore views (broken dependencies). The script generates two datasets per group, one with READER (suffix '_ro') and one with WRITER (suffix '_rw') to make sure that views generated by data team can't be modified and at the same time give a sandbox for the group. The group should be an e-mail group and the name of datasets will be the local-part of the email address. The script is executed by google cloud build and triggered by a push to our github repo.

Example view definition (path: views/view_test.json)

{
    "groups":["[email protected]", "[email protected]"],
    "sql":"SELECT * FROM `{project}.shared_views.test_view`"
}

Generates the following datasets (access) and views:

analysts_ro ([email protected]:READER):
- view_test

analysts_rw ([email protected]:WRITER):
(empty)

developers_ro ([email protected]:READER):
- view_test

developers_rw ([email protected]:WRITER):
(empty)

shared_views (analysts_ro.view_test:None, developers_ro.view_test:None):
- test_view

I made the python script available on github as open source as part of datahem, feel free to clone, improve and use for your own purposes.

like image 156
nDakota Avatar answered Jan 04 '26 21:01

nDakota



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!