Setting a clustering column in the BigQuery Python API

I'm trying to create a clustered table in BigQuery.

When I test it in the UI, it works perfectly:

CREATE OR REPLACE TABLE `project_id_xyz.temp.clustering`
PARTITION BY date
CLUSTER BY cluster_col AS
SELECT CURRENT_DATE() as date, 1 as cluster_col

However, when I try the same via google-cloud-bigquery==1.9.0 in Python (3.7.1), the table is created and partitioned but not clustered, as seen in the "Details" tab in the UI.

Here is the snippet I use to create the table.

from google.cloud import bigquery

client = bigquery.Client()

# Destination table: temp.clustering_test in the client's default project
dataset = client.dataset("temp")
table = dataset.table("clustering_test")

job_config = bigquery.QueryJobConfig()
job_config.destination = table
job_config.write_disposition = "WRITE_TRUNCATE"

# Partition on the "date" column and cluster on "cluster_col"
time_partitioning = bigquery.TimePartitioning()
time_partitioning.field = "date"
job_config.time_partitioning = time_partitioning
job_config.clustering_fields = ["cluster_col"]

sql = """
    SELECT CURRENT_DATE() as date, 1 as cluster_col
"""
query_job = client.query(
    sql,
    location='US',
    job_config=job_config)

query_job.result()  # wait for the query job to finish

The code seems very straightforward and doesn't throw any exceptions.

Is there anything obvious that I'm doing wrong?

Dimitri Masin asked Mar 18 '26 12:03


1 Answer

I ran your Python code and can confirm it works as expected: the table is created with the clustering settings applied.

I tested with Python 3.6.7. My suggestion is to start from a clean setup (for example, a fresh environment with the library reinstalled) and run your code again.
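
To check the result from code rather than from the "Details" tab, something along these lines may help. It is only a sketch: `describe_layout` and `fetch_layout` are hypothetical helper names, not part of the library, and `fetch_layout` assumes default credentials and the `temp.clustering_test` table from the question. The stub objects at the bottom only stand in for a real table so the helper can be tried offline.

```python
from types import SimpleNamespace


def describe_layout(table):
    """Summarise partitioning and clustering for a table-like object.

    Expects `time_partitioning` and `clustering_fields` attributes, which
    google.cloud.bigquery.Table exposes. (Hypothetical helper.)
    """
    partitioning = getattr(table, "time_partitioning", None)
    return {
        "partition_field": partitioning.field if partitioning else None,
        # clustering_fields is None when the table is not clustered
        "clustering_fields": list(table.clustering_fields or []),
    }


def fetch_layout(dataset_id="temp", table_id="clustering_test"):
    """Fetch table metadata from BigQuery and summarise it.

    Assumes application default credentials and a default project.
    The import is local so describe_layout can be used without the library.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    table_ref = bigquery.DatasetReference(client.project, dataset_id).table(table_id)
    return describe_layout(client.get_table(table_ref))


# Offline demonstration with stand-in objects instead of real tables.
clustered = SimpleNamespace(
    time_partitioning=SimpleNamespace(field="date"),
    clustering_fields=["cluster_col"],
)
not_clustered = SimpleNamespace(
    time_partitioning=SimpleNamespace(field="date"),
    clustering_fields=None,
)
print(describe_layout(clustered))
print(describe_layout(not_clustered))
```

Calling `fetch_layout()` after the query job finishes should show an empty `clustering_fields` list if the table really was created without clustering, which is the symptom described in the question.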

Tamir Klein answered Mar 21 '26 04:03


