How to write query results to a Google Cloud Storage bucket directly?

from google.cloud import bigquery

# Parameterized query against the `emp` table
query = """SELECT * FROM emp WHERE emp_name = @emp_name"""
query_params = [bigquery.ScalarQueryParameter('emp_name', 'STRING', 'name')]

job_config = bigquery.QueryJobConfig()
job_config.query_parameters = query_params

client = bigquery.Client()
query_job = client.query(query, job_config=job_config)
result = query_job.result()  # waits for the query to complete

How can I write the result to Google Cloud Storage directly, instead of writing it to a CSV file and then uploading that file to a Cloud Storage bucket?

asked Jun 07 '18 by bobby j




2 Answers
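
You can use BigQuery's EXPORT DATA statement, which writes the result of a query directly to Cloud Storage as part of the SQL itself: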

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "dev-key.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

bq_export_to_gs = """
EXPORT DATA OPTIONS(
  uri='gs://my-bucket/logs/edo/dengg_audit/bq-demo/temp4/*',
  format='CSV',
  overwrite=true,
  header=false,
  field_delimiter='^') AS
SELECT col1, col2 FROM `project.schema.table` WHERE clientguid = '1234' LIMIT 10
"""

query_job = client.query(bq_export_to_gs)
results = query_job.result()  # waits for the export to complete
for row in results:
    print(row)
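
The `*` wildcard in the URI lets BigQuery shard the export across multiple files. Since EXPORT DATA runs as an ordinary query job, the parameterized pattern from the question should carry over unchanged; here is a minimal sketch, assuming the `emp` table and `@emp_name` parameter from the question and a placeholder bucket path:

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical parameterized export; gs://my-bucket/emp-export/* is a placeholder path.
export_sql = """
EXPORT DATA OPTIONS(
  uri='gs://my-bucket/emp-export/*',
  format='CSV',
  overwrite=true,
  header=true) AS
SELECT * FROM emp WHERE emp_name = @emp_name
"""

job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter('emp_name', 'STRING', 'name')]
)
client.query(export_sql, job_config=job_config).result()  # waits for the export to finish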

answered Oct 14 '22 by Dave


Depending on your specific use case (frequency of the exports, their size, and so on), the solutions proposed in the answer by @GrahamPolley may work for you, although they require more development and upkeep.

Currently, query results can only be written to a table or downloaded locally, and even downloading directly to CSV has some limitations. There is therefore no way to write query results to GCS in CSV format directly. However, there is a two-step workaround:

  1. Write the query results to a BQ table.
  2. Export the data from that BQ table to a CSV file in GCS. Note that this feature has some limitations too, but they are less restrictive.

The following Python code can give you an idea of how to perform that task:

from google.cloud import bigquery
client = bigquery.Client()

# Write query results to a new table
job_config = bigquery.QueryJobConfig()
table_ref = client.dataset("DATASET").table("TABLE")
job_config.destination = table_ref
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE

query_job = client.query(
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` LIMIT 10',
    location='US',  # must match the dataset's location
    job_config=job_config)
rows = list(query_job)  # Waits for the query to finish


# Export table to GCS
destination_uri = "gs://BUCKET/FILE.CSV"
dataset_ref = client.dataset("DATASET", project="PROJECT_ID")
table_ref = dataset_ref.table("TABLE")

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    location='US')
extract_job.result()  # Waits for job to complete
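
If you need control over the output (delimiter, header, compression), the extract job accepts a job config; a brief sketch, where the delimiter, header, and gzip settings are illustrative:

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.CSV,
    field_delimiter=',',
    print_header=True,
    compression=bigquery.Compression.GZIP,  # optional; drop for plain CSV
)
extract_job = client.extract_table(
    table_ref,
    destination_uri,
    location='US',
    job_config=job_config)
extract_job.result()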

Note that, once the export completes, you would have to delete the intermediate table (this can also be done programmatically). This may not be the best solution if you have to automate the process (if that is your use case, you may be better off exploring @Graham's solutions), but it will do the trick for a simple scenario.
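
For reference, a minimal sketch of that cleanup step, reusing the table_ref defined above:

client.delete_table(table_ref, not_found_ok=True)  # drop the intermediate table; ignore if already gone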

answered Oct 15 '22 by dsesto