I am working on exporting a large dataset from BigQuery to Google Cloud Storage in compressed format. Google Cloud Storage has a file size limitation for me (maximum 1 GB per file), so I am using the split and compression options while exporting. The sample code is as follows:
import logging

from google.cloud import bigquery
from google.cloud import storage

# project, dataset_id, table_id and bucket_name are defined elsewhere in my script
bigquery_client = bigquery.Client(project=project)
storage_client = storage.Client(project=project)

gcs_destination_uri = 'gs://{}/{}'.format(bucket_name, 'wikipedia-*.csv.gz')
gcs_bucket = storage_client.get_bucket(bucket_name)

# Job config: gzip-compress each exported shard
job_config = bigquery.job.ExtractJobConfig()
job_config.compression = bigquery.Compression.GZIP


def bigquery_datalake_load():
    dataset_ref = bigquery_client.dataset(dataset_id, project=project)
    table_ref = dataset_ref.table(table_id)
    table = bigquery_client.get_table(table_ref)  # API request
    row_count = table.num_rows

    extract_job = bigquery_client.extract_table(
        table_ref,
        gcs_destination_uri,
        location='US',
        job_config=job_config)  # API request
    logging.info('BigQuery extract started... Wait for the job to complete.')
    extract_job.result()  # Waits for the job to complete.
    print('Exported {}:{}.{} to {}'.format(
        project, dataset_id, table_id, gcs_destination_uri))
# [END bigquery_extract_table]
This code splits the large dataset and compresses it into .gz format, but it returns many compressed files whose sizes range between 40 MB and 70 MB.
I am trying to generate compressed files of about 1 GB each. Is there any way to achieve this?
If your data has more than 16,000 rows, you'd need to save the result of your query as a BigQuery table. Afterwards, export the data from that table into Google Cloud Storage using any of the available options (such as the Cloud Console, the API, the bq command-line tool, or the client libraries).
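As a rough sketch of that approach with the Python client library (reusing bigquery_client, project, dataset_id and bucket_name from the question; the destination table name query_result_table and the sample query are illustrative, not part of the original code):

from google.cloud import bigquery

# Materialize the query result into a destination table first.
destination = bigquery.TableReference.from_string(
    '{}.{}.query_result_table'.format(project, dataset_id))
query_job_config = bigquery.QueryJobConfig(
    destination=destination,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE)
query_job = bigquery_client.query(
    'SELECT * FROM `bigquery-public-data.samples.wikipedia`',
    job_config=query_job_config)
query_job.result()  # Wait until the destination table has been written.

# Then export the materialized table to Cloud Storage, gzip-compressed and sharded.
extract_job_config = bigquery.job.ExtractJobConfig()
extract_job_config.compression = bigquery.Compression.GZIP
extract_job = bigquery_client.extract_table(
    destination,
    'gs://{}/wikipedia-*.csv.gz'.format(bucket_name),
    location='US',
    job_config=extract_job_config)
extract_job.result()  # Wait for the export to complete.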
BigQuery stores table data in columnar format, meaning it stores each column separately. Column-oriented databases are particularly efficient at scanning individual columns over an entire dataset.
BigQuery supports querying Cloud Storage data in the following formats: comma-separated values (CSV), newline-delimited JSON, and Avro.
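For completeness, here is a hedged sketch of querying the gzip-compressed CSV shards in Cloud Storage directly as an external table definition (the table alias wikipedia_ext is made up; bigquery_client and bucket_name are reused from the question):

from google.cloud import bigquery

external_config = bigquery.ExternalConfig('CSV')
external_config.source_uris = ['gs://{}/wikipedia-*.csv.gz'.format(bucket_name)]
external_config.compression = 'GZIP'   # The shards are gzip-compressed CSV.
external_config.autodetect = True      # Let BigQuery infer the schema.

job_config = bigquery.QueryJobConfig(
    table_definitions={'wikipedia_ext': external_config})
query_job = bigquery_client.query(
    'SELECT COUNT(*) AS row_count FROM wikipedia_ext',
    job_config=job_config)
for row in query_job.result():
    print(row.row_count)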
Unfortunately no. Google adjusts the file size by itself; there is no option to specify it. I believe this is because of the size of the uncompressed data (each BigQuery worker produces one file, and it is impossible to produce one file from multiple workers).
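To illustrate, a short snippet (reusing gcs_bucket from the question) that lists the shards BigQuery actually produced together with their sizes, which the service chooses on its own:

# Inspect the exported shards; their sizes are picked by BigQuery, not by us.
for blob in gcs_bucket.list_blobs(prefix='wikipedia-'):
    print('{}: {:.1f} MB'.format(blob.name, blob.size / (1024 * 1024)))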