 

BigQuery - Transfers automation from Google Cloud Storage - Overwrite table

Here's the case:

  • Our client uploads CSVs to a Google Cloud Storage bucket daily, overwriting the previous day's files (one file per table).
  • We use BigQuery as a data source in Data Studio.
  • We want the CSVs to be transferred to BigQuery automatically.

The thing is, even though we've:

  1. Declared the tables in BigQuery with the "Overwrite table" write preference, and
  2. Configured the daily transfers via the UI (BigQuery > Transfers) to load the CSVs from Google Cloud Storage one hour after the files are uploaded, as the documented limitations require,

the automated transfer/load still runs as WRITE_APPEND, so the tables get appended to instead of overwritten in BigQuery.

Hence the question: how/where can we set

configuration.load.writeDisposition = WRITE_TRUNCATE

as stated here, in order to overwrite the tables when the CSVs are loaded automatically?
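
For reference, this is where that property lives when you run a load job yourself rather than through the Transfer Service. A minimal sketch with the google-cloud-bigquery Python client; the project, dataset, table and bucket names are placeholders:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# WRITE_TRUNCATE is the load-job equivalent of "Overwrite table":
# each load replaces the destination table's contents.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # assumes the CSVs have a header row
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/my-table.csv",      # placeholder source file
    "my-project.my_dataset.my_table",   # placeholder destination
    job_config=job_config,
)
load_job.result()  # wait for the load to finish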

I think that's what we're missing.

Cheers.

asked Jan 02 '19 by gabi493


2 Answers

None of the above worked for us, so I'm posting this in case anyone has the same issue.

We scheduled a query to erase the table contents just before the automatic import starts:

DELETE FROM `project.dataset.tableName` WHERE true

The new data is then imported into an empty table, so the default WRITE_APPEND no longer affects us.
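
If you prefer to create that scheduled query programmatically rather than through the UI, something along these lines should work. A sketch using the google-cloud-bigquery-datatransfer Python client; the project ID, table path and schedule are assumptions to adapt:

from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

parent = transfer_client.common_project_path("my-project")  # placeholder

# Schedule the DELETE to run safely before the GCS transfer kicks in.
transfer_config = bigquery_datatransfer.TransferConfig(
    display_name="Empty table before daily CSV load",
    data_source_id="scheduled_query",
    params={"query": "DELETE FROM `my-project.my_dataset.my_table` WHERE true"},
    schedule="every day 06:30",  # placeholder; pick a time before the transfer
)

transfer_config = transfer_client.create_transfer_config(
    parent=parent,
    transfer_config=transfer_config,
)
print("Created scheduled query:", transfer_config.name)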

answered Oct 04 '22 by gabi493


1) One way to do this is to use a CREATE OR REPLACE TABLE DDL statement to recreate your table before the load that imports the data runs.

This is an example of how to create (or replace) a table:

#standardSQL
CREATE OR REPLACE TABLE mydataset.top_words
OPTIONS(
  description="Top ten words per Shakespeare corpus"
) AS
SELECT
  corpus,
  ARRAY_AGG(STRUCT(word, word_count) ORDER BY word_count DESC LIMIT 10) AS top_words
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus;

Now that it's created, you can import your data.
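
Put together in Python, the whole flow might look roughly like this: recreate the table with DDL, then load the CSV into the now-empty table. The names and the two-column schema are placeholders:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder

# Step 1: recreate the destination table, discarding the old contents.
client.query(
    "CREATE OR REPLACE TABLE `my-project.my_dataset.my_table` "
    "(name STRING, value INT64)"  # placeholder schema
).result()

# Step 2: the default WRITE_APPEND now appends into an empty table.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # assumes a header row
)
client.load_table_from_uri(
    "gs://my-bucket/my-table.csv",    # placeholder source
    "my-project.my_dataset.my_table",
    job_config=job_config,
).result()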

2) Another way is to use BigQuery scheduled queries.

3) If you write Python, you can find an even better solution here.

answered Oct 04 '22 by Tamir Klein