In time-partitioned BigQuery tables, when is data written to UNPARTITIONED? What are the effects?

Tags:

google-bigquery

I ran into some odd, undocumented behavior of time-partitioned BigQuery tables:

I created a time-partitioned table in BigQuery and inserted data. I was able to insert normally: data was written to today's partition, and I was also able to explicitly specify a partition and write into it.

After some tests with new data, I deleted today's partition from the CLI in order to have clean data:

bq --project_id=my-project rm v1.mytable$20160613

I then checked whether it's empty:

select count(*) from [v1.mytable]

Result: 270 instead of 0.

I tried deleting again and rerunning the query - same result. So I queried

select count(*) from [v1.mytable$20160613]

Result: 0

I also queried a couple of previous dates on which I may have inserted data, but all returned 0. Finally I ran

SELECT partition_id from [v1.mytable$__PARTITIONS_SUMMARY__];

and the result was

{ UNPARTITIONED 20160609 20160613 }

and all the data was in fact in UNPARTITIONED.

My questions:

1. When is data written to this special partition instead of the daily partition, and how can I avoid this?
2. Are there other effects, apart from losing the ability to address specific dates (in queries, when deleting data, etc.)? Should I handle this case specially?

asked Jun 13 '16 14:06

Ran Avnimelech

2 Answers

While data is in the streaming buffer, it remains in the UNPARTITIONED partition. To address this partition in a query, you can use the value NULL for the _PARTITIONTIME pseudo column.

SELECT ... FROM mydataset.mypartitioned_table WHERE _PARTITIONTIME IS NULL

To delete data for a given partition, we suggest doing a write truncate to it with a query that returns an empty result. For example:

bq query --destination_table=mydataset.mypartitionedtable\$20160121 --replace 'SELECT 1 as field1, "one" as field2 FROM (SELECT 1 as field1, "one" as field2) WHERE FALSE'

Note that the partition will still be around (it still shows up if you do a SELECT * FROM [table$__PARTITIONS_SUMMARY__]), but it will have 0 rows.

$ bq query 'SELECT COUNT(*) from [mydataset.mypartitionedtable$20160121]'

+-----+
| f0_ |
+-----+
|   0 |
+-----+
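
One detail in the commands above is worth spelling out: the partition decorator contains a `$`, which the shell treats as the start of a parameter expansion unless it is escaped or single-quoted. The sketch below only builds strings and does not call `bq`; the table name is the one used in this answer, and the variable names are illustrative assumptions.

```shell
# Single quotes keep the partition decorator literal.
quoted='mydataset.mypartitionedtable$20160121'

# Inside double quotes, a backslash before $ does the same.
escaped="mydataset.mypartitionedtable\$20160121"

# Unquoted, the shell expands $2 (the second positional parameter,
# usually empty) and leaves "0160121" behind.
unquoted=mydataset.mypartitionedtable$20160121

printf '%s\n' "$quoted" "$escaped" "$unquoted"
```

The first two values are identical and safe to pass to `bq`; the third no longer names the intended partition.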

answered Jan 04 '23 16:01

Pavan Edara

This is a temporary state: when I queried again an hour later, all the records belonged to today's partition.

The effect is thus similar to a write delay: a query run immediately after a streaming insert may not yet see the newest rows in their daily partition, but they end up there once they leave the streaming buffer.
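
To watch this happen, one option is to count rows per partition and re-run the query until the NULL group disappears. This is a minimal sketch, assuming the table from the question (`v1.mytable`) and legacy SQL; the `count_by_partition` helper and the `pt` and `row_count` aliases are names made up for illustration.

```shell
# Table name taken from the question; adjust to your own dataset.table.
TABLE='v1.mytable'

# Legacy SQL: group by the _PARTITIONTIME pseudo column, selected under an alias.
# Rows still in the streaming buffer appear with a NULL partition time.
QUERY="SELECT _PARTITIONTIME AS pt, COUNT(*) AS row_count FROM [${TABLE}] GROUP BY pt ORDER BY pt"

# Hypothetical wrapper so the check can be repeated; defined here, not invoked.
count_by_partition() {
  bq query --use_legacy_sql=true "$QUERY"
}

# Print the query so it can be inspected before running it.
echo "$QUERY"
```

Calling `count_by_partition` shortly after a streaming insert and again some time later should show the rows moving from the NULL group into the daily partition.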

answered Jan 04 '23 17:01

Ran Avnimelech
