 

Compatibility of Avro dates and times with BigQuery?

BigQuery generally does a good job of loading Avro data, but "bq load" is having a lot of trouble with timestamps and other date/time fields that use the Avro logicalType attribute.

  1. Data with the Avro timestamp-millis logical type gets mangled: BigQuery TIMESTAMP interprets the values as microsecond timestamps, so they come out off by a factor of 1000 (a minimal schema that reproduces this is shown after this list).
  2. A timestamp-micros integer that loads fine into TIMESTAMP comes out as INVALID in a BigQuery DATETIME. I can't find an explanation of what would be valid at https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types
  3. Strings in ISO 8601 format can't be loaded into TIMESTAMP or DATETIME ("Incompatible types" error), although I believe BigQuery would accept them if I were loading plain JSON.
  4. The Avro "date" logical type fails to load into DATE (also "Incompatible types").
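
For reference, here is a minimal schema that reproduces problem 1 (the record and field names are just placeholders); the values are milliseconds since the epoch, which BigQuery reads as microseconds:

{
  "type" : "record",
  "name" : "Event",
  "fields" : [ {
    "name" : "ts",
    "type" : { "type" : "long", "logicalType" : "timestamp-millis" }
  } ]
}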

I suppose I could work around these problems by always loading the data into temporary staging fields and then using queries to CAST or otherwise transform them into additional fields (sketched below), but that approach doesn't scale well, doesn't handle schema evolution, and doesn't fit streaming. Producing data in Avro with well-defined schemas is supposed to avoid exactly that extra step of re-transforming the data for each consumer.
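
To illustrate the kind of workaround I mean (the dataset, table, and column names here are hypothetical), I would load the raw value into a staging column as a plain integer and then convert it in a query:

bq query --use_legacy_sql=false '
  SELECT TIMESTAMP_MILLIS(raw_ts) AS ts  -- raw_ts holds milliseconds since the epoch
  FROM mydataset.staging_events'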

Is BigQuery really this incompatible with Avro dates and times? (or am I doing something dumb)

Or is "bq load" the problem here? Is there a better way to load Avro data?

asked Dec 08 '22 by Kevin


2 Answers

Update: This feature is now supported; follow issuetracker.google.com/35905894 for more information.

As Hua said, Avro logical types are not supported in BigQuery, but the supported way to load Avro data with timestamps is to use the Avro LONG type to load into an existing BigQuery table that has a TIMESTAMP column. The values must be microseconds (not seconds or milliseconds) since the epoch. For example, the Avro file below has a LONG field with the value 1408452095000000, which represents "2014-08-19 12:41:35".

The Avro file's schema:

% avro-tools getschema ~/dataset/simple_timestamp.avro
{
  "type" : "record",
  "name" : "FullName",
  "fields" : [ {
    "name" : "t",
    "type" : "long"
  } ]
}
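
One way to produce such a file, as a rough sketch (the file names are just examples), is avro-tools fromjson, given the schema above saved as simple_timestamp.avsc and one JSON record per line:

% cat simple_timestamp.json
{"t": 1408452095000000}
% avro-tools fromjson --schema-file simple_timestamp.avsc simple_timestamp.json > simple_timestamp.avro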

Example of loading an Avro file to a table with a Timestamp field:

bq mk --schema t:TIMESTAMP -t vimota.simple_timestamp
bq load --source_format=AVRO vimota.simple_timestamp ~/dataset/simple_timestamp.avro
The output of "bq head vimota.simple_timestamp":

+---------------------+
|          t          |
+---------------------+
| 2014-08-19 12:41:35 |
+---------------------+
answered Jan 07 '23 by Victor Mota


Native understanding of Avro logical types is now publicly available to all BigQuery users. Please refer to the documentation page for more details: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types
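
As a rough sketch of what this looks like (the table and file names below are placeholders): declare the logical type in the Avro schema and pass the --use_avro_logical_types flag to bq load, so the column should come out as TIMESTAMP rather than INTEGER.

  { "name" : "t", "type" : { "type" : "long", "logicalType" : "timestamp-micros" } }

bq load --source_format=AVRO --use_avro_logical_types \
    mydataset.events_with_logical_types ./simple_timestamp_micros.avro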

answered Jan 07 '23 by Hua Zhang