
Read BigQuery table with Avro schema using Beam, Python

After upgrading the apache-beam SDK from 2.5.0 to 2.12.0, I get an Avro schema error when reading a table from BigQuery in Beam using Python.

The BigQuery table has one TIMESTAMP field; all other fields are STRING.

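import apache_beam as beam

# `pipe` is the beam.Pipeline; `args` holds the parsed command-line options.
data = (
    pipe
    | 'read bigquery' >> beam.io.Read(
        beam.io.BigQuerySource(
            dataset=args.dataset_name,
            table=args.table_name,
            use_standard_sql=True)))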

Error:

SchemaParseException: Type property "[u'null', {u'logicalType': u'timestamp-micros', u'type': u'long'}]" not a valid Avro schema: Union item must be a valid Avro schema: Currently does not support timestamp-micros logical type

Packages installed:

python=2.7.0, apache-beam=2.12.0, avro=1.9.0

Marina asked May 02 '26 22:05


1 Answer

This is a regression in avro 1.9.0. It is tracked here: https://issues.apache.org/jira/browse/AVRO-2429

If you are on Python 2, you should be able to downgrade to 1.8.2 with pip install "avro==1.8.2". If you are on Python 3, I believe Beam should use fastavro by default, which should not have the bug you are running into.
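
To confirm which avro release an environment actually has before downgrading, a small check along these lines may help. This is only a sketch: the helper names (is_affected, installed_avro_version, check_avro) are made up for illustration, the affected set contains only 1.9.0 because that is the version named above, and it assumes the package is installed under the distribution name "avro".

```python
# Sketch: report whether the installed avro release is the one with the regression.
# Assumption: only avro 1.9.0 is affected, as stated in the answer above.

AFFECTED_AVRO_VERSIONS = {"1.9.0"}


def is_affected(version):
    """Return True if the given avro version string is in the affected set."""
    return version.strip() in AFFECTED_AVRO_VERSIONS


def installed_avro_version():
    """Return the installed avro version string, or None if it cannot be found."""
    try:
        # Available on Python 3.8 and later.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Fallback for older interpreters, including Python 2.7.
        try:
            import pkg_resources
        except ImportError:
            return None
        try:
            return pkg_resources.get_distribution("avro").version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version("avro")
    except PackageNotFoundError:
        return None


def check_avro():
    """Return a one-line, human-readable status message."""
    found = installed_avro_version()
    if found is None:
        return "avro is not installed"
    if is_affected(found):
        return 'avro {} is affected; consider: pip install "avro==1.8.2"'.format(found)
    return "avro {} is not in the affected set".format(found)


if __name__ == "__main__":
    print(check_avro())
```

Running this in the same virtualenv that launches the pipeline shows whether the downgrade is needed there.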

Scott answered May 05 '26 11:05


