
python - deserialise avro byte logical type decimal to decimal

I am trying to read an Avro file using the Python avro library (Python 2). When I use the following code:

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter, BinaryDecoder
reader = DataFileReader(open("filename.avro", "rb"), DatumReader())
schema = reader.meta

Then it reads every column correctly, except for one which remains as bytes, rather than the expected decimal values.

How can I convert this column to the expected decimal values? I notice that the file's metadata identifies the column as 'type': 'bytes', but 'logicalType': 'decimal'.

I post below the metadata for this column, as well as the byte values (the expected values are all multiples of 1,000 and less than 25,000). The file was created using Kafka.

Metadata:

{
    "name": "amount",
    "type": {
        "type": "bytes",
        "scale": 8,
        "precision": 20,
        "connect.version": 1,
        "connect.parameters": {
            "scale": "8",
            "connect.decimal.precision": "20"
        },
        "connect.name": "org.apache.kafka.connect.data.Decimal",
        "logicalType": "decimal"
    }
}

Byte values:

'E\xd9d\xb8\x00'
'\x00\xe8\xd4\xa5\x10\x00'
'\x01\x17e\x92\xe0\x00'
'\x01\x17e\x92\xe0\x00'

Expected values:

3,000.00
10,000.00
12,000.00
5,000.00

I need to use this within a Lambda function deployed on AWS, so I cannot use fastavro or other libraries that rely on C extensions rather than pure Python.

See:

https://pypi.org/project/fastavro/
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html

oli5679 asked Sep 02 '25 15:09


2 Answers

To do this you will need to use the fastavro library. Neither the avro nor the avro-python3 library supported logical types at the time of posting.

Scott answered Sep 05 '25 15:09


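You can use this to decode the byte string into a Decimal. It pads the value to the next-largest integer size that struct supports (1, 2, 4, or 8 bytes), sign-extending so that negative values of any length up to 8 bytes decode correctly. Values longer than 8 bytes raise ValueError.

import struct
from decimal import Decimal

def decode_decimal(value, num_places):
    """Decode big-endian two's-complement bytes into a Decimal with num_places decimal places."""
    value_size = len(value)
    # Sign-extend: pad with 0xff when the high bit is set (negative value), else with 0x00.
    pad_byte = b'\xff' if value[:1] >= b'\x80' else b'\x00'
    for fmt in ('>b', '>h', '>l', '>q'):
        fmt_size = struct.calcsize(fmt)
        if fmt_size >= value_size:
            padding = pad_byte * (fmt_size - value_size)
            int_value = struct.unpack(fmt, padding + value)[0]
            scale = Decimal('1') / (10 ** num_places)
            return Decimal(int_value) * scale
    raise ValueError('Could not unpack value')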

Example:

>>> decode_decimal(b'\x00\xe8\xd4\xa5\x10\x00', 8)
Decimal('10000.00000000')
>>> decode_decimal(b'\x01\x17e\x92\xe0\x00', 8)
Decimal('12000.00000000')
>>> decode_decimal(b'\xb2\xb4\xe7\x84', 4)  # Negative value
Decimal('-129676.7100')
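
The struct formats above stop at 8 bytes, but a decimal with precision 20 (as in the question's schema) can need 9 bytes for its unscaled value. A minimal sketch of a length-independent variant follows; the function name decode_decimal_any_length is made up for illustration, and the approach (hex-decoding the bytes, then correcting for two's complement) is one possible pure-Python route that should run on both Python 2 and 3, not something taken from the avro library.

```python
import binascii
from decimal import Decimal

def decode_decimal_any_length(value, scale):
    """Decode big-endian two's-complement bytes of any length into a Decimal.

    Hypothetical helper: `value` is the raw Avro bytes, `scale` the number
    of digits to the right of the decimal point taken from the schema.
    """
    if not value:
        raise ValueError('empty byte string')
    # Interpret the bytes as an unsigned big-endian integer.
    unscaled = int(binascii.hexlify(value), 16)
    # If the high bit of the first byte is set, the value is negative.
    if value[:1] >= b'\x80':
        unscaled -= 1 << (8 * len(value))
    # Build the Decimal from its parts so no context rounding is applied.
    parts = Decimal(unscaled).as_tuple()
    return Decimal((parts.sign, parts.digits, -scale))
```

With the first byte string from the question, decode_decimal_any_length(b'E\xd9d\xb8\x00', 8) gives Decimal('3000.00000000'), which matches the first expected value.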

Refs:

https://avro.apache.org/docs/1.10.2/spec.html#Decimal
https://docs.python.org/3/library/struct.html#format-characters
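
To apply this kind of decoding to whole records, the scale can be taken from each field's schema instead of being hard-coded. The sketch below assumes the field definitions are already available as a list of dicts shaped like the metadata in the question, and that the records are plain dicts as yielded by the reader. The names decimal_scales and decode_records are hypothetical, nullable (union) and nested types are not handled, and int.from_bytes makes this version Python 3 only.

```python
from decimal import Decimal

def unscaled_to_decimal(raw, scale):
    """Turn big-endian two's-complement bytes into a Decimal (Python 3 only)."""
    unscaled = int.from_bytes(raw, byteorder='big', signed=True)
    parts = Decimal(unscaled).as_tuple()
    # Construct from parts so the result is exact regardless of context precision.
    return Decimal((parts.sign, parts.digits, -scale))

def decimal_scales(field_schemas):
    """Map field name -> scale for fields declared as bytes + logicalType decimal."""
    scales = {}
    for field in field_schemas:
        ftype = field.get('type')
        if (isinstance(ftype, dict)
                and ftype.get('type') == 'bytes'
                and ftype.get('logicalType') == 'decimal'):
            # The Avro spec defaults scale to 0 when it is omitted.
            scales[field['name']] = int(ftype.get('scale', 0))
    return scales

def decode_records(records, scales):
    """Yield copies of the records with decimal fields decoded."""
    for record in records:
        decoded = dict(record)
        for name, scale in scales.items():
            if decoded.get(name) is not None:
                decoded[name] = unscaled_to_decimal(decoded[name], scale)
        yield decoded

# Example using the field definition from the question (Connect keys omitted).
fields = [{'name': 'amount',
           'type': {'type': 'bytes', 'scale': 8, 'precision': 20,
                    'logicalType': 'decimal'}}]
rows = [{'amount': b'\x00\xe8\xd4\xa5\x10\x00'}]
result = list(decode_records(rows, decimal_scales(fields)))
# result[0]['amount'] is Decimal('10000.00000000')
```

Keeping the schema lookup separate from the decoding means the same generator can wrap any iterable of records, for example the DataFileReader from the question.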

Ryan Anguiano answered Sep 05 '25 14:09