
Python implementation of avro slow?

Tags: python, avro

I'm reading some data from an avro file using the avro library. It takes about a minute to load 33K objects from the file. This seems very slow to me, especially since the Java version reads the same file in about 1 second.

Here is the code. Am I doing something wrong?

import avro.datafile
import avro.io
from time import time

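def load(filename):
    # Start at 0 so an empty file returns 0 instead of raising a NameError.
    count = 0
    # The with block closes the file even if reading fails.
    with open(filename, "rb") as fo:
        reader = avro.datafile.DataFileReader(fo, avro.io.DatumReader())
        for count, record in enumerate(reader, 1):
            pass

    return count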

def main(argv=None):
    import sys
    from argparse import ArgumentParser

    argv = argv or sys.argv

    parser = ArgumentParser(description="Read avro file")
    # Parse the arguments so the parser is actually used (enables --help).
    parser.parse_args(argv[1:])

    start = time()
    num_records = load("events.avro")
    end = time()

    print("{0} records in {1} seconds".format(num_records, end - start))

if __name__ == "__main__":
    main()
lazy1 asked May 05 '11 21:05


1 Answer

It appears there is a Python package called fastavro, a fast Cython implementation of avro, though it is less feature-complete than the avro library.

https://bitbucket.org/tebeka/fastavro
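
As a rough sketch of what switching might look like, the load function from the question could be rewritten around fastavro.reader, which iterates over the records of an open avro file. This assumes fastavro is installed separately and that the file is still named events.avro; count_records, load_fast and main are illustrative names, not part of either library. No timing is claimed here, since the speedup will depend on the schema and the data.

```python
from time import perf_counter


def count_records(records):
    """Consume any iterable of records and return how many it yielded."""
    count = 0
    for count, _record in enumerate(records, 1):
        pass
    return count


def load_fast(filename):
    """Count the records in an avro file using fastavro instead of avro."""
    # Imported lazily so count_records stays usable without fastavro.
    # Assumption: fastavro has been installed separately.
    import fastavro

    with open(filename, "rb") as fo:
        # fastavro.reader yields each record as a plain dict.
        return count_records(fastavro.reader(fo))


def main(filename="events.avro"):
    """Time a full pass over the file, mirroring the script in the question."""
    start = perf_counter()
    num_records = load_fast(filename)
    elapsed = perf_counter() - start
    print("{0} records in {1:.2f} seconds".format(num_records, elapsed))
```

Calling main() next to the original script, on the same file, should give a like-for-like comparison of the two readers.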

Uri Laserson answered Oct 28 '22 14:10