I am trying to read Avro files using Python.
I installed Apache Avro successfully (I think I did, because I am able to "import avro" in the Python shell) following the instructions here:
https://avro.apache.org/docs/1.8.1/gettingstartedpython.html
However, when I try to read Avro files using the code from those instructions, I keep getting errors when importing the avro modules.
>>> import avro.schema
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
import avro.schema
File "<frozen importlib._bootstrap>", line 969, in _find_and_load
File "<frozen importlib._bootstrap>", line 954, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 896, in _find_spec
File "<frozen importlib._bootstrap_external>", line 1139, in find_spec
File "<frozen importlib._bootstrap_external>", line 1115, in _get_spec
File "<frozen importlib._bootstrap_external>", line 1096, in _legacy_get_spec
File "<frozen importlib._bootstrap>", line 444, in spec_from_loader
File "<frozen importlib._bootstrap_external>", line 533, in spec_from_file_location
File "I:\Program Files\lib\site-packages\avro-_avro_version_-py3.5.egg\avro\schema.py", line 340
except Exception, e:
^
SyntaxError: invalid syntax
>>> from avro.datafile import DataFileReader, DataFileWriter
Traceback (most recent call last):
File "I:\Program Files\lib\site-packages\avro-_avro_version_-py3.5.egg\avro\datafile.py", line 21, in <module>
from cStringIO import StringIO
ImportError: No module named 'cStringIO'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
from avro.datafile import DataFileReader, DataFileWriter
File "I:\Program Files\lib\site-packages\avro-_avro_version_-py3.5.egg\avro\datafile.py", line 23, in <module>
from StringIO import StringIO
ImportError: No module named 'StringIO'
>>> from avro.io import DatumReader, DatumWriter
Traceback (most recent call last):
File "<pyshell#19>", line 1, in <module>
from avro.io import DatumReader, DatumWriter
File "<frozen importlib._bootstrap>", line 969, in _find_and_load
File "<frozen importlib._bootstrap>", line 954, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 896, in _find_spec
File "<frozen importlib._bootstrap_external>", line 1139, in find_spec
File "<frozen importlib._bootstrap_external>", line 1115, in _get_spec
File "<frozen importlib._bootstrap_external>", line 1096, in _legacy_get_spec
File "<frozen importlib._bootstrap>", line 444, in spec_from_loader
File "<frozen importlib._bootstrap_external>", line 533, in spec_from_file_location
File "I:\Program Files\lib\site-packages\avro-_avro_version_-py3.5.egg\avro\io.py", line 200
bits = (((ord(self.read(1)) & 0xffL)) |
^
SyntaxError: invalid syntax
So did I install Avro successfully? Why am I getting these errors? I am using Python 3.5.2 on Windows 7.
Edit: I fixed the issue following Stephane Martin's suggestion. Then I tried to read Avro files into Python. I have a bunch of Avro files in a directory that is already set as the working path in Python. Here is my code:
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
reader = DataFileReader(open("part-00000-of-01733.avro", "r"), DatumReader())
for user in reader:
    print(user)
reader.close()
It raises this error:
Traceback (most recent call last):
File "I:\DJ data\read avro.py", line 5, in <module>
reader = DataFileReader(open("part-00000-of-01733.avro", "r"), DatumReader())
File "I:\Program Files\lib\site-packages\avro_python3-1.8.1-py3.5.egg\avro\datafile.py", line 349, in __init__
self._read_header()
File "I:\Program Files\lib\site-packages\avro_python3-1.8.1-py3.5.egg\avro\datafile.py", line 459, in _read_header
META_SCHEMA, META_SCHEMA, self.raw_decoder)
File "I:\Program Files\lib\site-packages\avro_python3-1.8.1-py3.5.egg\avro\io.py", line 525, in read_data
return self.read_record(writer_schema, reader_schema, decoder)
File "I:\Program Files\lib\site-packages\avro_python3-1.8.1-py3.5.egg\avro\io.py", line 725, in read_record
field_val = self.read_data(field.type, readers_field.type, decoder)
File "I:\Program Files\lib\site-packages\avro_python3-1.8.1-py3.5.egg\avro\io.py", line 515, in read_data
return self.read_fixed(writer_schema, reader_schema, decoder)
File "I:\Program Files\lib\site-packages\avro_python3-1.8.1-py3.5.egg\avro\io.py", line 568, in read_fixed
return decoder.read(writer_schema.size)
File "I:\Program Files\lib\site-packages\avro_python3-1.8.1-py3.5.egg\avro\io.py", line 170, in read
input_bytes = self.reader.read(n)
File "I:\Program Files\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 863: character maps to <undefined>
I am aware that the example in the instructions creates a schema first. But what is an .avsc file? How should I create it, and the corresponding schema, in my case?
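The UnicodeDecodeError in the last traceback comes from opening the file in text mode: open(..., "r") decodes the bytes with the platform's default encoding, which the traceback shows is cp1252, and byte 0x90 has no character assigned in that code page. The minimal stdlib-only sketch below reproduces this with made-up sample bytes and a hypothetical helper, compare_modes; passing encoding="cp1252" explicitly is an assumption meant to mimic that Windows default. Binary mode ("rb") reads the same bytes without complaint.

```python
import os
import tempfile


def compare_modes(path):
    """Return (text-mode decode error or None, first 4 bytes read in binary mode)."""
    error = None
    try:
        # Text mode decodes bytes into characters. encoding="cp1252" mimics the
        # Windows default seen in the traceback (encodings\cp1252.py).
        with open(path, "r", encoding="cp1252") as f:
            f.read()
    except UnicodeDecodeError as exc:
        error = exc.reason
    # Binary mode passes the raw bytes through untouched, as Avro expects.
    with open(path, "rb") as f:
        head = f.read(4)
    return error, head


# Made-up sample bytes: the Avro magic header followed by 0x90, a byte that
# has no character assigned in code page 1252.
with tempfile.NamedTemporaryFile(suffix=".avro", delete=False) as tmp:
    tmp.write(b"Obj\x01" + bytes([0x90]) + b"rest of file")
    sample_path = tmp.name

error, head = compare_modes(sample_path)
os.remove(sample_path)
print("text mode:", error)   # the decode error's reason
print("binary mode:", head)
```

In other words, changing "r" to "rb" in the question's open() call is what the Avro reader needs.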
Even if you install the correct Avro package for your Python environment, the API differs between avro and avro-python3. For example, for Python 2 (with the avro package) you need to use the function avro.schema.parse, but for Python 3 (with the avro-python3 package) you need to use the function avro.
An easy way to explore Avro files is by using the Avro Tools jar from Apache.
Avro is a file format that is often used because it is highly compact and fast to read. It is used by Apache Kafka, Apache Hadoop, and other data-intensive applications. Boomi integrations are not currently able to read and write Avro data, although this is possible with Boomi Data Catalog and Prep.
With recent versions of the avro package, this should no longer be an issue.
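Assuming a Python 3-capable release of the avro package is installed (a recent avro, or the older avro-python3), the question's reading loop works once the file is opened in binary mode. The sketch below wraps it in a hypothetical generator, read_avro_records; DataFileReader, DatumReader, iteration and close() are the calls already used in the question, and the imports are deferred only so the function can be defined without avro installed.

```python
def read_avro_records(path):
    """Yield each record of an Avro container file as a Python object.

    Assumes a Python 3 build of the avro package is installed; the imports
    run only when iteration starts.
    """
    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    # "rb", not "r": Avro data is binary and must not be decoded as text.
    with open(path, "rb") as fo:
        reader = DataFileReader(fo, DatumReader())
        try:
            for record in reader:
                yield record
        finally:
            reader.close()


# Usage with the file from the question:
# for user in read_avro_records("part-00000-of-01733.avro"):
#     print(user)
```

No schema argument is passed: the reader takes the writer schema from the file header.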
Original answer:
When installing through pip or a similar package manager, install the avro-python3 package instead of just avro.
Use the Avro distribution for Python 3, not the one for Python 2.
http://apache.mediamirrors.org/avro/stable/py3/
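The errors in the first three tracebacks come from running Python 2 source under Python 3: the egg in site-packages contains Python 2 syntax (except Exception, e: and long literals such as 0xffL) and imports modules (cStringIO, StringIO) that Python 3 dropped. The small stdlib-only check below, using a hypothetical helper compiles_under_python3, makes that concrete for two of those lines.

```python
import importlib
import io


def compiles_under_python3(source):
    """Return True if `source` compiles under the running Python 3 interpreter."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


# avro/io.py (Python 2 source) uses the long-integer suffix L, which Python 3
# removed; a plain 0xff is the Python 3 spelling.
print(compiles_under_python3("bits = 0xffL"))  # False
print(compiles_under_python3("bits = 0xff"))   # True

# avro/datafile.py (Python 2 source) falls back from cStringIO to StringIO;
# neither module exists in Python 3, where io.BytesIO / io.StringIO replace them.
try:
    importlib.import_module("cStringIO")
    cstringio_missing = False
except ImportError:
    cstringio_missing = True
buffer = io.BytesIO(b"raw bytes")  # the Python 3 in-memory binary stream
print(cstringio_missing, buffer.read())
```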