Infinite loop when streaming a .gz file from S3 using boto

Question

I'm attempting to stream a .gz file from S3 using boto and iterate over the lines of the unzipped text file. Mysteriously, the loop never terminates; when the entire file has been read, the iteration restarts at the beginning of the file.

Let's say I create and upload an input file like the following:

> echo '{"key": "value"}' > foo.json
> gzip -9 foo.json
> aws s3 cp foo.json.gz s3://my-bucket/my-location/

and I run the following Python script:

import boto
import gzip

connection = boto.connect_s3()
bucket = connection.get_bucket('my-bucket')
key = bucket.get_key('my-location/foo.json.gz')
gz_file = gzip.GzipFile(fileobj=key, mode='rb')
for line in gz_file:
    print(line)

The result is:

b'{"key": "value"}\n'
b'{"key": "value"}\n'
b'{"key": "value"}\n'
...forever...

Why is this happening? I think there must be something very basic that I am missing.

zweiterlinde · Accepted Answer

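Ah, boto. The problem is that the key's read method re-downloads the key if you call it again after the key has been completely read once (compare the read and next methods to see the difference). GzipFile keeps calling read after the end of the compressed data to check for a further gzip member, so it is handed the start of the file again and never sees a lasting end-of-file.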
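
To make the failure mode concrete, here is a minimal sketch of a file-like object that behaves the way this answer describes: it reports end-of-data once and then starts over. RestartingReader is a hypothetical stand-in written for illustration; it is not boto's actual code.

```python
import io


class RestartingReader:
    """Hypothetical stand-in for the behaviour described above; not boto's code."""

    def __init__(self, payload):
        self._payload = payload
        self._stream = io.BytesIO(payload)

    def read(self, size=-1):
        data = self._stream.read(size)
        if not data:
            # End of data: report it once, then rewind so the next call starts over.
            self._stream = io.BytesIO(self._payload)
        return data


reader = RestartingReader(b"abc")
chunks = [reader.read(2) for _ in range(5)]
print(chunks)
```

The third call returns an empty result, and the fourth call starts again from the first byte.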

This isn't the cleanest way to do it, but it solves the problem:

import boto
import gzip

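class ReadOnce(object):
    """Wrap a boto key so that end-of-file is reported permanently."""

    def __init__(self, k):
        self.key = k
        self.has_read_once = False

    def read(self, size=0):
        # Once the key has been exhausted, never call key.read again,
        # because that would trigger a fresh download.
        if self.has_read_once:
            return b''
        data = self.key.read(size)
        if not data:
            self.has_read_once = True
        return data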

connection = boto.connect_s3()
bucket = connection.get_bucket('my-bucket')
key = ReadOnce(bucket.get_key('my-location/foo.json.gz'))
gz_file = gzip.GzipFile(fileobj=key, mode='rb')
for line in gz_file:
    print(line)
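
The wrapper can be checked without touching S3. The sketch below is a simulation under stated assumptions: FakeKey is a hypothetical object that mimics the restart-after-EOF behaviour described in this answer (including treating size=0 as "read everything"), and the payload is an in-memory gzip stream rather than a real key.

```python
import gzip


class FakeKey:
    """Hypothetical stand-in for a boto key; it restarts after reporting EOF."""

    def __init__(self, payload):
        self._payload = payload
        self._pos = 0

    def read(self, size=0):
        # Assumption: size=0 means "read everything that is left".
        end = len(self._payload) if size == 0 else self._pos + size
        data = self._payload[self._pos:end]
        self._pos += len(data)
        if not data:
            self._pos = 0  # simulate the re-download on the next call
        return data


class ReadOnce:
    """Same idea as the wrapper above: after the first EOF, always report EOF."""

    def __init__(self, k):
        self.key = k
        self.has_read_once = False

    def read(self, size=0):
        if self.has_read_once:
            return b''
        data = self.key.read(size)
        if not data:
            self.has_read_once = True
        return data


payload = gzip.compress(b'{"key": "value"}\n')
wrapped = ReadOnce(FakeKey(payload))
lines = list(gzip.GzipFile(fileobj=wrapped, mode='rb'))
print(lines)
```

With the wrapper in place the iteration ends after a single line. Passing FakeKey(payload) to GzipFile directly would be expected to reproduce the endless loop from the question, so avoid doing that without a guard such as itertools.islice.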
