Downloading large file in python error: Compressed file ended before the end-of-stream marker was reached

Question

I am downloading a compressed file from the internet:

import lzma
import urllib.request

with lzma.open(urllib.request.urlopen(url)) as file:
    for line in file:
        ...

After downloading and processing a large part of the file, I eventually get the error:

  File "/usr/lib/python3.4/lzma.py", line 225, in _fill_buffer
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached

I think it might be caused by an internet connection that drops or a server that stops responding for some time. If that is the case, is there any way to make it keep trying until the connection is reestablished, instead of throwing an exception? I don't think it is a problem with the file, as I have manually downloaded many files like it from the same website and decompressed them. I have also been able to download and decompress some smaller files with Python. The file I am trying to download has a compressed size of about 20 GB.

Pynchia · Accepted Answer

From the urllib.urlopen docs:

One caveat: the read() method, if the size argument is omitted or negative, may not read until the end of the data stream; there is no good way to determine that the entire stream from a socket has been read in the general case.

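Maybe lzma.open trips over the huge size, a connection error, or a timeout because of the above: if the connection closes partway through, the decompressor just sees the data stop before the end-of-stream marker, which would match the EOFError in the question.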
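To make the download keep trying after a dropped connection, one option is to decompress incrementally and reconnect with an HTTP Range header from the last byte received. The following is only a minimal sketch, not a tested recipe for the site in the question. It assumes the server honours Range requests and that the file is a single .xz stream (lzma.LZMADecompressor handles one stream only). The names open_from and iter_decompressed, and the opener, max_retries and wait parameters, are made up for illustration.

```python
import http.client
import lzma
import time
import urllib.request


def open_from(url, offset, timeout=60):
    """Open url starting at byte `offset` (assumes the server supports Range)."""
    request = urllib.request.Request(url)
    if offset:
        request.add_header("Range", "bytes=%d-" % offset)
    response = urllib.request.urlopen(request, timeout=timeout)
    if offset and response.status != 206:
        # A plain 200 here would restart from byte 0 and corrupt the stream.
        response.close()
        raise RuntimeError("server ignored the Range header")
    return response


def iter_decompressed(url, opener=open_from, max_retries=5,
                      chunk_size=64 * 1024, wait=5):
    """Yield decompressed bytes, reconnecting when the connection drops."""
    decompressor = lzma.LZMADecompressor()
    offset = 0    # compressed bytes received so far
    failures = 0  # consecutive connections that made no progress
    while not decompressor.eof:
        try:
            with opener(url, offset) as response:
                while not decompressor.eof:
                    chunk = response.read(chunk_size)
                    if not chunk:
                        break  # connection closed, possibly too early
                    offset += len(chunk)
                    failures = 0
                    data = decompressor.decompress(chunk)
                    if data:
                        yield data
        except (OSError, http.client.HTTPException):
            pass  # network error: fall through and reconnect
        if not decompressor.eof:
            failures += 1
            if failures > max_retries:
                raise EOFError("gave up after %d failed attempts" % max_retries)
            time.sleep(wait)
```

Note that this yields raw byte chunks rather than lines, so line-by-line processing would need to split on newlines and carry over any partial line between chunks.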

kenorb · Answer

It's probably a liblzma bug. As a workaround, try adding:

lzma._BUFFER_SIZE = 1023

before calling lzma.open().

Charles D Pantoga · Answer

Have you tried using the requests library? It is a higher-level HTTP client (built on urllib3 rather than urllib).

The following solution should work for you. It uses the requests library instead of urllib (requests > urllib anyway!) and saves the file to disk in chunks. Let me know if you prefer to continue using urllib.

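import os
import requests

def download(url, chunk_s=1024, fname=None):
    # Default to the last path component of the URL as the file name.
    if not fname:
        fname = url.split('/')[-1]
    # stream=True avoids loading the whole body into memory at once.
    req = requests.get(url, stream=True)
    req.raise_for_status()  # fail early on an HTTP error status
    with open(fname, 'wb') as fh:
        for chunk in req.iter_content(chunk_size=chunk_s):
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
    return os.path.abspath(fname)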
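Once the compressed file is on disk, it can be processed line by line without any network involved, which separates download problems from decompression problems. A minimal sketch, assuming the file is UTF-8 text inside an .xz or .lzma container; iter_lines is a made-up helper name:

```python
import lzma


def iter_lines(path, encoding="utf-8"):
    """Yield decoded lines (without the trailing newline) from a compressed file."""
    # "rt" opens the file in text mode, so lzma.open decodes the bytes for us.
    with lzma.open(path, "rt", encoding=encoding) as fh:
        for line in fh:
            yield line.rstrip("\n")
```

If this still raises the EOFError from the question, the file on disk is truncated and the download itself was cut short.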
