Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I fix a ValueError: read of closed file exception?

This simple Python 3 script:

import urllib.request

host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link
filename = "cite0.bib"
print(url)
urllib.request.urlretrieve(url, filename)

raises this exception:

Traceback (most recent call last):
  File "C:\Users\ricardo\Desktop\Google-Scholar\BibTex\test2.py", line 8, in <module>
    urllib.request.urlretrieve(url, filename)
  File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python32\lib\urllib\request.py", line 1597, in retrieve
    block = fp.read(bs)
ValueError: read of closed file

I thought this might be a temporary problem, so I added some simple exception handling like so:

import random
import time
import urllib.request

host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link
filename = "cite0.bib"
print(url)
while True:
    try:
        print("Downloading...")
        time.sleep(random.randint(0, 5))
        urllib.request.urlretrieve(url, filename)
        break
    except ValueError:
        pass

but this just prints Downloading... ad infinitum.

like image 362
Ricardo Altamirano Avatar asked Jul 17 '12 22:07

Ricardo Altamirano


1 Answers

Your URL return a 403 code error and apparently urllib.request.urlretrieve is not good at detecting all the HTTP errors, because it's using urllib.request.FancyURLopener and this latest try to swallow error by returning an urlinfo instead of raising an error.

About the fix if you still want to use urlretrieve you can override FancyURLopener like this (code included to also show the error):

import urllib.request
from urllib.request import FancyURLopener


class FixFancyURLOpener(FancyURLopener):

    def http_error_default(self, url, fp, errcode, errmsg, headers):
        if errcode == 403:
            raise ValueError("403")
        return super(FixFancyURLOpener, self).http_error_default(
            url, fp, errcode, errmsg, headers
        )

# Monkey Patch
urllib.request.FancyURLopener = FixFancyURLOpener

url = "http://scholar.google.com/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
urllib.request.urlretrieve(url, "cite0.bib")

Else and this is what i recommend you can use urllib.request.urlopen like so:

fp = urllib.request.urlopen('http://scholar.google.com/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0')
with open("citi0.bib", "w") as fo:
    fo.write(fp.read())
like image 195
mouad Avatar answered Sep 28 '22 11:09

mouad