urlopen error [Errno 11001] getaddrinfo failed?

Question

Hello everyone, I am a beginner Python programmer and I need help.

This is my Python code. It raises the following error; please help me fix it:

urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>

Python:

# -*- coding: utf-8 -*-

import urllib.request
from lxml.html import parse

WEBSITE = 'http://allrecipes.com'

URL_PAGE = 'http://allrecipes.com/recipes/110/appetizers-and-snacks/deviled-eggs/?page='

START_PAGE = 1
END_PAGE = 5

def correct_str(s):
    return s.encode('utf-8').decode('ascii', 'ignore').strip()

for i in range(START_PAGE, END_PAGE+1):
    URL = URL_PAGE + str(i)
    HTML = urllib.request.urlopen(URL)

    page = parse(HTML).getroot()

    for elem in page.xpath('//*[@id="grid"]/article[not(contains(@class, "video-card"))]/a[1]'):
        href = WEBSITE + elem.get('href')
        title = correct_str(elem.find('h3').text)

        recipe_page = parse(urllib.request.urlopen(href)).getroot()
        print(correct_str(href))
        photo_url = recipe_page.xpath('//img[@class="rec-photo"]')[0].get('src')

        print('\nName:  |', title)
        print('Photo: |', photo_url)

When I run it with python from the command prompt, I get this error:

Traceback (most recent call last):
http://allrecipes.com/recipe/236225/crab-stuffed-deviled-eggs/
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1240, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
Name:  | Crab-Stuffed Deviled Eggs
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1083, in request
Photo: | http://images.media-allrecipes.com/userphotos/720x405/1091564.jpg
    self._send_request(method, url, body, headers)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1128, in _send_request
    self.endheaders(body)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1079, in endheaders
    self._send_output(message_body)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 911, in _send_output
    self.send(msg)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 854, in send
    self.connect()
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 826, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 693, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 732, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Ivan/Dropbox/parser/test.py", line 27, in <module>
    recipe_page = parse(urllib.request.urlopen(href)).getroot()
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 162, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 465, in open
    response = self._open(req, data)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 483, in _open
    '_open', req)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 443, in _call_chain
    result = func(*args)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1268, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1242, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>

Process finished with exit code 1

rrauenza · Accepted Answer

I'll attempt to explain three main ways to dig into a programming problem:

(1) Use a debugger. You can walk through your code and examine variables before they are used, that is, before the line that uses them raises an exception. Python ships with pdb. In this case you would step through the code and print href just before the call to urlopen().

(2) Assertions. Use Python's assert statement to check the assumptions in your code. For example, you could assert that the raw link is relative before prepending WEBSITE: assert not elem.get('href').startswith('http')
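A minimal sketch of that assertion, assuming a hypothetical helper named build_href that receives the raw href attribute from lxml (before WEBSITE is prepended). The relative link below is the one implied by the printed output in the question:

```python
WEBSITE = 'http://allrecipes.com'


def build_href(raw_href):
    """Hypothetical helper: prepend WEBSITE to a link taken from the page.

    The assert makes the original loop's hidden assumption explicit:
    every href in the grid is site-relative, so it must not already
    start with 'http'.
    """
    assert not raw_href.startswith('http'), 'unexpected absolute href: ' + raw_href
    return WEBSITE + raw_href


# A relative link passes the check and is turned into a full URL.
print(build_href('/recipe/236225/crab-stuffed-deviled-eggs/'))

# An absolute link trips the assertion instead of producing a broken URL.
try:
    build_href('http://dish.allrecipes.com/how-to-boil-an-egg/')
except AssertionError as exc:
    print('AssertionError:', exc)
```

Keep in mind that assert statements are skipped when Python runs with the -O flag, so they are a debugging aid rather than a substitute for real input validation.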

(3) Logging. Log relevant variables before they are used. This is what I used:

I added the following to your code...

href = WEBSITE + elem.get('href')                                       
print(href)

And got...

Photo: | http://images.media-allrecipes.com/userphotos/720x405/1091564.jpg
http://allrecipes.comhttp://dish.allrecipes.com/how-to-boil-an-egg/

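From here you can see the cause of your getaddrinfo error: your system is trying to open a URL on a host named allrecipes.comhttp. getaddrinfo is the host-name lookup, and it fails because no such host exists (on Windows, error 11001 means "host not found").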
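To confirm this without touching the network, you can ask urllib.parse how it splits the concatenated URL. This sketch only parses the string; the host name it extracts is exactly the nonexistent one that getaddrinfo is asked to resolve:

```python
from urllib.parse import urlsplit

# WEBSITE blindly prepended to a link that was already absolute.
bad_url = 'http://allrecipes.com' + 'http://dish.allrecipes.com/how-to-boil-an-egg/'

parts = urlsplit(bad_url)
print(parts.hostname)  # the host urlopen will try to look up
print(parts.path)      # the rest of the string ends up in the path
```

The hostname comes out as allrecipes.comhttp (the colon after "http" is read as an empty port separator), and everything from the second "//" onward is treated as the path.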

This stems from your assumption that WEBSITE must be prepended to every href you pull from the HTML; some links on the page, like this one, are already absolute.

You can handle both absolute and relative hrefs with a function that determines whether a URL is absolute, prepending WEBSITE only when it is not:

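from urllib.parse import urlparse  # Python 3; in Python 2 this was the urlparse module

def is_absolute(url):
    # See https://stackoverflow.com/questions/8357098/how-can-i-check-if-a-url-is-absolute-using-python
    # An absolute URL carries a network location (host), e.g. 'dish.allrecipes.com'
    return bool(urlparse(url).netloc)

href = elem.get('href')
if not is_absolute(href):
    # Relative link such as '/recipe/...': add the scheme and host
    href = WEBSITE + href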
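An alternative to the is_absolute check (a different technique from the one above) is the standard library's urllib.parse.urljoin, which resolves a link against a base URL the way a browser does: relative paths get the base's scheme and host, while absolute links come back unchanged. A minimal sketch, assuming the same WEBSITE constant as the question; the helper name absolute_href is hypothetical:

```python
from urllib.parse import urljoin

WEBSITE = 'http://allrecipes.com'


def absolute_href(raw_href):
    """Hypothetical helper: resolve an href from the page against WEBSITE."""
    return urljoin(WEBSITE, raw_href)


# A site-relative link gets the scheme and host from WEBSITE.
print(absolute_href('/recipe/236225/crab-stuffed-deviled-eggs/'))
# An absolute link on another host is returned as-is.
print(absolute_href('http://dish.allrecipes.com/how-to-boil-an-egg/'))
# A protocol-relative link ('//host/path') borrows only the scheme.
print(absolute_href('//dish.allrecipes.com/how-to-boil-an-egg/'))
```

In the loop, href = absolute_href(elem.get('href')) would take the place of href = WEBSITE + elem.get('href'). One difference from is_absolute: a protocol-relative link has a netloc, so is_absolute leaves it without a scheme and urlopen rejects it with a ValueError, whereas urljoin fills in "http:".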

Tags:

python-3.x
