 

urlopen error [Errno 11001] getaddrinfo failed?

Tags:

python-3.x

Hello everyone, I am a beginner Python programmer and I need help.

This is my Python code. It fails with the following error; please help me fix it:

urllib.error.URLError: urlopen error [Errno 11001] getaddrinfo failed

Python:

# -*- coding: utf-8 -*-

import urllib.request
from lxml.html import parse

WEBSITE = 'http://allrecipes.com'

URL_PAGE = 'http://allrecipes.com/recipes/110/appetizers-and-snacks/deviled-eggs/?page='

START_PAGE = 1
END_PAGE = 5

def correct_str(s):
    return s.encode('utf-8').decode('ascii', 'ignore').strip()

for i in range(START_PAGE, END_PAGE+1):
    URL = URL_PAGE + str(i)
    HTML = urllib.request.urlopen(URL)

    page = parse(HTML).getroot()

    for elem in page.xpath('//*[@id="grid"]/article[not(contains(@class, "video-card"))]/a[1]'):
        href = WEBSITE + elem.get('href')
        title = correct_str(elem.find('h3').text)

        recipe_page = parse(urllib.request.urlopen(href)).getroot()
        print(correct_str(href))
        photo_url = recipe_page.xpath('//img[@class="rec-photo"]')[0].get('src')

        print('\nName:  |', title)
        print('Photo: |', photo_url)

When I run this from the command prompt with python, I get this error:

http://allrecipes.com/recipe/236225/crab-stuffed-deviled-eggs/

Name:  | Crab-Stuffed Deviled Eggs
Photo: | http://images.media-allrecipes.com/userphotos/720x405/1091564.jpg
Traceback (most recent call last):
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1240, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1083, in request
    self._send_request(method, url, body, headers)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1128, in _send_request
    self.endheaders(body)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1079, in endheaders
    self._send_output(message_body)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 911, in _send_output
    self.send(msg)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 854, in send
    self.connect()
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 826, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 693, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 732, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Ivan/Dropbox/parser/test.py", line 27, in <module>
    recipe_page = parse(urllib.request.urlopen(href)).getroot()
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 162, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 465, in open
    response = self._open(req, data)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 483, in _open
    '_open', req)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 443, in _call_chain
    result = func(*args)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1268, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1242, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>

Process finished with exit code 1
Kill Noise asked Jun 06 '16 21:06



1 Answer

I'll attempt to explain three main ways to dig into a programming problem:

(1) Use a debugger. You can step through your code and examine variables before they are used and before an exception is raised. Python ships with pdb. For this problem you would step through the code and print out href just before the urlopen() call.

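(2) Assertions. Use Python's assert statement to check the assumptions in your code. For example, before prepending WEBSITE you could assert that the href pulled from the page is relative: assert not elem.get('href').startswith('http')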

(3) Logging. Log relevant variables before they are used. This is what I used:

I added the following to your code...

href = WEBSITE + elem.get('href')
print(href)

And got...

Photo: | http://images.media-allrecipes.com/userphotos/720x405/1091564.jpg
http://allrecipes.comhttp://dish.allrecipes.com/how-to-boil-an-egg/

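From this you can see the cause of your getaddrinfo error: your system is trying to open a URL on a host named allrecipes.comhttp. No such host exists, so the name lookup fails (Errno 11001 is Windows' "host not found").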
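To double-check which host name actually gets looked up, you can split the concatenated URL with urllib.parse from the Python 3 standard library. This is a minimal sketch; the URL is the one printed above, and nothing is fetched from the network:

```python
from urllib.parse import urlsplit

# The concatenated URL from the output above: WEBSITE + an href that was already absolute.
bad_url = 'http://allrecipes.com' + 'http://dish.allrecipes.com/how-to-boil-an-egg/'

parts = urlsplit(bad_url)
# The network location runs from '//' to the next '/', so it swallows the second 'http:'.
print(parts.netloc)    # allrecipes.comhttp:
# hostname drops the (empty) port after ':' and lowercases the name.
print(parts.hostname)  # allrecipes.comhttp
```

The hostname here is the same name that ends up being passed to getaddrinfo in the traceback.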

The problem comes from your assumption that WEBSITE must be prepended to every href you pull from the HTML. Some links on the page, such as http://dish.allrecipes.com/how-to-boil-an-egg/, are already absolute.
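Following the assertion approach from (2), a check on the raw href, placed before WEBSITE is prepended, would have stopped at the http://dish.allrecipes.com/how-to-boil-an-egg/ link instead of producing a bad URL. A small sketch; make_url is a hypothetical helper name, and the two hrefs stand in for values returned by elem.get('href'):

```python
WEBSITE = 'http://allrecipes.com'

def make_url(raw_href):
    # Hypothetical helper. The assumption under test: every href scraped from
    # the listing page is relative, such as '/recipe/236225/...'.
    assert not raw_href.startswith('http'), 'href is already absolute: %r' % raw_href
    return WEBSITE + raw_href

print(make_url('/recipe/236225/crab-stuffed-deviled-eggs/'))

# This call raises AssertionError instead of building 'http://allrecipes.comhttp://...'.
try:
    make_url('http://dish.allrecipes.com/how-to-boil-an-egg/')
except AssertionError as exc:
    print('AssertionError:', exc)
```

Keep in mind that assert statements are skipped when Python runs with -O, so they are a debugging aid rather than a way to handle absolute links.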

You can handle both absolute and relative hrefs with a function that determines whether a URL is absolute, something like this:

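from urllib.parse import urlparse  # Python 3; the old urlparse module exists only in Python 2

def is_absolute(url):
    # See https://stackoverflow.com/questions/8357098/how-can-i-check-if-a-url-is-absolute-using-python
    # An absolute URL has a network location (host), e.g. dish.allrecipes.com
    return bool(urlparse(url).netloc)

href = elem.get('href')
if not is_absolute(href):
    href = WEBSITE + href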
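As an alternative to writing is_absolute yourself, Python 3's urllib.parse.urljoin already resolves an href against a base URL: a relative path is joined to the base, while an absolute URL with a host is returned as-is (a protocol-relative '//host/...' link gets the base's scheme). A sketch of how the inner loop could use it, assuming the same WEBSITE constant; the sample hrefs stand in for elem.get('href'):

```python
from urllib.parse import urljoin

WEBSITE = 'http://allrecipes.com'

# Stand-ins for elem.get('href') values seen on the listing page.
hrefs = [
    '/recipe/236225/crab-stuffed-deviled-eggs/',       # relative: joined to WEBSITE
    'http://dish.allrecipes.com/how-to-boil-an-egg/',  # absolute: returned unchanged
]

urls = []
for raw in hrefs:
    href = urljoin(WEBSITE, raw)
    urls.append(href)
    print(href)
    # In the real script you would now call urllib.request.urlopen(href).
```

This keeps the relative/absolute decision inside the standard library, which also copes with protocol-relative links that is_absolute would treat as absolute but leave without a scheme.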
rrauenza answered Oct 16 '22 06:10
