 

How do I catch a 404 error in urllib? (python 3)

I've been reading tens of examples for similar issues, but I can't get any of the solutions I've seen or their variants to run. I'm screen scraping, and I just want to ignore 404 errors (skip the pages). I get

AttributeError: 'module' object has no attribute 'HTTPError'

I've tried URLError as well. I've seen nearly identical syntax in accepted answers. Any ideas? Here's what I've got:

import urllib
import datetime
from bs4 import BeautifulSoup

class EarningsAnnouncement:
    def __init__(self, Company, Ticker, EPSEst, AnnouncementDate, AnnouncementTime):
        self.Company = Company
        self.Ticker = Ticker
        self.EPSEst = EPSEst
        self.AnnouncementDate = AnnouncementDate
        self.AnnouncementTime = AnnouncementTime

webBaseStr = 'http://biz.yahoo.com/research/earncal/'
earningsAnnouncements = []
dayVar = datetime.date.today()
for dte in range(1, 30):
    currDay = str(dayVar.day)
    currMonth = str(dayVar.month)
    currYear = str(dayVar.year)
    if (len(currDay)==1): currDay = '0' + currDay
    if (len(currMonth)==1): currMonth = '0' + currMonth
    dateStr = currYear + currMonth + currDay
    webString = webBaseStr + dateStr + '.html'
    try:
        #with urllib.request.urlopen(webString) as url: page = url.read()
        page = urllib.request.urlopen(webString).read()
        soup = BeautifulSoup(page)
        tbls = soup.findAll('table')
        tbl6= tbls[6]
        rows = tbl6.findAll('tr')
        rows = rows[2:len(rows)-1]
        for earn in rows:
            earningsAnnouncements.append(EarningsAnnouncement(earn.contents[0], earn.contents[1],
            earn.contents[3], dateStr, earn.contents[3]))
    except urllib.HTTPError as err:
        if err.code == 404:
            continue
        else:
            raise

    dayVar += datetime.timedelta(days=1)
asked Jul 20 '13 by StatsViaCsh

1 Answer

It looks like for urllib in Python 3 (not urllib2) the exception lives in the urllib.error module, so you need to catch urllib.error.HTTPError rather than urllib.HTTPError. Note also that import urllib does not pull in the submodules, so add import urllib.request and import urllib.error at the top. See the urllib.error documentation for more information.
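For example, a minimal sketch of the corrected fetch loop might look like this (the URL list is just a placeholder standing in for the question's date loop):

import urllib.request
import urllib.error  # HTTPError is defined here in Python 3

urls = ['http://biz.yahoo.com/research/earncal/20130720.html']  # placeholder URLs

for webString in urls:
    try:
        page = urllib.request.urlopen(webString).read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            continue  # page doesn't exist for this date; skip it
        raise  # re-raise anything other than a 404
    # ... parse `page` with BeautifulSoup as before ...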

answered Oct 13 '22 by Kyle