 

Python Doesn't Have Permission To Access On This Server / Return City/State from ZIP

What I'm trying to do is retrieve the city and state from a zip code. Here's what I have so far:

import requests
from bs4 import BeautifulSoup

def find_city(zip_code):
    zip_code = str(zip_code)
    url = 'http://www.unitedstateszipcodes.org/' + zip_code
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "lxml")
    # divs whose class attribute is exactly this string; city/state should be in one of them
    stuff = soup.findAll('div', {'class': 'col-xs-12 col-sm-6 col-md-12'})
    return stuff

I also tried searching for id="zip-links", but that didn't work either. Here's the thing, though: when I run print(plain_text), I get the following:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /80123
on this server.<br />
</p>
</body></html>

So I guess my question is this: is there a better way to get a city and state from a zip code? Or is there a reason that unitedstateszipcodes.org isn't cooperating? After all, it is easy enough to see the source, tags and text in a browser. Thank you.

asked Dec 25 '15 23:12 by MANA624



2 Answers

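You need to send a browser-like User-Agent header (by default requests identifies itself as python-requests/<version>, and this server answers that with 403 Forbidden):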

import requests

# Browser-like User-Agent; without it the server responds with 403 Forbidden
headers = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}

def find_city(zip_code):
    zip_code = str(zip_code)
    url = 'http://www.unitedstateszipcodes.org/' + zip_code
    source_code = requests.get(url, headers=headers)
    return source_code.text

Once you do, the response status is 200 and you get the page source:

In [8]: url = 'http://www.unitedstateszipcodes.org/54115'

In [9]: headers = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}

In [11]: source_code = requests.get(url, headers=headers)
In [12]: source_code.status_code
Out[12]: 200
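If you'd rather not depend on requests, the same fix should work with the standard library's urllib.request, whose default User-Agent (Python-urllib/3.x) may well be blocked the same way. This is a minimal sketch; the helper names build_request and fetch_page are hypothetical. build_request only constructs the request, so you can inspect the headers without touching the network.

```python
import urllib.request

# Same browser-like User-Agent string used with requests above
USER_AGENT = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36")


def build_request(zip_code):
    """Build (but do not send) a GET request for the ZIP code page."""
    url = "http://www.unitedstateszipcodes.org/" + str(zip_code)
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_page(zip_code, timeout=10):
    """Send the request and return the decoded HTML.

    Unlike requests, urlopen raises urllib.error.HTTPError on a 403
    instead of handing back the error page as text.
    """
    with urllib.request.urlopen(build_request(zip_code), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Note that urllib stores header names via str.capitalize(), so the header is looked up afterwards as "User-agent".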

If you want the details, they are easy to parse out of the HTML:

In [59]: soup = BeautifulSoup(source_code.text, "lxml")

In [60]: soup.find('div', id='zip-links').h3.text
Out[60]: 'ZIP Code: 54115'

In [61]: soup.find('div', id='zip-links').h3.next_sibling.strip()
Out[61]: 'De Pere, WI 54115'

In [62]: url = 'http://www.unitedstateszipcodes.org/90210'

In [63]: source_code = requests.get(url, headers=headers).text

In [64]: soup = BeautifulSoup(source_code, "lxml")

In [65]: soup.find('div', id='zip-links').h3.text
Out[65]: 'ZIP Code: 90210'

In [70]: soup.find('div', id='zip-links').h3.next_sibling.strip()
Out[70]: 'Beverly Hills, CA 90210'
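The scraped line comes back as a single string such as 'Beverly Hills, CA 90210'. To get the city and state separately, a small regular expression is enough. This is a sketch that assumes the page keeps the "City, ST 12345" format shown above; split_city_state is a hypothetical helper name.

```python
import re

# Matches "City, ST 12345" (optionally ZIP+4), e.g. "De Pere, WI 54115"
_CITY_STATE_ZIP = re.compile(
    r"^\s*(?P<city>.+?),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)


def split_city_state(text):
    """Split a 'City, ST 12345' line into (city, state, zip).

    Returns None if the text does not look like that format,
    e.g. if the page layout has changed.
    """
    match = _CITY_STATE_ZIP.match(text)
    if match is None:
        return None
    return match.group("city"), match.group("state"), match.group("zip")


print(split_city_state("Beverly Hills, CA 90210"))
# ('Beverly Hills', 'CA', '90210')
```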

You could also cache each result in a database and check it first, so you only scrape the site for ZIP codes you haven't looked up yet.
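A minimal sketch of that cache-first lookup using the standard library's sqlite3. The table layout and the names open_cache and lookup_zip are assumptions for illustration; fetch stands in for whatever scraper you build from the code above and is hypothetical here.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) a SQLite table mapping ZIP -> city/state."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS zip_cache ("
        " zip TEXT PRIMARY KEY, city TEXT NOT NULL, state TEXT NOT NULL)"
    )
    return conn


def lookup_zip(conn, zip_code, fetch):
    """Return (city, state), calling fetch() only on a cache miss.

    `fetch` is any callable that takes a ZIP string and returns
    (city, state), e.g. a scraper (hypothetical here).
    """
    zip_code = str(zip_code)
    row = conn.execute(
        "SELECT city, state FROM zip_cache WHERE zip = ?", (zip_code,)
    ).fetchone()
    if row is not None:
        return row  # cache hit: no network request needed
    city, state = fetch(zip_code)
    with conn:  # commits the INSERT on success
        conn.execute(
            "INSERT INTO zip_cache (zip, city, state) VALUES (?, ?, ?)",
            (zip_code, city, state),
        )
    return city, state
```

Pass a file path instead of ":memory:" to keep the cache between runs, and pass ZIP codes as strings so leading zeros (e.g. 02134) survive.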

answered Sep 28 '22 00:09 by Padraic Cunningham



I think you are taking a longer route to solve an easy problem!

Try pyzipcode, which does the lookup against a local ZIP code database instead of a website:

>>> from pyzipcode import ZipCodeDatabase
>>> zcdb = ZipCodeDatabase()
>>> zipcode = zcdb[54115]
>>> zipcode.zip
u'54115'
>>> zipcode.city
u'De Pere'
>>> zipcode.state
u'WI'
>>> zipcode.longitude
-88.078959999999995
>>> zipcode.latitude
44.42042
>>> zipcode.timezone
-6
answered Sep 28 '22 02:09 by python
