I'm trying to download a zip file ("tl_2008_01001_edges.zip") from an ftp census site using urllib. What form is the zip file in when I get it and how do I save it?
I'm fairly new to Python and don't understand how urllib works.
This is my attempt:
import urllib, sys
zip_file = urllib.urlretrieve("ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/Autauga_County/", "tl_2008_01001_edges.zip")
If I know the list of ftp folders (or counties in this case), can I run through the ftp site list using the glob function?
Thanks.
The urllib. request module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world — basic and digest authentication, redirections, cookies and more. See also. The Requests package is recommended for a higher-level HTTP client interface.
The urllib module in Python 3 allows you access websites via your program. This opens up as many doors for your programs as the internet opens up for you. urllib in Python 3 is slightly different than urllib2 in Python 2, but they are mostly the same.
parse — Parse URLs into components. This module defines a standard interface to break Uniform Resource Locator (URL) strings up in components (addressing scheme, network location, path etc.), to combine the components back into a URL string, and to convert a “relative URL” to an absolute URL given a “base URL.”
urllib. request is part of the standard library and does not need installing.
True, if you want to avoid adding any dependencies, urllib is available. But note that even the Python official documentation recommends the requests library: "The Requests package is recommended for a higher-level HTTP client interface."
Use urllib2.urlopen()
for the zip file data and directory listing.
To process zip files with the zipfile
module, you can write them to a disk file which is then passed to the zipfile.ZipFile
constructor.
Retrieving the data is straightforward using read()
on the file-like object returned
by urllib2.urlopen()
.
Fetching directories:
>>> files = urllib2.urlopen('ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/').read().splitlines()
>>> for l in files[:4]: print l
...
drwxrwsr-x 2 0 4009 4096 Nov 26 2008 01001_Autauga_County
drwxrwsr-x 2 0 4009 4096 Nov 26 2008 01003_Baldwin_County
drwxrwsr-x 2 0 4009 4096 Nov 26 2008 01005_Barbour_County
drwxrwsr-x 2 0 4009 4096 Nov 26 2008 01007_Bibb_County
>>>
Or, splitting for directory names:
>>> for l in files[:4]: print l.split()[-1]
...
01001_Autauga_County
01003_Baldwin_County
01005_Barbour_County
01007_Bibb_County
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With