I want to be able to download a page and all of its associated resources (images, style sheets, script files, etc.) using Python. I am (somewhat) familiar with urllib2 and know how to download individual URLs, but before I go and start hacking at BeautifulSoup + urllib2 I wanted to be sure there isn't already a Python equivalent of "wget --page-requisites http://www.google.com".
Specifically I am interested in gathering statistical information about how long it takes to download an entire web page, including all resources.
Thanks Mark
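Not an answer to whether a ready-made equivalent exists, but here is a minimal sketch of the BeautifulSoup approach you mention, using requests in place of urllib2 and timing the whole fetch. The target URL, the set of tags scanned (img, script, stylesheet link), and the function name are illustrative assumptions, not a definitive implementation:

    import time
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    def fetch_page_with_requisites(url):
        """Download a page plus the resources it references and time the whole thing."""
        start = time.time()
        page = requests.get(url)
        soup = BeautifulSoup(page.text, "html.parser")

        # Collect absolute URLs for images, scripts and stylesheets.
        resources = set()
        for tag in soup.find_all(["img", "script"]):
            if tag.get("src"):
                resources.add(urljoin(url, tag["src"]))
        for tag in soup.find_all("link"):
            if "stylesheet" in (tag.get("rel") or []) and tag.get("href"):
                resources.add(urljoin(url, tag["href"]))

        # Fetch every resource and tally the bytes transferred.
        total_bytes = len(page.content)
        for res in resources:
            total_bytes += len(requests.get(res).content)

        elapsed = time.time() - start
        return elapsed, total_bytes, len(resources)

    if __name__ == "__main__":
        elapsed, nbytes, count = fetch_page_with_requisites("http://www.google.com")
        print("%d resources, %d bytes, %.2f seconds" % (count, nbytes, elapsed))

Note this only covers resources referenced directly in the HTML; CSS "@import" rules and background images would need extra parsing, which is exactly the gap discussed in the webchecker thread below.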
Requests is a versatile HTTP library for Python with many uses. One of them is downloading a file from the web given its URL, as in the sketch below.
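A minimal sketch of that usage; the example URL and output filename are placeholders, substitute the resource you actually want:

    import requests

    # Placeholder URL and filename; swap in the file you actually need.
    url = "http://www.google.com/images/errors/robot.png"
    response = requests.get(url)
    response.raise_for_status()

    # response.content holds the raw bytes of the downloaded file.
    with open("robot.png", "wb") as fh:
        fh.write(response.content)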
To download multiple files in parallel with Python, create a function (download_parallel) to handle the parallel downloads. The function takes one argument: an iterable containing URLs and their associated filenames. A sketch follows below.
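A hedged sketch of what such a download_parallel helper might look like, using a thread pool from concurrent.futures; the download_url helper, the worker count, and the example inputs list are assumptions for illustration, not the original code:

    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    def download_url(entry):
        """Fetch one (url, filename) pair and report how long it took."""
        url, filename = entry
        start = time.time()
        response = requests.get(url)
        with open(filename, "wb") as fh:
            fh.write(response.content)
        return url, time.time() - start

    def download_parallel(inputs):
        """Download every (url, filename) pair in `inputs` concurrently."""
        with ThreadPoolExecutor(max_workers=8) as pool:
            for url, elapsed in pool.map(download_url, inputs):
                print("%s downloaded in %.2f seconds" % (url, elapsed))

    if __name__ == "__main__":
        # Example inputs; pair each URL with the local filename to save it under.
        inputs = [
            ("http://www.google.com/robots.txt", "google-robots.txt"),
            ("http://www.python.org/robots.txt", "python-robots.txt"),
        ]
        download_parallel(inputs)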
Websucker? See http://effbot.org/zone/websucker.htm
websucker.py doesn't follow CSS links. HTTrack (httrack.com) is not Python, it's C/C++, but it's a good, maintained utility for downloading a website for offline browsing.
http://www.mail-archive.com/[email protected]/msg13523.html [issue1124] Webchecker not parsing css "@import url"
Guido> This is essentially unsupported and unmaintained example code. Feel free to submit a patch though!