I'm learning Python by following Automate the Boring Stuff. This program is supposed to go to http://xkcd.com/ and download all the images for offline viewing.
I'm on Python 2.7 on a Mac.
For some reason, I'm getting errors like "No schema supplied", and the error is raised by the requests.get() call itself.
Here is my code:
# Saves the XKCD comic page for offline read
import requests, os, bs4, shutil

url = 'http://xkcd.com/'

if os.path.isdir('xkcd') == True:  # If xkcd folder already exists
    shutil.rmtree('xkcd')  # delete it
else:  # otherwise
    os.makedirs('xkcd')  # Creates xkcd folder.

while not url.endswith('#'):  # If there are no more posts, the url will end with '#'; exit the while loop
    # Download the page
    print 'Downloading %s page...' % url
    res = requests.get(url)  # Get the page
    res.raise_for_status()  # Check for errors
    soup = bs4.BeautifulSoup(res.text)  # Download the page

    # Find the URL of the comic image
    comicElem = soup.select('#comic img')  # Any #comic img it finds will be saved as a list in comicElem
    if comicElem == []:  # if the list is empty
        print 'Couldn\'t find the image!'
    else:
        comicUrl = comicElem[0].get('src')  # Get the first index in comicElem (the image) and save to
                                            # comicUrl
        # Download the image
        print 'Downloading the %s image...' % (comicUrl)
        res = requests.get(comicUrl)  # Get the image. Getting something will always use requests.get()
        res.raise_for_status()  # Check for errors

        # Save image to ./xkcd
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(10000):
            imageFile.write(chunk)
        imageFile.close()

    # Get the Prev btn's URL
    prevLink = soup.select('a[rel="prev"]')[0]
    # The Previous button is first <a rel="prev" href="/1535/" accesskey="p">< Prev</a>
    url = 'http://xkcd.com/' + prevLink.get('href')  # adds /1535/ to http://xkcd.com/

print 'Done!'
Here are the errors:
Traceback (most recent call last):
  File "/Users/XKCD.py", line 30, in <module>
    res = requests.get(comicUrl) # Get the image. Getting something will always use requests.get()
  File "/Library/Python/2.7/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 451, in request
    prep = self.prepare_request(req)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 382, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/Library/Python/2.7/site-packages/requests/models.py", line 304, in prepare
    self.prepare_url(url, params)
  File "/Library/Python/2.7/site-packages/requests/models.py", line 362, in prepare_url
    to_native_string(url, 'utf8')))
requests.exceptions.MissingSchema: Invalid URL '//imgs.xkcd.com/comics/the_martian.png': No schema supplied. Perhaps you meant http:////imgs.xkcd.com/comics/the_martian.png?
The thing is I've been reading the section in the book about the program multiple times, reading the Requests doc, as well as looking at other questions on here. My syntax looks right.
Thanks for your help!
Edit:
This didn't work:
comicUrl = ("http:"+comicElem[0].get('src'))
I thought adding http: in front of the src value would get rid of the "No schema supplied" error.
"No schema supplied" means the URL you passed does not start with http:// or https://. Supply one of these and it will do the trick.
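Edit: Look at the URL string in the error message:
'//imgs.xkcd.com/comics/the_martian.png'
It starts with // and has nothing in front of it, so requests has no http: or https: to work with.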
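One way to supply the missing part without hard-coding a prefix is to resolve the src value against the address of the page it came from. The following is only a minimal sketch, not the book's method: absolute_url is a hypothetical helper name introduced here for illustration, and the src value is the one shown in the traceback. The try/except import is there because urljoin lives in urlparse on Python 2 and in urllib.parse on Python 3.

```python
# urljoin lives in urlparse on Python 2 and in urllib.parse on Python 3.
try:
    from urllib.parse import urljoin
except ImportError:
    from urlparse import urljoin


def absolute_url(page_url, src):
    """Hypothetical helper: resolve an <img> src against the page it was found on.

    A src that starts with '//' borrows the scheme (http or https) from
    page_url; a src that is already a full URL is returned unchanged.
    """
    return urljoin(page_url, src)


page_url = 'http://xkcd.com/'
src = '//imgs.xkcd.com/comics/the_martian.png'  # value taken from the traceback

comic_url = absolute_url(page_url, src)
```

In the script from the question, this would mean passing absolute_url(url, comicElem[0].get('src')) to requests.get() instead of the raw src value.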