Sorry that the title wasn't very clear. Basically, I have a list with a whole series of URLs, and I want to download only the ones that are pictures. Is there any way to check whether a URL points to an image, so that I can just skip over the ones that aren't?
Thanks in advance
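You can use the requests module. Make a HEAD request and check the Content-Type header. A HEAD request does not download the response body.
import requests
url = 'http://example.com/a.png'  # one entry from your list of URLs
response = requests.head(url)
print(response.headers.get('content-type'))  # e.g. 'image/png' for a PNG image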
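To apply this check to a whole list, the header value has to be compared against the image/* family while tolerating parameters such as "; charset=...". The following is a minimal sketch of that filtering step, not a definitive implementation. The names is_image_content_type and filter_image_urls are hypothetical, and the lookup function is passed in as a parameter so the logic can be tried without network access.

```python
from typing import Callable, Iterable, List, Optional


def is_image_content_type(value: Optional[str]) -> bool:
    """Return True if a Content-Type header value names an image/* media type."""
    if not value:
        return False  # header missing or empty
    # Drop parameters such as "; charset=binary" and normalise the case.
    media_type = value.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")


def filter_image_urls(
    urls: Iterable[str],
    get_content_type: Callable[[str], Optional[str]],
) -> List[str]:
    """Keep only the URLs whose reported Content-Type is an image.

    get_content_type is a caller-supplied function (an assumption of this
    sketch) that maps a URL to its Content-Type header value, or None.
    """
    return [u for u in urls if is_image_content_type(get_content_type(u))]
```

With requests, the lookup could be something like lambda u: requests.head(u, allow_redirects=True, timeout=10).headers.get('content-type'). Note that requests.head() does not follow redirects unless allow_redirects=True is passed, and that some servers answer HEAD requests incorrectly or not at all.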
There is no reliable way. But you could find a solution that might be "good enough" in your case.
You could look at the file extension if it is present in the URL, e.g., .png or .jpg could indicate an image:
>>> import os
>>> name = url2filename('http://example.com/a.png?q=1')
>>> os.path.splitext(name)[1]
'.png'
>>> import mimetypes
>>> mimetypes.guess_type(name)[0]
'image/png'
where the url2filename() function is defined here.
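The linked definition is not reproduced in this thread. As a rough stand-in, a helper with the same name could be sketched with the standard library as below. This is an assumption about what such a function does (take the last path component and ignore the query string), not the original code, and looks_like_image_url is a hypothetical name added for illustration.

```python
import mimetypes
import posixpath
from urllib.parse import unquote, urlsplit


def url2filename(url: str) -> str:
    """Return the last path component of a URL, ignoring query and fragment."""
    path = urlsplit(url).path
    return posixpath.basename(unquote(path))


def looks_like_image_url(url: str) -> bool:
    """Guess from the file extension alone whether a URL names an image."""
    guessed_type, _encoding = mimetypes.guess_type(url2filename(url))
    return guessed_type is not None and guessed_type.startswith("image/")
```

This only inspects the URL text, so it costs no network traffic, but it will miss images served from extensionless URLs and can be fooled by a misleading extension.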
You could inspect the Content-Type HTTP header:
>>> import urllib.request
>>> r = urllib.request.urlopen(url) # make HTTP GET request, read headers
>>> r.headers.get_content_type()
'image/png'
>>> r.headers.get_content_maintype()
'image'
>>> r.headers.get_content_subtype()
'png'
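If only the headers are needed, the same standard-library check can be made with a HEAD request so that the body is never transferred. The sketch below assumes the server handles HEAD properly; build_head_request and fetch_content_type are hypothetical names, and the 10-second timeout is an arbitrary choice.

```python
import urllib.request


def build_head_request(url: str) -> urllib.request.Request:
    """Create a request object that uses the HEAD method instead of GET."""
    return urllib.request.Request(url, method="HEAD")


def fetch_content_type(url: str, timeout: float = 10.0) -> str:
    """Return the media type reported by the server, e.g. 'image/png'.

    Performs a network request; urlopen raises urllib.error.URLError
    (or its subclass HTTPError) when the request fails.
    """
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as r:
        # get_content_type() strips parameters such as "; charset=...".
        return r.headers.get_content_type()
```

Keep in mind that get_content_type() falls back to 'text/plain' when the header is missing, so a missing header is indistinguishable from a plain-text response here.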
You could check the very beginning of the HTTP body for magic numbers indicating image files, e.g., a JPEG may start with b'\xff\xd8\xff\xe0', or:
>>> prefix = r.read(8)
>>> prefix # .png image
b'\x89PNG\r\n\x1a\n'
As @pafcu suggested in the answer to the related question, you could use the imghdr.what() function (note that imghdr was deprecated in Python 3.11 and removed in 3.13):
>>> import imghdr
>>> imghdr.what(None, b'\x89PNG\r\n\x1a\n')
'png'
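The magic-number check can also be written by hand, which avoids depending on any particular module. The sketch below covers only a few common formats and is not exhaustive; sniff_image_type is a hypothetical name, and the argument is assumed to be the first 12 or more bytes of the response body.

```python
from typing import Optional


def sniff_image_type(prefix: bytes) -> Optional[str]:
    """Guess an image format from the first bytes of a file, or return None."""
    if prefix.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if prefix.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if prefix.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP files are RIFF containers: 'RIFF', a 4-byte size, then 'WEBP'.
    if prefix[:4] == b"RIFF" and prefix[8:12] == b"WEBP":
        return "webp"
    return None
```

For example, sniff_image_type(r.read(12)) could be used on the response object from the urllib example above. Unlike the header and extension checks, this needs at least the start of the body to be downloaded.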
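You can use mimetypes:
https://docs.python.org/3.0/library/mimetypes.html
import urllib.request
from mimetypes import guess_extension
url = "http://example.com/image.png"
source = urllib.request.urlopen(url)
# get_content_type() drops parameters such as "; charset=..." from the header
extension = guess_extension(source.headers.get_content_type())
print(extension)
For a response served as image/png, this prints ".png" (including the leading dot).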