How to test if a webpage is an image

Sorry that the title wasn't very clear. Basically, I have a list with a whole series of URLs, and I want to download the ones that are pictures. Is there any way to check whether a webpage is an image, so that I can just skip the ones that aren't?

Thanks in advance

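You can use the requests module. Make a HEAD request and check the Content-Type header; a HEAD request does not download the response body.

import requests
# requests.head() does not follow redirects unless allow_redirects=True is passed
response = requests.head(url, allow_redirects=True)
print(response.headers.get('content-type'))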
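If you would rather avoid the third-party requests dependency, the same HEAD-request idea can be sketched with the standard library. This is a minimal sketch, not part of the original answer: the names `is_image_mime_type` and `url_is_image` are hypothetical, and it assumes the server answers HEAD requests and reports an honest Content-Type (some servers reject HEAD or mislabel files).

```python
import urllib.request


def is_image_mime_type(mime_type):
    """Return True for MIME types such as 'image/png' or 'Image/JPEG; q=0.9'."""
    # Drop any parameters after ';' and compare case-insensitively.
    return mime_type.split(';', 1)[0].strip().lower().startswith('image/')


def url_is_image(url, timeout=10):
    """Hypothetical helper: send a HEAD request and check the Content-Type header.

    Assumes the server supports HEAD and sends a truthful Content-Type.
    Any network or URL error is treated as "not an image".
    """
    try:
        request = urllib.request.Request(url, method='HEAD')
        # urlopen follows redirects by default, so the final response is checked.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_image_mime_type(response.headers.get_content_type())
    except (OSError, ValueError):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # ValueError is raised for strings that are not valid URLs.
        return False


# data: URLs stand in for real http(s) URLs so the example runs offline.
urls = ['data:image/png;base64,iVBORw0KGgo=', 'data:text/plain,hello', 'not a url']
image_urls = [u for u in urls if url_is_image(u)]
print(image_urls)  # ['data:image/png;base64,iVBORw0KGgo=']
```

In practice you would pass your own list of URLs; filtering with a list comprehension keeps only the ones whose server claims an image type.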

There is no fully reliable way, but you could find a solution that is "good enough" for your case.

You could look at the file extension, if one is present in the URL; e.g., .png or .jpg could indicate an image:

>>> import os
>>> name = url2filename('http://example.com/a.png?q=1')
>>> os.path.splitext(name)[1]
'.png'
>>> import mimetypes
>>> mimetypes.guess_type(name)[0]
'image/png'

where the url2filename() function is defined here.
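A self-contained variant of the extension check can be sketched with only the standard library. It is a stand-in, not the linked url2filename() helper: the function names are hypothetical, and it strips the query string and fragment with urllib.parse.urlsplit() before calling mimetypes.guess_type(), so they cannot interfere with the extension lookup. Like any extension check, it says nothing about URLs without an extension or about mislabeled files.

```python
import mimetypes
import posixpath
from urllib.parse import unquote, urlsplit


def guess_type_from_url(url):
    """Guess a MIME type from the extension in the URL's path (no network access)."""
    path = unquote(urlsplit(url).path)   # drops scheme, host, query and fragment
    filename = posixpath.basename(path)  # e.g. 'a.png'; '' if the path ends with '/'
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


def looks_like_image_url(url):
    """Hypothetical helper: True if the extension maps to an image/* MIME type."""
    mime_type = guess_type_from_url(url)
    return mime_type is not None and mime_type.startswith('image/')


print(guess_type_from_url('http://example.com/a.png?q=1'))   # image/png
print(looks_like_image_url('http://example.com/index.html'))  # False
```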

You could inspect the Content-Type HTTP header:

>>> import urllib.request
>>> r = urllib.request.urlopen(url) # make HTTP GET request, read headers
>>> r.headers.get_content_type()
'image/png'
>>> r.headers.get_content_maintype()
'image'
>>> r.headers.get_content_subtype()
'png'
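For the original goal (downloading only the pictures from a list of URLs), the header check can be combined with the download itself, so each URL needs a single GET. This is a hedged sketch: `download_images` and its `image_<index><ext>` file naming are hypothetical, it trusts the Content-Type header, and the data: URLs in the demo only make it runnable offline.

```python
import mimetypes
import os
import tempfile
import urllib.request


def download_images(urls, out_dir, timeout=10):
    """Hypothetical helper: GET each URL and save the body only if it is served as image/*.

    Returns the paths of the files written. Trusts the server's Content-Type
    header; URLs that fail for any reason are skipped instead of raising.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                headers = response.headers
                if headers.get_content_maintype() != 'image':
                    continue  # not an image: skip it without reading the body
                # e.g. 'image/png' -> '.png'; no extension if the type is unknown
                extension = mimetypes.guess_extension(headers.get_content_type()) or ''
                path = os.path.join(out_dir, 'image_{}{}'.format(index, extension))
                with open(path, 'wb') as out_file:
                    out_file.write(response.read())
                saved.append(path)
        except (OSError, ValueError):
            # OSError covers URLError/HTTPError, timeouts and file errors;
            # ValueError covers strings that are not valid URLs.
            continue
    return saved


# Offline demo: data: URLs stand in for real http(s) URLs.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_paths = download_images(
        ['data:image/png;base64,iVBORw0KGgo=', 'data:text/html,<p>hi</p>', 'not a url'],
        tmp_dir,
    )
    print([os.path.basename(p) for p in demo_paths])  # ['image_0.png']
```

Using GET instead of HEAD avoids a second round trip for the images you keep; for non-images the response is closed without reading its body.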

You could check the very beginning of the HTTP body for magic numbers that identify image formats, e.g., a JPEG file may start with b'\xff\xd8\xff\xe0', or a PNG file with:

>>> prefix = r.read(8)
>>> prefix # .png image
b'\x89PNG\r\n\x1a\n'

As @pafcu suggested in the answer to the related question, you could use the imghdr.what() function:

>>> import imghdr
>>> imghdr.what(None, b'\x89PNG\r\n\x1a\n')
'png'
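Note that the imghdr module was deprecated in Python 3.11 (PEP 594) and removed in Python 3.13. On newer versions, a small signature table covering the formats you care about can stand in for imghdr.what(). The sketch below is not imghdr's implementation; the function name and the choice of formats (PNG, JPEG, GIF, WebP) are assumptions, and the JPEG check uses only the first three bytes, b'\xff\xd8\xff', which JPEG files start with regardless of the JFIF/Exif variant.

```python
def sniff_image_type(prefix):
    """Return a format name if `prefix` (the first bytes of a file) matches a known signature."""
    if prefix.startswith(b'\x89PNG\r\n\x1a\n'):
        return 'png'
    if prefix.startswith(b'\xff\xd8\xff'):  # JPEG SOI marker followed by another marker
        return 'jpeg'
    if prefix[:6] in (b'GIF87a', b'GIF89a'):
        return 'gif'
    if prefix[:4] == b'RIFF' and prefix[8:12] == b'WEBP':  # bytes 4-7 hold the file size
        return 'webp'
    return None  # unknown or not an image


print(sniff_image_type(b'\x89PNG\r\n\x1a\n'))  # png
print(sniff_image_type(b'<!DOCTYPE html>'))     # None
```

When sniffing an HTTP response such as r above, read at least 12 bytes (r.read(12)) so the WebP check has enough data.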

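You can use the mimetypes module: https://docs.python.org/3.0/library/mimetypes.html

import urllib.request
from mimetypes import guess_extension

url = "http://example.com/image.png"
source = urllib.request.urlopen(url)
# get_content_type() drops parameters such as "; charset=..." before the lookup
extension = guess_extension(source.headers.get_content_type())
print(extension)

For a response served as image/png, this prints ".png" (guess_extension() includes the leading dot).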

Tags: python, list, python-3.x, urllib

user3662991

3 Answers

salmanwahed

jfs

user2314737
