How to test if a webpage is an image

Sorry that the title isn't very clear. I have a list of URLs and want to download only the ones that are pictures. Is there any way to check whether a URL points to an image, so that I can skip the ones that don't?

Thanks in advance

user3662991 asked Mar 14 '15 09:03


3 Answers

You can use the requests module. Make a HEAD request and check the Content-Type header. A HEAD request does not download the response body.

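import requests

# url is one entry from your list; requests.head() does not follow redirects unless asked to
response = requests.head(url, allow_redirects=True)
print(response.headers.get('content-type'))  # e.g. 'image/png'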
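
To apply this check to the whole list from the question, it can be wrapped in a small helper. The following is a minimal sketch, not part of the original answer: it swaps in the standard library's urllib.request for requests so that it has no third-party dependency, and the names is_image_content_type and url_is_image, as well as the timeout value, are assumptions chosen for illustration.

```python
import urllib.request


def is_image_content_type(header_value):
    """Return True if a Content-Type header value names an image/* type."""
    if not header_value:
        return False
    # Drop parameters such as "; charset=binary" before comparing.
    mime_type = header_value.split(";", 1)[0].strip().lower()
    return mime_type.startswith("image/")


def url_is_image(url, timeout=10):
    """Send a HEAD request and report whether the server claims an image.

    Hypothetical helper: returns False on any network or URL error.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_image_content_type(response.headers.get("Content-Type"))
    except (OSError, ValueError):
        return False
```

With these in place, something like [u for u in urls if url_is_image(u)] would keep only the URLs whose server reports an image type. Servers are not obliged to answer HEAD the same way as GET, so a False result is better read as "not confirmed" than as "definitely not an image".
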
salmanwahed answered Oct 14 '22 22:10


There is no reliable way. But you could find a solution that might be "good enough" in your case.

You could look at the file extension if it is present in the URL; e.g., .png or .jpg could indicate an image:

>>> import os
>>> name = url2filename('http://example.com/a.png?q=1')
>>> os.path.splitext(name)[1]
'.png'
>>> import mimetypes
>>> mimetypes.guess_type(name)[0]
'image/png'

where url2filename() function is defined here.
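
The linked definition is not reproduced on this page. As a stand-in, here is a minimal sketch of what such a helper might look like; this url2filename is an assumption written for illustration, not the function the answer links to, and looks_like_image is a hypothetical name as well.

```python
import mimetypes
import posixpath
from urllib.parse import unquote, urlsplit


def url2filename(url):
    """Return the last path segment of a URL, ignoring query and fragment.

    Illustrative stand-in, not the definition referenced in the answer.
    """
    path = urlsplit(url).path
    return posixpath.basename(unquote(path))


def looks_like_image(url):
    """Guess from the file extension alone whether a URL names an image."""
    mime_type, _encoding = mimetypes.guess_type(url2filename(url))
    return mime_type is not None and mime_type.startswith("image/")
```

This never touches the network, so it is cheap, but it only reflects how the URL is spelled: an image served from a path without an extension is missed, and a URL ending in .png can still return an HTML page.
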

You could inspect the Content-Type HTTP header:

>>> import urllib.request
>>> r = urllib.request.urlopen(url) # make HTTP GET request, read headers
>>> r.headers.get_content_type()
'image/png'
>>> r.headers.get_content_maintype()
'image'
>>> r.headers.get_content_subtype()
'png'

You could check the very beginning of the HTTP body for magic numbers indicating image files; e.g., a JPEG may start with b'\xff\xd8\xff\xe0', or:

>>> prefix = r.read(8)
>>> prefix # .png image
b'\x89PNG\r\n\x1a\n'

As @pafcu suggested in the answer to the related question, you could use the imghdr.what() function (note that imghdr was deprecated in Python 3.11 and removed in Python 3.13):

>>> import imghdr
>>> imghdr.what(None, b'\x89PNG\r\n\x1a\n')
'png'
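
On interpreters where imghdr is not available, the same magic-number idea can be written by hand. The following is a minimal sketch under stated assumptions: the function name sniff_image_type and the short signature table are chosen for illustration, and only a handful of common formats are covered.

```python
# Leading bytes of a few common image formats (not an exhaustive list).
SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def sniff_image_type(prefix):
    """Return a format name for the first bytes of a body, or None."""
    for magic, name in SIGNATURES:
        if prefix.startswith(magic):
            return name
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if prefix[:4] == b"RIFF" and prefix[8:12] == b"WEBP":
        return "webp"
    return None
```

Reading 12 bytes, as in sniff_image_type(r.read(12)), is enough for every signature in this table. Unlike the header check, this requires starting a GET request, but it does not rely on the server labelling the content correctly.
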
jfs answered Oct 14 '22 22:10


You can use the mimetypes module: https://docs.python.org/3.0/library/mimetypes.html

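import urllib.request
from mimetypes import guess_extension

url = "http://example.com/image.png"
source = urllib.request.urlopen(url)
# get_content_type() drops parameters such as "; charset=..."
extension = guess_extension(source.headers.get_content_type())
print(extension)

For an image/png response this prints ".png" (with the leading dot).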
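
When the goal is to save the downloaded pictures, the same lookup can supply a file extension while rejecting non-image responses. This is a minimal sketch; extension_for is a hypothetical name, and the parameter stripping is there because a raw header value with parameters such as "image/png; charset=binary" may not match any entry known to guess_extension.

```python
import mimetypes


def extension_for(content_type_header):
    """Return a file extension for an image Content-Type, or None.

    None is returned for missing headers, non-image types, and image
    types that mimetypes does not know.
    """
    if not content_type_header:
        return None
    # Keep only "type/subtype", dropping any "; name=value" parameters.
    mime_type = content_type_header.split(";", 1)[0].strip().lower()
    if not mime_type.startswith("image/"):
        return None
    return mimetypes.guess_extension(mime_type)
```

A URL can then be skipped whenever extension_for(...) returns None. The exact extension chosen for some types can differ between Python versions and systems, so it is safer not to depend on a specific spelling for anything other than the common cases.
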

user2314737 answered Oct 14 '22 22:10