Django: check if an image exists at some particular url

Is there any way in Django to check whether a url points to a valid image?

Rajat Saxena asked Jun 29 '12 11:06


2 Answers

Using requests and PIL to verify that it's actually a valid image:

>>> import requests
>>> from PIL import Image
>>> from io import BytesIO
>>> r = requests.get('http://cdn.sstatic.net/stackoverflow/img/sprites.png')
>>> im = Image.open(BytesIO(r.content))
>>> im
<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=238x1073 at 0x2845EA8>
jterrace answered Sep 21 '22 20:09


Here's another method that checks the response headers without downloading the file. First, parse the url to get the domain and the path.

>>> from urllib.parse import urlparse
>>> url = 'http://example.com/random/folder/path.html'
>>> parse_object = urlparse(url)
>>> parse_object.netloc
'example.com'
>>> parse_object.path
'/random/folder/path.html'
>>> parse_object.scheme
'http'

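Now use that information to get the content type with a HEAD request. Pass parse_object.netloc instead of the hardcoded sstatic.net, and parse_object.path instead of the hardcoded path. The session below uses Python 2's httplib; in Python 3 the same classes live in http.client, and print is a function.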

>>> import httplib
>>> conn = httplib.HTTPConnection("sstatic.net")
>>> conn.request("HEAD", "/stackoverflow/img/favicon.ico")
>>> res = conn.getresponse()
>>> print res.getheaders()
[('content-length', '1150'),
 ('x-powered-by', 'ASP.NET'),
 ('accept-ranges', 'bytes'),
 ('last-modified', 'Mon, 02 Aug 2010 06:04:04 GMT'),
 ('etag', '"2187d82832cb1:0"'),
 ('cache-control', 'max-age=604800'),
 ('date', 'Sun, 12 Sep 2010 13:39:26 GMT'),
 ('content-type', 'image/x-icon')]

The content-type header tells you it's an image (image/* MIME type), and content-length gives its size, 1150 bytes. That is enough information to decide whether you want to fetch the full resource.
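
The two steps above can be combined into one helper. The following is only a sketch for Python 3, where httplib is called http.client. The function names (is_image_content_type, request_target, head_content_type) and the 10-second timeout are my own choices for illustration, not part of any library. It also keeps the query string, which parse_object.path alone would drop.

```python
import http.client
from urllib.parse import urlparse


def is_image_content_type(value):
    """Return True if a Content-Type header value names an image/* type."""
    if not value:
        return False
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = value.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")


def request_target(parsed):
    """Build the path to request, keeping any query string."""
    target = parsed.path or "/"
    if parsed.query:
        target += "?" + parsed.query
    return target


def head_content_type(url, timeout=10):
    """Send a HEAD request and return the Content-Type header, or None."""
    parsed = urlparse(url)
    # Pick the connection class from the scheme (assumption: http or https only).
    if parsed.scheme == "https":
        conn = http.client.HTTPSConnection(parsed.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parsed.netloc, timeout=timeout)
    try:
        conn.request("HEAD", request_target(parsed))
        res = conn.getresponse()
        # getheader() matches header names case-insensitively.
        return res.getheader("content-type")
    finally:
        conn.close()
```

You would call it as is_image_content_type(head_content_type(url)). Keep in mind the header is only what the server claims; if you need to be sure the bytes really decode as an image, you still have to download them and open them with PIL as in the other answer.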

EDIT

For shortened urls, like http://goo.gl/IwruD which points to http://ubuntu.icafebusiness.com/images/ubuntugui2.jpg, the response contains an additional header called 'location' that holds the target url.

Here's what I'm talking about:

>>> import httplib
>>> conn = httplib.HTTPConnection("goo.gl")
>>> conn.request("HEAD", "/IwruD")
>>> res = conn.getresponse()
>>> print res.getheaders()
[('x-xss-protection', '1; mode=block'),
 ('x-content-type-options', 'nosniff'),
 ('transfer-encoding', 'chunked'),
 ('age', '64'),
 ('expires', 'Mon, 01 Jan 1990 00:00:00 GMT'),
 ('server', 'GSE'),
 ('location', 'http://ubuntu.icafebusiness.com/images/ubuntugui2.jpg'),
 ('pragma', 'no-cache'),
 ('cache-control', 'no-cache, no-store, max-age=0, must-revalidate'),
 ('date', 'Sat, 30 Jun 2012 08:52:15 GMT'),
 ('x-frame-options', 'SAMEORIGIN'),
 ('content-type', 'text/html; charset=UTF-8')]

With the direct url, there is no 'location' header:

>>> import httplib
>>> conn = httplib.HTTPConnection("ubuntu.icafebusiness.com")
>>> conn.request("HEAD", "/images/ubuntugui2.jpg")
>>> res = conn.getresponse()
>>> print res.getheaders()
[('content-length', '78603'), ('accept-ranges', 'bytes'), ('server', 'Apache'), ('last-modified', 'Sat, 16 Aug 2008 01:36:17 GMT'), ('etag', '"1fb8277-1330b-45489c3ad2640"'), ('date', 'Sat, 30 Jun 2012 08:55:46 GMT'), ('content-type', 'image/jpeg')]

You can look for it with a few lines of code (here res is the response from the goo.gl request):

>>> redirected = None
>>> for name, value in res.getheaders():
...     if name.lower() == 'location':
...         redirected = value
...
>>> if redirected is not None:
...     print(redirected)
...
http://ubuntu.icafebusiness.com/images/ubuntugui2.jpg
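
If you want this as a reusable function, something like the following might do. It is a sketch, and find_redirect is a name I made up. It takes the list of (name, value) pairs from getheaders() and the url you requested, and it also resolves a relative 'location' value against that url, which the loop above does not handle.

```python
from urllib.parse import urljoin


def find_redirect(headers, base_url):
    """Return the absolute redirect target from a header list, or None.

    headers is a list of (name, value) pairs, as returned by getheaders().
    """
    for name, value in headers:
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "location":
            # The value may be relative; resolve it against the requested url.
            return urljoin(base_url, value)
    return None
```

When it returns a url, repeat the HEAD request against that url and check its content-type; when it returns None, the headers you already have describe the resource itself.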
Sidd answered Sep 23 '22 20:09