In the following code how to test for if the type is url or if the type is an image
for dictionaries in d_dict:
type = dictionaries.get('type')
if (type starts with http or https):
logging.debug("type is url")
else if type ends with .jpg or .png or .gif
logging.debug("type is image")
else:
logging.debug("invalid type")
You cannot tell what type a resource is purely from its URL. It is perfectly valid to have an GIF file at a URL without a .gif
file extension, or with a misleading file extension like .txt
. In fact it is quite likely, now that URL-rewriting is popular, that you'll get image URLs with no file extension at all.
It is the Content-Type
HTTP response header that governs what type a resource on the web is, so the only way you can find out for sure is to fetch the resource and see what response you get. You can do this by looking at the headers returned by urllib.urlopen(url).headers
, but that actually fetches the file itself. For efficiency you may prefer to make HEAD request that doesn't transfer the whole file:
import urllib2
class HeadRequest(urllib2.Request):
def get_method(self):
return 'HEAD'
response= urllib2.urlopen(HeadRequest(url))
maintype= response.headers['Content-Type'].split(';')[0].lower()
if maintype not in ('image/png', 'image/jpeg', 'image/gif'):
logging.debug('invalid type')
If you must try to sniff type based on the file extension in a URL path part (eg because you don't have a net connection), you should parse the URL with urlparse
first to remove any ?query
or #fragment
part, so that http://www.example.com/image.png?blah=blah&foo=.txt
doesn't confuse it. Also you should consider using mimetypes
to map the filename to a Content-Type, so you can take advantage of its knowledge of file extensions:
import urlparse, mimetypes
maintype= mimetypes.guess_type(urlparse.urlparse(url).path)[0]
if maintype not in ('image/png', 'image/jpeg', 'image/gif'):
logging.debug('invalid type')
(eg. so that alternative extensions are also allowed. You should at the very least allow .jpeg
for image/jpeg
files, as well as the mutant three-letter Windows variant .jpg
.)
Use regular expressions.
import re
r_url = re.compile(r"^https?:")
r_image = re.compile(r".*\.(jpg|png|gif)$")
for dictionaries in d_dict:
type = dictionaries.get('type')
if r_url.match(type):
logging.debug("type is url")
else if r_image.match(type)
logging.debug("type is image")
else:
logging.debug("invalid type")
Two remarks: type
is a builtin, and images could be loaded from an URL too.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With