Get Image size WITHOUT loading image into memory

People also ask

How do I get the size of an image in Python?

open() is used to open the image and then . width and . height property of Image are used to get the height and width of the image.

If you don't care about the image contents, PIL is probably an overkill.

I suggest parsing the output of the python magic module:

>>> t = magic.from_file('teste.png')
>>> t
'PNG image data, 782 x 602, 8-bit/color RGBA, non-interlaced'
>>> re.search('(\d+) x (\d+)', t).groups()
('782', '602')

This is a wrapper around libmagic which read as few bytes as possible in order to identify a file type signature.

Relevant version of script:

https://raw.githubusercontent.com/scardine/image_size/master/get_image_size.py

[update]

Hmmm, unfortunately, when applied to jpegs, the above gives "'JPEG image data, EXIF standard 2.21'". No image size! – Alex Flint

Seems like jpegs are magic-resistant. :-)

I can see why: in order to get the image dimensions for JPEG files, you may have to read more bytes than libmagic likes to read.

Rolled up my sleeves and came with this very untested snippet (get it from GitHub) that requires no third-party modules.

Look, Ma! No deps!

#-------------------------------------------------------------------------------
# Name:        get_image_size
# Purpose:     extract image dimensions given a file path using just
#              core modules
#
# Author:      Paulo Scardine (based on code from Emmanuel VAÏSSE)
#
# Created:     26/09/2013
# Copyright:   (c) Paulo Scardine 2013
# Licence:     MIT
#-------------------------------------------------------------------------------
#!/usr/bin/env python
import os
import struct

class UnknownImageFormat(Exception):
    pass

def get_image_size(file_path):
    """
    Return (width, height) for a given img file content - no external
    dependencies except the os and struct modules from core
    """
    size = os.path.getsize(file_path)

    with open(file_path) as input:
        height = -1
        width = -1
        data = input.read(25)

        if (size >= 10) and data[:6] in ('GIF87a', 'GIF89a'):
            # GIFs
            w, h = struct.unpack("<HH", data[6:10])
            width = int(w)
            height = int(h)
        elif ((size >= 24) and data.startswith('\211PNG\r\n\032\n')
              and (data[12:16] == 'IHDR')):
            # PNGs
            w, h = struct.unpack(">LL", data[16:24])
            width = int(w)
            height = int(h)
        elif (size >= 16) and data.startswith('\211PNG\r\n\032\n'):
            # older PNGs?
            w, h = struct.unpack(">LL", data[8:16])
            width = int(w)
            height = int(h)
        elif (size >= 2) and data.startswith('\377\330'):
            # JPEG
            msg = " raised while trying to decode as JPEG."
            input.seek(0)
            input.read(2)
            b = input.read(1)
            try:
                while (b and ord(b) != 0xDA):
                    while (ord(b) != 0xFF): b = input.read(1)
                    while (ord(b) == 0xFF): b = input.read(1)
                    if (ord(b) >= 0xC0 and ord(b) <= 0xC3):
                        input.read(3)
                        h, w = struct.unpack(">HH", input.read(4))
                        break
                    else:
                        input.read(int(struct.unpack(">H", input.read(2))[0])-2)
                    b = input.read(1)
                width = int(w)
                height = int(h)
            except struct.error:
                raise UnknownImageFormat("StructError" + msg)
            except ValueError:
                raise UnknownImageFormat("ValueError" + msg)
            except Exception as e:
                raise UnknownImageFormat(e.__class__.__name__ + msg)
        else:
            raise UnknownImageFormat(
                "Sorry, don't know how to get information from this file."
            )

    return width, height

[update 2019]

Check out a Rust implementation: https://github.com/scardine/imsz

As the comments allude, PIL does not load the image into memory when calling .open. Looking at the docs of PIL 1.1.7, the docstring for .open says:

def open(fp, mode="r"):
    "Open an image file, without loading the raster data"

There are a few file operations in the source like:

 ...
 prefix = fp.read(16)
 ...
 fp.seek(0)
 ...

but these hardly constitute reading the whole file. In fact .open simply returns a file object and the filename on success. In addition the docs say:

open(file, mode=”r”)

Opens and identifies the given image file.

This is a lazy operation; this function identifies the file, but the actual image data is not read from the file until you try to process the data (or call the load method).

Digging deeper, we see that .open calls _open which is a image-format specific overload. Each of the implementations to _open can be found in a new file, eg. .jpeg files are in JpegImagePlugin.py. Let's look at that one in depth.

Here things seem to get a bit tricky, in it there is an infinite loop that gets broken out of when the jpeg marker is found:

    while True:

        s = s + self.fp.read(1)
        i = i16(s)

        if i in MARKER:
            name, description, handler = MARKER[i]
            # print hex(i), name, description
            if handler is not None:
                handler(self, i)
            if i == 0xFFDA: # start of scan
                rawmode = self.mode
                if self.mode == "CMYK":
                    rawmode = "CMYK;I" # assume adobe conventions
                self.tile = [("jpeg", (0,0) + self.size, 0, (rawmode, ""))]
                # self.__offset = self.fp.tell()
                break
            s = self.fp.read(1)
        elif i == 0 or i == 65535:
            # padded marker or junk; move on
            s = "\xff"
        else:
            raise SyntaxError("no marker found")

Which looks like it could read the whole file if it was malformed. If it reads the info marker OK however, it should break out early. The function handler ultimately sets self.size which are the dimensions of the image.

There is a package on pypi called imagesize that currently works for me, although it doesn't look like it is very active.

Install:

pip install imagesize

Usage:

import imagesize

width, height = imagesize.get("test.png")
print(width, height)

Homepage: https://github.com/shibukawa/imagesize_py

PyPi: https://pypi.org/project/imagesize/

I often fetch image sizes on the Internet. Of course, you can't download the image and then load it to parse the information. It's too time consuming. My method is to feed chunks to an image container and test whether it can parse the image every time. Stop the loop when I get the information I want.

I extracted the core of my code and modified it to parse local files.

from PIL import ImageFile

ImPar=ImageFile.Parser()
with open(r"D:\testpic\test.jpg", "rb") as f:
    ImPar=ImageFile.Parser()
    chunk = f.read(2048)
    count=2048
    while chunk != "":
        ImPar.feed(chunk)
        if ImPar.image:
            break
        chunk = f.read(2048)
        count+=2048
    print(ImPar.image.size)
    print(count)

Output:

(2240, 1488)
38912

The actual file size is 1,543,580 bytes and you only read 38,912 bytes to get the image size. Hope this will help.

Another short way of doing it on Unix systems. It depends on the output of file which I am not sure is standardized on all systems. This should probably not be used in production code. Moreover most JPEGs don't report the image size.

import subprocess, re
image_size = list(map(int, re.findall('(\d+)x(\d+)', subprocess.getoutput("file " + filename))[-1]))

The OP was interested in a "faster" solution, I was curious about the fastest solution and I am trying to answer that with a real-world benchmark.

I am comparing:

cv2.imread: https://www.kite.com/python/docs/cv2.imread
PIL.open: https://pillow.readthedocs.io/en/stable/reference/Image.html
opsdroid/image_size: https://github.com/opsdroid/image_size
shibukawa/imagesize_py: https://github.com/shibukawa/imagesize_py
kobaltcore/pymage_size: https://github.com/kobaltcore/pymage_size

I am running the following code on 202897 mostly JPG files.

"""
pip install opsdroid-get-image-size --user
pip install pymage_size
pip install imagesize
"""

import concurrent.futures
from pathlib import Path

import cv2
import numpy as np
import pandas as pd
from tqdm import tqdm
from PIL import Image
import get_image_size
import imagesize
import pymage_size

files = [str(p.resolve())
         for p in Path("/data/").glob("**/*")
         if p.suffix in {".jpg", ".jpeg", ".JPEG", ".JPG", ".png", ".PNG"}]

def get_shape_cv2(fname):
    img = cv2.imread(fname)
    return (img.shape[0], img.shape[1])

with concurrent.futures.ProcessPoolExecutor(8) as executor:
    results = list(tqdm(executor.map(get_shape_cv2, files), total=len(files)))

def get_shape_pil(fname):
    img=Image.open(fname)
    return (img.size[0], img.size[1])

with concurrent.futures.ProcessPoolExecutor(8) as executor:
    results = list(tqdm(executor.map(get_shape_pil, files), total=len(files)))

def get_shape_scardine_size(fname):
    try:
        width, height = get_image_size.get_image_size(fname)
    except get_image_size.UnknownImageFormat:
        width, height = -1, -1
    return (width, height)

with concurrent.futures.ProcessPoolExecutor(8) as executor:
    results = list(tqdm(executor.map(get_shape_scardine_size, files), total=len(files)))

def get_shape_shibukawa(fname):
    width, height = imagesize.get(fname)
    return (width, height)

with concurrent.futures.ProcessPoolExecutor(8) as executor:
    results = list(tqdm(executor.map(get_shape_shibukawa, files), total=len(files)))

def get_shape_pymage_size(fname):
    img_format = pymage_size.get_image_size(fname)
    width, height = img_format.get_dimensions()
    return (width, height)

with concurrent.futures.ProcessPoolExecutor(8) as executor:
    results = list(tqdm(executor.map(get_shape_pymage_size, files), total=len(files)))

Results:

cv2.imread: 8m23s
PIL.open: 2m00s
opsdroid/image_size: 29s
shibukawa/imagesize_py: 29s
kobaltcore/pymage_size: 29s

So the opsdroid, shibukawa and kobaltcore perform at the same speed. Another interesting point for me would now be to better understand which of the libraries has the best format support.

[EDIT] So I went ahead and tested if the fast libraries provide different results:

# test if the libs provide the same results
def show_size_differences(fname):
    w1, h1 = get_shape_scardine_size(fname)
    w2, h2 = get_shape_pymage_size(fname)
    w3, h3 = get_shape_shibukawa(fname)
    if w1 != w2 or w2 != w3 or h1 != h2 or h2 != h3:
        print(f"scardine: {w1}x{h1}, pymage: {w2}x{h2}, shibukawa: {w3}x{h3}")

with concurrent.futures.ProcessPoolExecutor(8) as executor:
    results = list(tqdm(executor.map(show_size_differences, files), total=len(files)))

And they don't.

Related questions
                            
                                How Big can a Python List Get?
                            
                                BeautifulSoup Grab Visible Webpage Text
                            
                                Flask-SQLAlchemy import/context issue
                            
                                Printing tuple with string formatting in Python
                            
                                What are WSGI and CGI in plain English?
                            
                                How can I make pandas dataframe column headers all lowercase?
                            
                                SQLAlchemy ORM conversion to pandas DataFrame
                            
                                pip: no module named _internal
                            
                                how to convert an RGB image to numpy array?
                            
                                How to create a density plot in matplotlib?
                            
                                Datetime current year and month in Python
                            
                                pip install failing with: OSError: [Errno 13] Permission denied on directory
                            
                                Confusion between numpy, scipy, matplotlib and pylab
                            
                                hash function in Python 3.3 returns different results between sessions
                            
                                Best way to handle list.index(might-not-exist) in python?
                            
                                How to get exception message in Python properly
                            
                                How to read pickle file?
                            
                                Wheel file installation
                            
                                NumPy: function for simultaneous max() and min()
                            
                                Django - how to create a file and save it to a model's FileField?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Get Image size WITHOUT loading image into memory

Tags:

python

image

image-processing

People also ask

Recent Activity

Donate For Us