
Reading a JPEG in Python (PIL) with broken header

I'm trying to open a JPEG file in Python 2.7:

from PIL import Image
im = Image.open(filename)

This fails with:

>>> im = Image.open(filename)
Traceback (most recent call last):
  File "<pyshell#810>", line 1, in <module>
    im = Image.open(filename)
  File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 1980, in open
    raise IOError("cannot identify image file")
IOError: cannot identify image file

The same file opens fine in external viewers. Digging in a bit, it turns out that the JpegImageFile._open method in PIL's JpegImagePlugin.py raises a SyntaxError because of several extraneous 0x00 bytes before the 0xFFDA (start of scan) marker in the JPEG header:

Corrupt JPEG data: 5 extraneous bytes before marker 0xda

That is, while the other programs I tried simply ignored the stray 0x00 bytes towards the end of the header, PIL preferred to raise an exception, so I cannot open the image.
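
To make the problem concrete, here is a minimal sketch (not PIL's code) that walks the header segments and reports any bytes sitting between the end of one segment and the next 0xFF marker prefix. The function name find_extraneous_bytes and the sample bytes are made up for illustration, and the sketch assumes every marker before 0xFFDA, other than the standalone ones, carries a two-byte big-endian length.

```python
def find_extraneous_bytes(data):
    """Return (offset, length) runs of stray bytes found between header segments."""
    buf = bytearray(data)  # indexing yields ints on both Python 2 and 3
    if buf[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")

    # Markers with no length field: TEM, RST0-RST7, SOI, EOI.
    standalone = set([0x01]) | set(range(0xD0, 0xDA))
    runs = []
    pos, n = 2, len(buf)

    while pos < n:
        start = pos
        # Anything that is not 0xFF here cannot start a marker: count it as stray.
        while pos < n and buf[pos] != 0xFF:
            pos += 1
        if pos > start:
            runs.append((start, pos - start))
        # Skip the 0xFF prefix, including any 0xFF fill bytes.
        while pos < n and buf[pos] == 0xFF:
            pos += 1
        if pos >= n:
            break
        marker = buf[pos]
        pos += 1
        if marker == 0xDA:
            break  # start of scan: compressed data follows, stop here
        if marker == 0x00:
            runs.append((pos - 2, 2))  # a stray FF 00 pair is not a marker
            continue
        if marker in standalone:
            continue
        if pos + 1 >= n:
            break  # truncated file
        length = (buf[pos] << 8) | buf[pos + 1]  # includes the 2 length bytes
        pos += length

    return runs


# Hypothetical header: SOI, a tiny APP0, a tiny DQT, five stray zeros, then SOS.
sample = (b"\xff\xd8"
          b"\xff\xe0\x00\x04ab"
          b"\xff\xdb\x00\x03x"
          + b"\x00" * 5 +
          b"\xff\xda\x00\x02")
print(find_extraneous_bytes(sample))  # prints [(13, 5)]
```

On my file this kind of scan would be expected to point at the five bytes right before the 0xFFDA marker, matching the message above.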

QUESTION: Apart from editing PIL's code directly, is there any workaround for opening JPEGs with problematic headers?

The relevant code from the JpegImageFile class which raises the exception appears below, for your convenience:

def _open(self):

    s = self.fp.read(1)

    if ord(s[0]) != 255:
        raise SyntaxError("not a JPEG file")

    # Create attributes
    self.bits = self.layers = 0

    # JPEG specifics (internal)
    self.layer = []
    self.huffman_dc = {}
    self.huffman_ac = {}
    self.quantization = {}
    self.app = {} # compatibility
    self.applist = []
    self.icclist = []

    while 1:

        s = s + self.fp.read(1)

        i = i16(s)

        if MARKER.has_key(i):
            name, description, handler = MARKER[i]
            # print hex(i), name, description
            if handler is not None:
                handler(self, i)
            if i == 0xFFDA: # start of scan
                rawmode = self.mode
                if self.mode == "CMYK":
                    rawmode = "CMYK;I" # assume adobe conventions
                self.tile = [("jpeg", (0,0) + self.size, 0, (rawmode, ""))]
                # self.__offset = self.fp.tell()
                break
            s = self.fp.read(1)
        elif i == 0 or i == 65535:
            # padded marker or junk; move on
            s = "\xff"
        else:
            raise SyntaxError("no marker found")

Shlomi A asked Mar 24 '14 13:03

1 Answer

PIL doesn't like corrupt data in the header and falls over, as you've discovered.

I've made a pull request to Pillow (the friendly PIL fork) that should fix this problem.

It's not yet been accepted, but hopefully it'll be there for version 2.5.0 due out in a couple of months. In the meantime, you can try it out here: https://github.com/python-imaging/Pillow/pull/647

As a workaround, you could use a tool such as ImageMagick to first convert the problematic images to another format such as PNG, and then open the converted files in PIL/Pillow.
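
A rough sketch of that conversion step, assuming ImageMagick is installed and its convert executable is on the PATH (ImageMagick 7 installs it as magick instead). The helper names build_convert_command and convert_to_png are made up for illustration, and whether ImageMagick tolerates a particular broken header has to be checked against your files.

```python
import subprocess


def build_convert_command(src_path, dst_path, executable="convert"):
    """Build the ImageMagick command line that re-encodes src_path as dst_path."""
    # ImageMagick picks the output format from the destination file extension.
    return [executable, src_path, dst_path]


def convert_to_png(src_path, dst_path, executable="convert"):
    """Run ImageMagick and return the path of the converted file."""
    # check_call raises CalledProcessError if ImageMagick exits with an error.
    subprocess.check_call(build_convert_command(src_path, dst_path, executable))
    return dst_path
```

After that, something like Image.open(convert_to_png("broken.jpg", "fixed.png")) should behave like any other PNG, since the re-encoded file no longer carries the original JPEG header.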

Hugo answered Sep 23 '22 04:09