I have a lot of JPEG files with varying image sizes. For instance, here are the first 640 bytes, as given by hexdump, of an image of size 256*384 (pixels):
0000000: ffd8 ffe0 0010 4a46 4946 0001 0101 0048 ......JFIF.....H
0000010: 0048 0000 ffdb 0043 0003 0202 0302 0203 .H.....C........
0000020: 0303 0304 0303 0405 0805 0504 0405 0a07 ................
0000030: 0706 080c 0a0c 0c0b 0a0b 0b0d 0e12 100d ................
I guess the size information must be within these lines, but I am unable to see which bytes give the sizes. Can anyone help me find the fields that contain the size information?
I have converted the C++ code from the top answer into a Python script. It returns (height, width), or False if no size could be found.
"""
Source: https://stackoverflow.com/questions/2517854/getting-image-size-of-jpeg-from-its-binary#:~:text=The%20header%20of%20a%20JPEG,Of%20Frame%2C%20type%20N).
"""
def get_jpeg_size(data):
    """
    Gets the JPEG size from the array of data passed to the function, file reference: http://www.obrador.com/essentialjpeg/headerinfo.htm
    Returns (height, width), or False if no size could be found.
    """
    data_size = len(data)
    i = 0  # Keeps track of the position within the file
    # Check for valid JPEG image (SOI marker followed by an APP0 marker)
    if data_size > 10 and data[i] == 0xFF and data[i+1] == 0xD8 and data[i+2] == 0xFF and data[i+3] == 0xE0:
        # Check for valid JPEG header (null terminated JFIF)
        i += 4
        if data[i+2] == ord('J') and data[i+3] == ord('F') and data[i+4] == ord('I') and data[i+5] == ord('F') and data[i+6] == 0x00:
            # Retrieve the block length of the first block since the first block will not contain the size of file
            block_length = data[i] * 256 + data[i+1]
            while i < data_size:
                i += block_length  # Increase the file index to get to the next block
                if i + 8 >= data_size:
                    return False  # Check to protect against reading past the end of the data
                if data[i] != 0xFF:
                    return False  # Check that we are truly at the start of another block
                if data[i+1] == 0xC0:  # 0xFFC0 is the "Start of frame" marker which contains the image size
                    # The structure of the 0xFFC0 block is quite simple [0xFFC0][ushort length][uchar precision][ushort height][ushort width]
                    height = data[i+5] * 256 + data[i+6]
                    width = data[i+7] * 256 + data[i+8]
                    return height, width
                else:
                    i += 2  # Skip the block marker
                    block_length = data[i] * 256 + data[i+1]  # Go to the next block
            return False  # If this point is reached then no size was found
        else:
            return False  # Not a valid JFIF string
    else:
        return False  # Not a valid SOI header
with open('path/to/file.jpg', 'rb') as handle:
    data = handle.read()
size = get_jpeg_size(data)
if size:
    h, w = size
    print(h, w)
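To see why the size is not visible in the dump from the question, the same block-skipping logic can be applied to just the 64 bytes shown there. This is only a sketch: `DUMP_HEX` and `list_segments` are names made up for illustration, and the bytes are copied from the hexdump lines in the question.

```python
# The 64 bytes shown in the question's hexdump (offsets 0x00 to 0x3f).
DUMP_HEX = (
    "ffd8 ffe0 0010 4a46 4946 0001 0101 0048 "
    "0048 0000 ffdb 0043 0003 0202 0302 0203 "
    "0303 0304 0303 0405 0805 0504 0405 0a07 "
    "0706 080c 0a0c 0c0b 0a0b 0b0d 0e12 100d"
)


def list_segments(data):
    """Return (offset, marker, length) for each segment header found in data.

    Hypothetical helper: it starts right after the SOI marker (ff d8) and
    uses each segment's 2-byte big-endian length to jump to the next one.
    """
    segments = []
    i = 2  # SOI has no length field, so the first segment starts at offset 2
    while i + 4 <= len(data) and data[i] == 0xFF:
        marker = data[i + 1]
        length = int.from_bytes(data[i + 2:i + 4], "big")
        segments.append((i, marker, length))
        i += 2 + length  # the length counts its own two bytes, not the marker
    return segments


data = bytes.fromhex(DUMP_HEX)
for offset, marker, length in list_segments(data):
    print(f"offset 0x{offset:02x}: marker ff{marker:02x}, length {length}")
# offset 0x02: marker ffe0, length 16
# offset 0x14: marker ffdb, length 67
```

So the bytes shown hold only the APP0 (JFIF) header and the beginning of a quantization table (ffdb). The next marker would start at offset 0x59, past the end of the excerpt, so the Start Of Frame block with the dimensions lies further into the file.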
According to the Syntax and structure section of the JPEG page on Wikipedia, the width and height of the image don't seem to be stored in the image itself -- or, at least, not in a way that's quite easy to find.
Still, quoting from the JPEG image compression FAQ, part 1/2:
Subject: [22] How can my program extract image dimensions from a JPEG file?
The header of a JPEG file consists of a series of blocks, called "markers". The image height and width are stored in a marker of type SOFn (Start Of Frame, type N).
To find the SOFn you must skip over the preceding markers; you don't have to know what's in the other types of markers, just use their length words to skip over them.
The minimum logic needed is perhaps a page of C code.
(Some people have recommended just searching for the byte pair representing SOFn, without paying attention to the marker block structure. This is unsafe because a prior marker might contain the SOFn pattern, either by chance or because it contains a JPEG-compressed thumbnail image. If you don't follow the marker structure you will retrieve the thumbnail's size instead of the main image size.)
A profusely commented example in C can be found in rdjpgcom.c in the IJG distribution (see part 2, item 15).
Perl code can be found in wwwis, from http://www.tardis.ed.ac.uk/~ark/wwwis/.
(Ergh, that link seems broken...)
Here's a portion of C code that could help you, though: Decoding the width and height of a JPEG (JFIF) file
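The procedure the FAQ describes (skip each marker by its length word until an SOFn turns up) could be sketched in Python roughly as follows. This is a minimal sketch, not the IJG code: `jpeg_dimensions`, `segment`, `sof0`, and `naive_dimensions` are hypothetical names, the test image is synthetic rather than a decodable JPEG, and the walker assumes that no length-less markers (such as RSTn) appear before the frame header. Unlike the Python port above, it does not require a JFIF APP0 block, and it returns (height, width) to match that port.

```python
# SOFn markers are 0xC0 to 0xCF, except 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_dimensions(data):
    """Return (height, width) from the first SOFn segment, or None."""
    if data[:2] != b"\xff\xd8":
        return None  # no SOI marker
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            return None  # lost sync with the marker structure
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte in front of a marker
            i += 1
            continue
        if marker in SOF_MARKERS:
            if i + 9 > len(data):
                return None  # frame header is truncated
            height = int.from_bytes(data[i + 5:i + 7], "big")
            width = int.from_bytes(data[i + 7:i + 9], "big")
            return height, width
        if marker == 0xDA:
            return None  # start of scan reached without seeing an SOFn
        length = int.from_bytes(data[i + 2:i + 4], "big")
        i += 2 + length  # skip the marker plus the segment it introduces
    return None


def naive_dimensions(data):
    """The unsafe shortcut: take the first ff c0 byte pair anywhere."""
    k = data.find(b"\xff\xc0")
    height = int.from_bytes(data[k + 5:k + 7], "big")
    width = int.from_bytes(data[k + 7:k + 9], "big")
    return height, width


def segment(marker, payload):
    """Build one marker segment: ff, marker, 2-byte length, payload."""
    return bytes([0xFF, marker]) + (len(payload) + 2).to_bytes(2, "big") + payload


def sof0(height, width):
    """Build a baseline SOF0 segment with a single dummy component."""
    body = bytes([8]) + height.to_bytes(2, "big") + width.to_bytes(2, "big")
    return segment(0xC0, body + bytes([1, 1, 0x11, 0]))


# Synthetic file: an APP1 block carrying a 24x16 "thumbnail", then the
# main frame header. The APP1 payload is not a real Exif structure.
thumbnail = b"\xff\xd8" + sof0(24, 16) + b"\xff\xd9"
image = b"\xff\xd8" + segment(0xE1, b"Exif\x00\x00" + thumbnail) + sof0(384, 256)

print(naive_dimensions(image))  # (24, 16): the thumbnail's size
print(jpeg_dimensions(image))   # (384, 256): the main image's size
```

The two results differ for exactly the reason given in the FAQ: the byte search lands inside the APP1 block, while the walker skips that block as a whole.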