Greetings,
I'm going to get Exif info from some images using Android. I know there are some standard Java libraries out there that I could use on the device, and I'm sure I will end up using one.
But in the meantime, can someone explain to me how this information is encoded inside a JPG? Where and how would you usually get the info from the file? When I open the file up in a text editor, it's all binary.
I'm curious as to how it works and how I could potentially read the data in question.
EXIF data is embedded in the image file itself, so it can only be read by software that understands the format. It is often lost when a photo is converted from JPEG to another format such as PNG or GIF.
EXIF data is a further layer of metadata specific to photographs and video footage. It provides an additional level of detail that can help confirm certain aspects of digital evidence being recovered and analyzed. However, like anything else human-made, it is susceptible to manipulation.
EXIF (Exchangeable Image File Format) data stores important information about photographs. Almost all digital cameras write it each time you take a picture. It holds information about the image itself, such as the exposure, where the photo was taken (if location was recorded), and the camera settings used.
I'm a bit late to the party, but having written a Java library for processing Exif (among other types of metadata) I thought I'd chime in.
Exif is built upon TIFF, the Tagged Image File Format. So we first have to examine TIFF:
Think of the structure as a tree with primitive values at the leaves. TIFF is self-describing about its structure, but it doesn't dictate anything about what the values at the leaves actually mean.
Really, you can store any kind of data in TIFF; it's not coupled to images.
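The TIFF file has a generic header:
2 bytes: MM or II in ASCII. This tells you what order to consider all the future bytes in (MSB first for MM, LSB first for II).
2 bytes: 0x002A, the standard TIFF marker.
4 bytes: the offset to the first IFD (Image File Directory), relative to the start of the header.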
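Here is a minimal Java sketch of reading that 8-byte header with `java.nio.ByteBuffer`. It assumes the byte array starts exactly at the MM/II mark. `TiffHeader` is a hypothetical name and not part of any library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: parses the 8-byte TIFF header at the start of a byte array. */
final class TiffHeader {
    final ByteOrder order;       // byte order for every multi-byte value that follows
    final long firstIfdOffset;   // relative to the first byte of the header

    private TiffHeader(ByteOrder order, long firstIfdOffset) {
        this.order = order;
        this.firstIfdOffset = firstIfdOffset;
    }

    static TiffHeader parse(byte[] tiff) {
        if (tiff.length < 8) throw new IllegalArgumentException("too short for a TIFF header");
        String mark = new String(tiff, 0, 2, StandardCharsets.US_ASCII);
        ByteOrder order;
        if (mark.equals("MM")) order = ByteOrder.BIG_ENDIAN;          // Motorola: MSB first
        else if (mark.equals("II")) order = ByteOrder.LITTLE_ENDIAN;  // Intel: LSB first
        else throw new IllegalArgumentException("unknown byte-order mark: " + mark);

        ByteBuffer buf = ByteBuffer.wrap(tiff).order(order);
        int marker = buf.getShort(2) & 0xFFFF;                        // should be 0x002A (42)
        if (marker != 0x002A) throw new IllegalArgumentException("not a standard TIFF marker: " + marker);
        long ifd0 = buf.getInt(4) & 0xFFFFFFFFL;                      // unsigned 32-bit offset
        return new TiffHeader(order, ifd0);
    }
}
```

Once the header is parsed, every later read uses the same `order`. That is why the byte-order mark has to come first.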
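IFDs have an equally simple structure:
2 bytes: the number of tags (N) in the IFD
N × 12 bytes: the tags
4 bytes: the offset to the next IFD (zero if there isn't one)
Tags have a simple representation in 12 bytes:
2 bytes: tag ID
2 bytes: data type
4 bytes: number of components (values) of that type
4 bytes: the value itself if it fits in 4 bytes, otherwise an offset to the value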
The data types are predefined. For example: 1 represents 8-bit unsigned integers, and 12 represents 64-bit floating point numbers.
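Combining the IFD layout with the predefined type sizes, an IFD could be read roughly as shown below. This sketch assumes the whole TIFF block is already in memory and that offsets are relative to its first byte. `IfdReader` and its members are hypothetical names. The size table follows the TIFF 6.0 types 1 to 12.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: reads the 12-byte entries of one IFD. */
final class IfdReader {
    // Bytes per component, indexed by type code: 1 = BYTE, 2 = ASCII, 3 = SHORT, 4 = LONG,
    // 5 = RATIONAL, 6 = SBYTE, 7 = UNDEFINED, 8 = SSHORT, 9 = SLONG, 10 = SRATIONAL,
    // 11 = FLOAT, 12 = DOUBLE. Index 0 is unused.
    static final int[] TYPE_SIZES = {0, 1, 1, 2, 4, 8, 1, 1, 2, 4, 8, 4, 8};

    static final class Entry {
        final int tag;
        final int type;
        final long count;          // number of components, not bytes
        final long valueOrOffset;  // raw 4-byte field; only meaningful as an offset if byteSize() > 4

        Entry(int tag, int type, long count, long valueOrOffset) {
            this.tag = tag;
            this.type = type;
            this.count = count;
            this.valueOrOffset = valueOrOffset;
        }

        /** Total size of the value in bytes, or -1 for an unknown type. */
        long byteSize() {
            return (type > 0 && type < TYPE_SIZES.length) ? count * TYPE_SIZES[type] : -1;
        }
    }

    /** Reads all entries of the IFD that starts at ifdOffset. */
    static List<Entry> read(byte[] tiff, ByteOrder order, int ifdOffset) {
        ByteBuffer buf = ByteBuffer.wrap(tiff).order(order);
        int n = buf.getShort(ifdOffset) & 0xFFFF;            // 2 bytes: number of entries
        List<Entry> entries = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            int p = ifdOffset + 2 + 12 * i;                   // each entry is 12 bytes
            entries.add(new Entry(
                    buf.getShort(p) & 0xFFFF,                 // tag ID
                    buf.getShort(p + 2) & 0xFFFF,             // data type
                    buf.getInt(p + 4) & 0xFFFFFFFFL,          // component count
                    buf.getInt(p + 8) & 0xFFFFFFFFL));        // value or offset
        }
        return entries;
    }

    /** The 4 bytes after the entries: offset of the next IFD, or 0 if there is none. */
    static long nextIfdOffset(byte[] tiff, ByteOrder order, int ifdOffset) {
        ByteBuffer buf = ByteBuffer.wrap(tiff).order(order);
        int n = buf.getShort(ifdOffset) & 0xFFFF;
        return buf.getInt(ifdOffset + 2 + 12 * n) & 0xFFFFFFFFL;
    }
}
```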
So with all that you can go ahead and follow the data file. Some observations:
0x1234 has 4 integers: {1,2,3,4}
To decode TIFF into Exif, you need to apply the dictionary that defines what each IFD represents, and what each tag ID within those IFDs represents.
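In code, the dictionary is essentially a map from tag ID to meaning, and the right map depends on which IFD you are in. The sketch below hard-codes a handful of well-known IFD0 and Exif sub-IFD tags purely for illustration. A real dictionary covers far more tags. `ExifDictionary` is a hypothetical name.

```java
import java.util.Map;

/** Illustrative and far from complete: a few tag IDs for two of the Exif IFDs. */
final class ExifDictionary {
    static final Map<Integer, String> IFD0_TAGS = Map.of(
            0x010F, "Make",
            0x0110, "Model",
            0x0112, "Orientation",
            0x0132, "DateTime",
            0x8769, "ExifIFDPointer",      // value is an offset to the Exif sub-IFD
            0x8825, "GPSInfoIFDPointer");  // value is an offset to the GPS sub-IFD

    static final Map<Integer, String> EXIF_SUBIFD_TAGS = Map.of(
            0x829A, "ExposureTime",
            0x829D, "FNumber",
            0x8827, "ISOSpeedRatings",
            0x9003, "DateTimeOriginal");

    /** Looks the tag up in the dictionary for the IFD it was found in. */
    static String name(Map<Integer, String> dictionary, int tagId) {
        return dictionary.getOrDefault(tagId, String.format("Unknown tag 0x%04X", tagId));
    }
}
```

The pointer tags in IFD0 are what turn the flat TIFF layout into the tree described above. Following them leads to sub-IFDs that need a different dictionary.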
Most users of my library are processing JPEG files. JPEGs have a completely different structure, comprising a sequence of segments. Each segment has an identifier and a block of bytes. Exif is found in the APP1 (numeric value 0xe1) segment of a JPEG file. Once you have that, you must skip past a few leading bytes (Exif\0\0) before seeing the MM or II that denotes the start of the TIFF-formatted Exif data.
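Locating that APP1 segment means walking the segments from the start of the file. The sketch below assumes the whole JPEG is in a byte array. It ignores edge cases such as padding FF bytes between segments. `JpegExifLocator` is a hypothetical name.

```java
import java.util.Arrays;

/** Minimal sketch: returns the TIFF-formatted Exif bytes from the APP1 segment, or null. */
final class JpegExifLocator {
    private static final byte[] EXIF_PREAMBLE = {'E', 'x', 'i', 'f', 0, 0};

    static byte[] findExifTiff(byte[] jpeg) {
        // FF D8 (start of image) must come first
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) return null;
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) return null;          // every segment starts with FF
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xDA || marker == 0xD9) return null;    // start of scan / end of image
            // Segment length is big-endian and includes its own two bytes
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            int dataStart = pos + 4;
            int dataEnd = pos + 2 + length;
            if (length < 2 || dataEnd > jpeg.length) return null;
            if (marker == 0xE1 && length - 2 >= EXIF_PREAMBLE.length
                    && Arrays.equals(Arrays.copyOfRange(jpeg, dataStart, dataStart + EXIF_PREAMBLE.length),
                                     EXIF_PREAMBLE)) {
                // Skip "Exif\0\0"; the remainder starts with MM or II
                return Arrays.copyOfRange(jpeg, dataStart + EXIF_PREAMBLE.length, dataEnd);
            }
            pos = dataEnd;                                        // jump to the next segment
        }
        return null;
    }
}
```

Checking the preamble as well as the marker matters because other data can also live in APP1 segments.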
Here's a binary dump of one of my library's sample images:
In order:
FF D8 is the JPEG 'magic number'.
FF marks a JPEG segment start.
E1 indicates the JPEG segment type (this is APP1, where Exif lives).
18 B3 (6,323 decimal) gives the length of the segment (including the size bytes), so we know that all the Exif data for this JPG file will sit within the next 6,321 bytes. Note that in JPG, multi-byte values are encoded with Motorola ordering, although nested Exif data may use Intel ordering.
45 78 69 66 00 00, or in ASCII Exif\0\0, is the Exif preamble. APP1 is not exclusively reserved for Exif, so this discriminates.
4D 4D, or MM, indicates we have Motorola byte order in this Exif block.
00 2A is our standard TIFF marker, as discussed above.
00 00 00 08 is the offset (8 bytes) to the first IFD, relative to the TIFF header (MM in this case). In this case it points directly to the next byte in the sequence, though it doesn't have to.
00 08 opens our first IFD and tells us we'll have 8 tags coming up.
01 0F is the ID for the first tag in the first IFD, in this case the manufacturer of the camera.
00 02 is the type of the value (2 means it's an ASCII string).
00 00 00 16 is the number of components, meaning we'll have a 22-byte string (21 characters plus a terminating 00).
00 00 01 B2 (434 decimal) is a pointer to the location of that string, relative to the TIFF header (MM). You can't see it in this screenshot, but it points to 45 41 53 54 4D 41 4E 20 4B 4F 44 41 4B 20 43 4F 4D 50 41 4E 59 00, which is EASTMAN KODAK COMPANY in ASCII.
Camera raw files (CR2/NEF/ORW...) generally use TIFF; however, they mostly use different tags from those for Exif. In some of these files, the second pair of bytes will also differ from 00 2A, indicating the type of TIFF dictionary that ought to be applied.
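The sketch below checks the dump walk-through by rebuilding only the TIFF bytes quoted there (everything after Exif\0\0) and decoding the first tag. Only the quoted bytes come from the dump. The rest of the array is a zero-filled placeholder for the 7 tags not shown, so only the first entry is read. `DumpWalkthrough` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Decodes the first IFD0 entry of the dump walked through above. */
final class DumpWalkthrough {
    /** TIFF part of the dump; bytes not quoted in the walk-through are zero placeholders. */
    static byte[] sampleTiff() {
        byte[] tiff = new byte[434 + 22];
        byte[] head = {
                0x4D, 0x4D, 0x00, 0x2A, 0x00, 0x00, 0x00, 0x08,  // MM, TIFF marker, IFD0 at offset 8
                0x00, 0x08,                                      // IFD0 holds 8 tags
                0x01, 0x0F, 0x00, 0x02,                          // tag 0x010F (make), type 2 (ASCII)
                0x00, 0x00, 0x00, 0x16,                          // 22 components
                0x00, 0x00, 0x01, (byte) 0xB2};                  // value lives at offset 434
        System.arraycopy(head, 0, tiff, 0, head.length);
        byte[] make = "EASTMAN KODAK COMPANY\0".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(make, 0, tiff, 434, make.length);
        return tiff;
    }

    /** Reads the first entry of IFD0 and returns its ASCII value without the trailing NUL. */
    static String firstAsciiValue(byte[] tiff) {
        ByteOrder order = (tiff[0] == 'M') ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;
        ByteBuffer buf = ByteBuffer.wrap(tiff).order(order);
        int ifd0 = buf.getInt(4);                        // 8 in the dump
        int entry = ifd0 + 2;                            // skip the 2-byte tag count
        int type = buf.getShort(entry + 2) & 0xFFFF;
        int count = buf.getInt(entry + 4);
        if (type != 2) throw new IllegalStateException("first entry is not ASCII");
        if (count == 0) return "";
        // A value of 4 bytes or less is inline; 22 bytes means the field holds an offset
        int offset = count <= 4 ? entry + 8 : buf.getInt(entry + 8);
        return new String(tiff, offset, count - 1, StandardCharsets.US_ASCII);
    }
}
```

Calling `DumpWalkthrough.firstAsciiValue(DumpWalkthrough.sampleTiff())` follows the same path as the byte-by-byte walk-through: header, IFD offset, tag count, first entry, then the string at offset 434.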
If you search for the string "Exif", you will find the start of the Exif data. It's quite complicated, and I would recommend using a library (e.g. my company's DotImage if you were using .NET).
Here's a high level description though:
The Exif itself is inside an AppMarker: the three bytes before it will be E1 (AppMarker 1) followed by the 2-byte size of the marker's data, which JPEG always stores in big-endian (Motorola) order regardless of the Exif byte order. Two bytes after the Exif you will see the endianness marker (e.g. 49 49 means II, which means Intel, little endian: 2-byte numbers have the low byte first in the file).
The rest of the data uses offsets extensively, and every offset is measured from the location of the first endianness byte (the 49 in the above case).
8 bytes from this offset is a 2-byte number, which is the number of Exif tags. (Strictly, the 4-byte value right after the 42 TIFF marker gives this location; it is usually 8.) If you are in II byte order, reverse the bytes to read the count.
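You can do the byte reversal for II data by hand, without `ByteBuffer`. The helpers below are a small sketch with hypothetical names. `intel = true` means II byte order and `false` means MM.

```java
/** Hypothetical helpers showing what "reverse the bytes" means for II (little-endian) data. */
final class ByteOrderUtil {
    /** Unsigned 16-bit value at pos. */
    static int u16(byte[] b, int pos, boolean intel) {
        int first = b[pos] & 0xFF;
        int second = b[pos + 1] & 0xFF;
        return intel ? (second << 8) | first    // II: low byte comes first in the file
                     : (first << 8) | second;   // MM: high byte comes first
    }

    /** Unsigned 32-bit value at pos. */
    static long u32(byte[] b, int pos, boolean intel) {
        long value = 0;
        for (int i = 0; i < 4; i++) {
            int idx = intel ? pos + 3 - i : pos + i;  // visit bytes from most to least significant
            value = (value << 8) | (b[idx] & 0xFF);
        }
        return value;
    }
}
```

For example, the tag count 08 00 in an II file reads as 8. The same two bytes read in MM order would give 2048.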
Then there will be this number of 12-byte records. Each one is:
2 bytes: Tag ID
2 bytes: Tag Type
4 bytes: Count (number of values of that type, not bytes)
4 bytes: data if the data is 4 bytes or less, or an offset to the data
After the N 12-byte records comes a 4-byte offset to the next IFD (0 if there is none), and the data pointed to by each offset used in those N records usually follows. You need to look up IDs and types to see what they mean and how they are represented.
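The rule for the last 4 bytes of each record (inline data versus offset) can be sketched as follows. The sketch assumes `tiff` starts at the first endianness byte, since offsets are relative to it. It uses the TIFF 6.0 component sizes for types 1 to 12. Values shorter than 4 bytes are taken from the start of the field, where TIFF left-justifies them. `EntryValue` is a hypothetical name.

```java
import java.util.Arrays;

/** Sketch: returns the raw value bytes of one 12-byte record, inline or via its offset. */
final class EntryValue {
    // Bytes per component for TIFF types 1..12 (index 0 unused)
    private static final int[] TYPE_SIZES = {0, 1, 1, 2, 4, 8, 1, 1, 2, 4, 8, 4, 8};

    static byte[] rawValue(byte[] tiff, int entryPos, boolean intel) {
        int type = u16(tiff, entryPos + 2, intel);
        long count = u32(tiff, entryPos + 4, intel);
        if (type < 1 || type >= TYPE_SIZES.length) throw new IllegalArgumentException("unknown type " + type);
        long size = count * TYPE_SIZES[type];
        if (size <= 4) {
            // Small values sit directly in the record's last 4 bytes
            return Arrays.copyOfRange(tiff, entryPos + 8, entryPos + 8 + (int) size);
        }
        long offset = u32(tiff, entryPos + 8, intel);  // otherwise those bytes are an offset
        if (offset + size > tiff.length) throw new IllegalArgumentException("value runs past the data");
        return Arrays.copyOfRange(tiff, (int) offset, (int) (offset + size));
    }

    private static int u16(byte[] b, int p, boolean intel) {
        int lo = intel ? b[p] & 0xFF : b[p + 1] & 0xFF;
        int hi = intel ? b[p + 1] & 0xFF : b[p] & 0xFF;
        return (hi << 8) | lo;
    }

    private static long u32(byte[] b, int p, boolean intel) {
        long v = 0;
        for (int i = 0; i < 4; i++) {
            v = (v << 8) | (b[intel ? p + 3 - i : p + i] & 0xFF);  // most significant byte first
        }
        return v;
    }
}
```

The type still has to be interpreted afterwards. For example, a SHORT is two of these bytes in the file's byte order, and ASCII is a NUL-terminated string.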