
What is the purpose of the Zip64 'end of central directory locator'?

In the Zip64 format, there is a header called the "Zip64 end of central directory locator" that contains the offset to the zip64 end of central directory record. Why would you need this locator when you can search for the 'zip64 end of central directory' record by its magic number?

EDIT: Please note that the only way to look up the locator is by searching for the locator's magic number. My point is: why bother searching for the locator by its magic number in the first place when you can search for the zip64 end of central directory record directly, also by its magic number?

JosephH asked Nov 18 '11 19:11



1 Answer

Navigating directly to a byte offset in a file is significantly faster than searching for a magic number. Additionally, there is no guarantee that the magic number won't be found elsewhere within the data, which could cause the implementation to read from incorrect data if it starts reading from an invalid but "assumed correct" location.

After doing some additional implementation work around this myself, I think the most significant thing to note is that "special purpose data may reside in the zip64 extensible data sector field" (which follows the fixed fields of the Zip64 end of central directory record). Several of these fields may exist, and each starts with a 2-byte header ID followed by a 4-byte data size, then the actual "special purpose data", so each field can hold up to 2^32 bytes (4 GB) and several together can hold multiples of that. While this may seem extreme, that much data could well force the archive to span disks between the start of the "Zip64 end of central directory record" and the locator. A larger sector would not only take longer to scan for the signature; the chance of hitting the 4-byte (32-bit) "zip64 end of central directory" signature by accident inside that data also grows with its length.
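
To make that layout concrete, here is a minimal sketch of walking the extensible data sector. It assumes the whole record is already in memory as `record`, that the fixed part is 56 bytes, and that the 8-byte size field excludes the 12 leading bytes (signature plus the size field itself). The function name `iter_extensible_fields` is made up for illustration and is not from any library.

```python
import struct

ZIP64_EOCD_SIG = b"PK\x06\x06"  # zip64 end of central directory record
FIXED_SIZE = 56                 # assumed size of the fixed fields, in bytes


def iter_extensible_fields(record: bytes):
    """Yield (header_id, payload) pairs from the extensible data sector."""
    sig, size_of_rest = struct.unpack_from("<4sQ", record, 0)
    if sig != ZIP64_EOCD_SIG:
        raise ValueError("not a zip64 end of central directory record")
    # The size field does not count the 4-byte signature or its own 8 bytes.
    end = size_of_rest + 12
    if end > len(record):
        raise ValueError("record is truncated")
    pos = FIXED_SIZE
    while pos + 6 <= end:
        # Each field: 2-byte header ID, 4-byte data size, then the data.
        header_id, data_size = struct.unpack_from("<HI", record, pos)
        pos += 6
        if pos + data_size > end:
            raise ValueError("field runs past the end of the record")
        yield header_id, record[pos:pos + data_size]
        pos += data_size
```

Note that the payload bytes are never inspected for signatures here. Any 4-byte pattern can occur inside them, which is exactly why a blind scan for the record's magic number is risky.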

"the only way to look up the locator is by looking up the magic number for locator" is not true. If it exists, it should be immediately before the "End of central directory record". Reading back 20 bytes from there, then reading the next 4 bytes should yield the "zip64 end of central dir locator signature" - which can be used as a sanity check (rather than scanning for it).

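As a rough sketch of that check: assuming the archive bytes are in `data` and the offset of the "End of central directory record" is already known as `eocd_offset` (finding it is a separate step, since the trailing comment has variable length), the locator lookup needs no scanning at all. The function name `find_zip64_eocd_offset` is hypothetical, and the field layout in the `struct` format string (4-byte signature, 4-byte disk number, 8-byte offset, 4-byte disk count) is my reading of the 20-byte locator.

```python
import struct

ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # zip64 end of central dir locator
LOCATOR_SIZE = 20                  # 4 + 4 + 8 + 4 bytes


def find_zip64_eocd_offset(data: bytes, eocd_offset: int):
    """Return the zip64 EOCD record offset, or None if there is no locator."""
    start = eocd_offset - LOCATOR_SIZE
    if start < 0:
        return None  # not enough room for a locator before the EOCD
    sig, _disk_with_record, record_offset, _total_disks = struct.unpack(
        "<4sIQI", data[start:eocd_offset]
    )
    if sig != ZIP64_LOCATOR_SIG:
        return None  # sanity check failed: treat as a non-Zip64 archive
    return record_offset
```

The returned offset can then be used to seek straight to the record, where its own signature serves as a second sanity check.
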
ziesemer answered Nov 08 '22 07:11
