Why do file formats have magic numbers?

Tags:

file-format

For example, Portable Executable has several, including the famous "MZ" at the beginning, as well as the "PE\0\0" at the start of the PE header. The Rar file format has the "Rar!" header at the beginning, and several others have similar "magic values" in the file.
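As a rough illustration of how a loader uses the two PE signatures mentioned above, here is a minimal sketch (not a full PE validator). It checks for "MZ" at offset 0, reads the 4-byte little-endian offset stored at 0x3C (the `e_lfanew` field of the DOS header), and looks for "PE\0\0" at that offset. The function name `looks_like_pe` is made up, and the test buffer is fabricated rather than taken from a real executable.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Return True if `data` starts the way a Portable Executable should."""
    # Every PE image begins with the DOS header's two-byte "MZ" signature,
    # and the DOS header is 0x40 bytes long.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 4-byte little-endian offset of the PE header, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # The PE header itself must start with "PE\0\0". If the offset points
    # past the end of the buffer, the slice is short and the comparison fails.
    return data[pe_offset:pe_offset + 4] == b"PE\0\0"


# Fabricated minimal buffer: "MZ", zero padding, e_lfanew = 0x40, then "PE\0\0".
fake = bytearray(0x44)
fake[:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x40)
fake[0x40:0x44] = b"PE\0\0"

print(looks_like_pe(bytes(fake)))                      # True
print(looks_like_pe(b"Rar!\x1a\x07\x00" + bytes(64)))  # False
```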

What purpose do such magic values serve?

Billy ONeal asked Oct 01 '10 17:10


3 Answers

Because users change file extensions, and other programs claim the same extension for their own formats, a magic number lets an application reject a file in an unknown format up front instead of trying its best and then failing anyway.
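A minimal sketch of "cancel processing up front", assuming an application that only understands PNG. Only the 8-byte signature comes from the PNG specification. `load_png` and `UnknownFormatError` are hypothetical names, and nothing beyond the signature is validated. The file name's extension is ignored entirely; only the leading bytes decide.

```python
import os
import tempfile

# PNG's fixed 8-byte file signature (from the PNG specification).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


class UnknownFormatError(ValueError):
    """Raised when a file's leading bytes don't match the expected format."""


def load_png(path: str) -> bytes:
    """Return the file's bytes, but only if its content starts like a PNG."""
    with open(path, "rb") as f:
        header = f.read(len(PNG_SIGNATURE))
        if header != PNG_SIGNATURE:
            # Bail out immediately instead of half-parsing garbage.
            raise UnknownFormatError(f"{path}: not a PNG (starts with {header!r})")
        return header + f.read()


with tempfile.TemporaryDirectory() as tmp:
    # A plain text file that someone renamed to .png.
    renamed = os.path.join(tmp, "photo.png")
    with open(renamed, "w") as f:
        f.write("just some text")

    # A file with a real PNG signature but a misleading extension.
    mislabeled = os.path.join(tmp, "notes.txt")
    with open(mislabeled, "wb") as f:
        f.write(PNG_SIGNATURE + b"...rest of the image...")

    try:
        load_png(renamed)
        rejected = False
    except UnknownFormatError as err:
        rejected = True
        print(err)

    accepted = load_png(mislabeled).startswith(PNG_SIGNATURE)
```

The renamed text file is refused before any real parsing starts. The mislabeled file is accepted, because its content rather than its name says what it is.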

Ben Voigt answered Sep 28 '22 20:09


The concept of magic numbers goes back to Unix and predates the use of file extensions. The original idea of the shell was that all executables would look the same: it didn't matter how the file had been created or what program should be used to evaluate it. The shell would look at the contents of the file and determine the appropriate program. Microsoft came along and chose a different approach, and the era of file extensions was born. Then, to make things 'nicer' for users, Microsoft chose to hide these extensions, and the era of trojan files was born: files that look like they are of one type but really have a different extension and are processed by a different program.
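The content-first idea can be sketched as a lookup on leading bytes, roughly what the Unix `file` command does with its (much larger) magic database. On Unix, a leading "#!" also names the interpreter for a script. This is a toy under stated assumptions: the signature table is a small hand-picked subset, and `sniff` and `interpreter` are made-up names.

```python
from typing import Optional

# A tiny, hand-picked subset of well-known signatures; the real `file`
# command consults a far larger database. Illustrative only.
SIGNATURES = [
    (b"\x7fELF", "ELF executable"),
    (b"#!", "script"),
    (b"MZ", "DOS/Windows executable"),
    (b"Rar!\x1a\x07", "RAR archive"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
]


def sniff(data: bytes) -> Optional[str]:
    """Classify `data` by its leading bytes alone; no file name involved."""
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    return None


def interpreter(data: bytes) -> Optional[str]:
    """For a '#!' script, return the interpreter path named on line one."""
    if sniff(data) != "script":
        return None
    parts = data.split(b"\n", 1)[0][2:].split()
    return parts[0].decode() if parts else None


print(sniff(b"#!/bin/sh\necho hi\n"))        # script
print(interpreter(b"#!/bin/sh\necho hi\n"))  # /bin/sh
print(sniff(b"\x7fELF\x02\x01\x01"))         # ELF executable
print(sniff(b"hello, world"))                # None
```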

user464099 answered Sep 28 '22 19:09


If two applications store data differently, but are constructed such that a file for one might possibly also be a valid (but meaningless) file for the other, very bad things can happen. A program may think it has successfully loaded the file (unaware that the data is meaningless) and then write back a file which to it would be semantically identical, but which would no longer be meaningfully readable by the application that wrote it (or anything else for that matter).

Using magic numbers doesn't entirely prevent this, but it can help at least somewhat.
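To make the failure mode concrete, here is a sketch with two hypothetical applications: app A stores 32-bit integers and app B stores 32-bit floats, both headerless and little-endian. App B "successfully" loads app A's file and gets nonsense. With a made-up 4-byte magic in front of each format (`INTS` and `FLTS` are invented for this example), app B refuses instead. All function names here are illustrative.

```python
import struct


# --- Headerless formats: the silent-misinterpretation problem -------------
def save_ints(values: list[int]) -> bytes:
    """Hypothetical app A: little-endian 32-bit ints, no header."""
    return struct.pack(f"<{len(values)}i", *values)


def load_floats(blob: bytes) -> list[float]:
    """Hypothetical app B: little-endian 32-bit floats, no header."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


# --- Same formats with a 4-byte magic number in front ---------------------
INTS_MAGIC = b"INTS"  # invented magic for app A
FLTS_MAGIC = b"FLTS"  # invented magic for app B


def save_ints_tagged(values: list[int]) -> bytes:
    return INTS_MAGIC + save_ints(values)


def load_floats_tagged(blob: bytes) -> list[float]:
    if blob[:4] != FLTS_MAGIC:
        raise ValueError(f"not an app-B file (magic {blob[:4]!r})")
    return load_floats(blob[4:])


# Without magic: loads without error, but the values are meaningless.
garbled = load_floats(save_ints([1, 2, 3]))
print(garbled)

# With magic: app B refuses app A's file outright...
try:
    load_floats_tagged(save_ints_tagged([1, 2, 3]))
    refused = False
except ValueError as err:
    refused = True
    print(err)

# ...while still reading its own files normally.
roundtrip = load_floats_tagged(FLTS_MAGIC + struct.pack("<2f", 1.5, 2.5))
```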

BTW, trying to guess about the format of data is often very dangerous. For example, suppose one has a list of what are probably dates in the format nn-nn-nn. If one doesn't know what format the dates are in, there may be enough information to pretty well guess the format (e.g. if one of the records is 12-31-99, then absent information to the contrary, the dates are probably mm-dd-yy) but if all dates are within the first 12 days of a month, the data could easily be misinterpreted. Suppose, though, the data were preceded by something saying "MM-DD-YY". Then the risks of misinterpretation could be reduced.
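The date example can be made concrete with a small sketch. It assumes only two candidate orders (MM-DD-YY and DD-MM-YY); a real guesser would need more. `consistent_formats` and `parse_with_header` are hypothetical helpers. The first returns every order under which all records are valid dates. The second relies on a header line that states the format, which is the role a magic number plays for a whole file.

```python
from datetime import datetime

# Only two candidate orders are considered here (an assumption).
CANDIDATES = {"MM-DD-YY": "%m-%d-%y", "DD-MM-YY": "%d-%m-%y"}


def consistent_formats(records: list[str]) -> list[str]:
    """Return every candidate format under which all records parse."""
    ok = []
    for label, fmt in CANDIDATES.items():
        try:
            for r in records:
                datetime.strptime(r, fmt)
        except ValueError:
            continue  # at least one record is impossible in this order
        ok.append(label)
    return ok


# One record with a day > 12 settles the question...
print(consistent_formats(["03-04-99", "12-31-99"]))  # ['MM-DD-YY']
# ...but if every day is <= 12, both readings fit and guessing is unsafe.
print(consistent_formats(["03-04-99", "11-12-99"]))  # ['MM-DD-YY', 'DD-MM-YY']


def parse_with_header(lines: list[str]):
    """Parse dates whose format is declared on the first line."""
    header, *records = lines
    fmt = CANDIDATES[header]  # unknown header raises KeyError: refuse, don't guess
    return [datetime.strptime(r, fmt).date() for r in records]


parsed = parse_with_header(["DD-MM-YY", "03-04-99", "11-12-99"])
print(parsed)  # [datetime.date(1999, 4, 3), datetime.date(1999, 12, 11)]
```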

supercat answered Sep 28 '22 20:09