Determine filetype by magic number

I had to check the file type in a file uploader to determine whether a file was an image (jpg, png), and I decided to do it by reading the file's magic number (the first 4 bytes) with FileReader. I have some doubts about this method, though:

  1. Is this method safe? Is there a way to upload a non-jpg file as a jpg with this method?

  2. I've seen file types whose magic numbers have different sizes, like 2, 4, or 6 bytes. So if I had to make a generic method to determine not just image file types but others as well, I would have to read the maximum number of bytes (the length of the largest magic number) from the file, right?
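A minimal TypeScript sketch of the check the question describes, assuming a browser (or Node 18+) environment. It swaps FileReader for the promise-based `Blob.slice()` plus `arrayBuffer()`, which reads the same bytes with less boilerplate. The names `readFirstBytes`, `detectImage`, and `isAcceptedImage` are hypothetical. Note that the usual JPEG prefix is 3 bytes while the full PNG signature is 8, which is exactly the length mismatch the second question is about.

```typescript
// JPEG files start with FF D8 FF; the 4th byte varies (E0, E1, DB, ...).
const JPEG_PREFIX = [0xff, 0xd8, 0xff];
// PNG files start with this fixed 8-byte signature.
const PNG_PREFIX = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// True if `bytes` begins with every byte of `signature`.
function startsWith(bytes: Uint8Array, signature: number[]): boolean {
  if (bytes.length < signature.length) return false;
  return signature.every((b, i) => bytes[i] === b);
}

// Read only the first `n` bytes of a File/Blob instead of the whole file.
async function readFirstBytes(file: Blob, n: number): Promise<Uint8Array> {
  const buffer = await file.slice(0, n).arrayBuffer();
  return new Uint8Array(buffer);
}

type ImageKind = "jpeg" | "png" | null;

function detectImage(bytes: Uint8Array): ImageKind {
  if (startsWith(bytes, JPEG_PREFIX)) return "jpeg";
  if (startsWith(bytes, PNG_PREFIX)) return "png";
  return null;
}

// Read as many bytes as the longer signature needs, then compare.
async function isAcceptedImage(file: Blob): Promise<boolean> {
  const header = await readFirstBytes(file, PNG_PREFIX.length);
  return detectImage(header) !== null;
}
```

In an uploader, `isAcceptedImage(input.files[0])` would be called from the file input's `change` handler.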

asked Oct 28 '25 at 05:10 by Elolawyn


1 Answer

To your first question: checking the file's magic number isn't going to harm anything, if that's what you mean by "safe". It's a perfectly valid and acceptable way to verify that a file is at least nominally the type it claims to be, and it sure beats just checking the file extension or even the MIME type. It isn't foolproof, but most foolproof methods are too heavy for client-side validation. So the accepted answer is right: you should be validating server-side as well.
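
To make "not foolproof" concrete, and to answer the first question directly: yes, a non-JPEG file can pass. The header check only looks at the first few bytes, so anyone who controls the file's contents can simply prepend them. This TypeScript sketch (hypothetical names) glues the 3-byte JPEG prefix onto plain text, and the check accepts it. That is why the server should validate independently, for example by actually trying to decode the image.

```typescript
// The 3-byte JPEG prefix used by a typical client-side header check.
const JPEG_PREFIX = [0xff, 0xd8, 0xff];

function looksLikeJpeg(bytes: Uint8Array): boolean {
  return (
    bytes.length >= JPEG_PREFIX.length &&
    JPEG_PREFIX.every((b, i) => bytes[i] === b)
  );
}

// Plain text with the JPEG prefix glued on the front.
// Nothing after the first three bytes is valid JPEG data.
const text = new TextEncoder().encode("hello, I am not an image");
const spoofed = new Uint8Array(JPEG_PREFIX.length + text.length);
spoofed.set(JPEG_PREFIX, 0);
spoofed.set(text, JPEG_PREFIX.length);

console.log(looksLikeJpeg(text)); // false: plain text is rejected
console.log(looksLikeJpeg(spoofed)); // true: the header check is fooled
```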

What this does get you is better UX. You'd be surprised how many people will rename a file and think that's all it takes to convert it to another format. A magic-number check catches that, while file extensions and even MIME types won't (browsers generally derive a file's MIME type from its extension, not its contents). If Aunt Jane sees that your app only accepts JPG, she's liable to rename her PNG to .jpg and try uploading it. If your app only checks the MIME type or file extension, it will accept that "JPG" and upload it, and then your server has to respond to your client-side app to say the file isn't an acceptable format. If you validate magic numbers, you save a round trip to the server and give the user near-instant feedback.
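
A sketch of the Aunt Jane case, assuming the app accepts JPEG only and the first 8 bytes of the chosen file have already been read (for example with `file.slice(0, 8).arrayBuffer()`). `sniff` and `validateJpegUpload` are hypothetical helpers; the point is that the error message can name the real format and be shown before anything is sent to the server.

```typescript
// Hypothetical signature table; the app's policy is "JPEG only".
const SIGNATURES: Record<string, number[]> = {
  JPEG: [0xff, 0xd8, 0xff],
  PNG: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a],
  GIF: [0x47, 0x49, 0x46, 0x38],
};

// Return the name of the first format whose signature prefixes `bytes`.
function sniff(bytes: Uint8Array): string | null {
  for (const [kind, sig] of Object.entries(SIGNATURES)) {
    if (bytes.length >= sig.length && sig.every((b, i) => bytes[i] === b)) {
      return kind;
    }
  }
  return null;
}

// Returns an error message to show immediately, or null if the file may be uploaded.
function validateJpegUpload(fileName: string, header: Uint8Array): string | null {
  const actual = sniff(header);
  if (actual === "JPEG") return null;
  // A .jpg/.jpeg name on a recognizable non-JPEG file suggests a rename.
  const looksRenamed = /\.jpe?g$/i.test(fileName) && actual !== null;
  return looksRenamed
    ? `${fileName} is really a ${actual} file; renaming it does not convert it to JPEG.`
    : `${fileName} is not a JPEG image.`;
}

const pngHeader = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(validateJpegUpload("aunt-jane.jpg", pngHeader));
// aunt-jane.jpg is really a PNG file; renaming it does not convert it to JPEG.
```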

This is the same reason we validate a zip code or an email address on the client. Validation doesn't prove the value is correct, but it helps catch user errors and provides a more enjoyable, responsive experience.

To your second question: the rough idea is to build an object of all acceptable magic numbers, iterate through it to find the longest one, and then read that many bytes from the file. Some cases may need further logic, but generally this is enough to catch the majority of file types.
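
The three steps (build the table, find the longest signature, read that many bytes) can be sketched in TypeScript as follows. The `MAGIC_NUMBERS` table is a hypothetical, deliberately small list of offset-0 signatures, and `readHeader`, `detectType`, and `detectFileType` are illustrative names; substitute the types your app actually accepts.

```typescript
// Hypothetical table of accepted types -> magic numbers (all at offset 0).
const MAGIC_NUMBERS: Record<string, number[]> = {
  jpg: [0xff, 0xd8, 0xff],
  png: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a],
  gif: [0x47, 0x49, 0x46, 0x38], // "GIF8"
  pdf: [0x25, 0x50, 0x44, 0x46, 0x2d], // "%PDF-"
  zip: [0x50, 0x4b, 0x03, 0x04], // "PK\x03\x04"
};

// Step 1: find the longest signature so a single read covers every entry.
const HEADER_LENGTH = Math.max(
  ...Object.values(MAGIC_NUMBERS).map((sig) => sig.length),
);

// Step 2: read only that many bytes from the start of the file.
async function readHeader(file: Blob): Promise<Uint8Array> {
  return new Uint8Array(await file.slice(0, HEADER_LENGTH).arrayBuffer());
}

// Step 3: return the first type whose signature prefixes the header.
function detectType(header: Uint8Array): string | null {
  for (const [type, sig] of Object.entries(MAGIC_NUMBERS)) {
    if (header.length >= sig.length && sig.every((b, i) => header[i] === b)) {
      return type;
    }
  }
  return null;
}

async function detectFileType(file: Blob): Promise<string | null> {
  return detectType(await readHeader(file));
}
```

`HEADER_LENGTH` evaluates to 8 here because the PNG signature is the longest entry. The "further logic" mentioned above covers cases such as ZIP-based formats: .docx and .xlsx files are ZIP archives and also begin with `50 4B 03 04`, so this table would report them as `zip`.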

Here is a list of magic numbers. If you sort it by offset, you'll see that most file types have no offset, and of those that do, the only ones I think I've ever used are dmg and iso. Additionally, among the file types with no offset, the longest magic number appears to be 35 bytes (from a cursory glance; I could be wrong). You could probably safely read just the first 35 bytes of a file and not even bother iterating through your list: 35 bytes is so small that reading it might actually be faster than the iteration step that determines the longest signature in your array.
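
One caveat about the 35-byte shortcut: it only covers signatures at offset 0. A type with an offset, such as iso, can only be checked after reading at least offset + signature length bytes. A hedged TypeScript sketch that generalizes "find the longest signature" to offsets follows; the tar (`ustar` at byte 257) and ISO 9660 (`CD001` at 0x8001, one of several possible locations) entries are illustrative, and `Signature`, `bytesNeeded`, `matches`, and `detect` are hypothetical names.

```typescript
// A signature is a byte sequence expected at a fixed offset from the start.
interface Signature {
  type: string;
  offset: number;
  bytes: number[];
}

const SIGNATURES: Signature[] = [
  { type: "png", offset: 0, bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  // POSIX tar: "ustar" at byte 257
  { type: "tar", offset: 257, bytes: [0x75, 0x73, 0x74, 0x61, 0x72] },
  // ISO 9660: "CD001" at 0x8001 (one of several possible offsets)
  { type: "iso", offset: 0x8001, bytes: [0x43, 0x44, 0x30, 0x30, 0x31] },
];

// Bytes to read from the start so that every signature can be checked.
function bytesNeeded(sigs: Signature[]): number {
  return Math.max(...sigs.map((s) => s.offset + s.bytes.length));
}

function matches(header: Uint8Array, sig: Signature): boolean {
  if (header.length < sig.offset + sig.bytes.length) return false;
  return sig.bytes.every((b, i) => header[sig.offset + i] === b);
}

function detect(header: Uint8Array, sigs: Signature[]): string | null {
  return sigs.find((s) => matches(header, s))?.type ?? null;
}

console.log(bytesNeeded(SIGNATURES.filter((s) => s.offset === 0))); // 8
console.log(bytesNeeded(SIGNATURES)); // 32774
```

With only the offset-0 PNG entry the read is 8 bytes; adding iso raises it to 32,774 bytes, which is still small but no longer "just the header". Signatures counted back from the end of the file, which is how dmg's `koly` trailer is usually listed, would need a separate read such as `file.slice(-512)`.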

answered Oct 29 '25 at 19:10 by Ian Pringle