
How to determine file type?

Tags: c#, file, audio, mp3

I need to determine whether a file is an audio file (MP3, WAV, etc.).
How can I do this?

Sergey, asked Nov 14 '10 14:11


2 Answers

The most robust way is to write a parser for each file type you want to detect and simply try it: if parsing succeeds without errors, the file is of that type. This approach is expensive, but because it also checks the rest of the file for semantic soundness, it tells you that the file can actually be loaded.
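As a rough illustration of the "try to parse it" idea, here is a minimal sketch for WAV only. It walks the RIFF chunk list and rejects anything that does not hang together structurally. The names WavValidator and LooksLikeWav are made up for this example, the stream is assumed to be seekable, and this is far from a full parser: it never inspects the audio data itself, and it will reject some real-world files with sloppy chunk sizes.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of any library.
public static class WavValidator
{
    // True only if the stream walks cleanly as RIFF/WAVE and contains
    // a "fmt " chunk followed (somewhere later) by a "data" chunk.
    public static bool LooksLikeWav(Stream stream)
    {
        try
        {
            using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
            {
                if (ReadTag(reader) != "RIFF") return false;
                reader.ReadUInt32(); // declared RIFF size; not verified in this sketch
                if (ReadTag(reader) != "WAVE") return false;

                bool sawFmt = false, sawData = false;
                while (stream.Position < stream.Length)
                {
                    string id = ReadTag(reader);
                    uint size = reader.ReadUInt32(); // BinaryReader is little-endian
                    if (stream.Position + size > stream.Length) return false; // truncated chunk

                    if (id == "fmt ")
                    {
                        if (size < 16) return false; // too small for a PCM format block
                        sawFmt = true;
                    }
                    else if (id == "data")
                    {
                        if (!sawFmt) return false; // format must precede the samples
                        sawData = true;
                    }

                    // Skip the chunk body; chunks are padded to an even length.
                    long next = stream.Position + size + (size & 1);
                    stream.Position = Math.Min(next, stream.Length);
                }
                return sawFmt && sawData;
            }
        }
        catch (EndOfStreamException)
        {
            return false; // ran out of bytes in the middle of a chunk header
        }
    }

    private static string ReadTag(BinaryReader reader)
    {
        byte[] bytes = reader.ReadBytes(4);
        if (bytes.Length < 4) throw new EndOfStreamException();
        return Encoding.ASCII.GetString(bytes);
    }
}
```

A file that merely begins with the right signature and then contains garbage fails here, which is the extra assurance this approach buys over a signature check.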

A much cheaper variant is to look for “magic” bytes: signatures at the start of the file or at known offsets. For example, if a file starts with an ID3 tag you can be reasonably sure it's an MP3 file. If a file starts with RIFF, followed by a four-byte size field and then WAVEfmt, it's a WAV file. However, such detection cannot guarantee that the file really is of that type; it could contain just the signature followed by garbage.
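A signature check along these lines might look like the sketch below. The AudioKind enum and the AudioSniffer class are names invented for this example. It only knows the two signatures mentioned above, so an MP3 that does not start with an ID3 tag comes back as Unknown.

```csharp
using System;
using System.IO;

// Hypothetical types for illustration only.
public enum AudioKind { Unknown, Mp3, Wav }

public static class AudioSniffer
{
    // Guess the type from the first bytes of a file (12 bytes are enough here).
    public static AudioKind Guess(byte[] header)
    {
        // An ID3 tag at the start begins with the ASCII bytes "ID3".
        if (MatchesAt(header, 0, "ID3")) return AudioKind.Mp3;

        // WAV: "RIFF", a 4-byte size field, then "WAVE" at offset 8.
        if (MatchesAt(header, 0, "RIFF") && MatchesAt(header, 8, "WAVE"))
            return AudioKind.Wav;

        return AudioKind.Unknown;
    }

    // Convenience wrapper that reads the first 12 bytes of a file.
    public static AudioKind GuessFile(string path)
    {
        var header = new byte[12];
        int total = 0;
        using (var stream = File.OpenRead(path))
        {
            int n;
            // Read may return fewer bytes than requested, so loop until full or EOF.
            while (total < header.Length &&
                   (n = stream.Read(header, total, header.Length - total)) > 0)
            {
                total += n;
            }
        }
        Array.Resize(ref header, total);
        return Guess(header);
    }

    private static bool MatchesAt(byte[] data, int offset, string ascii)
    {
        if (data == null || data.Length < offset + ascii.Length) return false;
        for (int i = 0; i < ascii.Length; i++)
        {
            if (data[offset + i] != (byte)ascii[i]) return false;
        }
        return true;
    }
}
```

Usage would be something like AudioSniffer.GuessFile(path) == AudioKind.Wav. As noted above, a match only says the signature is present, not that the rest of the file is valid.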

Joey, answered Nov 15 '22 01:11


You can use the extension to make a reasonable guess as to what the file is, but it's not guaranteed to work 100% of the time. If you are targeting Windows it will work 99.9% of the time, because the extension is how Windows keeps track of which file is of which type.
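For the extension-based guess, a minimal sketch could be the following. The class name AudioExtensions and the extension list are assumptions made for this example; only .mp3 and .wav come from the question, so extend the set to whatever formats you care about.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration only.
public static class AudioExtensions
{
    // Illustrative list; compared case-insensitively so "SONG.MP3" also matches.
    private static readonly HashSet<string> Known =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".mp3", ".wav" };

    public static bool LooksLikeAudio(string path)
    {
        // Path.GetExtension returns the extension including the leading dot,
        // or an empty string when the name has none.
        string ext = Path.GetExtension(path);
        return !string.IsNullOrEmpty(ext) && Known.Contains(ext);
    }
}
```

This never opens the file, so it is cheap, but a renamed file fools it completely.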

If you are getting your files from non-Windows sources, the only sure way is to open the file and look for a specific string or set of bytes that unambiguously identifies it. For example, you could look for the ID3 tags in an MP3 file:

The ID3v1 tag occupies 128 bytes, beginning with the string TAG.

or

ID3v2 tags are of variable size, and usually occur at the start of the file, to aid streaming media.

How far you go depends on how robust you want your solution to be, and does rely on there being a header or pattern that's always present.
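A check for the ID3v1 tag could be sketched as below. One detail the quoted sentence leaves out is that the 128-byte ID3v1 block sits at the very end of the file, so the code seeks to 128 bytes before the end and looks for TAG there. Id3Probe and HasId3v1 are hypothetical names, and the stream is assumed to be seekable.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only.
public static class Id3Probe
{
    // True if the last 128 bytes of the stream begin with the ASCII bytes "TAG".
    public static bool HasId3v1(Stream stream)
    {
        if (!stream.CanSeek || stream.Length < 128) return false;

        stream.Seek(-128, SeekOrigin.End);

        var marker = new byte[3];
        int total = 0;
        int n;
        // Read may return fewer bytes than requested, so loop until full or EOF.
        while (total < marker.Length &&
               (n = stream.Read(marker, total, marker.Length - total)) > 0)
        {
            total += n;
        }

        return total == 3 &&
               marker[0] == (byte)'T' &&
               marker[1] == (byte)'A' &&
               marker[2] == (byte)'G';
    }
}
```

You would typically call it with a stream from File.OpenRead(path). Plenty of MP3 files carry no ID3v1 tag at all, so a false result here does not rule out MP3.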

Doing it this way can help guard against malicious content, where someone posts a piece of malware as an MP3 file (say) and hopes that it will be run by a program prone to some exploit (a buffer overrun, perhaps).

ChrisF, answered Nov 15 '22 01:11