
Using the Linux 'file' command to determine type (i.e. image, audio, or video)

Here, file refers to the shell command file, not to files in general. I want to determine whether a given file is, for example, a video file (.mpg, .mkv, .avi). file is pretty good at returning image for image files, video for video files, and audio for audio files (and, for some reason, application/x-empty for text). My question is how reliable it is for identifying these types. If I ran a simple

file -ib deliverance.avi | grep video

would that work for all of the main video files outlined here?

puk asked Nov 12 '11 01:11



2 Answers

The results from file are less than perfect, and it has more trouble with some types of files than others. file basically just looks for particular pieces of binary data in predictable patterns to figure out the file type.

Unfortunately, some of the file types commonly used for video fall into this "problematic" category. Newer container formats like .mp4 and .mkv have several possible MIME types, and the proper one depends on what kind of data the container holds. For example, an .mp4 could properly be identified as video/mp4, audio/mp4, or application/mp4 depending on the content.

In practice, file often makes guesses that simply conform with common usage, and it may work perfectly well for you. For example, while I mentioned some theoretical difficulties with identifying Matroska files correctly, file basically just assumes that any Matroska file is a video. On the other hand, the usage of the Ogg container is more evenly split between audio and video, and I believe the current version of file just splits the difference, and identifies Ogg files as application/ogg, which wouldn't fall into any of your categories.
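To make the categories above concrete, here is a minimal sketch that classifies on the MIME type's top-level part instead of grepping the whole output line for "video". The function names classify_mime and media_kind are hypothetical, and the list of application/ container types is an assumption based on the cases mentioned above; extend it for whatever your version of file actually reports.

```shell
#!/bin/sh
# Map a MIME type string to a coarse category.
classify_mime() {
    case "$1" in
        video/*) echo video ;;
        audio/*) echo audio ;;
        image/*) echo image ;;
        # Containers that file may report under application/
        # (assumed list, not exhaustive).
        application/ogg|application/mp4) echo container ;;
        *) echo other ;;
    esac
}

# Hypothetical wrapper: ask file for the bare MIME type, then classify it.
# -b drops the filename prefix; --mime-type omits the charset part.
media_kind() {
    classify_mime "$(file -b --mime-type -- "$1")"
}
```

Calling media_kind deliverance.avi would print one of the category names. A result of "container" means file could not say whether the content is audio or video, so you would need another tool to decide.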

The one thing I can say with certainty is that you want the most up-to-date version of file you can get your hands on. The "magic" files that contain the patterns to match against and the MIME types that will result from a match are updated fairly often to include newer filetypes like WebM, or just to improve accuracy for older types.

John Flatness answered Sep 30 '22 19:09



file works by checking the header of the file against a "magic number" file. I suspect the best way to gauge how robust file is would be to check your local magic number file (possibly /usr/share/magic, but see man file for details) for the file types from your referenced list.
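As a rough sketch of that check: uncompiled magic source files declare MIME types on lines beginning with !:mime, so those lines can be listed and filtered. The function name magic_mimes is hypothetical, and the path you pass is an assumption. Some systems install only a compiled magic.mgc database, in which case this finds nothing and you would need the magic sources from the file package.

```shell
#!/bin/sh
# List the MIME types declared in a magic source file or directory
# that start with a given prefix.
#   $1: path to uncompiled magic source (location varies, see man file)
#   $2: MIME prefix, e.g. "video/"
magic_mimes() {
    grep -rh '^!:mime' "$1" 2>/dev/null | awk '{print $2}' | grep "^$2" | sort -u
}
```

For example, magic_mimes /usr/share/magic video/ would print the video types your installed magic source knows about, which you can compare against the list of formats you care about.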

frankc answered Sep 30 '22 18:09
