I am beginning to wonder if this is even possible as multiple searches on SO, Google, Bing and linuxquestions.org have turned up nothing.
I am interested in extending the magic patterns located in /usr/share/magic (used by the file(1) utility) to recognize files based on data at or near the end of the file. I have been able to do this for the beginning of a file, as well as for arbitrary offsets into the file from the beginning.
The man page does a pretty good job of illustrating some standard use cases; unfortunately, there does not seem to be a way to index from the end as opposed to the beginning. The only workaround I could come up with was a scripted approach using tac and/or lreverse, but I feel these may be unfriendly to binary data.
Also, I wanted to avoid any other scripted processing - I feel like this should be doable with the right file magic. Any ideas?
The file command uses the /etc/magic file in its attempt to identify the type of a binary file. Essentially, /etc/magic contains templates that show what different types of files look like. The magic file contains lines that describe magic numbers, which identify particular types of files.
Run man magic for a description of how to create your own magic file. Then use file -C -m <your magic file> to compile it, and file -m <your magic file> to use it.
Magic numbers are the first few bytes of a file that are unique to a particular file type. These unique bytes are also sometimes referred to as a file signature. The system can use them to “differentiate between and recognize different files” without a file extension.
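To make the idea of a header signature concrete, here is a minimal Python sketch. It is not how file(1) is implemented; the SIGNATURES table and the identify_by_header name are made up for illustration, and only three well-known header byte sequences are listed.

```python
# Hypothetical signature table for illustration only. The byte strings are
# the well-known leading bytes of each format; the table is far from complete.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"\x1f\x8b", "gzip compressed data"),
]


def identify_by_header(data: bytes) -> str:
    """Return the description of the first signature that prefixes data."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    # Fall back to a generic label when nothing matches.
    return "data"
```

Calling identify_by_header on the first few bytes read from a file is enough here, because every signature in the table sits at offset 0.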
The magic file contains lines describing magic numbers, which identify particular types of files. Lines beginning with a > or & character represent continuation lines to a preceding main entry. For lines beginning with >, if the file command finds a match on the main entry line, these additional patterns are checked.
It's not possible. file(1) is designed to work with pipes too. You cannot use lseek(2) on pipes to get to the end of the file. Reading the whole file until the end would be very slow (and file(1) tries hard to be fast), and if it is actually reading from a pipe, it may never encounter the end of the file, which would be even worse.
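The point about lseek(2) and pipes can be checked with a short Python sketch. It assumes a POSIX system, where seeking on a pipe fails with ESPIPE; the is_seekable and demo names are invented for this example.

```python
import errno
import os
import tempfile


def is_seekable(fd: int) -> bool:
    """Return True if lseek(2) works on fd, False if it fails with ESPIPE."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError as exc:
        if exc.errno == errno.ESPIPE:
            return False
        raise


def demo():
    """Compare a pipe with a regular temporary file (POSIX assumed)."""
    read_end, write_end = os.pipe()
    try:
        pipe_seekable = is_seekable(read_end)
    finally:
        os.close(read_end)
        os.close(write_end)
    with tempfile.TemporaryFile() as tmp:
        file_seekable = is_seekable(tmp.fileno())
    return pipe_seekable, file_seekable
```

A tool that wants to look at the end of its input could use a check like this to decide whether jumping to the end is possible at all before trying.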
As for the documentation, in case of open source software, the source code itself is the ultimate documentation. If you get stuck in a case like this, it is always a good idea to have a look. The function file_or_fd() in src/magic.c gives the clue. Use the Source, Luke! ;-)
In your specific case, I would have a second look at the file format in question, and if it really cannot be parsed by file(1), then a short Perl or Python script should do the trick. Good luck!
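As a rough idea of what such a script might look like, here is a Python sketch that reads only the tail of a regular file in binary mode, so binary data is not a problem. The tail_bytes and has_trailer names, the 64-byte window, and the marker passed in are all assumptions for illustration; substitute whatever trailer your format actually uses.

```python
import os


def tail_bytes(path, count):
    """Return up to the last `count` bytes of a regular file."""
    with open(path, "rb") as handle:
        # Find the file size, then seek back at most `count` bytes.
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        handle.seek(max(size - count, 0))
        return handle.read(count)


def has_trailer(path, marker, window=64):
    """Return True if `marker` occurs within the last `window` bytes."""
    return marker in tail_bytes(path, max(window, len(marker)))
```

For example, has_trailer(path, b"%%EOF") would look for a PDF-style end marker near the end of the file. This only works on seekable files, not on pipes.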