How do you write a magic file test pattern to match the end of a file?

I am beginning to wonder if this is even possible as multiple searches on SO, Google, Bing and linuxquestions.org have turned up nothing.

I am interested in extending the magic patterns located in /usr/share/magic (used by the file(1) utility) to recognize files based on data at or near the end of the file. I have been able to do this for the beginning of a file, as well as for arbitrary offsets into the file from the beginning.

The man page does a pretty good job of illustrating some standard use cases; unfortunately, there does not seem to be a way to index from the end of the file as opposed to the beginning. The only workaround I could come up with was a scripted approach using tac and/or lreverse, but I suspect these are unfriendly to binary data.

I would also like to avoid any other scripted processing; I feel like this should be doable with the right file magic. Any ideas?

asked Feb 10 '11 17:02 by jayce


1 Answer

It's not possible. file(1) is designed to work with pipes too, and you cannot use lseek(2) on a pipe to jump to the end of the data. Reading the whole input until the end would be very slow (and file(1) tries hard to be fast), and if it is actually reading from a pipe, it may never reach the end of the input at all, which would be even worse.
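To illustrate the lseek(2) limitation, here is a minimal Python sketch. The helper name `can_seek_to_end` is made up for this example; it only shows that seeking to the end fails with ESPIPE on a pipe, and says nothing about how file(1) is implemented internally.

```python
import errno
import os


def can_seek_to_end(fd: int) -> bool:
    """Return True if lseek(2) to the end of fd succeeds, False on ESPIPE.

    Hypothetical helper for illustration only.
    """
    try:
        os.lseek(fd, 0, os.SEEK_END)
        return True
    except OSError as exc:
        if exc.errno == errno.ESPIPE:
            # Pipes, FIFOs and sockets are not seekable.
            return False
        raise


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    try:
        print("pipe seekable:", can_seek_to_end(read_end))
    finally:
        os.close(read_end)
        os.close(write_end)
```

On a regular file the same call succeeds, which is why end-of-file checks are easy in a script that is only ever given real files.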

As for the documentation: in the case of open-source software, the source code itself is the ultimate documentation. If you get stuck in a case like this, it is always a good idea to have a look. The function file_or_fd() in src/magic.c gives the clue. Use the Source, Luke! ;-)

In your specific case, I would take a second look at the file format in question, and if it really cannot be parsed by file(1), then a short Perl or Python script should do the trick. Good luck!
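A short Python script along those lines might look like the sketch below. It assumes the input is a regular, seekable file; the function name `ends_with` and the signature `b"TRAILER!"` are placeholders invented for this example and do not correspond to any real format.

```python
import os
import sys


def ends_with(path, signature, window=None):
    """Return True if `signature` occurs within the last `window` bytes of the file.

    With the default window (len(signature)) the file must end with the
    signature exactly. A larger window allows a match "near" the end.
    """
    if not signature:
        raise ValueError("signature must not be empty")
    if window is None:
        window = len(signature)
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)           # works only on seekable files
        size = f.tell()
        f.seek(max(0, size - window))    # clamp for files shorter than window
        tail = f.read()
    return signature in tail


if __name__ == "__main__":
    # Hypothetical trailer bytes; replace with the real signature.
    SIGNATURE = b"TRAILER!"
    for name in sys.argv[1:]:
        kind = "has trailer" if ends_with(name, SIGNATURE, window=64) else "unknown"
        print(f"{name}: {kind}")
```

Because the file is opened in binary mode and only the tail is read, this avoids the binary-safety concern with tac and stays fast on large files.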

answered Oct 03 '22 12:10 by Mackie Messer