How do I diff two binary files?
I have two versions of a program, version 1 and version 2. I made a small number of changes between the two versions, but unfortunately I haven't been backing up regularly, so although I have the source for version 2, I only have the binary of version 1. I need to find out exactly what I changed between the two versions. I've tried creating an objdump of each version and then using diff to find the changes, but that doesn't work because the offsets are different, so diff considers almost every line to have changed.
For example, one line might be bgez v0,4074d0<daemonize+0xd4>
in version 1, and bgez v0,4073d4<daemonize+0xd4>
in version 2. These are copied directly from the dump files. You can see the two lines do the same thing, but diff can't tell that they are equivalent. The files are too big for me to examine every line manually. How do I detect functionality changes while ignoring differences in offset?
On Windows, you can use Fc.exe to compare two ASCII or binary files on a line-by-line basis. It offers several command-line options. For example, use the fc /b command to compare two binary files. For a complete list of options, type fc /?
First: diff. The command most likely to come to mind for this task is diff. The diff command will show you the differences between two text files or tell you if two binaries are different, but it also has quite a few very useful options.
diff determines whether a file is text or binary by checking the first few bytes in the file; the exact number of bytes is system dependent, but it is typically several thousand. If every byte in that part of the file is non-null, diff considers the file to be text; otherwise it considers the file to be binary.
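To make that heuristic concrete, here is a minimal Python sketch of the same idea. It is not diff's actual code: the function names and the 4096-byte probe size are my own assumptions, since the real number of bytes checked is system dependent.

```python
def looks_binary(data: bytes, probe_size: int = 4096) -> bool:
    """Return True if a NUL byte appears in the first probe_size bytes.

    probe_size is an assumed value; diff's real limit is system dependent.
    """
    return b"\x00" in data[:probe_size]


def file_looks_binary(path: str, probe_size: int = 4096) -> bool:
    """Apply the same check to the start of a file on disk."""
    with open(path, "rb") as fh:
        return looks_binary(fh.read(probe_size), probe_size)
```

Note that a NUL byte appearing only after the probed region does not change the verdict, which is why a file can be treated as text even if it contains binary data further in.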
I eventually solved this by removing the raw instructions and offset markers so I only had the assembly, then using sed to strip out every digit, and filtering diff to ignore changes consisting of only 1 line. I was a little surprised that it worked, but it did.
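For anyone who wants to try the same thing, here is a rough Python sketch of that approach. It is not the exact commands I ran: it assumes the usual tab-separated `objdump -d` layout (address, raw bytes, then the instruction), the names `normalize` and `changed_blocks` are made up for illustration, and `min_lines=2` is just one way to read "ignore changes consisting of only 1 line".

```python
import difflib
import re


def normalize(line: str) -> str:
    """Drop the address and raw-bytes columns, then strip every digit."""
    # Assumed layout: "  addr:<TAB>raw bytes<TAB>mnemonic<TAB>operands"
    parts = line.rstrip("\n").split("\t")
    text = " ".join(parts[2:]) if len(parts) >= 3 else line.strip()
    text = re.sub(r"\d", "", text)  # same effect as sed 's/[0-9]//g'
    return " ".join(text.split())   # collapse runs of whitespace


def changed_blocks(old_lines, new_lines, min_lines=2):
    """Return differing blocks, skipping changes shorter than min_lines."""
    a = [normalize(line) for line in old_lines]
    b = [normalize(line) for line in new_lines]
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    blocks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if max(i2 - i1, j2 - j1) < min_lines:
            continue  # treat isolated one-line changes as offset noise
        blocks.append((tag, a[i1:i2], b[j1:j2]))
    return blocks
```

With this, the two bgez lines from the question both normalize to `bgez v,d <daemonize+xd>`. Hex letters a-f survive the digit stripping, so two addresses can still differ after normalization, which is part of why the one-line filter is needed.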
It is possible. I am currently working on a project that can locate the function and memory pointer addresses from one compiled file within a new or modified binary. It supports Windows PE and ELF binaries on x86 and x86_64. There is also a paper describing the approach. It works well for my reversing project, where I have to update all hooks and memory addresses frequently when binary updates are made. But there are other use cases as well.
Check it out here.
The trick is that it does not rely on weak text comparisons. Instead, it disassembles the binaries and compares all functions by measuring the geometric distance between them using code metrics.
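To give a feel for what "geometric distance using code metrics" can mean, here is a toy Python sketch. It is not the project's code: the three metrics (instruction, branch, and call counts), the mnemonic sets, and the names `metrics` and `closest_match` are all assumptions chosen for illustration, and the real tool almost certainly uses a richer set of metrics.

```python
import math

# Hypothetical mnemonic sets; a real tool would take these from its disassembler.
BRANCHES = {"bgez", "beq", "bne", "jmp", "je", "jne"}
CALLS = {"call", "jal"}


def metrics(instructions):
    """Map a function's instructions to a small metric vector."""
    mnemonics = [ins.split()[0] for ins in instructions if ins.strip()]
    branches = sum(m in BRANCHES for m in mnemonics)
    calls = sum(m in CALLS for m in mnemonics)
    return (len(mnemonics), branches, calls)


def closest_match(old_funcs, new_funcs):
    """For each old function, find the new function nearest in metric space."""
    new_vectors = {name: metrics(ins) for name, ins in new_funcs.items()}
    result = {}
    for name, ins in old_funcs.items():
        vec = metrics(ins)
        best = min(new_vectors, key=lambda n: math.dist(vec, new_vectors[n]))
        result[name] = (best, math.dist(vec, new_vectors[best]))
    return result
```

Because the vectors contain counts rather than addresses, a function that merely moved to a different offset keeps a distance of zero to its counterpart, while a function whose body changed shows up with a non-zero distance.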