Binary diff and patch utility for a virtual machine image [closed]

I need to release some software quite frequently, and the software is distributed as a VMware disk file, i.e., a .vmdk file. What I want is some kind of binary diff and patch utility that makes the generated delta as small as possible.

tianyapiaozi asked Jan 19 '11 01:01



2 Answers

Let me start off with tried-and-true approaches, then point out some more recent approaches.

approaches that I have seen work with binary files

A long time ago, people expanded the old and the new versions of a binary file into temporary "text" files (every byte expanded to 3 bytes: 2 hex digits and a newline). They then ran these two "text" files through an old version of "diff" (one that definitely couldn't handle binary files) to make a patch file, and transmitted that "text" patch file over communication lines that were not yet 8-bit-clean. On the receiving end, one expanded the old binary file into a temporary text version, patched that text file, and then converted the new text file back into a binary file (collapsing each pair of hex digits into one byte, and throwing away the newlines and any carriage returns that might have crept in).
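The hex-expansion trick above can be sketched in a few lines of Python. This is only an illustration: the helper names to_hex_lines and from_hex_lines are made up for this example, and difflib stands in for the old "diff" program. Applying the resulting patch would still be the job of a tool such as "patch".

```python
import difflib


def to_hex_lines(data: bytes) -> list[str]:
    """Expand each byte into two hex digits plus a newline (3 bytes per byte)."""
    return [f"{b:02x}\n" for b in data]


def from_hex_lines(lines: list[str]) -> bytes:
    """Collapse hex lines back into bytes, dropping newlines and stray CRs."""
    digits = "".join(line.strip() for line in lines)
    return bytes.fromhex(digits)


# Two tiny stand-ins for the old and new binary files (hypothetical data).
old = bytes([0x00, 0x10, 0x20, 0x30])
new = bytes([0x00, 0x10, 0x21, 0x30, 0x40])

# A line-oriented diff of the expanded files is plain 7-bit text.
patch_text = "".join(
    difflib.unified_diff(to_hex_lines(old), to_hex_lines(new), "old.hex", "new.hex")
)
print(patch_text)
```

The expanded files are three times the size of the originals, which is one reason this approach is mostly of historical interest for something as large as a disk image.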

More recently, I have been using rsync (or some utility built on top of it such as Unison). It handles arbitrary binary files just fine. I generally do a live update, with Unison running on my local machine and rsync running on the file server, talking back and forth to each other.

No matter how a patch file is generated, you can use any data compression utility to compress that file.
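As a rough sketch of that last step, here is one way to compress an already-generated patch with Python's standard lzma module. The function names and the sample patch bytes are hypothetical, and any other compressor (gzip, bzip2, and so on) would serve the same purpose.

```python
import lzma


def compress_patch(patch: bytes) -> bytes:
    """Compress a patch with xz/LZMA at the highest preset."""
    return lzma.compress(patch, preset=9)


def decompress_patch(blob: bytes) -> bytes:
    """Recover the original patch bytes on the receiving side."""
    return lzma.decompress(blob)


# Made-up, highly repetitive patch content purely for demonstration.
sample_patch = b"-20\n+21\n" * 1000
packed = compress_patch(sample_patch)
print(len(sample_patch), "->", len(packed), "bytes")
```

Note that some delta tools already compress their output, in which case a second pass may gain little.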

approaches that, as far as I know, ought to work with binary files

StackOverflow: "how to create a PATCH file for the binary difference output file" suggests using bsdiff.

Another StackOverflow question suggests that "vimdiff" handles arbitrary bytes adequately.

StackOverflow: "Useful Binary Diff Tool" mentions a few other binary difference tools.

I hear that some tools based on rsync -- "rdiff" and "rdiff-backup" and "duplicity" -- allow you to create a patch file. Then a person who receives that patch file can use it to update their old binary file to a new binary file.
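A minimal sketch of what the rdiff workflow might look like, assuming the rdiff command-line tool (from librsync) is installed. The script only builds and prints the three command lines; the .vmdk names and the intermediate file names are placeholders, and run_all is a hypothetical helper you would call yourself once the files exist.

```python
import shlex
import subprocess


def rdiff_commands(old, new, signature="old.sig", delta="update.delta",
                   rebuilt="rebuilt.bin"):
    """Return the three rdiff steps as argv lists (file names are placeholders)."""
    return [
        # 1. Summarize the old file as a signature of block checksums.
        ["rdiff", "signature", old, signature],
        # 2. Compute the delta from that signature and the new file.
        ["rdiff", "delta", signature, new, delta],
        # 3. On the receiving side, rebuild the new file from old + delta.
        ["rdiff", "patch", old, delta, rebuilt],
    ]


def run_all(commands):
    """Run each step in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    for argv in rdiff_commands("release-1.vmdk", "release-2.vmdk"):
        print(shlex.join(argv))
```

One property worth noting: step 2 needs only the signature of the old file, not the old file itself, so whoever builds the delta does not have to keep every old image around.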

Wikipedia claims that recent versions of the standard "diff" and "patch" utilities support binary files. Have you tried that?

cutting-edge research in executable file compression

If you are interested in cutting-edge research on making the delta file as small as possible when updating executable files, you'll want to check out "How Courgette works" by Stephen Adams (2009) at The Chromium Projects.

Among other things, with Courgette the computer that receives the patch "disassembles" the old application, converting all absolute addresses and offsets into symbols; then patches the disassembled code; then "reassembles" the patched code into the new version of the application.
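To see why turning addresses into symbols helps, here is a toy sketch. It is not Courgette's actual algorithm or file format: the (opcode, argument) tuples, the symbolize function, and the sample programs are all invented for illustration. The point is only that inserting one instruction shifts every absolute jump target, while the symbolic form stays almost unchanged.

```python
def symbolize(program):
    """Replace absolute jump targets with labels; return (symbolic code, target table)."""
    targets = sorted({arg for op, arg in program if op == "jmp"})
    label_of = {addr: f"L{i}" for i, addr in enumerate(targets)}
    symbolic = [(op, label_of[arg]) if op == "jmp" else (op, arg)
                for op, arg in program]
    return symbolic, targets


def changed(old, new):
    """Count distinct instructions in the new program that are absent from the old one."""
    return len(set(new) - set(old))


# Hypothetical old program, and a new version with one inserted "nop"
# that shifts every jump target by 4 bytes.
old_prog = [("jmp", 100), ("add", 1), ("jmp", 200)]
new_prog = [("jmp", 104), ("add", 1), ("nop", 0), ("jmp", 204)]

old_sym, old_targets = symbolize(old_prog)
new_sym, new_targets = symbolize(new_prog)

print("raw changes:     ", changed(old_prog, new_prog))
print("symbolic changes:", changed(old_sym, new_sym))
print("target tables:   ", old_targets, new_targets)
```

In this toy case the raw programs differ in three instructions but the symbolic ones differ in one; the target tables still differ, but by a constant offset, which is the kind of regularity a delta encoder can represent compactly.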

David Cary answered Nov 03 '22 09:11


Try xdelta.

I was looking for binary diff tools for very large files (an LVM logical volume and its snapshots, because LVM doesn't support snapshots of snapshots yet), and xdelta works for me.
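For reference, a small sketch of the encode and decode commands, assuming the xdelta3 version of the tool (older xdelta 1.x releases use a different syntax). The script only builds and prints the command lines; the file names are placeholders.

```python
import shlex


def xdelta3_encode(old, new, delta):
    """Command that writes `delta`, describing how to turn `old` into `new`."""
    return ["xdelta3", "-e", "-s", old, new, delta]


def xdelta3_decode(old, delta, rebuilt):
    """Command that rebuilds the new file from `old` plus `delta`."""
    return ["xdelta3", "-d", "-s", old, delta, rebuilt]


if __name__ == "__main__":
    print(shlex.join(xdelta3_encode("release-1.vmdk", "release-2.vmdk", "r1-to-r2.xd3")))
    print(shlex.join(xdelta3_decode("release-1.vmdk", "r1-to-r2.xd3", "release-2.vmdk")))
```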

Dai Qizhi answered Nov 03 '22 10:11