
Existing solution for file deltas/versioning in Java

When versioning files or optimizing backups, one idea is to store only the delta, that is, the data that has been modified.

This sounds like a simple idea at first, but actually determining where unmodified data ends and new data starts turns out to be a difficult task.

Is there an existing framework that already does something like this, or an efficient file comparison algorithm?

James P. asked Feb 13 '11 04:02


5 Answers

XDelta is not written in Java, but it is worth looking at anyway. There is a Java version of it, but I don't know how stable it is.

Sasha O answered Oct 12 '22 17:10


Instead of rolling your own, you might consider leveraging an open source version control system (e.g., Subversion). You get a lot more than just a delta versioning algorithm that way.

Jim Ferrans answered Oct 12 '22 15:10


It sounds like you are describing a difference-based storage scheme. Most source code control systems use such schemes to minimize their storage requirements. The *nix "diff" command is capable of generating the data you would need to implement one on your own.
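To give a rough idea of what a line-based diff computes, here is a minimal Java sketch built on a longest-common-subsequence table. It is not how the "diff" command is actually implemented (real tools use faster algorithms and a different output format); the class name `LineDiff` and the `"  "`, `"- "`, `"+ "` prefixes are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LineDiff {

    /**
     * Returns an edit script turning oldLines into newLines.
     * "  x" = unchanged line, "- x" = removed line, "+ x" = added line.
     * (Illustrative format only, not the real diff output format.)
     */
    public static List<String> diff(List<String> oldLines, List<String> newLines) {
        int n = oldLines.size();
        int m = newLines.size();

        // lcs[i][j] = length of the longest common subsequence of
        // oldLines[i..] and newLines[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = oldLines.get(i).equals(newLines.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table from the start, emitting one operation per step.
        List<String> script = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (oldLines.get(i).equals(newLines.get(j))) {
                script.add("  " + oldLines.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                script.add("- " + oldLines.get(i++));
            } else {
                script.add("+ " + newLines.get(j++));
            }
        }
        while (i < n) script.add("- " + oldLines.get(i++));
        while (j < m) script.add("+ " + newLines.get(j++));
        return script;
    }

    public static void main(String[] args) {
        List<String> v1 = Arrays.asList("a", "b", "c");
        List<String> v2 = Arrays.asList("a", "c", "d");
        diff(v1, v2).forEach(System.out::println);
    }
}
```

Storing only the "-" and "+" entries (with their positions) for each revision is the basic idea behind difference-based storage. The table uses O(n*m) memory, so this sketch is only suitable for small files.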

Chris Nava answered Oct 12 '22 15:10


Here's a Java library that can compute diffs between two plain text files:

http://code.google.com/p/google-diff-match-patch/

I don't know of any library for binary diffs, though. Try searching for 'java binary diff' ;-)
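In the absence of a library, a very crude binary delta can be sketched with the JDK alone: skip the bytes both versions share at the start and at the end, and store only the differing middle. This is just an illustration of the idea, far weaker than a real binary diff tool, since a single changed region is assumed. The names `PrefixSuffixDelta` and `Delta` are hypothetical.

```java
import java.util.Arrays;

public class PrefixSuffixDelta {

    /** Hypothetical delta record: replace oldLength bytes at offset with data. */
    public static final class Delta {
        public final int offset;
        public final int oldLength;
        public final byte[] data;

        public Delta(int offset, int oldLength, byte[] data) {
            this.offset = offset;
            this.oldLength = oldLength;
            this.data = data;
        }
    }

    /** Computes a single-region delta between the old and new contents. */
    public static Delta diff(byte[] oldBytes, byte[] newBytes) {
        int max = Math.min(oldBytes.length, newBytes.length);

        // Length of the common prefix.
        int prefix = 0;
        while (prefix < max && oldBytes[prefix] == newBytes[prefix]) {
            prefix++;
        }

        // Length of the common suffix, never overlapping the prefix.
        int suffix = 0;
        while (suffix < max - prefix
                && oldBytes[oldBytes.length - 1 - suffix]
                   == newBytes[newBytes.length - 1 - suffix]) {
            suffix++;
        }

        byte[] data = Arrays.copyOfRange(newBytes, prefix, newBytes.length - suffix);
        return new Delta(prefix, oldBytes.length - prefix - suffix, data);
    }

    /** Rebuilds the new contents from the old contents plus the delta. */
    public static byte[] patch(byte[] oldBytes, Delta d) {
        byte[] out = new byte[oldBytes.length - d.oldLength + d.data.length];
        int tailStart = d.offset + d.oldLength;
        System.arraycopy(oldBytes, 0, out, 0, d.offset);
        System.arraycopy(d.data, 0, out, d.offset, d.data.length);
        System.arraycopy(oldBytes, tailStart, out,
                d.offset + d.data.length, oldBytes.length - tailStart);
        return out;
    }

    public static void main(String[] args) {
        byte[] v1 = "hello world".getBytes();
        byte[] v2 = "hello brave world".getBytes();
        Delta d = diff(v1, v2);
        System.out.println("offset=" + d.offset + ", stored bytes=" + d.data.length);
        System.out.println(Arrays.equals(patch(v1, d), v2));
    }
}
```

If a file changes in two distant places, everything between the two changes ends up in the delta, which is why dedicated binary diff tools use more elaborate matching.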

python dude answered Oct 12 '22 17:10


In my opinion, the bsdiff tool is the best choice for binary files. It uses suffix sorting (Larsson and Sadakane's qsufsort) and takes advantage of how executable files change. Bsdiff was written in C by Colin Percival. Diff files created by bsdiff are generally smaller than those created by Xdelta.

It is also worth noting that bsdiff uses the bzip2 compression algorithm. Binary patches created by bsdiff can sometimes be compressed further using other compression algorithms (such as the one used by the WinRAR archiver).

Here is the site where you can find the bsdiff documentation and download bsdiff for free: http://www.daemonology.net/bsdiff/

Nikolai Samteladze answered Oct 12 '22 15:10