
Binary Delta Storage

I'm looking for a binary delta storage solution to version large binary files (digital audio workstation files).

When working with DAW files, the majority of changes, especially near the end of the mix, are very small in comparison to the huge amount of data used to store the raw audio (waveforms).

It would be great to have a versioning system for our DAW files, allowing us to roll back to older versions.

The system would save only the difference (diff) between the binary files of each version. This would give us a list of instructions for changing from the current version back to the previous version, without storing the full file for every single version.

Are there any current versioning systems that do this? I've read that SVN uses binary diffs to save space in the repo, but I've also read that it only actually does that for text files, not binary files. I'm not sure. Any ideas?

My plan of action as of right now is to continue researching preexisting tools and, if none exist, to get comfortable reading binary data in C/C++ and create the tool myself.
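
To make the idea concrete, here is roughly the sketch I have in mind, in C++: compare the two versions in fixed 4 KiB blocks and keep only the blocks that differ, storing the old bytes so the delta can turn the current version back into the previous one. The block size, record layout, and tool name (mkdelta) are arbitrary placeholder choices, not an existing tool.

    // mkdelta: naive reverse-delta sketch (placeholder name, not an existing tool).
    // Compares <old> and <new> in fixed 4 KiB blocks and writes, for every block
    // that differs, a record of (offset, length, old bytes). Applying those
    // records over a copy of the new file, then truncating to the old length,
    // reconstructs the old version. Records use native byte order; a real
    // format would pin that down and store the old file length in a header.
    #include <cstddef>
    #include <cstdint>
    #include <fstream>
    #include <iostream>
    #include <vector>

    constexpr std::size_t kBlock = 4096;  // assumed block granularity

    // Read up to kBlock bytes at `offset`; the result holds the bytes actually read.
    static std::vector<char> readBlock(std::ifstream& f, std::uint64_t offset) {
        std::vector<char> buf(kBlock);
        f.clear();  // reset EOF from a previous short read so seekg works
        f.seekg(static_cast<std::streamoff>(offset));
        f.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        buf.resize(static_cast<std::size_t>(f.gcount()));
        return buf;
    }

    int main(int argc, char** argv) {
        if (argc != 4) {
            std::cerr << "usage: mkdelta <old> <new> <delta-out>\n";
            return 1;
        }
        std::ifstream oldF(argv[1], std::ios::binary);
        std::ifstream newF(argv[2], std::ios::binary);
        std::ofstream out(argv[3], std::ios::binary);
        if (!oldF || !newF || !out) {
            std::cerr << "failed to open a file\n";
            return 1;
        }
        for (std::uint64_t off = 0;; off += kBlock) {
            std::vector<char> oldBlk = readBlock(oldF, off);
            std::vector<char> newBlk = readBlock(newF, off);
            if (oldBlk.empty() && newBlk.empty()) break;  // past the end of both files
            if (oldBlk == newBlk) continue;               // unchanged block: store nothing
            std::uint64_t len = oldBlk.size();
            out.write(reinterpret_cast<const char*>(&off), sizeof off);
            out.write(reinterpret_cast<const char*>(&len), sizeof len);
            out.write(oldBlk.data(), static_cast<std::streamsize>(oldBlk.size()));
        }
        return 0;
    }

Applying the delta would be the mirror image: copy the new file, overwrite each recorded range with the stored old bytes, and truncate to the old length. It's naive (a single inserted byte near the start would shift every later block and make nearly every block "differ"), which is exactly why real delta algorithms match content rather than fixed offsets, but it shows the shape of the instruction list described above.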

asked Aug 29 '11 by Colton Phillips

2 Answers

I can't comment on the reliability or connection issues that might exist when committing a large file across the network (one referenced post hinted at problems). But here is a little bit of empirical data that you may find useful (or not).

I have been doing some tests today studying disk seek times, so I had a reasonably good test case readily at hand. I found your question interesting, so I did a quick test with the files I am using/modifying. I created a local Subversion repository, added two binary files to it (sizes shown below), and then committed the files a couple of times after changes were made to them. The smaller binary file (0.85 GB) simply had data added to the end of it each time. The larger file (2.2 GB) contains data representing b-trees consisting of "random" integer data. The updates to that file between commits involved adding approximately 4000 new random values, so the changes modified nodes spread somewhat evenly throughout the file.

Here are the original file sizes along with the size/count of all files in the local Subversion repository after the first commit:

file1    851,271,675  
file2  2,205,798,400 

1,892,512,437 bytes in 32 files and 32 dirs

After the second commit:

file1    851,287,155  
file2  2,207,569,920  

1,894,211,472 bytes in 34 files and 32 dirs

After the third commit:

file1    851,308,845  
file2  2,210,174,976  

1,897,510,389 bytes in 36 files and 32 dirs
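
Doing the arithmetic on those listings: the second commit grew the repository by 1,699,035 bytes against 1,787,000 bytes of combined file growth, and the third by 3,298,917 bytes against 2,626,746 bytes. So the repository stored something close to the size of each change rather than full copies; the third commit's delta is larger than the raw size growth presumably because its changes were scattered throughout the b-tree file rather than just appended.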

The commits were somewhat lengthy. I didn't pay close attention because I was doing other work, but I think each one took maybe 10 minutes. Checking out a specific revision took about 5 minutes. I would not make a recommendation one way or the other based on my results. All I can say is that it seemed to work fine and no errors occurred, and the file differencing seemed to work well (for these files).

answered Sep 29 '22 by Mark Wilkins


Subversion might work, depending on your definition of large. This question/answer says that it works well as long as your files are less than 1 GB.

answered Sep 29 '22 by Shane Wealti