 

methods to store binary files in SVN

Are there different methods to store binary files in SVN? If so, what are they, and how do I modify the storage options?

I read that there are 4 ways to store binary files in SVN:

  1. Compressed tar - import - export.
  2. Tar - import - export.
  3. import - export.
  4. Efficient check-in.

Which of those is the most useful for time efficiency? And how do I set SVN to use any of those methods?

Thanks, Oded.


I have many small binary files and a few large ones. All are changed frequently. I'm currently working with CVS and will be switching to SVN soon, and I wanted to know about ways to store binaries.

I read Performance tuning Subversion (mentioned above) and found it useful, but there were no examples, so I didn't exactly understand how to carry out each of the 4 ways suggested there.

My basic question is whether or not the defaults are good (and what are they?). My first consideration is time efficiency, and then space. Thanks :)

Oded asked Jul 20 '09


People also ask

How does SVN store binary files?

By default, Subversion treats all file data as literal byte strings, and files are always stored in the repository in an untranslated state.

How are binary files stored?

A binary file is one that does not contain text. It is used to store data in the form of bytes, which are typically interpreted as something other than textual characters. These files usually contain instructions in their headers to determine how to read the data stored in them.

Which function can be used to store data in a binary file?

To write into a binary file, you need to use the function fwrite(). The function takes four arguments: the address of the data to be written to disk, the size of each data item, the number of such items, and a pointer to the file where you want to write.

Can we store binary files in git?

Git LFS is a Git extension used to manage large files and binary files in a separate Git repository. Most projects today have both code and binary assets. And storing large binary files in Git repositories can be a bottleneck for Git users. That's why some Git users add Git Large File Storage (LFS).


2 Answers

You don't set Subversion to use any of those methods; you specify which method to use when putting files into the repository. And by "method" I don't mean any of the 4 you mention, but rather just "import" or "commit", and you'll have to keep telling Subversion which method you've chosen each time you want to store a new revision of that file in the repository.

See Performance tuning Subversion.

As you can see from the description there, in order to use "method 1" (compressing to a tar and then importing), you have to compress all the binary files into a .tar file yourself, and then use Subversion's import command to add that file to the repository.
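
For illustration only, a rough sketch of that workflow might look something like the following. The repository URL, paths and archive name are made up for the example, not something Subversion dictates:

    # "Method 1" sketch: bundle the binaries into a compressed tar, then import it.
    # The repository URL and paths below are hypothetical; adjust them to your setup.
    tar -czf assets.tar.gz binaries/
    svn import assets.tar.gz \
        http://svn.example.com/repo/archives/assets.tar.gz \
        -m "Import binary bundle as a single compressed archive"
    # Note: a later import cannot reuse the same target path, so you would
    # either delete the old archive in the repository first or import under a new name.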

Also note the caveat there: the import command stores files as new files, not as deltas against a previous revision, so it might be time-efficient, but not space-efficient, if only a few changes to a big file have been committed.

Subversion by itself only does commits and imports. A commit is a new revision to an existing file, stored as a sequence of deltas (or the first revision of a new file, which isn't), and an import is just a new file. Anything else you'll have to do yourself.

If the binary files are only changed now and then, this might be worth looking more into, but if they are changed regularly, I'd suggest just using Subversion as normal, with the commit command.
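
If you do go the normal route, the day-to-day workflow is just the usual checkout/commit cycle; something like this, where the repository URL and file names are invented for the example:

    # Ordinary Subversion workflow for a changed binary (hypothetical names).
    svn checkout http://svn.example.com/repo/project work   # get a working copy (or update an existing one)
    cp ~/build/output.bin work/output.bin                   # drop in the newly built binary
    cd work
    svn add output.bin                                      # only needed the first time the file appears
    svn commit -m "Update output.bin" output.bin            # stored as a delta against the previous revision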

Also note that the typical advice when it comes to binary files is to store, instead of the binary file itself, the source for whatever it is that produces those binary files, if possible, and then re-run the tools to reproduce the actual binary files. Only if the binary files are time- or space-consuming to reproduce do you also store the binary files in question.

Binary files have the problem of not really being good to compare, and thus if developers A and B both retrieve the latest version, and then developer A commits a new revision before developer B tries to do the same, some type of conflict will occur. Developer B might be left with no option but to try to figure out the changes by himself.


Edit: Let me emphasize what I mean by COMMIT and IMPORT.

The main difference is that COMMIT will, assuming you already have the file in the repository, try to diff the file in your working copy against the previous repository version, and store only the changes. This will take time and memory in order to work out those differences, but will typically result in a small revision changeset in your repository. In other words, disk space on your Subversion server will be less impacted than with the IMPORT command.

IMPORT, on the other hand, will import the new file as though you just gave it a new file and said "forget about the previous one, just store this file", and thus no time or memory will be spent on working out the differences, but the resulting changeset in the repository will be larger. In other words, disk space on your Subversion server will be more impacted than with the COMMIT command, but IMPORT will typically run much faster.

Any other workflow you want to impose has to be done outside of Subversion. This includes the tar command and the compression options available in your operating system. If you want to go with "method 1", you yourself have to manually compress the file(s) you want to import into a single .tar file before you give it to Subversion. You cannot ask Subversion to do any of that for you. You can of course make script files that automate the process somewhat, but still, it's not a Subversion problem.
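
As a sketch of what such an automation script could look like: the repository URL, directory layout and naming scheme here are my own assumptions, not anything Subversion provides:

    #!/bin/sh
    # Minimal wrapper automating "method 1": tar the binaries, then import the archive.
    # Everything here (URL, paths, naming) is an example setup, not a Subversion feature.
    set -e
    STAMP=$(date +%Y%m%d-%H%M%S)
    ARCHIVE="binaries-$STAMP.tar.gz"

    tar -czf "$ARCHIVE" binaries/                       # compression happens outside Subversion
    svn import "$ARCHIVE" \
        "http://svn.example.com/repo/archives/$ARCHIVE" \
        -m "Automated import of binary bundle $STAMP"
    rm "$ARCHIVE"                                       # keep only the repository copy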

I would do some serious tests with this to figure out if the gains are actually worth the extra work you will impose on your Subversion workflow.

Lasse V. Karlsen answered Oct 27 '22


Could you describe your situation in more detail?

Do you have several smallish binary files that all change together? A few large binary files that change independently? Do your files change frequently?

Have you actually found that the defaults aren't good enough? I've always just added binary files in the same way as normal and found it to just work. As with any performance problem, I wouldn't try to make things complicated unless you've got a good reason to - in which case, please share that reason with us.

Jon Skeet answered Oct 27 '22