
Managing documents using Git

I am working on a website where I will be able to create projects and upload data to each of them. The data will mostly be spreadsheets, images, PDFs and similar documents. Ideally, I would like a VCS-style setup (preferably git) where each time I update a particular document, I can just commit that document to a repo. Any ideas on how to go about implementing this would be helpful.

Anush Shetty asked Jan 11 '11 08:01



2 Answers

You can call git in a subshell after each upload.
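As a rough illustration of calling git after an upload, here is a minimal Python sketch. It assumes the repository already exists and that the web application has already written the uploaded file into it. The function name `commit_upload` and the fixed `webapp` identity are made up for this example, not part of any library.

```python
import subprocess
from pathlib import Path


def commit_upload(repo_dir: str, relative_path: str) -> str:
    """Stage one uploaded file, commit it, and return the new commit hash.

    Assumes repo_dir is an existing git repository and the upload was
    already saved at repo_dir/relative_path (hypothetical layout).
    """
    def git(*args: str) -> str:
        # check=True raises CalledProcessError when git exits non-zero,
        # for example when the file has not changed since the last commit.
        result = subprocess.run(
            ["git", "-C", repo_dir, *args],
            check=True, capture_output=True, text=True,
        )
        return result.stdout.strip()

    if not (Path(repo_dir) / relative_path).is_file():
        raise FileNotFoundError(relative_path)

    git("add", "--", relative_path)
    # The identity is passed per invocation, so no global git config is
    # needed. Anything else already staged is committed as well.
    git(
        "-c", "user.name=webapp",
        "-c", "user.email=webapp@example.invalid",
        "commit", "-m", f"Update {relative_path}",
    )
    return git("rev-parse", "HEAD")
```

The upload handler would call something like `commit_upload("/srv/docs", "project1/report.pdf")` once the file is on disk; both paths are placeholders.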

But I don't think any VCS is a good solution for document versioning, especially in a web application. With office-like documents you will mostly be storing binary data, and VCSs handle binary data poorly (no exceptions): you will not be able to get any meaningful diff. Metadata management is not suited to this either. The commit author is usually tied to a particular account (and you will probably use a single system account for git), and nothing beyond basic file information (size, permissions, ctime) is stored, so you will have to keep authorship, permissions for web application users and any additional metadata somewhere nearby yourself. Also note that several users can commit at the same time, so your history will end up with branches. And once you have a huge dataset (with binary office files that can happen sooner than you think), you will not be able to partition such a repository.

IMO, using a VCS here gives you very little gain and introduces additional problems.

I'd advise keeping the metadata in a database (file name, revisions, additional stuff) and the file revisions on disk. Keep each file, with all its revisions, in a separate, unique directory. One tip here: don't use the file names that come from the upload. Use a hash function to calculate a unique name based on content and metadata.
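To make that layout concrete, here is a minimal sketch using SQLite and SHA-256. The table name `revisions`, its columns, the function `store_revision` and the choice of SHA-256 are all assumptions for illustration; any database and hash function would do.

```python
import hashlib
import sqlite3
from pathlib import Path


def store_revision(db: sqlite3.Connection, root: Path, doc_id: int,
                   original_name: str, author: str, data: bytes) -> Path:
    """Save one revision of a document on disk and record its metadata."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS revisions ("
        " doc_id INTEGER, revision INTEGER, original_name TEXT,"
        " author TEXT, stored_name TEXT,"
        " PRIMARY KEY (doc_id, revision))"
    )
    (last,) = db.execute(
        "SELECT COALESCE(MAX(revision), 0) FROM revisions WHERE doc_id = ?",
        (doc_id,),
    ).fetchone()
    revision = last + 1

    # The stored name comes from content plus metadata, never from the
    # file name supplied by the upload.
    digest = hashlib.sha256()
    digest.update(f"{doc_id}:{revision}:{author}:".encode("utf-8"))
    digest.update(data)
    stored_name = digest.hexdigest()

    doc_dir = root / str(doc_id)  # one directory per document
    doc_dir.mkdir(parents=True, exist_ok=True)
    path = doc_dir / stored_name
    path.write_bytes(data)

    db.execute(
        "INSERT INTO revisions VALUES (?, ?, ?, ?, ?)",
        (doc_id, revision, original_name, author, stored_name),
    )
    db.commit()
    return path
```

The original file name survives only as a database column, so it can be shown to users without ever touching the file system.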

cezio answered Oct 26 '22 11:10



There isn't a universal "commit on save" feature (at least not one integrated with all the editors associated with the document types you mention).

The easiest way would be a background job that commits everything (with 'git add -A && git commit -m "xxx"' in the case of Git) every 5 minutes, for instance.
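A background job along those lines could look like the following sketch, which a scheduler such as cron would run every few minutes. The function name `snapshot`, the `autocommit` identity and the default message are placeholders of mine; the only addition to the commands above is a `git status --porcelain` check so that nothing is committed when nothing changed.

```python
import subprocess


def snapshot(repo_dir: str, message: str = "Automatic snapshot") -> bool:
    """Commit every pending change in repo_dir.

    Returns True if a commit was made, False if the tree was clean.
    Assumes repo_dir is an existing git repository.
    """
    def git(*args: str) -> str:
        return subprocess.run(
            ["git", "-C", repo_dir, *args],
            check=True, capture_output=True, text=True,
        ).stdout

    # --porcelain prints one line per changed or untracked path and
    # nothing at all when the working tree is clean.
    if not git("status", "--porcelain").strip():
        return False

    git("add", "-A")
    git(
        "-c", "user.name=autocommit",
        "-c", "user.email=autocommit@example.invalid",
        "commit", "-m", message,
    )
    return True
```

Skipping the commit on a clean tree keeps the job from failing on every quiet interval, since `git commit` exits with an error when there is nothing to commit.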

Actually, Mark Longair comments:

flashbake is designed to be run from cron to do what you describe in the second paragraph with some kind of reasonable commit message.
I'm not sure that that's what the original poster is after, though.

Original project here:

  • Automated backup is nice unless you have files for which you want to view an incremental history.
  • Source control is great for that history but most tools expect the author to manually commit their changes along the way.
  • => A seamless source control solution combines the convenience of automated back up with the power of source version control.
VonC answered Oct 26 '22 09:10
