
Could I use GIT in a situation where each file is essentially its own repo?

Tags:

git

NOTE: Even though (from the discussions that have already taken place) it looks like GIT is in fact not a good fit for this use case, I have opened this question up to a bounty to prompt a more definitive answer, hopefully from someone who has a good deal of experience with GIT. The original question is below.

I have a situation where I have a large collection of files that are independent. By independent I mean that each file doesn't depend on the presence, absence or particular state of the files around it. A good analogy would be a directory of images, where our workflow allows each image to be created, edited and removed independently, and work done on an image has no bearing on the other images in the directory.

Note that this independence is not just incidental, but is critical to our workflow.

Each of these files would benefit from a GIT-like workflow. It would be nice to be able to track the changes to each file, have people work on each file in independent branches and then merge their changes when done, and access the files from other projects that use GIT. For the sake of our analogy, imagine these are SVG images, where you might have an artist drawing the image and a translator translating the text content.

From my experience, GIT is great when you have a collection of files that are all in a particular state. For example, when you commit a GIT repo after reaching the state of "Production Release 1.2", every file then shares the state of "Production Release 1.2" at that commit.

But I'm not sure how to apply the GIT workflow, or if it is even practical to do so, when each file does not and cannot share the state of the files around it. You could place each file in its own GIT repo, but that doesn't seem practical.

So, my questions are:

  1. Is my impression that GIT only works on a collection of related files correct?
  2. If not, what would be the process for using GIT's clone/branch/merge functionality on a file by file basis?

UPDATE

In response to iberbeu: it's not that I see versions as being X.Y, it's that I see GIT commits as assuming all files in a repo have the same version or commit point (however you define a version). In which case the files in a GIT repo are not totally independent.

The issue here is when you take a single repo with all the independent files, clone it into your own local repo and start working on a branch. At this point all the files are assumed to belong to the branch, even though from the point of view of our workflow you are only working on one single file. However, now all these independent files are "along for the ride", taking on the revision history associated with the single file that you actually want to edit.

So Joe might create a branch of a repo called "Joe Working on Image 1". His branch has Image 1, which he wants to work on, and 10,000 other images that he has no interest in.

Jane might create a branch of the same repo called "Jane working on Image 987". Her branch has Image 987, which she wants to work on, and 10,000 other images that she has no interest in.

This is fine as long as Joe and Jane aren't tempted to start editing some other images in their branch. But if they did, we lose the conceptual model of each image being edited as an independent entity, and edited in isolation from the other images.

So if Joe edited Image 2 in the branch where he should have been editing only Image 1, and merged those changes back into the repo, we now have an explicit revision history of Image 2 being edited alongside Image 1. But Image 1 and 2 should be completely independent. There should be no notion of Image 2 as it was edited alongside Image 1.

So this is the crux of the question. Does GIT support the notion of the files it controls as isolated entities whose revisions don't correlate to any other file? Or can this only be achieved with individual git repos for every file?

UPDATE 2

It looks like a submodule might be a replacement for having thousands of GIT repos.

Phyxx asked Jan 15 '23 01:01


2 Answers

I don't really see the problem. I think you see a repository as a way of versioning your code (files, in this case). That is right, but the idea can mislead you, because it doesn't mean that every commit has to be a version of the form X.Y.

What I mean is that you can see a repo as a timeline in which you have different states of the content of a folder. It doesn't matter whether the files are related to each other or not.

With git you can always get an old version of a single file; you don't need to go back to a complete state of the repo.
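As a rough illustration of that point, here is a minimal Python sketch that shells out to git to list the commits touching one file, read that file as of a given commit, and restore only that file in the working tree. The helper names (`git`, `file_history`, `file_at`, `restore_file`) are made up for this example; the underlying commands are plain `git log -- <path>`, `git show <commit>:<path>` and `git checkout <commit> -- <path>`. It assumes git is installed and on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout as text."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def file_history(repo: Path, path: str) -> List[str]:
    """Return the hashes of the commits that touched `path`, newest first."""
    # Limiting `git log` to a path skips commits that only changed other files.
    return git(repo, "log", "--format=%H", "--", path).split()


def file_at(repo: Path, commit: str, path: str) -> str:
    """Return the content of `path` as it was stored in `commit`."""
    return git(repo, "show", f"{commit}:{path}")


def restore_file(repo: Path, commit: str, path: str) -> None:
    """Overwrite only `path` with its version from `commit`.

    Every other file in the working tree is left as it is.
    """
    git(repo, "checkout", commit, "--", path)
```

For example, `file_history(repo, "image1.svg")` would return only the commits in which that (hypothetical) file changed, so a commit that touched other images does not show up in its log.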

So, in your case, there is no functional difference between one repo with several independent files and several repos with one file each. In practice there is a big difference: the first option is affordable and the second is impossible to handle.

In fact, a normal project has files that are totally independent of each other, yet they all belong to the same repo.
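If the "one branch, one file" rule from the question matters, it can be checked rather than assumed, even with everything in one repo. The following is only a sketch under assumptions: the function names `changed_files` and `touches_only` are invented here, and the check relies on `git diff --name-only <base>...<branch>`, which lists the paths changed on the branch since it diverged from the base. It assumes git is on the PATH and that file names contain no characters git would quote in its output.

```python
import subprocess
from pathlib import Path
from typing import List


def changed_files(repo: Path, base: str, branch: str) -> List[str]:
    """List the paths that `branch` changed since it diverged from `base`."""
    result = subprocess.run(
        # The three-dot form compares `branch` against its merge base with `base`.
        ["git", "-C", str(repo), "diff", "--name-only", f"{base}...{branch}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def touches_only(repo: Path, base: str, branch: str, allowed: str) -> bool:
    """Return True if no path other than `allowed` was changed on `branch`."""
    return all(path == allowed for path in changed_files(repo, base, branch))
```

A check like this could run before a merge is accepted, so a branch meant for Image 1 that also edits Image 2 would be flagged instead of merged.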

iberbeu answered Jan 19 '23 12:01


As others have said, git can be used for many single-file repos, although (as you point out) it's designed more for managing a set of files.

To manage thousands of single-file repositories, the Gitslave tool might help. This tool lets you create a bunch of repos and manage them all as one. Once you have your repos, you can of course work with each one independently, but Gitslave makes it easy to run group operations on them, like push/pull or commit.

This is IMHO a better solution than having many git submodules, as submodules can be tricky to work with.

From the home page:

Gitslave creates a group of related repositories—a superproject repository and a number of slave repositories—all of which are concurrently developed on and on which all git operations should normally operate; so when you branch, each repository in the project is branched in turn. Similarly when you commit, push, pull, merge, tag, checkout, status, log, etc; each git command will run on the superproject and all slave repositories in turn.

[...]

Gitslave does not take over your repository. You may continue to use legacy git commands both inside of a gits cloned repository and outside in a privately git-cloned repository.

CharlesB answered Jan 19 '23 10:01