I notice that if I edit a file in my repo, stage it but don't commit it, edit it again, stage it again but don't commit, and so on, a new snapshot is taken each time and the repository's disk usage increases.
Furthermore, if I make 5 tiny edits, staging after each one, and finally commit only once, the repository still grows by approximately 5 times the file size.
My question is, why doesn't git just forget about the other staged versions if only the latest one has a commit sha1 reference to the state? The other 4 staged versions will be garbage collected? Is there a way to checkout a staged state which was never committed?
The git add command can be run multiple times before a commit. It only adds the content of the specified file(s) as it is at the moment git add runs; if you want later changes included in the next commit, you must run git add again to add the new content to the index.
The git add command adds a change in the working directory to the staging area. It tells Git that you want to include updates to a particular file in the next commit.
See git fsck --lost-found.
My question is, why doesn't git just forget about the other staged versions if only the latest one has a commit sha1 reference to the state?
It does ... eventually.
The other 4 staged versions will be garbage collected?
Yes, when git gc eventually runs automatically. If you want this to happen sooner, you can run git gc yourself, but there's only rarely any reason to bother (the common case being "oops, I did not mean to git add 10terabytes.db").¹
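Age matters as well as reachability: by default git gc only prunes an unreachable loose object once its file is older than gc.pruneExpire (2.weeks.ago), which is why old staged versions linger for a while. Below is a minimal sketch of that age test. It assumes a SHA-1 repository, an object that is still stored loose under .git/objects, and the default expiry. old_enough_to_prune, loose_object_path and loose_object_is_prunable are hypothetical helper names, and reachability itself is not checked.

```python
import os
import time
from typing import Optional

# git gc's default expiry for unreachable loose objects (gc.pruneExpire = 2.weeks.ago).
DEFAULT_PRUNE_WINDOW_SECONDS = 14 * 24 * 60 * 60


def old_enough_to_prune(mtime: float, now: float,
                        window: float = DEFAULT_PRUNE_WINDOW_SECONDS) -> bool:
    # Objects modified after the cutoff (now - window) are kept; older ones may go.
    return mtime <= now - window


def loose_object_path(git_dir: str, object_id: str) -> str:
    # Loose objects live at objects/<first two hex digits>/<remaining hex digits>.
    return os.path.join(git_dir, "objects", object_id[:2], object_id[2:])


def loose_object_is_prunable(git_dir: str, object_id: str,
                             now: Optional[float] = None) -> bool:
    # Only the age test: this does NOT check whether the object is unreachable.
    if now is None:
        now = time.time()
    mtime = os.path.getmtime(loose_object_path(git_dir, object_id))
    return old_enough_to_prune(mtime, now)
```

For example, loose_object_is_prunable(".git", blob_id) tells you whether an old staged blob has already passed the default window. Running git gc --prune=now instead removes unreachable loose objects regardless of their age.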
Is there a way to checkout a staged state which was never committed?
Sort of. The git checkout command cannot do it, because git checkout works by file names, and these staged content-only blobs have no file name. They have only a hash ID. To extract their data, you must first find their hash ID. This is easy to do: you just checksum the data the way Git would, which just means that you need to have the data available first, in order to get the data. :-)
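The "checksum the data the way Git would" step is small enough to sketch. The sketch assumes a repository using the default SHA-1 object format (not --object-format=sha256) and content with no clean filters or line-ending conversion applied. Under those conditions, a blob's ID is the SHA-1 of a "blob <size>\0" header followed by the raw bytes. git_blob_id is a hypothetical helper name.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the object ID Git assigns to `data` stored as a blob.

    Git hashes the header b"blob <size in bytes>\\0" followed by the raw bytes.
    Assumes a SHA-1 repository; --object-format=sha256 repositories differ.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()
```

If you still have the content on hand (say, in an editor buffer), git cat-file -e <id> exits with status 0 when that staged version is still in the object store, and git cat-file -p <id> prints it. For a file on disk, the result should match git hash-object --no-filters <file>.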
Alternatively, you can do much of what git gc does, which is:
1. Find every object reachable from the references (branch and tag names, reflogs, and so on), the index, and the HEAD entries from all active work-trees.² Everything else is unreachable. (This is a bit slow, but git fsck does it for you, so that you do not have to write code to do it.)
2. From the set of all unreachable objects, select those that have type blob, i.e., files that were git add-ed but never committed. Inspect each blob, using its hash ID to access it, to see if it is the one you wanted. Here git cat-file -p is useful, or use git fsck --lost-found, which takes each such blob, decompresses it, and writes the data to an ordinary file in .git/lost-found/other/.
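The steps above can be scripted. This sketch shells out to git fsck --unreachable, keeps only the blob lines, and reads each candidate back with git cat-file -p. It assumes git is on PATH and that fsck reports objects as lines of the form "unreachable blob <id>". LC_ALL=C is set so those messages are not translated. The helper names are hypothetical. The version currently in the index is still reachable, so only earlier staged versions (and any other leftovers) should show up.

```python
from __future__ import annotations

import os
import subprocess


def unreachable_blob_ids(fsck_output: str) -> list[str]:
    """Return the IDs on lines of the form "unreachable blob <id>"."""
    ids = []
    for line in fsck_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[:2] == ["unreachable", "blob"]:
            ids.append(parts[2])
    return ids


def git(repo: str, *args: str) -> bytes:
    # Run a git command in `repo` with untranslated (C locale) messages.
    env = {**os.environ, "LC_ALL": "C"}
    result = subprocess.run(["git", "-C", repo, *args],
                            capture_output=True, env=env, check=True)
    return result.stdout


def unreachable_blobs(repo: str = ".") -> dict[str, bytes]:
    """Map each unreachable blob ID to its content (as git cat-file -p prints it)."""
    output = git(repo, "fsck", "--unreachable").decode("utf-8", "replace")
    return {oid: git(repo, "cat-file", "-p", oid)
            for oid in unreachable_blob_ids(output)}
```

A typical use is to loop over unreachable_blobs().items() and print each ID with the first few bytes of its content, then look for the version you lost.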
¹ Note that you may also need a --prune= option: git gc defaults to giving other Git processes 14 days to complete the job of hooking up objects. If you use --prune=all, make sure no other Git activity is occurring.
² When you remember to include work-trees added via git worktree add, you will be doing something the Git folks forgot to do. This is a particularly nasty bug, present in Git versions 2.5 through 2.14.*: work being done in an added work-tree can be pruned by an automatic git gc if you've left that work-tree idle for 2 weeks or more. If you are using git worktree add, I recommend making sure your Git is at least version 2.15.
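Footnote 2's advice can be checked from a script. The sketch assumes git is on PATH and that git --version prints "git version <major>.<minor>...". Vendor suffixes such as ".windows.1" or "(Apple Git-145)" are ignored. The helper names and the MIN_SAFE_WORKTREE_VERSION constant are hypothetical.

```python
import re
import subprocess

# Per the footnote: 2.5 through 2.14.* can prune work in idle added work-trees.
MIN_SAFE_WORKTREE_VERSION = (2, 15)


def parse_git_version(text: str) -> tuple:
    """Extract (major, minor) from output such as "git version 2.39.2"."""
    match = re.search(r"git version (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def worktree_gc_is_safe(text: str) -> bool:
    # Tuples compare element by element, so (2, 9) < (2, 15) as intended.
    return parse_git_version(text) >= MIN_SAFE_WORKTREE_VERSION


def installed_git_is_safe() -> bool:
    # Runs the git found on PATH.
    out = subprocess.run(["git", "--version"], capture_output=True,
                         text=True, check=True).stdout
    return worktree_gc_is_safe(out)
```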