git add multiple times without commit

Tags:

git

I notice that if I edit a file in my repo, stage it without committing, edit it again, stage it again without committing, and so on, then each time I do this a new snapshot is stored and the repository's disk usage increases.

Furthermore, if I make 5 tiny edits, staging after each one, and finally commit once after all the stagings, the repository's disk usage still increases by approximately 5 times the file size.

My question is: why doesn't Git just forget about the other staged versions, if only the latest one is referenced (by its SHA-1) from a commit? Will the other 4 staged versions be garbage collected? Is there a way to check out a staged state that was never committed?


asked Jan 26 '19 07:01

sphere

1 Answer

TL;DR

See git fsck --lost-found.

Longer, point-by-point

My question is: why doesn't Git just forget about the other staged versions, if only the latest one is referenced (by its SHA-1) from a commit?

It does ... eventually.

Will the other 4 staged versions be garbage collected?

Yes, once git gc eventually runs automatically. If you want this to happen sooner, you can run git gc yourself, but there is rarely any reason to bother (the common exception being "oops, I did not mean to git add 10terabytes.db").¹

Is there a way to check out a staged state that was never committed?

Sort of. The git checkout command cannot do it, because git checkout works by file names, and these staged, content-only blobs have no file name. They have only a hash ID. To extract their data, you must first find their hash ID. This is easy: just checksum the data the way Git would. Of course, that means you need to have the data available first, in order to get the data. :-)
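To make "checksum the data the way Git would" concrete, here is a minimal Python sketch. It assumes a repository using the default SHA-1 object format (repositories created with the SHA-256 format hash differently), and the helper name git_blob_id is just for illustration. Git hashes a short header (the word blob, a space, the content length in decimal, and a NUL byte) followed by the raw content:

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the object ID Git would assign to `data` stored as a blob.

    Assumes the default SHA-1 object format. The hashed bytes are the
    header "blob <length>" plus a NUL byte, followed by the raw content.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # The well-known ID of the empty blob.
    print(git_blob_id(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    # Same ID as: echo hello | git hash-object --stdin
    print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

On the command line, git hash-object <file> computes the same ID without writing anything, and if an object with that ID still exists (has not been pruned yet), git cat-file -p <id> prints its contents.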

Alternatively, you can do much of what git gc does, which is:

1. Enumerate every object ID in the object database.
2. Enumerate every reachable object ID. For details on reachability, see Think Like (a) Git. Note that reachability here includes all reflog entries for all references, and all index and HEAD entries from all active work-trees.²
3. Subtract the second set of object IDs (reachable) from the first set of IDs (all). The resulting IDs are unreferenced, i.e., objects that are candidates for garbage collection.

(This is a bit slow, but git fsck does it for you, so that you do not have to write code to do it.)
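The three steps above amount to a graph walk followed by a set difference. The following Python sketch models them on a hypothetical in-memory object database (a dict from object ID to its type and the IDs it points to); Git's real object store, and the way git fsck and git gc walk it, are far more involved. The roots passed in stand for everything Git treats as a starting point: references, reflog entries, index entries, and HEADs.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy model, not Git's real storage:
# object ID -> (object type, IDs of the objects it points to).
ObjectDB = Dict[str, Tuple[str, List[str]]]


def reachable_ids(db: ObjectDB, roots: List[str]) -> Set[str]:
    """Step 2: depth-first walk from the roots, collecting every ID reached."""
    seen: Set[str] = set()
    stack = [r for r in roots if r in db]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        _obj_type, children = db[oid]
        stack.extend(c for c in children if c in db and c not in seen)
    return seen


def unreachable_blobs(db: ObjectDB, roots: List[str]) -> Set[str]:
    all_ids = set(db)                     # step 1: every object ID
    reachable = reachable_ids(db, roots)  # step 2: every reachable ID
    unreachable = all_ids - reachable     # step 3: all minus reachable
    # Keep only blobs: content that was staged but never committed.
    return {oid for oid in unreachable if db[oid][0] == "blob"}


if __name__ == "__main__":
    # Five staged versions b1..b5 of one file; the single commit's tree holds b5.
    demo: ObjectDB = {
        "c1": ("commit", ["t1"]),
        "t1": ("tree", ["b5"]),
        "b1": ("blob", []),
        "b2": ("blob", []),
        "b3": ("blob", []),
        "b4": ("blob", []),
        "b5": ("blob", []),
    }
    print(sorted(unreachable_blobs(demo, ["c1"])))  # ['b1', 'b2', 'b3', 'b4']
```

In this toy run the four earlier staged versions come out as unreachable blobs, which corresponds to the extra copies described in the question.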

From the set of all unreachable objects, select those of type blob, i.e., file contents that were git added but never committed. Inspect each blob, using its hash ID to access it, to see whether it is the one you wanted. git cat-file -p is useful here; alternatively, use git fsck --lost-found, which takes each such blob, decompresses it, and writes the data to an ordinary file in .git/lost-found/other/.
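That last step can be scripted too. This Python sketch assumes git is on your PATH and that git fsck --unreachable reports blobs as lines of the form unreachable blob <id> (plain git fsck reports dangling blob <id>); the function names are hypothetical. It collects the unreachable blob IDs and prints each blob via git cat-file -p:

```python
import subprocess
from typing import List


def blob_ids_from_fsck(output: str) -> List[str]:
    """Pick blob IDs out of git fsck output lines such as
    'unreachable blob <id>' or 'dangling blob <id>'."""
    ids: List[str] = []
    for line in output.splitlines():
        parts = line.split()
        if (len(parts) == 3
                and parts[0] in ("unreachable", "dangling")
                and parts[1] == "blob"):
            ids.append(parts[2])
    return ids


def show_unreachable_blobs(repo: str = ".") -> None:
    """Print every unreachable blob in `repo` (needs `git` on PATH)."""
    fsck = subprocess.run(
        ["git", "fsck", "--unreachable"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    for oid in blob_ids_from_fsck(fsck.stdout):
        # git cat-file -p writes the blob's raw bytes to stdout.
        content = subprocess.run(
            ["git", "cat-file", "-p", oid],
            cwd=repo, capture_output=True, check=True,
        ).stdout
        print(f"--- {oid} ({len(content)} bytes)")
        print(content.decode("utf-8", errors="replace"))
```

Calling show_unreachable_blobs("/path/to/repo") (a placeholder path) only reads objects, so it does not remove anything; the blobs stay available until git gc prunes them.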

¹Note that you may also need a --prune= option: by default, git gc gives other Git processes 14 days to finish hooking up newly created objects, so it will not prune unreferenced objects younger than that. If you use --prune=all, make sure no other Git activity is occurring.

²If you remember to include work-trees added via git worktree add, you will be doing something the Git developers forgot to do. This was a particularly nasty bug, present in Git versions 2.5 through 2.14.x: work being done in an added work-tree could be pruned by an automatic git gc if you had left that work-tree idle for 2 weeks or more. If you are using git worktree add, I recommend making sure your Git is at least version 2.15.


answered Sep 28 '22 14:09

torek
