
Where are the git stash elements stored? Is the data structure really a stack?

To understand the git stash better, I would like to know: Where and in what data structure are the elements of the git stash stored? In a stack? In an ordered set?


Details:

Every piece of documentation, article and book (except Git Internals) says that the git stash is a stack.

Recently, I found out that you can retrieve and delete the elements from the stash in arbitrary order -- what a helpful feature. Because of this feature and A Hacker's Guide to Git, it seems to me that the stash instead consists of a chronologically ordered set of references somewhere. However, .git/refs/stash stores only the latest stash element, in the form of a merge commit (which also contains the date the stash element was created).

Is there another (top-secret pre-index stash-cache;) data structure holding all the stash elements? Or does git stash (list|pop|apply) retrieve the elements from its regular object store? How?

So what data structure do the stash elements form? Is the chronological order of the elements implicitly given by the date of the merge commits? If the elements are in fact stored in a stack, how does git retrieve and delete the elements in arbitrary order?

DaveFar asked Mar 17 '17 17:03


1 Answer

As ElpieKay wrote in a comment, all but the current stash are stored in the reflog for the reference refs/stash. See the definitions of "ref" and "reflog" in the gitglossary. Note that branch names like master and develop are one kind of reference, and are short for the full name refs/heads/master and refs/heads/develop respectively. Tag names are another kind of reference; the tag v2.2 is really the reference refs/tags/v2.2.
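This can be checked in a throwaway repository. The following is only a sketch: the repository, the file name, the "demo" identity and the stash messages are made up for illustration, and it assumes a Git recent enough to have git stash push. It shows that refs/stash resolves to the newest stash only, while the reflog of refs/stash has one entry per stash.

```shell
#!/bin/sh
# Sketch: three stashes in a scratch repository (all names are hypothetical).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo base > file.txt
git add file.txt
git commit -q -m "base"

# Each stash resets the working tree, so every round starts from "base".
for n in 1 2 3; do
    echo "change $n" > file.txt
    git stash push -q -m "stash number $n"
done

# The reference itself names only the newest stash commit ...
newest=$(git rev-parse refs/stash)
top=$(git rev-parse 'stash@{0}')

# ... while its reflog holds one entry per stash. (With the default
# "files" ref storage this reflog lives in .git/logs/refs/stash.)
entries=$(git reflog show refs/stash | wc -l | tr -d ' ')

echo "refs/stash -> $newest"
echo "reflog entries: $entries"
git stash list
```

Here git stash list is essentially a formatted view of that reflog.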

Most references are prefixed with refs/. In fact, the various HEADs—HEAD itself, and MERGE_HEAD, CHERRY_PICK_HEAD, ORIG_HEAD, and so on—are the only exceptions, and most of those don't have reflogs. HEAD is the only one that does.

Normally, reflog entries are simply linearly-numbered: HEAD@{1} or master@{1} is "the commit to which HEAD or master pointed before its most recent update", master@{2} is the commit two steps ago, and so on. This is described in gitrevisions. For convenience, the current value can be referred to with @{0}: master@{0} and master always resolve to the same hash ID. If the refs/stash reference were used in the same way as other references, it would work as a queue, rather than a stack, but it's not used that way. Instead, the git stash code explicitly deletes entries, by default the most recent one.
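The numbering can be seen on an ordinary branch as well. This sketch again uses a made-up scratch repository and identity, and it looks up the branch name instead of assuming master, since the default branch name depends on configuration:

```shell
#!/bin/sh
# Sketch: reflog numbering on a branch (repository contents are hypothetical).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# Three commits give the branch reflog three entries.
for n in 1 2 3; do
    echo "version $n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

branch=$(git symbolic-ref --short HEAD)

current=$(git rev-parse "$branch")
at0=$(git rev-parse "$branch@{0}")   # same as the branch itself
at1=$(git rev-parse "$branch@{1}")   # value before the latest update
at2=$(git rev-parse "$branch@{2}")   # value two updates ago

parent=$(git rev-parse HEAD~1)
grandparent=$(git rev-parse HEAD~2)

git reflog show "$branch"
```

In this linear history each update was a plain commit, so @{1} and @{2} happen to coincide with HEAD~1 and HEAD~2; after a reset or rebase they generally would not.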

Since the numbering is always sequential, deleting an entry causes all the higher numbers to drop down by one. If you manually delete master@{5}, for instance, then what used to be master@{6} is now master@{5}, what used to be master@{7} is now master@{6}, and so on.

Adding a new entry, of course, pushes everything up one. So when you create a new stash, the one that used to be stash aka stash@{0} is now stash@{1}. The one that used to be stash@{1} is now stash@{2}, and so on. With other reflogs, like master, no one calls this "pushing"; it's just the ordinary queue in action.

Once you delete stash@{0} aka stash, though, all the higher entries—stash@{1}, stash@{2}, and so on—drop down by one, so now stash has been "popped" and the previous stash@{1} is just stash. Of course, you can also git stash drop stash@{4} to delete that particular entry, keeping 0-to-3 and renumbering 5-and-up. Note that git stash pop of any particular stash just means "apply and, if that seems to succeed, drop".
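A small sketch of that renumbering, in a made-up scratch repository (file name, identity and stash messages are assumptions for illustration): drop the middle of three stashes, then pop the top one.

```shell
#!/bin/sh
# Sketch: dropping and popping stash entries (all names are hypothetical).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo base > file.txt
git add file.txt
git commit -q -m "base"

for n in 1 2 3; do
    echo "change $n" > file.txt
    git stash push -q -m "stash number $n"
done

oldest=$(git rev-parse 'stash@{2}')
middle=$(git rev-parse 'stash@{1}')
newest=$(git rev-parse 'stash@{0}')

# Delete the middle entry: the old stash@{2} slides down to stash@{1}.
git stash drop -q 'stash@{1}'
after_drop_0=$(git rev-parse 'stash@{0}')
after_drop_1=$(git rev-parse 'stash@{1}')
count_after_drop=$(git stash list | wc -l | tr -d ' ')

# Pop = apply stash@{0} and, if that succeeds, drop it.
git stash pop -q
after_pop=$(git rev-parse refs/stash)
count_after_pop=$(git stash list | wc -l | tr -d ' ')
restored=$(cat file.txt)

echo "dropped $middle, $count_after_pop stash left, file.txt: $restored"
```

After the pop, the one remaining entry is the stash that was originally stash@{2}, and the working tree holds the contents of the stash that was on top.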

Note, not entirely incidentally, that each reflog entry also has a time-stamp attached. You can write master@{yesterday} or master@{3.hours.ago} and Git will find the hash ID of the appropriate reflog entry, based on the time stamps. [1] Because stash identifiers are just reflog entries, this same syntax works there. (I have never actually found this all that useful anywhere, perhaps because I have no sense of time when I am working and cannot remember what day of the week it is now, much less when I did something earlier. :-) )

To go with these time stamps, most reflogs expire: an old reflog entry will, by default, go away after 90 days, or just 30 days if the object it names is not reachable from the current value of the same reference. [2] However, refs/stash itself is exempt from this expiration, by default. All of this is configurable: see all of the gc.reflogExpire settings in the git config documentation.
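As a rough illustration, the settings below spell out the default behavior described above in a scratch repository; setting them is not required, and the repository, identity and stash message are made up. The per-reference form gc.<pattern>.reflogExpire is the one documented in git config. The last command lists the stash with its time stamps shown in place of the index numbers.

```shell
#!/bin/sh
# Sketch: reflog expiry settings and time stamps (names are hypothetical).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo base > file.txt
git add file.txt
git commit -q -m "base"

# General reflog expiry: these values match the defaults described above.
git config gc.reflogExpire "90 days"
git config gc.reflogExpireUnreachable "30 days"

# Per-reference override: keep stash entries forever (also the default).
git config gc.refs/stash.reflogExpire never
git config gc.refs/stash.reflogExpireUnreachable never

expire_all=$(git config --get gc.reflogExpire)
expire_stash=$(git config --get gc.refs/stash.reflogExpire)

# Create one stash and list it by time stamp rather than by index.
echo "work in progress" > file.txt
git stash push -q -m "wip"
git stash list --date=relative

echo "general expiry: $expire_all, stash expiry: $expire_stash"
```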


[1] If you updated the reference several times in a day but not more than once per hour, @{yesterday} means @{24.hours.ago}. If you updated more than once per hour, multiply by another 60: @{1440.minutes.ago}. If you updated more than once per minute, multiply by 60 again: @{86400.seconds.ago}. The resolution does not go any finer than that.

[2] This is how Git retains for 30 days, but eventually purges, old commits that were abandoned by a git rebase, for instance. Reachability is a key concept provided by the Directed Acyclic Graph or DAG formed by the tags, commits, and tree objects in the repository. (Blobs are in the DAG but have no part in extending it as they are always leaf nodes. Hence a blob may itself be reachable or unreachable, but it never affects the reachability of any other object.)

torek answered Sep 22 '22 11:09