
How to disambiguate an ambiguous abbreviated sha1 in git

Tags: git

I was intrigued by Josh Stone's analysis of sha1 abbreviation collisions.

Let's say somebody wrote down an abbreviated commit id, 8b82547e33, at a time when it was unambiguous. But since then other objects have been created with that same prefix, so that now git tells you (twice, for some reason):

$ git show 8b82547e33
error: short SHA1 8b82547e33 is ambiguous.
error: short SHA1 8b82547e33 is ambiguous.
fatal: ambiguous argument '8b82547e33': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

Now, as a human, I could probably tell which object I meant if git would just show me the ambiguous objects. How can I achieve something like the following?

$ git objects-starting-with 8b82547e33
8b82547e33e: commit: l2tp: Restore socket refcount when sendmsg succeeds
8b82547e338: tree [2 files, 26 subtrees]

(Note: the above examples are using a relatively current clone of http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git.)

Matt McHenry asked Dec 11 '14 17:12


2 Answers

You can use git rev-parse --disambiguate, which lists every object whose name begins with the given prefix. The prefix must be at least 4 hexadecimal digits long.

git rev-parse --disambiguate=8b82547e33
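
To get closer to the output the question asks for, the candidate list can be piped through git cat-file --batch-check, which reports each object's name, type, and size. The following is a minimal Python sketch, assuming git is on the PATH and the repository path is valid; the function names objects_starting_with and parse_batch_check are made up for illustration and are not part of git.

```python
import subprocess


def parse_batch_check(output):
    """Turn `git cat-file --batch-check` output into (oid, type) pairs.

    Each line has the default form "<oid> <type> <size>".
    """
    pairs = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 3:  # skips blank lines and "<oid> missing" lines
            oid, obj_type, _size = fields
            pairs.append((oid, obj_type))
    return pairs


def objects_starting_with(prefix, repo="."):
    """List (oid, type) for every object whose name starts with `prefix`."""
    if len(prefix) < 4:
        raise ValueError("git needs at least 4 hex digits")
    # Step 1: expand the prefix to all matching full object names.
    oids = subprocess.run(
        ["git", "-C", repo, "rev-parse", f"--disambiguate={prefix}"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Step 2: ask git for the type of each candidate.
    info = subprocess.run(
        ["git", "-C", repo, "cat-file", "--batch-check"],
        input=oids, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(info)
```

Calling objects_starting_with("8b82547e33") inside the kernel clone from the question would be expected to return the commit and the tree. For a commit candidate, git show -s --format=%s <oid> then prints the subject line.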
chepner answered Oct 22 '22 15:10


With Git 2.11+ (Q4 2016), you won't even have to type git rev-parse --disambiguate=....

Git will list the possible candidates for you!

See commit 5b33cb1 (27 Sep 2016), and commit 1ffa26c, commit fad6b9e, commit 16ddcd4, commit 0c99171, commit 59e4e34, commit 0016043, commit 5d5def2, commit 8a10fea, commit 7243ffd, commit 259942f (26 Sep 2016) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 66c22ba, 06 Oct 2016)

get_short_sha1: list ambiguous objects on error

When the user gives us an ambiguous short sha1, we print an error and refuse to resolve it.
In some cases, the next step is for them to feed us more characters (e.g., if they were retyping or cut-and-pasting from a full sha1). But in other cases, that might be all they have.

For example, an old commit message may have used a 7-character hex that was unique at the time, but is now ambiguous.
Git doesn't provide any information about the ambiguous objects it found, so it's hard for the user to find out which one they probably meant.

This patch teaches get_short_sha1() to list the sha1s of the objects it found, along with a few bits of information that may help the user decide which one they meant.
Here's what it looks like on git.git:

  $ git rev-parse b2e1
  error: short SHA1 b2e1 is ambiguous
  hint: The candidates are:
  hint:   b2e1196 tag v2.8.0-rc1
  hint:   b2e11d1 tree
  hint:   b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
  hint:   b2e1759 blob
  hint:   b2e18954 blob
  hint:   b2e1895c blob
  fatal: ambiguous argument 'b2e1': unknown revision or path not in the working tree.
  Use '--' to separate paths from revisions, like this:
  'git <command> [<revision>...] -- [<file>...]'

We show the tagname for tags, and the date and subject for commits.
For trees and blobs, in theory we could dig in the history to find the paths at which they were present. But that's very expensive (on the order of 30s for the kernel), and it's not likely to be all that helpful.
Most short references are to commits, so the useful information is typically going to be that the object in question isn't a commit. So it's silly to spend a lot of CPU preemptively digging up the path; the user can do it themselves if they really need to.

And of course it's somewhat ironic that we abbreviate the sha1s in the disambiguation hint.
But full sha1s would cause annoying line wrapping for the commit lines, and presumably the user is going to just re-issue their command immediately with the corrected sha1.

We also restrict the list to those that match any disambiguation hint. E.g.:

  $ git rev-parse b2e1:foo
  error: short SHA1 b2e1 is ambiguous
  hint: The candidates are:
  hint:   b2e1196 tag v2.8.0-rc1
  hint:   b2e11d1 tree
  hint:   b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
  fatal: Invalid object name 'b2e1'.

does not bother reporting the blobs, because they cannot work as a treeish.
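
The filtering step described in that commit message can be pictured with a small Python sketch. It is an illustration only, not git's implementation: TREEISH_TYPES and filter_candidates are hypothetical names, and the candidate list is copied from the b2e1 example above.

```python
# Object types that can stand in for a tree in "<rev>:<path>" syntax.
# Simplification: a tag only qualifies if it ultimately points at a
# commit or tree, which this sketch does not check.
TREEISH_TYPES = {"commit", "tree", "tag"}


def filter_candidates(candidates, allowed_types):
    """Keep only candidates whose type fits the disambiguation hint."""
    return [(oid, t) for oid, t in candidates if t in allowed_types]


# Candidates from the `git rev-parse b2e1` example above.
candidates = [
    ("b2e1196", "tag"),
    ("b2e11d1", "tree"),
    ("b2e1632", "commit"),
    ("b2e1759", "blob"),
    ("b2e18954", "blob"),
    ("b2e1895c", "blob"),
]

for oid, obj_type in filter_candidates(candidates, TREEISH_TYPES):
    print(oid, obj_type)
```

Run as written, this prints the tag, the tree, and the commit, and drops the three blobs, matching the shorter hint list shown for b2e1:foo.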


Update Nov. 2017 (one year later): with Git 2.16 (Q1 2018), this disambiguation becomes even faster.
See commit 0e87b85 by Derrick Stolee, initially discussed here.

sha1_name: minimize OID comparisons during disambiguation

Minimize OID comparisons during disambiguation of packfile OIDs.

Teach git to use binary search with the full OID to find the object's position (or insertion position, if not present) in the pack-index. The object before and immediately after (or the one at the insertion position) give the maximum common prefix. No subsequent linear search is required.

Take care of which two to inspect, in case the object id exists in the packfile.

If the input to find_unique_abbrev_r() is a partial prefix, then the OID used for the binary search is padded with zeroes so the object will not exist in the repo (with high probability) and the same logic applies.

This commit completes a series of three changes to OID abbreviation code, and the overall change can be seen using standard commands for large repos. Below we report performance statistics for perf test 4211.6 from p4211-line-log.sh using three copies of the Linux repo:

| Packs | Loose  | HEAD~3   | HEAD     | Rel%  |
|-------|--------|----------|----------|-------|
|  1    |      0 |  41.27 s |  38.93 s | -4.8% |
| 24    |      0 |  98.04 s |  91.35 s | -5.7% |
| 23    | 323952 | 117.78 s | 112.18 s | -4.8% |
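
The binary-search idea from that commit message can be sketched in Python. This is a simplified model, not git's C code: a sorted list of equal-length hex strings stands in for the pack index, and min_unique_abbrev_len and common_prefix_len are hypothetical names. The 8-digit object names in the usage note are toy values, loosely based on the b2e1 candidates listed earlier and padded for illustration.

```python
from bisect import bisect_left


def common_prefix_len(a, b):
    """Number of leading hex digits that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def min_unique_abbrev_len(sorted_oids, oid, min_len=4):
    """Shortest prefix length of `oid` that matches no other object.

    `sorted_oids` must be sorted, distinct, equal-length hex strings,
    standing in for the sorted object names in a pack index.
    """
    pos = bisect_left(sorted_oids, oid)  # position or insertion position
    present = pos < len(sorted_oids) and sorted_oids[pos] == oid
    # Neighbours to inspect: the entry before, and the entry after `oid`
    # if it is present, else the entry sitting at the insertion position.
    before = pos - 1
    after = pos + 1 if present else pos
    longest = 0
    for i in (before, after):
        if 0 <= i < len(sorted_oids):
            longest = max(longest, common_prefix_len(oid, sorted_oids[i]))
    return max(min_len, longest + 1)


# Toy, made-up 8-digit object names (already in sorted order).
toy_oids = [
    "b2e11960", "b2e11d10", "b2e16320",
    "b2e17590", "b2e18954", "b2e1895c",
]
```

With these toy names, min_unique_abbrev_len(toy_oids, "b2e18954") is 8, because its neighbour b2e1895c shares the first 7 digits. A zero-padded prefix such as "b2e10000" is not in the list, so only its neighbour at the insertion position is compared, giving 5. Only the two neighbours are ever examined, which is the point of the optimization.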

Update March 2018: SHA-1 disambiguation is more robust with Git 2.17. Before that release, while finding a unique object name abbreviation, the code could accidentally read beyond the end of the array of object names in a pack.

See commit 21abed5 (27 Feb 2018) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 7519a60, 08 Mar 2018)

Git 2.18 (Q2 2018) improves that candidate listing: when a short hexadecimal string is used to name an object but there are multiple objects that share the string as the prefix of their names, the code lists these ambiguous candidates in a help message.
These object names are now sorted according to their types for easier eyeballing.

See commit 5cc044e, commit a885c93, commit 89f32a9, commit 7248672, commit a264f22 (10 May 2018) by Ævar Arnfjörð Bjarmason (avar).
Helped-by: Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit ab48bc0, 30 May 2018)

get_short_oid: sort ambiguous objects by type, then SHA-1

Change the output emitted when an ambiguous object is encountered so that we show tags first, then commits, followed by trees, and finally blobs.
Within each type we show objects in hashcmp() order.
Before this change the objects were only ordered by hashcmp().

The reason for doing this is that the output looks better as a result, e.g. the v2.17.0 tag before this change on "git show e8f2" would display:

hint: The candidates are:
hint:   e8f2093055 tree
hint:   e8f21ca commit 2013-06-24 - bash prompt: print unique detached HEAD abbreviated object name
hint:   e8f21d02f7 blob
hint:   e8f21d577c blob
hint:   e8f25a3a50 tree
hint:   e8f2625 commit 2017-02-03 - Merge pull request #996 from jeffhostetler/jeffhostetler/register_rename_src
hint:   e8f2650052 tag v2.17.0
hint:   e8f2867228 blob
hint:   e8f28d537c tree
hint:   e8f2a35526 blob
hint:   e8f2bc0 commit 2015-05-10 - Documentation: note behavior for multiple remote.url entries
hint:   e8f2cf6ec0 tree

Now we'll instead show:

hint:   e8f2650052 tag v2.17.0
hint:   e8f21ca commit 2013-06-24 - bash prompt: print unique detached HEAD abbreviated object name
hint:   e8f2625 commit 2017-02-03 - Merge pull request #996 from jeffhostetler/jeffhostetler/register_rename_src
hint:   e8f2bc0 commit 2015-05-10 - Documentation: note behavior for multiple remote.url entries
hint:   e8f2093055 tree
hint:   e8f25a3a50 tree
hint:   e8f28d537c tree
hint:   e8f2cf6ec0 tree
hint:   e8f21d02f7 blob
hint:   e8f21d577c blob
hint:   e8f2867228 blob
hint:   e8f2a35526 blob
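
The new ordering amounts to a two-part sort key: type rank first, then object name. Here is a minimal Python sketch of that idea, using the candidates from the "before" listing above. TYPE_ORDER and sort_candidates are hypothetical names. Git itself compares full binary hashes with hashcmp(); comparing the abbreviated hex strings is a simplification that happens to work here because these names diverge before any of them ends.

```python
# Rank of each object type in the new hint output.
TYPE_ORDER = {"tag": 0, "commit": 1, "tree": 2, "blob": 3}


def sort_candidates(candidates):
    """Order (oid, type) pairs by type rank, then by object name."""
    return sorted(candidates, key=lambda c: (TYPE_ORDER[c[1]], c[0]))


# Candidates from the "before" listing above, in plain hash order.
before = [
    ("e8f2093055", "tree"),
    ("e8f21ca", "commit"),
    ("e8f21d02f7", "blob"),
    ("e8f21d577c", "blob"),
    ("e8f25a3a50", "tree"),
    ("e8f2625", "commit"),
    ("e8f2650052", "tag"),
    ("e8f2867228", "blob"),
    ("e8f28d537c", "tree"),
    ("e8f2a35526", "blob"),
    ("e8f2bc0", "commit"),
    ("e8f2cf6ec0", "tree"),
]

for oid, obj_type in sort_candidates(before):
    print(oid, obj_type)
```

The printed order is the tag, then the three commits, the four trees, and the four blobs, which is the same order as the "Now we'll instead show" listing.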

With Git 2.36 (Q2 2022), the error output given in response to an ambiguous object name has been improved.

See commit 3a73c1d, commit d2ef3cb, commit 851b3d7, commit ba5e8a0, commit 667a560, commit 6780e68, commit 8d56136 (27 Jan 2022) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 268e6b8, 25 Feb 2022)

object-name: show date for ambiguous tag objects

Signed-off-by: Ævar Arnfjörð Bjarmason

Make the ambiguous tag object output nicer in the case of tag objects such as ebf3c04 ("Git 2.32", 2021-06-06, Git v2.32.0 -- merge) by including the date in the "tagger" header.
I.e.:

$ git rev-parse b7e68
error: short object ID b7e68 is ambiguous
hint: The candidates are:
hint:   b7e68c41d92 tag 2021-06-06 - v2.32.0  <======
hint:   b7e68ae18e0 commit 2019-12-23 - bisect: use the standard 'if (!var)' way to check for 0
hint:   b7e68f6b413 tree
hint:   b7e68490b97 blob
b7e68
[...]

Before this we would emit a "tag" line without a date, e.g.:

hint:   b7e68c41d92 tag v2.32.0               <=====
VonC answered Oct 22 '22 15:10