In my repo, how long must the longest hash prefix be to prevent any overlap?

The --abbrev-commit flag can be used in conjunction with git log and git rev-list in order to show partial prefixes instead of the full 40-character SHA-1 hashes of commit objects. According to the Pro Git book,

it defaults to using seven characters but makes them longer if necessary to keep the SHA-1 unambiguous [...]

Additionally, short SHAs are at least 4 characters long. Still according to the Pro Git book,

Generally, eight to ten characters are more than enough to be unique within a project.

As an example, the Linux kernel, which is a pretty large project with over 450k commits and 3.6 million objects, has no two objects whose SHA-1s overlap more than the first 11 characters.

Since the minimum prefix length needed to keep every abbreviated commit hash unambiguous (11, in the case of the Linux kernel) is a crude indicator of a repo's size, I'd like to determine that length programmatically for my own local repository. How can I do that?

jub0bs asked Sep 04 '15 20:09



1 Answer

The following shell script, when run in a local repo, prints the minimum prefix length at which every abbreviated commit hash in that repository remains unambiguous.

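# Start from 4, the shortest abbreviation Git will produce.
MAX_LENGTH=4

# List every commit reachable from any ref, each abbreviated to its shortest
# unambiguous prefix (at least 4 characters), and keep the longest length seen.
git rev-list --abbrev=4 --abbrev-commit --all | \
  ( while read -r line; do
      if [ "${#line}" -gt "$MAX_LENGTH" ]; then
        MAX_LENGTH=${#line}
      fi
    done && printf '%s\n' "$MAX_LENGTH"
  )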

The last time I edited this answer, the script printed

  • "9" when run in a clone of the Git-project repo,
  • "9" when run in a clone of the OpenStack repo,
  • "11" when run in a clone of the Linux-kernel repo.
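
For repeated use, the same computation can probably be packed into a small function that lets awk track the maximum instead of a shell loop. This is only a sketch: `longest_abbrev` is a name I made up for illustration (it is not a Git command), and it assumes a Git recent enough to support `git -C` and an awk on the PATH.

```shell
# longest_abbrev is a hypothetical helper name, not part of Git.
# Usage: longest_abbrev [path-to-repo]   (defaults to the current directory)
longest_abbrev() {
  # Abbreviate every commit reachable from any ref to its shortest
  # unambiguous prefix (minimum 4 characters), then let awk keep the
  # longest line length seen. An empty repo yields the floor value, 4.
  git -C "${1:-.}" rev-list --abbrev=4 --abbrev-commit --all |
    awk 'BEGIN { max = 4 } length($0) > max { max = length($0) } END { print max }'
}
```

Calling `longest_abbrev` with no argument inspects the repository in the current directory; passing a path inspects the repository at that path instead.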
jub0bs answered Oct 09 '22 12:10