I'm designing a directory structure based on UUIDs so I'm looking at what git does to see if it would be a good model.
I can see that git stores objects in a structure where the first two characters of the hash are used as a directory and the rest of the hash is the file name.
What I'm wondering is why? If there's a big advantage to using the directories, why aren't more subdirectories created, say a directory for each one or two characters in the hash, creating a tree? If there isn't a big advantage, then why have the directory with the first two chars?
Git stores every single version of each file it tracks as a blob. Git identifies blobs by the hash of their content and keeps them in .git/objects. Any change to the file content will generate a completely new blob object.
In its simplest form, git hash-object would take the content you handed to it and merely return the unique key that would be used to store it in your Git database. The -w option then tells the command to not simply return the key, but to write that object to the database.
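To make the key computation concrete, here is a minimal Python sketch of how a blob's SHA-1 key is derived and where a loose object with that key would live. The function names git_blob_id and loose_object_path are made up for illustration; the sketch only computes the key and path and does not write anything (a real loose object is also zlib-compressed on disk).

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 key Git would assign to `content` stored as a blob."""
    # Git hashes a small header ("blob <size>\0") followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Split a key into the two-character directory and the remaining file name."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    oid = git_blob_id(b"test content\n")
    print(oid)
    print(loose_object_path(oid))
```

The same two-character split could be applied to a UUID string, although UUIDs are not all uniformly random, so the distribution across directories may differ from what SHA-1 gives.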
The objects folder is a very important folder in the .git directory. In Git, everything is saved in the objects folder under its hash value. By everything I mean that every commit, every tree, and every file that you create is saved in this directory.
There are 3 main types of objects that git stores:
Blob: This object, as we have seen above, stores the original content.
Tree: This object is used to store directories present in our project.
Commit: This object is created whenever a commit is made and abstracts all the information for that particular commit.
Git switches from "loose objects" (in files named like 01/23456789abcdef0123456789abcdef01234567) to "packs" when the number of loose objects exceeds a magic constant (6700 by default, but configurable via gc.auto). Since SHA-1 values tend to be well distributed, it can approximate the total number of loose objects by looking in a single directory. If there are more than (6700 + 255) / 256 = 27 files (using integer division) in one of the object directories, it's time for a pack-file.
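The single-directory estimate can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not Git's actual code: the names per_directory_limit and looks_overfull, and the idea of passing in an arbitrary sample directory, are assumptions made for the example.

```python
import os

GC_AUTO_DEFAULT = 6700   # default threshold quoted above (gc.auto)
FANOUT_DIRS = 256        # one directory per two-hex-character prefix


def per_directory_limit(gc_auto: int = GC_AUTO_DEFAULT, dirs: int = FANOUT_DIRS) -> int:
    """Ceiling of gc_auto / dirs, written with integer arithmetic."""
    return (gc_auto + dirs - 1) // dirs


def looks_overfull(sample_dir: str, gc_auto: int = GC_AUTO_DEFAULT) -> bool:
    """Estimate whether the whole store is over the threshold from one directory."""
    limit = per_directory_limit(gc_auto)
    try:
        count = sum(1 for entry in os.scandir(sample_dir) if entry.is_file())
    except FileNotFoundError:
        # A missing directory simply holds no loose objects.
        return False
    return count > limit


if __name__ == "__main__":
    print(per_directory_limit())  # 27
```

The estimate only works because the hash spreads objects evenly; with a badly distributed key, one directory would say little about the others.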
Thus, there's no need for additional fan-out (01/23/4567...): it's unlikely that you will get that many objects in one directory. And in fact, greater fan-out would tend to make it harder to detect that it is time for an automatic packing, unless you set the threshold value higher (than 6700), because (27 + 255) / 256 is 1, so you'd want to count everything in 01/*/ instead of just 01/.
One could use 0/1234567... and allow up to ~419 objects per directory to get the same behavior, but linear directory scans (on any system that still uses those) are O(n^2), and 27^2 is a mere 729, while 419^2 is 175561. [Edit: that only applies to file creation, where you have a two-stage search, once to find that it's OK to create and a second to find a slot or append. Lookups are still O(n).]
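The numbers above can be reproduced with a short Python sketch. It assumes the default threshold of 6700 and uses the square of the per-directory count as a rough stand-in for the quadratic file-creation cost mentioned above; fanout_costs is a hypothetical helper name.

```python
GC_AUTO_DEFAULT = 6700  # default loose-object threshold quoted above


def fanout_costs(prefix_hex_chars: int, gc_auto: int = GC_AUTO_DEFAULT):
    """Return (directories, objects per directory, squared per-directory count)."""
    dirs = 16 ** prefix_hex_chars            # each hex character multiplies fan-out by 16
    per_dir = (gc_auto + dirs - 1) // dirs   # ceiling division, as in (6700 + 255) / 256
    return dirs, per_dir, per_dir ** 2


if __name__ == "__main__":
    print(fanout_costs(1))  # (16, 419, 175561)
    print(fanout_costs(2))  # (256, 27, 729)
    print(fanout_costs(4))  # (65536, 1, 1)
```

With a four-character prefix the per-directory limit drops to 1, which is why a single directory stops being a useful sample at that depth.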