
What is the internal format of a Git tree object?

What is the format of a Git tree object's content?

The content of a blob object is blob [size of string] NUL [string], but what is it for a tree object?

Asked by Bystysz on Feb 09 '13 at 19:02



2 Answers

The format of a tree object:

tree [content size]\0[Entries having references to other trees and blobs] 

Each entry references another tree or a blob and has the format:

[mode] [file/folder name]\0[SHA-1 of the referenced blob or tree]

I wrote a script that inflates (decompresses) tree objects. Its output looks like this:

tree 192\0
40000 octopus-admin\0 a84943494657751ce187be401d6bf59ef7a2583c
40000 octopus-deployment\0 14f589a30cf4bd0ce2d7103aa7186abe0167427f
40000 octopus-product\0 ec559319a263bc7b476e5f01dd2578f255d734fd
100644 pom.xml\0 97e5b6b292d248869780d7b0c65834bfb645e32a
40000 src\0 6e63db37acba41266493ba8fb68c76f83f1bc9dd

A mode whose first character is 1 (such as 100644) marks a reference to a blob, i.e. a file, while 40000 marks a tree. In the example above, pom.xml is a blob and the others are trees.

Note that I added newlines, and a space after each \0, purely for pretty printing. The real content contains no such separators. I also converted the 20 raw bytes of each SHA-1 (that of the referenced blob or tree) into a 40-character hex string to make them readable.
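The entry layout described above can be walked with a few lines of Python. This is only a minimal sketch, not the script mentioned in the answer: the function name parse_tree_body is made up for illustration, and it assumes the "tree [content size]\0" header has already been stripped.

```python
def parse_tree_body(body: bytes):
    """Split a tree object's body (header already removed) into entries."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)        # the mode ends at the first space
        nul = body.index(b"\0", space + 1)   # the name ends at the NUL byte
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8", errors="surrogateescape")
        sha = body[nul + 1:nul + 21]         # 20 raw bytes, not hex text
        entries.append((mode, name, sha.hex()))
        pos = nul + 21                       # the next entry starts right after
    return entries


# Rebuild the last two entries of the example above and parse them back.
body = (
    b"100644 pom.xml\0" + bytes.fromhex("97e5b6b292d248869780d7b0c65834bfb645e32a")
    + b"40000 src\0" + bytes.fromhex("6e63db37acba41266493ba8fb68c76f83f1bc9dd")
)
for mode, name, sha in parse_tree_body(body):
    # Simplification: every mode other than 40000 is labelled as a blob here.
    kind = "tree" if mode == "40000" else "blob"
    print(mode, kind, sha, name)
```

Because the SHA-1 is a fixed 20 bytes, the parser never has to search inside it, which matters since those bytes may themselves contain spaces or NULs.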

Answered by lemiorhan on Oct 03 '22 at 18:10


I'll elaborate a bit on @lemiorhan's answer, using a test repo.

Create a test repo

Create a test project in an empty folder:

$ echo ciao > file1
$ mkdir folder1
$ echo hello > folder1/file2
$ echo hola > folder1/file3

That is:

$ find . -type f
./file1
./folder1/file2
./folder1/file3

Create the local Git repo:

$ git init
$ git add .
$ git write-tree
0b6e66b04bc1448ca594f143a91ec458667f420e

The last command returns the hash of the top level tree.

Read a tree content

To print the content of a tree in human readable format use:

$ git ls-tree 0b6e66
100644 blob 887ae9333d92a1d72400c210546e28baa1050e44    file1
040000 tree ab39965d17996be2116fe508faaf9269e903c85b    folder1

Here 0b6e66 is the first six characters of the top-level tree's hash. You can do the same for folder1.

To get the same content but in raw format use:

$ git cat-file tree 0b6e66
100644 file1 ▒z▒3=▒▒▒$ ▒►Tn(▒▒♣D40000 folder1 ▒9▒]▒k▒◄o▒▒▒i▒♥▒[%

This matches what is physically stored, in compressed form, in the object file, except that it lacks the leading header:

tree [content size]\0 

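To get the actual content, we need to decompress the file that stores the 0b6e66 tree object. Objects are stored under a 2/38 path layout (the first 2 hex characters of the hash name the directory, the remaining 38 name the file), so the file we want is: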

.git/objects/0b/6e66b04bc1448ca594f143a91ec458667f420e  
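The 2/38 split is easy to express in code. A minimal sketch (the helper name loose_object_path is invented here, and it assumes you pass the full 40-character hash, since an abbreviation such as 0b6e66 is not enough to build the path):

```python
from pathlib import Path


def loose_object_path(sha_hex: str, git_dir: str = ".git") -> Path:
    # The first 2 hex characters name the directory, the other 38 the file.
    return Path(git_dir) / "objects" / sha_hex[:2] / sha_hex[2:]


print(loose_object_path("0b6e66b04bc1448ca594f143a91ec458667f420e").as_posix())
# .git/objects/0b/6e66b04bc1448ca594f143a91ec458667f420e
```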

This file is compressed with zlib, therefore we obtain its content with:

$ openssl zlib -d -in .git/objects/0b/6e66b04bc1448ca594f143a91ec458667f420e
tree 67 100644 file1 ▒z▒3=▒▒▒$ ▒►Tn(▒▒♣D40000 folder1 ▒9▒]▒k▒◄o▒▒▒i▒♥▒[%

This shows that the tree content size is 67 bytes.

Note that, since the terminal is not made for printing binary data, it might swallow part of the string or behave oddly in other ways. If that happens, pipe the commands above into od -c, or use the manual approach in the next section.
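If your openssl build has no zlib subcommand, Python's standard zlib module can do the same decompression. The following is a sketch under that assumption; split_object is a hypothetical helper name, and the sample data is a stand-in for the bytes you would get from reading the object file in binary mode.

```python
import zlib


def split_object(compressed: bytes):
    """Decompress a loose object and split it into (type, size, body)."""
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\0")   # header ends at the first NUL
    obj_type, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode("ascii"), int(size), body


# Stand-in for open(path, "rb").read(): a blob holding the text "ciao\n".
sample = zlib.compress(b"blob 5\0ciao\n")
print(split_object(sample))  # ('blob', 5, b'ciao\n')
```

Printing the body with repr, as Python does for bytes, also avoids the terminal garbling mentioned above.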

Manually generate the tree object content

To understand the tree generation process we can generate it ourselves starting from its human readable content, e.g. for the top tree:

$ git ls-tree 0b6e66
100644 blob 887ae9333d92a1d72400c210546e28baa1050e44    file1
040000 tree ab39965d17996be2116fe508faaf9269e903c85b    folder1

Each object's SHA-1, shown above as a 40-character hex string, is stored in the tree as 20 raw bytes. If all you need is the binary version of a hex hash, you can get it with:

$ echo -e "$(echo ASCIIHASH | sed -e 's/../\\x&/g')" 

So the blob 887ae9333d92a1d72400c210546e28baa1050e44 is converted to

$ echo -e "$(echo 887ae9333d92a1d72400c210546e28baa1050e44 | sed -e 's/../\\x&/g')"
▒z▒3=▒▒▒$ ▒►Tn(▒▒♣D
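For comparison, the same hex-to-binary conversion in Python is a single standard-library call. This is just an illustrative sketch alongside the sed approach:

```python
sha_hex = "887ae9333d92a1d72400c210546e28baa1050e44"
sha_bin = bytes.fromhex(sha_hex)     # every 2 hex characters become 1 byte

print(len(sha_hex), len(sha_bin))    # 40 20
print(sha_bin)                       # repr escapes the unprintable bytes
print(sha_bin.hex() == sha_hex)      # True, the conversion is reversible
```

This particular hash contains a zero byte (the "00" after "24"), which is one reason a terminal shows it so poorly.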

If we want to create the whole tree object, here is an awk one-liner (it relies on GNU awk features: -b, patsplit and strtonum):

$ git ls-tree 0b6e66 | awk -b 'function bsha(asha) {
    # walk the byte pairs by index, since "for (j in x)" gives no guaranteed order
    n = patsplit(asha, x, /../); h = ""
    for (j = 1; j <= n; j++) h = h sprintf("%c", strtonum("0x" x[j]))
    return h
  }
  {t = t sprintf("%d %s\0%s", $1, $4, bsha($3))}
  END {printf("tree %s\0%s", length(t), t)}'
tree 67 100644 file1 ▒z▒3=▒▒▒$ ▒►Tn(▒▒♣D40000 folder1 ▒9▒]▒k▒◄o▒▒▒i▒♥▒[%

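The function bsha converts the hex SHA-1 hashes to binary. The %d format drops the leading zero that git ls-tree prints for tree modes (040000 becomes 40000). The tree content is first accumulated in the variable t; its length is then calculated and printed in the END {...} section.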
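The same construction can be sketched in Python, which makes the byte handling explicit. The function name build_tree_object is made up for this example, and it assumes the entries are supplied in the order git ls-tree prints them (it does no sorting of its own).

```python
def build_tree_object(entries):
    """Build 'tree <size>\\0<entries>' from (mode, name, sha_hex) tuples."""
    body = b""
    for mode, name, sha_hex in entries:
        mode = mode.lstrip("0")   # ls-tree prints 040000, the object stores 40000
        body += (
            mode.encode("ascii") + b" "
            + name.encode("utf-8") + b"\0"
            + bytes.fromhex(sha_hex)   # 20 raw bytes
        )
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return header + body


entries = [
    ("100644", "file1", "887ae9333d92a1d72400c210546e28baa1050e44"),
    ("040000", "folder1", "ab39965d17996be2116fe508faaf9269e903c85b"),
]
obj = build_tree_object(entries)
print(obj[:8])    # b'tree 67\x00'
print(len(obj))   # 75, i.e. an 8-byte header plus 67 bytes of content
```

The size in the header counts bytes of content only (33 for the file1 entry plus 34 for folder1), not the header itself.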

As observed above, the console is not very suitable for printing binaries, so we might want to replace them with their \x## format equivalent:

$ git ls-tree 0b6e66 | awk -b 'function bsha(asha) {
    # walk the byte pairs by index, since "for (j in x)" gives no guaranteed order
    n = patsplit(asha, x, /../); h = ""
    for (j = 1; j <= n; j++) h = h sprintf("%s", "\\x" x[j])
    return h
  }
  {t = t sprintf("%d %s\0%s", $1, $4, bsha($3))}
  END {printf("tree %s\0%s", length(t), t)}'
tree 187 100644 file1 \x88\x7a\xe9\x33\x3d\x92\xa1\xd7\x24\x00\xc2\x10\x54\x6e\x28\xba\xa1\x05\x0e\x4440000 folder1 \xab\x39\x96\x5d\x17\x99\x6b\xe2\x11\x6f\xe5\x08\xfa\xaf\x92\x69\xe9\x03\xc8\x5b%

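This output should be a good compromise for understanding the structure of the tree content. Note that the size of 187 printed here counts the escaped text, where each hash byte takes four characters; the real content size is still 67. Compare the output above with the general tree content structure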

tree [content size]\0[Object Entries] 

where each Object Entry is like:

[mode] [Object name]\0[SHA-1 in binary format] 

Modes are a subset of UNIX filesystem modes. See "Tree Objects" in the Git manual for more details.
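Since the modes follow UNIX conventions, Python's stat module can decode them. The sketch below is only an illustration: the four modes in the loop are ones Git commonly writes, a list that is my assumption rather than something taken from this answer, and classify is a made-up helper name.

```python
import stat


def classify(mode_text: str) -> str:
    mode = int(mode_text, 8)   # modes are stored as octal ASCII text
    if stat.S_ISDIR(mode):
        return "directory (tree)"
    if stat.S_ISLNK(mode):
        return "symbolic link (blob)"
    if stat.S_ISREG(mode):
        executable = bool(mode & 0o111)   # any execute permission bit set
        return "executable file (blob)" if executable else "regular file (blob)"
    return "other"


for mode_text in ("100644", "100755", "120000", "40000"):
    print(mode_text, classify(mode_text))
```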

We need to make sure that the results are consistent. To this end, we might compare the checksum of the awk generated tree with the checksum of the Git stored tree.

As for the latter:

$ openssl zlib -d -in .git/objects/0b/6e66b04bc1448ca594f143a91ec458667f420e | shasum
0b6e66b04bc1448ca594f143a91ec458667f420e *-

As for the homemade tree:

$ git ls-tree 0b6e66 | awk -b 'function bsha(asha) {
    # walk the byte pairs by index, since "for (j in x)" gives no guaranteed order
    n = patsplit(asha, x, /../); h = ""
    for (j = 1; j <= n; j++) h = h sprintf("%c", strtonum("0x" x[j]))
    return h
  }
  {t = t sprintf("%d %s\0%s", $1, $4, bsha($3))}
  END {printf("tree %s\0%s", length(t), t)}' | shasum
0b6e66b04bc1448ca594f143a91ec458667f420e *-

The checksum is the same.

Calculate the tree object checksum

The more or less official way to get it is:

$ git ls-tree 0b6e66 | git mktree
0b6e66b04bc1448ca594f143a91ec458667f420e

To calculate it manually, we pipe the script-generated tree content into the shasum command. We have in fact already done this above, when comparing the generated and stored content. The result was:

0b6e66b04bc1448ca594f143a91ec458667f420e *-  

and is the same as with git mktree.
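The manual calculation is just a SHA-1 over the header plus content, which a short Python sketch can show (object_id is a name made up for this example). The empty tree is a convenient check because its hash is well known. For the test repo's tree, the second print should reproduce the hash reported above, assuming the ls-tree listing is accurate.

```python
import hashlib


def object_id(obj: bytes) -> str:
    # Git's object ID is the SHA-1 of the uncompressed header plus content.
    return hashlib.sha1(obj).hexdigest()


empty_tree = b"tree 0\0"
print(object_id(empty_tree))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904

top_tree = (
    b"tree 67\0"
    + b"100644 file1\0" + bytes.fromhex("887ae9333d92a1d72400c210546e28baa1050e44")
    + b"40000 folder1\0" + bytes.fromhex("ab39965d17996be2116fe508faaf9269e903c85b")
)
print(object_id(top_tree))
```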

Packed objects

In your own repo you might not find the .git/objects/XX/XXX... files that store the Git objects. This happens when some or all "loose" objects have been packed into one or more .git/objects/pack/*.pack files.

To unpack the repo, first move the pack files away from their original location, then feed each one to git unpack-objects (the loop covers the case of several pack files):

$ mkdir .git/pcache
$ mv .git/objects/pack/*.pack .git/pcache/
$ for p in .git/pcache/*.pack; do git unpack-objects < "$p"; done

To repack when you are done with experiments:

$ git gc 

Answered by antonio on Oct 03 '22 at 18:10