
Why does git store file contents as a blob?

Tags: git, blob

This question may seem trivial to answer, but I am struggling to come up with definitive advantages of storing file contents as blobs rather than in their original format (e.g. a text file).

Typically, blobs are used in lieu of other storage formats for media: images, videos, audio, etc. Git, at least as I commonly see it used, tracks revisions to text files rather than multimedia.

To summarize formally: what are the advantages of storing file contents as a blob (converted to binary data) rather than in the original format of the revision (e.g. leaving it as text)?

lolololol ol asked Feb 24 '18 04:02



1 Answer

“Blob” just means a sequence of bytes. A blob in Git contains exactly the same data as the file. The only difference is that a blob is stored in the Git object database, while a file is stored on the filesystem.

So there is no difference in the format; the only difference is where and how the data is stored.

For example, if you add an image hello.jpg to your repository, and then commit it, you will have two copies of the same data:

  • You will have a file on disk, named hello.jpg, which contains the JPEG data,

  • You will have a blob in your Git object database, named with the hash of its contents, which contains the exact same JPEG data in the same format.
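To make "named with the hash of its contents" concrete, here is a minimal Python sketch of how that name can be computed. It assumes the repository uses Git's default SHA-1 object format, and the function name `git_blob_id` is made up for this illustration.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object ID of a blob holding `data`.

    Assumes the default SHA-1 object format. Git hashes a short header
    ("blob <size in bytes>" followed by a NUL byte) plus the raw,
    unmodified file contents.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # The same ID that `echo hello | git hash-object --stdin` reports.
    print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

The bytes being hashed are the file's bytes as-is, whether they happen to be JPEG data or text. Because of the header, the ID is not the same as a plain SHA-1 of the file.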

The database can use some fancy tricks to store data efficiently, including compression and using deltas, but in the end it is still storing the exact same data that was in the original file.
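As a rough illustration of the compression part, the sketch below mimics how a "loose" (not yet packed) blob is written: the header and contents are zlib-compressed together, and decompressing gives back the identical bytes. The function names are hypothetical, and delta compression inside packfiles is deliberately not modeled here.

```python
import zlib


def encode_loose_blob(data: bytes) -> bytes:
    """Compress a blob roughly the way Git writes a loose object."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return zlib.compress(header + data)


def decode_loose_blob(raw: bytes) -> bytes:
    """Decompress a loose blob and return the original file contents."""
    header, _, body = zlib.decompress(raw).partition(b"\0")
    kind, size = header.split(b" ")
    if kind != b"blob" or int(size) != len(body):
        raise ValueError("not a well-formed blob object")
    return body


if __name__ == "__main__":
    original = bytes(range(256)) * 4  # arbitrary binary data
    stored = encode_loose_blob(original)
    # The stored form differs, but the recovered data is byte-for-byte equal.
    print(decode_loose_blob(stored) == original)  # True
```

The compressed bytes on disk look nothing like the original file, yet nothing about the file's own format has been converted or lost.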

A text file is no different. “Text” is just a particular type of data that you can store in a binary file.
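A short Python sketch of that last point: text only becomes a file once it is encoded to bytes, and those bytes are what end up in the blob. UTF-8 is assumed here purely for illustration; the helper name is made up.

```python
def text_to_file_bytes(text: str, encoding: str = "utf-8") -> bytes:
    """Return the bytes a text editor would write for `text` (UTF-8 assumed)."""
    return text.encode(encoding)


if __name__ == "__main__":
    raw = text_to_file_bytes("hi\n")
    print(list(raw))            # [104, 105, 10]
    print(raw.decode("utf-8"))  # decoding recovers the text: hi
```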

Dietrich Epp answered Oct 08 '22 15:10