I found a mail where Linus Torvalds says: <blockquote> ...go play with Monotone. Really. They use a "real database". </blockquote> That got me interested: why don't popular VCSs use databases, and instead implement their own data storage models to achieve the same goals (transactions, durability, etc.)?


Why don't Git and Mercurial use a database? [closed]

2 Answers

Because databases usually have storage and retrieval methods designed for tasks largely tangential to those of a VCS. A purpose-built approach to managing data lets an implementation optimize its code heavily for the use cases of a VCS. While the needs of a DVCS storage subsystem could surely be mapped to the relational model of a "real database", why should they be? A DVCS does not need formal queries (and needs SQL even less), and rather than trying to hint its database subsystem on how to go faster, it can simply implement the fastest and safest ways to access the data it manages.

Note that frustration with Monotone's horrid speed was the reason Linus started writing Git (he did consider existing DVCS solutions first, after BitMover pulled the rug from under the feet of the Linux developers). And another (less visible) system using a real database, Fossil, doesn't have stellar performance (PDF) either.

Git started as a minimal set of tools implementing a versioned file system, and its author (Linus Torvalds) originally envisioned that a full-blown VCS would be a tool based on Git. In reality, Git itself quickly started to accumulate features making it a full-blown VCS, so while a certain separation of those levels still exists, they're not separate projects.

Two other interesting points about Git's storage subsystem:

Originally it just stored its objects in separate files. Later it was taught to transparently move the least frequently accessed objects into so-called "packfiles", which are a kind of compressed archive with built-in indexes for fast traversal and access.

The point is that the devs studied the performance of the existing solution and carefully engineered an improvement which worked best to solve the problem at hand.

It is still being improved with regard to speed. For instance, another pile of patches speeding up the Git index (staging area) was discussed in the fall of last year.

The point is that such improvements are not coded just for their own sake but are based on studying performance under heavy real-world workloads.
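
To make the "objects in separate files" layout concrete, here is a minimal sketch of how a classic SHA-1 Git repository names and stores a loose blob: the key is the hash of a small header plus the content, and the file holds the zlib-compressed bytes under a two-character fan-out directory. The function names are mine, not Git's, and packfiles are not covered at all.

```python
import hashlib
import os
import tempfile
import zlib


def git_object_id(data, obj_type="blob"):
    """Return the SHA-1 ID of `data` as a Git object of the given type."""
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def write_loose_object(objects_dir, data):
    """Store `data` as a zlib-compressed loose blob; return its object ID."""
    oid = git_object_id(data)
    header = f"blob {len(data)}".encode() + b"\0"
    # Fan-out layout: first two hex digits are the directory name.
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    if not os.path.exists(path):  # same content, same key: stored only once
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(header + data))
    return oid


def read_loose_object(objects_dir, oid):
    """Load a loose object by ID and return its content without the header."""
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    _header, _sep, body = raw.partition(b"\0")
    return body


# Usage: a throwaway directory stands in for .git/objects (an assumption
# made for this demo only).
demo_dir = tempfile.mkdtemp()
demo_oid = write_loose_object(demo_dir, b"hello world\n")
demo_body = read_loose_object(demo_dir, demo_oid)
```

For the content `hello world\n` this should yield the same ID that `git hash-object` reports, which is the whole trick: the content itself is the lookup key, so no query layer is needed.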

Mercurial, which takes a different approach from Git's in the way it stores its data, uses a special storage format (the revlog) which facilitates the use of differential data, that is, deltas between revisions.
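
As a rough illustration of what "differential data" storage means, here is a toy append-only log that keeps a full snapshot every few revisions and only a delta against the previous revision otherwise. This is not Mercurial's actual on-disk format; `DeltaLog`, `make_delta`, `apply_delta` and the `snapshot_every` knob are hypothetical names invented for this sketch.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe `new` (a list of lines) as copy/insert steps relative to `old`."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))        # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("insert", new[j1:j2]))  # new material only
        # "delete" emits nothing: the old lines are simply not copied
    return delta


def apply_delta(old, delta):
    """Rebuild the newer revision from `old` and a delta from make_delta()."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class DeltaLog:
    """Toy revision log: periodic full snapshots, deltas in between."""

    def __init__(self, snapshot_every=4):
        self.snapshot_every = snapshot_every
        self.entries = []  # each entry is ("full", lines) or ("delta", delta)

    def append(self, lines):
        """Add a revision and return its revision number."""
        if len(self.entries) % self.snapshot_every == 0:
            self.entries.append(("full", list(lines)))
        else:
            previous = self.read(len(self.entries) - 1)
            self.entries.append(("delta", make_delta(previous, list(lines))))
        return len(self.entries) - 1

    def read(self, rev):
        """Reconstruct revision `rev` from the nearest earlier snapshot."""
        base = rev
        while self.entries[base][0] != "full":
            base -= 1
        lines = list(self.entries[base][1])
        for r in range(base + 1, rev + 1):
            lines = apply_delta(lines, self.entries[r][1])
        return lines
```

The periodic snapshot bounds how many deltas have to be replayed to read any one revision, which is the kind of trade-off a purpose-built format can tune directly.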

So it appears that the tools which use a "real database" might be classified into these broad groups:

"Ideal design". This is Monotone and Fossil.

Supposedly the creators of such tools think that using a "real database" gives them all the benefits of one (such as durability) for free. And these benefits are quite real (and using SQLite for the storage makes backups a no-brainer).

While the benefits are real, the code implementing custom storage backends in other VCSs does provide durability too. Note that while "real databases" employ clever tricks to try to ensure the data they store is always correct and consistent, they don't do any magic: everything still boils down to proper ordering of file operations, fsync() calls etc.

"Enterprisey" way of thinking. This is Veracity, for instance, which at least claimed support for RDBMS backends in its commercial plugins.

Enterprises have usually invested in a "big" database like Oracle or SQL Server or whatever, and their management likes "high-profile" solutions. An upside of using such a system is that it is usually professionally administered and provides fine-grained access controls, backups etc.

The obvious downsides of using an RDBMS are the lack of distribution (the "D" is missing from "DVCS") and the loss of the general ease of setting things up.
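
On the durability point from the first group ("proper ordering of file operations, fsync()s"), this is roughly what that ordering looks like when replacing a file without a database underneath. It is only a sketch of the common write-temp, fsync, rename pattern, not the code of any particular VCS; `durable_write` is a made-up name, and the directory fsync step assumes a POSIX system.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data` so readers see the old or new content, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # 1. Write to a temporary file in the same directory (same filesystem).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # 2. Force the file contents to stable storage before renaming.
            os.fsync(f.fileno())
        # 3. Atomically swap the new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # 4. Persist the rename itself by syncing the directory (POSIX only).
    if os.name == "posix":
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

If the process dies before step 3, the old file is untouched; if it dies after, the new one is complete. A database does the same kind of bookkeeping internally, just behind a more general interface.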

Bonus reading which looks at custom storage formats from a different angle: Keith Packard's thoughts on why repository formats matter, and a short comment on some of his points from Mercurial's main developer.

answered Oct 26 '25 15:10

kostix

Git is designed as a simple key-value data store. In that sense, it can be considered a database, and implementing this database at its core is one of the reasons for its efficiency and flexibility.

As an alternative answer to your question: Why would they?

answered Oct 26 '25 14:10

Agis


Tags: git, version-control, database, mercurial

Gill Bates
