Consider this simple python code, which demonstrates a very simple version control design for a dictonary:
def build_current(history):
current = {}
for action, key, value in history:
assert action in ('set', 'del')
if action == 'set':
current[key] = value
elif action == 'del':
del current[key]
return current
history = []
history.append(('set', '1', 'one'))
history.append(('set', '2', 'two'))
history.append(('set', '3', 'three'))
print build_current(history)
history.append(('del', '2', None))
history.append(('set', '1', 'uno'))
history.append(('set', '4', 'four'))
print build_current(history)
for action, key, value in history:
if key == '2':
print '(%s, %s, %s)' % (action, key, value)
Notice that by using the history list you can reconstruct the current dictionary in any state it once existed. I consider this a "forward build" (for lack of a better term) because to build the current dictionary one must start at the beginning and process the entire history list. I consider this the most obvious and straight forward approach.
As I've heard, early version control systems used this "forward build" process, but they were not optimal because most users care more about recent versions of a build. Also, users don't want to download the entire history when they only care about seeing the latest build.
My question then is, what other approaches exist for storing histories in a version control system? Perhaps a "backwards build" could be used? That might allow users to only download recent revisions without needing the entire history. I've also seen a few different formats for storing the history, namely: changesets, snapshots, and patches. What are the differences between changesets, snapshots and patches?
Of the modern popular version controls available, how do they store their histories and what are the advantages of their various designs?
Version control, also known as source control, is the practice of tracking and managing changes to software code. Version control systems are software tools that help software teams manage changes to source code over time.
The most popular way to maintain version control of documents is using a revision control system. It's the best way to manage changes and track who made what changes and when. Document control software is used in software development and many other fields, but they're particularly helpful for managing documents.
The types of VCS are: Local Version Control System. Centralized Version Control System. Distributed Version Control System.
You mentioned these 3 methods of storing (file)-history:
Subversion uses forward deltas(aka patches) in FSFS repositories and backward deltas in BDB Repositories. Note that these implementations have different performance characteristics:
forward deltas are fast in committing, but slow on checkouts(as the "current" version must be reconstructed)
backward deltas are fast in checking out but slow on commit as new deltas must be constructed to construct the new current and rewrite the previous "current" as a bunch of deltas
Also note that FSFS uses a "skipping delta" algorithm which minimizes the jumps to reconstruct a specific version. This skipping delta however is not size optimized as mercurials snapshots; it just minimizes the number of "revisions" you need to build a full version regardless of the overall size.
Here a small ascii art (copied from the specification) of a file with 9 revisions:
0 <- 1 2 <- 3 4 <- 5 6 <- 7
0 <------ 2 4 <------ 6
0 <---------------- 4
0 <------------------------------------ 8 <- 9
where "0 <- 1" means that the delta base for revision 1 is revision 0.
The number of jumps is at most log(N) for N revisions.
Also a very pretty effect on FSFS is that older revision will be written only once and after this they will be only read by further actions. That's why subversion repositories are very stable: as long as you do not have a HW failure on your harddisk, you should be able to get a working repository even if some corruption occurred in the last commit: You still have all older revisions.
In BDB Backend you have constantly rewrite the current revision on checkins/commits, which makes this process prone to data corruption. Also as you store the full text only in current revision, corrupting the data on commit will likely destroy great parts of your history.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With