The Evolution of Version Control Systems

Version control has been around in some form for a long time - it started with developers simply copying the entire code base into dated directories. Obviously, this was not ideal because:

  • Many duplicated files would have to be created, taking up a lot of disk space
  • It's hard to keep track of who made each change, and the purpose of the change - is it a bug fix, or a new feature, or refactoring?
  • When a bug is reported, it's hard to identify the version at which the bug was introduced
  • It forces a linear development flow - bug fixes or optimisations cannot be applied to previous versions without risking breaking the old code

Since then, many popular version control systems (VCS), also known as source control management (SCM), have been created to tackle these issues.

Source Code Control System (SCCS)

Source Code Control System (SCCS) was the first VCS to see popular adoption. It was a proprietary tool, developed at Bell Labs in 1972 by Marc Rochkind for their IBM OS/360 system (they weren't using UNIX at that point).

The Source Code Control System (SCCS) is a software tool designed to help programming projects control changes to source code. It provides facilities for storing, updating, and retrieving all versions of modules, for controlling updating privileges, for identifying load modules by version number, and for recording who made each software change, when and where it was made, and why.

Marc Rochkind, "The Source Code Control System", IEEE Transactions on Software Engineering, Volume SE-1, Issue 4

Since SCCS was proprietary (until 2006), the GNU Project also wrote a free clone called Compatibly Stupid Source Control (CSSC).

Deltas

SCCS stores all versions of a file in a single history file; the original version and any revisions are recorded as deltas. Since changes are applied to the code sequentially (one after the other), the deltas are also stored sequentially, in a chain. To produce a specific version of the file, SCCS starts from the original version, follows the chain of deltas and applies them in order.

To provide a conceptual example, let's suppose we have one file under version control, and its content is simply a short list of words:

apple
clementine
dates

Then let's suppose we add an extra line after line 1:

apple
banana
clementine
dates

This change can be recorded as a set of instructions:

  • Add a new line after line 1
  • Append the string banana to it

For our second revision, we might delete clementine:

apple
banana
dates

Likewise, this change may be recorded as the instruction:

  • Delete line 3

The sets of instructions that represent the changes are the deltas. Each delta is automatically given a version number, and is stamped with the date and time at which the revision was made, the name of the contributor, and a required comments field explaining why the change was made.

Instead of storing each version of the file in full, SCCS stores only the original version and the deltas. We have 2 deltas in our example, and to get to the latest version, we simply start from the original and apply all the deltas in the correct order:

  • Add a new line after line 1
  • Append the string banana to it
  • Delete line 3

It's important that deltas are applied sequentially. If we applied the second delta before the first, our example would end up with the following file instead:

apple
banana
clementine

Because only changes are recorded, any portion of the file that has not changed is never duplicated. Now imagine that our file is not just 3 lines long but 500, and that it's not just one file but an entire code base with thousands of files - storing only the deltas saves a lot of space.
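
To make the ordering point concrete, here is a minimal Python sketch of a forward delta chain built from the fruit example. The instruction format (`("insert", after_line, text)` and `("delete", line, None)`) and the helper names `apply_delta` and `checkout` are hypothetical, chosen only for illustration; the sketch also folds "add a new line" and "append banana" into a single insert. It is not SCCS's actual storage format.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction format; line numbers are 1-based, as in the prose.
Instruction = Tuple[str, int, Optional[str]]
Delta = List[Instruction]

def apply_delta(lines: List[str], delta: Delta) -> List[str]:
    """Return a new list of lines with one delta's instructions applied."""
    result = list(lines)
    for op, line_no, text in delta:
        if op == "insert":
            # Insert a new line *after* 1-based line `line_no` (0 means at the top).
            result.insert(line_no, text)
        elif op == "delete":
            # Remove 1-based line `line_no`.
            del result[line_no - 1]
        else:
            raise ValueError(f"unknown operation: {op}")
    return result

def checkout(original: List[str], deltas: List[Delta], version: int) -> List[str]:
    """Rebuild `version` (0 = original) by applying the first `version` deltas in order."""
    lines = original
    for delta in deltas[:version]:
        lines = apply_delta(lines, delta)
    return lines

original = ["apple", "clementine", "dates"]
deltas = [
    [("insert", 1, "banana")],  # revision 1: add banana after line 1
    [("delete", 3, None)],      # revision 2: delete line 3 (clementine)
]

print(checkout(original, deltas, 2))  # ['apple', 'banana', 'dates']

# Applying the same deltas in the wrong order gives a different file.
wrong = apply_delta(apply_delta(original, deltas[1]), deltas[0])
print(wrong)  # ['apple', 'banana', 'clementine']
```

Note that the second delta's "line 3" only means clementine because the first delta has already shifted the lines down; that dependency is exactly why the chain must be replayed in order.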

Mechanism

When developers want to make a change to a file, they request an editable copy of it, which SCCS writes to an auxiliary file:

get [name] -e

This produces an auxiliary file suffixed with .a. It also locks the file, so that no one else can add a delta to it.
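
The locking rule can be modelled with a small lock table. This is a hypothetical sketch: the `LockTable` class and its `get_for_edit` and `record_delta` methods only mirror the roles of `get -e` (take the lock) and `delta` (store the change and release the lock); they are not SCCS internals.

```python
from typing import Dict

class LockTable:
    """Tracks which user has each file checked out for editing (hypothetical)."""

    def __init__(self) -> None:
        self.locks: Dict[str, str] = {}  # file name -> user holding the lock

    def get_for_edit(self, path: str, user: str) -> None:
        # Refuse the request if someone else already holds the lock.
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise PermissionError(f"{path} is locked by {holder}")
        self.locks[path] = user

    def record_delta(self, path: str, user: str) -> None:
        # Only the lock holder may add a delta; doing so releases the lock.
        if self.locks.get(path) != user:
            raise PermissionError(f"{user} does not hold the lock on {path}")
        del self.locks[path]

locks = LockTable()
locks.get_for_edit("fruits", "alice")    # Alice takes the lock
try:
    locks.get_for_edit("fruits", "bob")  # Bob is refused while Alice holds it
except PermissionError as err:
    print(err)                           # fruits is locked by alice
locks.record_delta("fruits", "alice")    # Alice records her delta; lock released
locks.get_for_edit("fruits", "bob")      # now Bob can check the file out
```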

The developers would then make changes to the auxiliary file. After the intended changes have been completed, the developers would issue this command:

delta [name]

This tells SCCS to compare the original file with the modified version and generate a list of line insertions and deletions that reproduce the modifications. The result is stored back in the same file, as text entries prefixed by I or D depending on the operation.
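
The "compare and produce insertions and deletions" step can be approximated with Python's standard `difflib` module. This sketch assumes lines are compared as whole strings and emits entries tagged `I` or `D`, loosely echoing the I/D prefixes described above; the `make_delta` name and the `(op, line, text)` tuple layout are illustrative assumptions, not SCCS's on-disk format or its actual differencing algorithm.

```python
import difflib
from typing import List, Tuple

def make_delta(old: List[str], new: List[str]) -> List[Tuple[str, int, str]]:
    """Describe how to turn `old` into `new` as line insertions (I) and deletions (D).

    Line numbers refer to `old` and are 1-based. For "I" entries the number is the
    line after which the text is inserted (0 would mean before the first line).
    """
    entries = []
    matcher = difflib.SequenceMatcher(a=old, b=new)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            # Lines old[i1:i2] are removed.
            for i in range(i1, i2):
                entries.append(("D", i + 1, old[i]))
        if tag in ("insert", "replace"):
            # Lines new[j1:j2] go after old line i2 (1-based).
            for j in range(j1, j2):
                entries.append(("I", i2, new[j]))
    return entries

v0 = ["apple", "clementine", "dates"]
v1 = ["apple", "banana", "clementine", "dates"]
v2 = ["apple", "banana", "dates"]

print(make_delta(v0, v1))  # [('I', 1, 'banana')]
print(make_delta(v1, v2))  # [('D', 3, 'clementine')]
```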

Releases and Levels

SCCS even had a basic concept of branches (discussed later), expressed as releases and levels in the original paper. Each release has its own set of deltas (called levels), and new deltas can be added to the end of the release.

For example, we may start development at release 1, level 1 (1.1). When a new feature is added, we store it as another delta on release 1, level 2 (1.2). This may continue for a few more revisions until the release is ready. Let's assume 4 revisions have been made on top of 1.1, so we are now at 1.5.

After 1.5 is locked (no more new features are to be added to it), a new release is initiated, resetting the level back to 1; so our next delta would be 2.1.
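
The numbering scheme is simple enough to express directly. The `SID` class below is a hypothetical helper that only models the release.level rule described here (a new delta bumps the level, a new release resets it to 1); it ignores everything else SCCS tracks about a version.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SID:
    """A release.level identifier such as 1.5 (hypothetical helper)."""
    release: int
    level: int

    def next_level(self) -> "SID":
        # A new delta on the same release bumps the level.
        return SID(self.release, self.level + 1)

    def next_release(self) -> "SID":
        # Starting a new release resets the level back to 1.
        return SID(self.release + 1, 1)

    def __str__(self) -> str:
        return f"{self.release}.{self.level}"

sid = SID(1, 1)
for _ in range(4):         # four revisions on top of 1.1
    sid = sid.next_level()
print(sid)                 # 1.5
print(sid.next_release())  # 2.1
```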

Revision Control System (RCS)

As the first widely adopted version control system, SCCS introduced some important concepts still in use today:

  • Storing only the changes to files (as deltas), instead of entire copies of the code base
  • Checksums to ensure the integrity of the stored history, allowing the VCS to detect corruption immediately
  • Releases and levels, which would later evolve into branches

However, as pioneering as it was, there were many limitations:

  • SCCS ran locally, so everyone had to work on the same machine
  • Once one person retrieved a file and marked it for modification, the file was locked, preventing anyone else from submitting a new delta until that person was done
  • Timestamps used local time, with no time zone information

Many of these issues have since been addressed in later versions of SCCS, but back in the early 1980s, another VCS was already starting to dominate. The Revision Control System (RCS) was developed by Walter F. Tichy at Purdue University and released in 1982 to address some of the shortcomings of SCCS at the time.

  • Instead of storing the original version and applying deltas forwards, RCS does the reverse - it stores the latest version of the source code, along with reverse deltas that are applied to recreate older versions. This means that, for a project with many commits, retrieving the latest version of the source code is quicker.
  • Multiple users can work on the same code base. Users check out files, which are then locked so that no one else can check out the same file for editing. After making their changes, they check the file back in.
  • A friendlier command-line interface, built around the ci (check in) and co (check out) commands
  • Added the idea of branches - where different developers can start from the same version of the code, create branches for each feature they're to work on, and then merge their branch back into the main branch
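
The reverse-delta idea can be sketched in Python by keeping the newest version in full and, on each commit, recording the edits that turn the new version back into the previous one. The `ReverseDeltaStore` class and the `(start, end, replacement_lines)` edit format are hypothetical, and `difflib` stands in for RCS's own differencing; this is not RCS's file format.

```python
import difflib
from typing import List, Tuple

# A reverse-delta edit: replace newer[start:end] with `lines` (hypothetical format).
Edit = Tuple[int, int, List[str]]

def reverse_delta(newer: List[str], older: List[str]) -> List[Edit]:
    """Edits that turn `newer` back into `older`."""
    matcher = difflib.SequenceMatcher(a=newer, b=older)
    return [(i1, i2, older[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]

def apply_edits(lines: List[str], edits: List[Edit]) -> List[str]:
    result = list(lines)
    # Apply from the bottom up so earlier indices stay valid.
    for i1, i2, replacement in reversed(edits):
        result[i1:i2] = replacement
    return result

class ReverseDeltaStore:
    """Keeps the latest version in full, plus reverse deltas for older ones."""

    def __init__(self, first: List[str]) -> None:
        self.head = list(first)
        self.reverse_deltas: List[List[Edit]] = []  # entry i turns version i+1 into i

    def commit(self, new: List[str]) -> None:
        self.reverse_deltas.append(reverse_delta(new, self.head))
        self.head = list(new)

    def checkout(self, version: int) -> List[str]:
        # The latest version needs no work; older ones walk backwards from head.
        lines = self.head
        for delta in reversed(self.reverse_deltas[version:]):
            lines = apply_edits(lines, delta)
        return list(lines)

store = ReverseDeltaStore(["apple", "clementine", "dates"])
store.commit(["apple", "banana", "clementine", "dates"])
store.commit(["apple", "banana", "dates"])
print(store.checkout(2))  # ['apple', 'banana', 'dates'] (no deltas applied)
print(store.checkout(0))  # ['apple', 'clementine', 'dates']
```

Retrieving the latest version is just a copy of `head`, while the cost of applying deltas is paid only when someone asks for an old revision, which matches the trade-off described above.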

However, RCS did have a few drawbacks. Whilst SCCS had checksums that detected corrupt files, RCS did not have this capability.

SCCS and RCS are both considered first-generation VCSs: they worked only locally, which means the source code being worked on could only reside on one machine at any time.

Concurrent Versions System (CVS)

The next advancement of VCS came with client-server VCSs, which store the source code on a single server and allow users to download it to their local machines, modify it, and submit their changes (check in) to the centralised server.

The first of these second-generation VCSs was the Concurrent Versions System (CVS), developed by Dick Grune and released in 1986. Apart from supporting a client-server architecture, CVS made many other improvements, the most significant of which are:

  • The source code is no longer locked when it is checked out for editing. Users each work on their own copy of the code, then merge their changes together and commit them.
  • It can run scripts after each commit, for example to send emails notifying maintainers of a new change
  • Added the idea of tags
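
Working without locks relies on merging. The sketch below shows the core of a three-way merge under a strong simplifying assumption: the common ancestor and both edited copies have the same number of lines, so only in-place line edits are handled (real merge tools also cope with inserted and deleted lines). The `merge3` name and the conflict-marker text are hypothetical.

```python
from typing import List, Tuple

def merge3(base: List[str], ours: List[str], theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Line-by-line three-way merge, assuming all three versions have the same
    number of lines (edits only, no inserted or deleted lines).

    Returns the merged lines and the 1-based line numbers that conflict."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles line-for-line edits")
    merged: List[str] = []
    conflicts: List[int] = []
    for n, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:        # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed it differently: report a conflict
            merged.append(f"<<< {o} ||| {t} >>>")
            conflicts.append(n)
    return merged, conflicts

base = ["apple", "banana", "dates"]
alice = ["apricot", "banana", "dates"]  # Alice edits line 1
bob = ["apple", "banana", "durian"]     # Bob edits line 3
print(merge3(base, alice, bob))  # (['apricot', 'banana', 'durian'], [])
```

Because Alice and Bob touched different lines, both edits survive without either of them ever having locked the file; only when both change the same line does a human need to step in.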

For at least a decade, it was the most popular VCS around, which led to more and better clients being developed for CVS, such as TortoiseCVS, released in 2000. TortoiseCVS was unique in that it added entries to the Windows Explorer context menu, allowing you to right-click on a file and perform CVS operations like 'Update' and 'Commit'.

Subversion (SVN)

Subversion (SVN) was created as the successor to CVS, and was designed to be mostly compatible with it. Its development was commissioned by CollabNet, Inc., and version 1.0 was released in 2004. It provided many improvements, most notably:

  • Atomic commits - with CVS, an interrupted commit could leave the repository corrupted. With SVN, either the entire commit succeeds, or none of it is applied
  • Recognition of renamed/moved files - previously, renaming or moving a file was treated as deleting it and creating a new one, so the file's history was lost
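
The all-or-nothing behaviour of an atomic commit can be illustrated with a toy repository: build the new revision on a copy, and only swap it in once every change has succeeded. This is a hypothetical sketch of the idea (the `Repository`, `write` and `remove` names are made up), not how Subversion implements it.

```python
from typing import Callable, Dict, List

Tree = Dict[str, str]            # path -> file contents (a toy snapshot)
Change = Callable[[Tree], None]  # edits a snapshot in place; may raise

class Repository:
    def __init__(self) -> None:
        self.head: Tree = {}
        self.revision = 0

    def commit(self, changes: List[Change]) -> bool:
        """Apply every change, or none of them."""
        staged = dict(self.head)  # build the new revision on a copy
        try:
            for change in changes:
                change(staged)
        except Exception:
            return False          # head was never modified
        self.head = staged        # publish the new revision in one step
        self.revision += 1
        return True

def write(path: str, text: str) -> Change:
    def change(tree: Tree) -> None:
        tree[path] = text
    return change

def remove(path: str) -> Change:
    def change(tree: Tree) -> None:
        del tree[path]  # raises KeyError if the file does not exist
    return change

repo = Repository()
print(repo.commit([write("a.txt", "hello"), write("b.txt", "world")]))  # True
print(repo.commit([write("a.txt", "changed"), remove("missing.txt")]))  # False
print(repo.head)  # {'a.txt': 'hello', 'b.txt': 'world'}
```

The second commit fails half-way through, yet `a.txt` keeps its old contents: the partial edit only ever existed in the discarded copy.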

CVS and Subversion are both centralized VCSs (CVCSs) - there is a single master repository that holds the 'official' version; any changes must be brought back to this official version so that others can get hold of them.

Distributed VCSs (DVCSs)

In April 2005, two new VCSs came onto the scene - Git and Mercurial. They differed from SVN in that they were distributed VCSs (DVCSs). There's no longer a central, 'official' source; instead, every client holds a complete copy of the repository, including its full history, operates on it locally to produce changesets (or patches), and exchanges these changesets with other developers directly. It is a peer-to-peer system, similar to BitTorrent or Ethereum.

This does not mean there cannot be a central source; in fact most projects using DVCSs do have an agreed-upon 'official' source, but the difference is that it's not a requirement.
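
The peer-to-peer exchange can be sketched as two repositories that each hold the full history and copy over whatever commits the other is missing. In the same spirit as Git and Mercurial, which identify commits by hashes of their contents, each commit here gets an ID derived from its parent, message and content, so every peer computes the same ID for the same change. The `Commit` and `Repo` classes, the truncated SHA-256 ID and the `pull` method are simplified assumptions, not either tool's real object model; merging and updating branches are left out.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass(frozen=True)
class Commit:
    parent: Optional[str]
    message: str
    content: str

    @property
    def id(self) -> str:
        # Derive the ID from everything in the commit, so identical commits
        # get identical IDs on every machine.
        data = f"{self.parent}\n{self.message}\n{self.content}".encode()
        return hashlib.sha256(data).hexdigest()[:12]

@dataclass
class Repo:
    commits: Dict[str, Commit] = field(default_factory=dict)
    head: Optional[str] = None

    def commit(self, message: str, content: str) -> str:
        # Committing is purely local: no server is contacted.
        c = Commit(self.head, message, content)
        self.commits[c.id] = c
        self.head = c.id
        return c.id

    def pull(self, other: "Repo") -> int:
        """Copy every commit we don't have yet from a peer; returns how many."""
        missing = {cid: c for cid, c in other.commits.items() if cid not in self.commits}
        self.commits.update(missing)
        return len(missing)

alice = Repo()
alice.commit("initial", "apple")
bob = Repo()
print(bob.pull(alice))   # 1
bob.head = alice.head    # Bob continues from Alice's latest commit
bob.commit("add banana", "apple\nbanana")
print(alice.pull(bob))   # 1 (only Bob's new commit is transferred)
```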

This is an improvement over centralized VCSs because:

  • Operations with DVCSs are much quicker, as they are applied locally, whereas a CVCS must communicate with a remote server
  • Prevents data loss - since every client holds a complete copy of the repository, anyone who loses their copy can obtain a new one from someone else. With CVCSs, you must back up the centralized server to ensure data is not lost

A DVCS also suits the open source community much better, because developers can commit their modifications without having to send them to a centralized server; this means they can keep private copies for their own use. It also means they don't need 'committer' permissions on the main repository; they can simply commit to their own modified copy.

Daniel Li

Full-stack Web Developer in Hong Kong. Founder of Brew.

Hong Kong http://danyll.com