Wednesday, May 13, 2009

When you can't afford to use Subversion

A team doing iterative development with around ½ GB of project data or more should invest in a fast Configuration Management System (CMS).


Let me clarify why:

- iterative development means frequent branching, and frequent branching means frequent download of all data from a branch => fast download is required,

- frequent branching means frequent synchronization between branches => fast and good difference queries are required, and

- a team with a handful or more people updating and committing means many changes => fast update and commit cycles are required.


Why pick on Subversion?

Don't get me wrong, I like Subversion. I think it does tremendous good worldwide every day, but some of its strengths are also its limits.

You can work offline with your Subversion checkout, which is cool when traveling around with a laptop. I do so, and I think the ability to add/delete/revert offline is brilliant.

However, this is only possible because of the .svn folders that are created on your hard drive (when doing a checkout) and the algorithms in your Subversion client.

In this design decision resides a clever approach - but alas - also a scaling problem.


What Subversion scaling problems?

- .svn folders are created per data folder => each data folder needs to be traversed by the Subversion client algorithm (on the client computer) and synchronized with a server, and

- each .svn folder contains a text-base folder (.svn/text-base) with a server copy of all files in your data folder => the Subversion client copies server data to the client computer, and syncs against this.

This is essentially a delegation of CMS server work to the client PC, which is clever, but comes at a price. The price is data duplication and added processing time.
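To see what that price looks like on your own checkout, a small script can add up the bytes inside .svn folders and compare them with the bytes of the working files. The following is only a minimal sketch, assuming the per-folder .svn layout described above; the function name checkout_overhead is made up for this illustration and is not part of any Subversion tool.

```python
import os
import sys


def checkout_overhead(root):
    """Return (working_bytes, svn_bytes) for the checkout rooted at `root`.

    Hypothetical helper: everything at or below a ".svn" folder is counted
    as client-side metadata, everything else as working data.
    """
    working_bytes = 0
    svn_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Split the path relative to the checkout root into its folder names.
        rel_parts = os.path.relpath(dirpath, root).split(os.sep)
        in_svn = ".svn" in rel_parts
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not follow symbolic links
            size = os.path.getsize(path)
            if in_svn:
                svn_bytes += size
            else:
                working_bytes += size
    return working_bytes, svn_bytes


if __name__ == "__main__":
    working, svn = checkout_overhead(sys.argv[1])
    print("working files: %d bytes, .svn folders: %d bytes" % (working, svn))
```

Run it with the path of a checkout as the only argument. If the two numbers come out roughly equal, you are seeing the data duplication at work.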


Yeah, but with today's PCs?

Over 3 years I worked with project data ranging from ½ GB to 2 GB using Perforce, a costly - but fast - CMS that did its job.

In the first year, management became interested in changing to Subversion, and a project with a couple of GB of data was added to the Subversion server. The result was complaining team members: they were used to something faster.

A year later, management interest in changing to Subversion came around again, and a group was asked to investigate the performance of Subversion vs. Perforce. The group confirmed that it was a good idea for some projects - those with a lot of data - to use Perforce.

Recently, I worked with another team that had around 4 GB of project data when checked out from Subversion. This team had new machines, but it could still take 15 min. just to update a project branch.
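If you want numbers like these for your own setup, timing the operations is easy to script. This is just a rough sketch, not the method the investigating group used; the helper name time_command and the example path are made up for illustration.

```python
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run `cmd` (a list of arguments) and return (seconds, exit_code).

    Hypothetical helper: the command's output is discarded so that
    printing to the terminal does not distort the measurement.
    """
    start = time.perf_counter()
    result = subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    elapsed = time.perf_counter() - start
    return elapsed, result.returncode


# Example (the path is a placeholder for your own branch checkout):
# seconds, code = time_command(["svn", "update"], cwd="/path/to/branch")
```

Measure the same operation a few times and at different times of day, since server load and disk caches can change the result a lot.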


Symptoms you might avoid by paying for a fast CMS

- you might avoid that people don't want to update often, because the CMS is too slow,

- you might avoid that people don't want to branch, since checking the new branch out is too painful,

- you might limit how often people are interrupted in their work while waiting for SCM operations to complete, and

- you might avoid that your continuous integration servers are slow to respond.

No guarantees? No. Unfortunately not. Eliminating a CMS speed problem does not rid one of all other CMS-related problems.


Tips when stuck with a slow CMS

- Keep in mind that bad SCM decisions/practices are expensive,

- Keep original third-party files in their own repository,

- Keep modified third-party files in their own repository, and

- Divide your project data into several repositories, e.g. one per product or even per component.


Thanks for reading this far. I hope you found some of it useful.

(Please send a comment. Feedback is greatly appreciated.)