Prediction #3: Git and distributed version control invade the enterprise, widespread panic and benefits follow

Posted by

Three years ago, in his presentation on the Git Distributed Version Control Systems (DVCS), Linus Torvalds provoked developers by declaring that “if you actually like using CVS… you should be in some mental institution.” Since then, awareness of and discussion around both DVCS and mental health have grown rapidly. For many considering a migration to DVCS, discussions of trade-offs are filling coffee breaks. After the first migration steps are taken, the conversations get more heated and circle around getting DVCS best practices under control. For teams and organizations modernizing their ALM stack, this year will involve deciding whether to begin moving to a DVCS system. This post discusses the fundamental benefits that DVCS will bring to your Agile ALM stack, and identifies the gaps in tools and best practices that need to be addressed in order to avoid having the inmates get overly annoyed with the management of their asylum.

In 2009, the Eclipse Community Survey showed that 20% of respondents used CVS and 58% used Subversion (SVN), with CVS coming in at 24% for companies over 5,000 employees. While DVCS usage has been growing, the portion claimed by CVS and SVN has also grown in the 2010 survey. The mental model of CVS and SVN is entrenched for the average enterprise developer and builds on decades of the knowledge, tools and best practices that have built up around centralized version control. Tools built on top of CVS and SVN, such as the Eclipse and Mylyn team support, have also modernized these dated protocols and servers. But what the rapid rise of Git in open source has made clear is the limitations of traditional file-based change tracking and the centralized model for commit access control. The current adoption rate of Git is sufficient evidence that DVCS works and scales for open source development. The question then turns to how those benefits can be realized by your teams and organizations.

The key benefit of DVCS is that the collaboration rules and conventions captured by the version control system align with the trust network that forms organically between developers. The centralized version control model relies on component boundaries and restrictions on committer rights for managing the division of labour required for software configuration management (SCM) of complex systems. This perspective falls into the same trap as thinking that a modular, object-oriented decomposition of software is sufficient to manage the complexity of a large system. It is not. Dependency injection, aspect-oriented and declarative programming are key components of the modern programming model. In other words, concerns that crosscut the hierarchical decomposition of any large system need to be captured in order to evolve the implementation of that system effectively.

Tasks, whether in the form of user stories, features or defect fixes, are what define the planning and lifecycle management of large applications. Tasks crosscut the system’s modularity by capturing the discussions between developers and end users around the software, not the API and component boundaries of the software. Aggregations of tasks into user stories, plans and releases drive the evolution of the software over time and define the boundaries that naturally form between developers. Centralized version control systems have focused entirely on changes and components, and ignored this more social dimension of coding. Managing a large system meant breaking a system into components and managing access control with commit rights. This aligned version control with the system’s modularity, not with the way that developers work on the system in terms of experimenting with and collaborating on changes to the system. The crosscutting nature of developer collaboration and review activities needed to be captured and facilitated, and DVCS provides us just that.

In the video above, a visualization of directory structure corresponds to the modularity of the open Mylyn project. After watching the video for a few moments, we see that the activity of the developers themselves layers on top of the software’s modularity and has a structure of its own. This structure corresponds to the tasks, change sets and patch reviews done by the Mylyn committers. By facilitating the common operations needed to support this workflow between developers, such as task-based branching and merging, DVCS changes the game by making it easy for developers to capture their activity within the SCM system itself rather than tracking it through external tools, or not capturing it at all. By expressing the collaboration, review and trust patterns common with open source developers, DVCS has fundamentally transformed SCM.

That’s the good news. The bad news is that it will take the better part of this decade for many organizations to consume the benefits of DVCS. That timeframe may seem pessimistic given that some large organizations are already in the midst of deploying Git and Mercurial. But DVCS is a discontinuous innovation. It both provides a transformational benefit over centralized version control and requires a fundamental shift in developer practices that previously used a considerably simpler model. Consider the introduction of object-oriented programming (OOP). The core ideas were already baked in the 1970s, but it took decades and truckloads of books to turn the ideas into a scalable development practice in the enterprise. OOP introduced polymorphism and type hierarchies, neither of which supported the older style search and grep-based code navigation. A whole new breed of tool support needed to be created in order for developers to get a handle on navigating OOP system structure.

While the switch from centralized to distributed version control is a smaller paradigm shift than the switch from procedural to object-oriented programming, the scope of the change should not be underestimated. Problems will arise in three categories. First, the mental switch required from the average developer, accustomed to committing to the main line of the code with branches being few and far between, will be a source of friction and costly in terms of developer training. Second, tool support for discontinuous integrations tends to lag the initial deployments. For the near future, developers accustomed to full-featured graphical or IDE-based clients will find themselves back in the command-line when working with Git and Mercurial. Those with some muscle memory for command-line version control operations will still need to retrain their brains, since Git’s command line options are more complex. Finally, the biggest challenge organizations adopting DVCS in 2011 will face is a lack of clear best practices. If you are launching a new project with the social dynamics of the Linux Kernel, you’re all set and don’t need to look beyond the best practices that Linus has forged. But if you are a medium or large organization, you will need to define the various practices around branching, merging, rebasing, reviewing and staging that are needed to effectively map tools like Git onto your development processes. The higher complexity of DVCS introduces a broader variability of best practices. Today we are using different Git practices for developing the Mylyn and the Tasktop codebases.

Organizations pushing the bleeding edge of Git adoption will discover that these problems cascade. Neither the change in developer mind-set nor best practices are encapsulated by the Git tool support available today. This means that many developers will struggle to get a handle on needing to gain a sufficient level of Git expertise and best practices, and these will have to be learned individually rather than being captured in the repeatable process provided by tool support. That approach scales to a few dozen motivated developers, but much less so with a few thousand. The top technical people at these organizations will invariably push for DVCS migration, provided they see the fundamental productivity benefits as attainable. But until DVCS is further along the technology adoption lifecycle, for most organizations the move must be made stepwise, starting with non-core and greenfield projects establishing the best practices and conventions. On Tasktop and Mylyn development we have migrated 1/5th and 1/3rd of the codebases respectively, allowing ourselves time to refine the best practices suitable for each of those, and add the EGit support needed for our developers and contributors to be productive.

By architecting version control around the social structure of the software development process, distributed version control has aligned the Agile and task-focused planning process with the versioning and change operations that we use to manage our software’s history. Since both the task-focused and the DVCS movement have focused on the natural cadence of coding and developer collaboration, there has been convergent evolution between tools like Mylyn and Git. In the not-too distant future, the developer will activate a task, the appropriate Gerrit change or topic branch will be created, stashed when the task is deactivated, and a continuous integration build for the topic branch spun up, with failures percolated back to the developer’s desktop as updates to the task. As DVCS best practices evolve, we will see more of this natural mirroring between the Agile planning loop and the DVCS mechanisms converging, lined up with tool-supported best practices.

Successful open source projects have a community and contribution driven planning loop, and can already reap the benefits of DVCS. But the enterprise software planning loop is more complex. In his dogmatic focus on open source software, when proclaiming that everyone using centralized version control was insane, Linus ignored the work required to bring the benefits of open source technologies to the software development community at large. The average developer’s training needs to be accounted for, best practices for incorporating DVCS into the planning loop must be defined, and tool-based automation is needed to bridge the gaps.