GitLab rationale
What is this all about?
It can be a dry topic, but at its core we’re talking about version control – the way in which we record and interact with source code – and about how developers work together to improve the code and support each other.
Software typically goes through many versions over its lifetime, more than most other types of creative work. That makes it important to record what changed when. But it also often has many authors – sometimes with multiple authors wanting to work on the same piece of code at the same time.
Early solutions to recording changes involved keeping a “change log” within one source file of each software component. If well written, that might give you a high-level idea of what changes had been made, by whom and why. But it could not describe exactly how they were made, and it was all too easy to write bad descriptions or none at all. The location of the change log was also often inconsistent, making it hard to find. Avoiding conflicting changes by different authors required ad-hoc methods such as passing a fluffy toy from one developer’s desk to another, with only the developer currently holding the toy allowed to change the component. This doesn’t work well once developers no longer all share an office!
In 1996, Acorn started using CVS (an initialism for “Concurrent Versions System”) to manage the RISC OS sources, replacing a mish-mash of solutions that had been used for its component parts up until then. They wrote a number of scripts that wrapped sequences of CVS commands, so that the source trees for individual components could be used together to build complete ROMs or disc images.
Other version control systems have waxed and waned in popularity since then, but CVS served RISC OS well for the next 23 years. Acorn’s customised wrapper scripts presented a hurdle to migrating to anything else, and CVS’s shortcomings could be overlooked if you had become accustomed to working around them.
But CVS is now (in 2019) becoming very long in the tooth. There hasn’t been a new release in 11 years – half the time RISC OS has been using it – and as with all software, security and obsolescence become a concern as it ages. But there’s also a human angle: we want to attract new developers to RISC OS. CVS is becoming a niche skill, so many of these newcomers will find CVS a hurdle too, and even those who have used it before are likely to find their skills have become rusty through lack of use.
Enter Git.
Why Git?
Many version control systems have competed to replace CVS, but one in particular, Git, has come to dominate since its introduction in 2005, with usage estimates varying between 70% and 90% amongst both open and closed source projects. By comparison, CVS has fallen to only around 1%. So what’s so great about Git?
Firstly, Git fixes many little niggles with CVS. It automatically detects binary files. It handles adding and removing source files better, and it can follow a source file’s history when the file is renamed. It records the relationship between versions as a directed acyclic graph rather than as a tree, which in essence means it understands at a fundamental level when branches of development have been merged together. (Branching in general is much faster and easier to do in Git than in CVS.) It directly records concurrent changes to multiple files as a single commit, rather than relying on named tags to keep track of which version of one file is contemporaneous with which version of another.
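To make the graph idea concrete, here is a minimal sketch in Python. It is only an illustrative model, not Git's actual data structures: the `Commit` class, the single-letter commit names and the `is_merged` helper are all invented for this example. It shows how a merge becomes an ordinary commit with two parents, and how one commit can group changes to several files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical model of a commit: a merge is just a commit with two parents."""
    ident: str
    parents: tuple = ()
    files: tuple = ()  # paths changed together in this one commit


def ancestors(history, ident):
    """Return the ids of every commit reachable from `ident`, itself included."""
    seen = set()
    stack = [ident]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(history[current].parents)
    return seen


def is_merged(history, branch_tip, target_tip):
    """True if everything on `branch_tip` is already part of `target_tip`."""
    return branch_tip in ancestors(history, target_tip)


# A tiny made-up history: "b" and "c" both build on "a", and "m" merges them.
history = {c.ident: c for c in [
    Commit("a", (), ("Makefile", "main.c")),
    Commit("b", ("a",), ("main.c",)),            # work on the main line
    Commit("c", ("a",), ("main.c", "util.c")),   # work on a branch
    Commit("m", ("b", "c")),                     # merge commit: two parents
]}

print(is_merged(history, "c", "m"))  # the branch is part of the merge commit
print(is_merged(history, "c", "b"))  # but not of the main line before the merge
```

Because the merge commit names both of its parents, the question "has this branch been merged?" reduces to a simple reachability check. A tree, in which each version has a single parent, cannot record that relationship.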
But Git is also far more flexible in how you are able to use it. One of its headline features is that it is a “distributed version control system”, which means that every copy of a source tree contains a complete record of its history. While this can still be used in a client-server way like CVS, it means, for example, that you can use the same methods to share changes in any of the following ways:
- directly between collaborators without having to share them via a public server (particularly useful if you’re currently working under NDA)
- between different machines that you own (say between a workstation and a laptop)
- with a copy on a public server other than the central project’s (such as one of GitLab’s “fork” projects)
- on the same machine (for example, when different source trees target different platforms, but share some source code)
It may be stating the obvious, but because Git keeps a complete copy of the history, you can examine old commits and record new ones even when you’re offline. But even when you’re online, most Git operations are much more responsive, because there is no network lag involved.
Git has equivalents for all the operations we used to do with CVS – recording commits, combining them with changes made by other people, and then publishing them. But Git provides a toolbox of other facilities that you can make use of as your confidence grows, none of which were provided by CVS, for example:
- using the index to record only a subset of the changes you’ve made to a file into a commit
- rewriting history to make the narrative of a proposed change more coherent
- making lots of little commits and then squashing them together later
- using stashes for short-term branching
- automated bisecting of history to rapidly track down where a bug was introduced
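The last item in the list above is essentially a binary search over the history. The following Python sketch illustrates the idea only and is not how Git implements it: the `first_bad` function, the commit names and the `is_bad` test are all hypothetical. It also assumes a simple linear history with a known-good first commit and a known-bad last commit, whereas Git's own bisect command also copes with branched and merged histories.

```python
def first_bad(commits, is_bad):
    """Return (earliest bad commit, list of commits that had to be tested).

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once the bug appears it is present in every later commit.
    """
    good, bad = 0, len(commits) - 1
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(commits[mid])
        if is_bad(commits[mid]):
            bad = mid    # the bug was introduced at or before mid
        else:
            good = mid   # the bug was introduced after mid
    return commits[bad], tested


# Hypothetical linear history of 100 commits, with the bug introduced in "c37".
commits = [f"c{n:02d}" for n in range(100)]
culprit, tested = first_bad(commits, lambda c: c >= "c37")

print(culprit)      # the commit that introduced the bug
print(len(tested))  # far fewer builds than testing all 100 commits
```

Each test halves the range of suspect commits, so even a long history needs only a handful of build-and-test cycles to pinpoint the offending change.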
Git also has the concept of “submodules”, which are a fairly close conceptual match for the “Products files” that many of Acorn’s CVS wrapper scripts were there to support, and they’re available out of the box!
Best of all, Git is under active development, so new facilities are likely to be added as time goes on.
Where does GitLab come into this? Why not GitHub?
Git by itself has only basic facilities for operating as a server: to operate in a client-server manner with user accounts and access control, it needs some other software to handle the server management. For RISC OS, it’s still useful to have a central repository: it means the autobuilders and packagers know where to get their source code from, and it means there’s only one definitive place where version numbers of modules get incremented.
There are actually many programs available that provide a web-to-Git bridge, and a number of companies provide Git repository hosting as a service. One of the most fully featured and best-known such services is GitHub. But ROOL is using GitLab instead. Why?
- GitLab is comparable on a feature level with GitHub. In fact, barring one or two places where the terminology differs (notably, GitHub “pull requests” are called “merge requests” in GitLab) it should be familiar to use for anyone used to GitHub.
- GitLab is fully Open Source, which fits with the RISC OS ethos.
- GitLab can be hosted on our own server, which means we can customise it to better suit RISC OS, and better integrate it with our main website.
GitLab provides the infrastructure to make a Git server work, including multiple ways of visualising everything about the source code, and the mechanisms enabling a code review workflow for new code submissions.
Merge requests are the standard way of accepting code changes in GitLab, and they strongly encourage peer code review. Emphasis on “peer” there – this relies on developers working together to improve each other’s code, and it isn’t intended that ROOL will take on this job in its entirety.
Peer code review is a good thing, especially over the longer term. It encourages:
- better code quality – more readable and maintainable, with better documentation, making future development and bugfixing faster and easier
- the identification and remediation of side-effects that may not have occurred to the original author
- a greater sharing of knowledge and responsibility, avoiding “guruisation” of sections of the source tree
- catching bugs, security issues or licence violations before they end up being distributed to end users
As a rule of thumb, to avoid the reviewing role falling disproportionately on a few people’s shoulders, you should try to get involved in reviewing at least as many of other people’s merge requests as you raise yourself.
Because GitLab associates each commit made to the central repository with a merge request, it will in future be possible, when browsing the source tree, to discover rapidly why any given line of source was written, what problems it solved and what alternative approaches might have been considered.
How do I get started?
We’re keen to involve many more people with GitLab than were granted CVS write access. After all, having CVS write access meant having great responsibility placed on you, since you could potentially do a lot of damage if you got things wrong. The good news is that GitLab is far safer to use, as there is the merge request mechanism to act as a filter. So you’re positively encouraged to experiment with your GitLab account to gain experience and confidence.
The first step is to sign up for an account by emailing code@riscosopen.org (please let us know your real name, as we can’t accept pseudonymous contributions). We have written up a cheat sheet describing what you do next.
Planned enhancements
Currently, one of the main downsides of Git vs CVS is that there are no mature native Git clients for developers working directly on a RISC OS machine. We have launched a bounty to attempt to remedy this situation.
We have noted a number of minor niggles with GitLab (such as the unnecessary reduction of nested groups to a breadcrumb even when the web page is very wide). However, GitLab is very much under active development, which means there’s a good chance of getting this sort of thing fixed.
One major feature of GitLab that we haven’t yet configured is Continuous Integration (CI). This helps streamline the code review process by automatically building any code submissions, and where possible running automated tests on the code. This also makes binaries for each applicable target platform available for reviewers to test manually on real hardware. It’s a big technical challenge to get this working, but we’re sure you’ll agree it will be worth it in the end!