Become a Git Super User — Part 0: What is a Version Control System? The introduction
This blog is part of the ‘Become a Git Super User’ series which you can find below:
- Become a Git Super User — Part 0: What is a Version Control System? The introduction
- Become A Git Super User — Part 1: How does Git Work?
- Become a Git Super User — Part 2: Installing and Setting Up Git
Version Control Systems: Where chaos meets sanity, and collaboration finds its rhythm.
Welcome to this series, where we cover all you need to know about version control systems but specifically focus on Git and explore its features. This series is based on the popular book Pro Git by Scott Chacon (co-founder of GitHub) & Ben Straub.
In this series, we first cover what a version control system is and why it is necessary to use one. Then we start introducing what Git is, how it was born, and what makes it the most popular version control system of our time.
After the introduction, we go through the different concepts of Git, such as:
- What is a repository, and how to track your changes?
- Branching in Git
- Introduction to some powerful Git tools
- How to customise Git to your style?
- How does Git work behind the scenes?
and more …
Whether you are a beginner or an experienced developer, this series will provide the knowledge and skills you need to take your Git workflow to the next level. By the end of the series, you will have a solid understanding of how Git works, and how to use it effectively to manage your codebase.
Without further ado, let’s jump into exploring what a Version Control System (VCS) is and where Git comes from.
What is a Version Control System (VCS), and Why Do You Need One?
Developers make many changes to the codebase every day. They need a tool to track those changes and jump between versions of the code when required. They may want to develop new features in parallel without affecting the code in production. Or, if the latest release contains a bug, they may need to revert to the app’s last working version until the bug is fixed.
Imagine this scenario: Bob has just released a new version of the application that introduces some fantastic features, but a bug prevents users from logging in. The fix will take a few days, and until then users cannot access the app. Bob needs to revert to the last working version while he works on the bug fix, but how can he access the previous versions? That’s where VCSs come in.
The book defines a VCS as
a system that records changes to a file or set of files over time so that you can recall specific versions later.
In simple terms, you use a VCS to track the changes and switch to a particular version of the files when necessary. While VCSs can be used to track any files a computer can store, their primary use case is monitoring changes in the code base developers are working on, which is the main focus of this series.
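To make the definition concrete, here is a minimal sketch of the idea in Python. It is a toy, not how Git or any real VCS is implemented: the `TinyVCS` class, its method names, and the `login.py` file are all made up for illustration, and everything lives in memory. It only shows the two things the definition asks for: recording changes over time and recalling a specific version later.

```python
import copy


class TinyVCS:
    """Toy in-memory version store (hypothetical, for illustration only)."""

    def __init__(self):
        # Each entry is a (message, snapshot_of_files) pair.
        self._versions = []

    def commit(self, files, message):
        """Record a snapshot of `files` and return its version number."""
        self._versions.append((message, copy.deepcopy(files)))
        return len(self._versions) - 1

    def checkout(self, version):
        """Return a copy of the files exactly as they were at `version`."""
        _, snapshot = self._versions[version]
        return copy.deepcopy(snapshot)

    def log(self):
        """List (version, message) pairs, oldest first."""
        return [(i, msg) for i, (msg, _) in enumerate(self._versions)]


vcs = TinyVCS()
files = {"login.py": "def login(): return True"}
v0 = vcs.commit(files, "Working login")

# Bob ships new features, but the login is now broken.
files["login.py"] = "def login(): raise RuntimeError('bug')"
v1 = vcs.commit(files, "New features, broken login")

# Bob can go back to the last working version while he fixes the bug.
restored = vcs.checkout(v0)
print(restored["login.py"])
print(vcs.log())
```

In Bob’s scenario, `checkout(v0)` is the step that gets the working login back, while the broken version `v1` stays in the history so he can keep working on the fix.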
What are the different types of VCSs?
There are two main types of VCSs:
i. Centralised Version Control Systems rely on a central server that developers use to submit their changes to the codebase. These VCSs are simple to set up and let the server admins control the workflow. The main risk is that the full history of the codebase is stored only on the server. If that server goes down or becomes corrupted, developers cannot access the codebase or the app’s previous versions [1].
ii. Distributed Version Control Systems solve this problem by allowing developers to keep a full copy of the codebase on their own machines, make changes to the local copy, and push those changes to the remote server once everything is ready to publish. These VCSs eliminate the single point of failure and allow any local copy to serve as a backup of the remote codebase [2].
Now that you’re up to speed on why we use Version Control Systems and the types there are, let’s look at how they all started and came into existence.
Where did it all begin?
The first generation of VCSs
Version Control Systems (VCSs) have a fascinating history! The first VCS, the Source Code Control System (SCCS), was created at Bell Labs in 1972 and became best known as part of the UNIX operating system. At the time, it only worked with source code files.
Fast forward ten years: the Revision Control System (RCS) was developed by Walter Tichy in 1982 as an alternative to SCCS, becoming the first cross-platform version control system. RCS is a set of UNIX commands that manage revision groups, which are sets of text documents that evolved from each other through manual editing. It is helpful for any frequently revised text whose previous revisions need to be preserved. It has been used to store the source text for various documents such as drawings, layouts, specifications, and articles [3]. RCS is now maintained by the GNU Project.
The second generation of VCSs
The next wave of VCSs, known as Centralised Version Control, emerged in 1986 with the development of the Concurrent Versions System (CVS). CVS was the first VCS to have a central repository that multiple users could use. However, it was still file-focused and kept track of changes in individual files rather than entire directory trees.
CVS’s file-focused approach had some limitations, such as difficulty in tracking directory structure changes, which were later addressed by newer VCSs like Subversion and Git.
In 1995, Perforce was introduced, a centralised VCS that was widely used during the dot-com era. Google ran one of the largest known Perforce installations for many years before moving to its own in-house system, Piper.
In 2000, CollabNet started a new project called Subversion, which is now maintained by the Apache Software Foundation. It is written in C and was designed to be a more robust centralised solution than CVS. It supports non-text files and tracks directory structure changes, such as file renames and moves. You can check in an entire directory tree and check it out again.
The third generation of VCSs: Git is born
This generation of VCSs revolutionised how developers track the codebase changes and how teams collaborate on a single repository. It all started with the development of the Linux Kernel…
The Linux kernel is an open-source software project of fairly large scope. During the early years of Linux kernel maintenance (1991–2002), changes to the software were passed around as patches and archived files. In 2002, the Linux kernel project began using a proprietary Distributed Version Control System (DVCS) called BitKeeper.
In 2005, the relationship between the community that developed the Linux kernel and the commercial company that developed BitKeeper broke down, and the tool’s free-of-charge status was revoked. This prompted the Linux development community (particularly Linus Torvalds, the creator of Linux) to develop their own tool, Git, based on some of the lessons they had learned while using BitKeeper.
Some of the most essential features of this new VCS were (and still are):
- Speed: Git is designed to deliver superior performance. Centralised version control systems constantly need to communicate with a central server, so their speed depends heavily on network latency. Git operates primarily on your local machine: most operations only need files and resources on the local disk, not on a network, which makes them significantly faster. Git’s approach to data storage is also different; each commit is a ‘snapshot’ of the project’s files. Where other systems would model their data as differences between files, Git conceptually keeps the complete content of every file version, allowing it to quickly recreate any version of any file directly from its local repository.
- Simple design: At the heart of Git is a simple yet highly effective and flexible design. Its object model makes it straightforward and consistent. Each entity in Git (whether it’s a file, directory, or commit) is stored as an object with a unique ID. This ID is a hash of the object’s content, so any corruption or change to the content produces a different ID and is very unlikely to go undetected. Moreover, every operation in Git manipulates these objects, making the model consistent and easy to comprehend.
- Strong support for parallel development (thousands of branches): Git provides exceptional support for non-linear development workflows, primarily through its sophisticated yet user-friendly branching and merging system. In Git, branches are incredibly lightweight, merely pointers to a specific commit, making the creation and deletion of branches nearly instant. This allows thousands of simultaneous, diverging, or converging streams of development to occur side-by-side, fostering the ability to experiment with new ideas in separate branches without affecting the main codebase.
- Fully distributed: In contrast to centralised version control systems, Git is fully distributed. This means every developer’s copy of the project is also a fully-fledged repository with complete history and full version tracking capabilities, independent of network access or a central server. This encourages more frequent commits, improves fault tolerance, and allows for flexible workflows. Developers can work in their own space, make commits, and only merge changes back to the main codebase when ready, enabling seamless collaboration even when offline.
- Able to handle large projects like the Linux kernel efficiently (speed and data size): One of the notable features of Git is its ability to handle large projects. This efficiency is twofold: speed and data size. Git was initially developed to track the Linux kernel, a notably sizable project. Git models its data as snapshots rather than file differences and compresses the data where possible. Under the hood, it also applies optimisations such as pack files and delta compression to minimise the space taken by large projects. As a result, Git can quickly work with and traverse the history of even huge codebases.
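Two of the ideas in the list above, content-based object IDs and branches as lightweight pointers, are easy to sketch in Python. Treat this as an illustration rather than Git’s actual implementation. The `git_blob_id` function name is made up, and it assumes a repository using Git’s default SHA-1 object format, where a file (a ‘blob’) is hashed together with a small header. The `branches` dictionary and its commit IDs are placeholders, not real hashes.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID of a blob, assuming Git's default SHA-1 object format.

    The hashed bytes are a header ("blob <size>" plus a NUL byte)
    followed by the raw file content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Identical content always yields the same ID; any change yields a new one.
original = git_blob_id(b"hello world\n")
tampered = git_blob_id(b"hello world!\n")
print(original)
print(original != tampered)

# Toy model of branches: a branch is just a name that points at a commit ID.
# The commit IDs below are made-up placeholders, not real hashes.
branches = {"main": "c0ffee1"}
branches["experiment"] = branches["main"]  # creating a branch copies a pointer
branches["experiment"] = "c0ffee2"         # a new commit moves only that branch
print(branches["main"])                    # main is unaffected
del branches["experiment"]                 # deleting a branch drops the pointer
```

If you have Git installed, you can compare the first printed ID with the output of `git hash-object` on a file containing the same bytes; in a default SHA-1 repository the two should match. Because a branch is only a pointer, creating or deleting one costs almost nothing, which is why thousands of branches are practical.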
According to the Stack Overflow 2022 Developer Survey, Git is the most popular VCS, used by more than 93% of respondents to manage changes to their codebase.
Git is easy to pick up yet very powerful, and every developer should learn to use it. Therefore, we’re going to explore the world of Git in the following articles of this series. The next one is Part 1: How does Git Work?
References
[1] Patel, S. (2020, November 19). Why you should move from centralized version control to distributed version control. GitLab. https://about.gitlab.com/blog/2020/11/19/move-to-distributed-vcs/ (Accessed 21 May 2023)
[2] Lithmee. (2019, June 12). What is the difference between centralized and distributed version control? Pediaa.Com. https://pediaa.com/what-is-the-difference-between-centralized-and-distributed-version-control/
[3] Tichy, W. F. (1985). RCS: A system for version control. Software: Practice and Experience, 15(7), 637–654. https://doi.org/10.1002/spe.4380150703
See you in the next articles …