Become A Git Super User — Part 1: How does Git work?
In this series, we are going to learn more about Git and how to use this powerful tool to manage the changes in our codebase. You can find the other parts of the series below:
- Become a Git Super User — Part 0: What is a Version Control System? The introduction
- Become A Git Super User — Part 1: How does Git Work?
- Become a Git Super User — Part 2: Installing and Setting Up Git
Now it’s time to dive into Git’s building blocks and cover the fundamental concepts that will make you a Git super user.
As mentioned in Part 0 of this series, Git is the most popular Version Control System. Developers use it every day to track changes and store different versions of their code base in a distributed manner.
Git has distinct characteristics that make it stand out from other VCSs. Some of these features are:
- Speed
- Simple Design
- Strong support for parallel development
- Fully distributed architecture
- Ease of handling large and complex projects
Because of these features, Git is used almost everywhere, and many development teams interact with their code base through it. Every developer therefore benefits from understanding its fundamental concepts and becoming comfortable with the tool, which is the focus of this article. We’re going to cover the basics step by step and walk through examples to get you going.
By the end of this article, you should know:
- How does Git track file changes?
- What are the three states of files in Git?
- What are the three sections in a Git project?
Let’s start exploring the building blocks of Git.
How does Git track file changes?
In many Version Control Systems (VCSs), the system stores a base version of each file and tracks the changes made to it over time. This is known as delta-based version control (Figure 1).
While this works fine in simple scenarios, Git takes a different, more optimised approach to tracking changes.
Git looks at its data as a series of snapshots of the file system. Every time you commit, Git takes a snapshot of what all your files look like at that moment and stores a reference to that snapshot. For efficiency, Git doesn’t store unchanged files again; it simply links to the identical version it has already stored.
Therefore, we can say that Git thinks about its data as a stream of snapshots (Figure 2).
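To make the idea of "snapshots that link to unchanged files" more concrete, here is a minimal Python sketch. It is a toy model and not Git's real storage format: the SnapshotStore class and its method names are hypothetical, and hashing only the raw file content is a simplification. What it does share with Git is the principle of identifying stored content by a hash, so that identical content is kept once and reused by later snapshots.

```python
import hashlib


class SnapshotStore:
    """Toy content-addressed store (hypothetical model, not Git's real format)."""

    def __init__(self):
        self.blobs = {}      # content hash -> file content
        self.snapshots = []  # each snapshot maps file name -> content hash

    def commit(self, files):
        """Record a snapshot of `files` (name -> content) and return its index."""
        snapshot = {}
        for name, content in files.items():
            digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
            # Unchanged content hashes to the same key, so it is stored only once.
            self.blobs.setdefault(digest, content)
            snapshot[name] = digest
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1

    def checkout(self, index):
        """Rebuild the full set of files recorded in one snapshot."""
        return {name: self.blobs[d] for name, d in self.snapshots[index].items()}


store = SnapshotStore()
v1 = store.commit({"a.txt": "A", "b.txt": "B"})
v2 = store.commit({"a.txt": "A1", "b.txt": "B"})  # only a.txt changed
```

After these two commits the store holds three pieces of content ("A", "A1" and "B"), and both snapshots point at the same stored copy of b.txt.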
What are the benefits of the snapshot based approach?
The snapshot approach is like taking a photo of the entire file system at each step. This means that you can easily go back in time and see how the file system looked at any point. You can compare the snapshot-based approach (Figure 2) with the delta-based approach (Figure 1) to understand the difference.
Accessing version 4 of File A:
Delta: File A + Δ1 + Δ2
Snapshot: A2
Accessing the latest version of all the files:
Delta: (File A + Δ1 + Δ2), (File B + Δ1 + Δ2), (File C + Δ1 + Δ2 + Δ3)
Snapshot: A2, B2, C3
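The comparison above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a "delta" here is just a list of lines appended to the file, and the function names and sample contents are made up for the example. Real delta formats are more involved, but the access pattern is the same: deltas must be replayed in order, while a snapshot is read in one step.

```python
def rebuild_from_deltas(base, deltas):
    """Delta-based access: start from the base and replay every delta in order."""
    lines = list(base)
    for delta in deltas:
        lines.extend(delta)  # in this toy model a delta only appends lines
    return lines


def read_snapshot(snapshots, label):
    """Snapshot-based access: a single lookup, nothing to replay."""
    return snapshots[label]


# File A in the delta model: a base version plus two deltas.
base_a = ["first line"]
deltas_a = [["added by delta 1"], ["added by delta 2"]]

# File A in the snapshot model: every stored version is complete.
snapshots_a = {
    "A": ["first line"],
    "A1": ["first line", "added by delta 1"],
    "A2": ["first line", "added by delta 1", "added by delta 2"],
}

from_deltas = rebuild_from_deltas(base_a, deltas_a)
from_snapshot = read_snapshot(snapshots_a, "A2")
```

Both paths produce the same content; the difference is how much work is needed to get there as the history grows.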
Using snapshots to access a given version of the file system is more intuitive and takes fewer steps than using deltas: each version is read directly instead of being rebuilt by replaying a chain of changes.
Snapshots offer a comprehensive view of the file system’s evolution over time. Each snapshot represents the complete state of the project at a specific moment, making it easier to track and revert changes compared to the delta approach.
The snapshot approach is also more straightforward, resulting in additional benefits such as increased efficiency in branching and merging, as well as improved handling of numerous changes in complex projects. We will delve deeper into these advantages later in our series.
What are the three states of files in Git?
Whenever you make a change to a file in Git, it will be in one of the following states:
- Modified: This state indicates that the file has been changed but the change has not been registered (committed) in the Git database.
- Staged: The modified file has been marked, in its current version, to go into the next commit snapshot in the Git database.
- Committed: This means the changes have been safely stored in the local Git database.
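One way to think about the three states is as a comparison between three copies of the same file: the one you are editing, the one marked for the next commit, and the one in the last commit. The Python sketch below follows that idea. The file_state function is hypothetical and deliberately simplified: it returns a single state per file, whereas in real Git a file can have some changes staged and other changes still unstaged at the same time.

```python
def file_state(working, staged, committed):
    """Classify one tracked file by comparing its three copies.

    working   -- content currently on disk
    staged    -- content marked for the next commit
    committed -- content stored in the last commit
    """
    if working != staged:
        return "modified"   # edited on disk, edit not yet staged
    if staged != committed:
        return "staged"     # marked for the next commit, not yet committed
    return "committed"      # all three copies agree


after_edit = file_state(working="v2", staged="v1", committed="v1")
after_stage = file_state(working="v2", staged="v2", committed="v1")
after_commit = file_state(working="v2", staged="v2", committed="v2")
```

Read top to bottom, the three calls trace one change from being edited, to being staged, to being committed.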
What are the three sections in a Git project?
Now that you know the three states a changed file can be in, it’s important to learn about the three sections a Git project has. These sections are:
- Working Directory/Tree: The working tree is a checkout of one version (snapshot) of the project. The files are pulled out of the database and placed on disk for you to use and modify. In Figure 2, if you check out version 4, your working tree would contain A2, B1 and C2.
- Staging Area: This section holds the information about what will go into your next commit. The technical term for this section is the “index”.
- Git Directory (Repository): This section contains all the metadata and the database of your Git project. It stores the complete history of the Git project and it’s what gets copied when you clone (copy) a Git repository.
The basic workflow moves files through these sections as follows:
- You modify files in the working tree (Working Tree)
- You mark/stage the files you want to include in the next commit (Staging Area)
- You commit the changes, which takes the files as they are in the staging area, creates a new snapshot and stores it permanently (Git Directory/Repository)
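The three steps above can be sketched as a small Python model. The ToyRepo class and its methods are hypothetical names chosen for this illustration; they loosely correspond to editing a file, running git add, and running git commit, but they do not reproduce how Git is actually implemented.

```python
class ToyRepo:
    """Toy model of the three sections of a Git project (illustration only)."""

    def __init__(self):
        self.working_tree = {}  # files on disk that you edit
        self.index = {}         # staging area: what goes into the next commit
        self.commits = []       # repository: list of (message, snapshot) pairs

    def modify(self, name, content):
        """Step 1: change a file in the working tree."""
        self.working_tree[name] = content

    def stage(self, name):
        """Step 2: copy the current version of a file into the staging area."""
        self.index[name] = self.working_tree[name]

    def commit(self, message):
        """Step 3: store the staged files as a new snapshot."""
        snapshot = dict(self.index)
        self.commits.append((message, snapshot))
        return snapshot


repo = ToyRepo()
repo.modify("a.txt", "hello")
repo.modify("b.txt", "draft")
repo.stage("a.txt")                  # only a.txt is staged
snapshot = repo.commit("Add a.txt")
```

Because only a.txt was staged, the new snapshot contains a.txt alone, while b.txt stays in the working tree as an uncommitted change. This is the main reason the staging area exists: it lets you choose exactly which changes go into each commit.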
Conclusion
Now that you know how Git tracks changes using snapshots and how it moves files between the different states and sections, it’s time to install and set up Git on your machine, which is what we cover in the next article, Part 2: Installing and Setting Up Git.
Thank you for reading this article. I appreciate your time and would love to hear your feedback. Please share your thoughts and opinions in the comment section below.
You can connect with me on LinkedIn, follow my projects on GitHub for more updates, or reach out to me on Twitter for ongoing discussions.
See you in the next articles …