How to become a Git Dumbledore…! (The Secrets of the .git Folder)

AJAY NEGI
9 min read · Nov 3, 2020
Dumbledore committing his magic spells on Git.

What is this blog about?

This blog is for both types of people:

  1. Those who know how to use Git.
  2. Those who don’t know how to use Git.

If you are the first kind, then you are probably just typing those git commands in your terminal to add, commit and alter your project data without being sure how these commands really work internally. I guess that is the case, right? Or else why would you have opened this blog, right? 😜

And if you are the second kind and just want to know how git works before learning the commands, then you are at the right place, my friend 😇.

This blog will help you understand how git works under the hood.

What is Git — A Brief Introduction:

Git is version control software. “OK, but what is version control then?” Well, that is fairly self-explanatory: version control software helps you keep track of the different versions of your project or files. Quite simple, right?

So, without wasting any time, let’s dive into how Git works behind the scenes.

Git — Under the hood.

How does Git keep track of your project?

Probably the first question that will come to your mind is: how does Git keep track of the files that we are saving on our local device? We will get back to the answer, but before that we have to make a git repository by using the command “git init”.

Initialized Git repository using “git init” command.

Notice the output in the above image when we run the command “git init”: “Initialized empty Git repository in /home/ajay/Learning_Git/.git/”. It has made an empty git repository (empty because we don’t have any files in that folder yet), but the question was “how does it track our progress in the project?”. Well, did you notice that it made a new folder named “.git” inside the “Learning_Git” folder? Yes, you have guessed it right: this is the folder that connects git to your project. Also notice the “.” before the folder name, which means it is a hidden folder on macOS and Linux, so you will not be able to see it unless you view hidden folders.

Three States of Git(Just a quick refresher)

Before we dive deep into how git commands work internally, let’s take a look at the three states of git. Git has three main states that your files can reside in: modified, staged, and committed:

  • Modified means that you have changed the file but have not committed it to your database yet.
  • Staged means that you have marked a modified file in its current version to go into your next commit snapshot.
  • Committed means that the data is safely stored in your local database.

This leads us to the three main sections of a Git project: the working tree (or working directory), the staging area, and the Git directory.

Three sections of git:- working directory, Staging Area and Repository.

The working tree is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.

The staging area is a file, generally contained in your Git directory, that stores information about what will go into your next commit. Its technical name in Git parlance is the “index”, but the phrase “staging area” works just as well.

The Git directory is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.

The basic Git workflow goes something like this:

  1. You modify files in your working tree.
  2. You selectively stage just those changes you want to be part of your next commit, which adds only those changes to the staging area.
  3. You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.

If a particular version of a file is in the Git directory, it’s considered committed. If it has been modified and was added to the staging area, it is staged. And if it was changed since it was checked out but has not been staged, it is modified.
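One way to picture the three states is to compare three copies of a file's content: what is on disk, what is in the staging area, and what was last committed. The sketch below is only a toy model under that assumption, not how Git itself is implemented; the function name `file_state` and the use of plain byte strings are made up for illustration.

```python
from typing import Optional

def file_state(working: bytes, staged: Optional[bytes], committed: Optional[bytes]) -> str:
    """Classify a file by comparing its content in the three sections.

    working:   content currently in the working tree
    staged:    content recorded in the staging area (None if never added)
    committed: content in the last commit (None if never committed)
    """
    if staged is None:
        return "untracked"   # Git has never been told about this file
    if working != staged:
        return "modified"    # changed on disk, change not staged yet
    if staged != committed:
        return "staged"      # marked to go into the next commit
    return "committed"       # safely stored, nothing pending

print(file_state(b"v1", None, None))    # untracked
print(file_state(b"v1", b"v1", None))   # staged
print(file_state(b"v1", b"v1", b"v1"))  # committed
print(file_state(b"v2", b"v1", b"v1"))  # modified
```

Real Git can report a file as both staged and modified at the same time; this sketch returns a single label to keep the idea simple.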

I found this video on Udacity, which will help you understand the concept a little better. You can also join their course on Git to learn it thoroughly.

Typical workflow when working with version control.

The Mystery of the contents inside the .git folder:

When you run git init in a new or existing directory, Git creates the .git directory, which is where almost everything that Git stores and manipulates is located. If you want to back up or clone your repository, copying this single directory elsewhere gives you nearly everything you need. So, let’s reveal the mystery of this .git folder by using the tree .git command.

The output of "tree .git" showing the contents of .git folder.

Let’s have a brief overview of what these files and folders are for: the description file is used only by the GitWeb program, so don’t worry about it. The config file contains your project-specific configuration options, and the info directory keeps a global exclude file for ignored patterns that you don’t want to track in a .gitignore file. The hooks directory contains your client- or server-side hook scripts.

This leaves four important entries: the HEAD and (yet to be created) index files, and the objects and refs directories. These are the core parts of Git. The objects directory stores all the content for your database, the refs directory stores pointers into commit objects in that data (branches, tags, remotes and more), the HEAD file points to the branch you currently have checked out, and the index file is where Git stores your staging area information. Now we will only talk about some of its contents which are really important to understand.

Now, let’s make a file to add and commit, so that our investigation can go forward, by using the command touch Readme.md.

Created an empty Readme file using touch command.

Remember that we have only created this file, and it is untracked by git because we have not added and committed it yet. You don’t believe me? OK, then let’s confirm it using the command git status.

Git status showing that the Readme file is untracked by it.

The terminal is prompting us to add the file, so why keep it waiting? Let’s add the file using git add Readme.md. To show you how this affects our .git folder, I am going to open a new terminal beside this one and type the command watch -n .5 tree .git. So focus on the right terminal and see how its content changes when we add the Readme.md file to the staging area.

So what happened? Well, when we used the git add command, it added a new folder e6 inside the objects folder and a file inside it whose name I am not going to spell out here 💁. Actually, it was nothing complex: Git computed a SHA (a hash that uniquely identifies the content) for the file when you added it, and when I say added, I mean Git copied the content of the file and saved it in that objects folder. The first two characters of the SHA are the folder name (e6) and the rest are the file name (9de29bb…..).
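If you are curious where that name comes from, here is a small Python sketch of the idea. It assumes a repository that uses SHA-1 object names (the default), and the helper names `blob_sha` and `object_path` are mine, not part of Git. Git hashes a short header followed by the file content, then uses the first two characters of the result as the folder name.

```python
import hashlib

def blob_sha(content: bytes) -> str:
    """Return the SHA-1 name Git gives to a blob with this content."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) followed by the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def object_path(sha: str) -> str:
    """First two characters become the folder name, the rest the file name."""
    return f".git/objects/{sha[:2]}/{sha[2:]}"

sha = blob_sha(b"")   # our Readme.md is empty, so the content is zero bytes
print(sha[:2])        # e6
print(sha[2:9])       # 9de29bb
print(object_path(sha))
```

Because the name depends only on the content, every empty file in every repository gets this same object name.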

Types of Object that get stored:

  1. BLOB
  2. Tree
  3. Commit

BLOB

When we added the Readme file, what happened? Conceptually, Git took the Readme file and put it in the staging area. What it actually did was copy the contents of the file into a file in the objects directory, which represents a BLOB (Binary Large Object), basically just a collection of data. Blobs don’t have names; they are just the raw data. Git ran this raw data through the SHA-1 hashing algorithm, which always gives a 40-character output: the SHA that I mentioned above. What this all tells us is that every time you run git add, you are actually putting the file’s content in your objects directory.
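To see why blobs have no names, here is a rough sketch that stores content the way Git stores loose objects: a header plus the data, compressed with zlib, saved under a path derived from the hash. The functions `store_blob` and `read_blob`, the temporary folder, and the file names in the comments are assumptions for the demo, and this is not Git's own code.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def store_blob(objects_dir: Path, content: bytes) -> str:
    """Write content as a blob under objects_dir and return its SHA-1."""
    data = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    sha = hashlib.sha1(data).hexdigest()
    path = objects_dir / sha[:2] / sha[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(data))  # loose objects are zlib-compressed
    return sha

def read_blob(objects_dir: Path, sha: str) -> bytes:
    """Read a blob back and return only its raw content."""
    data = zlib.decompress((objects_dir / sha[:2] / sha[2:]).read_bytes())
    _header, _, content = data.partition(b"\0")  # drop the "blob <size>" header
    return content

with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"
    # Two hypothetical files with different names but identical content...
    sha_a = store_blob(objects, b"hello\n")   # imagine this is a.txt
    sha_b = store_blob(objects, b"hello\n")   # imagine this is b.txt
    same_object = sha_a == sha_b              # ...map to one and the same blob
    object_count = sum(1 for p in objects.rglob("*") if p.is_file())
    round_trip = read_blob(objects, sha_a)

print(same_object, object_count, round_trip)
```

The file name never enters the hash, which is why the name has to be recorded somewhere else.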

Commit and Tree

But what is important with git is that we care about snapshots of the entire project; we don’t care about individual files. So, the next step, which you are probably familiar with, is git commit, which lets you create a snapshot of your entire project as it was staged.

So, let’s go ahead and commit the file. (Don’t forget to check how the output of the right terminal changes.)

Notice that now there are two more folders there, “1c” and “b8”. Let’s see what kind of objects they are by using the command “git show”.

So git tells us that it is a commit object, along with other useful information such as who the author is, the date it was created and the message we provided.

So, now let’s check out what type of object the other file is.

Git tells us that it is a tree object, and it also lists the file we committed. As of now we have committed only one file; if we commit more files, their names will also be added here. Go on, give it a try.
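A tree is where the file names live. The sketch below builds the body of a tree object for regular files and hashes it, assuming SHA-1 object names and no sub-folders; `hash_object`, `tree_body` and the second file name `notes.txt` are hypothetical names for illustration.

```python
import hashlib
from typing import Dict

def hash_object(kind: bytes, body: bytes) -> str:
    """SHA-1 of a Git object: '<kind> <size>', a NUL byte, then the body."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries: Dict[str, str]) -> bytes:
    """Build a tree body from {file name: blob SHA}, regular files only."""
    body = b""
    for name in sorted(entries):
        # Each entry: mode, space, file name, NUL byte, then the 20-byte binary SHA.
        body += b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(entries[name])
    return body

readme_sha = hash_object(b"blob", b"")   # the empty Readme.md
one_file = hash_object(b"tree", tree_body({"Readme.md": readme_sha}))
# Adding a second (hypothetical) file changes the tree, so its SHA changes too.
two_files = hash_object(b"tree", tree_body({"Readme.md": readme_sha,
                                            "notes.txt": readme_sha}))
print(one_file != two_files)  # True
```

Notice that the tree stores the name and the blob's SHA side by side, so two files with the same content can share one blob.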

Master: The default branch

When you create a git repository you get one default branch. It is called the master branch by default, and there is nothing special about it. A branch like master is just a pointer, or reference, to a commit in your repository. Think of it as a tool for navigating around your repository history. Let’s have a look at it.

So, the master branch is located inside the “heads” folder, which is inside the “refs” folder, and when we looked at the content of “master”, it showed a SHA. If you look at the SHA closely, you will recognize that it is the same SHA as that of the commit. What does this mean? It simply means that master is a bookmark that points to a commit; as we only have one commit so far, it is pointing to that commit.
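Here is a minimal sketch of how that bookmark can be followed, using a fake .git folder in a temporary directory. It assumes the branch is stored as a plain file under refs/heads (Git can also pack refs into a single file, which this ignores), and `resolve_head` and the all-zero SHA are placeholders of mine.

```python
import tempfile
from pathlib import Path

def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to the commit SHA it ultimately points at."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # HEAD names a branch; the branch file holds the commit SHA.
        return (git_dir / head[len("ref: "):]).read_text().strip()
    return head  # detached HEAD: the file holds a SHA directly

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    fake_sha = "0" * 40  # placeholder, not a real commit
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
    (git_dir / "refs" / "heads" / "master").write_text(fake_sha + "\n")
    resolved = resolve_head(git_dir)

print(resolved == fake_sha)  # True
```

Moving a branch to another commit is therefore nothing more than writing a different SHA into that small file.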

Parent-Child commit relationship

Let’s see how git knows the parent of a child commit, or of any commit that you made. To do that, let’s create a new file “DontReadme”, stage it and then finally commit it.

Now, to see the parent-child relationship, I am going to use the command “git show --pretty=raw master”.

The tree line now references the snapshot that belongs to the new commit we just made. Also notice the parent reference: isn’t it the same SHA as that of our earlier commit? So when we create a commit, it references a tree (which in turn records all the blobs and sub-trees), and it also includes metadata about who authored the commit, when it was authored, the commit message, and a reference to its parent (the commit that came before it).
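The parent link is what turns separate snapshots into a history. The sketch below builds two commit objects in memory and walks back from the newest one. Everything specific in it (the author identity, timestamp, tree SHAs, commit messages, and the helper names `make_commit` and `parent_of`) is a placeholder I made up, and it assumes each commit has at most one parent.

```python
import hashlib
from typing import Dict, Optional

def make_commit(store: Dict[str, bytes], tree: str,
                parent: Optional[str], message: str) -> str:
    """Build a commit object, save it in the in-memory store, return its SHA-1."""
    who = "A U Thor <author@example.com> 1604361600 +0000"  # placeholder identity
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")  # the very first commit has no parent line
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    body = "\n".join(lines).encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    sha = hashlib.sha1(header + body).hexdigest()
    store[sha] = body
    return sha

def parent_of(store: Dict[str, bytes], sha: str) -> Optional[str]:
    """Return the parent SHA recorded in a commit, or None for a root commit."""
    for line in store[sha].decode("utf-8").split("\n"):
        if line == "":
            break  # a blank line ends the header; the message follows
        if line.startswith("parent "):
            return line[len("parent "):]
    return None

store: Dict[str, bytes] = {}
first = make_commit(store, "1" * 40, None, "Add Readme")        # fake tree SHA
second = make_commit(store, "2" * 40, first, "Add DontReadme")  # fake tree SHA

# Walk from the newest commit back to the root, like following master backwards.
chain = []
current: Optional[str] = second
while current is not None:
    chain.append(current)
    current = parent_of(store, current)

print(chain == [second, first])  # True
```

Since the parent's SHA is part of the data being hashed, changing an old commit would change the SHA of every commit that comes after it.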

Conclusion

Finally! We are at the end of the blog. That was a lot to take in, but you did a great job making it to the end. So, basically, what we learnt is what is inside the .git directory, mainly the “objects” directory. Then we learnt how commands like “git add” and “git commit” affect the .git folder, and finally we learnt how git tracks the relationships between commits. I hope you now have a better understanding of git internals than you had before reading this blog. Let’s catch up on my next blog, till then 👇

AJAY NEGI

Software Engineer Trainee at Mount blue Technologies.