Compared to subversion, what is different in Git?
Everything. Although some commands sound similar, the syntax is similar and some concepts are shared, the fundamentals of git are completely different. The best way to understand git is to forget everything you know about svn and start again!
In this article we’ll examine some of the internals of git and how you interact with them. We’ll compare the various git concepts with similar concepts in subversion.
Svn is centralised, git is distributed. But what does that mean? It is quite a complicated concept but essentially it means that git does not need a central repository. Everyone has a copy of the whole repository on their machine: all code, all history, every commit, everything. You do not need access to a monolithic central server to do work.
So how do you know what the single version of the truth is? Well in practice you can nominate the repo on a certain machine as authoritative. This repo could be on a machine under your desk, it could be the lead developer’s machine, it could be Linus Torvalds’ machine, it could be GitHub. For us it is GitHub. GitHub acts as our single version of the truth but in terms of git architecture it is no more ‘special’ than the repo on my machine.
In svn you would ‘checkout’ a repo and create your own working copy on your machine. You would then change some code and ‘commit’ it back into the repo. The same ideas apply in git, except that the repo you commit to is local to you, so a commit does not affect the central server. Getting the repo from the central server, or moving changes to and from it, takes an additional step:
- To create your own copy of a repo you clone it
- To fetch changes from a remote repo into your own repo you pull
- To place your local changes into the remote repo you push
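To make these three operations a bit more concrete, here is a minimal Python sketch of the idea rather than of git itself. The `Repo` class, its method names and the commit labels are all made up for illustration, and it assumes a simple linear history with no branches or conflicts.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy repository: a name plus an ordered list of commit labels."""
    name: str
    commits: list = field(default_factory=list)

    def commit(self, label):
        # Committing only touches this repo; no other copy is contacted.
        self.commits.append(label)

    def clone(self, name):
        # A clone receives the whole history, not just the latest files.
        return Repo(name, list(self.commits))

    def pull(self, remote):
        # Bring over any commits the remote has that we lack.
        for label in remote.commits:
            if label not in self.commits:
                self.commits.append(label)

    def push(self, remote):
        # Publishing is the mirror image of pulling (toy simplification).
        remote.pull(self)


hub = Repo("hub", ["c1", "c2"])   # the repo we nominate as authoritative
mine = hub.clone("mine")
yours = hub.clone("yours")

mine.commit("c3")                 # recorded locally only
print(hub.commits)                # hub has not seen c3 yet

mine.push(hub)                    # publish c3
yours.pull(hub)                   # a colleague picks it up
print(yours.commits)
```

Notice that `hub` is just another `Repo`. Nothing in the code makes it special; we simply agree to push to it and pull from it.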
Being aware of these differences is important as they fundamentally affect how you interact with repos.
Another thing that git does differently is the way it tracks changes. Subversion tracks individual files and for each file in a changeset it will store a diff for that file. You can then recover the file at a given revision by replaying diffs on top of each other.
Git tracks file contents, not the files themselves (the file and directory structure is stored separately). Conceptually it does not store diffs; it calculates them from whole files when you ask for them. It also does away with the linear history that svn gives you and drops the sequential revision number.
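A quick way to see what “tracking contents” means is to compute the ID that git gives to a piece of file content. The sketch below assumes git’s default SHA-1 object format, where the ID is the hash of a short header followed by the bytes of the file; the `blob_id` helper and the file paths are names chosen for this example only.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the ID git assigns to file contents (SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical working tree: two differently named files, identical contents.
files = {
    "test.txt": b"hello world\n",
    "copy/of/test.txt": b"hello world\n",
}

ids = {path: blob_id(data) for path, data in files.items()}
for path, object_id in ids.items():
    # Both paths print the same ID: the name plays no part in it.
    print(object_id, path)
```

Because the path is not part of the hash, identical contents only need to be stored once, however many files or commits refer to them.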
So how does it implement all this then? Well without getting too much into the internals, most things revolve around a commit object. A commit object is very simple; it records the person that made the change, the person that committed it (these can be different, e.g. when one person applies a patch written by someone else), and a reference to a snapshot of the file contents at that point. Each commit is identified by a unique hash (analogous to an svn changeset id, except that the hashes aren’t sequential). Crucially a commit also stores the hash ID of its parent commit. By tracing the ancestors back you can observe the history of the repository. Because each commit refers to a complete snapshot, comparing a commit with its parent shows you which files changed.
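The parent link is easier to picture with a small model. What follows is a toy Python sketch, not git’s real object format: the `Commit` class, the `store` dictionary, the `save` and `history` helpers and the names alice and bob are all invented for illustration, and the ID is simply a SHA-1 over a made-up text serialisation.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    author: str
    committer: str
    message: str
    content_id: str          # stands in for the recorded file contents
    parent: Optional[str]    # ID of the parent commit, None for the first

    def commit_id(self) -> str:
        # Toy serialisation; real git uses its own object format.
        text = "\n".join([
            self.author, self.committer, self.message,
            self.content_id, self.parent or "",
        ])
        return hashlib.sha1(text.encode("utf-8")).hexdigest()


store = {}  # commit ID -> Commit, a stand-in for git's object database


def save(commit: Commit) -> str:
    cid = commit.commit_id()
    store[cid] = commit
    return cid


def history(cid: Optional[str]):
    """Yield commits from `cid` back to the root by following parent IDs."""
    while cid is not None:
        commit = store[cid]
        yield cid, commit
        cid = commit.parent


first = save(Commit("alice", "alice", "add test.txt", "contents-1", None))
second = save(Commit("alice", "bob", "add second line", "contents-2", first))

for cid, commit in history(second):
    print(cid[:7], commit.message)
```

Since the parent’s ID is part of what gets hashed, a commit’s ID depends on everything that came before it, which is why the IDs cannot be simple sequential numbers.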
If you think of a commit as similar to an svn changeset then you’re not far wrong in terms of the role they play. Imagine a simple repo where we create a file called test.txt and add a “hello world” line to it. We then commit this to the repo. We then add a second line to the file that says “I’m ugly” and then commit this as well. We end up with a repo with two commits:
123abc (“hello world”) <- 456def (“I’m ugly”)
As you can see the second commit (456def) is pointing at its parent commit (123abc). What git allows you to do now is to inspect the state of the repo at a given commit. If you inspect the repo at the latest commit (456def) the test.txt file will contain both lines of text. If you inspect the repo at the previous commit then you will only see one line in the file.
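The same two commits can be modelled in a few lines of Python. This is only a sketch under some assumptions: the `commits` dictionary and the `show` and `diff` helpers are invented for this example, and keeping a complete copy of the file against each commit is a simplification chosen to keep the code short.

```python
import difflib

# Each commit ID maps to (parent ID, full snapshot of path -> contents).
commits = {
    "123abc": (None, {"test.txt": "hello world\n"}),
    "456def": ("123abc", {"test.txt": "hello world\nI'm ugly\n"}),
}


def show(commit_id, path):
    """Return the contents of `path` as recorded at `commit_id`."""
    _parent, snapshot = commits[commit_id]
    return snapshot[path]


def diff(commit_id, path):
    """Compute a diff against the parent on demand from the two whole files."""
    parent, _snapshot = commits[commit_id]
    old = show(parent, path).splitlines(keepends=True) if parent else []
    new = show(commit_id, path).splitlines(keepends=True)
    return list(difflib.unified_diff(old, new, "a/" + path, "b/" + path))


print(show("123abc", "test.txt"), end="")          # one line
print(show("456def", "test.txt"), end="")          # both lines
print("".join(diff("456def", "test.txt")), end="")  # derived, never stored
```

Nothing resembling a diff is kept in `commits`. The difference between the two versions is worked out only when we ask for it.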
If you get this concept then you’re most of the way to understanding git! 🙂