HomePublicationsProjectsBlogAbout

Git 101

November 19, 2019
Table of Content

What is it?

Git is a version control system. It provides efficient ways/workflows to manage versions of documents, especially when there are many contributors. Documents here are usually pieces of source code. It is one of recommended tools that one should use when coding.

To illustrate, consider a situation in which you are writing a program to analyze a dataset (Version 1). After you finish, you realise that some parameters of the program are not correct. So, you correct them (Version 2). Later, your supervisor checks your analysis and confirms you that the parameters you had in the beginning were actually correct. So, you update the parameters accordingly (Version 3).

From the situation above, we have already at least three versions of the program. Naively, we can have a separate file or source code when we change something each time. This is not efficient, and we will end up with a numerous of (useless) files. Moreover, if we do not have such files and assume that there is no difference between Version 1 and Version 3, a question or challenge arises: how do we get Version 1 from Version 2 reliably?

In short, Git is a tool that allows you to systematically maintain/manage differences in your documents or source code. In Git, your documents are typically kept in a repository (i.e. project). Every change you make will span your repository's history when you commit it, adding a unique hash to your repository's history tree. These hashes represent your repository at that certain moment. Your commit will be kept local. If you want other to see or use your commit, you have to push it to a remote server, usually set when the project is created.

Set up

Installation

Unix-based OS often comes with Git. For Windows users, please consult https://git-scm.com/download/win.

SSH Connection

Every Git repository should have a remote server, which hosts your repository's history and allows collaboration. Preferably, Git uses SSH connection between your Git client and such a remote host. In short, SSH connection is a secure communication protocol between two nodes using asymmetric encryption (i.e. a private-public key pair): you encrypt messages with your private key and only nodes that have your public key can decrypt the messages.

Please check out Github's article how to set this SSH Connection up.

Basic Usage

Init - Status - Add - Commit

Let start by creating a directory "my-git-playground" and a file README.md with the content below:

This is my first time using Git. 

README.md is a Markdown file that usually contains the details of for your repository, such as setup guideline or license.

From my experience, using Git command lines are more intuitive. So, let use Terminal or Git Bash if you are using Windows.

# change directory to your "my-git-playground" directory
cd <path-to>/my-git-playground

# initialize Git in this directory
git init

Once you run git init, Git will create a hidden directory .git that keeps all your commit history (i.e. your local repository). You can verify whether you have the directory by running ls -a. The output should be similar to below:

If you run git status, you will see that Git has not tracked README.md yet.

    # output: git status
    On branch master
    
    Initial commit
    
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
    
        README.md
    
    nothing added to commit but untracked files present (use "git add" to track)

Let track README.md by running git add README.md . The command tells Git that this is the change that will be committed: Git will create a temporary pointer HEAD for that. Now, you can run git commit -m "1st commit" to create a commit in your commit history. Running git log will show your repository's commit history.

The commit that you have just made is only in your local repository. Generally, you will also have a remote one that allow you to share or backup your changes. Let use Github for this purpose: You can create a new repository at https://github.com/new.

Once the repository has been created, you will have a Git URL for that repository:

git@github.com:<username>/my-git-playground.git

Let link this remote repository to your local repository using

git remote add origin git@github.com:<username>/my-git-playground.git

Push - Remote Repository

You can run git push -u origin master to push the change to the remote repository. In this command,

  • origin is the name of the remote repository that you want to use. In some cases, you might have several remote repositories.
  • master is the branch name. Until now, we only use master

If you access https://github.com/<username>/my-git-playground, you can see README.md there.

Diff - Checkout

Let make some change to README.md, for example adding My second commit to the file. When running git status , Git tells you that your local repository has some changes

    On branch master
    Your branch is up-to-date with 'origin/master'.
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git checkout -- <file>..." to discard changes in working directory)
    
        modified:   README.md
    
    no changes added to commit (use "git add" and/or "git commit -a")

You can use git diff <filename> to see what changes has been made to the file.

    On branch master
    Your branch is up-to-date with 'origin/master'.
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git checkout -- <file>..." to discard changes in working directory)
    
        modified:   README.md
    
    no changes added to commit (use "git add" and/or "git commit -a")

Assume that this is the change that you want. You can stage it by using git add and git commit. If not, you can use git checkout <filename> to discard the change.

⚠️Please be conscious when using git checkout <filename>, otherwise you might lose those changes.

Running git push again will push this new commit to the remote repository. This second time and later the option -u origin master is not necessary because your Git client knows that the master branch is binded with origin's master from your first push.

Fetch - Rebase - Stash

Let go to https://github.com/<username>/my-git-playground/blob/master/README.md and edit the file. This is just a simulation of a case when someone has made changes to the remote repository (assuming on master). When you run git fetch, you see that your master branch is outdated (i.e. it is behind the master branch of the remote repository). In this situation, Git prevents you create a new commit, keeping the commit history linear.

To update your local repository's HEAD, you have to run git rebase . If your working directory is not clean, i.e. there are some modifications, Git will not allow you to rebase. To make it the directory clean, you can use git stash before applying git rebase. Once the rebase is done, you call git stash pop to bring those modifications back to the working directory.

Alternatively, you can use git pull for fetching new commits and rebasing instead running the two commands separately.

Summary: Typical Git Workflow