What is Git?: An Overview

Read Time ~ 6 Minutes

After the last “explainer” article, What is Data Science: An Overview, I thought I could explore some other topics within the realm of data and AI, and the first one I wanted to cover was Git. This will be a high-level overview. It won’t go into specifics on exactly how to use Git via commands, but rather the general idea of what Git is and what it can do. If there is any interest, I wouldn’t mind writing a future article that goes over some of the more technical aspects of Git, so just let me know!

Remember back in school when you had to write a report for class and ended up with a folder full of copies of that report, each version saved under a different file name?

This is sort of what Git does, albeit in a more sophisticated manner.

So, if you screwed up your report, you could go back to a previous version that you saved with a different file name. Git allows you to do the same thing, but it also allows you to do a lot more! Let’s take a look, but before we do, I need to dispel a common misconception.

Git is not the same as GitHub!

Git is the actual program that you use, and GitHub is a website that hosts Git repositories. Git can be used offline and locally on your own computer. GitHub just extends that functionality to make it easier to use Git with other people. Also, GitHub is not the only website that hosts Git repositories. There are plenty of other websites that offer similar services; GitHub just tends to be the most used.

Now that we got that out of the way, let’s take a look at Git!

Version Control

At its core, Git is a version control system that allows you to keep track of changes within a directory that you designate. Basically, it allows you to create a snapshot of your project at any given point in time and revert to one of those snapshots if anything blows up.

To see how Git’s version control is used, let’s take a look at a general Git workflow. Don’t worry if anything seems unfamiliar or confusing. I will go over everything in more detail afterwards:

  1. Set up a local Git repository for your project.

  2. Inside of the repository, do work! Git will detect any changes you make to the files in it.

  3. Once you are happy with the work that you have done you will “add” your changes to a “staging area”.

  4. After you are done working, you will “commit” your changes.

  5. Repeat steps 2-4.

Now, let’s break down these steps in a bit more detail.

Setting up a local Git repository is just that. We need to tell Git what contents to keep track of and the way we do that is by specifying a folder and saying “Hey Git, keep track of everything in here”.

Once you have set up your repository, you put any files you want to track in there and go about working on them like you usually would. So go code, create things, change things, whatever you like. Git will keep track of the changes.

Now comes the “saving” portion of Git, although it’s less like saving a file and more like “committing to your changes”.

The first step is adding the files that you have changed, and want to commit later on, to a staging area. You can add multiple files to your staging area, and you can add the same file more than once if you have made multiple changes. For instance, let’s say I made a change to a file, was happy with it, and added it to the staging area. Right when I did, my boss decides to have me go in a different direction and I change that same file again. I can add the file to the staging area again, and the newly staged version will replace the one I staged earlier, so the upcoming commit picks up my latest changes.
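To make the boss example concrete, here is a minimal Python sketch of a staging area. This is a toy model I made up for illustration, not how Git is actually implemented: the names `working_dir`, `staging_area`, and `add` are hypothetical, and files are just dictionary entries.

```python
# Toy model of a staging area: NOT how Git is implemented, just the idea.
working_dir = {"report.txt": "first draft"}   # files as they currently exist
staging_area = {}                             # what the next commit would contain


def add(path):
    """Copy the file's current content into the staging area."""
    staging_area[path] = working_dir[path]


add("report.txt")                             # stage the first version
working_dir["report.txt"] = "new direction"   # the boss changes the plan
add("report.txt")                             # stage the same file again

print(staging_area)  # {'report.txt': 'new direction'}
```

Adding the same file a second time simply replaces what was staged before, so only the latest staged version would end up in the commit.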

Once you are completely happy with all the changes you have made and have added the files you want to “save” to the staging area, you will “commit” your changes. Think of a commit as a snapshot of your project at the moment you make it. This is how we are able to go back to a previous state of our project, or previous commit, if things blow up.

After you have committed your changes, your project is saved and you can continue working by repeating the previous steps.
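If it helps to see the add, commit, and go-back cycle in one place, here is a rough Python sketch. It assumes a made-up `ToyRepo` class where files are dictionary entries and a commit id is just a position in a list; real Git stores its data very differently, so treat this purely as an illustration of the idea.

```python
class ToyRepo:
    """A toy stand-in for a repository; real Git stores data very differently."""

    def __init__(self):
        self.working_dir = {}    # current files: name -> content
        self.staging_area = {}   # changes picked for the next commit
        self.commits = []        # list of (message, snapshot) pairs

    def add(self, name):
        """Stage the current content of one file."""
        self.staging_area[name] = self.working_dir[name]

    def commit(self, message):
        """Record a snapshot: the previous snapshot plus everything staged."""
        snapshot = dict(self.commits[-1][1]) if self.commits else {}
        snapshot.update(self.staging_area)
        self.commits.append((message, snapshot))
        self.staging_area = {}
        return len(self.commits) - 1   # the list index acts as a commit id

    def restore(self, commit_id):
        """Put the working files back to the state saved in a commit."""
        self.working_dir = dict(self.commits[commit_id][1])


repo = ToyRepo()
repo.working_dir["app.py"] = "print('v1')"
repo.add("app.py")
first = repo.commit("working version")

repo.working_dir["app.py"] = "print('oops')"
repo.add("app.py")
repo.commit("broken change")

repo.restore(first)                 # things blew up, go back
print(repo.working_dir["app.py"])   # print('v1')
```

Notice that the broken commit is still in the history after restoring. Nothing is thrown away, which is what makes it safe to experiment.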

This version control has saved MANY projects of mine by letting me revert to a previous version after I screwed up something major, and it is the main reason people use Git.

Branches

Another useful feature of Git is the ability to create what’s known as a “branch”.

Think of a branch as a different timeline of your original project.

For instance, maybe I am working on a program and want to add a feature to it but don’t want to risk screwing up anything in the original program if things don’t pan out. I can create a branch that includes everything from the original program, work on the new feature there, and then, when I am done with the feature, merge it back into the original program.
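Here is a small Python sketch of that idea. It is a deliberate simplification with hypothetical names (`branches`, `create_branch`, `merge`): each branch is modeled as its own copy of the project files, whereas real Git tracks branches as lightweight pointers into the commit history and compares histories when merging.

```python
# Each branch is modeled as its own copy of the project files (name -> content).
branches = {"main": {"app.py": "core program"}}


def create_branch(new, source):
    """Start a new timeline from the current state of another branch."""
    branches[new] = dict(branches[source])


def merge(source, target):
    """Bring the source branch's files into the target branch.

    Simplification: the source version always wins. Real Git compares
    histories and can report conflicts instead.
    """
    branches[target].update(branches[source])


create_branch("new-feature", "main")
branches["new-feature"]["feature.py"] = "shiny new feature"

print(sorted(branches["main"]))   # ['app.py'], main is untouched so far

merge("new-feature", "main")
print(sorted(branches["main"]))   # ['app.py', 'feature.py']
```

Until the merge happens, nothing done on the feature branch can affect the original program, which is the whole point of branching.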

It’s actually pretty common to have a few branches going at any one time, and the most common ones are as follows:

  • “Main” - Used for production and is the version of the program that the customer will interact with. Generally, a very stable version of your project.

  • “Dev” - Used for development of new versions of the program and is later merged into the “Main” branch.

  • “Feature Branch” - Used to create new features to later be merged into the “Dev” branch. Usually, the branch is named after the feature that is being worked on.

There are other branches but these are the main ones that developers tend to create.

Collaboration

So, while this isn’t something specific to Git, I would be doing you a disservice if I didn’t talk about it.

A huge reason people use Git is to collaborate with others, and the way to do this is using a “remote repository”.

A remote repository is just a Git repository, like the one you create on your local computer, but hosted online. This allows others to view, edit, and commit changes to the same project, with everyone sharing their work through that one repository.

Like I stated above, GitHub is by far the most popular Git hosting site out there, though there are others.

This changes the Git workflow up a bit but is generally the same as above:

  1. Set up a remote Git repository for your project on a Git hosting website such as GitHub.

  2. Clone your remote repository to your local computer, effectively creating a local repository.

  3. Inside of the local repository, do work! Git will detect any changes you make to the files in it.

  4. Once you are happy with the work that you have done you will “add” your changes to a “staging area”.

  5. After you are done working, you will “commit” your changes.

  6. Finally, once you commit your changes you will “push” them to your remote repository and others will be able to see your changes.

  7. Repeat steps 3-6.

There is one more thing to note. If you are working on the same project with other people using a remote repository, it is always a good idea to “pull” down any changes from the remote repository to your local repository before working. This ensures you are on the most up-to-date version of the project.
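To show why pulling first matters, here is a toy Python sketch of the clone, push, and pull cycle. Everything in it is an assumption made for illustration: a repository is just a list of commit messages, and `clone`, `push`, and `pull` are hypothetical helper functions, not Git’s real behavior in full.

```python
remote = ["initial commit"]   # the history stored on the hosting site


def clone():
    """Create a local copy of the remote history."""
    return list(remote)


def push(local):
    """Upload local history, but only if it builds on what the remote has."""
    if local[:len(remote)] != remote:
        raise RuntimeError("remote has changes you do not have; pull first")
    remote[:] = local


def pull(local):
    """Update the local copy (assumes no unpushed local commits)."""
    local[:] = remote


alice = clone()
bob = clone()

alice.append("Alice: add intro")
push(alice)

pull(bob)                     # Bob pulls before starting work
bob.append("Bob: fix typo")
push(bob)

print(remote)  # ['initial commit', 'Alice: add intro', 'Bob: fix typo']
```

If Bob had skipped the pull, his push would have been rejected in this sketch because his copy would be missing Alice’s commit. Pulling first keeps everyone building on the same history.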

Wrapping it up

There are many other features to Git but this covers the basics and will get you started.

I hope you enjoyed this edition of AI insights. Like I said above, if you would like a more technical overview of Git, please let me know!

Until next time.

Andrew-

