Lecture 4: September 17, 2020¶
- Git basics
- Remotes
- Branches
%%bash
cd /tmp
rm -rf playground # remove if it exists
git clone https://github.com/dsondak/playground.git
Cloning into 'playground'...
%%bash
ls -a /tmp/playground
. .. .git .gitignore README.md environment.yml src tests
Poking around¶
We have a nice smelling fresh repository. We'll explore the repo from the Git point of view using Git commands.
log¶
Log tells you all the changes that have occurred in this project as of now.
%%bash
cd /tmp/playground
git log
commit 21575aab16c187bf7a1648cce4b45ac37ead5f13
Author: David Sondak <dsondak@seas.harvard.edu>
Date:   Thu Sep 17 10:19:10 2020 -0400

    Modifying README for class.

commit 19f856823d1bd16d859585a34e876235c2695938
Author: David Sondak <dsondak@seas.harvard.edu>
Date:   Mon Sep 14 09:29:10 2020 -0400

    Modifying README for class prep.

commit 7a60850817fcee9e3c216ad5e26d2b60bd411150
Author: David <dsondak@seas.harvard.edu>
Date:   Tue Aug 18 15:32:24 2020 +0000

    Updated README with 2020 semester heading.

commit 116aa633727a91e328e0e790c2407e93a1e3b4e5
Author: David Sondak <dsondak@seas.harvard.edu>
Date:   Fri Oct 18 08:15:58 2019 -0400

    Complex numbers package.

commit 317b35c33f4918fbfdac6f5750cbaf81f619dda2
Author: David Sondak <dsondak@seas.harvard.edu>
Date:   Thu Sep 12 12:34:42 2019 -0400

    Setting the repo year!

commit 1a6fb857d43a74cbb9e5fe45ff19f772eac278ba
Author: David Sondak <dsondak@users.noreply.github.com>
Date:   Wed Aug 28 18:10:30 2019 -0400

    Update README.md

commit 3673e326d853eb6d315d72215eacc0a3f936e2fb
Author: David Sondak <dsondak@users.noreply.github.com>
Date:   Wed Aug 28 18:10:04 2019 -0400

    Initial commit
Each one of these "commits" is identified by a SHA-1 hash.
The hash uniquely identifies the commit and, through its parent commits, everything that has happened to this repository up to that point.
The long string of hex digits next to commit is the long (full) hash.
There is some interesting history here: How much of a git sha is generally considered necessary to uniquely identify a change in a given codebase?
Interested in security? git hash function transition
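As a small sketch of how the long hash and its abbreviated form relate, the commands below build a throwaway repository and ask git for both forms of the same commit ID. The temporary directory, the user name and the commit message are made up for illustration and are not part of the playground repo.

```shell
# Throwaway repository; every name used here is illustrative only.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

full=$(git rev-parse HEAD)           # the full (long) hash
short=$(git rev-parse --short HEAD)  # a unique prefix of the full hash
resolved=$(git rev-parse "$short")   # git expands the prefix back to the full hash

echo "full:     $full"
echo "short:    $short"
echo "resolved: $resolved"
```

Any git command that accepts a commit ID will accept the short form as long as the prefix is unambiguous in that repository.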
Getting help with commands¶
If you ever need help on a command, you can find the git man pages by hyphenating git and the command name.
Try it!
man git-log
Press the spacebar to scroll down and q to quit.
status¶
Status is your window into the current state of your project.
It can tell you which files you have changed and which files you currently have in your staging area.
You should run git status after every other git command!
This is especially true in the beginning when you're just learning to understand how things work. (Eventually you can probably relax on this.)
%%bash
cd /tmp/playground
git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
Pay close attention to that text!
It says we are on the master branch of our local repository, and that this branch is up to date with the master branch of the upstream repository (or remote) named origin.
We know this because clone brings down a copy of the remote branch: origin/master.
origin/master represents the local copy of the branch that came from the upstream repository (nicknamed origin in this case).
Branches are different, co-existing versions of your project.
Here we have encountered two of them, but remember there is a third one in the repository we forked from, and perhaps many more, depending on who else made these forks.
Branches represent a snapshot of the project by someone at some particular point in time. In general you will only care about your own branches and those of the "parent" remotes you forked/cloned from. We'll have much more to say about branches later.
Configuration information is stored in a special file called config, in a hidden folder called .git in your working directory. (The index and the local repository are stored there as well...more on that in a bit.)
Reminder: The names of hidden files and directories begin with a dot. To see them in a listing, type ls -a, where the a option tells the ls command to include hidden files and directories.
A few special files and directories¶
config¶
%%bash
cd /tmp/playground
cat .git/config
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
    ignorecase = true
    precomposeunicode = true
[remote "origin"]
    url = https://github.com/dsondak/playground.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
    remote = origin
    merge = refs/heads/master
Notice that this file tells us about a remote called origin, which is simply the GitHub repository we cloned from. So the process of cloning left us with a remote. The file also tells us about a branch called master, which "tracks" a remote branch called master at origin.
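The same settings can be read without opening the file. Here is a minimal sketch in a throwaway repository; the URL https://example.com/demo.git is a placeholder chosen for illustration, not a real remote, and nothing is contacted because we never fetch or push.

```shell
# Throwaway repository with a placeholder remote (assumption: the URL is fake).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git remote add origin https://example.com/demo.git

# A key under [remote "origin"] in .git/config is addressed as remote.origin.<key>
url=$(git config --get remote.origin.url)
refspec=$(git config --get remote.origin.fetch)
echo "url:   $url"
echo "fetch: $refspec"

# List every remote together with its URL
git remote -v
```

Reading values through git config is usually safer than editing .git/config by hand, since git validates the key names for you.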
.gitignore¶
.gitignore tells us what files to ignore when adding files to the index and committing to the local repository.
We use this file to ignore temporary data files and such when working in our repository.
Some .gitignore anatomy¶
Folders are indicated with a / at the end, in which case all files in that folder are ignored.
One of the lines in the .gitignore file is *.so. That line tells Git to ignore all files with the extension .so.
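If you are unsure whether a pattern does what you expect, git can tell you which rule (if any) matches a given path. The sketch below uses a throwaway repository with a two-line .gitignore; the file and folder names are invented for the example.

```shell
# Throwaway repository; file names are illustrative only.
demo=$(mktemp -d)
cd "$demo"
git init -q .
printf '%s\n' '*.so' 'build/' > .gitignore
mkdir -p build src
touch src/fast.so src/main.py build/output.txt

# -v prints the ignore file, the line number and the pattern that matched each path
git check-ignore -v src/fast.so build/output.txt

# The exit status is non-zero when a path is not ignored
if git check-ignore -q src/main.py; then
    ignored_py=yes
else
    ignored_py=no
fi
echo "src/main.py ignored? $ignored_py"
```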
Some comments on .gitignore¶
A .gitignore file can be specialized to a specific language. The one below is specialized to the Python language.
Note that when creating a GitHub repo, you are asked if you want to create a .gitignore file. You don't have to create one, but it's a good idea. Of course, you can always add one later if you so desire.
%%bash
cd /tmp/playground
cat .gitignore
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
Making changes¶
Ok! Enough poking around. Let's get down to business and add some files into our folder.
Now let's say that we want to add a new file to the project. The canonical sequence is "edit–add–commit–push".
%%bash
cd /tmp/playground
echo 'In-class demo' >> README.md
git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   README.md

no changes added to commit (use "git add" and/or "git commit -a")
We've modified a file in the working directory, but it hasn't been staged yet. Make sure you read and understand the output.
- Your local master branch does not contain anything that is not on the remote master branch. So git says: Your branch is up to date with 'origin/master'.
- If you have untracked files in your local directory (files that git is not keeping track of), git senses this, informs you of the fact, and goes one more step to list those untracked files. Sometimes you want to stage these files and sometimes you don't. The decision is yours.
- If you modified a file that git is already tracking, as we did here, then git tells you that the file is modified but that the new changes aren't staged yet.
- Git also tells you that no changes have been added to the commit yet and suggests git add (or git commit -a) if you want to include them.
add¶
- When you've made a change to a set of files and are ready to create a commit, the first step is to add all of the changed files to the staging area. That is what add is for.
- Remember that what you see in the filesystem is your working directory, so the way to see what's in the staging area is with the status command.
- This also means that if you add something to the staging area and then edit it again, you'll need to add the file to the staging area again if you want the commit to include the new changes.
- See the Staging Modified Files section at Git - Recording Changes to the Repository.
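The point about re-adding a file after a second edit can be sketched as follows. This is an illustration in a throwaway repository; notes.md and its contents are invented for the example.

```shell
# Throwaway repository; notes.md is a made-up file.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "first line" > notes.md
git add notes.md
git commit -q -m "Add notes"

echo "second line" >> notes.md
git add notes.md                # stage the file as it is right now
echo "third line" >> notes.md   # edit again after staging

# Short status: first column is the staging area, second is the working directory
state=$(git status --short notes.md)
echo "$state"

# What a commit would record now, versus what is only in the working directory
staged=$(git diff --cached)
unstaged=$(git diff)
echo "$staged"
echo "$unstaged"
```

The file shows up as modified in both columns: the staged snapshot contains the second line, while the third line stays out of the next commit until you run git add again.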
%%bash
cd /tmp/playground
git add README.md
git status
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    modified:   README.md
Now our file is in the staging area (index) waiting to be committed. The file is still not even in our local repository.
WARNING: Do NOT use git add .¶
Instead of naming each file, as in git add world.md, you could use git add . in the top level of the repository. This adds all new files and changed files to the index, and is particularly useful if you have created multiple new files. You should be careful with this because it's very annoying if you decide that you didn't want to add a file. I usually avoid this if I can, but sometimes it's the way to go. Note: git add . is heavily overused and can cause collaboration problems. Please refrain from using it, especially if you're new to git.
commit¶
When you're satisfied with the changes you've added to your staging area, you can commit those changes to your local repository with the commit command. Those changes will have a permanent record in the repository from now on.
Every commit has two features you should be aware of:
- The first is a hash. This is a unique identifier for all of the information about that commit, including the code changes, the timestamp, and the author. We saw this already when we used git log earlier.
- The second is a commit message. This is text that you can (and should) add to a commit to describe what the changes were.
Good commit messages are important!
Commit messages are a way of quickly telling your future self and your collaborators what a commit was about. For even a moderately sized project, digging through tens or hundreds of commits to find the change you're looking for is a nightmare without friendly summaries.
By convention, commit messages start with a single-line summary, then an empty line, then a more comprehensive description of the changes.
[Figure: example commit message] This is an okay commit message. The changes are small, and the summary is sufficient to describe what happened.
[Figure: example commit message] This is better. The summary captures the important information (major shift, direct vs. helper), and the full commit message describes what the high-level changes were.
[Figure: example commit message] This. Don't do this.
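The summary, blank line, description convention can also be followed from the command line. In this sketch (throwaway repository, invented message text), each -m option becomes its own paragraph, so git supplies the blank line between the summary and the body.

```shell
# Throwaway repository; the message text is illustrative only.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "In-class demo" > README.md
git add README.md

# First -m: single-line summary. Second -m: longer description.
git commit -q \
    -m "Add in-class demo note to README" \
    -m "Students follow along with this file during lecture, so record that the demo line was added on purpose."

subject=$(git log -1 --format=%s)   # summary line only
body=$(git log -1 --format=%b)      # everything after the blank line
echo "subject: $subject"
echo "body:    $body"
```

For longer descriptions it is usually more comfortable to run git commit without -m and write the message in your editor.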
%%bash
cd /tmp/playground
git commit -m "Modifying README for in-class demo."
[master e265840] Modifying README for in-class demo.
 1 file changed, 1 insertion(+)
%%bash
cd /tmp/playground
git status
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean
The git commit -m version is just a way to specify a commit message without opening a text editor. If you want to write the message in a text editor, you just say git commit.
Another nice command is git commit with the -a option: git commit -a. Note that git commit -a is shorthand to stage and commit, all at once, every modified file that is already tracked. It will not stage a file that is not yet tracked!
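Here is a quick sketch of that last point, using a throwaway repository with one tracked and one untracked file (both file names are invented for the example).

```shell
# Throwaway repository; file names are illustrative only.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

echo "v2" >> tracked.txt      # modify a file git already tracks
echo "new" > untracked.txt    # create a file git has never seen

# -a stages modified tracked files before committing; untracked files are left alone
git commit -q -a -m "Update tracked.txt"

remaining=$(git status --short)
echo "$remaining"
```

After the commit, the only thing left in the status output is the untracked file, which still needs an explicit git add.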
%%bash
cd /tmp/playground
git branch -av
* master                         e265840 [ahead 1] Modifying README for in-class demo.
  remotes/origin/HEAD            -> origin/master
  remotes/origin/TFtestbranch    86f51da dummy testfile
  remotes/origin/aditya_karan    959639c Revert "Adding another message"
  remotes/origin/aditya_karan_2  1db1013 Adding name.
  remotes/origin/dls_work        5a1542e Reminder to implement __repr__.
  remotes/origin/master          21575aa Modifying README for class.
%%bash
cd /tmp/playground
git log --oneline --decorate
e265840 (HEAD -> master) Modifying README for in-class demo.
21575aa (origin/master, origin/HEAD) Modifying README for class.
19f8568 Modifying README for class prep.
7a60850 Updated README with 2020 semester heading.
116aa63 Complex numbers package.
317b35c Setting the repo year!
1a6fb85 Update README.md
3673e32 Initial commit
We see that our branch, master, has one more commit than the origin/master branch, the local copy of the branch that came from the upstream repository (nicknamed origin in this case). Let's push the changes.
push¶
The push command takes the changes you have made to your local repository and attempts to update a remote repository with them. If you're the only person working with both of these (which is how a solo GitHub project would work), then push should always succeed.
%%bash
cd /tmp/playground
git push
git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
To https://github.com/dsondak/playground.git
   21575aa..e265840  master -> master
You can go to your remote repo and see the changes!
Note to Mac Users¶
Please add .DS_Store to your .gitignore file. You shouldn't version this annoying file. Here's what it does: .DS_Store.
Then, do a git rm .DS_Store in each directory that contains a tracked .DS_Store. Note: You should try to use your new-found Unix skills to execute a single Unix command line to recursively remove all .DS_Store files!
Please do this both for your playground and course repos. Be sure to commit and push!
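If you get stuck on the one-liner, here is one possible approach, sketched in a throwaway repository where a few .DS_Store files were committed by accident. The directory layout is invented for the example, and other solutions (for instance with xargs) work just as well.

```shell
# Throwaway repository with some accidentally committed .DS_Store files.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
mkdir -p src/docs
touch .DS_Store src/.DS_Store src/docs/.DS_Store src/main.py
git add -A
git commit -q -m "Accidentally commit .DS_Store files"

# Ignore the file from now on
echo ".DS_Store" >> .gitignore

# Find every .DS_Store outside .git and remove it from the index and the working directory
find . -name .DS_Store -not -path "./.git/*" -exec git rm -q --ignore-unmatch {} +

git add .gitignore
git commit -q -m "Stop tracking .DS_Store"

tracked=$(git ls-files | grep -c "DS_Store" || true)
echo "tracked .DS_Store files: $tracked"
```

Note that git rm only knows about tracked files; the --ignore-unmatch flag keeps it from stopping on a .DS_Store that was never committed, but such a file would then have to be deleted separately.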
Breakout Room¶
Depending on time, you may be asked to do this on your own today.
- Figure out whose birthday is closest to a holiday.
- Modify (or create) your .gitignore file.
- Add at least one file or directory to the .gitignore that you would like to ignore. You should discuss this with your group.
- Make sure you push these changes to your remote repo!
Remotes and fetching from them¶
If you're working with other people, then it's possible that they have made changes to the remote repository between the time you first cloned it and now. push may fail!
In our particular case of the playground repository, this is not going to happen, since you just cloned it and presumably haven't invited anyone to collaborate with you on it.
However, you can imagine that the original repository dsondak/playground, which you are now divorced from, has changed, and that you somehow want to pull those changes in.
That's where fetch and merge come in.
remote¶
We have seen so far that our repository has one "remote", or upstream repository, which has been identified with the word origin, as seen in .git/config.
We now wish to add another remote, which we shall call course, which points to the original repository we forked from.
We want to do this to pull in changes, in case something changed there. This is a very useful workflow to know how to execute and understand.
%%bash
cd /tmp/playground
git remote add course https://github.com/dsondak/playground.git
cat .git/config
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
    ignorecase = true
    precomposeunicode = true
[remote "origin"]
    url = https://github.com/dsondak/playground.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
    remote = origin
    merge = refs/heads/master
[remote "course"]
    url = https://github.com/dsondak/playground.git
    fetch = +refs/heads/*:refs/remotes/course/*
Notice that the master branch only tracks the same branch on the origin remote. The example in this notebook is a little silly because the origin and course remotes are the same. It will make more sense when you do it on your own. Your origin will be your fork of the original repo and your course will be the original repo.
We haven't set up any official connection with the course remote yet.
Now let's figure out how to get changes from an upstream repository, be it our origin upstream that a collaborator has pushed to, or another course remote to which one of the teaching staff has posted a change.
fetch¶
A Scenario¶
Let's say a collaborator has pushed changes to your shared upstream repository while you were editing. Their local repository and the upstream repository now both contain their changes, but your local repository does not. To update your local repository, you run fetch.
Question: What if you've committed changes in the meantime? Does your local repository contain your changes or theirs?
Answer: It contains a record of both, but they are kept separate. Remember that git repositories are not copies of your project files. They store all the contents of your files, along with a bunch of metadata, but in git's own internal format.
A Scenario...continued¶
Let's say that you and your collaborator both edited the same line of the same file at the same time in different ways. On your respective machines you both add and commit your different changes and your collaborator pushes theirs to the upstream repository before you do.
When you run fetch, git adds a record of their changes to your local repository alongside your own.
These separate records are called branches, and they represent different, coexisting versions of your project. The fetch command adds your collaborator's branch to your local repository, but keeps yours as well.
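To make the scenario concrete, here is a self-contained sketch that plays both roles on one machine. A bare repository stands in for the shared upstream, and two clones (named you and collaborator, both invented for the example) stand in for the two laptops.

```shell
demo=$(mktemp -d)
cd "$demo"

# Seed repository with one commit on master
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "shared" > README.md
git add README.md
git commit -q -m "Initial commit"
cd ..

# A bare clone plays the upstream; you and a collaborator each clone it
git clone -q --bare seed upstream.git
git clone -q upstream.git you
git clone -q upstream.git collaborator

# The collaborator commits and pushes
cd collaborator
git config user.name "Collaborator"
git config user.email "collab@example.com"
echo "their change" >> README.md
git commit -q -am "Collaborator edit"
git push -q origin master
their_tip=$(git rev-parse master)
cd ..

# You fetch: origin/master moves, your own master does not
cd you
before=$(git rev-parse master)
git fetch -q origin
after=$(git rev-parse master)
remote_tip=$(git rev-parse origin/master)
echo "master before fetch: $before"
echo "master after fetch:  $after"
echo "origin/master:       $remote_tip"
```

Your master is unchanged by the fetch; only your local copy of the remote branch, origin/master, has moved to the collaborator's commit.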
%%bash
cd /tmp/playground
git fetch course
From https://github.com/dsondak/playground
 * [new branch]      TFtestbranch   -> course/TFtestbranch
 * [new branch]      aditya_karan   -> course/aditya_karan
 * [new branch]      aditya_karan_2 -> course/aditya_karan_2
 * [new branch]      dls_work       -> course/dls_work
 * [new branch]      master         -> course/master
A copy of a new remote branch has been made. To see this, provide the -avv argument to git branch.
%%bash
cd /tmp/playground
git branch -avv
* master                         e265840 [origin/master] Modifying README for in-class demo.
  remotes/course/TFtestbranch    86f51da dummy testfile
  remotes/course/aditya_karan    959639c Revert "Adding another message"
  remotes/course/aditya_karan_2  1db1013 Adding name.
  remotes/course/dls_work        5a1542e Reminder to implement __repr__.
  remotes/course/master          e265840 Modifying README for in-class demo.
  remotes/origin/HEAD            -> origin/master
  remotes/origin/TFtestbranch    86f51da dummy testfile
  remotes/origin/aditya_karan    959639c Revert "Adding another message"
  remotes/origin/aditya_karan_2  1db1013 Adding name.
  remotes/origin/dls_work        5a1542e Reminder to implement __repr__.
  remotes/origin/master          e265840 Modifying README for in-class demo.
Indeed, the way git works is by creating copies of remote branches locally. Then it just compares to these "copy" branches to see what changes have been made.
Sometimes we really do want to merge the changes. For now, we want to merge the change from remotes/course/master. Eventually, we'll consider a case where you want to simply create another branch yourself and do things on that branch.
merge¶
Having multiple branches is fine, but at some point you'll want to combine the changes that you've made with those made by others. This is called merging.
There are two general cases when merging two branches:
- First, the two branches are different but the changes are in unrelated places.
- Second, the two branches are different and the changes are in the same locations in the same files.
Scenario 1¶
The first scenario is easy. Git will simply apply both sets of changes to the appropriate places and, by default, record the result in a new merge commit for you. Then you can push the merged history back to the upstream repository. Your collaborator does the same, and everyone sees everything.
Scenario 2¶
The second scenario is more complicated.
Let's say the two changes set some variable to different values.
Git can't know which is the correct value.
One solution would be to simply use the more recent change, but this very easily leads to self-inconsistent programs.
A more conservative solution, and the one git uses, is to simply leave the decision to the user.
When git detects a conflict that it cannot resolve, merge fails, and git places a modified version of the offending file in your project directory. This is important: the file that git puts into your directory is not actually either of the originals. It is a new file that has special markings around the locations that conflicted. We shall not consider this case yet, but will return to dealing with conflicts soon.
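As a preview only, the sketch below provokes the second scenario in a throwaway repository so you can see what the special markings look like. The file settings.py, the branch name theirs and the values are all invented for the example, and the merge is abandoned at the end rather than resolved.

```shell
# Throwaway repository; names and values are illustrative only.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "x = 0" > settings.py
git add settings.py
git commit -q -m "Set x"

# One branch sets x to 1 ...
git checkout -q -b theirs
echo "x = 1" > settings.py
git commit -q -am "Set x to 1"

# ... while master sets the same line to 2
git checkout -q master
echo "x = 2" > settings.py
git commit -q -am "Set x to 2"

# Both branches changed the same line, so the merge stops with a non-zero exit status
if git merge theirs > /dev/null 2>&1; then
    merge_result=clean
else
    merge_result=conflict
fi
echo "merge result: $merge_result"

# The file now holds both versions between conflict markers
cat settings.py
markers=$(grep -c '^<<<<<<< ' settings.py)

# Abandon the merge and go back to the pre-merge state
git merge --abort
```

Running git merge --abort is a safe way out whenever a merge stops and you are not ready to resolve it.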
%%bash
cd /tmp/playground
git merge course/master
git status
Already up to date.
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
We could be ahead of our upstream-tracking repository by some commits...why?
Note that in this contrived example, this is unlikely because origin and course are the same.
%%bash
cd /tmp/playground
git log -3
commit e265840f227f1c99c587a336588e7976fa609b35
Author: David Sondak <dsondak@seas.harvard.edu>
Date:   Thu Sep 17 12:33:24 2020 -0400

    Modifying README for in-class demo.

commit 21575aab16c187bf7a1648cce4b45ac37ead5f13
Author: David Sondak <dsondak@seas.harvard.edu>
Date:   Thu Sep 17 10:19:10 2020 -0400

    Modifying README for class.

commit 19f856823d1bd16d859585a34e876235c2695938
Author: David Sondak <dsondak@seas.harvard.edu>
Date:   Mon Sep 14 09:29:10 2020 -0400

    Modifying README for class prep.
A git log may help you diagnose why you are ahead of the upstream by some commits.
If you had edited the README.md at the same time and committed locally, you would have been asked to resolve the conflict in the merge (the second case above).
These changes are only on our local repo. We would like to have them on our remote repo. Let's push these changes to the origin now.
%%bash
cd /tmp/playground
git push
git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
Everything up-to-date
A comment on git pull¶
You can combine a fetch and a merge together by simply doing a git pull. This will stop with a merge conflict if you and your collaborator have changed the same part of the same file (since you will have to merge by hand), but is a great shortcut when the changes don't overlap. I use it all the time on a personal level too, to shift work between two different machines, as long as I am not working on both at the same time. The usual use case is day work on a work computer, and then evening work at home on the laptop. Read the docs if you are interested.
The safest thing to do is to first do the fetch followed by a merge. This is especially useful if you're new to git. It forces you to think about the steps instead of getting shocked by a pull that yells at you for a merge conflict.
Git habits¶
Commit early, commit often.
Git is more effective when used at a fine granularity. For starters, you can't undo what you haven't committed, so committing lots of small changes makes it easier to find the right rollback point. Also, merging becomes a lot easier when you only have to deal with a handful of conflicts.
Commit unrelated changes separately.
Identifying the source of a bug or understanding the reason why a particular piece of code exists is much easier when commits focus on related changes. Some of this has to do with simplifying commit messages and making it easier to look through logs, but it has other related benefits: commits are smaller and simpler, and merge conflicts are confined to only the commits which actually have conflicting code.
Do not commit binaries and other temporary files.
Git is meant for tracking changes. In nearly all cases, the only meaningful difference between the contents of two binaries is that they are different. If you change source files, compile, and commit the resulting binary, git sees an entirely different file. The end result is that the git repository (which contains a complete history, remember) begins to become bloated with the history of many dissimilar binaries. Worse, there's often little advantage to keeping those files in the history. An argument can be made for periodically snapshotting working binaries, but things like object files, compiled python files, and editor auto-saves are basically wasted space.
Ignore files which should not be committed
Git comes with a built-in mechanism for ignoring certain types of files. Placing filenames or wildcards in a .gitignore file placed in the top-level directory (where the .git directory is also located) will cause git to ignore those files when checking file status. This is a good way to ensure you don't commit the wrong files accidentally, and it also makes the output of git status somewhat cleaner.
Always make a branch for new changes
While it's tempting to work on new code directly in the master branch, it's usually a good idea to create a new one instead, especially for team-based projects. The major advantage to this practice is that it keeps logically disparate change sets separate. This means that if two people are working on improvements in two different branches, when they merge, the actual workflow is reflected in the git history. Plus, explicitly creating branches adds some semantic meaning to your branch structure. Moreover, there is very little difference in how you use git.
Write good commit messages
I cannot overstate the importance of this.
Seriously. Write good commit messages.
Basic Branching¶
We have encountered branches a few times so far but we haven't really said much about what they are or why they're important. They are very important. In fact, they form a core piece of the git workflow.
A git branch is a "Sticky Note" on the graph. When you switch branches you are moving the "Sticky Note".
Suppose you have a newly initialized repository. Your first commit is represented by the A block in the figure below.
A default branch is created and git names it master. The name master has no special meaning to git.
Now suppose we make a set of two commits (B and C). The master branch (and our pointer) moves along.
So far so good. Suddenly we find a bug! We could work on the bug in master but that's not really a good idea. It would make a lot more sense to branch off of master and fix the bug on its own branch. That way, we don't interfere with things on master. We'll discuss the details of how to create branches in the lecture exercises. For now, suppose we create a new branch called bug1.
The new branch is a pointer to the same commit as the master branch (commit C), but our pointer has moved from the master branch to the bug1 branch.
We do some work on the bug1 branch and make two more commits. The pointer and branch now move to commit E.
- Now you decide that the bug you found has been fixed.
- You've modified a file and maybe even added a new file.
You can switch back to the master branch.
What you'll see is that none of the fixes you just made or files you just created are in your working directory!
- The first couple of times you see this, it feels really uncomfortable.
- However, this is the correct git workflow: git works with snapshots!
How do we get the bug fix into our master branch?
We already know the command. From the master branch, just do git merge bug1.
This looks really nice! The merge brought the two change histories together perfectly.
- The only thing left to do is to delete the bug1 branch.
- We don't need it anymore and so we really don't want it floating around.
- To delete a branch you simply write git branch -d bug1.
This looks like a nice clean tree now. If only things were always this simple.
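The whole bug1 story can be replayed in a throwaway repository. This is only a sketch: the commits are named A through E to match the description, and history.txt is an invented file that gains one line per commit.

```shell
# Throwaway repository; history.txt is a made-up file.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commits A, B, C on master
for name in A B C; do
    echo "$name" >> history.txt
    git add history.txt
    git commit -q -m "Commit $name"
done

# Branch off at C and make commits D and E on bug1
git checkout -q -b bug1
for name in D E; do
    echo "$name" >> history.txt
    git commit -q -am "Commit $name"
done
bug1_tip=$(git rev-parse bug1)

# Back on master, the working directory is the snapshot at C again
git checkout -q master
lines_on_master=$(wc -l < history.txt | tr -d ' ')
echo "lines in history.txt on master before merging: $lines_on_master"

# master has no commits of its own since C, so the merge just moves the master label forward
git merge -q bug1
git branch -q -d bug1

master_tip=$(git rev-parse master)
git log --oneline
```

Because master had not moved since bug1 branched off, git performs what it calls a fast-forward: no new commit is created, and master simply ends up pointing at E.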
Nonlinear Histories: Workflow Choices¶
Another common scenario is as follows:
- We created our "story branch" off of commit C to address some bug (not the same bug as before!). Call this branch bug2.
- However, some changes have happened in master since we branched off of C. For example, bug1 has been merged into master.
- We have made a couple of commits on bug2.
Here's the current graph:
Once we're ready to merge our bug fix, we switch back to master.
Now we attempt to merge.
Our attempted merge should connect the new version H to both E and G (H came from E and G).
Now we delete the bug2 branch since the bug fix has been successfully merged into master.
- The graph is now a bit of a mess; the history is nonlinear.
- There's nothing particularly wrong about this.
- However, such a history makes it hard to see the changes independently.
- What if another branch came off of G? You could have multiple loops!
There is another way to do merges that helps "linearize" the graph. Let's pick up with our bug2 branch just before we switched to the master branch for a merge.
This time, instead of starting the merge process right away, we'll first rebase:
git rebase master
What does rebase do?
- Undo the changes we made off of C, but remember what they were
- Re-apply those changes on E instead
Now we proceed as usual:
git checkout master
git merge bug2
Now we get a nice linear flow.
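The rebase-then-merge sequence can be tried out safely in a throwaway repository. In this sketch the commit messages reuse the letters from the description, the two branches touch different (invented) files so that no conflict arises, and a single commit E stands in for the work that reached master in the meantime.

```shell
# Throwaway repository; file names are illustrative only.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > main.txt
git add main.txt
git commit -q -m "C: common ancestor"

# bug2 branches off C and gets two commits, F and G
git checkout -q -b bug2
echo "fix 1" > bug2.txt
git add bug2.txt
git commit -q -m "F: first part of fix"
echo "fix 2" >> bug2.txt
git commit -q -am "G: second part of fix"

# Meanwhile master moves on to E
git checkout -q master
echo "other work" >> main.txt
git commit -q -am "E: work that landed on master"

# Replay F and G on top of E, then fast-forward master
git checkout -q bug2
git rebase -q master
git checkout -q master
git merge -q bug2

merge_commits=$(git rev-list --merges --count master)
total_commits=$(git rev-list --count master)
echo "merge commits: $merge_commits, total commits: $total_commits"
git log --oneline --graph
```

The graph printed at the end is a single straight line: the replayed commits sit on top of E and no merge commit was needed.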
Comments on rebase¶
- The actual change set ordering in the repo mirrors what actually happened. That is, F' and G' came after E rather than in parallel to it.
- We have re-written history; this is controversial.
Basic Rule: Don't rebase public history¶
Never rebase commits once they've been pushed to a public repository.
Some rough rebase guidelines¶
Use an interactive rebase to polish a feature branch before merging it into the main code base.
You will work extensively with branches in any real project. In fact, branches are central to the Git workflow.
For more details on branches in Git see Chapter 3 of the Git Book: Git Branching - Branches in a Nutshell.
Branching Demo¶
As you might have noticed by now, everything in Git is a branch. We have branches on remote (upstream) repositories, copies of remote branches in our local repository, and branches on local repositories which (so far) track remote branches (or more precisely local copies of remote branches).
%%bash
cd /tmp
rm -rf cs207_david_sondak #remove if it exists
git clone https://github.com/dsondak/cs207_david_sondak.git
Once you're in your course repo, you can look at all the branches and print out a lot of information to the screen.
%%bash
cd /tmp/cs207_david_sondak
git branch -avv
All of these branches are nothing but commit-streams in disguise, as can be seen above. It's a very simple model that leads to a lot of interesting version control patterns.
Since branches are so lightweight, the recommended way of working on software using git is to create a new branch for each new feature you add, test it out, and if good, merge it into master. Then you deploy the software from master. We have been using branches under the hood. Let's now lift the hood.
branch¶
If you run git branch without having created any branches, it will list only one, called master. This is the default branch. You have also seen the use of git branch -avv to show all branches (even remote ones).
To create a new branch, run git branch branch_name.
It's important to note that this new branch is not active. If you make changes, those changes will still apply to the master branch, not branch_name. That is, after executing the git branch branch_name command you're still on the master branch and not the branch_name branch. To change this, you need the next command.
checkout¶
A note on checkout¶
Checkout switches the active branch.
Since branches can have different changes, checkout may make the working directory look very different.
For instance, if you have committed new files on one branch and then check another branch out, those files will no longer show up in the directory. They are still stored in the .git folder, but since they only exist in the other branch, they cannot be accessed until you check out the original branch.
%%bash
cd /tmp/cs207_david_sondak
git branch lecture4_demos
See what branches we have created.
%%bash
cd /tmp/cs207_david_sondak
git branch
Notice that you have created the lecture4_demos branch but you're still on the master branch.
Jump onto the lecture4_demos branch.
%%bash
cd /tmp/cs207_david_sondak
git checkout lecture4_demos
git branch
Notice that it is bootstrapped off the master branch and has the same files. You can check that with the ls command.
%%bash
cd /tmp/cs207_david_sondak
ls
Note: You could have created this branch and switched to it all in one go by using
git checkout -b lecture4_demos
Now let's check the status of our repo.
%%bash
cd /tmp/cs207_david_sondak
git status
Alright, so we're on our new branch but we haven't added or modified anything yet; there's nothing to commit.
Adding a file on a new branch¶
Let's add a new file. Note that this file gets added on this branch only!
Notice that I'm still using the echo command. Once again, this is only because Jupyter can't work with text editors. If I were you, I'd use vim, but you can use whatever text editor you like.
%%bash
cd /tmp/cs207_david_sondak
echo '# Things I wish G.R.R. Martin would say: Finally updating A Song of Ice and Fire.' > books.md
git status
We add the file to the index, then commit the file to the local repository on the lecture4_demos branch.
%%bash
cd /tmp/cs207_david_sondak
git add books.md
git status
%%bash
cd /tmp/cs207_david_sondak
git commit -am "Added another test file to demonstrate git features" # Make sure you really understand what the `-am` option does!
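On the -am option used in the cell above: -m supplies the commit message, and -a stages modifications and deletions of files git already tracks before committing. It does not pick up brand-new files, which is why books.md was added explicitly first. A small sketch in a throwaway repository, where the file name untracked.md is hypothetical:

```bash
set -e
cd "$(mktemp -d)"                           # throwaway directory, not the course repo
git init -q
git config user.name "Demo User"            # illustrative identity
git config user.email "demo@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

echo "a tracked change" >> README.md        # modify a file git already tracks
echo "brand new" > untracked.md             # create a file git has never seen

git commit -q -am "Update README"           # -a stages README.md only; untracked.md is left out
status=$(git status --porcelain)            # machine-readable status
echo "$status"                              # untracked.md is still reported as untracked (??)
```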
%%bash
cd /tmp/cs207_david_sondak
git status
At this point, we have committed a new file (books.md) to our new branch in our local repo. Our remote repo is still not aware of this new file (or branch). In fact, our master branch is still not really aware of this file.
Note: There are really two options at this point:
- Push the current branch to our upstream repo. This would correspond to a "long-lived" branch. You may want to do this if you have a version of your code that you are maintaining.
- Merge the new branch into the local master branch. Depending on your chosen workflow, this may happen much more frequently than the first option. You'll be creating branches all the time for little bug fixes and features. You don't necessarily want such branches to be "long-lived". Once your feature is ready, you'll merge the feature branch into the master branch, stage, commit, and push (all on master). Then you'll delete the "short-lived" feature branch.
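As a preview of the second option, here is a minimal sketch of a short-lived branch in a throwaway repository. The branch name bugfix and the file fix.txt are hypothetical, and the push step is left out because this sketch has no remote.

```bash
set -e
cd "$(mktemp -d)"                           # throwaway directory, not the course repo
git init -q
git symbolic-ref HEAD refs/heads/master     # call the first branch master, whatever the local default is
git config user.name "Demo User"            # illustrative identity
git config user.email "demo@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

git checkout -q -b bugfix                   # short-lived branch for one small change
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Add fix"

git checkout -q master
git merge -q bugfix                         # master has no new commits, so this is a fast-forward
git branch -d bugfix                        # -d only deletes a branch that has been merged
git log --oneline                           # the fix is now part of master's history
```

Because master did not move while the branch existed, the merge is a fast-forward and there is nothing extra to stage or commit. When both branches have new commits, the merge creates a merge commit and may require resolving conflicts first.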
We'll continue with the first option for now and discuss the other option later.
Long-lived branches¶
OK, we have committed. Let's try to push!
%%bash
cd /tmp/cs207_david_sondak
git push
Fail! Why? Because git didn't know what to push to on origin (the name of our remote repo) and didn't want to assume we wanted to call the branch lecture4_demos on the remote. We need to tell that to git explicitly (just like it tells us to).
%%bash
cd /tmp/cs207_david_sondak
git push --set-upstream origin lecture4_demos
Aha, now we have both a remote and a local for lecture4_demos. We can use the convenient arguments to branch in order to see the details of all the branches.
%%bash
cd /tmp/cs207_david_sondak
git branch -avv
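To see what --set-upstream records without touching GitHub, the same steps can be tried against a local bare repository standing in for the remote. This is a sketch under that assumption: the paths come from mktemp and the Demo User identity is illustrative.

```bash
set -e
work=$(mktemp -d)                           # throwaway area holding both repositories
git init -q --bare "$work/remote.git"       # bare repository playing the role of origin
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master     # call the first branch master, whatever the local default is
git config user.name "Demo User"            # illustrative identity
git config user.email "demo@example.com"
git remote add origin "$work/remote.git"

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q origin master

git checkout -q -b lecture4_demos
echo "# notes" > books.md
git add books.md
git commit -q -m "Add books.md"

# Create the branch on the remote and remember it as the upstream of the local branch
git push -q --set-upstream origin lecture4_demos

upstream=$(git rev-parse --abbrev-ref "lecture4_demos@{upstream}")
echo "lecture4_demos tracks: $upstream"
git branch -avv                             # the tracked remote branch appears in square brackets
```

Once the upstream is recorded, a plain git push or git pull on that branch knows which remote branch to use.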
We make sure we are back on master.
%%bash
cd /tmp/cs207_david_sondak
git checkout master
What have we done?
We created a new local branch, created a file on it, created that same branch on our remote repo, and pushed all the changes. Finally, we went back to our master branch to continue work there.
Git habits¶
Commit early, commit often.
Git is more effective when used at a fine granularity. For starters, you can't undo what you haven't committed, so committing lots of small changes makes it easier to find the right rollback point. Also, merging becomes a lot easier when you only have to deal with a handful of conflicts.
Commit unrelated changes separately.
Identifying the source of a bug or understanding the reason why a particular piece of code exists is much easier when commits focus on related changes. Some of this has to do with simplifying commit messages and making it easier to look through logs, but it has other related benefits: commits are smaller and simpler, and merge conflicts are confined to only the commits which actually have conflicting code.
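As a small illustration, two unrelated edits sitting in the same working directory can be staged and committed one at a time. The file names parser.py and README.md and the commit messages are hypothetical.

```bash
set -e
cd "$(mktemp -d)"                           # throwaway directory, not the course repo
git init -q
git config user.name "Demo User"            # illustrative identity
git config user.email "demo@example.com"
echo "def parse(): pass" > parser.py
echo "Docs" > README.md
git add parser.py README.md
git commit -q -m "Initial commit"

# Two unrelated edits end up in the working directory at the same time
echo "# handle empty input" >> parser.py
echo "Install notes" >> README.md

git add parser.py                           # stage only the code change
git commit -q -m "Handle empty input in parser"

git add README.md                           # then the documentation change on its own
git commit -q -m "Add install notes to README"

git log --oneline                           # one commit per logical change
count=$(git rev-list --count HEAD)          # total number of commits on this branch
last_files=$(git diff-tree --no-commit-id --name-only -r HEAD)  # files touched by the last commit
```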
Do not commit binaries and other temporary files.
Git is meant for tracking changes. In nearly all cases, the only meaningful difference between the contents of two binaries is that they are different. If you change source files, compile, and commit the resulting binary, git sees an entirely different file. The end result is that the git repository (which contains a complete history, remember) begins to become bloated with the history of many dissimilar binaries. Worse, there's often little advantage to keeping those files in the history. An argument can be made for periodically snapshotting working binaries, but things like object files, compiled python files, and editor auto-saves are basically wasted space.
Ignore files which should not be committed.
Git comes with a built-in mechanism for ignoring certain types of files. Placing filenames or wildcards in a .gitignore file placed in the top-level directory (where the .git directory is also located) will cause git to ignore those files when checking file status. This is a good way to ensure you don't commit the wrong files accidentally, and it also makes the output of git status somewhat cleaner.
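A minimal sketch of this mechanism in a throwaway repository. The ignore patterns and file names below are illustrative choices, not the contents of any course repository's .gitignore.

```bash
set -e
cd "$(mktemp -d)"                           # throwaway directory, not the course repo
git init -q

# Hypothetical ignore rules: object files, Python bytecode caches, editor backups
printf '%s\n' '*.o' '__pycache__/' '*~' > .gitignore

# A mix of files that should and should not be tracked
touch main.o notes.txt~ analysis.py
mkdir __pycache__
touch __pycache__/analysis.cpython-38.pyc

status=$(git status --porcelain)            # machine-readable status
echo "$status"                              # only .gitignore and analysis.py show up as untracked

git check-ignore -v main.o                  # reports which .gitignore rule matched main.o
```

The .gitignore file itself is normally committed so that everyone working on the repository shares the same rules.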
Always make a branch for new changes.
While it's tempting to work on new code directly in the master branch, it's usually a good idea to create a new one instead, especially for team-based projects. The major advantage to this practice is that it keeps logically disparate change sets separate. This means that if two people are working on improvements in two different branches, when they merge, the actual workflow is reflected in the git history. Plus, explicitly creating branches adds some semantic meaning to your branch structure. Moreover, there is very little difference in how you use git.
Write good commit messages.
I cannot overstate the importance of this.
Seriously. Write good commit messages.