Identifying breaking commits
Overview
Teaching: 30 min
Exercises: 10 min
Questions
How can I use git to track down problems in code?
Objectives
Learn to identify when and in what commit problems were introduced
Episode setup
First we need to pull down some code from a remote repository. We need an example with some broken code, which can be found in the broken branch of our example repository.
$ cd ~/Desktop
and clone the code
$ git clone git@github.com:NOC-OI/intermediate-git-test-repo.git
Then change into the fresh repository and switch to the broken branch.
$ cd intermediate-git-test-repo
$ git switch broken
Tracking down a broken commit
Let’s take a look at the contents of this repository
$ ls
We see a small number of files; let’s have a look inside plot_buoys.py.
$ nano plot_buoys.py
Let’s try to run the code
$ python plot_buoys.py
This clearly has a problem, as expected. Let’s look at the log history to see if we can spot it.
$ git log --oneline
If we looked at this for a while, we could probably spot the commit that might be causing the issue: the commit labelled “changing function to plot_data”. In reality, however, finding the problem wouldn’t be this simple. In general, we might not know what file the problem is in, or where in that file. We may have hundreds of files with hundreds of lines each, and no idea where to start looking. Let’s start by looking at the initial commit.
$ git checkout 2890
And see if the plot_buoys.py script runs here.
$ python plot_buoys.py
The file runs with no problems at this earlier commit, so somewhere since this commit something went wrong. In this section, we will explore ways in which we can investigate the sources of errors.
Let’s move back to the head of the broken branch.
$ git checkout broken
Tracking down broken commits with git blame
If we know where the problem is in the file, we might ask ourselves what (and who) introduced this problem. Which commit introduced this line? Let’s try this with
$ git blame plot_buoys.py
We see that most lines were created in the same two commits, but some were modified in other commits. There are a lot of lines here, so let’s focus on lines 57 to 61 (the part not in a function)
$ git blame -L 57,61 plot_buoys.py
That’s better. Let’s take a closer look at the commit on line 61.
$ git show 4445
That’s interesting. We have found a change to that line, but not the one which altered the function name. Let’s try going back a bit in the history with git checkout and do this again.
$ git checkout HEAD~1
$ git blame -L 57,61 plot_buoys.py
This still hasn’t found the commit which renamed the function, so let’s try going back further.
$ git checkout HEAD~1
$ git blame -L 57,61 plot_buoys.py
We can see that the problematic line was brought in during commit eecf.
Multiple commits after something breaks can make git blame a little harder to use.
Challenge: Using git blame across files
We can ask git blame to attempt to track changes across files, for example where code is copied and pasted from one file to another or where files are renamed using git mv. We can do this by specifying the -C option to git blame. Use git blame -C to identify which lines of plot_buoys.py came from another file. Then use git show or git checkout to examine the contents of this file.
Solution
$ git blame -C plot_buoys.py
This came from description.txt in commit 73592708. We can examine this commit with:
$ git show 7359
or
$ git checkout 7359
$ cat description.txt
$ git checkout broken  # get back to the head of the branch
Binary searching with Git
We could check out each commit one at a time and test each one, but this is very time consuming. We’d have to step through the history like this
$ git checkout HEAD~7
$ git checkout HEAD~6
$ git checkout HEAD~5
...
$ git checkout HEAD~3
$ git checkout HEAD~2
$ git checkout HEAD~1
We can do better than this if we choose a halfway point between the bad and good commit, check if that is good or bad, and keep choosing a halfway point until we find the commit that causes the code to go from good to bad. Git can actually help us do this with the git bisect command. Let’s try it. First, let’s make sure we have reset HEAD to the most recent commit on the broken branch.
$ git checkout broken
$ git bisect start
We mark the current commit as bad
$ git bisect bad HEAD
Then we can mark the commit from the merge as good
$ git bisect good 116c
Git will now drop us at a commit half way between the good and the bad commits, which should be commit 2890. We can verify this with
$ git log --oneline broken
We see some commits marked as bad and good, and git has placed us in the middle commit. Now we can test this commit
$ python plot_buoys.py
It works! The code wasn’t broken at this point. Let’s mark this commit as good
$ git bisect good
Great, git has moved us again. Let’s check where we are this time, it should be commit d022
$ git log --oneline broken
The markers for good and bad have moved, because we’ve given bisect more information, and HEAD has been placed between them.
$ python plot_buoys.py
This failed, let’s mark this as a bad commit
$ git bisect bad
We found a bad commit, let’s take a look at where we are now:
$ git log --oneline broken
Git has marked the good and bad commits, but it doesn’t know yet if the previous commit might have been the first bad one. It needs us to check that. Let’s go ahead and do that
$ python plot_buoys.py
This is also a bad commit, let’s mark it
$ git bisect bad
We’ve now only got one commit left, so Git automatically identifies the commit which broke things as eecf. Had we marked our good commit one commit earlier, we could have used git bisect good when we came across the first good commit.
Finally, git has found the commit we were looking for and told us where it is. Let’s see where we are
$ git log --oneline broken
Git has marked the relevant commits as bad, but it hasn’t moved us to the first bad commit. It left us in this pending state. Let’s take a look at the content of the breaking commit
$ git show eecf
Git is telling us that the problem was introduced by a change that happened on line 55 of plot_buoys.py, where plot_buoy_data() was changed to plot_data(). For us, this was probably a problem that is easy enough to resolve without using bisect, but for a large complex code base when we don’t know where to start, bisect can quickly point us to the change which first caused the problem. Let’s exit the bisect state and return to the commit we had checked out before starting (the head of the broken branch) with
$ git bisect reset
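The halving strategy we just followed by hand can be sketched in a few lines of Python. This is only an illustration of the idea, not Git’s actual implementation: it assumes a simple linear history, and the commit labels (c0 to c8), the first_bad function and the position of the break are all made up for the example.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return the first bad commit and how many commits had to be tested.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is bad.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2  # halfway between the known-good and known-bad commits
        tests += 1
        if is_bad(commits[mid]):
            bad = mid   # the break happened at or before mid
        else:
            good = mid  # the break happened after mid
    return commits[bad], tests


# Hypothetical history: placeholder labels, not real commit hashes.
history = [f"c{i}" for i in range(9)]
broken_from = 6  # pretend the bug was introduced in c6

found, tests = first_bad(history, lambda c: int(c[1:]) >= broken_from)
print(found, tests)
```

With nine commits, only three needed testing. Each test halves the remaining range, so the number of tests grows with the logarithm of the number of commits rather than with the number itself.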
This worked great, and we can go through large numbers of commits with this technique, but there was a lot of typing. Can Git do a better job? It turns out that it can. Let’s look at the return value from Python
$ python plot_buoys.py
$ echo $?
The variable $? is a special shell variable containing the exit status of the most recently run command. In this case it is non-zero, indicating an error. Let’s look at the historic commit
$ git log --oneline
$ git checkout 2890
And test the code
$ python plot_buoys.py
$ echo $?
In this case the script returns 0, indicating success. This is a common convention in Unix scripts, and you can write your own scripts that follow this convention. Git can use this convention to decide if a commit is good or bad. Let’s try it
$ git bisect start broken 2890
Once again, git drops us at a commit in the middle of the range. This time, instead of running python plot_buoys.py ourselves, we tell Git to run it for us
$ git bisect run python plot_buoys.py
Git does all the boring work for us. Every time it runs the command we gave and gets a zero exit status, it marks the commit as good; every time it sees a non-zero value, it marks the commit as bad. It then tells us the first commit it finds which changes the state of the repository from “good” to “bad”. Now that we’re done, we exit again with
$ git bisect reset
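The exit status convention that git bisect run relies on can also be observed from Python itself. The sketch below is illustrative only: the exit_status helper and the one-line snippets are invented for this example and stand in for a real script such as plot_buoys.py.

```python
import subprocess
import sys


def exit_status(code: str) -> int:
    """Run a Python snippet in a child process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


ok = exit_status("pass")                          # finishes normally
crashed = exit_status("plot_data()")              # NameError: uncaught exception
explicit = exit_status("import sys; sys.exit(3)") # script chooses its own status

print(ok, crashed, explicit)
```

A script that finishes normally exits with 0, an uncaught exception makes the interpreter exit with 1, and sys.exit lets a script pick its own status. These are the same values the shell reports in $?.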
One caveat
This is a very powerful debugging tool, but it relies on all your code being in a runnable state, such that Git can automatically identify when this state changes. It works best when used with a branching and merging strategy, to ensure there are no breaking commits on the main branch.
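One way to soften this caveat is to use the exit code 125, which git bisect run treats as “this commit cannot be tested, skip it”. The sketch below shows how a small check script might map outcomes to exit codes. The verdict function and the throwaway files are assumptions made for this example, and deciding what counts as “untestable” is up to you; here it is simply that the script does not exist.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Exit codes with a documented meaning for `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def verdict(script: Path) -> int:
    """Translate the outcome of running `script` into a bisect exit code."""
    if not script.exists():
        # The commit cannot be tested at all, so ask bisect to skip it.
        return SKIP
    result = subprocess.run(
        [sys.executable, str(script)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return GOOD if result.returncode == 0 else BAD


# Demonstration on throwaway files (hypothetical stand-ins for plot_buoys.py).
with tempfile.TemporaryDirectory() as tmp:
    working = Path(tmp) / "working.py"
    working.write_text("print('plotted')\n")
    failing = Path(tmp) / "failing.py"
    failing.write_text("plot_data()\n")  # raises NameError when run
    missing = Path(tmp) / "missing.py"   # deliberately never created
    results = (verdict(working), verdict(failing), verdict(missing))

print(results)
```

In a real check script (say, a hypothetical bisect_check.py left untracked so that checkouts do not change it), the demonstration would be replaced by sys.exit(verdict(Path("plot_buoys.py"))), and the script would be run with git bisect run python bisect_check.py.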
Key Points
git blame can identify when a problem line was introduced.
git bisect can be used to binary search through Git history to identify the commit which first introduced a problem.