Introduction
Overview
Teaching: 10 min
Exercises: 0 min
Questions
What do I do when I need to make complex decisions with my git repository?
How do I collaborate on a software project with others?
Objectives
Understand the range of functionality that exists in git.
Understand the different challenges that arise with collaborative projects.
Git Refresher
Git is a version control system for tracking changes in computer files and coordinating work on those files among multiple people. It is primarily used for source code management in software development but it can be used to track changes in files in general - it is particularly effective for tracking text-based files (e.g. source code files, CSV, Markdown, HTML, CSS, Tex, etc. files).
Git has several important characteristics:
- support for non-linear development allowing you and your colleagues to work on different parts of a project concurrently,
- support for distributed development allowing for multiple people to be working on the same project (even the same file) at the same time,
- every change recorded by Git remains part of the project history and can be retrieved at a later date, so even if you make a mistake you can revert to a point before it.
It uses a distributed version control model (the “beautiful graph theory tree model”), meaning that there is no single central repository of code. Instead, users share code back and forth to synchronise their repositories, and it is up to each project to define processes and procedures for managing the flow of changes into a stable software product.
Git is powerful and flexible enough to fit a wide range of use cases and workflows, from simple projects written by a single contributor to projects that are millions of lines long and have hundreds of co-authors. Furthermore, the task it performs is quite complex. As a result, many users find it challenging to navigate this complexity. While committing and sharing changes is fairly straightforward, recovering from situations such as accidental commits, pushes or bad merges is difficult without a solid understanding of the rather large and complex conceptual model. Case in point, three of the top five highest voted questions on Stack Overflow are questions about how to carry out relatively simple tasks: undoing the last commit, changing the last commit message, and deleting a remote branch.
Mouse-over text: If that doesn’t fix it, git.txt contains the phone number of a friend of mine who understands git. Just wait through a few minutes of ‘It’s really pretty simple, just think of branches as…’ and eventually you’ll learn the commands that will fix everything.
With this lesson our goal is to give you a more in-depth understanding of the conceptual model of git, to guide you through increasingly complex workflows and to give you the confidence to participate in larger projects.
The diagram below shows a typical software development lifecycle with Git (in our case starting from making changes in a local branch that “tracks” a remote branch) and the commonly used commands to interact with different parts of the Git infrastructure, including:
- working tree - a local directory (including any subdirectories) where your project files live and where you are currently working. It is also known as the “untracked” area of Git or “working directory”. Any changes to files will be marked by Git in the working tree. If you make changes to the working tree and do not explicitly tell Git to save them - you will likely lose those changes. Using the git add filename command, you tell Git to start tracking changes to the file filename within your working tree.
- staging area (index) - once you tell Git to start tracking changes to files (with the git add filename command), Git saves those changes in the staging area on your local machine. Each subsequent change to the same file needs to be followed by another git add filename command to tell Git to update it in the staging area. To see what is in your working tree and staging area at any moment (i.e. which changes Git is tracking), run the command git status.
- local repository - stored within the .git directory of your project locally, this is where Git wraps together all your changes from the staging area and records them when you run the git commit command. Each commit is a new, permanent snapshot (checkpoint, record) of your project in time, which you can share or revert to.
- remote repository - this is a version of your project that is hosted somewhere on the Internet (e.g., on GitHub, GitLab or somewhere else). While your project is nicely version-controlled in your local repository, and you have snapshots of its versions from the past, if your machine crashes - you still may lose all your work. Furthermore, you cannot share or collaborate on this local work with others easily. Working with a remote repository involves pushing your local changes remotely (using git push) and pulling other people’s changes from a remote repository to your local copy (using git fetch or git pull) to keep the two in sync in order to collaborate (with a bonus that your work also gets backed up to another machine). Note that a common best practice when collaborating with others on a shared repository is to always do a git pull before a git push, to ensure you have any latest changes before you push your own.
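The first three areas described above can be watched in action with a small sketch. It assumes a throwaway directory created with mktemp, and the file name notes.txt and the identity "Demo User" are placeholders that do not come from the lesson.

```shell
# Work in a throwaway directory so no real project is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"          # placeholder identity, stored in this repository only
git config user.email "demo@example.com"

echo "first line" > notes.txt             # the change exists only in the working tree
git status --short                        # "?? notes.txt": Git is not tracking it yet

git add notes.txt                         # copy the change into the staging area
git status --short                        # "A  notes.txt": staged but not committed

git commit -q -m "Add notes"              # record the staged snapshot in the local repository
git log --oneline                         # the history now holds one commit
```

Nothing here reaches a remote repository; that step would need git push and a configured remote.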
Git Version Control Tool
To test your Git installation, type:
$ git help
If your Git installation is working you should see something like:
usage: git [-v | --version] [-h | --help] [-C <path>] [-c <name>=<value>]
[--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
[-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]
[--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
[--config-env=<name>=<envvar>] <command> [<args>]
These are common Git commands used in various situations:
start a working area (see also: git help tutorial)
clone Clone a repository into a new directory
init Create an empty Git repository or reinitialize an existing one
work on the current change (see also: git help everyday)
add Add file contents to the index
mv Move or rename a file, a directory, or a symlink
restore Restore working tree files
rm Remove files from the working tree and from the index
examine the history and state (see also: git help revisions)
bisect Use binary search to find the commit that introduced a bug
diff Show changes between commits, commit and working tree, etc
grep Print lines matching a pattern
log Show commit logs
show Show various types of objects
status Show the working tree status
grow, mark and tweak your common history
branch List, create, or delete branches
commit Record changes to the repository
merge Join two or more development histories together
rebase Reapply commits on top of another base tip
reset Reset current HEAD to the specified state
switch Switch branches
tag Create, list, delete or verify a tag object signed with GPG
collaborate (see also: git help workflows)
fetch Download objects and refs from another repository
pull Fetch from and integrate with another repository or a local branch
push Update remote refs along with associated objects
'git help -a' and 'git help -g' list available subcommands and some
concept guides. See 'git help <command>' or 'git help <concept>'
to read about a specific subcommand or concept.
See 'git help git' for an overview of the system.
When you use Git on a machine for the first time, you need to configure a few things:
- your name,
- your email address (the one you used to open your GitHub account with, which will be used to uniquely identify your commits),
- preferred text editor for Git to use (e.g. nano or another text editor of your choice),
- whether you want to use these settings globally (i.e. for every Git project on your machine).
This can be done from the command line as follows:
$ git config --global user.name "Your Name"
$ git config --global user.email "name@example.com"
$ git config --global core.editor "nano -w"
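To check which values Git will actually use, the settings can be read back. The sketch below is an illustration under one assumption: it stores the values at repository level in a scratch repository (no --global flag), so that running it does not alter your real global configuration.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q -b main

# Same keys as above; without --global they apply to this repository only.
git config user.name "Your Name"
git config user.email "name@example.com"
git config core.editor "nano -w"

git config --get user.name                # Your Name
git config --get core.editor              # nano -w
git config --local --list                 # everything stored in this repository's .git/config
```

Repository-level values take precedence over global ones, which is useful when one project needs a different email address from the rest.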
GitHub Account
GitHub is a free, online host for Git repositories that you will use during the course to store your code, so you will need to open a free GitHub account unless you already have one.
Secure Access To GitHub Using Git From Command Line
In order to access GitHub using Git from your machine securely, you need to set up a way of authenticating yourself with GitHub through Git. The recommended way to do that for this course is to set up SSH authentication - a method of authentication that is more secure than sending passwords over HTTPS and which requires a pair of keys - one public that you upload to your GitHub account, and one private that remains on your machine.
GitHub provides full documentation and guides on how to generate an SSH key and add it to your GitHub account.
A short summary of the commands you need to perform is shown below.
To generate an SSH key pair, you will need to run the ssh-keygen command from your command line tool/GitBash and provide your identity for the key pair (e.g. the email address you used to register with GitHub) via the -C parameter as shown below. Note that the ssh-keygen command can be run with different parameters - e.g. to select a specific public key algorithm and key length; if you do not use them, ssh-keygen falls back to a default algorithm, which is RSA in many installed versions. GitHub now recommends that you use a newer cryptographic standard (such as EdDSA variant algorithm Ed25519), so please be sure to specify it using the -t flag as shown below. It will also prompt you to answer a few questions - e.g. where to save the keys on your machine and a passphrase to use to protect your private key. Pressing ‘Enter’ on these prompts will get ssh-keygen to use the default key location (within the .ssh folder in your home directory) and set the passphrase to empty.
$ ssh-keygen -t ed25519 -C "your-github-email@example.com"
Generating public/private ed25519 key pair.
Enter file in which to save the key (/Users/<YOUR_USERNAME>/.ssh/id_ed25519):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /Users/<YOUR_USERNAME>/.ssh/id_ed25519
Your public key has been saved in /Users/<YOUR_USERNAME>/.ssh/id_ed25519.pub
The key fingerprint is:
SHA256:qjhN/iO42nnYmlpink2UTzaJpP8084yx6L2iQkVKdHk your-github-email@example.com
The key's randomart image is:
+--[ED25519 256]--+
|.. .. |
| ..o A |
|. o.. |
| .o.o . |
| ..+ = B |
| .o = .. |
|o..X *. |
|++B=@.X |
|+*XOoOo+ |
+----[SHA256]-----+
Next, you need to copy your public key (not your private key - this is important!) over to your GitHub account. The ssh-keygen command above will let you know where your public key is saved (the file should have the extension “.pub”), and you can get its contents (e.g. on a macOS system) as follows:
$ cat /Users/<YOUR_USERNAME>/.ssh/id_ed25519.pub
ssh-ed25519 AABAC3NzaC1lZDI1NTE5AAAAICWGVRsl/pZsxx85QHLwSgJWyfMB1L8RCkEvYNkP4mZC your-github-email@example.com
Copy the line of output that starts with “ssh-ed25519” and ends with your email address (it may start with a different algorithm name based on which one you used to generate the key pair and it may have gone over multiple lines if your command line window is not wide enough).
Finally, go to your GitHub Settings -> SSH and GPG keys -> Add New page to add a new SSH public key. Give your key a memorable name (e.g. the name of the computer you are working on that contains the private key counterpart), paste the public key from your clipboard into the box labelled “Key” (making sure it does not contain any line breaks), then click the “Add SSH key” button.
Now, we can check that the SSH connection is working:
$ ssh -T git@github.com
What About Passwords?
While using passwords over HTTPS for authentication is easier to set up and will allow you read access to your repository on GitHub from your machine, it alone is no longer sufficient to allow you to send changes or write to your remote repository on GitHub. This is because, on 13 August 2021, GitHub strengthened security requirements for all authenticated Git operations. This means you would need to use a personal access token instead of your password for added security each time you need to authenticate yourself to GitHub from the command line (e.g. when you want to push your local changes to your code repository on GitHub). While using an SSH key pair for authentication may seem complex, once set up, it is actually more convenient than keeping track of/caching your access token.
Key Points
Git version control records text-based differences between files.
Each git commit records a change relative to the previous state of the documents.
Git has a range of functionality that allows users to manage the changes they make.
This complex functionality is especially useful when collaborating on projects with others.
Forks
Overview
Teaching: 0 min
Exercises: 0 min
Questions
What are forks?
Objectives
Understand how forks are created.
A fork of a repository is a new repository which shares code and history with the repository it was forked from - it is a server-side copy (clone) of the original repository. This is particularly useful when you want to work on an open source project where you don’t have write permissions to the repository.
Creating a fork and a local copy of the fork
In order to try out the commands in this lesson we need to set up a repository on GitHub:
- Go to https://github.com/NOC-OI/intermediate-git-test-repo
- Click on the Fork button on the top right and follow the instructions. When this process is done, you will be directed to your copy of the repository on GitHub.
- Click the green Code button. Copy the SSH or HTTPS path to the repository. Do not download a ZIP file.
- Create a local copy. The command will be similar to this but with your user name:
git clone git@github.com:<user-name>/intermediate-git-test-repo.git
Exercise 1: Create a fork
Follow the above instructions to create a fork of the intermediate-git-test-repo. Then create a local copy of it.
We’ll discuss forking further in the Remotes and Branching Models chapters.
Key Points
A fork is a server-side copy of a repository.
A fork can be created on GitHub through the Fork button in the top right.
Branches
Overview
Teaching: 0 min
Exercises: 0 min
Questions
What are branches?
How do I view the current branches?
How do I manipulate branches?
Objectives
Understand how branches are created.
Learn the key commands to view and manipulate branches.
Branching is a feature available in most modern version control systems. Branching in other version control systems can be an expensive operation in both time and disk space. In git, branches are a part of your everyday development process. When you want to add a new feature or fix a bug, no matter how big or how small, you spawn a new branch to encapsulate your changes. This makes it harder for unstable code to get merged into the main code base, and it gives you the chance to clean up your feature’s history before merging it into the main branch.
The diagram above visualizes a repository with two isolated lines of development, one for a little feature, and one for a longer-running feature. By developing them in branches, it’s not only possible to work on both of them in parallel, but it also keeps the main branch free from questionable code.
The implementation behind Git branches is much more lightweight than other version control system models. Instead of copying files from directory to directory, Git stores a branch as a reference to a commit. In this sense, a branch represents the tip of a series of commits—it’s not a container for commits. The history for a branch is extrapolated through the commit relationships.
(https://www.atlassian.com/git/tutorials/using-branches)
What is a branch?
In git a branch is effectively a pointer to a snapshot of your changes. It’s important to understand that branches are just pointers to commits. When you create a branch, all Git needs to do is create a new pointer; it doesn’t change the repository in any other way. If you start with a repository that looks like this:
Then, you create a branch using the following command:
git branch crazy-experiment
The repository history remains unchanged. All you get is a new pointer to the current commit:
Note that this only creates the new branch. To start adding commits to it, you need to select it with git switch, and then use the standard git add and git commit commands.
A branch also means an independent line of development. Branches serve as an abstraction for the edit/stage/commit process. New commits are recorded in the history for the current branch, which results in a fork in the history of the project. However, it is really important to remember that a commit does not carry the full history of changes with it: each commit records the state of the project at that point together with a link to the commit(s) it was made from. Therefore, while we think of a branch as a sequence of commits, that sequence is reconstructed by following the links from one independent commit to the next.
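That a new branch is only a new pointer can be checked directly. The following is a minimal sketch in a scratch repository. The file data.txt and the identity are placeholders, and the branch name crazy-experiment is reused from the example above.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"          # placeholder identity for the scratch repository
git config user.email "demo@example.com"
echo "v1" > data.txt
git add data.txt
git commit -q -m "Initial commit"

git branch crazy-experiment               # creates a pointer, not a commit
git rev-parse main crazy-experiment       # prints the same commit ID twice

git switch -q crazy-experiment
echo "v2" >> data.txt
git commit -q -am "Try something"         # only crazy-experiment moves forward
git log --oneline --graph --all           # main is now one commit behind
```

After the second commit the two branch names resolve to different commits, and main is the parent of crazy-experiment.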
Branching Commands
Creating, deleting, and modifying branches is quick and easy; here’s a summary of the commands:
To list all branches:
git branch
git branch -avv
To create a new branch named <branch>, which references the same point in history as the current branch:
git branch <branch>
To create a new branch named <branch>, referencing <start-point>, which may be specified any way you like, including using a branch name or a tag name:
git branch <branch> <start-point>
To delete the branch <branch>; if the branch is not fully merged in its upstream branch or contained in the current branch, this command will fail with a warning:
git branch -d <branch>
To delete the branch <branch> irrespective of its merged status:
git branch -D <branch>
To switch to a different branch <branch>, updating the working directory to reflect the version referenced by <branch>:
git switch <branch>
To create a new branch <new> referencing <start-point>, and switch to it:
git switch -c <new> <start-point>
The special symbol "HEAD" can always be used to refer to the current branch. In fact, Git uses a file named HEAD in the .git directory to remember which branch is current:
$ cat .git/HEAD
ref: refs/heads/main
Renaming a branch can be done with the -m option:
git branch -m <old-branch-name> <new-branch-name>
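The commands in this section can be tried together in a scratch repository. This is a sketch only: the branch names draft and feature-x and the file data.txt are hypothetical, chosen to show how HEAD follows the current branch and how -d differs from -D.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"          # placeholder identity for the scratch repository
git config user.email "demo@example.com"
echo "v1" > data.txt
git add data.txt
git commit -q -m "Initial commit"

cat .git/HEAD                             # ref: refs/heads/main
git switch -q -c draft                    # create a branch and switch to it
cat .git/HEAD                             # ref: refs/heads/draft

git branch -m draft feature-x             # rename the branch we are on
git branch --show-current                 # feature-x

echo "v2" >> data.txt
git commit -q -am "Unmerged work"
git switch -q main
git branch -d feature-x || echo "refused: feature-x is not fully merged"
git branch -D feature-x                   # force the deletion
git branch --list                         # only main remains
```

The safe delete fails because feature-x holds a commit that main does not contain; the forced delete throws that commit's branch pointer away.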
Exercise 1: Create a feature branch
Create a feature branch in your local copy of intermediate-git-test-repo. Make a change to the plot_buoys.py script, such as changing the colour of the buoy symbols on line 28. Commit your change and push it to your fork on GitHub.
Solution
git branch feature-branch
git switch feature-branch
Make some edit to plot_buoys.py.
git add plot_buoys.py
git commit -m "Change colour of buoy marker"
git push origin feature-branch
Key Points
A branch represents an independent line of development.
git branch creates a new pointer to the current state of the repository and allows you to make subsequent changes from that state.
Subsequent changes are considered to belong to that branch.
The final commit on a given branch is its HEAD.
Remote Repositories
Overview
Teaching: 0 min
Exercises: 0 min
Questions
How can I work in remote teams and with remotely hosted code?
Objectives
Learn to work with multiple remotes
https://www.atlassian.com/git/tutorials/syncing
Git’s distributed collaboration model gives every developer their own copy of the repository, complete with its own local history and branch structure. Users typically need to share a series of commits rather than a single “changeset”. Instead of committing a “changeset” from a working copy to the central repository, Git lets you share entire branches between repositories.
Git remote
The git remote command lets you create, view, and delete connections to other repositories. Remote connections are more like bookmarks than direct links into other repositories. Instead of providing real-time access to another repository, they serve as convenient names that can be used to reference a not-so-convenient URL.
For example, the diagram above shows two remote connections from your repo into the central repo and another developer’s repo. Instead of referencing them by their full URLs, you can pass the origin and john shortcuts to other Git commands.
The git remote command is essentially an interface for managing a list of remote entries that are stored in the repository’s ./.git/config file. The following commands are used to view the current state of the remote list.
Git is designed to give each developer an entirely isolated development environment. This means that information is not automatically passed back and forth between repositories. Instead, developers need to manually pull upstream commits into their local repository or manually push their local commits back up to the central repository. The git remote command is really just an easier way to pass URLs to these “sharing” commands.
View Remote Configuration
To list the remote connections of your repository to other repositories you can use the git remote command:
git remote
If you test this in the local repo you’ve created, you should get only one connection, origin:
origin
When you clone a repository with git clone, git automatically creates a remote connection called origin pointing back to the cloned repository. This is useful for developers creating a local copy of a central repository, since it provides an easy way to pull upstream changes or publish local commits. This behaviour is also why most Git-based projects call their central repository origin.
We can ask git for a more verbose (-v) answer which gives us the URLs for the connections:
git remote -v
For our training repository this should return:
origin git@github.com:<user-name>/intermediate-git-test-repo.git (fetch)
origin git@github.com:<user-name>/intermediate-git-test-repo.git (push)
As expected these point to the repository you cloned.
Create and Modify Connections
The git remote command also lets you manage connections with other repositories. The following commands will modify the repo’s ./.git/config file. The result of the following commands can also be achieved by directly editing the ./.git/config file with a text editor.
Create a new connection to a remote repository. After adding a remote, you’ll be able to use <name> as a convenient shortcut for <url> in other Git commands.
git remote add <name> <url>
Remove the connection:
git remote rm <name>
Rename a connection:
git remote rename <old-name> <new-name>
To get high-level information about the remote <name>:
git remote show <name>
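The bookmark-like nature of remotes can be seen without any network access, because adding, renaming and removing a remote only edits the repository's configuration. In this sketch the origin URL and the remote name colleague are hypothetical; the john URL is the example address used elsewhere in this lesson.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q -b main

# Adding a remote does not contact the server, so placeholder URLs are fine here.
git remote add origin git@github.com:example-user/example-repo.git
git remote add john http://dev.example.com/john.git
git remote -v                             # each bookmark is listed once for fetch and once for push

git remote rename john colleague
git config --get remote.colleague.url     # http://dev.example.com/john.git, read from .git/config

git remote remove colleague               # "rm" is accepted as a shorter spelling
git remote                                # origin
```

Commands that actually talk to the other repository, such as git fetch, are left out because the placeholder URLs do not point at real repositories.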
Exercise 1: Add a connection to your neighbour’s repository.
Having this kind of access to individual developers’ repositories makes it possible to collaborate outside of the central repository. This can be very useful for small teams working on a large project.
Solution
git remote add john http://dev.example.com/john.git
Multiple remotes
Remember that when you create a new branch without specifying a starting point, then the starting point will be the current state and branch. In order to avoid confusion, ALWAYS branch from the stable version. Here is how you would branch from your own origin/main branch:
git fetch origin main
git branch <branch> origin/main
git switch <branch>
You must fetch first so that you have the most recent state of the repository.
If there is another “true” version/state of the project, then this connection may be set as upstream (or something else). Upstream is a common name for the stable repository. In that case the sequence will be:
git fetch upstream main
git branch <branch> upstream/main
git switch <branch>
Now we can set the NOC-OI version of our repository as the upstream for our local copy.
Exercise 2: Setting upstream and checking out remote branches.
Set the https://github.com/NOC-OI/intermediate-git-test-repo repo as the upstream locally. Then check which branches exist upstream and create a local branch called develop whose starting point is the upstream branch develop.
Solution
git remote add upstream git@github.com:NOC-OI/intermediate-git-test-repo.git
git fetch upstream
git branch -vv -a
git branch develop upstream/develop
git switch develop
Now examine the state of your repository with git branch, git branch -vva (to see all the branches), git remote -v and git remote show upstream.
We can see that we are now on a local branch develop, which is configured to track the develop branch in upstream. Running git push and git pull in this branch will automatically push to and pull from the upstream branch. We can verify this with:
$ git pull -v
This branch has a small commit which is not in your origin remote.
Exercise 3: Pushing to origin.
Push these changes we’ve pulled down to your own remote repository (origin).
Solution
$ git push origin develop
We can configure as many remotes as we like. If you work closely with friends or colleagues, it could be common for you to want to pull interesting changes from their remotes, incorporate those into your current branches, and push those changes to your remote.
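The whole round trip (fetch a branch from one remote, track it locally, publish it to another remote) can be rehearsed offline. The sketch below is an illustration under stated assumptions: two local bare repositories named upstream.git and origin.git stand in for the shared project and your fork on GitHub, and the seed directory, file data.txt and identity are placeholders.

```shell
work=$(mktemp -d)
cd "$work"

# Build a tiny project, then make two bare copies that stand in for the hosted repositories.
git init -q -b main seed
cd seed
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "v1" > data.txt
git add data.txt
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed upstream.git     # plays the role of the shared project
git clone -q --bare seed origin.git       # plays the role of your fork

# Upstream gains a develop branch after the fork was made.
cd seed
git switch -q -c develop
echo "v2" >> data.txt
git commit -q -am "Work on develop"
git push -q ../upstream.git develop
cd ..

# Your local copy: cloned from the fork, with upstream added as a second remote.
git clone -q origin.git local
cd local
git remote add upstream ../upstream.git
git fetch -q upstream
git branch develop upstream/develop       # the start point is a remote branch, so tracking is set up
git switch -q develop
git rev-parse --abbrev-ref 'develop@{upstream}'   # upstream/develop

git push -q origin develop                # publish the branch to the fork
git branch -r                             # origin/develop now appears next to upstream/develop
```

With real hosted repositories only the URLs change; the remote names and the order of commands stay the same.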
Key Points
The git remote command allows us to create, view and delete connections to other repositories.
Remote connections are like bookmarks to other repositories.
Other git commands (git fetch, git push, git pull) use these bookmarks to carry out their syncing responsibilities.
Branching Models
Overview
Teaching: 0 min
Exercises: 0 min
Questions
What is a branching model?
Why do you need one?
What are the most common branching models?
Objectives
Learn about the importance of a branching model.
What is a branching model/strategy?
Branches are primarily used as a means for teams to develop features, giving them a separate workspace for their code. These branches are usually merged back to a main branch upon completion of work. (You may come across the term ‘master’, rather than ‘main’. This used to be the usual term, but is now gradually becoming less common.) In this way, features (and any bugs and bug fixes) are kept apart from each other, allowing you to fix mistakes more easily.
This means that branches protect the mainline of code and any changes made to any given branch don’t affect other developers.
A branching strategy, therefore, is the strategy that software development teams adopt when writing, merging and deploying code when using a version control system.
It is essentially a set of rules that developers can follow to stipulate how they interact with a shared codebase.
Such a strategy is necessary as it helps keep repositories organized, avoiding errors in the application and the dreaded “merge hell” that arises when multiple developers work simultaneously and all add their changes at the same time. Left unmanaged, such merge conflicts make it increasingly hard to combine contributions from multiple developers.
Thus, adhering to a branching strategy will help solve this issue so that developers can work together without stepping on each other’s toes. In other words, it enables teams to work in parallel to achieve faster releases and fewer conflicts by creating a clear process when making changes to source control.
When we talk about branches, we are referring to independent lines of code that branch off the main branch, allowing developers to work independently before merging their changes back to the code base.
In this and the following episodes, we will outline some of the branching strategies that teams use to organize their workflow. We will look at their pros and cons and at which strategy you should choose based on your needs, objectives and your team’s capabilities.
Why do you need a branching model?
As mentioned above, having a branching model is necessary to avoid conflicts when merging and to allow for the easier integration of changes into the main trunk.
A BRANCHING MODEL AIMS TO:
- Enhance productivity by ensuring proper coordination among developers
- Enable parallel development
- Help organize a series of planned, structured releases
- Map a clear path when making changes to software through to production
- Maintain bug-free code where developers can quickly fix issues and get these changes back to production without disrupting the development workflow
Git Branching Models
Some version control systems are Very Opinionated about the branching models that can be used. git is very much (fortunately or unfortunately) not. This means that there are many different ways to do development in a team and the team needs to explicitly agree on how and when to merge contributions to the main branch. So the first rule of git branching is: “Talk about your branching model.” The second rule is: “Talk about your branching model.” If in doubt, do what other people around you are doing. If they don’t do anything, call a friend.
That said, there are a number of established (and less so) branching models that are used with git. These include, but are not limited to:
- Centralized workflow: enables all team members to make changes directly to the main branch. Every change is logged into the history. In this workflow, the contributors do not use other branches. Instead they all make changes on the main branch directly and commit to it. This works for individual developers or small teams which communicate very well, but can be tricky for larger teams: the code is in a constant state of flux and developers keep changes local until they are ready to release.
- Trunk-based development: is somewhat similar to the centralized workflow. The development happens on a single branch called trunk. When changes need to be merged, each developer pulls and rebases from the trunk branch and resolves conflicts locally. This can work if small merges are made frequently and is more successful if there is CI/CD.
- Feature branch workflow: every small change or “feature” gets its own branch where the developers make changes. Once the feature is done, they submit a merge/pull request and merge it into the main branch. Feature branches should be relatively short-lived. The benefit of this model is that the main branch is not polluted by unfinished features. Good for teams.
- Gitflow: is a model where the main development happens in a develop branch with feature branches. When the develop branch is ready for a release (or to go into production), a team member creates a release branch which is tested and eventually merged into the develop branch and then the main branch.
- GitHub flow: similar to the feature branch workflow. (Further info)
- GitLab flow: is a simplified version of Gitflow. (Further info)
- Oneflow: is similar to Gitflow but relies on the maintenance of one long-lived branch. It is meant to be simpler, without a develop branch but feature branches still exist. (Further info)
- Forking workflow (e.g. astropy): is a model where each contributor creates a fork or a complete copy of the repository. Every contributor effectively has two repositories: their own and the main (upstream) one. Changes are made as pull requests against the main repository. This model is popular with open source projects because the vast majority of contributors do not need to have privileges in the main repository.
A longer description of some of these can be found here.
In summary, there are many different ways to collaborate on a project. Look at the pros and cons and select one that fits the needs and organization of your team and project. In the following several sections we look at some of these models in more detail.
Feature Branch Workflow
While it is technically OK to commit your changes directly to main
branch,
and you may often find yourself doing so for some minor changes,
the best practice is to use a new branch for each separate and self-contained unit/piece of work
you want to add to the project.
This unit of work is also often called a feature
and the branch where you develop it is called a feature branch.
Each feature branch should have its own meaningful name -
indicating its purpose (e.g. “issue23-fix”).
If we keep making changes and pushing them directly to the main
branch on GitHub,
then anyone who downloads our software from there will get all of our work in progress -
whether or not it is ready to use!
So, working on a separate branch for each feature you are adding is good for several reasons:
- it enables the main branch to remain stable while you and the team explore and test the new code on a feature branch,
- it enables you to keep the untested and not-yet-functional feature branch code under version control and backed up,
- you and other team members may work on several features at the same time independently from one another, and
- if you decide that the feature is not working or is no longer needed - you can easily and safely discard that branch without affecting the rest of the code.
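The points above can be tried out in a throwaway repository. The following is a minimal sketch, not a prescribed procedure: the file name app.txt and the commit messages are made up for illustration, and the branch name is the example name used above.

```shell
set -e
repo=$(mktemp -d)                        # throwaway directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main" on any Git version
git config user.name "Example User"      # placeholder identity, local to this repository
git config user.email "user@example.com"

echo "stable code" > app.txt             # app.txt is a hypothetical file
git add app.txt
git commit -q -m "Add initial code"

# Start a feature branch named after its purpose
git checkout -q -b issue23-fix
echo "work in progress" >> app.txt
git commit -q -am "Start fixing issue 23"

# main is untouched while the feature is under development
main_commits=$(git rev-list --count main)

# Decide the feature is not needed: switch back and discard the branch
git checkout -q main
git branch -q -D issue23-fix
remaining=$(git branch --list issue23-fix)
echo "commits on main: $main_commits; leftover feature branches: '$remaining'"
```

Deleting with -D is used here because the branch was never merged; for a merged branch, git branch -d is the safer choice since it refuses to delete unmerged work.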
Gitflow Workflow
In the Gitflow workflow, we typically have a main branch which is the version of the code that is tested, stable and reliable. Then, we normally have a development branch (called develop or dev by convention) that we use for work-in-progress code.
As we work on adding new features to the code, we create new feature branches that first get merged into develop after a thorough testing process. After even more testing, the develop branch will get merged into main. The points when feature branches are merged to develop, and develop to main, depend entirely on the practice/strategy established in the team.
For example, for smaller projects (e.g. if you are working alone on a project or in a very small team), feature branches sometimes get directly merged into main upon testing, skipping the develop branch step. In other projects, the merge into main happens only at the point of making a new software release. Whichever is the case for you, a good rule of thumb is: nothing that is broken should be in main.
An example is shown in the diagram below.
An example of Gitflow workflow
Adapted from Git Tutorial by sillevl (Creative Commons Attribution 4.0 International License)
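The flow described in this section (feature branch into develop, then develop into main) can be sketched in a throwaway repository. The branch name feature/plot-legend and the file legend.txt are hypothetical, and the merge points are a team decision rather than something Git enforces.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "Initial release"
git checkout -q -b develop               # long-lived work-in-progress branch

# New work starts from develop, on its own feature branch (hypothetical name)
git checkout -q -b feature/plot-legend develop
echo "legend" > legend.txt
git add legend.txt
git commit -q -m "Add plot legend"

# After testing, merge the feature into develop
git checkout -q develop
git merge -q --no-ff --no-edit feature/plot-legend

# At release time, merge develop into main
git checkout -q main
git merge -q --no-ff --no-edit develop

git --no-pager log --graph --oneline --decorate
main_has_feature=$(git ls-files legend.txt)
```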
Forking Workflow
The forking workflow is popular among open source software projects and often used in conjunction with a branching model.
The focus of this workflow is to keep the “upstream main” stable while allowing anyone to work on their own contributions independently. Contributions are then suggested and accepted via pull requests. There is not necessarily a develop branch, but you may have release branches.
Source: GitHub
In order to understand the forking workflow, let’s first take a look at some special words and roles needed (we’ve already talked about some of these today!):
upstream - Remote repository containing the “true copy”
origin - Remote repository containing the forked copy
Pull request (PR) - Merge request from fork to upstream (a request to add your suggestions to the “original copy”)
Maintainer - Someone with write access to upstream who vets PRs
Contributor - Someone who contributes to upstream via PRs
Release manager - A maintainer who also oversees releases
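The relationship between upstream and origin can be simulated locally. This is only a sketch under stated assumptions: bare repositories on disk stand in for the project and the fork on GitHub, the branch name fix-typo is made up, and the pull request itself (which happens on the hosting platform) is not shown.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# "upstream": the true copy (a bare repository standing in for the hosted project)
git init -q --bare upstream.git
git --git-dir=upstream.git symbolic-ref HEAD refs/heads/main

# Seed upstream with one commit from a temporary repository
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
git commit -q --allow-empty -m "Initial commit"
git push -q "$work/upstream.git" main
cd "$work"

# "Forking" makes a server-side copy; a bare clone stands in for it here
git clone -q --bare "$work/upstream.git" "$work/fork.git"

# The contributor clones their fork, which becomes "origin"
git clone -q "$work/fork.git" "$work/local"
cd "$work/local"
git config user.name "Contributor"
git config user.email "contributor@example.com"
git remote add upstream "$work/upstream.git"
git fetch -q upstream

# Work on a branch that starts from upstream's main, then push it to the fork
git checkout -q -b fix-typo upstream/main
git commit -q --allow-empty -m "Fix typo"
git push -q origin fix-typo

remotes=$(git remote | sort | tr '\n' ' ')
pushed=$(git ls-remote --heads origin fix-typo | cut -f2)
in_upstream=$(git ls-remote --heads upstream fix-typo | cut -f2)
echo "remotes: $remotes"
echo "branch on fork: '$pushed', on upstream: '$in_upstream'"
```

The branch exists on the fork but not on upstream; getting it into upstream is what the pull request is for.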
Here is some info about workflows used in a couple of projects as real life examples:
Example release workflow for the astropy Python package
Spacetelescope (STScI) style guide for release workflow
Exercise 1: Suggest your changes via pull request
Earlier in this workshop, you pushed a feature branch up to origin in which you had made a small change to plot_buoys.py. Go to your repository (your fork) on GitHub and find the tab called “Pull requests”. Click the green “new pull request” button. Then find and click the blue link under “Compare changes” called “compare across fork”. Select your username and branch name from the right menus. Then click the big green button under the menus called “create pull request”.
Key Points
A branching model is a pre-agreed way of merging branches into the main branch.
A branching model is needed when multiple contributors are making changes to a single project.
Merging
Overview
Teaching: 0 min
Exercises: 0 minQuestions
How do I merge a branch's changes?
Objectives
Learn about
git merge
.
When you are collaborating, you will have to merge branches, whether or not your branch has diverged from the main branch. Most Git hosting platforms, like GitHub or GitLab, allow you to merge a branch from their web interface, but you can also merge branches from your machine using git merge.
There are 2 ways to merge:
-
non-fast-forward merge (recommended)
-
fast-forward merge
Reminder: when starting work on a new feature, be careful where you branch from!
git remote add upstream https://github.com/mpi-astronomy/advanced-git-training.git
git fetch upstream
git checkout -b develop upstream/develop
Non-fast-forward Merge
Merges a branch by creating a merge commit. Git prompts for a merge commit message unless you supply one with -m. Ideal for merging two branches.
git checkout main
git merge --no-ff <branch> -m "Message"
The --no-ff
flag causes the merge to always create a new commit object, even if the merge could be performed with a fast-forward. This avoids losing information about the historical existence of a feature branch and groups together all commits that together added the feature.
Exercise: Creating a non-fast-forward merge.
Create a new Git repository that has the following tree.
*   69fac81 (main) Merge branch 'gitignore'
|\
| * 5537012 (gitignore) Add .gitignore
|/
* 6ec7c0f Add README
Solution
git init
touch README.md
git add README.md
git commit -m 'Add README'
git checkout -b gitignore
touch .gitignore
git add .gitignore
git commit -m "Add .gitignore"
git checkout main
git merge --no-ff gitignore
Fast-forward Merge
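If main has not diverged from the branch being merged (that is, main has no commits that the branch lacks), a “fast-forward” merge can be executed with the commands below. This will NOT create a merge commit! With --ff-only, Git aborts the merge if it cannot be fast-forwarded. Ideal for updating a branch from remote.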
git checkout main
git merge --ff-only <branch>
If using the fast-forward merge, it is impossible to see from the git history which of the commit objects together have implemented a feature. You would have to manually read all the log messages. Reverting a whole feature (i.e. a group of commits) is a true headache after a fast-forward merge, whereas it is easily done if the --no-ff flag was used.
For a good illustration of fast-forward merge (and other concepts), see this thread.
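To make the point about reverting a whole feature concrete, here is a small sketch in a throwaway repository. The file names and commit messages are invented; the only Git feature relied on is git revert -m 1, which reverts a merge commit relative to its first parent.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > README.md
git add README.md
git commit -q -m "Add README"

# A feature made of two commits
git checkout -q -b feature
echo one > one.txt
git add one.txt
git commit -q -m "Feature part 1"
echo two > two.txt
git add two.txt
git commit -q -m "Feature part 2"

# Merge with --no-ff so the feature is grouped under one merge commit
git checkout -q main
git merge -q --no-ff --no-edit feature

# One revert of the merge commit undoes both feature commits.
# -m 1 tells Git to treat the first parent (main before the merge) as the mainline.
git revert --no-edit -m 1 HEAD > /dev/null
files=$(git ls-files | tr '\n' ' ')
echo "tracked files after revert: $files"
```

Had the feature been fast-forwarded, the same undo would have required identifying and reverting each of its commits individually.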
Exercise: Creating a fast-forward merge.
Consider the following Git tree
* a78b99f (main) Add title
| * 3d88062 (remote) Add .gitignore
|/
* 86c4247 Add README
Is it possible to run a fast-forward merge to incorporate the branch remote into main?
Solution
It is not possible to run a fast-forward merge because of commit a78b99f: main has moved on since remote branched off, so the two branches have diverged.
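The situation in this exercise can be reproduced to see how Git reacts. This is a sketch in a throwaway repository; the commit hashes will differ from those shown in the tree above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

touch README.md
git add README.md
git commit -q -m "Add README"

# The branch called "remote" adds one commit
git checkout -q -b remote
touch .gitignore
git add .gitignore
git commit -q -m "Add .gitignore"

# Meanwhile main also gains a commit, so the histories diverge
git checkout -q main
echo "# Title" >> README.md
git commit -q -am "Add title"

# --ff-only refuses to merge rather than create a merge commit
if git merge --ff-only remote 2> /dev/null; then
    result="fast-forwarded"
else
    result="refused"
fi
tip=$(git log -1 --format=%s)
echo "merge $result; main still points at: $tip"
```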
Three-way Merge
Similar to --no-ff, but there may be dragons. Forced upon you when there’s an intermediate change on the target branch since you branched.
May prompt you to resolve conflicts manually.
git merge <branch> [-s <strategy>]
See here for a zillion options (“patience”, “octopus”, etc.). But also, Git is only so smart and you are probably smarter.
See here and here for some discussion of merging strategies.
Note: there are a number of external tools that have a graphical interface to allow for merge conflict resolution. Some of these include: kdiff3 (Windows, Mac, Linux), Meld (Windows, Linux), P4Merge (Windows, Mac, Linux), opendiff (Mac), vimdiff (for Vim users), Beyond Compare, GitHub web interface. We do not endorse any of them and use at your own risk. In any case, using a graphical interface does not substitute for understanding what is happening under the hood.
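As an illustration of a three-way merge that needs manual help, the sketch below creates a conflict on purpose and resolves it on the command line. The file settings.txt and its contents are hypothetical, and keeping the feature branch's value is just one possible resolution.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

echo "colour = red" > settings.txt
git add settings.txt
git commit -q -m "Add settings"

git checkout -q -b feature
echo "colour = blue" > settings.txt
git commit -q -am "Use blue"

git checkout -q main
echo "colour = green" > settings.txt
git commit -q -am "Use green"

# Both branches changed the same line, so the merge stops with a conflict
conflicted=""
if ! git merge --no-edit feature > /dev/null 2>&1; then
    conflicted=$(git diff --name-only --diff-filter=U)   # list unmerged files
    echo "Conflict in: $conflicted"
    # Resolve by hand (here: keep the feature branch's value), then conclude the merge
    echo "colour = blue" > settings.txt
    git add settings.txt
    git commit -q --no-edit
fi
git --no-pager log --graph --oneline
```

If the conflict looks too messy to resolve, git merge --abort returns the branch to its state before the merge started.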
Resolving Conflicts
Squashing
Key Points
git merge --no-ff
is the best way to merge changes
git merge --ff-only
is a good way to pull down changes from remote
Advanced Logging
Overview
Teaching: 0 min
Exercises: 0 minQuestions
How can I find the commits I’m looking for?
Objectives
Learn to search for specific changes
The purpose of any version control system is to record changes to your
code. This gives you the power to go back into your project history to
see who contributed what, figure out where bugs were introduced, and
revert problematic changes. But, having all of this history available
is useless if you don’t know how to navigate it. That’s where the git
log
command comes in.
We’ve already encountered the git log
command for displaying
commits. But, we can alter this output by passing many different
parameters to git log
.
The advanced features of git log
can be split into two categories:
formatting how each commit is displayed, and filtering which commits
are included in the output. Together, these two skills give us the
power to go back into your project and find any information that you
could possibly need.
Let’s grab a repository from the web to start working with
$ git clone git@github.com:sa2c/example-gitflow-modified.git ~/example-gitflow-modified
Formatting Log Output
First, let’s take a look at the many ways in which git log
’s output
can be formatted. Most of these come in the form of flags that let you
request more or less information from git log
.
If you don’t like the default git log format, you can use git config’s aliasing functionality to create a shortcut for any of the formatting options discussed below. Please see The git config Command for how to set up an alias.
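For instance, an alias for a graph-style log could be set up as follows. This is a sketch: the alias name lg is an arbitrary choice, and the alias is stored in the throwaway repository's own configuration (add --global to git config to make it available in all your repositories).

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "Add the initial code base"
git commit -q --allow-empty -m "Add a new feature"

# Define the alias; "lg" is a hypothetical name, pick whatever you like
git config alias.lg "log --graph --oneline --decorate"

# "git lg" now expands to "git log --graph --oneline --decorate"
git --no-pager lg
alias_lines=$(git lg | grep -c .)        # one output line per commit in this linear history
```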
Oneline
The --oneline
flag condenses each commit to a single line. By
default, it displays only the commit ID and the first line of the
commit message. If we type
$ git log --oneline
We’ll get output which will look something like this:
0e25143 Merge branch 'feature'
ad8621a Fix a bug in the feature
16b36c6 Add a new feature
23ad9ad Add the initial code base
This is very useful for getting a high-level overview of your project.
Diffs
The git log
command includes many options for displaying diffs with
each commit. Two of the most common options are --stat
and -p
.
The --stat
option displays the number of insertions and deletions to
each file altered by each commit (note that modifying a line is
represented as 1 insertion and 1 deletion).
Let’s try this
$ git log --stat
This is useful when you want a brief summary of the changes introduced by each commit. For example, the following commit added 67 lines to the hello.py file and removed 38 lines:
commit f2a238924e89ca1d4947662928218a06d39068c3
Author: John <john@example.com>
Date:   Fri Jun 25 17:30:28 2014 -0500

    Add a new feature

 hello.py | 105 ++++++++++++++++++++++++-----------------
 1 file changed, 67 insertions(+), 38 deletions(-)
The number of + and - signs next to the file name shows the relative number of changes to each file altered by the commit. This gives you an idea of where the changes for each commit can be found.
If you want to see the actual changes introduced by each commit, you
can pass the -p
option to git log.
$ git log -p
This outputs the entire patch representing that commit:
commit 16b36c697eb2d24302f89aa22d9170dfe609855b
Author: Mary <mary@example.com>
Date: Fri Jun 25 17:31:57 2014 -0500
Fix a bug in the feature
diff --git a/hello.py b/hello.py
index 18ca709..c673b40 100644
--- a/hello.py
+++ b/hello.py
@@ -13,14 +13,14 @@ B
-print("Hello, World!")
+print("Hello, Git!")
For commits with a lot of changes, the resulting output can become quite long and unwieldy. More often than not, if you’re displaying a full patch, you’re probably searching for a specific change. For this, you want to use the pickaxe option.
Graphs
The --graph option draws (using text) a graph representing the branch structure of the commit history. This is commonly used in conjunction with the --oneline and --decorate options to make it easier to see which commit belongs to which branch:
$ git log --graph --oneline --decorate
For a simple repository with just 2 branches, this will produce the following:
* 0e25143 (HEAD, master) Merge branch 'feature'
|\
| * 16b36c6 Fix a bug in the new feature
| * 23ad9ad Start a new feature
* | ad8621a Fix a critical security issue
|/
* 400e4b7 Fix typos in the documentation
* 160e224 Add the initial code base
The asterisk shows which branch the commit was on, so the above graph
tells us that the 23ad9ad
and 16b36c6
commits are on a topic
branch and the rest are on the master branch.
While this is a nice option for simple repositories, you’re probably better off with a more full-featured visualization tool like gitk or Sourcetree for projects that are heavily branched.
Filtering the Commit History
Formatting how each commit gets displayed is only half the battle of
learning git log
. The other half is understanding how to navigate
the commit history. The rest of this episode introduces some of the
advanced ways to pick out specific commits in your project history
using git log. All of these can be combined with any of the formatting
options discussed above.
By Amount
The most basic filtering option for git log
is to limit the number
of commits that are displayed. When you’re only interested in the last
few commits, this saves you the trouble of viewing all the commits in
a page.
You can limit git log
’s output by including the -<n>
option. For
example, the following command will display only the 3 most recent
commits.
$ git log -3
By Date
If you’re looking for a commit from a specific time frame, you can use
the --after
or --before
flags for filtering commits by date. These
both accept a variety of date formats as a parameter. For example, the
following command only shows commits that were created after July 1st,
2014 (inclusive):
$ git log --after="2014-7-1"
You can also pass in relative references like “1 week ago” and “yesterday”:
$ git log --after="yesterday"
To search for commits that were created between two dates, you can provide both a --before and an --after date. For instance, to display all the commits added between July 1st, 2014 and July 4th, 2014, you would use the following:
$ git log --after="2014-7-1" --before="2014-7-4"
Alternatives
Note that the --since and --until flags are synonymous with --after and --before, respectively.
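Date filtering can be checked in a throwaway repository by creating commits with fixed dates. In this sketch the dates are set through the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables, the commit messages are invented, and explicit times are given in the filters so that the window boundaries are unambiguous.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

# Three empty commits with fixed author and committer dates
d="2014-06-20 12:00:00 +0000"
GIT_AUTHOR_DATE="$d" GIT_COMMITTER_DATE="$d" git commit -q --allow-empty -m "Before the window"
d="2014-07-02 12:00:00 +0000"
GIT_AUTHOR_DATE="$d" GIT_COMMITTER_DATE="$d" git commit -q --allow-empty -m "Inside the window"
d="2014-07-10 12:00:00 +0000"
GIT_AUTHOR_DATE="$d" GIT_COMMITTER_DATE="$d" git commit -q --allow-empty -m "After the window"

# Only commits between the two dates are listed (%s prints just the subject line)
in_window=$(git log --format=%s --after="2014-07-01 00:00:00" --before="2014-07-04 00:00:00")

# --since/--until are synonyms for --after/--before
with_synonyms=$(git log --format=%s --since="2014-07-01 00:00:00" --until="2014-07-04 00:00:00")
echo "$in_window"
```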
By Author
When you’re only looking for commits created by a particular user, use
the --author
flag. This accepts a regular expression, and returns
all commits whose author matches that pattern. If you know exactly who
you’re looking for, you can use a plain old string instead of a
regular expression:
$ git log --author="John"
This displays all commits whose author includes the name John. The author name doesn’t need to be an exact match; it just needs to contain the specified phrase.
You can also use regular expressions to create more complex searches. For example, the following command searches for commits by either Mary or John.
$ git log --author="John\|Mary"
Note that the author’s email is also included with the author’s name, so you can use this option to search by email, too.
If your workflow separates committers from authors, the --committer
flag operates in the same fashion.
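The difference between author and committer can be seen in a small experiment. In this sketch the commits are authored by the example people used above (via git commit --author), while the committer is whoever is configured in the throwaway repository; all names are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main" on any Git version
git config user.name "Example User"      # this identity becomes the committer
git config user.email "user@example.com"

git commit -q --allow-empty --author="John <john@example.com>" -m "Add a new feature"
git commit -q --allow-empty --author="Mary <mary@example.com>" -m "Fix a bug in the feature"
git commit -q --allow-empty --author="John <john@example.com>" -m "Update the docs"

# Match on part of the author name
by_name=$(git log --author="John" --format=%s | grep -c .)

# The pattern is also matched against the email address (%an prints the author name)
by_email=$(git log --author="mary@example.com" --format=%an)

# All three commits share the same committer
by_committer=$(git log --committer="Example User" --format=%s | grep -c .)
echo "John: $by_name commit(s); mary@example.com belongs to: $by_email; committed by Example User: $by_committer"
```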
By Message
To filter commits by their commit message, use the --grep
flag. This works just like the --author
flag discussed above, but it matches against the commit message instead of the author.
For example, if your team includes relevant issue numbers in each commit message, you can use something like the following to pull out all of the commits related to that issue:
$ git log --grep="JRA-224:"
You can also pass in the -i
parameter to git log to make it ignore case differences while pattern matching.
By File
Many times, you’re only interested in changes that happened to a particular file. To show the history related to a file, all you have to do is pass in the file path. For example, the following returns all commits that affected either the foo.py or the bar.py file:
$ git log -- foo.py bar.py
The --
parameter is used to tell git log
that subsequent arguments
are file paths and not branch names. If there’s no chance of mixing it
up with a branch, you can omit the --
.
By Content (Pickaxe)
It’s also possible to search for commits that introduce or remove a
particular line of source code. This is called a pickaxe, and it takes
the form of -S"<string>"
. For example, if you want to know when the
string Hello, World! was added to any file in the project, you would
use the following command:
$ git log -S"Hello, World!"
If you want to search using a regular expression instead of a string,
you can use the -G"<regex>"
flag instead.
This is a very powerful debugging tool, as it lets you locate all of the commits that affect a particular line of code. It can even show you when a line was copied or moved to another file.
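A short experiment shows which commits the pickaxe picks out. The sketch below uses a hypothetical hello.py; -S reports the commits in which the number of occurrences of the string changed, so both the commit that added the line and the one that removed it are listed, while an unrelated commit in between is not.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

printf 'print("Hello, World!")\n' > hello.py
git add hello.py
git commit -q -m "Add greeting"

# An unrelated change to the same file: the searched string is untouched
printf '# a comment\n' >> hello.py
git commit -q -am "Add comment"

# Replace the greeting: the searched string disappears from the file
printf 'print("Hello, Git!")\n# a comment\n' > hello.py
git commit -q -am "Change greeting"

# List the subjects of commits that added or removed the string
pickaxe=$(git log -S"Hello, World!" --format=%s)
echo "$pickaxe"
```

Adding -p to the same command would also show the patch of each matching commit.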
By Range
You can pass a range of commits to git log to show only the commits contained in that range. The range is specified in the following format, where <since>
and <until>
are commit references:
$ git log <since>..<until>
This command is particularly useful when you use branch references as the parameters. It’s a simple way to show the differences between 2 branches. Consider the following command:
$ git log master..feature
The master..feature
range contains all of the commits that are in
the feature branch, but aren’t in the master branch. In other words,
this is how far feature has progressed since it forked off of
master.
Detecting a fork in the history using ranges
Note that if you switch the order of the range (feature..master
),
you will get all of the commits in master, but not in feature. If git
log outputs commits for both versions, this tells you that your
history has diverged.
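The divergence check described above can be scripted. This sketch uses a branch called main rather than master, and counts the commits in each direction with git rev-list --count, which accepts the same range syntax as git log.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "Add the initial code base"

git checkout -q -b feature
git commit -q --allow-empty -m "Start a new feature"
git commit -q --allow-empty -m "Finish the feature"

git checkout -q main
git commit -q --allow-empty -m "Fix a critical security issue"

# Commits reachable from feature but not from main, and the other way round
only_feature=$(git rev-list --count main..feature)
only_main=$(git rev-list --count feature..main)

if [ "$only_feature" -gt 0 ] && [ "$only_main" -gt 0 ]; then
    state="diverged"
else
    state="not diverged"
fi
echo "feature is ahead by $only_feature, main is ahead by $only_main: $state"
```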
Filtering Merge Commits
By default, git log
includes merge commits in its output. But, if
your team has an always-merge policy (that is, you merge upstream
changes into topic branches instead of rebasing the topic branch onto
the upstream branch), you’ll have a lot of extraneous merge commits in
your project history.
You can prevent git log
from displaying these merge commits by
passing the --no-merges
flag:
$ git log --no-merges
On the other hand, if you’re only interested in the merge commits, you
can use the --merges
flag:
$ git log --merges
This returns all commits that have at least two parents.
Summary
You should now be fairly comfortable using git log
’s advanced
parameters to format its output and select which commits you want to
display. This gives you the power to pull out exactly what you need
from your project history.
These new skills are an important part of your Git toolkit, but remember that git log is often used in conjunction with other Git commands. Once you’ve found the commit you’re looking for, you typically pass it off to git checkout, git revert, or some other tool for manipulating your commit history. So, be sure to keep on learning about Git’s advanced features.
Key Points
We’ve been introduced to powerful logging commands
Identifying breaking commits
Overview
Teaching: 30 min
Exercises: 10 minQuestions
How can I use git to track down problems in code?
Objectives
Learn to identify when and in what commit problems were introduced
Episode setup
First we need to pull down some code from a remote repository. We will need an example with some broken code, which can be found in the broken branch of our example repository.
$ cd ~/Desktop
and clone the code
$ git clone git@github.com:NOC-OI/intermediate-git-test-repo.git
and change into the fresh repository and switch to the broken
branch.
$ cd intermediate-git-test-repo
$ git switch broken
Tracking down a broken commit
Let’s take a look at the contents of this repository
$ ls
We see a small number of files; let’s have a look inside plot_buoys.py.
$ nano plot_buoys.py
Let’s try to run the code
$ python plot_buoys.py
This clearly has a problem, as expected. Let’s look at the log history to see if we can spot it.
$ git log --oneline
If we looked at this for a while, we could probably spot the commit that might be causing the issue: the commit labelled “changing function to plot_data”. In reality, however, finding the problem wouldn’t be this simple. In general, we might not know what file the problem is in, or where in that file. We may have hundreds of files with hundreds of lines each, and no idea where to start looking. Let’s start by looking at the initial commit.
$ git checkout 2890
And see if the plot_buoys.py
script runs here.
$ python plot_buoys.py
The file runs with no problems at this earlier commit, so something went wrong somewhere since then. In this section, we will explore ways in which we can investigate the sources of errors.
Let’s move back to the head of the broken
branch.
$ git checkout broken
Tracking down broken commits with git blame
If we know where the problem is in the file, we might ask ourselves what (and who) introduced this problem. Which commit introduced this line? Let’s try this with
$ git blame plot_buoys.py
We see that most lines were created in the same two commits, but some were modified in other commits. There are a lot of lines here, let’s focus on the range of lines 57 to 61 (the part not in a function)
$ git blame -L 57,61 plot_buoys.py
That’s better. Let’s take a closer look at the commit on line 61.
$ git show 4445
That’s interesting. We have found a change to that line, but not the one which
altered the function name. Let’s try going back a bit in the history with git checkout
and do this again.
$ git checkout HEAD~1
$ git blame -L 57,61 plot_buoys.py
This still hasn’t found the commit which renamed the function, let’s try going back further.
$ git checkout HEAD~1
$ git blame -L 57,61 plot_buoys.py
We can see that the problematic line was brought in during commit eecf
.
Multiple commits after something breaks can make git blame
a little harder to use.
Challenge: Using git blame across files
We can ask git blame to attempt to track changes across files, for example where code is copied and pasted from one file to another or where files are renamed using git mv. We can do this by specifying the -C option to git blame. Use git blame -C to identify which lines of plot_buoys.py came from another file. Then use git show or git checkout to examine the contents of this file.
Solution
$ git blame -C plot_buoys.py
This came from description.txt in commit 73592708. We can examine this commit with:
$ git show 7359
or
$ git checkout 7359
$ cat description.txt
$ git checkout broken # get back to the head of the branch
Binary searching with Git
We could check out each commit one at a time and test each one, but this is very time consuming. It would look like this
$ git checkout HEAD~7
$ git checkout HEAD~6
$ git checkout HEAD~5
...
$ git checkout HEAD~3
$ git checkout HEAD~2
$ git checkout HEAD~1
We can do better than this if we choose a half way point between the
bad and good commit, check if that is good or bad, and keep choosing a
half way point until we find the commit that causes the code to go
from good to bad. Git can actually help us do this with the git
bisect
command. Let’s try it, first let’s make sure we have reset
HEAD
to the most recent commit on the broken
branch.
$ git checkout broken
$ git bisect start
We mark the current commit as bad
$ git bisect bad HEAD
Then we can mark the commit from the merge as good
$ git bisect good 116c
Git will now drop us at a commit half way between the good and the bad commits, which should be commit 2890. We can verify this with
$ git log --oneline broken
We see some commits marked as bad and good, and git has placed us in the middle commit. Now we can test this commit
$ python plot_buoys.py
It works! The code wasn’t broken at this point. Let’s mark this commit as good
$ git bisect good
Great, git has moved us again. Let’s check where we are this time, it should be commit d022
$ git log --oneline broken
The markers for good and bad have moved, because we’ve given bisect more information, and HEAD
has been placed between them.
$ python plot_buoys.py
This failed, let’s mark this as a bad commit
$ git bisect bad
We found a bad commit, let’s take a look at where we are now:
$ git log --oneline broken
Git has marked the good and bad commits, but it doesn’t know yet if the previous commit might have been the first bad one. It needs us to check that. Let’s go ahead and do that
$ python plot_buoys.py
This is also a bad commit, let’s mark it
$ git bisect bad
We’ve now only got one commit left so Git automatically identifies the commit which broke things as eecf
. Had we marked our good commit one commit earlier then
we could have used git bisect good
when we came across the first good commit.
Finally, git has found the commit we were looking for and told us where it is. Let’s see where we are
$ git log --oneline broken
Git has marked the relevant commits as bad, but it hasn’t moved us to the first bad commit. It left us in this pending state. Let’s take a look at the content of the breaking commit
$ git show eecf
Git is telling us that the problem was introduced by a change that
happened on line 55 of plot_buoys.py
where plot_buoy_data()
was changed to
plot_data()
. For us, this was probably a problem that is easy enough to
resolve without using bisect, but for a large complex code base when
we don’t know where to start, bisect can instantly point us to the
change which first caused the problem. Let’s exit the bisect state and
go back to the broken branch with
$ git bisect reset
This worked great, and we can go through large numbers of commits with this technique, but there was a lot of typing. Can Git do a better job? It turns out that it can. Let’s look at the return value from Python
$ python plot_buoys.py
$ echo $?
The variable $? is a special shell variable containing the exit status of the last command that was run. In this case it is non-zero, indicating an error. Let’s look at the historic commit
$ git log --oneline
$ git checkout 2890
And test the code
$ python plot_buoys.py
$ echo $?
In this case the script returns 0, indicating success. This is a common convention in Unix scripts, and you can write your own scripts that follow this convention. Git can use this convention to decide if a commit is good or bad. Let’s try it
$ git bisect start HEAD 2890
Once again, git drops us at a commit in the middle of the range. This time, instead of running python plot_buoys.py ourselves, we tell Git to run it for us
$ git bisect run python plot_buoys.py
Git does all the boring work for us. Every time it runs the command we gave and gets a zero return value, it marks the commit as good; every time it sees a non-zero value, it marks the commit as bad. It then tells us the first commit it finds which changes the state of the repository from “good” to “bad”. Now that we’re done, we exit again with
$ git bisect reset
One caveat
This is a very powerful debugging tool, but it relies on all your code being in a runnable state, such that Git can automatically identify when this state changes. It works best when used with a branching and merging strategy, to ensure there are no breaking commits on the main branch.
Key Points
git blame
can identify when a problem line was introduced.
git bisect
can be used to binary search through git history to identify the commit which first introduced a problem.
Overview
Teaching: min
Exercises: minQuestions
Objectives
Setup a repository for the episode
Before we can proceed we need to create a branch with some commits that we will undo in various different ways.
Exercise: Creating a branch.
- Create a new branch called hotfix. Create a new file and make 3-4 commits in that file or create 3-4 new files. Check the log to see the hash of the last commit.
Solution
git switch -c hotfix # or git checkout -b hotfix
touch a.txt
git add a.txt
git commit -m "1st git commit: 1 file"
touch b.txt
git add b.txt
git commit -m "2nd git commit: 2 file"
touch c.txt
git add c.txt
git commit -m "3rd git commit: 3 file"
git status
git log --oneline
Amending a Git Commit message
Git Revert
Reverting undoes a commit by creating a new commit. This is a safe way to undo changes, as it has no chance of re-writing the commit history. For example, the following command will figure out the changes contained in the 2nd to last commit, create a new commit undoing those changes, and tack the new commit onto the existing project.
git revert HEAD~1
ls
Note that revert only backs out the atomic changes of the ONE specific commit (by default, you can also give it a range of commits but we are not going to do that here, see the help).
git revert
does not rewrite history which is why it is the preferred way of
dealing with issues when the changes have already been pushed to a remote
repository.
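The effect of git revert HEAD~1 on a history like the one built in the setup exercise can be checked in a throwaway repository. This sketch recreates three single-file commits (on main rather than hotfix, for brevity) and uses --no-edit so that no editor opens for the revert message.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

# Three commits, each adding one file
for f in a.txt b.txt c.txt; do
    touch "$f"
    git add "$f"
    git commit -q -m "Add $f"
done

# Undo the second-to-last commit (the one that added b.txt) with a new commit
git revert --no-edit HEAD~1 > /dev/null

files=$(git ls-files | tr '\n' ' ')
commits=$(git rev-list --count HEAD)
echo "tracked files: $files"
echo "commits in history: $commits"
```

The history grows from three commits to four: the original commit is still there, followed by a commit that undoes it.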
Git Reset
Resetting is a way to move the tip of a branch to a different commit. This can be used to remove commits from the current branch. For example, the following commands move the hotfix branch backwards by one commit.
git checkout hotfix
git reset HEAD~1
The commit that was on the end of hotfix is now a dangling, or orphaned, commit. This means it will eventually be deleted when git performs a garbage collection. In other words, you’re saying that you want to throw away this commit.
By default, git reset removes the commit from the branch but leaves its changes in the working directory as uncommitted changes.
git status
git restore b.txt
git reset
is a simple way to undo changes that haven’t been shared with anyone
else. It’s your go-to command when you’ve started working on a feature and find
yourself thinking, “Oh no, what am I doing? I should just start over.”
Using git reset
on uncommitted files
In addition to moving the current branch, you can also get git reset
to alter
the staged snapshot and/or the working directory by passing it one of the following flags:
--soft – The staged snapshot and working directory are not altered in any way.
--mixed – The staged snapshot is updated to match the specified commit, but the working directory is not affected. This is the default option.
--hard – The staged snapshot and the working directory are both updated to match the specified commit.
It’s easier to think of these modes as defining the scope of a git reset operation.
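The three modes can be compared side by side by undoing the same commit three times and looking at git status --porcelain afterwards. This is a sketch in a throwaway repository; make_commit is a hypothetical helper defined only for this example, and new.txt is a made-up file.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

# Hypothetical helper: create the commit that each reset will undo
make_commit() {
    echo "change" > new.txt
    git add new.txt
    git commit -q -m "Add new.txt"
}

make_commit
git reset -q --soft HEAD~1
soft_state=$(git status --porcelain)     # the change is still staged
git reset -q --hard                      # clean up before the next round

make_commit
git reset -q --mixed HEAD~1
mixed_state=$(git status --porcelain)    # the change is only in the working directory
rm new.txt                               # clean up before the next round

make_commit
git reset -q --hard HEAD~1
hard_state=$(git status --porcelain)     # nothing is left of the change

echo "soft:  '$soft_state'"
echo "mixed: '$mixed_state'"
echo "hard:  '$hard_state'"
```

In the porcelain format, "A " in the first two columns means the file is staged as new, and "??" means it is untracked.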
To just unstage changes that have been added but not yet committed:
git status
git add c.txt
git status
git reset HEAD
git status
Use git restore instead of git reset for staged but uncommitted files
The newer git restore command (as suggested by git status) can also be used here, but you need to specify which files to unstage, and pass --staged to say that you want to unstage them.
git status
git add c.txt
git restore --staged c.txt
git status
You can add and commit the changes or restore the file.
Reset a single committed file
git reset
can also work on a single file:
Let’s first add some contents to our three text files.
echo "a" > a.txt
echo "b" > b.txt
echo "c" > c.txt
git add a.txt b.txt c.txt
git commit -m "added some file contents"
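Now if we want to restore just one of these files to its previous (empty) state, we can specify it to git reset. Note that this resets the file in the staging area only; the copy in the working directory is left unchanged.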
git reset HEAD~1 b.txt
git status
Git Checkout: A Gentle Way
We already saw that git checkout is used to move to a different branch, but it can also be used to update the state of the repository to a specific point in the project’s history.
git checkout hotfix
git checkout HEAD~2
This puts you in a detached HEAD state. AGHRRR!
Most of the time, HEAD points to a branch name. When you add a new commit, your branch reference is updated to point to it, but HEAD remains the same. When you change branches, HEAD is updated to point to the branch you’ve switched to. All of that means that, in these scenarios, HEAD is synonymous with “the last commit in the current branch.” This is the normal state, in which HEAD is attached to a branch.
The detached HEAD state is when HEAD is pointing directly to a commit instead of a branch. This is really useful because it allows you to go to a previous point in the project’s history. You can also make changes here and see how they affect the project.
echo "Welcome to the alternate timeline, Morty!" > new-file.txt
git add new-file.txt
git commit -m "Create new file"
echo "Another line" >> new-file.txt
git commit -a -m "Add a new line to the file"
git log --oneline
If we want to keep the changes we can create another branch for them.
git branch alt-history
git checkout alt-history
If we didn’t want the changes then we can discard them and recover by switching back to the hotfix branch:
git checkout hotfix
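Whether HEAD is attached or detached can be checked with git symbolic-ref, which succeeds only when HEAD points to a branch. The sketch below walks through the same steps in a throwaway repository, using main in place of hotfix; the commit messages are invented.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

for i in 1 2 3; do
    git commit -q --allow-empty -m "Commit $i"
done

# Check out an earlier commit directly: HEAD is now detached
git checkout -q HEAD~2
if git symbolic-ref -q HEAD > /dev/null; then
    state="attached"
else
    state="detached"
fi

# Commits made here are not on any branch yet
echo "Welcome to the alternate timeline" > new-file.txt
git add new-file.txt
git commit -q -m "Create new file"

# Keep the experiment by giving it a branch name, then return to main
git branch alt-history
git checkout -q main
kept=$(git log -1 --format=%s alt-history)
echo "HEAD was $state; alt-history ends with: $kept"
```

Without the git branch step, the experimental commit would be left without any branch pointing to it once main is checked out again.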
https://www.atlassian.com/git/tutorials/resetting-checking-out-and-reverting Also OMG: http://blog.kfish.org/2010/04/git-lola.html
Exercise: Undoing Changes
- Create a new branch called hotfix. Create a new file and make 3-4 commits in that file. Check the log to see the hash of the last commit.
Solution
git status
git log
- Revert the last commit that we just inserted. Check the history.
Solution
git revert <hash>
git log
- Completely throw away the last two commits [DANGER ZONE!!!]. Check the status and the log.
Solution
git reset --hard HEAD~2
git status
git log
- Undo another commit but leave it in the staging area. Check the status and log.
Solution
git reset --soft HEAD~1
git status
git log
- Wrap it up: add and commit the changes.
Solution
git add .
git commit -m "Message"
Comparing hard, mixed and soft resets
So far we’ve been using the default “mixed” option for git reset. This leaves
the working directory alone and removes the affected files from the staging area, so the changes show up as unstaged (or untracked) files.
As a reminder let’s recreate our three text files, commit them and then reset the commit to HEAD~1.
touch a.txt b.txt c.txt
git add a.txt b.txt c.txt
git commit -m "adding text files"
git reset --mixed HEAD~1
Let’s check what’s in the working directory and see what git status
says.
ls
git status
and finally let’s confirm the commit is no longer there by looking at git log --oneline
.
Now let’s repeat the process but this time we’ll do a soft reset. This should
leave the files staged when we run git status
.
git add a.txt b.txt c.txt
git commit -m "adding text files"
git reset --soft HEAD~1
git status
Finally let’s try a hard reset. This time the files will be completely deleted
from the working directory and the repository. Since the soft reset left the
files staged there’s no need to do a git add
this time.
git commit -m "adding text files"
git reset --hard HEAD~1
git status
ls
This time we see no evidence that these files ever existed; they are gone from the working directory, staging area and repository.
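The three walkthroughs above can be condensed into one sketch that runs each reset mode in turn and reports what git status shows afterwards. The helper function try_reset, the temporary directory and the demo identity are assumptions introduced for this illustration.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo
git config user.email "demo@example.com"
echo "start" > README.md
git add README.md
git commit -q -m "initial commit"

# Commit the three files, undo the commit with the given mode, and report
# the status codes git shows plus whether a.txt still exists on disk.
try_reset() {
  touch a.txt b.txt c.txt
  git add a.txt b.txt c.txt
  git commit -q -m "adding text files"
  git reset -q "--$1" HEAD~1
  codes=$(git status --short | awk '{print $1}' | sort -u)
  if [ -e a.txt ]; then on_disk=yes; else on_disk=no; fi
  echo "$1: status=[$codes] on_disk=$on_disk"
}

soft=$(try_reset soft)     # files remain staged (status code A)
mixed=$(try_reset mixed)   # files remain on disk but untracked (status code ??)
hard=$(try_reset hard)     # files are removed entirely
printf '%s\n' "$soft" "$mixed" "$hard"
```

Running the modes in the order soft, mixed, hard means each step can reuse the files left behind by the previous one.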
Challenge: When to use which type of reset?
Which reset should you use in the following scenarios:
- You have made three commits for three small changes and would prefer they were one bigger commit.
- You committed a file that you thought fixed a bug but realised soon after that you made a small mistake. You would like to fix the mistake but have the fix as a single commit.
- You have committed a file which was accidentally placed in your Git working directory and should have never been there.
Solution
- soft, assuming we run git reset --soft HEAD~3: all three changes are now placed in the staging area and a single new commit will commit all three together.
- mixed, as we need to make some changes after resetting the commit, so we don’t want the file added to the staging area. We could also do a soft reset followed by another git add.
- hard, since we don’t want to keep the file after the reset.
Back in time
You can restrict the action of reset to a file with:
git reset -- filename
Make some changes to a file, add that file to the staging area, and use git reset to undo the action of git add.
Solution
Add changes to a file with
$ git add <file>
then reset the files with
$ git reset -- <file>
or
$ git reset HEAD -- <file>
Note that if we leave out HEAD, Git assumes by default that we want to reset from the HEAD reference.
Without a HEAD
What happens if we do a hard reset, but leave out the place to copy files from, like this:
$ git reset --hard
Can you work out where the files come from? Hint: it may help to make some changes to the files in the current directory first.
Solution
If the origin of the files is not specified, it is assumed to be HEAD by default.
Checkout on files
The checkout command from earlier has an important variant when passed files as arguments. In this case it behaves very differently. Let’s reset our repository to the way it is on the remote server to begin with.
$ git reset --hard origin/main
Let’s make two changes, one to plot_buoys.py
and one to README.md
. In both
cases add a line to the file listing yourself as an author of the file. In
the Python file this will need to be a comment. Go ahead and add/commit both
changes in a single commit.
$ nano README.md
$ nano plot_buoys.py
$ git add README.md plot_buoys.py
$ git commit -m "Adding author information"
Now, let’s perform a checkout, specifying that we’d like the last version of the Python file.
$ git checkout HEAD~1 -- plot_buoys.py
What happened? Previously checkout would have moved HEAD
.
$ git log --oneline
We’re still on the same commit, HEAD hasn’t moved at all this time. It doesn’t make sense to move HEAD for some files and keep it in the same place for others, that would get confusing very quickly. Only the file copy operations have been performed. Let’s see what effect this has had.
$ git status
The file plot_buoys.py
has been copied from the previous commit HEAD~1
into both our working directory and the staging area.
We can verify the changes with
$ git diff --staged
The file plot_buoys.py
has changed and nothing else has. In this case git
checkout with a file behaves very much like we would expect git reset --hard
to behave with files. It overwrites the file in the staging area and working
directory and discards any changes. For this reason Git rejects the following command ("Cannot do hard reset with paths"), since it would perform the same operation as the git checkout command:
$ git reset --hard HEAD~1 -- plot_buoys.py
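The behaviour described here can be confirmed with a short sketch in a scratch repository. The file contents, the temporary directory and the demo identity are placeholders assumed for illustration; only the file names come from the lesson.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "# plot buoys" > plot_buoys.py
echo "# Buoys" > README.md
git add plot_buoys.py README.md
git commit -q -m "Initial files"

echo "# Author: A. Learner" >> plot_buoys.py
echo "Author: A. Learner" >> README.md
git commit -q -a -m "Adding author information"

before=$(git rev-parse HEAD)
git checkout -q HEAD~1 -- plot_buoys.py   # copy one file from the previous commit
after=$(git rev-parse HEAD)               # HEAD has not moved

staged_files=$(git diff --staged --name-only)   # the old version is staged
unstaged_files=$(git diff --name-only)          # and matches the working copy

# A hard reset restricted to a path is refused by Git.
if git reset --hard HEAD~1 -- plot_buoys.py 2>/dev/null; then
  hard_with_path=accepted
else
  hard_with_path=rejected
fi
echo "$staged_files / $hard_with_path"
```

Only plot_buoys.py appears in the staged diff, and README.md keeps the author line from the latest commit.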
Reset with files
Using git reset
with files allows us to copy specific files to and from the staging area, leaving the working directory unchanged.
Let’s reset our repository to the way it was at the beginning of this lesson
$ git reset --hard origin/main
Let’s make some changes to README.md
$ nano README.md
and copy them to the staging area.
$ git add README.md
$ git status
We can use git reset to copy the version in the repository back, effectively undoing the add.
We can unstage the file with
$ git reset HEAD -- README.md
More recently (as of Git 2.23 in August 2019) the git restore
command has been
introduced which can also be used to unstage changes and is suggested by
git status
. Older versions of git suggested the use of git reset
.
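The equivalent git restore
command for the above would have been:
$ git restore --staged README.md
Note the --staged option: without it, git restore discards the changes in the working directory instead of unstaging them.
You may still find a lot of Git tutorials suggesting the use of git reset
in this scenario. Either command will work.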
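As a rough illustration of the two forms of git restore (which needs Git 2.23 or later), the sketch below stages an edit, unstages it with --staged, and then discards it from the working directory. The temporary directory, demo identity and file contents are assumptions for the example.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "original" > README.md
git add README.md
git commit -q -m "initial commit"

echo "edited" > README.md
git add README.md

# Unstage only: the edit stays in the working directory.
git restore --staged README.md
after_unstage=$(git status --short)
kept=$(cat README.md)

# Without --staged the working directory copy is overwritten as well.
git restore README.md
final=$(cat README.md)

echo "$kept -> $final"
```

The first command is the counterpart of git reset HEAD -- README.md; the second is closer to git checkout -- README.md and loses the edit.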
The dangers of checkout
What happens if you make some modification to README.md, add these changes to the staging area with
$ git add README.md
and then try to checkout the file with
$ git checkout HEAD -- README.md
Can you guess what will happen? Is this potentially dangerous to do?
Solution
The command
$ git checkout HEAD -- <filename>
will overwrite the file filename, even if there are changes. Be careful as you can lose your changes in this way. This command is a useful way to undo any changes you may have made to the files in your working directory.
The way things were
Can you use the checkout command to create a commit which contains the file README.md as it was 3 commits ago? Hint: because some work in the history was done on a pull request HEAD~3 might not get what you expect, use the commit hash instead.
Solution
HEAD~1 actually takes us all the way back to the first commit in the repository, even though there are 3 prior commits in the history.
git log --graph --oneline
will reveal that some of the history came from another branch. HEAD~N follows only the first parent of a merge, so it skips the commits from that branch and treats them as if they were one commit. Let’s use git log to find a commit hash instead.
$ git log --oneline
116cdda (HEAD -> main, origin/main, origin/HEAD, hotfix) Merge pull request #1 from NOC-OI/create_initial_script
7a760ff Add some basic instructions to the README and credit to the Intermediate Python Course
6c388d0 Tidy up formatting
a62d779 Write first draft of script to plot buoy locations around UK
6d4fb54 Initial commit
So let’s take commit hash a62d instead. The file can be brought into the current directory with
$ git checkout a62d -- README.md
All that remains is to create a new commit, with a command such as
$ git commit -m 'README.md as it was 3 commits ago'
Without a HEAD
Can you work out what the following command does
$ git checkout -- README.md
Hint: try making some changes to README.md and running the command.
Solution
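This command restores README.md in the working directory from the staging area, discarding any unstaged changes. If no changes to the file have been staged, the staging area matches the current commit, so the result is the same as running
$ git checkout HEAD -- README.md
If changes to the file have been staged, the two commands differ: specifying HEAD also resets the staged version, whereas leaving the commit out restores the file to its staged version.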
Key Points
Tags
Overview
Teaching: 10 min
Exercises: 0 min
Questions
How can I flag a specific state of the project?
Objectives
Learning about the
git tag
command
A tag is a marker of a specific commit in the project history. You can think of it as a permanent bookmark. Tags can be created to point to a release version, a major code change, a state of the code that was used to produce a paper or a data release, or any other event you (or the development team) may want to reference in the future.
Once a tag has been created, no other changes can be added to it. But you can delete it and create a new one with the same name.
Don’t name your tags the same as your branches. Or the other way around. git fetch can get a tag or a branch and that can be confusing.
The command that allows you to handle git tags is just git tag. Without any flags it simply lists the existing tags:
git tag
You can create a new tag based on the current state of the repository by providing a tag name to the git tag
command:
git tag 1.0.0
This however creates what is called a lightweight tag
. Lightweight tags are like a branch that doesn’t change.
You can get information on a tag via git show
:
git show 1.0.0
Lightweight tags are not recommended in most use cases because they do not save all the information. Instead, use annotated tags. They are stored as full objects in the Git database: they’re checksummed; contain the tagger name, email, and date; have a tagging message; and can be signed and verified with GNU Privacy Guard (GPG).
To create an annotated tag from the current commit:
git tag -a 2.0.0 -m <message>
It is also possible to tag a past commit by providing that commit’s hash:
git tag -a <tag> [<hash>] -m <message>
To get more information about an existing tag you can “verify” it, which displays that tag’s details, including the tagger, date, and message, and checks the tag’s GPG signature (an unsigned tag is reported as having no signature). This only works for annotated tags:
git tag -v 1.0.0
git tag -v 2.0.0
A tag allows you to switch to the version of the code that was tagged, to use that version of the code, or to see what the code looked like at that tag. Here is how to check out a state of the code that has been tagged:
git checkout <tag>
Push a tag to origin:
git push origin <tag>
And of course you can delete a tag. This does not delete the commit, it just removes the marker/label. Delete a tag:
git tag -d <tag>
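The practical difference between the two kinds of tag can be seen by asking Git what type of object each tag name refers to. This is a minimal sketch in a scratch repository; the temporary directory, demo identity, file name and tag message are assumptions for illustration.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "first release"
git tag 1.0.0                             # lightweight: just a name for the commit

echo "v2" > file.txt
git commit -q -a -m "second release"
git tag -a 2.0.0 -m "Version 2.0.0"       # annotated: a full tag object

# A lightweight tag resolves straight to a commit; an annotated tag is its own object.
light_type=$(git cat-file -t 1.0.0)
annot_type=$(git cat-file -t 2.0.0)
message=$(git tag -l --format='%(contents:subject)' 2.0.0)
echo "$light_type $annot_type $message"

# Deleting a tag removes the label only; the commit it pointed to remains.
tagged_commit=$(git rev-parse 1.0.0)
git tag -d 1.0.0 >/dev/null
remaining=$(git tag)
still_there=$(git cat-file -t "$tagged_commit")
```

Because the annotated tag is a separate object, it can carry its own message, tagger and date, which the lightweight tag cannot.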
Since tags are frequently used to do releases, it is useful to be aware that codebases and languages have standards on how release versions should be labelled. If you are working with an existing code base, follow the standard set by the dev team. If you are developing a library by yourself, follow the standards for the language. For example, the Python Packaging Authority (and previously PEP 440) specifies the scheme for identifying versions for Python
libraries.
Referring to Commits
We’ve seen a bunch of different types of arguments passed to commands such as git checkout. For example, references to HEAD
$ git log -1 HEAD
or an ancestry reference
$ git log HEAD~3
or to a local branch
$ git log -1 main
or a remote branch
$ git log -1 origin/main
or a tagged commit
$ git log -1 0.4.1
or an abbreviated commit ID
$ git log -1 1ffb
Git is clever about allowing you to use whichever form is most convenient, and will ultimately (in most cases) translate them all to a commit reference behind the scenes.
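One way to watch this translation happen is git rev-parse, which prints the full commit ID that a reference resolves to. The sketch below builds a small scratch history (the temporary directory, demo identity and file name are assumptions; the tag name 0.4.1 is borrowed from the example above) and resolves several kinds of reference.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/main     # make sure the branch is called main

for n in 1 2 3 4; do
  echo "$n" > file.txt
  git add file.txt
  git commit -q -m "commit $n"
done
git tag 0.4.1 HEAD~3                      # tag the oldest commit

head_id=$(git rev-parse HEAD)
branch_id=$(git rev-parse main)
short_id=$(git rev-parse "$(git rev-parse --short HEAD)")   # abbreviated ID -> full ID
tag_id=$(git rev-parse 0.4.1)
ancestor_id=$(git rev-parse HEAD~3)

echo "HEAD, main and the short ID agree: $head_id"
echo "0.4.1 and HEAD~3 agree:            $tag_id"
```

HEAD, the branch name and the abbreviated ID all resolve to one commit, while the tag and the ancestry reference resolve to another.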
Key Points
git tag allows us to mark a point we can return to.
A tag is tied to a commit.
Issue Tracking
Overview
Teaching: 10 min
Exercises: 10 min
Questions
How do we track issues with code in GitHub?
Objectives
Understand how to track issues with code in GitHub.
Introduction
The strength of online collaboration platforms such as GitHub does not just lie in the ability to share code. They also allow us to track problems with that code, for multiple developers to work on it independently and bring their changes together and to review those changes before they are accepted.
Tracking issues with code
A key feature of GitHub (as opposed to Git itself) is the issue tracker. This provides us with a place to keep track of any problems or bugs in the code and to discuss them with other developers. Sometimes advanced users will also use issue trackers of public projects to report problems they are having (and sometimes this is misused by users seeking help using documented features of the program).
The broken branch of the code from the identifying breaking commits chapter earlier has a bug with a mismatched function name in plot_buoys.py.
Let’s go ahead and create a new issue in our forked GitHub repository to describe this problem. We can find the issue tracker on the “Issues” tab in the top left of the GitHub page for the repository.
Issue tracker missing in GitHub
Sometimes when forking a GitHub repository the issue tracker is disabled. If you do not see an “Issues” tab in your fork of the repository then you can re-enable it by:
- Clicking on the Settings button (cog icon in the middle near the top)
- Scroll down to the Features section
- Tick the Issues box
An Issues tab should now appear on the toolbar near the top of the screen.
Click on this and then click on the green “New Issue” button on the right hand side of the screen. We can then enter a title and description of our issue.
A good issue description should include:
- What the problem is, including any error messages that are displayed.
- What version of the software it occurred with.
- Any relevant information about the system running it, for example the operating system being used.
- Versions of any dependent libraries.
- How to reproduce it.
After the issue is created it will be assigned a sequential ID number.
Write an issue to describe our bug
Create a new issue in your repository’s issue tracker by doing the following:
- Go to the GitHub webpage for your code
- Click on the Issues tab
- Click on the “New issue” button
- Enter a title and description for the issue
- Click the “Submit Issue” button to create the issue.
Discussing an issue
Once the issue is created, further discussion can take place with additional comments. These can include code snippets and file attachments such as screenshots or logfiles. We can also reference other issues by writing a # symbol and the number of the other issue. This is sometimes used to identify related issues or if an issue is a duplicate.
Closing an issue
Once an issue is solved then it can be closed. This can be done either by pressing the “Close” button in the GitHub web interface or by making a commit which includes the word “fixes”, “fixed”, “close”, “closed” or “closes” followed by a # symbol and the issue number.
Challenge: Fix and close an issue
Fix the issue of the function name mismatch on the broken branch. You could do this either by renaming the function or changing the call. Commit your changes and add the appropriate text to your commit message to close the issue. Push your changes to your forked repository on GitHub. Check the issue tracker and ensure it has closed.
Solution
def plot_buoy_data(figure_name):
Becomes
def plot_data(figure_name):
or
plot_data("bouys_plot.png")
Becomes
plot_buoy_data("bouys_plot.png")
git add plot_buoys.py
git commit -m "Correcting function name mismatch, Fixes #1"
git push
Key Points
GitHub includes an issue tracker for each repository where people can describe and discuss issues with code.
Issues can be opened, commented on and closed from the GitHub web interface.
Issues can also be closed in a commit message using ‘fixes’, ‘fixed’, ‘close’, ‘closed’ or ‘closes’ followed by a # symbol and the issue number.
Odd things to know about Files
Overview
Teaching: 0 min
Exercises: 0 min
Questions
Why are (large) binary files a problem in Git?
What is Git LFS?
What are the problems with Git LFS?
Objectives
Understanding that Git is not intended for (large) binary files
Learning about the git lfs commands
Understanding the disadvantages of git lfs
Sometimes, you might want to add non-textual data to your Git repositories. Examples of such use cases in a software project include:
- assets for the project documentation like images
- test data for your test suite
However, such data is stored in binary formats most of the time. Git’s line-based
approach of tracking changes is not suited for this type of data. While Git will
work with binary data without any errors, it will internally treat each binary file
as a file with one (very long) single line of content. Consequently, if you apply
changes to such a file, Git will store the entire file in the commit even if there
was a lot of similarity between the two versions of the file. As Git does not “forget”
about previous versions of the file, doing this repeatedly and/or with very large
files will quickly make your repository grow in size. At some point this will
severely impact the performance of all your Git operations from git clone
to even
git status
. It is therefore generally discouraged to use Git to track (large) binary files.
However, the problem of binary files in Git repositories cannot be fully neglected: there is a lot of value for a software project in keeping things together that belong together. Documentation assets belong to the documentation they are part of. Therefore we will now explore some options on how to integrate large file handling into Git.
The git lfs
subcommand is part of an extension to Git. LFS stands for Large
File Storage. It allows you to mark individual files as being large.
Git does not apply its normal, line-based approach to tracking changes to these
large files, instead they are stored separately and only referenced in the Git data
model. During push and pull operations, large files are transmitted separately -
requiring the server to support this operation.
For the sake of demonstration, we create a file called report.pdf
. We assume that it
is a large, binary file in order to show how to handle it with git lfs
:
echo "This is a very large report." > report.pdf
Next, we tell Git, that this file should be treated with LFS:
git lfs track report.pdf
Tracking "report.pdf"
Having done so, we can inspect the repository and we learn that a new file .gitattributes
was added to the repository.
git status
On branch main
Untracked files:
(use "git add <file>..." to include in what will be committed)
.gitattributes
report.pdf
cat .gitattributes
report.pdf filter=lfs diff=lfs merge=lfs -text
Similar to .gitignore
this file is part of the repository
itself in order to share it with all your collaborators on this project.
We therefore craft a commit that contains it:
git add .gitattributes
git commit -m "Setup LFS tracking"
Now, we are ready to add the large file to the repository the same way we would with any other file:
git add report.pdf
git commit -m "Add final report to the repository"
Pushing our commits to the remote repository, we can see in the console output, that our LFS data was transferred to the remote server separately.
git push origin main
Uploading LFS objects: 100% (1/1), 17 B | 0 B/s, done.
Tracking with wildcard patterns
LFS tracking is not limited to explicitly spelled out filenames. Instead, wildcard patterns can be passed to git lfs track. However, you should be careful to quote these patterns, as they might otherwise get expanded to existing files by your shell. For example, tracking all PDFs with LFS could be achieved with the following command:
git lfs track "*.pdf"
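The effect of quoting can be seen without git lfs at all. In this small sketch, count_args is a hypothetical helper that just reports how many arguments it received, and the two PDF files are empty placeholders created in a temporary directory.

```shell
set -eu
cd "$(mktemp -d)"
touch report.pdf slides.pdf

# Hypothetical helper: print the number of arguments it was given.
count_args() { echo "$#"; }

unquoted=$(count_args *.pdf)     # the shell expands the pattern to the matching files
quoted=$(count_args "*.pdf")     # the pattern is passed through literally
echo "unquoted=$unquoted quoted=$quoted"
```

With the unquoted pattern a command would receive the two existing file names rather than the pattern itself, so only those files, and not PDFs added later, would be affected.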
Disadvantages of Git LFS
Although
git lfs
by design solves the problem of storing large files in Git repositories, there are some practical hurdles that you should consider before introducing LFS into your project:
- The git lfs command is a separately maintained extension to the Git core. It is therefore not part of most Git distributions, but needs to be installed separately. Using it in your project will require you to educate your users about LFS and how to install it. Depending on your target audience, you should carefully consider whether the benefits outweigh this disadvantage.
- Users that do not have git lfs installed will not be notified by Git. They will see the files, but the content will be Git metadata instead of the actual content. Trying to work with those files will typically produce cryptic error messages.
- Some hosting providers - most notably GitHub - apply restrictive quotas to LFS storage. On the free plan, GitHub currently allows 1 GB of storage and 1 GB of bandwidth per month. As the bandwidth quota counts every single clone by users, LFS should currently be considered unusable on the GitHub free plan.
Key Points
(Large) binary files can grow the repository size immensely and make it unusable
git lfs is an extension that stores large files outside the Git data model
Use of Git LFS is discouraged in many scenarios.
Publishing and Citing Code
Overview
Teaching: 0 min
Exercises: 0 min
Questions
How do I ensure my code is citable?
Objectives
Understand how to archive code to Zenodo and create a digital object identifier (DOI) for a software project (and include that info in CITATION.cff).
Sharing code openly promotes collaboration, transparency, and innovation by allowing others to review, use, and improve the code. It fosters knowledge exchange, accelerates scientific progress, and enhances the reproducibility of research. Additionally, open sharing encourages community contributions and can lead to better-maintained, more reliable software.
Adding a license and other metadata to our code (covered in the previous episode) are the first steps towards sharing the code publicly. There are several other important steps to consider which we will cover here.
Sharing code to encourage collaboration
Making the code public
By default repositories created on GitHub are private and only their creator can see them. Since we added an open source license to our repository we probably want to make sure people can actually access it.
To make your repository public, if it is not already, go to your repository on GitHub and click on
the Settings
link near the top right corner.
Then scroll down to the bottom of the page and the “Danger Zone” settings. Click on “Change Visibility” and you
should see a message saying “Change to public”.
If it says “Change to private” then the repository is already public.
You will then be asked to confirm that you indeed want to make the repository public and agree
to the warning that the code will now be publicly visible.
As a security measure, you will then have to put in your GitHub password.
Transferring to an organisation
Currently our repository is under the GitHub “namespace” of our individual user. This is OK for individual projects where we are the sole or at least the main code author, but for bigger and more complex projects it is common to use a GitHub organisation named after our project. If we are a member of an organisation and have the appropriate permissions then we can transfer a repository from our personal namespace to the organisation’s. This can be done with another option in the “Danger Zone” settings, the “Transfer ownership” button. Pressing this will then prompt us as to which organisation we want to transfer the repository to.
Archiving code to Zenodo and obtaining a DOI
Zenodo is a data archive run by CERN. Anybody can upload datasets up to 50GB to it and receive a Digital Object Identifier (DOI). Zenodo’s definition of a dataset is quite broad and can include code - which gives us a way to obtain a DOI for our software.
Let us now look into how we can archive a GitHub repository to Zenodo. Note that, instead of using the real Zenodo website, we will practice with Zenodo Sandbox.
Zenodo Sandbox
Zenodo Sandbox is a testing environment for Zenodo, a repository for research outputs, allowing users to safely experiment with its features without affecting the live system. It is a clone of Zenodo, created for testing purposes, that works exactly the same way as Zenodo. You can use it for learning, training, experimenting, and preparing uploads without impacting the primary Zenodo repository until you are ready to publish and release your code (or other research outputs) officially. Using the sandbox also means that the test repositories we use for this course will not be given real DOIs and will not clutter the DOI space (remember that a DOI, once created, is meant to exist forever).
We can archive our GitHub repository to Zenodo (Sandbox) by doing the following:
- Go to the Zenodo Sandbox login page and choose to login with GitHub.
- Authorise Zenodo Sandbox to connect to GitHub.
- Go to the GitHub page in your Zenodo Sandbox account. This can be found in the pull down menu with your user name in the top right corner of the screen.
- You will now have a list of all of your GitHub repositories. Next to each will be an “On” button. If you have created a new repository you might need to press the “Sync” button to update the list of repositories Zenodo Sandbox knows about.
- Press the “On” button for the repository you want to archive. If this was successful you will be told to refresh the page.
- The repository should now appear in the list of “Enabled” repositories at the top of the screen, but it does not yet have a DOI. To get one we have to make a “release” on GitHub. Click on the repository and then press the green button to create a release. This will take you to GitHub’s release page where you will be asked to give a title and description of the release. You will also have to create a “tag” for your release - a way of having a friendly name for the version of some code in Git instead of using a long hash code. Often we will create a sequential version number for each release of the software and have the tag name match this, for example v1.0 or just 1.0.
- If we now refresh the Zenodo Sandbox page for this repository we will see that it has been assigned a DOI.
The DOI does not just link to GitHub, Zenodo will have taken a copy (a snapshot) of our repository at the point
where we tagged the release.
This means that even if we delete it from GitHub or even if GitHub were ever to go away or remove it,
there will still be a copy on Zenodo.
Zenodo will allow people to download the entire repository (more accurately, its state at the time it was tagged for release) as a single zip
file.
Zenodo will have actually created two DOIs for you. One represents the latest version of the software and will always represent the latest if you make more releases. The other is specific to the release you made and will always point to that version. We can see both of these by clicking on the DOI link in the Zenodo page for the repository.
One of the things which is displayed on this page is a badge image that you can copy the link for and add to the README file in your GitHub repository so that people can find the Zenodo version of the repository. If you click on the DOI image in the Details section of the Zenodo page then you will be shown instructions for obtaining a link to the DOI badge in various formats including Markdown. Here is the badge for this repository and the corresponding Markdown:
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.11869450.svg)](https://doi.org/10.5281/zenodo.11869450)
Archive your repository to Zenodo (Sandbox)
Note: for this exercise, as demonstrated earlier, you should use the Sandbox Zenodo (a version of Zenodo for testing and playing with before minting a real DOI). For real software releases, you should use Zenodo.
- Create an account on Zenodo Sandbox that is linked to your GitHub account.
- Use Zenodo Sandbox to create a release for your repository and obtain a DOI for it.
- Get the link to the DOI badge for your repository and add a link to this image to your README file in Markdown format. Check that this is the DOI for the latest version and not the DOI for a specific version; otherwise you will have to update it every time you make a release.
Problems with GitHub and Zenodo integration
The integration between GitHub and Zenodo does not interact well with some browser privacy features and extensions. Firefox can be particularly problematic with this and might open new tabs to login to GitHub and then give an error saying:
Your browser did something unexpected. Please try again. If the error continues, try disabling all browser extensions.
If this happens try disabling the extra privacy features/extensions or using another browser such as Chrome.
Adding a DOI and ORCID to the citation file
Now that we have our DOI it is good practice to include this information
in our citation file.
Earlier we created a CITATION.cff
file with information about how to cite our code.
There are a few fields we can add now which are related to the DOI; one of these is the version
field, which covers
the version number of the software.
We can add a DOI to the file in the identifiers
section with a type of doi
and a value
of the DOI issued by Zenodo.
Optionally we can also add a date-released
field indicating the date we released this software.
Here is an updated version of our CITATION.cff
from the previous episode with a version number, DOI and release date added.
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Spacewalks
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Sarah
    family-names: Jaffa
  - given-names: Aleksandra
    family-names: Nenadic
  - given-names: Kamilla
    family-names: Kopec-Harding
repository-code: >-
  https://github.com/YOUR-REPOSITORY-URL/spacewalks.git
abstract: >-
  A Python script to analyse NASA extravehicular activity
  data
keywords:
  - NASA
  - Extravehicular activity
version: 1.0.1
identifiers:
  - type: doi
    value: 10.5281/zenodo.1234
date-released: 2024-06-01
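The value stored in the identifiers section is the bare DOI, and the resolvable link is formed by prefixing it with https://doi.org/. As a rough sketch, the snippet below writes a cut-down citation file (deliberately incomplete, containing only the fields discussed here, in an assumed temporary directory) and derives the link from it with sed.

```shell
set -eu
cd "$(mktemp -d)"

# A cut-down CITATION.cff with only the fields discussed here;
# a real file also needs the message and authors fields.
cat > CITATION.cff <<'EOF'
cff-version: 1.2.0
title: Spacewalks
version: 1.0.1
identifiers:
  - type: doi
    value: 10.5281/zenodo.1234
date-released: 2024-06-01
EOF

# Pull out the DOI and turn it into a resolvable link.
doi=$(sed -n 's/^ *value: *//p' CITATION.cff)
doi_url="https://doi.org/$doi"
echo "$doi_url"
```

The placeholder DOI 10.5281/zenodo.1234 is the one from the example file, not a real record.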
Add a DOI to your citation file
Add the DOI you were allocated in the previous exercise to your
CITATION.cff
file and then commit and push the updated version to your GitHub repository. If you used the commit
field in your CITATION.cff
file before to point to a given version of the code, you can now remove it, as the DOI field is better for this job.
Going further with publishing code
We now have our code published online, licensed as open source, archived with Zenodo, accessible via a DOI and with a citation file to encourage people to cite it. What else might we want to do in order to improve how findable, accessible or reusable it is? One further step we could take is to publish the code with a peer-reviewed journal. Some traditional journals will accept software submissions, although these are usually supplementary material for a paper. There are also journals which specialise in research software, such as the Journal of Open Research Software, The Journal of Open Source Software or SoftwareX. With these venues, the submission will be the software itself and not a paper, although a short abstract or description of the software is often required.
Key Points