Git has become an indispensable tool in modern software development. It is a version control system that lets development teams work together on software projects, facilitating collaboration and the delivery of high-quality software. However, it is often used carelessly, and much of its usefulness is lost.
In this article we explore some best practices that will help us keep a more organized repository that remains useful in the future.
The first point to discuss is how to write a good commit message. Git gives us total freedom when writing one, but there are some basic rules we should follow if we want the content to be displayed correctly.
According to the git-commit man page, a message should start with a short summary of no more than 50 characters (the title), followed by a blank line and then a more detailed description (the body of the message).
This 50-character limitation is not strict, but exceeding it may result in the title not displaying correctly in some applications.
Another common convention is to wrap the lines of the message body at 80 characters, again to make them easier to read.
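As a minimal sketch of these rules, the following Python function checks a message against them. It is our own illustration, not an official Git tool. The 50- and 80-character limits come from the text above; the function name check_commit_message and its problem messages are hypothetical.

```python
# Sketch: check a commit message against the title/body conventions above.
# check_commit_message is a hypothetical helper, not part of Git.
TITLE_MAX = 50  # suggested title length (a soft limit)
BODY_MAX = 80   # body line width used in this article


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems found; an empty list means the message is fine."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing title"]
    problems = []
    if len(lines[0]) > TITLE_MAX:
        problems.append(f"title is {len(lines[0])} characters (max {TITLE_MAX})")
    if len(lines) > 1 and lines[1].strip():
        problems.append("line 2 should be blank to separate title and body")
    # Body lines start at line 3 (1-based numbering, as editors show it).
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > BODY_MAX:
            problems.append(f"line {number} is {len(line)} characters (max {BODY_MAX})")
    return problems


if __name__ == "__main__":
    message = "Fix typo in README\n\nThe install section referred to a removed flag."
    print(check_commit_message(message))  # prints []
```

A function like this could be called from a commit-msg hook. Git runs that hook with the path of the file holding the proposed message as its only argument.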
Following these rules, a commit could look like the following (taken from the Linux kernel repository):
More recently, stricter formats such as Conventional Commits have been defined so that certain tasks can be automated based on the messages.
One advantage of this format is that tools such as semantic-release can automatically tag a version based on the commits we have made.
semantic-release analyzes the commits made since the last release and, depending on their type (feat, fix or breaking change), decides how to bump the version of our software (as long as we follow SemVer).
Another similar tool is conventional-changelog, which automatically generates a changelog file (CHANGELOG.md) by parsing the commit messages.
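To illustrate the idea behind such tools, here is a toy changelog generator in Python. It is not conventional-changelog's implementation or output format. The function render_changelog, the section names and the Markdown layout are assumptions made for this sketch, and only feat and fix commits are listed.

```python
# Toy changelog generator in the spirit of conventional-changelog.
# NOT that tool's implementation or output format: the section names and the
# Markdown layout below are assumptions made for this sketch.
import re

HEADER = re.compile(r"^(?P<type>[a-zA-Z]+)(?:\([^()]+\))?!?: (?P<desc>.+)$")
SECTIONS = {"feat": "Features", "fix": "Bug Fixes"}  # other types are skipped


def render_changelog(version: str, headers: list[str]) -> str:
    """Group commit headers by type and render a Markdown section for a version."""
    grouped: dict[str, list[str]] = {title: [] for title in SECTIONS.values()}
    for header in headers:
        match = HEADER.match(header)
        if match and match["type"] in SECTIONS:
            grouped[SECTIONS[match["type"]]].append(match["desc"])
    lines = [f"## {version}"]
    for title, entries in grouped.items():
        if entries:  # omit empty sections
            lines.append(f"### {title}")
            lines.extend(f"- {entry}" for entry in entries)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_changelog("1.1.0", [
        "feat: add CSV export",
        "fix(api): handle empty responses",
        "docs: update README",
    ]))
```

In practice, the commit headers could be obtained with git log --format=%s <last-tag>..HEAD.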
Before going into more detail on how to format a commit, though, let's look at some examples of why it is worth spending a little time writing descriptive commit messages.
- Understand why a particular value was used: for example, we may keep infrastructure as code in a git repository and want to know why a certain machine was configured with a disk of a certain size. With git blame (which shows the commit that last modified each line) we can find the commit that changed that line, and its message should explain why that value was chosen.
- Fix a bug: we use git blame again to find the commit that introduced the bug. Understanding why that change was made helps us fix the bug without undoing whatever the change was meant to achieve.
- Understand how the code works: besides code comments, commits can explain the decisions that led to doing things one way or another, or link to issues, emails, etc. where they were discussed in more depth.
Once we understand these uses, it becomes clearer what information the commit message should contain.
A simple rule of thumb is to imagine a colleague asking us why we made those changes. What the commit changes is easy to see, since that is what the tools show us (the lines added, modified or deleted). The reason for those changes, however, often cannot be deduced from the code, and it is a very important part of the commit message.
We should also keep commits small enough to make them easy to review. A commit with many changes can be very hard to review, even with a very detailed description.
It is important that each commit makes only one change. Mixing different changes makes the commit harder to review and the history less useful.
A little trick to achieve these atomic commits is to create an empty commit before making any changes and write in its message the change we are about to make. This keeps us from slipping in changes that may be necessary but are not part of the single change we want to make. Once the work is done, we can add it to that commit with git commit --amend.
To create an empty commit we will use the command:
git commit --allow-empty
If we come across errors while working on our commit and do not want to let them pass, we can still fix them without including those unrelated modifications in the commit. Git lets us stage only some of the modified lines of a file; check your IDE's documentation to see how to do it there. From the command line we can use the following command, which shows a menu to choose which hunks to add to the commit:
git add --patch <filename>
Returning to the commit message: it should follow the title-and-body format and explain what we are doing and, above all, why we are making the changes. Each commit should also remain atomic.
To give the writing style a more homogeneous format, we use the imperative mood. Thus, instead of a title like “this commit would modify the response XXX” or “fixed the response message”, we write “Modify the response with the text XXX”.
One way to help us write the commit title is to think about how we would complete the sentence “If applied, this commit will …”.
Conventional Commits, mentioned above, take this a step further: they are designed to be parsed by tools that then perform tasks automatically.
It is best to read the specification, but in short, the idea is to have a message with the format:
<type>[optional scope]: <description>
Here type is usually fix (a bug fix) or feat (a new feature), although the specification also allows other types such as docs or refactor. An exclamation mark after the type or scope denotes a breaking change, e.g.:
feat!: send an email to the customer when a product is shipped
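As a rough sketch, assuming we only care about the first line of the message, the header can be parsed with a regular expression. The parse_header function and the CommitHeader class are hypothetical names. The full specification also allows a body and footers (including a BREAKING CHANGE: footer), which this sketch ignores.

```python
# Sketch: parse a Conventional Commits header of the form
#   <type>[optional scope][!]: <description>
# Only the first line is handled; body and footers are ignored.
import re
from dataclasses import dataclass
from typing import Optional

HEADER = re.compile(
    r"^(?P<type>[a-zA-Z]+)"        # type, e.g. feat or fix
    r"(?:\((?P<scope>[^()]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"            # optional "!" marking a breaking change
    r": (?P<description>.+)$"      # colon, space and the description
)


@dataclass
class CommitHeader:
    type: str
    scope: Optional[str]
    breaking: bool
    description: str


def parse_header(header: str) -> Optional[CommitHeader]:
    """Return the parsed header, or None if it does not follow the format."""
    match = HEADER.match(header)
    if match is None:
        return None
    return CommitHeader(
        type=match["type"],
        scope=match["scope"],
        breaking=match["breaking"] is not None,
        description=match["description"],
    )


if __name__ == "__main__":
    print(parse_header("feat!: send an email to the customer when a product is shipped"))
```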
This format is usually tied to SemVer versioning: a fix increments the PATCH version, a feat increments the MINOR version and a breaking change increments the MAJOR version.
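This mapping can be sketched as a function that looks at all the commits since the last release and keeps the largest bump required. It is a simplified illustration of what tools such as semantic-release do, not their actual logic. release_type is a hypothetical name, only headers are inspected, and in this sketch types other than feat and fix (for example docs) do not trigger a release.

```python
# Sketch: decide the SemVer bump implied by Conventional Commits headers.
# Not semantic-release's implementation: only headers are inspected, so a
# breaking change announced only in a "BREAKING CHANGE:" footer is missed.
import re
from typing import Iterable, Optional

HEADER = re.compile(r"^(?P<type>[a-zA-Z]+)(?:\([^()]+\))?(?P<breaking>!)?: .+$")
ORDER = {"patch": 1, "minor": 2, "major": 3}


def release_type(headers: Iterable[str]) -> Optional[str]:
    """Return "major", "minor", "patch", or None if no commit warrants a release."""
    best: Optional[str] = None
    for header in headers:
        match = HEADER.match(header)
        if match is None:
            continue  # not a Conventional Commits header
        if match["breaking"]:
            level = "major"
        elif match["type"] == "feat":
            level = "minor"
        elif match["type"] == "fix":
            level = "patch"
        else:
            continue  # e.g. docs or chore: no release in this sketch
        if best is None or ORDER[level] > ORDER[best]:
            best = level
    return best
```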
Finally, we must decide which language to write the commit messages in. English is probably the best choice, as it is concise and is the de facto standard in computing.
Be careful what you commit
Two common pitfalls when using git are storing binary files in it and uploading confidential information.
We may want to store some example videos in our repository, or a binary file containing some required parameters. Git was not designed to manage large files, so committing them like any other file degrades the experience. Above all, it inflates the repository size, which becomes unmanageable if we keep committing new versions of those binary files. Git LFS solves this problem by storing such files on a separate remote server and leaving only a small pointer to each file in the Git repository.
The other mistake, uploading confidential information, is usually due to carelessness. It often happens early in the life of a repository, when we may not even have considered making it public, or when we are less careful because it is just an initial phase. Once certain information has been committed, it is hard to remove: many people simply create a new commit that deletes it, but it remains in the history.
To delete it completely we need tools such as git-filter-repo, which can remove strings or files from the entire history.
Even so, we will still have problems, since the git server we are using may keep cached copies, or someone may have forked our repository and taken this confidential information to a repository out of our reach.
To avoid uploading confidential files by mistake, it is always best to add them to the .gitignore file, so that git will not allow us to add that file to a commit (unless we explicitly force it).
We can also set up a pre-commit hook to check that we are not uploading something we don’t want to.
Git checks the script's exit status: if the hook exits with a non-zero status, the commit is aborted.
These hooks can be set up per repository or globally (for example, with the core.hooksPath setting).
For the case in question we could create a script that searches for a given string and fails if it finds it.
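A minimal sketch of such a hook in Python follows. It assumes the script is saved as .git/hooks/pre-commit and made executable. The FORBIDDEN list and the find_forbidden and main helpers are hypothetical, and the hook only looks at lines added in the staged changes (git diff --cached).

```python
# Hypothetical pre-commit hook: block commits that add forbidden strings.
# FORBIDDEN is an example list; adapt it to your project.
import subprocess
import sys

FORBIDDEN = ["BEGIN RSA PRIVATE KEY", "BEGIN OPENSSH PRIVATE KEY"]


def find_forbidden(diff: str, patterns: list[str]) -> list[str]:
    """Return the patterns that appear in lines added by the diff."""
    added = [line[1:] for line in diff.splitlines()
             # skip the "+++ b/file" header lines, keep added content lines
             if line.startswith("+") and not line.startswith("+++")]
    return [p for p in patterns if any(p in line for line in added)]


def main() -> int:
    # Only inspect what is staged, i.e. what would actually be committed.
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    found = find_forbidden(diff, FORBIDDEN)
    for pattern in found:
        print(f"pre-commit: refusing to commit text matching {pattern!r}", file=sys.stderr)
    return 1 if found else 0  # non-zero makes Git abort the commit

# When saved as .git/hooks/pre-commit, end the file with:
#     sys.exit(main())
```

Keeping the search in a separate function makes it easy to test without running Git.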
To make this easier there is a hook manager called pre-commit. It lets us configure pre-commit hooks by declaring them in a YAML file and reuse hooks that other developers have already written.
For example, we can configure detect-private-key, which will check if we are uploading a private key (it looks for text like “BEGIN RSA PRIVATE KEY”).
Document our repository
We can have a wonderful repository with perfectly written commits and exceptional functionality, but if we don’t explain what it’s for or how to use it, it won’t be very useful.
We can start with a short description of the repository (most git hosting services support one), so that anyone can tell at a glance whether the repository is what they are looking for.
A README is also essential. There we explain the project in more detail, how to use it, some examples, a short demo and, perhaps, a section on how to contribute.
To make this easier, we can use the web application readme.so, which offers a list of typical sections with examples and helps us generate the Markdown file.
The next step is usually to work with branches: we create a branch for a feature, add as many commits as needed and, once the feature is finished, merge it into the main branch.
When several people work on the same repository and we want to support several versions, deploy to different environments, etc., the scheme becomes more complicated.
One of the first standardized workflows was git-flow, proposed by Vincent Driessen. To summarize a lot, it has two main branches, main and develop. Developers create feature branches from develop and merge them back once they are complete. When a new version is to be released, a release branch is created for testing and last-minute fixes. That branch is then merged into main (and back into develop), and the release is tagged.
This scheme is sometimes excessively complex and difficult to integrate with CI/CD tools, so the people at GitHub came up with a variant called github-flow. Here the idea is to have a single main branch and then create branches to add changes, using the pull request process to merge those changes.
GitLab also has its own workflow, GitLab Flow, which can use one branch per environment.
To share our code with the world, it is common to publish versions at specific points, explaining the changes made since the previous version (typically in a file called CHANGELOG).
The most popular versioning scheme, already mentioned above, is SemVer. It uses a format such as 2.3.0, where each number is incremented according to a set of rules.
The first number (starting from the left) is called MAJOR, the second MINOR and the last PATCH.
MAJOR is only incremented when we make a change that may break backward compatibility. This is especially clear when our code is a library used by third parties. By incrementing MAJOR we tell our users that they should not upgrade without checking how they use the library, since parts of it have changed. These users will expect the CHANGELOG to describe those breaking changes and how to migrate to the new version.
MINOR is incremented when we add new features without breaking compatibility: for example, we may add a new method without affecting anything current users rely on.
Finally, PATCH is incremented when we fix bugs, again without breaking compatibility.
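These rules can be sketched as a small Python function. Following the SemVer specification, incrementing one number resets the numbers to its right to zero, so 2.3.1 becomes 3.0.0 on a MAJOR bump. The bump function is a hypothetical helper. Pre-release or build suffixes such as 1.0.0-rc.1, which SemVer also allows, are not handled.

```python
# Sketch: bump a MAJOR.MINOR.PATCH version string following the SemVer rules.
# Pre-release and build metadata suffixes are not supported here.


def bump(version: str, part: str) -> str:
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":  # incompatible change: reset MINOR and PATCH
        return f"{major + 1}.0.0"
    if part == "minor":  # backward-compatible feature: reset PATCH
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # backward-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

Note that the numbers are compared and incremented as integers, not as digits, so 1.9.9 becomes 1.10.0 on a MINOR bump.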
SemVer may seem to be practically the only option today, but other versioning schemes can be more useful in certain cases, and knowing them helps us make a better choice.
For example, Ubuntu uses the YY.MM.VERSION scheme, where YY is the year, MM is the month and VERSION identifies later point releases with improvements on that version (e.g. 22.04.1). Releases are published in April (04) and October (10), with an LTS (long-term support) release every two years in April.
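The Ubuntu scheme can be sketched as follows. The functions ubuntu_version and is_lts are hypothetical. The rule they encode is the simplified one described above: releases in April or October, with an LTS in April of even years. It does not cover early exceptions such as the 6.06 LTS release, published in June 2006.

```python
# Sketch of the Ubuntu YY.MM[.VERSION] scheme under the simplified rule above.
from datetime import date


def ubuntu_version(release: date, point: int = 0) -> str:
    """Build a version name such as 22.04 or, for a point release, 22.04.1."""
    if release.month not in (4, 10):
        raise ValueError("this sketch assumes releases in April or October")
    base = f"{release.year % 100}.{release.month:02d}"
    return f"{base}.{point}" if point else base


def is_lts(version: str) -> bool:
    """LTS releases are assumed to be the April releases of even years."""
    year, month = (int(n) for n in version.split(".")[:2])
    return month == 4 and year % 2 == 0
```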
Zabbix uses an X.Y scheme, where Y is 0 for LTS releases and 2 or 4 for standard releases.
PostgreSQL uses only MAJOR.MINOR. Upgrades between MAJOR versions are more complex, as they usually involve changes to the internal data format.
When developing a project we should often change hats and put ourselves in the shoes of future developers or users (or of ourselves in a few months!). This helps us build a more user-friendly and useful project.
Written text is the main form of communication in this type of project, so we have to make an extra effort to make it understandable to our future users.