Git Commit Messages and AI
In the age of AI, Git commits are more than logs for developers. They are key datasets for large language models (LLMs).
Git commit messages are more important than ever. LLM tools look at Git history. Each commit message is a chance to add context missing from the source code. Or at least, it should be. That context helps both humans and machines.
Less is more.
I love that LLMs make it clear: "Context is King 👑." LLMs do best with clear context, and vague commit messages can lead to confusion. Our Git histories show how often that relevant context is missing.
Lately, I've been saving every result from agent prompts and using the commit messages the LLMs generate. I want to study how these tools work so that I can use them as effectively as possible.
It is amazing not to have to think about commit messages anymore. But messages created by AI have several issues.
Over time, I noticed that AI tools started making up messages. Tools like Cursor review past messages, and that history can confuse the model. This can lead to wrong or misleading information. Some people see this as a failure of the tools. I agree it is a failure, but I have another perspective.
If the tools can already produce precise commit messages from code changes, then today's tools are the worst they will ever be. These tools keep getting better. Git history, on the other hand, is effectively permanent (sure, you can always YOLO a rewrite). The diff stays in the history, so if a message turns out to be important, we can regenerate it in the future with a better tool.
So, why should I add detailed commit messages?
From now on, I will only accept commit messages with a short and clear main sentence. I cannot leave messages empty because tools flag them as errors, and I would have to explain myself to others. Otherwise, I would leave them empty often. Here is a Git commit message template:
<semver>: <a sentence short description about the changes under 80 characters>
Signed-off-by: Yordis Prieto <yordis.prieto@gmail.com>
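A template like this is easy to check mechanically. The following is a minimal sketch, not a definitive implementation, and it rests on a few assumptions of mine: the `<semver>` slot is treated as any single token without spaces or colons, "under 80 characters" is read as applying to the description only, and the function name `check_commit_message` is hypothetical.

```python
import re

MAX_DESCRIPTION_LENGTH = 80  # the template asks for "under 80 characters"

# Assumption: the "<semver>" slot is one token with no whitespace or colons,
# followed by ": " and a non-empty description.
SUBJECT_PATTERN = re.compile(r"^(?P<prefix>[^\s:]+): (?P<description>\S.*)$")


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    # Drop "#" comment lines that Git may leave in the message file.
    text = "\n".join(
        line for line in message.splitlines() if not line.startswith("#")
    ).strip()
    lines = text.splitlines()
    if not lines:
        return ["commit message is empty"]

    problems = []
    match = SUBJECT_PATTERN.match(lines[0])
    if match is None:
        problems.append("subject does not look like '<semver>: <description>'")
    elif len(match.group("description")) >= MAX_DESCRIPTION_LENGTH:
        problems.append("description is not under 80 characters")

    # The template ends with a Signed-off-by trailer.
    if not any(line.startswith("Signed-off-by: ") for line in lines[1:]):
        problems.append("missing Signed-off-by trailer")
    return problems
```

One way to use it would be from a `commit-msg` hook, which Git invokes with the path to the message file as its first argument: read that file, pass its contents to the function, and exit with a non-zero status if the list is not empty.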
What, How, and Why
LLMs can look at diffs to explain technical changes. They use documentation and comments for context. They're quite effective at this. I love that developers can ask the tools to explain the source code in their own way. It's a great equalizer. Yet, they only explain the "what" and "how."
That leaves us with the "why" aspect.
The "why" explains the conditions that led to the choices in the source code. That context is key for developers, maintainers, and AI tools. It helps them grasp the intent.
Without the "why," we're left guessing. Was this workaround for a production bug? Was it a business decision? Was it a limitation of a third-party system? Or a temporary patch?
The commit messages in the Linux kernel community are a great example. Most of them explain the "why."
It is not a good idea to demand detailed information on every commit. Most of the time, the "why" behind a change isn't that nuanced. When in doubt, ask the LLM to write the "why." Analyze the result: if it is precise, the tool could infer the reason from the code alone, so writing it into the commit message is most likely unnecessary.
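The "ask the LLM to write the why" step could be sketched like this. It is only an illustration under my own assumptions: the prompt wording and the function names `staged_diff` and `build_why_prompt` are hypothetical, and no particular LLM API is assumed. You would paste or send the resulting prompt to whatever tool you use.

```python
import subprocess


def staged_diff() -> str:
    """Return the currently staged changes, using `git diff --staged`."""
    result = subprocess.run(
        ["git", "diff", "--staged"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def build_why_prompt(diff: str) -> str:
    """Build a prompt asking an LLM for the reason behind a change."""
    # Hypothetical wording; adjust it to your tool and your team.
    return (
        "Here is a Git diff. Explain WHY this change was made, "
        "not what changed or how. If the diff does not contain enough "
        "information to know the reason, say so instead of guessing.\n\n"
        + diff
    )
```

If the answer matches the real reason, the code already carries the "why" and the commit message can stay short. If the answer is a guess or is wrong, that is the context worth writing down yourself.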
I am still figuring out the right balance myself. I've worked long enough to know that we enjoy a binary world of all-or-nothing rules. Please do not force it; more information is not always better, as I explained before.
Wrapping Up 👋
Your Git history is a storybook for people and AI. It is now a key dataset for AI tools that help understand code. Knowing the "why" gives your code a valuable story.
- Let AI tools handle the "what" and "how."
- Focus on the "why." When in doubt, ask the LLM to write it. Keep in mind that if its answer is precise, you often do not need to write it down.
- More information is not always better.
- Keep commit messages as short as possible.
Every commit is part of your story. What story are you telling?
Talk to you later 🐊 alligator.