The Git project recently released Git 2.54.0. Let's look at a few notable highlights from this release, which includes contributions from the Git team at GitLab.
Pluggable Object Databases
Git already has the ability to store references with either the "files" backend or the "reftable" backend. This is possible because Git has proper abstractions that allow it to support different backends.
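As a minimal sketch of what a pluggable reference backend looks like from the command line, the following script creates one throwaway repository per backend and asks Git which format each one uses. It assumes a Git version that knows the --ref-format option of git init and the --show-ref-format option of git rev-parse (roughly Git 2.45 or newer for "reftable"); on older versions it reports "unsupported" instead of failing. The helper name show_ref_format is made up for this example.

```shell
#!/bin/sh
# Sketch: create one repository per reference backend and print the format
# Git reports for it. Assumes a reasonably recent Git; older versions are
# reported as "unsupported".
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical helper: initialize a repository with the requested reference
# format and print the format that Git reports for it.
show_ref_format () {
    if git init --quiet --ref-format="$1" "$tmp/$1" 2>/dev/null; then
        git -C "$tmp/$1" rev-parse --show-ref-format
    else
        echo "unsupported"
    fi
}

files_format=$(show_ref_format files)
reftable_format=$(show_ref_format reftable)

echo "requested files    -> $files_format"
echo "requested reftable -> $reftable_format"
```

Apart from the option passed to git init, both repositories are used in exactly the same way, which is the property the abstraction is meant to provide.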
But references are just one of the two important types of data that are stored in repositories, with the other being objects. Objects are stored in the object database, and each object database in turn consists of multiple object sources where objects can be read from or written to. Each object source either stores individual objects as so-called "loose" objects, or compresses multiple objects into a "packfile" in your .git/objects directory.
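To make the two existing storage formats a bit more tangible, here is a small sketch that creates a throwaway repository, shows the loose objects written by a single commit, and then packs them. The repository layout, file name, and the count_field helper are assumptions made for illustration; the Git commands themselves are long-standing ones.

```shell
#!/bin/sh
# Sketch: watch objects move from loose storage into a packfile.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT

# Throwaway identity so that committing works without any user configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init --quiet "$repo"
cd "$repo"
echo "hello" >file.txt
git add file.txt
git commit --quiet -m "initial commit"

# Hypothetical helper: extract one field from `git count-objects -v`.
count_field () {
    git count-objects -v | sed -n "s/^$1: //p"
}

# A single commit writes three loose objects: a blob, a tree, and a commit.
loose_before=$(count_field count)
echo "loose objects before repacking: $loose_before"
find .git/objects -type f | sort

# Pack everything into one packfile and delete the redundant loose objects.
git repack -a -d --quiet
loose_after=$(count_field count)
packs_after=$(count_field packs)
echo "loose objects after repacking: $loose_after, packfiles: $packs_after"
```

The file listing printed before repacking shows one file per object under .git/objects, whereas afterwards the same objects live in a single packfile.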
Until now, however, these sources did not have a proper abstraction boundary, so the storage format for objects was completely hardcoded into Git. But this is finally changing with pluggable object databases! The concept is straightforward and similar to how we did this for references in the past: Instead of having hardcoded code paths for how to store objects, we introduce an abstraction boundary that allows us to have different backends for storing objects.
While the idea is simple, the implementation is not, as we have hardcoded assumptions about the storage formats used in Git all over the place. In fact, we started working on this topic in Git 2.48, which was released in January 2025. Initially, we focused on making object-related subsystems self-contained and creating proper subsystems for the existing backends that we had in Git.
With Git 2.54, we have now reached a milestone: The object database backend is now pluggable. Not all of Git's functionality is covered yet, but introducing an alternate backend that handles a meaningful subset of operations is now a realistic undertaking.
For now, only local workflows like creating commits, showing commit graphs, or performing merges will work with such an alternative implementation. This notably excludes anything that interacts with a remote, such as fetching or pushing changes. Regardless, this is the culmination of almost two years of work spanning almost 400 commits that have been merged upstream, and we will of course continue to iterate on this effort.
So why does this matter? The idea is that it becomes practical to introduce new storage formats into Git. Examples could be:
- A storage format that is able to store large binary files more efficiently than packfiles do today
- A storage format that is custom-tailored for GitLab to ensure that we can serve repositories to our users even more efficiently than we currently can
This is a large-scale effort that is likely to shape the future of Git and GitLab.
This project was led by Patrick Steinhardt.
Easier editing of your commit history
In many software development projects it is common practice for developers to not only polish the code they want to contribute, but to also polish the commit history so that it becomes easy to review. The result is a set of small and atomic commits that each do one thing, with a good commit message that describes the intent of the commit as well as specific nuances.
Of course, more often than not, these atomic commits are not something that just happens naturally during the development process. Instead, the author of the changes will gain a better understanding of what they are while iterating on them, and the way to split up the commits will become clearer over time. Furthermore, the subsequent review process may result in feedback that requires changes to the crafted commits.
The consequence of this process is that the developer will have to rewrite their commit history many times during the development process. Historically, Git has allowed for this use case via interactive rebases. These interactive rebases are an extremely powerful tool: They let you reorder commits, rewrite commit messages, squash multiple commits together, or perform arbitrary edits of any commit.
But they are also somewhat arcane and hard to understand. The user needs to figure out the base commit for the rebase, they need to understand how to edit a somewhat obscure "instruction sheet," and they need to be aware of how the stateful rebasing process works. For example, users are presented with an instruction sheet similar to the following when rebasing a topic branch:
pick b60623f382 # t: detect errors outside of test cases # empty
pick b80cb55882 # t: prepare `test_match_signal ()` calls for `set -e`
pick 5ffe397f30 # t: prepare `test_must_fail ()` for `set -e`
pick 5e9b0cf5e1 # t: prepare `stop_git_daemon ()` for `set -e`
pick 299561e7a2 # t: prepare `git config --unset` calls for `set -e`
pick ed0e7ca2b5 # t: detect errors outside of test cases
So while interactive rebases are powerful, they are also quite intimidating for the average user.
It doesn't have to be this way, though. Tools like Jujutsu provide interfaces that are much easier to use compared to Git, as you can for example simply execute jj split to split up a commit into two commits. With Git and interactive rebases, this use case requires a lot of different steps with confusing command line arguments.
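For comparison, this is roughly what splitting a commit looks like with an interactive rebase today. The sketch below builds a throwaway repository and drives the rebase non-interactively by using sed as the sequence editor; in day-to-day use you would edit the instruction sheet by hand and typically pick hunks with git add -p. The file names and commit messages are made up for the example.

```shell
#!/bin/sh
# Sketch: split the commit "add a and b" into two commits via interactive rebase.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT

# Throwaway identity so that committing works without any user configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init --quiet "$repo"
cd "$repo"

echo base >base.txt
git add base.txt
git commit --quiet -m "base"

# One commit that mixes two unrelated changes, followed by another commit.
echo a >a.txt
echo b >b.txt
git add a.txt b.txt
git commit --quiet -m "add a and b"

echo c >c.txt
git add c.txt
git commit --quiet -m "add c"

# Step 1: start the rebase and turn the first "pick" into "edit", so that Git
# stops right after applying "add a and b".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" git rebase -i HEAD~2

# Step 2: undo that commit but keep its changes in the working tree.
git reset --quiet HEAD^

# Step 3: commit the changes again, this time as two separate commits.
git add a.txt
git commit --quiet -m "add a"
git add b.txt
git commit --quiet -m "add b"

# Step 4: let the rebase replay the remaining commits.
GIT_EDITOR=true git rebase --continue

subjects=$(git log --reverse --format=%s | paste -sd, -)
commit_count=$(git rev-list --count HEAD)
echo "history after split: $subjects"
```

Even in this scripted form, the user has to know the base commit, the "edit" instruction, the reset, and the --continue step, which is the kind of ceremony that a single split command avoids.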
We have thus taken inspiration from Jujutsu and have introduced a new git-history(1) command into Git that is the foundation for better history editing. For now, this command has two subcommands:
- git history reword allows you to easily rewrite a commit message. You simply give it the commit whose message you want to reword, Git asks you for the new commit message, and that's it.
- git history split allows you to split up a commit into two, which is inspired by jj split. You give it a commit, Git asks you which changes to stage into which commit and for the two commit messages, and then you're done.
This is of course only a start, and we want to add additional subcommands over time. For example:
- git history fixup to take staged changes and automatically amend them to a specific commit
- git history drop to remove a commit
- git history reorder to reorder the sequence of commits
- git history squash to squash a range of commits
But that's not all! In addition to making history editing easy, this new command also knows to automatically rebase all of your local branches that previously included this commit. So that means that you can even edit a commit that is not on the current branch, and all branches that contain the commit will be rewritten.
It may seem puzzling at first that Git is automatically rebasing dependent branches, as that is a significant departure from how git-rebase(1) works. But this is part of a bigger effort to bring better support for Stacked Diffs to Git. Stacked Diffs are a way to create a series of dependent branches that can be reviewed independently, but that together work towards a bigger goal.
This project was led by Patrick Steinhardt with support from Elijah Newren.
A native replacement for git-sizer(1)
The size of a Git repository is an important factor that determines how well Git and GitLab can handle it. But size alone is not the only factor, as the performance of a repository is ultimately a combination of multiple different dimensions:
- The depth of the commit history
- The shape of the directory structure
- The size of files stored in the repository
- The number of references
These are only some of the dimensions one needs to consider when trying to predict whether Git will be able to handle a repository well.
But while it is clear that the mere repository size is insufficient, Git itself does not provide any tooling that gives the user an easy overview of these metrics. Instead, users are forced to rely on third-party tools like git-sizer(1) to fill this gap. This tool does an excellent job at surfacing this information, but it is not part of Git itself and thus needs to be installed separately.
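To illustrate the gap, here is a minimal sketch of how a few of these dimensions can be pieced together by hand from existing commands. It runs against a small throwaway repository that the script creates itself; the file names and the tag are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: collect a few repository metrics by hand with existing commands.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT

# Throwaway identity so that committing works without any user configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init --quiet "$repo"
cd "$repo"
for name in one two; do
    echo "$name" >"$name.txt"
    git add "$name.txt"
    git commit --quiet -m "add $name.txt"
done
git tag -a -m "first release" v1.0

# Number of commits reachable from any reference.
commit_count=$(git rev-list --count --all)

# Number of references (branches, tags, and so on).
ref_count=$(git for-each-ref --format='%(refname)' | wc -l | tr -d ' ')

# Number of objects in the object database, regardless of type.
object_count=$(git cat-file --batch-all-objects --batch-check='%(objecttype)' |
    wc -l | tr -d ' ')

echo "commits: $commit_count, references: $ref_count, objects: $object_count"

# Objects broken down by type, followed by the on-disk size summary.
git cat-file --batch-all-objects --batch-check='%(objecttype)' | sort | uniq -c
git count-objects -vH
```

Each number needs its own command and its own post-processing, which is the overhead a single overview command removes.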
Observability of repository internals is critical to us at GitLab, so we introduced a new git repo structure command into Git 2.52 to display repository metrics, which we have extended in Git 2.53 to show inflated and disk sizes for objects by type.
In Git 2.54, we have iterated on this command further so that it shows not only the overall sizes, but also the largest objects by type:
$ git clone https://gitlab.com/git-scm/git.git
$ cd git
$ git repo structure
Counting objects: 410445, done.
| Repository structure | Value |
| ------------------------- | ----------- |
| * References | |
| * Count | 1.01 k |
| * Branches | 1 |
| * Tags | 1.00 k |
| * Remotes | 9 |
| * Others | 0 |
| | |
| * Reachable objects | |
| * Count | 410.45 k |
| * Commits | 83.99 k |
| * Trees | 164.46 k |
| * Blobs | 161.00 k |
| * Tags | 1.00 k |
| * Inflated size | 7.46 GiB |
| * Commits | 57.53 MiB |
| * Trees | 2.33 GiB |
| * Blobs | 5.07 GiB |
| * Tags | 737.48 KiB |
| * Disk size | 181.37 MiB |
| * Commits | 33.11 MiB |
| * Trees | 40.58 MiB |
| * Blobs | 107.11 MiB |
| * Tags | 582.67 KiB |
| | |
| * Largest objects | |
| * Commits | |
| * Maximum size [1] | 17.23 KiB |
| * Maximum parents [2] | 10 |
| * Trees | |
| * Maximum size [3] | 58.85 KiB |
| * Maximum entries [4] | 1.18 k |
| * Blobs | |
| * Maximum size [5] | 1019.51 KiB |
| * Tags | |
| * Maximum size [6] | 7.13 KiB |
[1] f6ecb603ff8af608a417d7724727d6bc3a9dbfdf
[2] 16d7601e176cd53f3c2f02367698d06b85e08879
[3] 203ee97047731b9fd3ad220faa607b6677861a0d
[4] 203ee97047731b9fd3ad220faa607b6677861a0d
[5] aa96f8bc361fd84a1459440f1e7de02ab0dc3543
[6] 07e38db6a5a03690034d27104401f6c8ea40f1fc
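The footnotes are plain object IDs, so they can be fed to other Git commands for further inspection. As a hedged sketch of that follow-up step, the script below finds the largest blob in a throwaway repository with existing plumbing commands and then resolves its size and path; with real output you would start from the footnoted ID instead. The file names and sizes are made up for the example.

```shell
#!/bin/sh
# Sketch: look up the size and path of the largest blob in a repository.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT

# Throwaway identity so that committing works without any user configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init --quiet "$repo"
cd "$repo"
echo "small" >small.txt
# Create a file of exactly 4096 bytes.
awk 'BEGIN { for (i = 0; i < 4096; i++) printf "x" }' >large.bin
git add small.txt large.bin
git commit --quiet -m "add a small and a large file"

# List all objects as "<type> <size> <id>", keep blobs, and take the largest.
largest=$(git cat-file --batch-all-objects \
        --batch-check='%(objecttype) %(objectsize) %(objectname)' |
    awk '$1 == "blob"' | sort -k2,2n | tail -n 1)
largest_oid=$(echo "$largest" | cut -d' ' -f3)

# Given an object ID, ask Git for its type and size.
largest_type=$(git cat-file -t "$largest_oid")
largest_size=$(git cat-file -s "$largest_oid")

# Map the object ID back to the path it was committed under.
largest_path=$(git rev-list --objects --all |
    awk -v oid="$largest_oid" '$1 == oid { print $2 }')

echo "$largest_type $largest_oid: $largest_size bytes at $largest_path"
```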
With this information we're now almost feature-complete as compared to git-sizer(1). We're not done yet, though — we plan to eventually add additional features such as:
- Severity levels as they exist in git-sizer(1)
- Graphs that show you the distribution of object sizes
- The ability to scan objects reachable via a subset of references
This project was led by Justin Tobler.
New infrastructure for repository maintenance
Whenever you write data into a Git repository you will typically end up adding more loose objects. Left unmanaged, this leads to a large number of separate files in your .git/objects/ directory, which slows down several operations that want to access many objects at once. Git thus regularly packs these objects into "packfiles" to ensure good performance.
This isn't the only data structure that may become inefficient over time: Updating references may create loose references, reflogs will need trimming, worktrees may become stale, and caches like commit-graphs need to be refreshed regularly.
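Each of these housekeeping chores has its own long-standing command. The following sketch runs them one by one against a throwaway repository, including a worktree that is made stale on purpose; the directory names are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: run individual housekeeping steps by hand.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Throwaway identity so that committing works without any user configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init --quiet "$tmp/repo"
cd "$tmp/repo"
echo content >file.txt
git add file.txt
git commit --quiet -m "initial commit"

# Leave behind a stale worktree by deleting its directory without telling Git.
git worktree add "$tmp/wt" >/dev/null 2>&1
rm -rf "$tmp/wt"

git pack-refs --all                 # compact loose references
git reflog expire --all             # trim reflog entries past their expiry
git worktree prune                  # drop bookkeeping for stale worktrees
git commit-graph write --reachable  # refresh the commit-graph cache

worktree_count=$(git worktree list | wc -l | tr -d ' ')
if [ -f .git/objects/info/commit-graph ]; then
    graph_written=yes
else
    graph_written=no
fi
echo "worktrees left: $worktree_count, commit-graph written: $graph_written"
```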
All of these tasks have historically been managed by git-gc(1). However, this tool has a monolithic architecture, where it basically executes all of the tasks required in sequential order. This foundation is hard to extend and doesn't give the end user much flexibility in case they want to slightly modify how housekeeping is performed.
The Git project introduced the new git-maintenance(1) tool in Git 2.29. In contrast to git-gc(1), git-maintenance(1) is not monolithic but is instead structured around tasks. These tasks are freely configurable by the user so that the user can control which tasks are running, giving them much more fine-grained control over repository maintenance.
Git eventually migrated to using git-maintenance(1) by default. In the beginning, however, the only task that was enabled by default was the git-gc(1) task, which, as you might have guessed, simply executes git gc. To manually run maintenance using this new command you can execute git maintenance run, but Git knows to execute this automatically after several other commands.
Over the last couple of releases we have implemented all the individual tasks that are supported by git-gc(1) in git-maintenance(1) to ensure that we have feature parity between these two tools.
Furthermore, we have implemented a new task that uses Git's modern architecture for repacking objects with geometric compaction. Geometric compaction is a much better fit for large monorepos, and with our efforts to make it work well with partial clones, which landed in Git 2.53, it is now a full replacement for our previous repacking strategy in Git.
In Git 2.54, we have now reached another significant milestone: Instead of using the git-gc(1)-based strategy by default, we are now using geometric repacking with fine-grained individual maintenance tasks! Besides being more efficient for large monorepos, it also ensures that we have an easier foundation to iterate on going forward.
The git-maintenance(1) infrastructure was originally implemented by Derrick Stolee and geometric maintenance was introduced by Taylor Blau. The effort to introduce the new fine-grained tasks and migrate to the new maintenance strategy was led by Patrick Steinhardt.
Read more
This article highlighted just a few of the contributions made by GitLab and the wider Git community for this latest release. You can learn about these from the official release announcement of the Git project. Also, check out our previous Git release blog posts to see other past highlights of contributions from GitLab team members.



