[{"data":1,"prerenderedAt":815},["ShallowReactive",2],{"/en-us/blog/making-gitlab-faster":3,"navigation-en-us":34,"banner-en-us":445,"footer-en-us":455,"blog-post-authors-en-us-Yorick Peterse":697,"blog-related-posts-en-us-making-gitlab-faster":711,"blog-promotions-en-us":752,"next-steps-en-us":805},{"id":4,"title":5,"authorSlugs":6,"authors":8,"body":10,"category":11,"categorySlug":11,"config":12,"content":16,"date":20,"description":17,"extension":21,"externalUrl":22,"featured":14,"heroImage":19,"isFeatured":14,"meta":23,"navigation":24,"path":25,"publishedDate":20,"rawbody":26,"seo":27,"slug":13,"stem":31,"tagSlugs":32,"tags":22,"template":15,"updatedDate":22,"__hash__":33},"blogPosts/en-us/blog/making-gitlab-faster.yml","Making GitLab Faster",[7],"yorick-peterse",[9],"Yorick Peterse","\nIn GitLab 8.5 we shipped numerous performance improvements. In this article\nwe'll take a look at some of these changes and the process involved in finding\nand resolving these issues. In particular we'll look at the following merge\nrequests:\n\n* [Optimize fetching issues closed by a merge request][mr2625]\n* [Improve performance of retrieving last update times for events][mr2613]\n* [Only set autocrlf when creating/updating files][mr2859]\n\n\u003C!--more-->\n\n## Performance Monitoring & Tooling\n\nWithout a proper production performance monitoring system and a good set of\ntools it's nearly impossible to find and resolve performance problems. GitLab\ncomes with two systems to make it possible to measure application performance:\n\n* [GitLab Performance Monitoring][monitoring]: a monitoring system using\n  [InfluxDB][influxdb] to track application performance of production\n  environments (though you can also use it during development). Data is then\n  visualized using [Grafana][grafana], though users can use any software capable\n  of extracting data from InfluxDB.\n* Sherlock: a development only monitoring system. Due to the overhead of\n  Sherlock it's not suitable for production environments. For example, Sherlock\n  uses [rblineprof][rblineprof] to track execution timings on a per line basis\n  but this adds quite a bit of overhead.\n\nAnother very useful library is [benchmark-ips][benchmark-ips]. This library can\nbe used to measure the performance of snippets of code while taking care of\nwarming up any caches, Just In Time compilers, etc. For more information see the\n[benchmark-ips README][benchmark-ips-readme].\n\n### Limitations of Benchmarks\n\nWhile we're on the topic of benchmarks it's worth mentioning that benchmarks are\nonly really useful to see the impact of a certain change. For example, if\nbenchmark X can run Y iterations in a certain time period this gives you no\ninsight in how this will perform in a production environment; all it indicates\nis that it can run a certain number of iterations. However, when a certain\nchange results in the benchmark now completing twice as fast things start\ngetting interesting. While we still don't really know how the change will affect\nour production environment we at least know that in the most ideal case\nperformance will be twice as fast.\n\nIn short, just benchmarks aren't enough; you always have to measure (and _keep_\nmeasuring) the performance of code in a production environment. 
With that out of the way, let's get started.

## Optimize fetching issues closed by a merge request

Commit messages can be used to automatically close issues by adding the text
"Fixes #X" or "Closes #X" to a commit message (where X refers to an issue ID).
In turn each merge request shows the list of issues that will be closed when
the merge request is merged. The description of a merge request can also
include text such as "Fixes #X" to close issues. In other words, the list of
issues to close is a set composed of the issues extracted from the commit
messages and the issues extracted from the merge request's description.

Which brings us to the method `MergeRequest#closes_issues`. This method returns
the list of issues to close (as an Array of `Issue` instances). If we look at
the performance of this method over time we see the following:

![MergeRequest#closes_issues Timings][mr2625-timings]

The small gap at the start of the graph is due to monitoring data only being
retained for 30 days.

To summarize the timings:

* A mean of around 500 milliseconds
* A 95th percentile between 1 and 1.5 seconds
* A 99th percentile between 1.5 and 2 seconds

2 seconds (in the worst case) to retrieve a list of issues to close is not
acceptable, so it was clear there was some work to be done.

Prior to 8.5 this method was implemented as follows:

    def closes_issues(current_user = self.author)
      if target_branch == project.default_branch
        issues = commits.flat_map { |c| c.closes_issues(current_user) }
        issues.push(*Gitlab::ClosingIssueExtractor.new(project, current_user).
                    closed_by_message(description))
        issues.uniq(&:id)
      else
        []
      end
    end

When the target branch of a merge request equals the project's default branch
this method takes the following steps:

1. For every commit in the merge request, grab the issues that should be closed
   when the merge request is merged.
2. Append the list of issues to close based on the merge request's description
   to the list of issues created in step 1.
3. Remove any duplicate issues (based on the issue IDs) from the resulting list.

What stood out here is the following line:

    issues = commits.flat_map { |c| c.closes_issues(current_user) }

For every commit the method `Commit#closes_issues` would be called, which in
turn was implemented as follows:

    def closes_issues(current_user = self.committer)
      Gitlab::ClosingIssueExtractor.new(project, current_user).closed_by_message(safe_message)
    end

Further digging revealed that `Gitlab::ClosingIssueExtractor#closed_by_message`
would perform two steps:

1. Extract the referenced issue IDs from a String
2. Run a database query to return a list of corresponding `Issue` objects

Note that both steps would be performed for _every_ commit in a merge request,
regardless of whether a commit actually referenced an issue. As such, the more
commits a merge request contained, the slower things would get.
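To make step 1 concrete: the extraction boils down to scanning the message with
a regular expression. A simplified, hypothetical stand-in for the real
`ISSUE_CLOSING_REGEX` (which supports more keywords and reference formats)
could look like this:

    # Toy pattern matching "Fixes #X", "Closes #X", "Resolves #X" and similar.
    CLOSING_PATTERN = /\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)/i

    message = 'Fixes #12 and resolves #34'

    # Step 1: pull the referenced issue IDs out of the message.
    message.scan(CLOSING_PATTERN).flatten # => ["12", "34"]

Step 2 then turns these IDs into `Issue` objects with a single database query.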
If we look at how `Gitlab::ClosingIssueExtractor#closed_by_message` is
implemented and used, we see that it operates on a single String and doesn't
really care what it contains or where it comes from, as long as it contains
references to issue IDs:

    def closed_by_message(message)
      return [] if message.nil?

      closing_statements = []
      message.scan(ISSUE_CLOSING_REGEX) do
        closing_statements << Regexp.last_match[0]
      end

      @extractor.analyze(closing_statements.join(" "))

      @extractor.issues
    end

This got me thinking: what if we concatenate all commit messages together and
pass the resulting String to `Gitlab::ClosingIssueExtractor#closed_by_message`?
Doing so would mean performance is no longer affected by the number of commits
in a merge request.

To test this I wrote a benchmark to compare the old setup against the idea I
was going for:

    require 'benchmark/ips'

    project = Project.find_with_namespace('gitlab-org/gitlab-ce')
    user    = User.find_by_username('yorickpeterse')
    commits = ['Fixes #1', 'Fixes #2', 'Fixes #3']
    desc    = 'This MR fixes #1 #2 #3'

    Benchmark.ips do |bench|
      # A somewhat simplified version of the old code (excluding any actual
      # commit/merge request objects).
      bench.report 'old' do
        issues = commits.flat_map do |message|
          Gitlab::ClosingIssueExtractor.new(project, user).
            closed_by_message(message)
        end

        issues.push(*Gitlab::ClosingIssueExtractor.new(project, user).
                   closed_by_message(desc))

        issues.uniq(&:id)
      end

      # The new code
      bench.report 'new' do
        messages = commits + [desc]

        Gitlab::ClosingIssueExtractor.new(project, user).
          closed_by_message(messages.join("\n"))
      end

      bench.compare!
    end

When running this benchmark we get the following output:

    Calculating -------------------------------------
                     old     1.000  i/100ms
                     new     1.000  i/100ms
    -------------------------------------------------
                     old      1.377  (± 0.0%) i/s -      7.000
                     new      2.807  (± 0.0%) i/s -     15.000  in   5.345900s

    Comparison:
                     new:        2.8 i/s
                     old:        1.4 i/s - 2.04x slower

So in this benchmark alone the new code is around 2 times faster than the old
code. The actual number of iterations isn't very relevant; we just want to know
whether we're on the right track.

Running the test suite showed no tests were broken by these changes, so it was
time to set up a merge request and deploy this to GitLab.com (and of course
include it in the next release, 8.5 in this case) to see the impact in a
production environment. The merge request for this was ["Optimize fetching
issues closed by a merge request"][mr2625]. These changes were deployed around
the 12th of February and we can see the impact on GitLab.com in the following
graph:

![MergeRequest#closes_issues Timings][mr2625-timings]

That's right, we went from timings between 0.5 and 2.5 seconds to timings of
less than 15 milliseconds (method call timings below 15 milliseconds are not
tracked). Ship it!
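For reference, the reworked method follows the same shape as the "new" report
in the benchmark above. A sketch of the idea (simplified; see the merge request
for the exact code):

    def closes_issues(current_user = self.author)
      return [] unless target_branch == project.default_branch

      # Concatenate all commit messages and the merge request's description
      # into a single String so the extractor (and its query) runs only once,
      # instead of once per commit.
      messages = commits.map(&:safe_message) << description

      Gitlab::ClosingIssueExtractor.new(project, current_user).
        closed_by_message(messages.join("\n"))
    end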
## Improve performance of retrieving last update times for events

For certain activity feeds we provide Atom feeds that users can subscribe to.
For example, <https://gitlab.com/yorickpeterse.atom> provides an Atom feed of
my public GitLab.com activity. The feed is built by querying a list of "events"
records from the database. The SQL query is rather large as the list of events
to return is based on the projects a user has access to (in the case of user
activity feeds). For example, for my own user profile the query would be as
follows:

    SELECT events.*
    FROM events
    LEFT OUTER JOIN projects ON projects.id = events.project_id
    LEFT OUTER JOIN namespaces ON namespaces.id = projects.namespace_id
    WHERE events.author_id IS NOT NULL
    AND events.author_id = 209240
    AND (
        projects.id IN (
            SELECT projects.id
            FROM projects
            WHERE projects.id IN (
                -- All projects directly owned by a user.
                SELECT projects.id
                FROM projects
                INNER JOIN namespaces ON projects.namespace_id = namespaces.id
                WHERE namespaces.owner_id = 209240
                AND namespaces.type IS NULL

                UNION

                -- All projects of the groups a user is a member of
                SELECT projects.id
                FROM projects
                INNER JOIN namespaces ON projects.namespace_id = namespaces.id
                INNER JOIN members ON namespaces.id = members.source_id
                WHERE namespaces.type IN ('Group')
                AND members.type IN ('GroupMember')
                AND members.source_type = 'Namespace'
                AND members.user_id = 209240

                UNION

                -- All projects (that don't belong to one of the groups of a
                -- user) a user is a member of
                SELECT projects.id
                FROM projects
                INNER JOIN members ON projects.id = members.source_id
                WHERE members.type IN ('ProjectMember')
                AND members.source_type = 'Project'
                AND members.user_id = 209240
            )

            UNION

            -- All publicly available projects, regardless of whether we still
            -- have access or not.
            SELECT projects.id
            FROM projects
            WHERE projects.visibility_level IN (20, 10)
        )
    )
    ORDER BY events.id DESC;

This particular query is quite the behemoth, but currently it's the easiest way
of getting a list of events for projects a user has access to.

One of the bits of information provided by an Atom feed is a timestamp
indicating when the feed was last updated. This timestamp was generated using
the method `Event.latest_update_time`, which would take a collection of events
and return the most recent update time. This method was implemented as follows:

    def latest_update_time
      row = select(:updated_at, :project_id).reorder(id: :desc).take

      row ? row.updated_at : nil
    end

This method breaks down into two steps:

1. Order the collection in descending order and take the first record
2. If there was a record, return its `updated_at` value; otherwise return `nil`
This method was then used as follows in the Atom feed (here `xml.updated`
generates an `<updated>` XML element):

    xml.updated @events.latest_update_time.xmlschema if @events.any?

Performance of this method was less than stellar (the blue bars are the timings
of `Event.latest_update_time`):

![Event.latest_update_time Timings][mr2613-timings]

In this graph we can see the timings quite often hover around 10 seconds. That's
10 seconds _just_ to get the latest update time from the database. Ouch!

At first I started messing around with using the SQL `max()` function instead of
a combination of `ORDER BY` and `LIMIT 1`. We used `max()` in the past and I
explicitly removed it because it was performing worse at the time. Since quite a
bit has changed since then, I figured it was worth re-investigating the use of
this function. The process of looking into this, as well as my findings, can be
found in issue [12415](https://gitlab.com/gitlab-org/gitlab-ce/issues/12415).

A couple of days after I first started looking into this issue I realized there
was a far easier solution: since retrieving the list of events itself (without
using the above code) is already quite fast and the result is already sorted in
the right order, we can simply re-use this list. That is, we'd take the
following steps:

1. Query the list of events.
2. Cast the list of events from an ActiveRecord query result to an Array (this
   is done anyway later on as we have to generate XML for every event).
3. Take the `updated_at` value of the first event in this list, if present.

This led to merge request
["Improve performance of retrieving last update times for events"][mr2613]. This
merge request also contains a few other changes so that certain records aren't
loaded into memory when not needed, but the gist of it is that instead of this:

    xml.updated @events.latest_update_time.xmlschema if @events.any?

We now use this:

    xml.updated @events[0].updated_at.xmlschema if @events[0]

As a result the method `Event.latest_update_time` was no longer needed and thus
was removed. This in turn drastically reduced the loading times of all Atom
feeds (not just user feeds).

## Only set autocrlf when creating/updating files

Git has an option called `core.autocrlf` which can be used to automatically
convert line endings in text files. This option can be set to one of 3 values:

1. `true`: line endings are converted to CRLF on checkout and back to LF when
   committing changes
2. `false`: no conversion takes place
3. `input`: converts CRLF line endings to LF upon committing changes

GitLab supports 3 ways of committing changes to a Git repository:

1. Via a Git client
2. Via the web editor
3. Via the API

In the last 2 cases we want to make sure CRLF line endings are replaced with LF
line endings. For example, browsers use CRLF even on non-Windows platforms. To
take care of this our documentation recommends that users configure Git to set
`core.autocrlf` to `input`; however, we still need to take care of this
ourselves in case a user didn't configure Git to convert line endings by
default.
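At the Git level, ensuring this is just a configuration write. Through Rugged
(the same Git bindings used by the measurement script later in this section),
reading and setting the option looks like this (a standalone sketch with a
hypothetical repository path, not GitLab code):

    require 'rugged'

    repo = Rugged::Repository.new('/tmp/example.git')

    # Reading and writing core.autocrlf are plain config lookups/updates.
    repo.config['core.autocrlf']            # => nil, "true", "false" or "input"
    repo.config['core.autocrlf'] = 'input'  # what GitLab wants on the server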
This process took place in a method called `Repository#raw_repository`, which
was implemented as follows:

    def raw_repository
      return nil unless path_with_namespace

      @raw_repository ||= begin
        repo = Gitlab::Git::Repository.new(path_to_repo)
        repo.autocrlf = :input
        repo
      rescue Gitlab::Git::Repository::NoRepository
        nil
      end
    end

This particular method is used in quite a number of places, on almost every (if
not every) project-specific page (issues, milestones, the project homepage,
etc.). Performance of this method was, well, bad:

![Gitlab::Git::Repository#autocrlf= Timings][mr2859-bars]

This particular graph plots the 95th percentile of the method
`Gitlab::Git::Repository#autocrlf=`, which is used to set the `core.autocrlf`
option. We can see that on average the 95th percentile hovers around 500
milliseconds. That's 500 milliseconds on almost every page to set a Git option
that's already set 99% of the time. More importantly, that's 500 milliseconds of
time wasted on many pages where no changes are ever written to a Git repository,
and thus the option is never used.

It's clear that we _don't_ want to run this on every page, especially when the
option is not going to be used. However, we still have to make sure this option
is set when we _do_ need it. At this point my first thought was to compare the
overhead of always writing this option with only writing it when actually
needed. In Ruby code this would roughly translate to:

    repo = Gitlab::Git::Repository.new(path_to_repo)

    # Only set autocrlf to :input if it's not already set to :input
    repo.autocrlf = :input unless repo.autocrlf == :input

The idea was that when sharing a disk over the network (e.g. via an NFS server)
a read is probably much faster than a write. A write may also end up locking
files for the duration, possibly blocking other read operations. To test this I
wrote a script that would perform said operation a number of times and write the
timings to InfluxDB.
The script is as follows:

    require 'rugged'
    require 'thread'
    require 'benchmark'
    require 'influxdb'

    Thread.abort_on_exception = true

    path = '/var/opt/gitlab/git-data/repositories/yorickpeterse/cat-pictures.git'
    key  = 'core.autocrlf'
    read = true

    influx_options = { udp: { host: 'HOST', port: PORT } }

    # Hammer the repository from 10 threads, writing the option only when it
    # isn't already set, and report every timing to InfluxDB.
    threads = 10.times.map do
      Thread.new do
        client = InfluxDB::Client.new(influx_options)

        while read
          time = Benchmark.measure do
            repo = Rugged::Repository.new(path)

            repo.config[key] = 'input' unless repo.config[key] == 'input'
          end

          ms = time.real * 1000

          client.write_point('rugged_config_cas', values: { duration: ms })

          sleep 0.05
        end
      end
    end

    # Let the threads run for two minutes, then shut them down.
    sleep(120)

    read = false

    threads.each(&:join)

    # Make sure the option ends up set to "input".
    Rugged::Repository.new(path).config[key] = 'input'

Here HOST and PORT were replaced with the hostname and port number of our
InfluxDB server.

Running this script produced the following graph:

![Timings for writing autocrlf when needed](https://about.gitlab.com/images/making_gitlab_faster/autocrlf_write_when_needed.png)

Next I modified this script to simply always write the autocrlf option; this
produced the following graph:

![Timings for always writing autocrlf](https://about.gitlab.com/images/making_gitlab_faster/autocrlf_always_write.png)

Finally I modified the script to simply load the repository as-is; this
produced the following graph:

![Timings for only reading](https://about.gitlab.com/images/making_gitlab_faster/autocrlf_read_only.png)

In all 3 cases we can see there's not really a clear difference in timings,
leading me to believe there's no particular benefit to only writing the option
when not already set to "input".

I spent some more time trying out different things to see how they would impact
performance but sadly didn't get much out of it. The details can be found in the
various comments for [issue 13457](https://gitlab.com/gitlab-org/gitlab-ce/issues/13457).

A day later [Jacob Vosmaer][jacob] and I decided to double-check the idea of
writing only when needed by applying a small patch to GitLab.com. This patch
modified `Repository#raw_repository` so that the autocrlf option would only be
written when needed, just like in the script above. We also made sure to measure
the timings of both reading and writing this option. After deploying this patch
and waiting for about half an hour to get enough data, the timings were as
follows:

![autocrlf reads vs writes](https://about.gitlab.com/images/making_gitlab_faster/autocrlf_reads_vs_writes.png)

This graph shows a nice drop in timings for writing the autocrlf option, sadly
at the cost of an increase in timings for reading the autocrlf option. In other
words, this change didn't actually solve anything; it just moved the problem
from writing the option to reading it.

After discussing this with Jacob he suggested it may be an even better idea to
only set this option where we actually need it, instead of checking (and
potentially writing) it on every page that happens to use
`Repository#raw_repository`. After all, the best way to speed code up is to
remove it entirely (or at least as much as possible).

This led to merge request
["Only set autocrlf when creating/updating files"][mr2859] which does exactly
that.
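A sketch of the direction this change takes (simplified, and with a
hypothetical write-path method name; the merge request has the real details):
`raw_repository` stops touching the option, and the code paths that actually
create or update files set it just before writing.

    # Repository#raw_repository no longer writes core.autocrlf...
    def raw_repository
      return nil unless path_with_namespace

      @raw_repository ||= Gitlab::Git::Repository.new(path_to_repo)
    rescue Gitlab::Git::Repository::NoRepository
      nil
    end

    # ...instead the option is set only where a file is actually created or
    # updated, e.g. by the web editor or the API:
    def commit_file(user, path, content, message, branch)
      raw_repository.autocrlf = :input

      # ... create the commit with the given content ...
    end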
The impact of this change can be seen in the following graph:

![Merge Request Timings Impact](https://about.gitlab.com/images/making_gitlab_faster/autocrlf_timings_impact.png)

This graph shows the 95th percentile, 99th percentile, and the mean per 30
minutes. The drop around the 20th occurred after the above merge request was
deployed to GitLab.com. The changes in this merge request brought the timings
down from between 70 milliseconds and 2.1 seconds to less than 15 milliseconds.

## Conclusion

In this article I only highlighted 3 merge requests that made it into 8.5.0. The
following performance-related merge requests are also included in 8.5.0:

* [First pass at deleting projects in the background](https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2569)
* [Background process note logic](https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2631)
* [Page project list on dashboard](https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2689)
* [Cache BroadcastMessage.current](https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2633)
* [Smarter flushing of branch statistics caches](https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2769)
* [Cache various Repository Git operations](https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2752)
* [Dedicated method for counting commits between refs](https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2707)

These are just a few of the performance changes we've made over the past few
months, and they certainly won't be the last, as there's still a lot of work to
be done.

[mr2625]: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2625
[mr2625-timings]: https://about.gitlab.com/images/making_gitlab_faster/merge_request_closes_issues.png
[mr2613]: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2613
[mr2613-timings]: https://about.gitlab.com/images/making_gitlab_faster/event_latest_update_time.png
[mr2859]: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2859
[mr2859-bars]: https://about.gitlab.com/images/making_gitlab_faster/gitlab_git_repository_autocrlf_bars.png
[monitoring]: https://docs.gitlab.com/ce/monitoring/performance/introduction/
[influxdb]: https://influxdata.com/time-series-platform/influxdb/
[grafana]: http://grafana.org/
[rblineprof]: https://github.com/peek/peek-rblineprof
[benchmark-ips]: https://github.com/evanphx/benchmark-ips
[benchmark-ips-readme]: https://github.com/evanphx/benchmark-ips/blob/master/README.md
[jacob]: https://gitlab.com/jacobvosmaer
In this article\n    we'll take a look at some of these changes and the process involved in\n    finding and resolving these issues.\n  noIndex: false\n  ogImage: >-\n    https://res.cloudinary.com/about-gitlab-com/image/upload/v1749663397/Blog/Hero%20Images/logoforblogpost.jpg\n  ogUrl: https://about.gitlab.com/blog/making-gitlab-faster\n  ogSiteName: https://about.gitlab.com\n  ogType: article\n  canonicalUrls: https://about.gitlab.com/blog/making-gitlab-faster\ncontent:\n  title: Making GitLab Faster\n  description: >-\n    In GitLab 8.5 we shipped numerous performance improvements. In this article\n    we'll take a look at some of these changes and the process involved in\n    finding and resolving these issues.\n  authors:\n    - Yorick Peterse\n  heroImage: >-\n    https://res.cloudinary.com/about-gitlab-com/image/upload/v1749663397/Blog/Hero%20Images/logoforblogpost.jpg\n  date: '2016-02-25'\n  body: >\n\n    In GitLab 8.5 we shipped numerous performance improvements. In this article\n\n    we'll take a look at some of these changes and the process involved in\n    finding\n\n    and resolving these issues. In particular we'll look at the following merge\n\n    requests:\n\n\n    * [Optimize fetching issues closed by a merge request][mr2625]\n\n    * [Improve performance of retrieving last update times for events][mr2613]\n\n    * [Only set autocrlf when creating/updating files][mr2859]\n\n\n    \u003C!--more-->\n\n\n    ## Performance Monitoring & Tooling\n\n\n    Without a proper production performance monitoring system and a good set of\n\n    tools it's nearly impossible to find and resolve performance problems.\n    GitLab\n\n    comes with two systems to make it possible to measure application\n    performance:\n\n\n    * [GitLab Performance Monitoring][monitoring]: a monitoring system using\n      [InfluxDB][influxdb] to track application performance of production\n      environments (though you can also use it during development). Data is then\n      visualized using [Grafana][grafana], though users can use any software capable\n      of extracting data from InfluxDB.\n    * Sherlock: a development only monitoring system. Due to the overhead of\n      Sherlock it's not suitable for production environments. For example, Sherlock\n      uses [rblineprof][rblineprof] to track execution timings on a per line basis\n      but this adds quite a bit of overhead.\n\n    Another very useful library is [benchmark-ips][benchmark-ips]. This library\n    can\n\n    be used to measure the performance of snippets of code while taking care of\n\n    warming up any caches, Just In Time compilers, etc. For more information see\n    the\n\n    [benchmark-ips README][benchmark-ips-readme].\n\n\n    ### Limitations of Benchmarks\n\n\n    While we're on the topic of benchmarks it's worth mentioning that benchmarks\n    are\n\n    only really useful to see the impact of a certain change. For example, if\n\n    benchmark X can run Y iterations in a certain time period this gives you no\n\n    insight in how this will perform in a production environment; all it\n    indicates\n\n    is that it can run a certain number of iterations. However, when a certain\n\n    change results in the benchmark now completing twice as fast things start\n\n    getting interesting. 
While we still don't really know how the change will\n    affect\n\n    our production environment we at least know that in the most ideal case\n\n    performance will be twice as fast.\n\n\n    In short, just benchmarks aren't enough; you always have to measure (and\n    _keep_\n\n    measuring) the performance of code in a production environment. This may\n    seem\n\n    like common knowledge but a few too many projects out there make bold claims\n\n    about their performance based solely on a set of benchmarks.\n\n\n    With that out of the way, let's get started.\n\n\n    ## Optimize fetching issues closed by a merge request\n\n\n    Commit messages can be used to automatically close issues by adding the text\n\n    \"Fixes #X\" or \"Closes #X\" to a commit message (where X refers to an issue\n    ID).\n\n    In turn each merge request shows the list of issues that will be closed\n    whenever\n\n    the merge request is merged. The description of a merge request can also\n    include\n\n    include text such as \"Fixes #X\" to close issues. In other words, the list of\n\n    issues to close is a set composed out of the issues to close as extracted\n    from\n\n    the commit messages and the issues to close as extracted from the merge\n\n    request's description.\n\n\n    Which brings us to the method `MergeRequest#closes_issues`. This method is\n    used\n\n    to return the list of issues to close (as an Array of `Issue` instances). If\n    we\n\n    look at the performance of this method over time we see the following:\n\n\n    ![MergeRequest#closes_issues Timings][mr2625-timings]\n\n\n    The small gap at the start of the graph is due to monitoring data only being\n\n    retained for 30 days.\n\n\n    To summarize the timings:\n\n\n    * A mean of around 500 milliseconds\n\n    * A 95th percentile between 1 and 1.5 seconds\n\n    * A 99th percentile between 1.5 and 2 seconds\n\n\n    2 seconds (in the worst case) to retrieve a list of issues to close is not\n\n    acceptable so it was clear there was some work to be done.\n\n\n    Prior to 8.5 this method was implemented as the following:\n\n        def closes_issues(current_user = self.author)\n           if target_branch == project.default_branch\n             issues = commits.flat_map { |c| c.closes_issues(current_user) }\n             issues.push(*Gitlab::ClosingIssueExtractor.new(project, current_user).\n                        closed_by_message(description))\n             issues.uniq(&:id)\n           else\n             []\n           end\n        end\n\n    When the target branch of a merge request equals the project's default\n    branch\n\n    this method takes the following steps:\n\n\n    1. For every commit in the merge request, grab the issues that should be\n    closed\n       when the merge request is merged.\n    2. Append the list of issues to close based on the merge request's\n    description\n       to the list of issues created in step 1.\n    3. 
Remove any duplicate issues (based on the issue IDs) from the resulting\n    list.\n\n\n    What stood out here is the following line:\n\n        issues = commits.flat_map { |c| c.closes_issues(current_user) }\n\n    For every commit the method `Commit#closes_issues` would be called, which in\n\n    turn was implemented as the following:\n\n        def closes_issues(current_user = self.committer)\n          Gitlab::ClosingIssueExtractor.new(project, current_user).closed_by_message(safe_message)\n        end\n\n    Further digging revealed that\n    `Gitlab::ClosingIssueExtractor#closed_by_message`\n\n    would perform two steps:\n\n\n    1. Extract the referenced issue IDs from a String\n\n    2. Run a database query to return a list of corresponding `Issue` objects\n\n\n    Note that the above steps would be performed for _every_ commit in a merge\n\n    request, regardless of whether a commit would actually reference an issue or\n\n    not. As such the more commits a merge request would contain the slower\n    things\n\n    would get.\n\n\n    If we look at how `Gitlab::ClosingIssueExtractor#closed_by_message` is\n\n    implemented and used we see that it operates on a single String and doesn't\n\n    really care what it contains or where it comes from as long as it contains\n\n    references to issue IDs:\n\n        def closed_by_message(message)\n          return [] if message.nil?\n\n          closing_statements = []\n          message.scan(ISSUE_CLOSING_REGEX) do\n            closing_statements \u003C\u003C Regexp.last_match[0]\n          end\n\n          @extractor.analyze(closing_statements.join(\" \"))\n\n          @extractor.issues\n        end\n\n    This got me thinking: what if we concatenate all commit messages together\n    and\n\n    pass the resulting String to\n    `Gitlab::ClosingIssueExtractor#closed_by_message`?\n\n    Doing so would mean performance is no longer affected by the amount of\n    commits\n\n    in a merge request.\n\n\n    To test this I wrote a benchmark to compare the old setup versus the idea I\n    was\n\n    going for:\n\n        require 'benchmark/ips'\n\n        project = Project.find_with_namespace('gitlab-org/gitlab-ce')\n        user    = User.find_by_username('yorickpeterse')\n        commits = ['Fixes #1', 'Fixes #2', 'Fixes #3']\n        desc    = 'This MR fixes #1 #2 #3'\n\n        Benchmark.ips do |bench|\n          # A somewhat simplified version of the old code (excluding any actual\n          # commit/merge request objects).\n          bench.report 'old' do\n            issues = commits.flat_map do |message|\n              Gitlab::ClosingIssueExtractor.new(project, user).\n                closed_by_message(message)\n            end\n\n            issues.push(*Gitlab::ClosingIssueExtractor.new(project, user).\n                       closed_by_message(desc))\n\n            issues.uniq(&:id)\n          end\n\n          # The new code\n          bench.report 'new' do\n            messages = commits + [desc]\n\n            Gitlab::ClosingIssueExtractor.new(project, user).\n              closed_by_message(messages.join(\"\\n\"))\n          end\n\n          bench.compare!\n        end\n\n    When running this benchmark we get the following output:\n\n        Calculating -------------------------------------\n                         old     1.000  i/100ms\n                         new     1.000  i/100ms\n        -------------------------------------------------\n                         old      1.377  (± 0.0%) i/s -      7.000\n                
         new      2.807  (± 0.0%) i/s -     15.000  in   5.345900s\n\n        Comparison:\n                         new:        2.8 i/s\n                         old:        1.4 i/s - 2.04x slower\n\n    So in this benchmark alone the new code is around 2 times faster than the\n    old\n\n    code. The actual number of iterations isn't very relevant, we just want to\n    know\n\n    if we're on the right track or not.\n\n\n    Running the test suite showed no tests were broken by these changes so it\n    was\n\n    time to set up a merge request and deploy this to GitLab.com (and of course\n\n    include it in the next release, 8.5 in this case) to see the impact in a\n\n    production environment. The merge request for this was [\"Optimize fetching\n\n    issues closed by a merge request\"][mr2625]. These changes were deployed\n    around\n\n    the 12th of February and we can see the impact on GitLab.com in the\n    following\n\n    graph:\n\n\n    ![MergeRequest#closes_issues Timings][mr2625-timings]\n\n\n    That's right, we went from timings between 0.5 and 2.5 seconds to timings of\n\n    less than 15 milliseconds (method call timings below 15 milliseconds are not\n\n    tracked). Ship it!\n\n\n    ## Improve performance of retrieving last update times for events\n\n\n    For certain activity feeds we provide Atom feeds that users can subscribe\n    to.\n\n    For example \u003Chttps://gitlab.com/yorickpeterse.atom> provides an Atom feed of\n\n    my public GitLab.com activity. The feed is built by querying a list of\n    records\n\n    from the database called \"events\". The SQL query is rather large as the list\n    of\n\n    events to return is based on the projects a user has access to (in case of\n    user\n\n    activity feeds). For example, for my own user profile the query would be as\n\n    following:\n\n        SELECT events.*\n        FROM events\n        LEFT OUTER JOIN projects ON projects.id = events.project_id\n        LEFT OUTER JOIN namespaces ON namespaces.id = projects.namespace_id\n        WHERE events.author_id IS NOT NULL\n        AND events.author_id = 209240\n        AND (\n            projects.id IN (\n                SELECT projects.id\n                FROM projects\n                WHERE projects.id IN (\n                    -- All projects directly owned by a user.\n                    SELECT projects.id\n                    FROM projects\n                    INNER JOIN namespaces ON projects.namespace_id = namespaces.id\n                    WHERE namespaces.owner_id = 209240\n                    AND namespaces.type IS NULL\n\n                    UNION\n\n                    -- All projects of the groups a user is a member of\n                    SELECT projects.id\n                    FROM projects\n                    INNER JOIN namespaces ON projects.namespace_id = namespaces.id\n                    INNER JOIN members ON namespaces.id = members.source_id\n                    WHERE namespaces.type IN ('Group')\n                    AND members.type IN ('GroupMember')\n                    AND members.source_type = 'Namespace'\n                    AND members.user_id = 209240\n\n                    UNION\n\n                    -- All projects (that don't belong to one of the groups of a\n                    -- user) a user is a member of\n                    SELECT projects.id\n                    FROM projects\n                    INNER JOIN members ON projects.id = members.source_id\n                    WHERE members.type IN ('ProjectMember')\n           
         AND members.source_type = 'Project'\n                    AND members.user_id = 209240\n                )\n\n                UNION\n\n                -- All publicly available projects, regardless of whether we still\n                -- have access or not.\n                SELECT projects.id\n                FROM projects\n                WHERE projects.visibility_level IN (20, 10)\n            )\n        )\n        ORDER BY events.id DESC;\n\n    This particular query is quite the behemoth but currently this is the\n    easiest\n\n    way of getting a list of events for projects a user has access to.\n\n\n    One of the bits of information provided by an Atom feed is a timestamp\n\n    indicating the time the feed was updated. This timestamp was generated using\n    the\n\n    method `Event.latest_update_time` which would take a collection of events\n    and\n\n    return the most recent update time. This method was implemented as the\n    following:\n\n        def latest_update_time\n          row = select(:updated_at, :project_id).reorder(id: :desc).take\n\n          row ? row.updated_at : nil\n        end\n\n    This method is broken up in two steps:\n\n\n    1. Order the collection in descending order, take the first record\n\n    2. If there was a record return the `updated_at` value, otherwise return\n    `nil`\n\n\n    This method was then used as the following in the Atom feed (here\n    `xml.updated`\n\n    would generate an `\u003Cupdated>` XML element):\n\n        xml.updated @events.latest_update_time.xmlschema if @events.any?\n\n    Performance of this method was less than stellar (the blue bars are the\n    timings\n\n    of `Event.latest_update_time`):\n\n\n    ![Event.latest_update_time Timings][mr2613-timings]\n\n\n    In this graph we can see the timings quite often hover around 10 seconds.\n    That's\n\n    10 seconds _just_ to get the latest update time from the database. Ouch!\n\n\n    At first I started messing around with using the SQL `max()` function\n    instead of\n\n    a combination of `ORDER BY` and `LIMIT 1`. We were using this in the past\n    and I\n\n    explicitly removed it because it was performing worse at the time. Since\n    quite a\n\n    bit changed since then I figured it was worth re-investigating the use of\n    this\n\n    function. The process of looking into this as well as my findings can be\n    found\n\n    in issue [12415](https://gitlab.com/gitlab-org/gitlab-ce/issues/12415).\n\n\n    A couple of days after I first started looking into this issue I realized\n    there\n\n    was a far easier solution to this problem. Since retrieving the list of\n    events\n\n    itself (without using the above code) is already quite fast and is already\n\n    sorted in the right order we can simply re-use this list. That is, we'd take\n    the\n\n    following steps:\n\n\n    1. Query the list of events.\n\n    2. Cast the list of events from an ActiveRecord query result to an Array\n    (this\n       is done anyway later on as we have to generate XML for every event).\n    3. 
Take the `updated_at` value of the first event in this list, if present.\n\n\n    This led to merge request\n\n    [\"Improve performance of retrieving last update times for events\"][mr2613].\n    This\n\n    merge request also contains a few other changes so certain records aren't\n    loaded\n\n    into memory when not needed, but the gist of it is that instead of this:\n\n        xml.updated @events.latest_update_time.xmlschema if @events.any?\n\n    We now use this:\n\n        xml.updated @events[0].updated_at.xmlschema if @events[0]\n\n    As a result of this the method `Event.latest_update_time` was no longer\n    needed\n\n    and thus was removed. This in turn drastically reduced the loading times of\n    all\n\n    Atom feeds (not just user feeds).\n\n\n    ## Only set autocrlf when creating/updating files\n\n\n    Git has an option called `core.autocrlf` which can be used to automatically\n\n    convert line endings in text files. This option can be set to 3 values:\n\n\n    1. `true`: CRLF line endings are always converted to LF line endings\n\n    2. `false`: no conversion takes place\n\n    3. `input`: converts CRLF line endings to LF upon committing changes\n\n\n    GitLab supports 3 ways of committing changes to a Git repository:\n\n\n    1. Via a Git client\n\n    2. Via the web editor\n\n    3. Via the API\n\n\n    In the last 2 cases we want to make sure CRLF line endings are replaced with\n    LF\n\n    line endings. For example, browsers use CRLF even on non Windows platforms.\n    To\n\n    take care of this our documentation recommends users to configure Git to set\n\n    `core.autocrlf` to `input`, however we still need to take care of this\n    ourselves\n\n    in case a user didn't configure Git to convert line endings by default. This\n\n    process took place in a method called `Repository#raw_repository` which was\n\n    implemented as the following:\n\n        def raw_repository\n          return nil unless path_with_namespace\n\n          @raw_repository ||= begin\n            repo = Gitlab::Git::Repository.new(path_to_repo)\n            repo.autocrlf = :input\n            repo\n          rescue Gitlab::Git::Repository::NoRepository\n            nil\n          end\n        end\n\n    This particular method is used in quite a number of places and is used on\n    almost\n\n    every (if not every) project-specific page (issues, milestones, the project\n\n    homepage, etc). Performance of this method was, well, bad:\n\n\n    ![Gitlab::Git::Repository#autocrlf= Timings][mr2859-bars]\n\n\n    This particular graph plots the 95th percentile of the method\n\n    `Gitlab::Git::Repository#autocrlf=` which is used to set the `core.autocrlf`\n\n    option. We can see that on average the 95th percentile hovers around 500\n\n    milliseconds. That's 500 milliseconds on almost every page to set a Git\n    option\n\n    that's already set 99% of the time. More importantly, that's 500\n    milliseconds of\n\n    time wasted on many pages where no changes are ever written to a Git\n    repository,\n\n    thus never using this option.\n\n\n    It's clear that we _don't_ want to run this on every page, especially when\n    the\n\n    option is not going to be used. However, we still have to make sure this\n    option\n\n    is set when we _do_ need it. At this point my first thought was to see the\n\n    overhead of always writing this option versus only writing this when\n    actually\n\n    needed. 
In Ruby code this would roughly translate to:\n\n        repo = Gitlab::Git::Repository.new(path_to_repo)\n\n        # Only set autocrlf to :input if it's not already set to :input\n        repo.autocrlf = :input unless repo.autocrlf == :input\n\n    The idea was that when sharing a disk over the network (e.g. via an NFS\n    server)\n\n    a read is probably much faster than a write. A write may also end up locking\n\n    files for the duration, possibly blocking other read operations. To test\n    this I\n\n    wrote a script that would perform said operation a number of times and write\n    the\n\n    timings to InfluxDB. This script is as the following:\n\n        require 'rugged'\n        require 'thread'\n        require 'benchmark'\n        require 'influxdb'\n\n        Thread.abort_on_exception = true\n\n        path = '/var/opt/gitlab/git-data/repositories/yorickpeterse/cat-pictures.git'\n        key  = 'core.autocrlf'\n        read = true\n\n        influx_options = { udp: { host: 'HOST', port: PORT } }\n\n        threads = 10.times.map do\n          Thread.new do\n            client = InfluxDB::Client.new(influx_options)\n\n            while read\n              time = Benchmark.measure do\n                repo = Rugged::Repository.new(path)\n\n                repo.config[key] = 'input' unless repo.config[key] == 'input'\n              end\n\n              ms = time.real * 1000\n\n              client.write_point('rugged_config_cas', values: { duration: ms })\n\n              sleep 0.05\n            end\n          end\n        end\n\n        sleep(120)\n\n        read = false\n\n        threads.each(&:join)\n\n        Rugged::Repository.new(path).config[key] = 'input'\n\n    Here HOST and PORT were replaced with the hostname and port number of our\n\n    InfluxDB server.\n\n\n    Running this script produced the following graph:\n\n\n    ![Timings for writing autocrlf when\n    needed](https://about.gitlab.com/images/making_gitlab_faster/autocrlf_write_when_needed.png)\n\n\n    Next I modified this script to simply always write the autocrlf option, this\n\n    produced the following graph:\n\n\n    ![Timings for always writing\n    autocrlf](https://about.gitlab.com/images/making_gitlab_faster/autocrlf_always_write.png)\n\n\n    Finally I modified the script to simply load the repository as-is, this\n    produced\n\n    the following graph:\n\n\n    ![Timings for only\n    reading](https://about.gitlab.com/images/making_gitlab_faster/autocrlf_read_only.png)\n\n\n    In all 3 cases we can see there's not really a clear difference in timings,\n\n    leading me to believe there's no particular benefit to only writing the\n    option\n\n    when not already set to \"input\".\n\n\n    I spent some more time trying out different things to see how they would\n    impact\n\n    performance but sadly didn't get much out of it. The details can be found in\n    the\n\n    various comments for [issue\n    13457](https://gitlab.com/gitlab-org/gitlab-ce/issues/13457).\n\n\n    A day later I and [Jacob Vosmaer][jacob] decided to double check the idea of\n\n    writing only when needed by applying a small patch to GitLab.com. This patch\n\n    modified `Repository#raw_repository` to the autocrlf option would only be\n\n    written when needed just like the script above. We also made sure to measure\n    the\n\n    timings of both reading and writing this option. 
After deploying this patch\n    and\n\n    waiting for about half an hour to get enough data the timings were as the\n\n    following:\n\n\n    ![autocrlf reads vs\n    writes](https://about.gitlab.com/images/making_gitlab_faster/autocrlf_reads_vs_writes.png)\n\n\n    This graph shows a nice drop in timings for writing the autocrlf option,\n    sadly\n\n    at the cost of an increase in timings for reading the autocrlf option. In\n    other\n\n    words, this change didn't actually solve anything but instead just moved the\n\n    problem from writing an option to just reading the option.\n\n\n    After discussing this with Jacob he suggested it may be an even better idea\n    to\n\n    only set this option where we actually need it to, instead of checking (and\n\n    potentially writing) it on every page that happens to use\n\n    `Repository#raw_repository`. After all, the best way to speed code up is to\n\n    remove it entirely (or at least as much as possible).\n\n\n    This lead to merge request\n\n    [\"Only set autocrlf when creating/updating files\"][mr2859] which does\n    exactly\n\n    that. The impact of this change can be seen in the following graph:\n\n\n    ![Merge Request Timings\n    Impact](https://about.gitlab.com/images/making_gitlab_faster/autocrlf_timings_impact.png)\n\n\n    This graph shows the 95th percentile, 99th percentile, and the mean per 30\n\n    minutes. The drop around the 20th is after the above merge request was\n    deployed\n\n    to GitLab.com. The changes in this merge request resulted in the timings\n    going\n\n    from between 70 milliseconds and 2.1 seconds to less than 15 milliseconds.\n\n\n    ## Conclusion\n\n\n    In this article I only highlighted 3 merge requests that made it into 8.5.0.\n    The\n\n    following performance related merge requests are also included in 8.5.0:\n\n\n    * [First pass at deleting projects in the\n    background](https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2569)\n\n    * [Background process note\n    logic](https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2631)\n\n    * [Page project list on\n    dashboard](https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2689)\n\n    * [Cache\n    BroadcastMessage.current](https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2633)\n\n    * [Smarter flushing of branch statistics\n    caches](https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2769)\n\n    * [Cache various Repository Git\n    operations](https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2752)\n\n    * [Dedicated method for counting commits between\n    refs](https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2707)\n\n\n    These are just a few of the performance changes we've made over the past few\n\n    months, and they certainly won't be the last as there's still a lot of work\n    to\n\n    be done.\n\n\n    [mr2625]: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2625\n\n    [mr2625-timings]:\n    https://about.gitlab.com/images/making_gitlab_faster/merge_request_closes_issues.png\n\n    [mr2613]: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2613\n\n    [mr2613-timings]:\n    https://about.gitlab.com/images/making_gitlab_faster/event_latest_update_time.png\n\n    [mr2859]: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2859\n\n    [mr2859-bars]:\n    https://about.gitlab.com/images/making_gitlab_faster/gitlab_git_repository_autocrlf_bars.png\n\n    [monitoring]:\n    https://docs.gitlab.com/ce/monitoring/performance/introduction/\n\n    
[influxdb]: https://influxdata.com/time-series-platform/influxdb/\n\n    [grafana]: http://grafana.org/\n\n    [rblineprof]: https://github.com/peek/peek-rblineprof\n\n    [benchmark-ips]: https://github.com/evanphx/benchmark-ips\n\n    [benchmark-ips-readme]:\n    https://github.com/evanphx/benchmark-ips/blob/master/README.md\n\n    [jacob]: https://gitlab.com/jacobvosmaer\n  category: engineering\nconfig:\n  slug: making-gitlab-faster\n  featured: false\n  template: BlogPost\n",{"title":5,"description":17,"ogTitle":5,"ogDescription":17,"noIndex":14,"ogImage":19,"ogUrl":28,"ogSiteName":29,"ogType":30,"canonicalUrls":28},"https://about.gitlab.com/blog/making-gitlab-faster","https://about.gitlab.com","article","en-us/blog/making-gitlab-faster",[],"MbTdVltAA2qtsCfggmn00DQn80ruS5SelYCbsXbhu2U",{"data":35},{"logo":36,"freeTrial":41,"sales":46,"login":51,"items":56,"search":365,"minimal":396,"duo":415,"switchNav":424,"pricingDeployment":435},{"config":37},{"href":38,"dataGaName":39,"dataGaLocation":40},"/","gitlab logo","header",{"text":42,"config":43},"Get free trial",{"href":44,"dataGaName":45,"dataGaLocation":40},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com&glm_content=default-saas-trial/","free trial",{"text":47,"config":48},"Talk to sales",{"href":49,"dataGaName":50,"dataGaLocation":40},"/sales/","sales",{"text":52,"config":53},"Sign in",{"href":54,"dataGaName":55,"dataGaLocation":40},"https://gitlab.com/users/sign_in/","sign in",[57,84,179,184,286,346],{"text":58,"config":59,"cards":61},"Platform",{"dataNavLevelOne":60},"platform",[62,68,76],{"title":58,"description":63,"link":64},"The intelligent orchestration platform for DevSecOps",{"text":65,"config":66},"Explore our Platform",{"href":67,"dataGaName":60,"dataGaLocation":40},"/platform/",{"title":69,"description":70,"link":71},"GitLab Duo Agent Platform","Agentic AI for the entire software lifecycle",{"text":72,"config":73},"Meet GitLab Duo",{"href":74,"dataGaName":75,"dataGaLocation":40},"/gitlab-duo-agent-platform/","gitlab duo agent platform",{"title":77,"description":78,"link":79},"Why GitLab","See the top reasons enterprises choose GitLab",{"text":80,"config":81},"Learn more",{"href":82,"dataGaName":83,"dataGaLocation":40},"/why-gitlab/","why gitlab",{"text":85,"left":24,"config":86,"link":88,"lists":92,"footer":161},"Product",{"dataNavLevelOne":87},"solutions",{"text":89,"config":90},"View all Solutions",{"href":91,"dataGaName":87,"dataGaLocation":40},"/solutions/",[93,117,140],{"title":94,"description":95,"link":96,"items":101},"Automation","CI/CD and automation to accelerate deployment",{"config":97},{"icon":98,"href":99,"dataGaName":100,"dataGaLocation":40},"AutomatedCodeAlt","/solutions/delivery-automation/","automated software delivery",[102,106,109,113],{"text":103,"config":104},"CI/CD",{"href":105,"dataGaLocation":40,"dataGaName":103},"/solutions/continuous-integration/",{"text":69,"config":107},{"href":74,"dataGaLocation":40,"dataGaName":108},"gitlab duo agent platform - product menu",{"text":110,"config":111},"Source Code Management",{"href":112,"dataGaLocation":40,"dataGaName":110},"/solutions/source-code-management/",{"text":114,"config":115},"Automated Software Delivery",{"href":99,"dataGaLocation":40,"dataGaName":116},"Automated software delivery",{"title":118,"description":119,"link":120,"items":125},"Security","Deliver code faster without compromising 
security",{"config":121},{"href":122,"dataGaName":123,"dataGaLocation":40,"icon":124},"/solutions/application-security-testing/","security and compliance","ShieldCheckLight",[126,130,135],{"text":127,"config":128},"Application Security Testing",{"href":122,"dataGaName":129,"dataGaLocation":40},"Application security testing",{"text":131,"config":132},"Software Supply Chain Security",{"href":133,"dataGaLocation":40,"dataGaName":134},"/solutions/supply-chain/","Software supply chain security",{"text":136,"config":137},"Software Compliance",{"href":138,"dataGaName":139,"dataGaLocation":40},"/solutions/software-compliance/","software compliance",{"title":141,"link":142,"items":147},"Measurement",{"config":143},{"icon":144,"href":145,"dataGaName":146,"dataGaLocation":40},"DigitalTransformation","/solutions/visibility-measurement/","visibility and measurement",[148,152,156],{"text":149,"config":150},"Visibility & Measurement",{"href":145,"dataGaLocation":40,"dataGaName":151},"Visibility and Measurement",{"text":153,"config":154},"Value Stream Management",{"href":155,"dataGaLocation":40,"dataGaName":153},"/solutions/value-stream-management/",{"text":157,"config":158},"Analytics & Insights",{"href":159,"dataGaLocation":40,"dataGaName":160},"/solutions/analytics-and-insights/","Analytics and insights",{"title":162,"items":163},"GitLab for",[164,169,174],{"text":165,"config":166},"Enterprise",{"href":167,"dataGaLocation":40,"dataGaName":168},"/enterprise/","enterprise",{"text":170,"config":171},"Small Business",{"href":172,"dataGaLocation":40,"dataGaName":173},"/small-business/","small business",{"text":175,"config":176},"Public Sector",{"href":177,"dataGaLocation":40,"dataGaName":178},"/solutions/public-sector/","public sector",{"text":180,"config":181},"Pricing",{"href":182,"dataGaName":183,"dataGaLocation":40,"dataNavLevelOne":183},"/pricing/","pricing",{"text":185,"config":186,"link":188,"lists":192,"feature":277},"Resources",{"dataNavLevelOne":187},"resources",{"text":189,"config":190},"View all resources",{"href":191,"dataGaName":187,"dataGaLocation":40},"/resources/",[193,226,249],{"title":194,"items":195},"Getting started",[196,201,206,211,216,221],{"text":197,"config":198},"Install",{"href":199,"dataGaName":200,"dataGaLocation":40},"/install/","install",{"text":202,"config":203},"Quick start guides",{"href":204,"dataGaName":205,"dataGaLocation":40},"/get-started/","quick setup checklists",{"text":207,"config":208},"Learn",{"href":209,"dataGaLocation":40,"dataGaName":210},"https://university.gitlab.com/","learn",{"text":212,"config":213},"Product documentation",{"href":214,"dataGaName":215,"dataGaLocation":40},"https://docs.gitlab.com/","product documentation",{"text":217,"config":218},"Best practice videos",{"href":219,"dataGaName":220,"dataGaLocation":40},"/getting-started-videos/","best practice videos",{"text":222,"config":223},"Integrations",{"href":224,"dataGaName":225,"dataGaLocation":40},"/integrations/","integrations",{"title":227,"items":228},"Discover",[229,234,239,244],{"text":230,"config":231},"Customer success stories",{"href":232,"dataGaName":233,"dataGaLocation":40},"/customers/","customer success stories",{"text":235,"config":236},"Blog",{"href":237,"dataGaName":238,"dataGaLocation":40},"/blog/","blog",{"text":240,"config":241},"The Source",{"href":242,"dataGaName":243,"dataGaLocation":40},"/the-source/","the 
source",{"text":245,"config":246},"Remote",{"href":247,"dataGaName":248,"dataGaLocation":40},"https://handbook.gitlab.com/handbook/company/culture/all-remote/","remote",{"title":250,"items":251},"Connect",[252,257,262,267,272],{"text":253,"config":254},"GitLab Services",{"href":255,"dataGaName":256,"dataGaLocation":40},"/services/","services",{"text":258,"config":259},"Community",{"href":260,"dataGaName":261,"dataGaLocation":40},"/community/","community",{"text":263,"config":264},"Forum",{"href":265,"dataGaName":266,"dataGaLocation":40},"https://forum.gitlab.com/","forum",{"text":268,"config":269},"Events",{"href":270,"dataGaName":271,"dataGaLocation":40},"/events/","events",{"text":273,"config":274},"Partners",{"href":275,"dataGaName":276,"dataGaLocation":40},"/partners/","partners",{"textColor":278,"title":279,"text":280,"link":281},"#000","What’s new in GitLab","Stay updated with our latest features and improvements.",{"text":282,"config":283},"Read the latest",{"href":284,"dataGaName":285,"dataGaLocation":40},"/releases/whats-new/","whats new",{"text":287,"config":288,"lists":290},"Company",{"dataNavLevelOne":289},"company",[291],{"items":292},[293,298,304,306,311,316,321,326,331,336,341],{"text":294,"config":295},"About",{"href":296,"dataGaName":297,"dataGaLocation":40},"/company/","about",{"text":299,"config":300,"footerGa":303},"Jobs",{"href":301,"dataGaName":302,"dataGaLocation":40},"/jobs/","jobs",{"dataGaName":302},{"text":268,"config":305},{"href":270,"dataGaName":271,"dataGaLocation":40},{"text":307,"config":308},"Leadership",{"href":309,"dataGaName":310,"dataGaLocation":40},"/company/team/e-group/","leadership",{"text":312,"config":313},"Team",{"href":314,"dataGaName":315,"dataGaLocation":40},"/company/team/","team",{"text":317,"config":318},"Handbook",{"href":319,"dataGaName":320,"dataGaLocation":40},"https://handbook.gitlab.com/","handbook",{"text":322,"config":323},"Investor relations",{"href":324,"dataGaName":325,"dataGaLocation":40},"https://ir.gitlab.com/","investor relations",{"text":327,"config":328},"Trust Center",{"href":329,"dataGaName":330,"dataGaLocation":40},"/security/","trust center",{"text":332,"config":333},"AI Transparency Center",{"href":334,"dataGaName":335,"dataGaLocation":40},"/ai-transparency-center/","ai transparency center",{"text":337,"config":338},"Newsletter",{"href":339,"dataGaName":340,"dataGaLocation":40},"/company/contact/#contact-forms","newsletter",{"text":342,"config":343},"Press",{"href":344,"dataGaName":345,"dataGaLocation":40},"/press/","press",{"text":347,"config":348,"lists":349},"Contact us",{"dataNavLevelOne":289},[350],{"items":351},[352,355,360],{"text":47,"config":353},{"href":49,"dataGaName":354,"dataGaLocation":40},"talk to sales",{"text":356,"config":357},"Support portal",{"href":358,"dataGaName":359,"dataGaLocation":40},"https://support.gitlab.com","support portal",{"text":361,"config":362},"Customer portal",{"href":363,"dataGaName":364,"dataGaLocation":40},"https://customers.gitlab.com/customers/sign_in/","customer portal",{"close":366,"login":367,"suggestions":374},"Close",{"text":368,"link":369},"To search repositories and projects, login to",{"text":370,"config":371},"gitlab.com",{"href":54,"dataGaName":372,"dataGaLocation":373},"search login","search",{"text":375,"default":376},"Suggestions",[377,379,383,385,389,393],{"text":69,"config":378},{"href":74,"dataGaName":69,"dataGaLocation":373},{"text":380,"config":381},"Code Suggestions 
(AI)",{"href":382,"dataGaName":380,"dataGaLocation":373},"/solutions/code-suggestions/",{"text":103,"config":384},{"href":105,"dataGaName":103,"dataGaLocation":373},{"text":386,"config":387},"GitLab on AWS",{"href":388,"dataGaName":386,"dataGaLocation":373},"/partners/technology-partners/aws/",{"text":390,"config":391},"GitLab on Google Cloud",{"href":392,"dataGaName":390,"dataGaLocation":373},"/partners/technology-partners/google-cloud-platform/",{"text":394,"config":395},"Why GitLab?",{"href":82,"dataGaName":394,"dataGaLocation":373},{"freeTrial":397,"mobileIcon":402,"desktopIcon":407,"secondaryButton":410},{"text":398,"config":399},"Start free trial",{"href":400,"dataGaName":45,"dataGaLocation":401},"https://gitlab.com/-/trials/new/","nav",{"altText":403,"config":404},"Gitlab Icon",{"src":405,"dataGaName":406,"dataGaLocation":401},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203874/jypbw1jx72aexsoohd7x.svg","gitlab icon",{"altText":403,"config":408},{"src":409,"dataGaName":406,"dataGaLocation":401},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203875/gs4c8p8opsgvflgkswz9.svg",{"text":411,"config":412},"Get Started",{"href":413,"dataGaName":414,"dataGaLocation":401},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com/get-started/","get started",{"freeTrial":416,"mobileIcon":420,"desktopIcon":422},{"text":417,"config":418},"Learn more about GitLab Duo",{"href":74,"dataGaName":419,"dataGaLocation":401},"gitlab duo",{"altText":403,"config":421},{"src":405,"dataGaName":406,"dataGaLocation":401},{"altText":403,"config":423},{"src":409,"dataGaName":406,"dataGaLocation":401},{"button":425,"mobileIcon":430,"desktopIcon":432},{"text":426,"config":427},"/switch",{"href":428,"dataGaName":429,"dataGaLocation":401},"#contact","switch",{"altText":403,"config":431},{"src":405,"dataGaName":406,"dataGaLocation":401},{"altText":403,"config":433},{"src":434,"dataGaName":406,"dataGaLocation":401},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1773335277/ohhpiuoxoldryzrnhfrh.png",{"freeTrial":436,"mobileIcon":441,"desktopIcon":443},{"text":437,"config":438},"Back to pricing",{"href":182,"dataGaName":439,"dataGaLocation":401,"icon":440},"back to pricing","GoBack",{"altText":403,"config":442},{"src":405,"dataGaName":406,"dataGaLocation":401},{"altText":403,"config":444},{"src":409,"dataGaName":406,"dataGaLocation":401},{"title":446,"button":447,"config":452},"See how agentic AI transforms software delivery",{"text":448,"config":449},"Watch GitLab Transcend now",{"href":450,"dataGaName":451,"dataGaLocation":40},"/events/transcend/virtual/","transcend event",{"layout":453,"icon":454,"disabled":24},"release","AiStar",{"data":456},{"text":457,"source":458,"edit":464,"contribute":469,"config":474,"items":479,"minimal":686},"Git is a trademark of Software Freedom Conservancy and our use of 'GitLab' is under license",{"text":459,"config":460},"View page source",{"href":461,"dataGaName":462,"dataGaLocation":463},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/","page source","footer",{"text":465,"config":466},"Edit this page",{"href":467,"dataGaName":468,"dataGaLocation":463},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/content/","web ide",{"text":470,"config":471},"Please contribute",{"href":472,"dataGaName":473,"dataGaLocation":463},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/CONTRIBUTING.md/","please 
contribute",{"twitter":475,"facebook":476,"youtube":477,"linkedin":478},"https://twitter.com/gitlab","https://www.facebook.com/gitlab","https://www.youtube.com/channel/UCnMGQ8QHMAnVIsI3xJrihhg","https://www.linkedin.com/company/gitlab-com",[480,527,581,625,652],{"title":180,"links":481,"subMenu":496},[482,486,491],{"text":483,"config":484},"View plans",{"href":182,"dataGaName":485,"dataGaLocation":463},"view plans",{"text":487,"config":488},"Why Premium?",{"href":489,"dataGaName":490,"dataGaLocation":463},"/pricing/premium/","why premium",{"text":492,"config":493},"Why Ultimate?",{"href":494,"dataGaName":495,"dataGaLocation":463},"/pricing/ultimate/","why ultimate",[497],{"title":498,"links":499},"Contact Us",[500,503,505,507,512,517,522],{"text":501,"config":502},"Contact sales",{"href":49,"dataGaName":50,"dataGaLocation":463},{"text":356,"config":504},{"href":358,"dataGaName":359,"dataGaLocation":463},{"text":361,"config":506},{"href":363,"dataGaName":364,"dataGaLocation":463},{"text":508,"config":509},"Status",{"href":510,"dataGaName":511,"dataGaLocation":463},"https://status.gitlab.com/","status",{"text":513,"config":514},"Terms of use",{"href":515,"dataGaName":516,"dataGaLocation":463},"/terms/","terms of use",{"text":518,"config":519},"Privacy statement",{"href":520,"dataGaName":521,"dataGaLocation":463},"/privacy/","privacy statement",{"text":523,"config":524},"Cookie preferences",{"dataGaName":525,"dataGaLocation":463,"id":526,"isOneTrustButton":24},"cookie preferences","ot-sdk-btn",{"title":85,"links":528,"subMenu":537},[529,533],{"text":530,"config":531},"DevSecOps platform",{"href":67,"dataGaName":532,"dataGaLocation":463},"devsecops platform",{"text":534,"config":535},"AI-Assisted Development",{"href":74,"dataGaName":536,"dataGaLocation":463},"ai-assisted development",[538],{"title":539,"links":540},"Topics",[541,546,551,556,561,566,571,576],{"text":542,"config":543},"CICD",{"href":544,"dataGaName":545,"dataGaLocation":463},"/topics/ci-cd/","cicd",{"text":547,"config":548},"GitOps",{"href":549,"dataGaName":550,"dataGaLocation":463},"/topics/gitops/","gitops",{"text":552,"config":553},"DevOps",{"href":554,"dataGaName":555,"dataGaLocation":463},"/topics/devops/","devops",{"text":557,"config":558},"Version Control",{"href":559,"dataGaName":560,"dataGaLocation":463},"/topics/version-control/","version control",{"text":562,"config":563},"DevSecOps",{"href":564,"dataGaName":565,"dataGaLocation":463},"/topics/devsecops/","devsecops",{"text":567,"config":568},"Cloud Native",{"href":569,"dataGaName":570,"dataGaLocation":463},"/topics/cloud-native/","cloud native",{"text":572,"config":573},"AI for Coding",{"href":574,"dataGaName":575,"dataGaLocation":463},"/topics/devops/ai-for-coding/","ai for coding",{"text":577,"config":578},"Agentic AI",{"href":579,"dataGaName":580,"dataGaLocation":463},"/topics/agentic-ai/","agentic ai",{"title":582,"links":583},"Solutions",[584,586,588,593,597,600,604,607,609,612,615,620],{"text":127,"config":585},{"href":122,"dataGaName":127,"dataGaLocation":463},{"text":116,"config":587},{"href":99,"dataGaName":100,"dataGaLocation":463},{"text":589,"config":590},"Agile development",{"href":591,"dataGaName":592,"dataGaLocation":463},"/solutions/agile-delivery/","agile delivery",{"text":594,"config":595},"SCM",{"href":112,"dataGaName":596,"dataGaLocation":463},"source code management",{"text":542,"config":598},{"href":105,"dataGaName":599,"dataGaLocation":463},"continuous integration & delivery",{"text":601,"config":602},"Value stream 
management",{"href":155,"dataGaName":603,"dataGaLocation":463},"value stream management",{"text":547,"config":605},{"href":606,"dataGaName":550,"dataGaLocation":463},"/solutions/gitops/",{"text":165,"config":608},{"href":167,"dataGaName":168,"dataGaLocation":463},{"text":610,"config":611},"Small business",{"href":172,"dataGaName":173,"dataGaLocation":463},{"text":613,"config":614},"Public sector",{"href":177,"dataGaName":178,"dataGaLocation":463},{"text":616,"config":617},"Education",{"href":618,"dataGaName":619,"dataGaLocation":463},"/solutions/education/","education",{"text":621,"config":622},"Financial services",{"href":623,"dataGaName":624,"dataGaLocation":463},"/solutions/finance/","financial services",{"title":185,"links":626},[627,629,631,633,636,638,640,642,644,646,648,650],{"text":197,"config":628},{"href":199,"dataGaName":200,"dataGaLocation":463},{"text":202,"config":630},{"href":204,"dataGaName":205,"dataGaLocation":463},{"text":207,"config":632},{"href":209,"dataGaName":210,"dataGaLocation":463},{"text":212,"config":634},{"href":214,"dataGaName":635,"dataGaLocation":463},"docs",{"text":235,"config":637},{"href":237,"dataGaName":238,"dataGaLocation":463},{"text":230,"config":639},{"href":232,"dataGaName":233,"dataGaLocation":463},{"text":245,"config":641},{"href":247,"dataGaName":248,"dataGaLocation":463},{"text":253,"config":643},{"href":255,"dataGaName":256,"dataGaLocation":463},{"text":258,"config":645},{"href":260,"dataGaName":261,"dataGaLocation":463},{"text":263,"config":647},{"href":265,"dataGaName":266,"dataGaLocation":463},{"text":268,"config":649},{"href":270,"dataGaName":271,"dataGaLocation":463},{"text":273,"config":651},{"href":275,"dataGaName":276,"dataGaLocation":463},{"title":287,"links":653},[654,656,658,660,662,664,666,670,675,677,679,681],{"text":294,"config":655},{"href":296,"dataGaName":289,"dataGaLocation":463},{"text":299,"config":657},{"href":301,"dataGaName":302,"dataGaLocation":463},{"text":307,"config":659},{"href":309,"dataGaName":310,"dataGaLocation":463},{"text":312,"config":661},{"href":314,"dataGaName":315,"dataGaLocation":463},{"text":317,"config":663},{"href":319,"dataGaName":320,"dataGaLocation":463},{"text":322,"config":665},{"href":324,"dataGaName":325,"dataGaLocation":463},{"text":667,"config":668},"Sustainability",{"href":669,"dataGaName":667,"dataGaLocation":463},"/sustainability/",{"text":671,"config":672},"Diversity, inclusion and belonging (DIB)",{"href":673,"dataGaName":674,"dataGaLocation":463},"/diversity-inclusion-belonging/","Diversity, inclusion and belonging",{"text":327,"config":676},{"href":329,"dataGaName":330,"dataGaLocation":463},{"text":337,"config":678},{"href":339,"dataGaName":340,"dataGaLocation":463},{"text":342,"config":680},{"href":344,"dataGaName":345,"dataGaLocation":463},{"text":682,"config":683},"Modern Slavery Transparency Statement",{"href":684,"dataGaName":685,"dataGaLocation":463},"https://handbook.gitlab.com/handbook/legal/modern-slavery-act-transparency-statement/","modern slavery transparency 
statement",{"items":687},[688,691,694],{"text":689,"config":690},"Terms",{"href":515,"dataGaName":516,"dataGaLocation":463},{"text":692,"config":693},"Cookies",{"dataGaName":525,"dataGaLocation":463,"id":526,"isOneTrustButton":24},{"text":695,"config":696},"Privacy",{"href":520,"dataGaName":521,"dataGaLocation":463},[698],{"id":699,"title":9,"body":22,"config":700,"content":702,"description":22,"extension":21,"meta":706,"navigation":24,"path":707,"seo":708,"stem":709,"__hash__":710},"blogAuthors/en-us/blog/authors/yorick-peterse.yml",{"template":701},"BlogAuthor",{"name":9,"config":703},{"headshot":704,"ctfId":705},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1749659488/Blog/Author%20Headshots/gitlab-logo-extra-whitespace.png","Yorick-Peterse",{},"/en-us/blog/authors/yorick-peterse",{},"en-us/blog/authors/yorick-peterse","VGbmFb88hdwYhr9lyGHjbwPUUHPQ6npgsw4txxlgB14",[712,726,740],{"content":713,"config":724},{"title":714,"description":715,"authors":716,"heroImage":718,"date":719,"body":720,"category":11,"tags":721},"How to build CI/CD observability at scale","This practical guide to GitLab pipeline analytics helps self-managed users gain operational insights using Prometheus and Grafana.",[717],"Paul Meresanu","https://res.cloudinary.com/about-gitlab-com/image/upload/v1774465167/n5hlvrsrheadeccyr1oz.png","2026-04-28","CI/CD optimization starts with visibility. Building a successful DevOps platform at enterprise scale **should include** understanding pipeline performance, job execution patterns, and quantifiable operational insights — especially for organizations running GitLab self-managed instances.\n\nTo help GitLab customers maximize their platform investments, we developed the GitLab CI/CD Observability solution as part of our Platform Excellence program, which transforms raw pipeline metrics into actionable operational insights.\n\nA leading financial services organization partnered with GitLab's customer success architect to gain visibility into their GitLab self-managed deployment. Together, we implemented a containerized observability solution combining the open-source gitlab-ci-pipelines-exporter with enterprise-grade Prometheus and Grafana infrastructure.\n\nIn this article, you'll learn the challenges they faced managing pipelines at scale and how GitLab CI/CD Observability addressed them with a practical, end-to-end implementation.\n\n## The challenge: Measuring CI/CD performance\nBefore implementing any observability solution, define your measurement landscape:\n*   **What metrics matter?** Pipeline duration, job success rates, queue times, runner utilization\n*   **Who needs visibility?** Developers, DevOps engineers, platform teams, leadership\n*   **What decisions will this drive?** Infrastructure investment, bottleneck remediation, capacity planning\n\n## Solution architecture: A full set of dashboards for observability\nOnce deployed, the observability stack provides a set of Grafana dashboards that give real-time and historical visibility into your CI/CD platform. A typical deployment includes:\n*   **Pipeline Overview Dashboard:** A top-level view showing total pipeline runs, success/failure rates over time (as stacked bar or time-series charts), and average pipeline duration trends. 
### 2. Deploy the Pipeline Exporter

```yaml
# exporter-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-ci-pipelines-exporter
  namespace: gitlab-observability
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gitlab-ci-pipelines-exporter
  template:
    metadata:
      labels:
        app: gitlab-ci-pipelines-exporter
    spec:
      containers:
        - name: exporter
          # Pin a specific version in production rather than :latest
          image: mvisonneau/gitlab-ci-pipelines-exporter:latest
          ports:
            - containerPort: 8080
          env:
            - name: GCPE_GITLAB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gitlab-token
                  key: token
            - name: GCPE_CONFIG
              value: /etc/gcpe/config.yml
          volumeMounts:
            - name: config
              mountPath: /etc/gcpe
      volumes:
        - name: config
          configMap:
            name: gcpe-config
---
apiVersion: v1
kind: Service
metadata:
  name: gitlab-ci-pipelines-exporter
  namespace: gitlab-observability
spec:
  selector:
    app: gitlab-ci-pipelines-exporter
  ports:
    - port: 8080
      targetPort: 8080
```

### 3. Deploy Node Exporter (DaemonSet)

```yaml
# node-exporter-daemonset.yaml
# Kept minimal for brevity: production node-exporter DaemonSets typically
# also set hostNetwork/hostPID and mount /proc and /sys so host metrics
# are reported rather than the container's own view.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: gitlab-observability
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
        - name: node-exporter
          image: prom/node-exporter:latest
          ports:
            - containerPort: 9100
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: gitlab-observability
spec:
  selector:
    app: node-exporter
  ports:
    - port: 9100
      targetPort: 9100
```
### 4. Deploy Prometheus

```yaml
# prometheus-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: gitlab-observability
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
      volumes:
        - name: config
          configMap:
            name: prometheus-config
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: gitlab-observability
spec:
  selector:
    app: prometheus
  ports:
    - port: 9090
      targetPort: 9090
```

### 5. Deploy Grafana

The Grafana deployment below starts with authentication disabled (`GF_AUTH_ANONYMOUS_ENABLED: true`) for initial setup convenience.

**This setting allows anyone with network access to view all dashboards without logging in.** For production deployments, remove this variable or set it to `false` and configure a proper authentication provider (LDAP, SAML/SSO, or OAuth) to restrict access to authorized users.

```yaml
# grafana-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: gitlab-observability
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:10.0.0
          ports:
            - containerPort: 3000
          env:
            # REMOVE or set to 'false' for production.
            # When 'true', any user with network access can
            # view dashboards without authentication.
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: 'true'
          volumeMounts:
            - name: dashboards-provider
              mountPath: /etc/grafana/provisioning/dashboards
            - name: datasources
              mountPath: /etc/grafana/provisioning/datasources
            - name: dashboards
              mountPath: /var/lib/grafana/dashboards
      volumes:
        - name: dashboards-provider
          configMap:
            name: grafana-dashboards-provider
        - name: datasources
          configMap:
            name: grafana-datasources
        - name: dashboards
          configMap:
            name: grafana-dashboards
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: gitlab-observability
spec:
  selector:
    app: grafana
  ports:
    - port: 3000
      targetPort: 3000
```
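For production, the same `env:` block can flip anonymous access off and pull the admin password from a Secret. A sketch, assuming a separately created `grafana-admin` Secret (a hypothetical name); `GF_*` variables map to the corresponding `grafana.ini` settings:

```yaml
# Production-leaning env block for the Grafana container (sketch)
env:
  - name: GF_AUTH_ANONYMOUS_ENABLED
    value: 'false'
  - name: GF_SECURITY_ADMIN_USER
    value: admin
  - name: GF_SECURITY_ADMIN_PASSWORD
    valueFrom:
      secretKeyRef:
        name: grafana-admin    # hypothetical Secret, created separately
        key: password
```

Wiring Grafana to LDAP or SSO goes beyond environment variables; see the Enterprise considerations section below.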
### 6. Set network policy

Restrict inter-pod traffic to only the required communication paths:

```yaml
# network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: observability-policy
  namespace: gitlab-observability
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    # Prometheus scrapes exporter and node-exporter
    - from:
        - podSelector:
            matchLabels:
              app: prometheus
      ports:
        - port: 8080
        - port: 9100
    # Grafana queries Prometheus
    - from:
        - podSelector:
            matchLabels:
              app: grafana
      ports:
        - port: 9090
```

### 7. Validate

```bash
kubectl get pods -n gitlab-observability
kubectl port-forward svc/grafana 3000:3000 -n gitlab-observability
curl http://localhost:3000/api/health
```

## Configuration reference

### Exporter configuration

```yaml
# gitlab-ci-pipelines-exporter.yml (ConfigMap: gcpe-config)
log:
  level: info
gitlab:
  url: https://gitlab.your-domain.com
  maximum_requests_per_second: 10
project_defaults:
  pull:
    pipeline:
      jobs:
        enabled: true
wildcards:
  - owner:
      name: your-group-name
      kind: group
    archived: false
```

### Prometheus configuration

```yaml
# prometheus.yml (ConfigMap: prometheus-config)
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'gitlab-ci-pipelines-exporter'
    static_configs:
      - targets: ['gitlab-ci-pipelines-exporter:8080']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
```

### Grafana data sources

These are two separate files, provisioned as two separate ConfigMaps:

```yaml
# datasources.yml (ConfigMap: grafana-datasources)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```

```yaml
# dashboards.yml (ConfigMap: grafana-dashboards-provider)
apiVersion: 1
providers:
  - name: 'default'
    folder: 'GitLab CI/CD'
    type: file
    options:
      path: /var/lib/grafana/dashboards
```
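The Deployments above mount these files as ConfigMaps by name, but the creation commands aren't shown. One way to wire them up, as a sketch; note the exporter's key must be `config.yml`, since `GCPE_CONFIG` points at `/etc/gcpe/config.yml`, and the dashboard JSON filename is a placeholder:

```bash
kubectl create configmap gcpe-config \
  --from-file=config.yml=gitlab-ci-pipelines-exporter.yml \
  -n gitlab-observability

kubectl create configmap prometheus-config \
  --from-file=prometheus.yml -n gitlab-observability

kubectl create configmap grafana-datasources \
  --from-file=datasources.yml -n gitlab-observability

kubectl create configmap grafana-dashboards-provider \
  --from-file=dashboards.yml -n gitlab-observability

# Dashboard JSON exported from Grafana (filename is hypothetical)
kubectl create configmap grafana-dashboards \
  --from-file=pipeline-overview.json -n gitlab-observability
```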
## Key metrics

### Pipeline Exporter metrics

| Metric | Description |
| :---- | :---- |
| `gitlab_ci_pipeline_duration_seconds` | Pipeline execution time |
| `gitlab_ci_pipeline_status` | Pipeline success/failure by project |
| `gitlab_ci_pipeline_job_duration_seconds` | Individual job execution time |
| `gitlab_ci_pipeline_job_status` | Job success/failure status |
| `gitlab_ci_pipeline_job_artifact_size_bytes` | Artifact storage consumption |
| `gitlab_ci_pipeline_coverage` | Code coverage percentage |
| `gitlab_ci_environment_deployment_count` | Deployment frequency |
| `gitlab_ci_environment_deployment_duration_seconds` | Deployment execution time |
| `gitlab_ci_environment_behind_commits_count` | Environment drift from main |

### Node Exporter metrics

| Metric | Description |
| :---- | :---- |
| `node_cpu_seconds_total` | CPU utilization |
| `node_memory_MemAvailable_bytes` | Available memory |
| `node_filesystem_avail_bytes` | Disk space available |
| `node_load1` | 1-minute load average |
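To turn these metrics into dashboard panels, Prometheus recording rules can precompute the common queries. A sketch, assuming the exporter's documented gauge semantics (`gitlab_ci_pipeline_status` is 1 for a pipeline's current `status` label) and its `project`/`ref` label names; adjust if your exporter version labels differ:

```yaml
# recording-rules.yml (sketch): load via rule_files in prometheus.yml
groups:
  - name: gitlab-ci-overview
    rules:
      # Fraction of tracked project/ref pairs whose latest pipeline succeeded
      - record: gitlab_ci:pipeline_success_ratio
        expr: |
          count(gitlab_ci_pipeline_status{status="success"} == 1)
            /
          count(count by (project, ref) (gitlab_ci_pipeline_status))
      # Mean latest-pipeline duration across all tracked projects
      - record: gitlab_ci:pipeline_duration_seconds:avg
        expr: avg(gitlab_ci_pipeline_duration_seconds)
```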
## Troubleshooting

### Air-gapped Grafana plugin installation

For offline environments, install plugins manually. Example for Kubernetes:

```bash
# Copy plugin zip into the Grafana pod
kubectl cp grafana-polystat-panel-2.1.16.zip \
  gitlab-observability/grafana-<pod-id>:/tmp/
# Extract plugin
kubectl exec -it -n gitlab-observability deploy/grafana -- \
  sh -c "unzip /tmp/grafana-polystat-panel-2.1.16.zip -d /var/lib/grafana/plugins/"
# Restart Grafana pod
kubectl rollout restart deployment/grafana -n gitlab-observability
# Verify installation
kubectl exec -it -n gitlab-observability deploy/grafana -- \
  ls -al /var/lib/grafana/plugins/
```

## Enterprise considerations

For regulated industries, ensure:

*   **Token security:** Store GitLab Personal Access Tokens in a dedicated secrets manager rather than hardcoding them in ConfigMaps. Enforce token rotation policies and limit scope to **read_api** only.
*   **Network segmentation:** Deploy behind a reverse proxy with TLS termination. In Kubernetes, use an Ingress controller with automated certificate provisioning.
*   **Authentication:** Configure Grafana with your organization's identity provider (SAML, LDAP, or OAuth/OIDC) to enforce role-based access control on dashboards.

## Why GitLab?

GitLab's API-first design enables custom observability solutions that complement native capabilities like Value Stream Analytics and DORA metrics. The open architecture lets organizations integrate proven open-source tooling, like the gitlab-ci-pipelines-exporter, directly with their existing enterprise infrastructure without disrupting established workflows.

As your observability maturity grows, GitLab's built-in Observability capabilities provide a natural next step, offering deeper, integrated visibility without additional tooling. Learn more about what's available natively in the platform in the [GitLab Observability documentation](https://docs.gitlab.com/operations/observability/observability/).

---

# Related post: 5 ways GitLab pipeline logic solves real engineering problems

*Omid Khan, 2026-04-09*

*Learn how to scale CI/CD with composable patterns for monorepos, microservices, environments, and governance.*

Most CI/CD tools can run a build and ship a deployment. Where they diverge is what happens when your delivery needs get real: a monorepo with a dozen services, microservices spread across multiple repositories, deployments to dozens of environments, or a platform team trying to enforce standards without becoming a bottleneck.

GitLab's pipeline execution model was designed for that complexity. Parent-child pipelines, DAG execution, dynamic pipeline generation, multi-project triggers, merge request pipelines with merged results, and CI/CD Components each solve a distinct class of problems. Because they compose, understanding the full model unlocks something more than a faster pipeline. In this article, you'll learn about the five patterns where that model stands out, each mapped to a real engineering scenario with the configuration to match.

The configs below are illustrative. The scripts use echo commands to keep the signal-to-noise ratio low. Swap them out for your actual build, test, and deploy steps and they are ready to use.

## 1. Monorepos: Parent-child pipelines + DAG execution

The problem: Your monorepo has a frontend, a backend, and a docs site. Every commit triggers a full rebuild of everything, even when only a README changed.

GitLab solves this with two complementary features: [parent-child pipelines](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#parent-child-pipelines) (which let a top-level pipeline spawn isolated sub-pipelines) and [DAG execution via `needs`](https://docs.gitlab.com/ci/yaml/#needs) (which breaks rigid stage-by-stage ordering and lets jobs start the moment their dependencies finish).

A parent pipeline detects what changed and triggers only the relevant child pipelines:

```yaml
# .gitlab-ci.yml
stages:
  - trigger

trigger-services:
  stage: trigger
  trigger:
    include:
      - local: '.gitlab/ci/api-service.yml'
      - local: '.gitlab/ci/web-service.yml'
      - local: '.gitlab/ci/worker-service.yml'
    strategy: depend
```

Each child pipeline is a fully independent pipeline with its own stages, jobs, and artifacts.
The parent waits for all of them via [strategy: depend](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#wait-for-downstream-pipeline-to-complete) so you get a single green/red signal at the top level, with full drill-down into each service's pipeline. This organizational separation is the bigger win for large teams: each service owns its pipeline config, changes in one cannot break another, and the complexity stays manageable as the repo grows.

One thing worth knowing: when you pass [multiple files to a single `trigger: include:`](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#combine-multiple-child-pipeline-configuration-files), GitLab merges them into a single child pipeline configuration. This means jobs defined across those files share the same pipeline context and can reference each other with `needs:`, which is what makes the DAG optimization possible. If you split them into separate trigger jobs instead, each would be its own isolated pipeline and cross-file `needs:` references would not work.
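To make the merged-context point concrete, here is a minimal sketch of a cross-file `needs:` reference. The `web-service` job names are hypothetical; `build-api` is the job defined in the `api-service.yml` shown below:

```yaml
# .gitlab/ci/web-service.yml (sketch)
build-web:
  stage: build
  script:
    - echo "Building web service"

integration-web-api:
  stage: test
  # build-api lives in api-service.yml; this works only because both
  # files were merged into one child pipeline by the single trigger job.
  needs: [build-web, build-api]
  script:
    - echo "Testing web against the API"
```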
Combine this with `needs:` inside each child pipeline and you get DAG execution. Your integration tests can start the moment the build finishes, without waiting for other jobs in the same stage.

```yaml
# .gitlab/ci/api-service.yml
stages:
  - build
  - test

build-api:
  stage: build
  script:
    - echo "Building API service"

test-api:
  stage: test
  needs: [build-api]
  script:
    - echo "Running API tests"
```

Why it matters: Teams with large monorepos typically report significant reductions in pipeline runtime after switching to DAG execution, since jobs no longer wait on unrelated work in the same stage. Parent-child pipelines add the organizational layer that keeps the configuration maintainable as the repo and team grow.

![Local downstream pipelines](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738759/Blog/Imported/hackathon-fake-blog-post-s/image3_vwj3rz.png "Local downstream pipelines")

## 2. Microservices: Cross-repo, multi-project pipelines

The problem: Your frontend lives in one repo, your backend in another. When the frontend team ships a change, they have no visibility into whether it broke the backend integration and vice versa.

GitLab's [multi-project pipelines](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#multi-project-pipelines) let one project trigger a pipeline in a completely separate project and wait for the result. The triggering project gets a linked downstream pipeline right in its own pipeline view.

The frontend pipeline builds an API contract artifact and publishes it, then triggers the backend pipeline. The backend fetches that artifact directly using the [Jobs API](https://docs.gitlab.com/api/jobs/#download-a-single-artifact-file-from-specific-tag-or-branch) and validates it before allowing anything to proceed. If a breaking change is detected, the backend pipeline fails and the frontend pipeline fails with it.

```yaml
# frontend repo: .gitlab-ci.yml
stages:
  - build
  - test
  - trigger-backend

build-frontend:
  stage: build
  script:
    - echo "Building frontend and generating API contract..."
    - mkdir -p dist
    - |
      echo '{
        "api_version": "v2",
        "breaking_changes": false
      }' > dist/api-contract.json
    - cat dist/api-contract.json
  artifacts:
    paths:
      - dist/api-contract.json
    expire_in: 1 hour

test-frontend:
  stage: test
  script:
    - echo "All frontend tests passed!"

trigger-backend-pipeline:
  stage: trigger-backend
  trigger:
    project: my-org/backend-service
    branch: main
    strategy: depend
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
```

```yaml
# backend repo: .gitlab-ci.yml
stages:
  - build
  - test

build-backend:
  stage: build
  script:
    - echo "Building backend service"

integration-test:
  stage: test
  rules:
    - if: $CI_PIPELINE_SOURCE == "pipeline"
  script:
    - echo "Fetching API contract from frontend..."
    - |
      curl --silent --fail \
        --header "JOB-TOKEN: $CI_JOB_TOKEN" \
        --output api-contract.json \
        "${CI_API_V4_URL}/projects/${FRONTEND_PROJECT_ID}/jobs/artifacts/main/raw/dist/api-contract.json?job=build-frontend"
    - cat api-contract.json
    - |
      if grep -q '"breaking_changes": true' api-contract.json; then
        echo "FAIL: Breaking API changes detected - backend integration blocked!"
        exit 1
      fi
      echo "PASS: API contract is compatible!"
```

A few things worth noting in this config. The `integration-test` job uses `$CI_PIPELINE_SOURCE == "pipeline"` to ensure it only runs when triggered by an upstream pipeline, not on a standalone push to the backend repo. The frontend project ID is referenced via `$FRONTEND_PROJECT_ID`, which should be set as a [CI/CD variable](https://docs.gitlab.com/ci/variables/) in the backend project settings to avoid hardcoding it (an alternative is sketched below).

Why it matters: Cross-service breakage that previously surfaced in production gets caught in the pipeline instead. The dependency between services stops being invisible and becomes something teams can see, track, and act on.

![Cross-project pipelines](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738762/Blog/Imported/hackathon-fake-blog-post-s/image4_h6mfsb.png "Cross-project pipelines")
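That alternative: variables defined on a trigger job are forwarded to the downstream pipeline, so the frontend can pass its own project ID along instead of relying on a manually configured variable. A sketch:

```yaml
# frontend repo (sketch): forward the project ID to the downstream pipeline
trigger-backend-pipeline:
  stage: trigger-backend
  variables:
    FRONTEND_PROJECT_ID: $CI_PROJECT_ID   # visible to the backend's jobs
  trigger:
    project: my-org/backend-service
    branch: main
    strategy: depend
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
```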
## 3. Multi-tenant / matrix deployments: Dynamic child pipelines

The problem: You deploy the same application to 15 customer environments, or three cloud regions, or dev/staging/prod. Updating a deploy stage across all of them one by one is the kind of work that leads to configuration drift. Writing a separate pipeline for each environment is unmaintainable from day one.

GitLab's [dynamic child pipelines](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#dynamic-child-pipelines) let you generate a pipeline at runtime. A job runs a script that produces a YAML file, and that YAML becomes the pipeline for the next stage. The pipeline structure itself becomes data.

```yaml
# .gitlab-ci.yml
stages:
  - generate
  - trigger-environments

generate-config:
  stage: generate
  script:
    - |
      # ENVIRONMENTS can be passed as a CI variable or read from a config file.
      # Default to dev, staging, prod if not set.
      ENVIRONMENTS=${ENVIRONMENTS:-"dev staging prod"}
      for ENV in $ENVIRONMENTS; do
        cat > ${ENV}-pipeline.yml << EOF
      stages:
        - deploy
        - verify
      deploy-${ENV}:
        stage: deploy
        script:
          - echo "Deploying to ${ENV} environment"
      verify-${ENV}:
        stage: verify
        script:
          - echo "Running smoke tests on ${ENV}"
      EOF
      done
  artifacts:
    paths:
      - "*.yml"
    exclude:
      - ".gitlab-ci.yml"

.trigger-template:
  stage: trigger-environments
  trigger:
    strategy: depend

trigger-dev:
  extends: .trigger-template
  trigger:
    include:
      - artifact: dev-pipeline.yml
        job: generate-config

trigger-staging:
  extends: .trigger-template
  needs: [trigger-dev]
  trigger:
    include:
      - artifact: staging-pipeline.yml
        job: generate-config

trigger-prod:
  extends: .trigger-template
  needs: [trigger-staging]
  trigger:
    include:
      - artifact: prod-pipeline.yml
        job: generate-config
  when: manual
```

The generation script loops over an `ENVIRONMENTS` variable rather than hardcoding each environment separately. Pass in a different list via a CI variable or read it from a config file and the pipeline adapts without touching the YAML. The trigger jobs use [extends:](https://docs.gitlab.com/ci/yaml/#extends) to inherit shared configuration from `.trigger-template`, so `strategy: depend` is defined once rather than repeated on every trigger job. Add a new environment by updating the variable, not by duplicating pipeline config. Add [when: manual](https://docs.gitlab.com/ci/yaml/#when) to the production trigger and you get a promotion gate baked right into the pipeline graph.

Why it matters: SaaS companies and platform teams use this pattern to manage dozens of environments without duplicating pipeline logic. The pipeline structure itself stays lean as the deployment matrix grows.

![Dynamic pipeline](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738765/Blog/Imported/hackathon-fake-blog-post-s/image7_wr0kx2.png "Dynamic pipeline")
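Worth knowing before reaching for dynamic generation: when every environment runs identical jobs and only variable values differ, GitLab's built-in [`parallel: matrix`](https://docs.gitlab.com/ci/yaml/#parallelmatrix) gets you the fan-out without generating YAML at all. A minimal sketch:

```yaml
# Sketch: one deploy job fanned out across environments
deploy:
  stage: deploy
  parallel:
    matrix:
      - ENVIRONMENT: [dev, staging, prod]
  script:
    - echo "Deploying to ${ENVIRONMENT}"
```

Dynamic child pipelines earn their keep when the generated pipelines genuinely differ in shape, for example per-environment stages or promotion gates.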
## 4. MR-first delivery: Merge request pipelines, merged results, and workflow routing

The problem: Your pipeline runs on every push to every branch. Expensive tests run on feature branches that will never merge. Meanwhile, you have no guarantee that what you tested is actually what will land on `main` after a merge.

GitLab has three interlocking features that solve this together:

*   [Merge request pipelines](https://docs.gitlab.com/ci/pipelines/merge_request_pipelines/) run only when a merge request exists, not on every branch push. This alone eliminates a significant amount of wasted compute.

*   [Merged results pipelines](https://docs.gitlab.com/ci/pipelines/merged_results_pipelines/) go further. GitLab creates a temporary merge commit (your branch plus the current target branch) and runs the pipeline against that. You are testing what will actually exist after the merge, not just your branch in isolation.

*   [Workflow rules](https://docs.gitlab.com/ci/yaml/workflow/) let you define exactly which pipeline type runs under which conditions and suppress everything else. The `$CI_OPEN_MERGE_REQUESTS` guard below prevents duplicate pipelines firing for both a branch and its open MR simultaneously.

With those three working together, here is what a tiered pipeline looks like:

```yaml
# .gitlab-ci.yml
workflow:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH && $CI_OPEN_MERGE_REQUESTS
      when: never
    - if: $CI_COMMIT_BRANCH
    - if: $CI_PIPELINE_SOURCE == "schedule"

stages:
  - fast-checks
  - expensive-tests
  - deploy

lint-code:
  stage: fast-checks
  script:
    - echo "Running linter"
  rules:
    - if: $CI_PIPELINE_SOURCE == "push"
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

unit-tests:
  stage: fast-checks
  script:
    - echo "Running unit tests"
  rules:
    - if: $CI_PIPELINE_SOURCE == "push"
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

integration-tests:
  stage: expensive-tests
  script:
    - echo "Running integration tests (15 min)"
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

e2e-tests:
  stage: expensive-tests
  script:
    - echo "Running E2E tests (30 min)"
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

nightly-comprehensive-scan:
  stage: expensive-tests
  script:
    - echo "Running full nightly suite (2 hours)"
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"

deploy-production:
  stage: deploy
  script:
    - echo "Deploying to production"
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
```

With this setup, the pipeline behaves differently depending on context. A push to a feature branch with no open MR runs lint and unit tests only. Once an MR is opened, the workflow rules switch from a branch pipeline to an MR pipeline, and the full integration and E2E suite runs against the merged result. Merging to `main` queues a manual production deployment. A nightly schedule runs the comprehensive scan once, not on every commit.

Why it matters: Teams routinely cut CI costs significantly with this pattern, not by running fewer tests, but by running the right tests at the right time. Merged results pipelines catch the class of bugs that only appear after a merge, before they ever reach `main`.

![Conditional pipelines (within a branch with no MR)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738768/Blog/Imported/hackathon-fake-blog-post-s/image6_dnfcny.png "Conditional pipelines (within a branch with no MR)")

![Conditional pipelines (within an MR)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738772/Blog/Imported/hackathon-fake-blog-post-s/image1_wyiafu.png "Conditional pipelines (within an MR)")

![Conditional pipelines (on the main branch)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738774/Blog/Imported/hackathon-fake-blog-post-s/image5_r6lkfd.png "Conditional pipelines (on the main branch)")
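One tidying note on the tiered config above: the same two-line `rules:` block repeats on several jobs. A hidden job plus `extends:` keeps it in one place; a sketch:

```yaml
# Sketch: share the repeated rules via a hidden job
.mr-and-main:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

integration-tests:
  extends: .mr-and-main
  stage: expensive-tests
  script:
    - echo "Running integration tests (15 min)"
```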
## 5. Governed pipelines: CI/CD Components

The problem: Your platform team has defined the right way to build, test, and deploy. But every team has their own `.gitlab-ci.yml` with subtle variations. Security scanning gets skipped. Deployment standards drift. Audits are painful.

GitLab [CI/CD Components](https://docs.gitlab.com/ci/components/) let platform teams publish versioned, reusable pipeline building blocks. Application teams consume them with a single `include:` line and optional inputs: no copy-paste, no drift. Components are discoverable through the [CI/CD Catalog](https://docs.gitlab.com/ci/components/#cicd-catalog), which means teams can find and adopt approved building blocks without needing to go through the platform team directly.

Here is a component definition from a shared library:

```yaml
# templates/deploy.yml
spec:
  inputs:
    stage:
      default: deploy
    environment:
      default: production
---
deploy-job:
  stage: $[[ inputs.stage ]]
  script:
    - echo "Deploying $APP_NAME to $[[ inputs.environment ]]"
    - echo "Deploy URL: $DEPLOY_URL"
  environment:
    name: $[[ inputs.environment ]]
```

And here is how an application team consumes it:

```yaml
# Application repo: .gitlab-ci.yml
variables:
  APP_NAME: "my-awesome-app"
  DEPLOY_URL: "https://api.example.com"

include:
  - component: gitlab.com/my-org/component-library/build@v1.0.6
  - component: gitlab.com/my-org/component-library/test@v1.0.6
  - component: gitlab.com/my-org/component-library/deploy@v1.0.6
    inputs:
      environment: staging

stages:
  - build
  - test
  - deploy
```

Three lines of `include:` replace hundreds of lines of duplicated YAML. The platform team can push a security fix to `v1.0.7` and teams opt in on their own schedule, or the platform team can pin everyone to a minimum version. Either way, one change propagates everywhere instead of needing to be applied repo by repo.

Pair this with [resource groups](https://docs.gitlab.com/ci/resource_groups/) to prevent concurrent deployments to the same environment, and [protected environments](https://docs.gitlab.com/ci/environments/protected_environments/) to enforce approval gates, and you have a governed delivery platform where compliance is the default, not the exception.
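As a concrete sketch of that pairing: a consuming project can add a resource group by declaring a job with the same name as the included one (same-name jobs merge with included configuration), while the protected environment itself is configured in project settings rather than YAML:

```yaml
# Application repo: .gitlab-ci.yml (sketch, merged into the included job)
deploy-job:
  resource_group: staging   # at most one concurrent deploy to staging
```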
Why it matters: This is the pattern that makes GitLab CI/CD scale across hundreds of teams. Platform engineering teams enforce compliance without becoming a bottleneck. Application teams get a fast path to a working pipeline without reinventing the wheel.

![Component pipeline (imported jobs)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738776/Blog/Imported/hackathon-fake-blog-post-s/image2_pizuxd.png "Component pipeline (imported jobs)")

## Putting it all together

None of these features exist in isolation. The reason GitLab's pipeline model is worth understanding deeply is that these primitives compose:

*   A monorepo uses parent-child pipelines, and each child uses DAG execution

*   A microservices platform uses multi-project pipelines, and each project uses MR pipelines with merged results

*   A governed platform uses CI/CD components to standardize the patterns above across every team

Most teams discover one of these features when they hit a specific pain point. The ones who invest in understanding the full model end up with a delivery system that actually reflects how their engineering organization works, not a pipeline that fights it.

## Other patterns worth exploring

The five patterns above cover the most common structural pain points, but GitLab's pipeline model goes further. A few others worth looking into as your needs grow:

*   [Review apps with dynamic environments](https://docs.gitlab.com/ci/environments/) let you spin up a live preview for every feature branch and tear it down automatically when the MR closes. Useful for teams doing frontend work or API changes that need stakeholder sign-off before merging.

*   [Caching and artifact strategies](https://docs.gitlab.com/ci/caching/) are often the fastest way to cut pipeline runtime after the structural work is done. Structuring `cache:` keys around dependency lockfiles and being deliberate about what gets passed between jobs with [artifacts:](https://docs.gitlab.com/ci/yaml/#artifacts) can make a significant difference without changing your pipeline shape at all (see the sketch after this list).

*   [Scheduled and API-triggered pipelines](https://docs.gitlab.com/ci/pipelines/schedules/) are worth knowing about because not everything should run on a code push. Nightly security scans, compliance reports, and release automation are better modeled as scheduled or [API-triggered](https://docs.gitlab.com/ci/triggers/) pipelines with `$CI_PIPELINE_SOURCE` routing the right jobs for each context.
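For the caching point, a minimal sketch of a lockfile-keyed cache (the Node.js names are illustrative; the same idea applies to any package manager):

```yaml
# Sketch: cache keyed to the lockfile, invalidated only when deps change
install-deps:
  stage: build
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
  script:
    - npm ci
```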
## How to get started

Modern software delivery is complex. Teams are managing monorepos with dozens of services, coordinating across multiple repositories, deploying to many environments at once, and trying to keep standards consistent as organizations grow. GitLab's pipeline model was built with all of that in mind.

What makes it worth investing time in is how well the pieces fit together. Parent-child pipelines bring structure to large codebases. Multi-project pipelines make cross-team dependencies visible and testable. Dynamic pipelines turn environment management into something that scales gracefully. MR-first delivery with merged results ensures confidence at every step of the review process. And CI/CD Components give platform teams a way to share best practices across an entire organization without becoming a bottleneck.

Each of these features is powerful on its own, and even more so when combined. GitLab gives you the building blocks to design a delivery system that fits how your team actually works, and grows with you as your needs evolve.

> [Start a free trial of GitLab Ultimate](https://about.gitlab.com/free-trial/) to use pipeline logic today.

## Read more

*   [Variable and artifact sharing in GitLab parent-child pipelines](https://about.gitlab.com/blog/variable-and-artifact-sharing-in-gitlab-parent-child-pipelines/)
*   [CI/CD inputs: Secure and preferred method to pass parameters to a pipeline](https://about.gitlab.com/blog/ci-cd-inputs-secure-and-preferred-method-to-pass-parameters-to-a-pipeline/)
*   [Tutorial: How to set up your first GitLab CI/CD component](https://about.gitlab.com/blog/tutorial-how-to-set-up-your-first-gitlab-ci-cd-component/)
*   [How to include file references in your CI/CD components](https://about.gitlab.com/blog/how-to-include-file-references-in-your-ci-cd-components/)
*   [FAQ: GitLab CI/CD Catalog](https://about.gitlab.com/blog/faq-gitlab-ci-cd-catalog/)
*   [Building a GitLab CI/CD pipeline for a monorepo the easy way](https://about.gitlab.com/blog/building-a-gitlab-ci-cd-pipeline-for-a-monorepo-the-easy-way/)
*   [A CI/CD component builder's journey](https://about.gitlab.com/blog/a-ci-component-builders-journey/)
*   [CI/CD Catalog goes GA: No more building pipelines from scratch](https://about.gitlab.com/blog/ci-cd-catalog-goes-ga-no-more-building-pipelines-from-scratch/)

---

# Related post: How to use GitLab Container Virtual Registry with Docker Hardened Images

*Tim Rizzi, 2026-03-12*

*Learn how to simplify container image management with this step-by-step guide.*

If you're a platform engineer, you've probably had this conversation:

*"Security says we need to use hardened base images."*

*"Great, where do I configure credentials for yet another registry?"*

*"Also, how do we make sure everyone actually uses them?"*

Or this one:

*"Why are our builds so slow?"*

*"We're pulling the same 500MB image from Docker Hub in every single job."*

*"Can't we just cache these somewhere?"*

I've been working on [Container Virtual Registry](https://docs.gitlab.com/user/packages/virtual_registry/container/) at GitLab specifically to solve these problems. It's a pull-through cache that sits in front of your upstream registries (Docker Hub, dhi.io for Docker Hardened Images, MCR, and Quay) and gives your teams a single endpoint to pull from. Images get cached on the first pull. Subsequent pulls come from the cache. Your developers don't need to know or care which upstream a particular image came from.

This article shows you how to set up Container Virtual Registry, specifically with Docker Hardened Images in mind, since that combination makes a lot of sense for teams that want better security without making their developers' lives harder.

## What problem are we actually solving?

The platform teams I usually talk to manage container images across three to five registries:

* **Docker Hub** for most base images
* **dhi.io** for Docker Hardened Images (security-conscious workloads)
* **MCR** for .NET and Azure tooling
* **Quay.io** for Red Hat ecosystem stuff
* **Internal registries** for proprietary images

Each one has its own:

* Authentication mechanism
* Network latency characteristics
* Way of organizing image paths

Your CI/CD configs end up littered with registry-specific logic. Credential management becomes a project unto itself. And every pipeline job pulls the same base images over the network, even though they haven't changed in weeks.

Container Virtual Registry consolidates this. One registry URL. One authentication flow (GitLab's). Cached images are served from GitLab's infrastructure rather than traversing the internet each time.

## How it works

The model is straightforward:

```text
Your pipeline pulls:
  gitlab.com/virtual_registries/container/1000016/python:3.13

Virtual registry checks:
  1. Do I have this cached? → Return it
  2. No? → Fetch from upstream, cache it, return it
```

You configure upstreams in priority order. When a pull request comes in, the virtual registry checks each upstream until it finds the image. The result gets cached for a configurable period (default 24 hours).

```text
┌─────────────────────────────────────────────────────────┐
│                    CI/CD Pipeline                       │
│                          │                              │
│                          ▼                              │
│   gitlab.com/virtual_registries/container/<id>/image   │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│            Container Virtual Registry                   │
│                                                         │
│  Upstream 1: Docker Hub ────────────────┐               │
│  Upstream 2: dhi.io (Hardened) ────────┐│               │
│  Upstream 3: MCR ─────────────────────┐││               │
│  Upstream 4: Quay.io ────────────────┐│││               │
│                                      ││││               │
│                    ┌─────────────────┴┴┴┴──┐            │
│                    │        Cache          │            │
│                    │  (manifests + layers) │            │
│                    └───────────────────────┘            │
└─────────────────────────────────────────────────────────┘
```
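A quick way to sanity-check this behavior from any Docker-equipped machine, once you've run `docker login gitlab.com` (shown later): pull the same image twice and compare timings. The registry ID below is a placeholder, and the local copy is removed in between so the second pull actually hits the registry:

```bash
REGISTRY="gitlab.com/virtual_registries/container/<registry_id>"

time docker pull "${REGISTRY}/library/alpine:latest"   # cold: fetched from upstream
docker rmi "${REGISTRY}/library/alpine:latest"         # drop the local copy
time docker pull "${REGISTRY}/library/alpine:latest"   # warm: served from the cache
```

The timings this produces are the same kind of numbers reported in the "What the numbers look like" section below.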
## Why this matters for Docker Hardened Images

[Docker Hardened Images](https://docs.docker.com/dhi/) are great because of the minimal attack surface, near-zero CVEs, proper software bills of materials (SBOMs), and SLSA provenance. If you're evaluating base images for security-sensitive workloads, they should be on your list.

But adopting them creates the same operational friction as any new registry:

* **Credential distribution**: You need to get Docker credentials to every system that pulls images from dhi.io.
* **CI/CD changes**: Every pipeline needs to be updated to authenticate with dhi.io.
* **Developer friction**: People need to remember to use the hardened variants.
* **Visibility gap**: It's difficult to tell if teams are actually using hardened images vs. regular ones.

Virtual registry addresses each of these:

**Single credential**: Teams authenticate to GitLab. The virtual registry handles upstream authentication. You configure Docker credentials once, at the registry level, and they apply to all pulls.

**No per-team CI/CD changes**: Point pipelines at your virtual registry. Done. The upstream configuration is centralized.

**Gradual adoption**: Since images get cached with their full path, you can see in the cache what's being pulled. If someone's pulling `library/python:3.11` instead of the hardened variant, you'll know.

**Audit trail**: The cache shows you exactly which images are in active use. Useful for compliance, useful for understanding what your fleet actually depends on.
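The gradual-adoption check can be scripted. A sketch using only the demo client's calls from the next section (`client` and `registry` are created there; the exact fields on a cache entry depend on the client, so this just prints entries raw):

```python
# Sketch: spot pulls that bypassed the hardened upstream
upstreams = client.list_registry_upstreams(registry['id'])
for upstream in upstreams:
    if upstream['name'] == 'Docker Hardened Images':
        continue
    for entry in client.list_cache_entries(upstream['id']):
        # Printing raw is enough to see which image paths are being
        # served by this non-hardened upstream.
        print(upstream['name'], entry)
```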
\n\n## Why this matters for Docker Hardened Images\n\n[Docker Hardened Images](https://docs.docker.com/dhi/) are great because of the minimal attack surface, near-zero CVEs, proper software bills of materials (SBOMs), and SLSA provenance. If you're evaluating base images for security-sensitive workloads, they should be on your list.\n\nBut adopting them creates the same operational friction as any new registry:\n\n* **Credential distribution**: You need to get Docker credentials to every system that pulls images from dhi.io.\n* **CI/CD changes**: Every pipeline needs to be updated to authenticate with dhi.io.\n* **Developer friction**: People need to remember to use the hardened variants.\n* **Visibility gap**: It's difficult to tell whether teams are actually using hardened images or regular ones.\n\nThe virtual registry addresses each of these:\n\n**Single credential**: Teams authenticate to GitLab. The virtual registry handles upstream authentication. You configure Docker credentials once, at the registry level, and they apply to all pulls.\n\n**No per-team CI/CD changes**: Point pipelines at your virtual registry. Done. The upstream configuration is centralized.\n\n**Gradual adoption**: Since images get cached with their full path, you can see in the cache what's being pulled. If someone's pulling `library/python:3.11` instead of the hardened variant, you'll know.\n\n**Audit trail**: The cache shows you exactly which images are in active use. Useful for compliance, useful for understanding what your fleet actually depends on.
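\n\nThose last two points are scriptable. Here's a rough sketch of an adoption check using the same demo client that's introduced in the setup section below; the `relative_path` field on cache entries is an assumption on my part, so check the [Container Virtual Registry API](https://docs.gitlab.com/api/container_virtual_registries/) for the exact response shape:\n\n```python\n# Flag cached pulls that came from Docker Hub's Python images instead of\n# the hardened variant (Docker Hub paths carry the library/ prefix).\nfor upstream in client.list_registry_upstreams(registry['id']):\n    for entry in client.list_cache_entries(upstream['id']):\n        path = entry['relative_path']  # assumed field name, see API docs\n        if upstream['name'] == 'Docker Hub' and path.startswith('library/python'):\n            print(f\"non-hardened Python image in use: {path}\")\n```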
\n\n## Setting it up\n\nHere's a real setup using the Python client from a demo project.\n\n### Create the virtual registry\n\n```python\nfrom virtual_registry_client import VirtualRegistryClient\n\nclient = VirtualRegistryClient()\n\nregistry = client.create_virtual_registry(\n    group_id=\"785414\",  # Your top-level group ID\n    name=\"platform-images\",\n    description=\"Cached container images for platform teams\"\n)\n\nprint(f\"Registry ID: {registry['id']}\")\n# You'll need this ID for the pull URL\n```\n\n### Add Docker Hub as an upstream\n\nFor official images like Alpine, Python, etc.:\n\n```python\ndocker_upstream = client.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://registry-1.docker.io\",\n    name=\"Docker Hub\",\n    cache_validity_hours=24\n)\n```\n\n### Add Docker Hardened Images (dhi.io)\n\nDocker Hardened Images are hosted on `dhi.io`, a separate registry that requires authentication:\n\n```python\ndhi_upstream = client.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://dhi.io\",\n    name=\"Docker Hardened Images\",\n    username=\"your-docker-username\",\n    password=\"your-docker-access-token\",\n    cache_validity_hours=24\n)\n```\n\n### Add other upstreams\n\n```python\n# MCR for .NET teams\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://mcr.microsoft.com\",\n    name=\"Microsoft Container Registry\",\n    cache_validity_hours=48\n)\n\n# Quay for the Red Hat ecosystem\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://quay.io\",\n    name=\"Quay.io\",\n    cache_validity_hours=24\n)\n```\n\n### Update your CI/CD\n\nHere's a `.gitlab-ci.yml` that pulls through the virtual registry:\n\n```yaml\nvariables:\n  VIRTUAL_REGISTRY_ID: \u003Cyour_virtual_registry_ID>\n\nbuild:\n  image: docker:24\n  services:\n    - docker:24-dind\n  before_script:\n    # Authenticate to GitLab (which handles upstream auth for you)\n    - echo \"${CI_JOB_TOKEN}\" | docker login -u gitlab-ci-token --password-stdin gitlab.com\n  script:\n    # All of these go through your single virtual registry\n\n    # Official Docker Hub images (use library/ prefix)\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/library/alpine:latest\n\n    # Docker Hardened Images from dhi.io (no prefix needed)\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/python:3.13\n\n    # .NET from MCR\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/dotnet/sdk:8.0\n```\n\n### Image path formats\n\nDifferent registries use different path conventions:\n\n| Registry | Pull URL Example |\n|----------|------------------|\n| Docker Hub (official) | `.../library/python:3.11-slim` |\n| Docker Hardened Images (dhi.io) | `.../python:3.13` |\n| MCR | `.../dotnet/sdk:8.0` |\n| Quay.io | `.../prometheus/prometheus:latest` |\n\n### Verify it's working\n\nAfter some pulls, check your cache:\n\n```python\nupstreams = client.list_registry_upstreams(registry['id'])\nfor upstream in upstreams:\n    entries = client.list_cache_entries(upstream['id'])\n    print(f\"{upstream['name']}: {len(entries)} cached entries\")\n```\n\n## What the numbers look like\n\nI ran tests pulling images through the virtual registry:\n\n| Metric | Without Cache | With Warm Cache |\n|--------|---------------|-----------------|\n| Pull time (Alpine) | 10.3s | 4.2s |\n| Pull time (Python 3.13 DHI) | 11.6s | ~4s |\n| Network roundtrips to upstream | Every pull | Cache misses only |\n\nThe first pull is the same speed (it has to fetch from upstream). Every pull after that, for the cache validity period, comes straight from GitLab's storage. No network hop to Docker Hub, dhi.io, MCR, or wherever the image lives.\n\nFor a team running hundreds of pipeline jobs per day, that's hours of cumulative build time saved.\n\n## Practical considerations\n\nA few things to keep in mind:\n\n### Cache validity\n\n24 hours is the default. For security-sensitive images where you want patches quickly, consider 12 hours or less:\n\n```python\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://dhi.io\",\n    name=\"Docker Hardened Images\",\n    username=\"your-username\",\n    password=\"your-token\",\n    cache_validity_hours=12\n)\n```\n\nFor stable, infrequently updated images (like specific version tags), longer validity is fine.\n\n### Upstream priority\n\nUpstreams are checked in order. If you have images with the same name on different registries, the first matching upstream wins.
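\n\nSince ordering decides conflicts, it's worth double-checking what your effective order actually is. Here's a small sketch with the same demo client; I'm assuming the API returns upstreams in priority order and includes a `url` field, so verify against the API docs:\n\n```python\n# Print the order in which upstreams will be consulted.\nfor position, upstream in enumerate(client.list_registry_upstreams(registry['id']), start=1):\n    print(f\"{position}. {upstream['name']} ({upstream.get('url', 'unknown URL')})\")\n```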
\n\n### Limits\n\n* Maximum of 20 virtual registries per group\n* Maximum of 20 upstreams per virtual registry\n\n## Configuration via UI\n\nYou can also configure virtual registries and upstreams directly from the GitLab UI, with no API calls required. Navigate to your group's **Settings > Packages and registries > Virtual Registry** to:\n\n* Create and manage virtual registries\n* Add, edit, and reorder upstream registries\n* View and manage the cache\n* Monitor which images are being pulled\n\n## What's next\n\nWe're actively developing:\n\n* **Allow/deny lists**: Use regex to control which images can be pulled from specific upstreams.\n\nThis is beta software. It works, people are using it in production, but we're still iterating based on feedback.\n\n## Share your feedback\n\nIf you're a platform engineer dealing with container registry sprawl, I'd like to understand your setup:\n\n* How many upstream registries are you managing?\n* What's your biggest pain point with the current state?\n* Would something like this help, and if not, what's missing?\n\nPlease share your experiences in the [Container Virtual Registry feedback issue](https://gitlab.com/gitlab-org/gitlab/-/work_items/589630).\n\n## Related resources\n\n* [New GitLab metrics and registry features help reduce CI/CD bottlenecks](https://about.gitlab.com/blog/new-gitlab-metrics-and-registry-features-help-reduce-ci-cd-bottlenecks/#container-virtual-registry)\n* [Container Virtual Registry documentation](https://docs.gitlab.com/user/packages/virtual_registry/container/)\n* [Container Virtual Registry API](https://docs.gitlab.com/api/container_virtual_registries/)",[723,722,737],{"featured":14,"template":15,"slug":751},"using-gitlab-container-virtual-registry-with-docker-hardened-images",{"promotions":753},[754,768,779,791],{"id":755,"categories":756,"header":758,"text":759,"button":760,"image":765},"ai-modernization",[757],"ai-ml","Is AI achieving its promise at scale?","Quiz will take 5 minutes or less",{"text":761,"config":762},"Get your AI maturity score",{"href":763,"dataGaName":764,"dataGaLocation":238},"/assessments/ai-modernization-assessment/","modernization assessment",{"config":766},{"src":767},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138786/qix0m7kwnd8x2fh1zq49.png",{"id":769,"categories":770,"header":771,"text":759,"button":772,"image":776},"devops-modernization",[722,565],"Are you just managing tools or shipping innovation?",{"text":773,"config":774},"Get your DevOps maturity score",{"href":775,"dataGaName":764,"dataGaLocation":238},"/assessments/devops-modernization-assessment/",{"config":777},{"src":778},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138785/eg818fmakweyuznttgid.png",{"id":780,"categories":781,"header":783,"text":759,"button":784,"image":788},"security-modernization",[782],"security","Are you trading speed for security?",{"text":785,"config":786},"Get your security maturity score",{"href":787,"dataGaName":764,"dataGaLocation":238},"/assessments/security-modernization-assessment/",{"config":789},{"src":790},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138786/p4pbqd9nnjejg5ds6mdk.png",{"id":792,"paths":793,"header":796,"text":797,"button":798,"image":803},"github-azure-migration",[794,795],"migration-from-azure-devops-to-gitlab","integrating-azure-devops-scm-and-gitlab","Is your team ready for GitHub's Azure move?","GitHub is already rebuilding around Azure. Find out what it means for you.",{"text":799,"config":800},"See how GitLab compares to GitHub",{"href":801,"dataGaName":802,"dataGaLocation":238},"/compare/gitlab-vs-github/github-azure-migration/","github azure migration",{"config":804},{"src":778},{"header":806,"blurb":807,"button":808,"secondaryButton":813},"Start building faster today","See what your team can do with the intelligent orchestration platform for DevSecOps.\n",{"text":809,"config":810},"Get your free trial",{"href":811,"dataGaName":45,"dataGaLocation":812},"https://gitlab.com/-/trial_registrations/new?glm_content=default-saas-trial&glm_source=about.gitlab.com/","feature",{"text":501,"config":814},{"href":49,"dataGaName":50,"dataGaLocation":812},1777493631983]