[{"data":1,"prerenderedAt":835},["ShallowReactive",2],{"/en-us/blog/three-new-support-tools":3,"navigation-en-us":43,"banner-en-us":454,"footer-en-us":464,"blog-post-authors-en-us-Will Chandler|Sara Kassabian":706,"blog-related-posts-en-us-three-new-support-tools":732,"blog-promotions-en-us":772,"next-steps-en-us":825},{"id":4,"title":5,"authorSlugs":6,"authors":9,"body":12,"category":13,"categorySlug":13,"config":14,"content":18,"date":22,"description":19,"extension":27,"externalUrl":28,"featured":16,"heroImage":21,"isFeatured":16,"meta":29,"navigation":30,"path":31,"publishedDate":22,"rawbody":32,"seo":33,"slug":15,"stem":37,"tagSlugs":38,"tags":41,"template":17,"updatedDate":28,"__hash__":42},"blogPosts/en-us/blog/three-new-support-tools.yml","We've open sourced 3 tools to help troubleshoot system performance",[7,8],"will-chandler","sara-kassabian",[10,11],"Will Chandler","Sara Kassabian","Our self-managed customers often encounter issues related to performance, or the time it takes to execute something. In the past, the [Support team](https://handbook.gitlab.com/handbook/support/) had to pull data from disparate sources and cobble it together in order to analyze performance-related issues.\n\n“We’re dealing with someone else’s computer on support, so we have to be able to handle environments with limited observability,” says [Will Chandler](/company/team/#wchandler), senior support engineer. “We’re at the mercy of their infrastructure. That’s why the team has made tools to reduce the friction.”\n\n“With [GitLab.com](/pricing/), we have all of this fancy tooling that helps us collect performance data,” says [Lee Matos](/company/team/#leematos), support engineering manager. 
“But when we’re working with customers, we need to be ready to bring lightweight tools that don’t require a lot of setup that we can use based on what they have in place.”\n\nThe Support team is working on becoming more data driven by using three new tools designed to aggregate and summarize performance data for self-managed customers. A focus on data-driven decision-making improves the customer relationship and demonstrates our commitment to making performance a key feature of GitLab.\n\nWe'll look at three open source tools created by GitLab Self-Managed Support. Strace parser is a general tool that could be of use to anyone, while JSON Stats and GitLabSOS are tailored to GitLab, but could be easily modified.\n\n## 1. [Strace parser](https://gitlab.com/gitlab-com/support/toolbox/strace-parser)\n\n[Strace](https://gitlab.com/strace/strace) is a commonly used debugging and diagnostic tool in Linux that captures information about what’s happening inside processes running on our customers’ environments.\n\nUnlike [newer](http://man7.org/linux/man-pages/man1/perf.1.html) and [more powerful](https://github.com/iovisor/bpftrace) tracing tools, strace adds [significant overhead to a process](http://www.brendangregg.com/blog/2014-05-11/strace-wow-much-syscall.html). However, strace is generally available even on very old versions of Linux.\n\nAn strace of a single-threaded program is linear, but following the threads of execution quickly gets difficult when there are many processes being captured. At GitLab Support we are typically tracing [Unicorn](https://bogomips.org/unicorn/) workers or [Gitaly](https://gitlab.com/gitlab-org/gitaly), which are highly concurrent, resulting in hundreds of process IDs being traced and hundreds of thousands of lines of output from traces only a few seconds long.\n\nWill built [strace parser](https://gitlab.com/gitlab-com/support/toolbox/strace-parser) for these types of use cases. 
Strace parser summarizes the most meaningful processing data delivered by an strace in a more accessible format, allowing users to find the critical sections of the data quickly.\n\nThe next two examples are from a GitLab customer that was using a very slow file system to host their .gitconfig file, which was a major performance bottleneck. But it was not immediately clear what was happening from the perspective of a user trying to troubleshoot. By running an strace on Gitaly, we were able to get a better understanding of why the system was so slow.\n\n```text\n3694  13:45:06.207369 clock_gettime(CLOCK_MONOTONIC, {3016230, 201254200}) = 0 \u003C0.000015>\n3694  13:45:06.207409 futex(0x7f645bb49664, FUTEX_WAIT_BITSET_PRIVATE, 192398, {3016230, 299906871}, ffffffff \u003Cunfinished ...>\n3542  13:45:06.209616 \u003C... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) \u003C0.005236>\n3542  13:45:06.209639 futex(0x1084ff0, FUTEX_WAKE, 1) = 1 \u003C0.000023>\n3510  13:45:06.209673 \u003C... futex resumed> ) = 0 \u003C0.002909>\n3542  13:45:06.209701 futex(0xc420896548, FUTEX_WAKE, 1 \u003Cunfinished ...>\n3510  13:45:06.209710 pselect6(0, NULL, NULL, NULL, {0, 20000}, NULL \u003Cunfinished ...>\n16780 13:45:06.209740 \u003C... futex resumed> ) = 0 \u003C0.002984>\n3542  13:45:06.209749 \u003C... futex resumed> ) = 1 \u003C0.000043>\n16780 13:45:06.209776 pselect6(0, NULL, NULL, NULL, {0, 3000}, NULL \u003Cunfinished ...>\n3542  13:45:06.209787 futex(0xc420053548, FUTEX_WAKE, 1 \u003Cunfinished ...>\n16780 13:45:06.209839 \u003C... pselect6 resumed> ) = 0 (Timeout) \u003C0.000056>\n3544  13:45:06.209853 \u003C... futex resumed> ) = 0 \u003C0.003148>\n3542  13:45:06.209861 \u003C... futex resumed> ) = 1 \u003C0.000069>\n3510  13:45:06.209868 \u003C... 
pselect6 resumed> ) = 0 (Timeout) \u003C0.000151>\n3544  13:45:06.209915 epoll_ctl(4\u003Canon_inode:[eventpoll]>, EPOLL_CTL_DEL, 181\u003CUNIX:[164869291]>, 0xc42105bb14 \u003Cunfinished ...>\n16780 13:45:06.210076 write(1\u003Cpipe:[55447]>, \"time=\\\"2019-02-14T18:45:06Z\\\" level=warning msg=\\\"health check failed\\\" error=\\\"rpc error: code = DeadlineExceeded desc = context deadline exceeded\\\" worker.name=gitaly-ruby.4\\n\", 170 \u003Cunfinished ...>\n3544  13:45:06.210093 \u003C... epoll_ctl resumed> ) = 0 \u003C0.000053>\n3542  13:45:06.210101 futex(0x1089020, FUTEX_WAIT, 0, {0, 480025102} \u003Cunfinished ...>\n3510  13:45:06.210109 pselect6(0, NULL, NULL, NULL, {0, 20000}, NULL \u003Cunfinished ...>\n16780 13:45:06.210153 \u003C... write resumed> ) = 170 \u003C0.000064>\n3544  13:45:06.210163 close(181\u003CUNIX:[164869291]> \u003Cunfinished ...>\n```\n\nThis strace delivers more than 300,000 lines about the different Gitaly processes running on this customer’s GitLab environment, making it challenging to decipher the flow of execution.\n\n\n“In this case, we can use strace-parser to say, ‘Just give me all the files that were opened, and sort them by how long it took to open,’” says Will.\n\n```text\n$ strace-parser trace.txt files --sort duration\n\nFiles Opened\n\n      pid      dur (ms)       timestamp            error         file name\n  -------    ----------    ---------------    ---------------    ---------\n    24670      5203.999    13:45:16.152985           -           /efs/gitlab/home/.gitconfig\n    24859      5296.580    13:45:23.367482           -           /efs/gitlab/home/.gitconfig\n    24584      5279.810    13:45:09.286019           -           /efs/gitlab/home/.gitconfig\n    24666      5276.975    13:45:16.079697           -           /efs/gitlab/home/.gitconfig\n    24667      5255.649    13:45:16.101009           -           /efs/gitlab/home/.gitconfig\n    14871      2594.364    13:45:18.762347           -           
/efs/gitlab/home/.gitconfig\n    24885      2440.635    13:45:26.224189           -           /efs/gitlab/home/.gitconfig\n    24886      2432.980    13:45:26.231009           -           /efs/gitlab/home/.gitconfig\n    24656        55.873    13:45:15.916836        ENOENT         /nfs/gitlab/gitdata/repositories/group/project.git/objects/info/alternates\n    24688        42.764    13:45:21.522789        ENOENT         /nfs/gitlab/gitdata/repositories/group/project.git/objects/info/alternates\n     3709        39.631    13:45:07.816618           -           /efs/gitlab/home/.gitconfig\n    24583        37.959    13:45:09.218283           -           /efs/gitlab/home/.gitconfig\n\n```\n\nBy summarizing the data in this way, we see multiple files that took 2-5 seconds to open, which is several orders of magnitude slower than expected.\n\n\n“If it’s a particularly busy server and we’re performing these actions 50 times a second, 100 times a second, that adds up really fast,” says Will. “Strace-Parser lets you drill down quickly, and say, ‘OK, this specific thing we’re doing is super slow.’”\n\n### Get a closer look at processes using strace-parser\n\nStrace-Parser can also be used to drill down into details of a process.\n\nThe previous output showed PID 24670 is one of the slower processes, so we use the parser to understand how this slow call impacted the performance of the process overall.\n\n```text\n$ strace-parser trace.txt pid 24670\n\nPID 24670\n\n  271 syscalls, active time: 5303.438ms, user time: 34.662ms, total time: 5338.100ms\n  start time: 13:45:16.116671    end time: 13:45:21.454771\n\n  syscall                 count    total (ms)      max (ms)      avg (ms)      min (ms)    errors\n  -----------------    --------    ----------    ----------    ----------    ----------    --------\n  open                       29      5223.073      5203.999       180.106         0.031    ENOENT: 9\n  read                       25        46.303        28.747         
1.852         0.031\n  access                     11         6.948         4.131         0.632         0.056    ENOENT: 3\n  lstat                       6         5.116         2.130         0.853         0.077    ENOENT: 4\n  mmap                       32         3.868         0.485         0.121         0.028\n  openat                      2         3.757         2.934         1.878         0.823\n  fstat                      28         3.395         0.272         0.121         0.033\n  munmap                     11         2.551         0.929         0.232         0.056\n  rt_sigaction               59         2.548         0.121         0.043         0.024\n  close                      22         2.375         0.279         0.108         0.032\n  mprotect                   14         0.927         0.174         0.066         0.032\n  execve                      1         0.621         0.621         0.621         0.621\n  brk                         6         0.595         0.210         0.099         0.046\n  stat                        8         0.388         0.082         0.048         0.027    ENOENT: 3\n  getdents                    4         0.361         0.138         0.090         0.044\n  rt_sigprocmask              3         0.141         0.059         0.047         0.040\n  write                       1         0.101         0.101         0.101         0.101\n  dup2                        3         0.090         0.032         0.030         0.026\n  arch_prctl                  1         0.077         0.077         0.077         0.077\n  getrlimit                   1         0.062         0.062         0.062         0.062\n  getcwd                      1         0.044         0.044         0.044         0.044\n  set_robust_list             1         0.035         0.035         0.035         0.035\n  set_tid_address             1         0.032         0.032         0.032         0.032\n  setpgid                     1         0.030         0.030         
0.030         0.030\n  ---------------\n\n  Program Executed: /opt/gitlab/embedded/bin/git\n  Args: [\"--git-dir\" \"/nfs/gitlab/gitdata/repositories/group/project.git\" \"cat-file\" \"--batch-check\"]\n\n  Parent PID:  3563\n\n  Slowest file open times for PID 24670:\n\n    dur (ms)       timestamp            error         file name\n  ----------    ---------------    ---------------    ---------\n    5203.999    13:45:16.152985           -           /efs/gitlab/home/.gitconfig\n       5.420    13:45:16.143520           -           /nfs/gitlab/gitdata/repositories/group/project.git/config\n       2.959    13:45:21.372776           -           /efs/gitlab/home/.gitconfig\n       2.934    13:45:21.401073           -           /nfs/gitlab/gitdata/repositories/group/project.git/refs/\n       2.736    13:45:21.417333        ENOENT         /nfs/gitlab/gitdata/repositories/group/project.git/info/grafts\n       2.683    13:45:21.421558           -           /nfs/gitlab/gitdata/repositories/group/project.git/objects/b7/ef5eba3a425af1e2a9cf6f51cb87454b6e1ad1\n       2.430    13:45:21.407170        ENOENT         /nfs/gitlab/gitdata/repositories/group/project.git/objects/info/alternates\n       0.992    13:45:21.420213        ENOENT         /nfs/gitlab/gitdata/repositories/group/project.git/shallow\n       0.823    13:45:21.405535           -           /nfs/gitlab/gitdata/repositories/group/project.git/objects/pack\n       0.275    13:45:21.380382           -           /nfs/gitlab/gitdata/repositories/group/project.git/config\n\n```\n\nThe output shows the time this process spent working was dominated by the slow file open. This data points the Support team in the right direction for fixing the underlying issue.\n\n\nStrace itself has the `-c` flag which provides a similar summary, but its utility is limited when multiple processes are traced as it cannot break out per-process statistics.  
Strace-Parser breaks these down to the PID level, and can also include the details of parent and child processes on demand.\n\n“In this case Will has identified an interesting area for our customer and then very quickly anchored it in the fact that when we look at that one spot it was slow,” says Lee. “When we’re debugging, having this data available really helps us pinpoint the problem for our customers so we can give them answers.”\n\nThe typical GitLab deployment has many different processes and services running at a time, which can create dozens of different child processes, so there is a large surface area for potential errors or slowness to occur.\n\nStrace-Parser is an open source, generic tool that anyone can use to better understand their strace data.\n\n## 2. [JSON Stats](https://gitlab.com/gitlab-com/support/toolbox/json_stats)\n\nWill also built [JSON Stats](https://gitlab.com/gitlab-com/support/toolbox/json_stats), a script that pulls performance statistics for different logs from the customer’s GitLab environment and summarizes the results in an easy-to-interpret table.\n\n```text\nMETHOD                             COUNT     RPS     PERC99     PERC95 MEDIAN         MAX        MIN          SCORE    % FAIL\nFetchRemote                         2542    0.17  962176.08  130154.88 36580.23  4988513.00    1940.45  2445851585.19      1.06\nFindAllTags                         5200    0.34   30000.37   11538.63 1941.84    30006.23     252.10   156001924.68      1.63\nFindCommit                          3506    0.23   20859.98   16622.78 10841.86    30001.59    2528.67    73135073.75      0.23\nFindAllRemoteBranches               1664    0.11   20432.93   12996.75 8606.60   405503.94    1430.84    34000396.10      0.00\nAddRemote                           2603    0.17   10001.03    8094.97 825.46    10007.46     228.13    26032673.70      3.00\nFindLocalBranches                   2535    0.16   10004.68   10002.90 9051.91    10036.16    1260.89    25361871.05   
  34.32\n```\n\nThis output shows that we’re calling the “FindLocalBranches” service 2500+ times, and it’s failing 34% of the time.\n\n\nThe Support team can use JSON Stats to ground their findings in evidence when evaluating overall performance for a customer. It's the same concept as strace-parser. Can we pivot the information in a way that it clearly becomes meaningful data?\n\n“It’s a quick way of extracting data that you can give to a customer. Instead of saying ‘Look, this failed once,’ we can say, ‘Look, this is failing a third of the time and that suggests there’s a problem with X,’” says Will.\n\nIn the sample output we see that JSON Stats is working with Gitaly logs, but the tool is nimble enough to work on the logs from all the heavy components of GitLab, including Rails, which runs the UI, and Sidekiq, which works on background tasks.\n\n“Some of our customers are very sophisticated and may have advanced monitoring that could give us this information. But we wanted to build a tool that would help us align and easily standardize on how we can get this performance information for customers that don’t have an advanced monitoring setup,” says Lee.\n\nWhile this specific tool isn't as helpful for people outside of the GitLab community, hopefully it helps to inspire others to consider how they are drawing conclusions, and how they can speed that process up.\n\n### Benchmarking with JSON Stats\n\nWill is building a future iteration of JSON Stats that will compare the performance of a customer’s GitLab instance with GitLab.com.\n\n![JSON benchmarking table](https://about.gitlab.com/images/blogimages/support-tools-update.png)\n\nBenchmarking the performance of GitLab.com (the first row) with the customer environment (second row), and the ratio between the two (third row). 
We can see that in the worst case, the customer’s 99th percentile FindCommit latency was almost eight times slower than it was on GitLab.com.\n\n\n“Our vision here is to give accountability to our customers. We’re going to treat GitLab.com as the pinnacle experience for GitLab,” says Lee. “We want to use JSON Stats with benchmarking to help us understand how far away our customers are from GitLab.com.”\n\nLee and Will are still assessing how to set the target range for the customer’s instance of GitLab. But considering the wealth of resources allocated to GitLab.com, any self-managed customer that is performing within 5-10% of GitLab.com would be considered hugely successful.\n\n## 3. [GitLab SOS](https://gitlab.com/gitlab-com/support/toolbox/gitlabsos)\n\nWhen a customer encounters an issue, but they are unsure of what the problem is, they can run [GitLab SOS](https://gitlab.com/gitlab-com/support/toolbox/gitlabsos), created by support engineer [Cody West](/company/team/#codyww), to create a snapshot of different activities happening on their system. 
It's been so helpful in debugging GitLab that it's being added into our [Omnibus delivery](https://gitlab.com/gitlab-org/omnibus-gitlab/merge_requests/3430).\n\nBy capturing so much data about a moment in time during or shortly after encountering a problem, the support team is able to work asynchronously to troubleshoot on behalf of the customer.\n\n```text\ncpuinfo              getenforce           iotop netstat              opt                  sestatus             unicorn_stats\ndf_h                 gitlab_status        lscpu netstat_i            pidstat              systemctl_unit_files uptime\ndmesg                gitlabsos.log        meminfo nfsiostat            ps                   tainted              var\netc                  hostname             mount nfsstat              sar_dev              ulimit               vmstat\nfree_m               iostat               mpstat ntpq                 sar_tcp              uname\n```\n\nGitLab SOS works best if the script is run while an issue is occurring, or moments after, but even if the window of opportunity is missed you can still successfully gather information to diagnose the problem.\n\n\n“If a customer is sharp, they may know what problems to look for already,” says Lee. “But if a customer is scared and they don’t know what to look for, then they can lean on a tool like GitLab SOS and learn from GitLab SOS. We even have some sharp customers that will generate the SOS output and begin to troubleshoot themselves because of the comprehensive overview it provides.”\n\n## These new tools drive data-driven decision-making in Support\n\nTools like strace-parser, JSON Stats, and GitLab SOS provide the Support team and GitLab customers with critical evidence about performance. By letting the data drive decision-making, the Support team is able to identify problems faster and quickly start debugging customer environments. 
Performance is a key feature of GitLab, and by filling our toolbox with data-driven solutions we can ensure greater [transparency](https://handbook.gitlab.com/handbook/values/#transparency) between GitLab and our customers.\n\nLearn more about debugging from a support engineering perspective in a GitLab Unfiltered video.\n\n\u003Cfigure class=\"video_container\">\n  \u003Ciframe src=\"https://www.youtube.com/embed/9W6QnpYewik\" frameborder=\"0\" allowfullscreen=\"true\"> \u003C/iframe>\n\u003C/figure>\n\nCover photo by [Diogo Nunes](https://unsplash.com/@dialex?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/search/photos/tools?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)\n","engineering",{"slug":15,"featured":16,"template":17},"three-new-support-tools",false,"BlogPost",{"title":5,"description":19,"authors":20,"heroImage":21,"date":22,"body":12,"category":13,"tags":23},"Say hello to the open source tools our Support team is using to better summarize customer performance data – and find out how they can help you.",[10,11],"https://res.cloudinary.com/about-gitlab-com/image/upload/v1749670405/Blog/Hero%20Images/open_source_tools.jpg","2019-07-24",[24,25,26],"open source","features","inside GitLab","yml",null,{},true,"/en-us/blog/three-new-support-tools","seo:\n  title: We've open sourced 3 tools to help troubleshoot system performance\n  description: >-\n    Say hello to the open source tools our Support team is using to better\n    summarize customer performance data – and find out how they can help you.\n  ogTitle: We've open sourced 3 tools to help troubleshoot system performance\n  ogDescription: >-\n    Say hello to the open source tools our Support team is using to better\n    summarize customer performance data – and find out how they can help you.\n  noIndex: false\n  ogImage: >-\n    
https://res.cloudinary.com/about-gitlab-com/image/upload/v1749670405/Blog/Hero%20Images/open_source_tools.jpg\n  ogUrl: https://about.gitlab.com/blog/three-new-support-tools\n  ogSiteName: https://about.gitlab.com\n  ogType: article\n  canonicalUrls: https://about.gitlab.com/blog/three-new-support-tools\ncontent:\n  title: We've open sourced 3 tools to help troubleshoot system performance\n  description: >-\n    Say hello to the open source tools our Support team is using to better\n    summarize customer performance data – and find out how they can help you.\n  authors:\n    - Will Chandler\n    - Sara Kassabian\n  heroImage: >-\n    https://res.cloudinary.com/about-gitlab-com/image/upload/v1749670405/Blog/Hero%20Images/open_source_tools.jpg\n  date: '2019-07-24'\n  body: >\n    Our self-managed customers often encounter issues related to performance, or\n    the time it takes to execute something. In the past, the [Support\n    team](https://handbook.gitlab.com/handbook/support/) had to pull data from\n    disparate sources and cobble it together in order to analyze\n    performance-related issues.\n\n\n    “We’re dealing with someone else’s computer on support, so we have to be\n    able to handle environments with limited observability,” says [Will\n    Chandler](/company/team/#wchandler), senior support engineer. “We’re at the\n    mercy of their infrastructure. That’s why the team has made tools to reduce\n    the friction.”\n\n\n    “With [GitLab.com](/pricing/), we have all of this fancy tooling that helps\n    us collect performance data,” says [Lee Matos](/company/team/#leematos),\n    support engineering manager. 
“But when we’re working with customers, we need\n    to be ready to bring lightweight tools that don’t require a lot of setup\n    that we can use based on what they have in place.”\n\n\n    The Support team is working on becoming more data driven by using three new\n    tools designed to aggregate and summarize performance data for self-managed\n    customers. A focus on data-driven decision-making improves the customer\n    relationship and demonstrates our commitment to making performance a key\n    feature of GitLab.\n\n\n    We'll look at three open source tools created by GitLab Self-Managed\n    Support. Strace parser is a general tool that could be of use to anyone,\n    while JSON Stats and GitLabSOS are tailored to GitLab, but could be easily\n    modified.\n\n\n    ## 1. [Strace\n    parser](https://gitlab.com/gitlab-com/support/toolbox/strace-parser)\n\n\n    [Strace](https://gitlab.com/strace/strace) is a commonly used debugging and\n    diagnostic tool in Linux that captures information about what’s happening\n    inside processes running on our customers’ environments.\n\n\n    Unlike [newer](http://man7.org/linux/man-pages/man1/perf.1.html) and [more\n    powerful](https://github.com/iovisor/bpftrace) tracing tools, strace adds\n    [significant overhead to a\n    process](http://www.brendangregg.com/blog/2014-05-11/strace-wow-much-syscall.html).\n    However, strace is generally available even on very old versions of Linux.\n\n\n    An strace of a single-threaded program is linear, but following the threads\n    of execution quickly gets difficult when there are many processes being\n    captured. 
At GitLab Support we are typically tracing\n    [Unicorn](https://bogomips.org/unicorn/) workers or\n    [Gitaly](https://gitlab.com/gitlab-org/gitaly), which are highly concurrent,\n    resulting in hundreds of process IDs being traced and hundreds of thousands\n    of lines of output from traces only a few seconds long.\n\n\n    Will built [strace\n    parser](https://gitlab.com/gitlab-com/support/toolbox/strace-parser) for\n    these types of use cases. Strace parser summarizes the most meaningful\n    processing data delivered by an strace in a more accessible format, allowing\n    users to find the critical sections of the data quickly.\n\n\n    The next two examples are from a GitLab customer that was using a very slow\n    file system to host their .gitconfig file, which was a major performance\n    bottleneck. But it was not immediately clear what was happening from the\n    perspective of a user trying to troubleshoot. By running an strace on\n    Gitaly, we were able to get a better understanding of why the system was so\n    slow.\n\n\n    ```text\n\n    3694  13:45:06.207369 clock_gettime(CLOCK_MONOTONIC, {3016230, 201254200}) =\n    0 \u003C0.000015>\n\n    3694  13:45:06.207409 futex(0x7f645bb49664, FUTEX_WAIT_BITSET_PRIVATE,\n    192398, {3016230, 299906871}, ffffffff \u003Cunfinished ...>\n\n    3542  13:45:06.209616 \u003C... futex resumed> ) = -1 ETIMEDOUT (Connection timed\n    out) \u003C0.005236>\n\n    3542  13:45:06.209639 futex(0x1084ff0, FUTEX_WAKE, 1) = 1 \u003C0.000023>\n\n    3510  13:45:06.209673 \u003C... futex resumed> ) = 0 \u003C0.002909>\n\n    3542  13:45:06.209701 futex(0xc420896548, FUTEX_WAKE, 1 \u003Cunfinished ...>\n\n    3510  13:45:06.209710 pselect6(0, NULL, NULL, NULL, {0, 20000}, NULL\n    \u003Cunfinished ...>\n\n    16780 13:45:06.209740 \u003C... futex resumed> ) = 0 \u003C0.002984>\n\n    3542  13:45:06.209749 \u003C... 
futex resumed> ) = 1 \u003C0.000043>\n\n    16780 13:45:06.209776 pselect6(0, NULL, NULL, NULL, {0, 3000}, NULL\n    \u003Cunfinished ...>\n\n    3542  13:45:06.209787 futex(0xc420053548, FUTEX_WAKE, 1 \u003Cunfinished ...>\n\n    16780 13:45:06.209839 \u003C... pselect6 resumed> ) = 0 (Timeout) \u003C0.000056>\n\n    3544  13:45:06.209853 \u003C... futex resumed> ) = 0 \u003C0.003148>\n\n    3542  13:45:06.209861 \u003C... futex resumed> ) = 1 \u003C0.000069>\n\n    3510  13:45:06.209868 \u003C... pselect6 resumed> ) = 0 (Timeout) \u003C0.000151>\n\n    3544  13:45:06.209915 epoll_ctl(4\u003Canon_inode:[eventpoll]>, EPOLL_CTL_DEL,\n    181\u003CUNIX:[164869291]>, 0xc42105bb14 \u003Cunfinished ...>\n\n    16780 13:45:06.210076 write(1\u003Cpipe:[55447]>, \"time=\\\"2019-02-14T18:45:06Z\\\"\n    level=warning msg=\\\"health check failed\\\" error=\\\"rpc error: code =\n    DeadlineExceeded desc = context deadline exceeded\\\"\n    worker.name=gitaly-ruby.4\\n\", 170 \u003Cunfinished ...>\n\n    3544  13:45:06.210093 \u003C... epoll_ctl resumed> ) = 0 \u003C0.000053>\n\n    3542  13:45:06.210101 futex(0x1089020, FUTEX_WAIT, 0, {0, 480025102}\n    \u003Cunfinished ...>\n\n    3510  13:45:06.210109 pselect6(0, NULL, NULL, NULL, {0, 20000}, NULL\n    \u003Cunfinished ...>\n\n    16780 13:45:06.210153 \u003C... 
write resumed> ) = 170 \u003C0.000064>\n\n    3544  13:45:06.210163 close(181\u003CUNIX:[164869291]> \u003Cunfinished ...>\n\n    ```\n\n\n    This strace delivers more than 300,000 lines about the different Gitaly\n    processes running on this customer’s GitLab environment, making it\n    challenging to decipher the flow of execution.\n\n\n\n    “In this case, we can use strace-parser to say, ‘Just give me all the files\n    that were opened, and sort them by how long it took to open,’” says Will.\n\n\n    ```text\n\n    $ strace-parser trace.txt files --sort duration\n\n\n    Files Opened\n\n          pid      dur (ms)       timestamp            error         file name\n      -------    ----------    ---------------    ---------------    ---------\n        24670      5203.999    13:45:16.152985           -           /efs/gitlab/home/.gitconfig\n        24859      5296.580    13:45:23.367482           -           /efs/gitlab/home/.gitconfig\n        24584      5279.810    13:45:09.286019           -           /efs/gitlab/home/.gitconfig\n        24666      5276.975    13:45:16.079697           -           /efs/gitlab/home/.gitconfig\n        24667      5255.649    13:45:16.101009           -           /efs/gitlab/home/.gitconfig\n        14871      2594.364    13:45:18.762347           -           /efs/gitlab/home/.gitconfig\n        24885      2440.635    13:45:26.224189           -           /efs/gitlab/home/.gitconfig\n        24886      2432.980    13:45:26.231009           -           /efs/gitlab/home/.gitconfig\n        24656        55.873    13:45:15.916836        ENOENT         /nfs/gitlab/gitdata/repositories/group/project.git/objects/info/alternates\n        24688        42.764    13:45:21.522789        ENOENT         /nfs/gitlab/gitdata/repositories/group/project.git/objects/info/alternates\n         3709        39.631    13:45:07.816618           -           /efs/gitlab/home/.gitconfig\n        24583        37.959    13:45:09.218283           -        
   /efs/gitlab/home/.gitconfig\n\n    ```\n\n\n    By summarizing the data in this way, we see multiple files that took 2-5\n    seconds to open, which is several orders of magnitude slower than expected.\n\n\n\n    “If it’s a particularly busy server and we’re performing these actions 50\n    times a second, 100 times a second, that adds up really fast,” says Will.\n    “Strace-Parser lets you drill down quickly, and say, ‘OK, this specific\n    thing we’re doing is super slow.’”\n\n\n    ### Get a closer look at processes using strace-parser\n\n\n    Strace-Parser can also be used to drill down into details of a process.\n\n\n    The previous output showed PID 24670 is one of the slower processes, so we\n    use the parser to understand how this slow call impacted the performance of\n    the process overall.\n\n\n    ```text\n\n    $ strace-parser trace.txt pid 24670\n\n\n    PID 24670\n\n      271 syscalls, active time: 5303.438ms, user time: 34.662ms, total time: 5338.100ms\n      start time: 13:45:16.116671    end time: 13:45:21.454771\n\n      syscall                 count    total (ms)      max (ms)      avg (ms)      min (ms)    errors\n      -----------------    --------    ----------    ----------    ----------    ----------    --------\n      open                       29      5223.073      5203.999       180.106         0.031    ENOENT: 9\n      read                       25        46.303        28.747         1.852         0.031\n      access                     11         6.948         4.131         0.632         0.056    ENOENT: 3\n      lstat                       6         5.116         2.130         0.853         0.077    ENOENT: 4\n      mmap                       32         3.868         0.485         0.121         0.028\n      openat                      2         3.757         2.934         1.878         0.823\n      fstat                      28         3.395         0.272         0.121         0.033\n      munmap                     11       
  2.551         0.929         0.232         0.056\n      rt_sigaction               59         2.548         0.121         0.043         0.024\n      close                      22         2.375         0.279         0.108         0.032\n      mprotect                   14         0.927         0.174         0.066         0.032\n      execve                      1         0.621         0.621         0.621         0.621\n      brk                         6         0.595         0.210         0.099         0.046\n      stat                        8         0.388         0.082         0.048         0.027    ENOENT: 3\n      getdents                    4         0.361         0.138         0.090         0.044\n      rt_sigprocmask              3         0.141         0.059         0.047         0.040\n      write                       1         0.101         0.101         0.101         0.101\n      dup2                        3         0.090         0.032         0.030         0.026\n      arch_prctl                  1         0.077         0.077         0.077         0.077\n      getrlimit                   1         0.062         0.062         0.062         0.062\n      getcwd                      1         0.044         0.044         0.044         0.044\n      set_robust_list             1         0.035         0.035         0.035         0.035\n      set_tid_address             1         0.032         0.032         0.032         0.032\n      setpgid                     1         0.030         0.030         0.030         0.030\n      ---------------\n\n      Program Executed: /opt/gitlab/embedded/bin/git\n      Args: [\"--git-dir\" \"/nfs/gitlab/gitdata/repositories/group/project.git\" \"cat-file\" \"--batch-check\"]\n\n      Parent PID:  3563\n\n      Slowest file open times for PID 24670:\n\n        dur (ms)       timestamp            error         file name\n      ----------    ---------------    ---------------    ---------\n        5203.999    13:45:16.152985    
       -           /efs/gitlab/home/.gitconfig\n           5.420    13:45:16.143520           -           /nfs/gitlab/gitdata/repositories/group/project.git/config\n           2.959    13:45:21.372776           -           /efs/gitlab/home/.gitconfig\n           2.934    13:45:21.401073           -           /nfs/gitlab/gitdata/repositories/group/project.git/refs/\n           2.736    13:45:21.417333        ENOENT         /nfs/gitlab/gitdata/repositories/group/project.git/info/grafts\n           2.683    13:45:21.421558           -           /nfs/gitlab/gitdata/repositories/group/project.git/objects/b7/ef5eba3a425af1e2a9cf6f51cb87454b6e1ad1\n           2.430    13:45:21.407170        ENOENT         /nfs/gitlab/gitdata/repositories/group/project.git/objects/info/alternates\n           0.992    13:45:21.420213        ENOENT         /nfs/gitlab/gitdata/repositories/group/project.git/shallow\n           0.823    13:45:21.405535           -           /nfs/gitlab/gitdata/repositories/group/project.git/objects/pack\n           0.275    13:45:21.380382           -           /nfs/gitlab/gitdata/repositories/group/project.git/config\n\n    ```\n\n\n    The output shows the time this process spent working was dominated by the\n    slow file open. This data points the Support team in the right direction for\n    fixing the underlying issue.\n\n\n\n    Strace itself has the `-c` flag which provides a similar summary, but its\n    utility is limited when multiple processes are traced as it cannot break out\n    per-process statistics.  Strace-Parser breaks these down to the PID level,\n    and can also include the details of parent and child processes on demand.\n\n\n    “In this case Will has identified an interesting area for our customer and\n    then very quickly anchored it in the fact that when we look at that one spot\n    it was slow,” says Lee. 
“When we’re debugging, having this data available\n    really helps us pinpoint the problem for our customers so we can give them\n    answers.”\n\n\n    The typical GitLab deployment has many different processes and services\n    running at a time, which can create dozens of different child processes, so\n    there is a large surface area for potential errors or slowness to occur.\n\n\n    Strace-Parser is an open source, generic tool that anyone can use to better\n    understand their strace data.\n\n\n    ## 2. [JSON Stats](https://gitlab.com/gitlab-com/support/toolbox/json_stats)\n\n\n    Will also built [JSON\n    Stats](https://gitlab.com/gitlab-com/support/toolbox/json_stats), a script\n    that pulls performance statistics for different logs from the customer’s\n    GitLab environment and summarizes the results in an easy-to-interpret table.\n\n\n    ```text\n\n    METHOD                             COUNT     RPS     PERC99     PERC95\n    MEDIAN         MAX        MIN          SCORE    % FAIL\n\n    FetchRemote                         2542    0.17  962176.08  130154.88\n    36580.23  4988513.00    1940.45  2445851585.19      1.06\n\n    FindAllTags                         5200    0.34   30000.37   11538.63\n    1941.84    30006.23     252.10   156001924.68      1.63\n\n    FindCommit                          3506    0.23   20859.98   16622.78\n    10841.86    30001.59    2528.67    73135073.75      0.23\n\n    FindAllRemoteBranches               1664    0.11   20432.93   12996.75\n    8606.60   405503.94    1430.84    34000396.10      0.00\n\n    AddRemote                           2603    0.17   10001.03    8094.97\n    825.46    10007.46     228.13    26032673.70      3.00\n\n    FindLocalBranches                   2535    0.16   10004.68   10002.90\n    9051.91    10036.16    1260.89    25361871.05     34.32\n\n    ```\n\n\n    This output shows that we’re calling the “FindLocalBranches” service 2500+\n    times, and it’s failing 34% of the 
time.\n\n\n\n    The Support team can use JSON Stats to ground their findings in evidence\n    when evaluating overall performance for a customer. It's the same concept as\n    strace-parser: can we pivot the information so that it clearly becomes\n    meaningful data?\n\n\n    “It’s a quick way of extracting data that you can give to a customer.\n    Instead of saying ‘Look, this failed once,’ we can say, ‘Look, this is\n    failing a third of the time and that suggests there’s a problem with X,’”\n    says Will.\n\n\n    In the sample output we see that JSON Stats is working with Gitaly logs, but\n    the tool is nimble enough to work on the logs from all the heavy components\n    of GitLab, including Rails, which runs the UI, and Sidekiq, which works on\n    background tasks.\n\n\n    “Some of our customers are very sophisticated and may have advanced\n    monitoring that could give us this information. But we wanted to build a\n    tool that would help us align and easily standardize on how we can get this\n    performance information for customers that don’t have an advanced monitoring\n    setup,” says Lee.\n\n\n    While this specific tool isn't as helpful for people outside of the GitLab\n    community, hopefully it helps to inspire others to consider how they are\n    drawing conclusions, and how they can speed that process up.\n\n\n    ### Benchmarking with JSON Stats\n\n\n    Will is building a future iteration of JSON Stats that will compare the\n    performance of a customer’s GitLab instance with GitLab.com.\n\n\n    ![JSON benchmarking\n    table](https://about.gitlab.com/images/blogimages/support-tools-update.png)\n\n\n    The table benchmarks the performance of GitLab.com (first row) against the\n    customer environment (second row) and shows the ratio between the two (third row). 
We can\n    see that in the worst case, the customer’s 99th percentile FindCommit\n    latency was almost eight times slower than it was on GitLab.com.\n\n\n\n    “Our vision here is to give accountability to our customers. We’re going to\n    treat GitLab.com as the pinnacle experience for GitLab,” says Lee. “We want\n    to use JSON Stats with benchmarking to help us understand how far away our\n    customers are from GitLab.com.”\n\n\n    Lee and Will are still assessing how to set the target range for the\n    customer’s instance of GitLab. But considering the wealth of resources\n    allocated to GitLab.com, any self-managed customer that is performing within\n    5-10% of GitLab.com would be considered hugely successful.\n\n\n    ## 3. [GitLab SOS](https://gitlab.com/gitlab-com/support/toolbox/gitlabsos)\n\n\n    When a customer encounters an issue but is unsure of what the\n    problem is, they can run [GitLab\n    SOS](https://gitlab.com/gitlab-com/support/toolbox/gitlabsos), created by\n    support engineer [Cody West](/company/team/#codyww), to create a snapshot of\n    different activities happening on their system. 
It's been so helpful in\n    debugging GitLab that it's being added into our [Omnibus\n    delivery](https://gitlab.com/gitlab-org/omnibus-gitlab/merge_requests/3430).\n\n\n    By capturing so much data about a moment in time during or shortly after\n    encountering a problem, the support team is able to work asynchronously to\n    troubleshoot on behalf of the customer.\n\n\n    ```text\n\n    cpuinfo              getenforce           iotop\n    netstat              opt                  sestatus             unicorn_stats\n\n    df_h                 gitlab_status        lscpu\n    netstat_i            pidstat              systemctl_unit_files uptime\n\n    dmesg                gitlabsos.log        meminfo\n    nfsiostat            ps                   tainted              var\n\n    etc                  hostname             mount\n    nfsstat              sar_dev              ulimit               vmstat\n\n    free_m               iostat               mpstat\n    ntpq                 sar_tcp              uname\n\n    ```\n\n\n    GitLab SOS works best if the script is run while an issue is occurring, or\n    moments after, but even if the window of opportunity is missed you can still\n    successfully gather information to diagnose the problem.\n\n\n\n    “If a customer is sharp, they may know what problems to look for already,”\n    says Lee. “But if a customer is scared and they don’t know what to look for,\n    then they can lean on a tool like GitLab SOS and learn from GitLab SOS. We\n    even have some sharp customers that will generate the SOS output and begin\n    to troubleshoot themselves because of the comprehensive overview it\n    provides.”\n\n\n    ## These new tools drive data-driven decision-making in Support\n\n\n    Tools like strace-parser, JSON Stats, and GitLab SOS provide the Support\n    team and GitLab customers with critical evidence about performance. 
By\n    letting the data drive decision-making, the Support team is able to identify\n    problems faster and quickly start debugging customer environments.\n    Performance is a key feature of GitLab, and by filling our toolbox with\n    data-driven solutions we can ensure greater\n    [transparency](https://handbook.gitlab.com/handbook/values/#transparency)\n    between GitLab and our customers.\n\n\n    Learn more about debugging from a support engineering perspective in a\n    GitLab Unfiltered video.\n\n\n    \u003Cfigure class=\"video_container\">\n      \u003Ciframe src=\"https://www.youtube.com/embed/9W6QnpYewik\" frameborder=\"0\" allowfullscreen=\"true\"> \u003C/iframe>\n    \u003C/figure>\n\n\n    Cover photo by [Diogo\n    Nunes](https://unsplash.com/@dialex?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)\n    on\n    [Unsplash](https://unsplash.com/search/photos/tools?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)\n\n  category: engineering\n  tags:\n    - open source\n    - features\n    - inside GitLab\nconfig:\n  slug: three-new-support-tools\n  featured: false\n  template: BlogPost\n",{"title":5,"description":19,"ogTitle":5,"ogDescription":19,"noIndex":16,"ogImage":21,"ogUrl":34,"ogSiteName":35,"ogType":36,"canonicalUrls":34},"https://about.gitlab.com/blog/three-new-support-tools","https://about.gitlab.com","article","en-us/blog/three-new-support-tools",[39,25,40],"open-source","inside-gitlab",[24,25,26],"fiJtDqQi3mbBwzRux2PuFibPObaeovMv0sze4jIBvJw",{"data":44},{"logo":45,"freeTrial":50,"sales":55,"login":60,"items":65,"search":374,"minimal":405,"duo":424,"switchNav":433,"pricingDeployment":444},{"config":46},{"href":47,"dataGaName":48,"dataGaLocation":49},"/","gitlab logo","header",{"text":51,"config":52},"Get free trial",{"href":53,"dataGaName":54,"dataGaLocation":49},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com&glm_content=default-saas-trial/","free 
trial",{"text":56,"config":57},"Talk to sales",{"href":58,"dataGaName":59,"dataGaLocation":49},"/sales/","sales",{"text":61,"config":62},"Sign in",{"href":63,"dataGaName":64,"dataGaLocation":49},"https://gitlab.com/users/sign_in/","sign in",[66,93,188,193,295,355],{"text":67,"config":68,"cards":70},"Platform",{"dataNavLevelOne":69},"platform",[71,77,85],{"title":67,"description":72,"link":73},"The intelligent orchestration platform for DevSecOps",{"text":74,"config":75},"Explore our Platform",{"href":76,"dataGaName":69,"dataGaLocation":49},"/platform/",{"title":78,"description":79,"link":80},"GitLab Duo Agent Platform","Agentic AI for the entire software lifecycle",{"text":81,"config":82},"Meet GitLab Duo",{"href":83,"dataGaName":84,"dataGaLocation":49},"/gitlab-duo-agent-platform/","gitlab duo agent platform",{"title":86,"description":87,"link":88},"Why GitLab","See the top reasons enterprises choose GitLab",{"text":89,"config":90},"Learn more",{"href":91,"dataGaName":92,"dataGaLocation":49},"/why-gitlab/","why gitlab",{"text":94,"left":30,"config":95,"link":97,"lists":101,"footer":170},"Product",{"dataNavLevelOne":96},"solutions",{"text":98,"config":99},"View all Solutions",{"href":100,"dataGaName":96,"dataGaLocation":49},"/solutions/",[102,126,149],{"title":103,"description":104,"link":105,"items":110},"Automation","CI/CD and automation to accelerate deployment",{"config":106},{"icon":107,"href":108,"dataGaName":109,"dataGaLocation":49},"AutomatedCodeAlt","/solutions/delivery-automation/","automated software delivery",[111,115,118,122],{"text":112,"config":113},"CI/CD",{"href":114,"dataGaLocation":49,"dataGaName":112},"/solutions/continuous-integration/",{"text":78,"config":116},{"href":83,"dataGaLocation":49,"dataGaName":117},"gitlab duo agent platform - product menu",{"text":119,"config":120},"Source Code Management",{"href":121,"dataGaLocation":49,"dataGaName":119},"/solutions/source-code-management/",{"text":123,"config":124},"Automated Software 
Delivery",{"href":108,"dataGaLocation":49,"dataGaName":125},"Automated software delivery",{"title":127,"description":128,"link":129,"items":134},"Security","Deliver code faster without compromising security",{"config":130},{"href":131,"dataGaName":132,"dataGaLocation":49,"icon":133},"/solutions/application-security-testing/","security and compliance","ShieldCheckLight",[135,139,144],{"text":136,"config":137},"Application Security Testing",{"href":131,"dataGaName":138,"dataGaLocation":49},"Application security testing",{"text":140,"config":141},"Software Supply Chain Security",{"href":142,"dataGaLocation":49,"dataGaName":143},"/solutions/supply-chain/","Software supply chain security",{"text":145,"config":146},"Software Compliance",{"href":147,"dataGaName":148,"dataGaLocation":49},"/solutions/software-compliance/","software compliance",{"title":150,"link":151,"items":156},"Measurement",{"config":152},{"icon":153,"href":154,"dataGaName":155,"dataGaLocation":49},"DigitalTransformation","/solutions/visibility-measurement/","visibility and measurement",[157,161,165],{"text":158,"config":159},"Visibility & Measurement",{"href":154,"dataGaLocation":49,"dataGaName":160},"Visibility and Measurement",{"text":162,"config":163},"Value Stream Management",{"href":164,"dataGaLocation":49,"dataGaName":162},"/solutions/value-stream-management/",{"text":166,"config":167},"Analytics & Insights",{"href":168,"dataGaLocation":49,"dataGaName":169},"/solutions/analytics-and-insights/","Analytics and insights",{"title":171,"items":172},"GitLab for",[173,178,183],{"text":174,"config":175},"Enterprise",{"href":176,"dataGaLocation":49,"dataGaName":177},"/enterprise/","enterprise",{"text":179,"config":180},"Small Business",{"href":181,"dataGaLocation":49,"dataGaName":182},"/small-business/","small business",{"text":184,"config":185},"Public Sector",{"href":186,"dataGaLocation":49,"dataGaName":187},"/solutions/public-sector/","public 
sector",{"text":189,"config":190},"Pricing",{"href":191,"dataGaName":192,"dataGaLocation":49,"dataNavLevelOne":192},"/pricing/","pricing",{"text":194,"config":195,"link":197,"lists":201,"feature":286},"Resources",{"dataNavLevelOne":196},"resources",{"text":198,"config":199},"View all resources",{"href":200,"dataGaName":196,"dataGaLocation":49},"/resources/",[202,235,258],{"title":203,"items":204},"Getting started",[205,210,215,220,225,230],{"text":206,"config":207},"Install",{"href":208,"dataGaName":209,"dataGaLocation":49},"/install/","install",{"text":211,"config":212},"Quick start guides",{"href":213,"dataGaName":214,"dataGaLocation":49},"/get-started/","quick setup checklists",{"text":216,"config":217},"Learn",{"href":218,"dataGaLocation":49,"dataGaName":219},"https://university.gitlab.com/","learn",{"text":221,"config":222},"Product documentation",{"href":223,"dataGaName":224,"dataGaLocation":49},"https://docs.gitlab.com/","product documentation",{"text":226,"config":227},"Best practice videos",{"href":228,"dataGaName":229,"dataGaLocation":49},"/getting-started-videos/","best practice videos",{"text":231,"config":232},"Integrations",{"href":233,"dataGaName":234,"dataGaLocation":49},"/integrations/","integrations",{"title":236,"items":237},"Discover",[238,243,248,253],{"text":239,"config":240},"Customer success stories",{"href":241,"dataGaName":242,"dataGaLocation":49},"/customers/","customer success stories",{"text":244,"config":245},"Blog",{"href":246,"dataGaName":247,"dataGaLocation":49},"/blog/","blog",{"text":249,"config":250},"The Source",{"href":251,"dataGaName":252,"dataGaLocation":49},"/the-source/","the source",{"text":254,"config":255},"Remote",{"href":256,"dataGaName":257,"dataGaLocation":49},"https://handbook.gitlab.com/handbook/company/culture/all-remote/","remote",{"title":259,"items":260},"Connect",[261,266,271,276,281],{"text":262,"config":263},"GitLab 
Services",{"href":264,"dataGaName":265,"dataGaLocation":49},"/services/","services",{"text":267,"config":268},"Community",{"href":269,"dataGaName":270,"dataGaLocation":49},"/community/","community",{"text":272,"config":273},"Forum",{"href":274,"dataGaName":275,"dataGaLocation":49},"https://forum.gitlab.com/","forum",{"text":277,"config":278},"Events",{"href":279,"dataGaName":280,"dataGaLocation":49},"/events/","events",{"text":282,"config":283},"Partners",{"href":284,"dataGaName":285,"dataGaLocation":49},"/partners/","partners",{"textColor":287,"title":288,"text":289,"link":290},"#000","What’s new in GitLab","Stay updated with our latest features and improvements.",{"text":291,"config":292},"Read the latest",{"href":293,"dataGaName":294,"dataGaLocation":49},"/releases/whats-new/","whats new",{"text":296,"config":297,"lists":299},"Company",{"dataNavLevelOne":298},"company",[300],{"items":301},[302,307,313,315,320,325,330,335,340,345,350],{"text":303,"config":304},"About",{"href":305,"dataGaName":306,"dataGaLocation":49},"/company/","about",{"text":308,"config":309,"footerGa":312},"Jobs",{"href":310,"dataGaName":311,"dataGaLocation":49},"/jobs/","jobs",{"dataGaName":311},{"text":277,"config":314},{"href":279,"dataGaName":280,"dataGaLocation":49},{"text":316,"config":317},"Leadership",{"href":318,"dataGaName":319,"dataGaLocation":49},"/company/team/e-group/","leadership",{"text":321,"config":322},"Team",{"href":323,"dataGaName":324,"dataGaLocation":49},"/company/team/","team",{"text":326,"config":327},"Handbook",{"href":328,"dataGaName":329,"dataGaLocation":49},"https://handbook.gitlab.com/","handbook",{"text":331,"config":332},"Investor relations",{"href":333,"dataGaName":334,"dataGaLocation":49},"https://ir.gitlab.com/","investor relations",{"text":336,"config":337},"Trust Center",{"href":338,"dataGaName":339,"dataGaLocation":49},"/security/","trust center",{"text":341,"config":342},"AI Transparency 
Center",{"href":343,"dataGaName":344,"dataGaLocation":49},"/ai-transparency-center/","ai transparency center",{"text":346,"config":347},"Newsletter",{"href":348,"dataGaName":349,"dataGaLocation":49},"/company/contact/#contact-forms","newsletter",{"text":351,"config":352},"Press",{"href":353,"dataGaName":354,"dataGaLocation":49},"/press/","press",{"text":356,"config":357,"lists":358},"Contact us",{"dataNavLevelOne":298},[359],{"items":360},[361,364,369],{"text":56,"config":362},{"href":58,"dataGaName":363,"dataGaLocation":49},"talk to sales",{"text":365,"config":366},"Support portal",{"href":367,"dataGaName":368,"dataGaLocation":49},"https://support.gitlab.com","support portal",{"text":370,"config":371},"Customer portal",{"href":372,"dataGaName":373,"dataGaLocation":49},"https://customers.gitlab.com/customers/sign_in/","customer portal",{"close":375,"login":376,"suggestions":383},"Close",{"text":377,"link":378},"To search repositories and projects, login to",{"text":379,"config":380},"gitlab.com",{"href":63,"dataGaName":381,"dataGaLocation":382},"search login","search",{"text":384,"default":385},"Suggestions",[386,388,392,394,398,402],{"text":78,"config":387},{"href":83,"dataGaName":78,"dataGaLocation":382},{"text":389,"config":390},"Code Suggestions (AI)",{"href":391,"dataGaName":389,"dataGaLocation":382},"/solutions/code-suggestions/",{"text":112,"config":393},{"href":114,"dataGaName":112,"dataGaLocation":382},{"text":395,"config":396},"GitLab on AWS",{"href":397,"dataGaName":395,"dataGaLocation":382},"/partners/technology-partners/aws/",{"text":399,"config":400},"GitLab on Google Cloud",{"href":401,"dataGaName":399,"dataGaLocation":382},"/partners/technology-partners/google-cloud-platform/",{"text":403,"config":404},"Why GitLab?",{"href":91,"dataGaName":403,"dataGaLocation":382},{"freeTrial":406,"mobileIcon":411,"desktopIcon":416,"secondaryButton":419},{"text":407,"config":408},"Start free 
trial",{"href":409,"dataGaName":54,"dataGaLocation":410},"https://gitlab.com/-/trials/new/","nav",{"altText":412,"config":413},"Gitlab Icon",{"src":414,"dataGaName":415,"dataGaLocation":410},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203874/jypbw1jx72aexsoohd7x.svg","gitlab icon",{"altText":412,"config":417},{"src":418,"dataGaName":415,"dataGaLocation":410},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203875/gs4c8p8opsgvflgkswz9.svg",{"text":420,"config":421},"Get Started",{"href":422,"dataGaName":423,"dataGaLocation":410},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com/get-started/","get started",{"freeTrial":425,"mobileIcon":429,"desktopIcon":431},{"text":426,"config":427},"Learn more about GitLab Duo",{"href":83,"dataGaName":428,"dataGaLocation":410},"gitlab duo",{"altText":412,"config":430},{"src":414,"dataGaName":415,"dataGaLocation":410},{"altText":412,"config":432},{"src":418,"dataGaName":415,"dataGaLocation":410},{"button":434,"mobileIcon":439,"desktopIcon":441},{"text":435,"config":436},"/switch",{"href":437,"dataGaName":438,"dataGaLocation":410},"#contact","switch",{"altText":412,"config":440},{"src":414,"dataGaName":415,"dataGaLocation":410},{"altText":412,"config":442},{"src":443,"dataGaName":415,"dataGaLocation":410},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1773335277/ohhpiuoxoldryzrnhfrh.png",{"freeTrial":445,"mobileIcon":450,"desktopIcon":452},{"text":446,"config":447},"Back to pricing",{"href":191,"dataGaName":448,"dataGaLocation":410,"icon":449},"back to pricing","GoBack",{"altText":412,"config":451},{"src":414,"dataGaName":415,"dataGaLocation":410},{"altText":412,"config":453},{"src":418,"dataGaName":415,"dataGaLocation":410},{"title":455,"button":456,"config":461},"See how agentic AI transforms software delivery",{"text":457,"config":458},"Watch GitLab Transcend now",{"href":459,"dataGaName":460,"dataGaLocation":49},"/events/transcend/virtual/","transcend 
event",{"layout":462,"icon":463,"disabled":30},"release","AiStar",{"data":465},{"text":466,"source":467,"edit":473,"contribute":478,"config":483,"items":488,"minimal":695},"Git is a trademark of Software Freedom Conservancy and our use of 'GitLab' is under license",{"text":468,"config":469},"View page source",{"href":470,"dataGaName":471,"dataGaLocation":472},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/","page source","footer",{"text":474,"config":475},"Edit this page",{"href":476,"dataGaName":477,"dataGaLocation":472},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/content/","web ide",{"text":479,"config":480},"Please contribute",{"href":481,"dataGaName":482,"dataGaLocation":472},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/CONTRIBUTING.md/","please contribute",{"twitter":484,"facebook":485,"youtube":486,"linkedin":487},"https://twitter.com/gitlab","https://www.facebook.com/gitlab","https://www.youtube.com/channel/UCnMGQ8QHMAnVIsI3xJrihhg","https://www.linkedin.com/company/gitlab-com",[489,536,590,634,661],{"title":189,"links":490,"subMenu":505},[491,495,500],{"text":492,"config":493},"View plans",{"href":191,"dataGaName":494,"dataGaLocation":472},"view plans",{"text":496,"config":497},"Why Premium?",{"href":498,"dataGaName":499,"dataGaLocation":472},"/pricing/premium/","why premium",{"text":501,"config":502},"Why Ultimate?",{"href":503,"dataGaName":504,"dataGaLocation":472},"/pricing/ultimate/","why ultimate",[506],{"title":507,"links":508},"Contact Us",[509,512,514,516,521,526,531],{"text":510,"config":511},"Contact 
sales",{"href":58,"dataGaName":59,"dataGaLocation":472},{"text":365,"config":513},{"href":367,"dataGaName":368,"dataGaLocation":472},{"text":370,"config":515},{"href":372,"dataGaName":373,"dataGaLocation":472},{"text":517,"config":518},"Status",{"href":519,"dataGaName":520,"dataGaLocation":472},"https://status.gitlab.com/","status",{"text":522,"config":523},"Terms of use",{"href":524,"dataGaName":525,"dataGaLocation":472},"/terms/","terms of use",{"text":527,"config":528},"Privacy statement",{"href":529,"dataGaName":530,"dataGaLocation":472},"/privacy/","privacy statement",{"text":532,"config":533},"Cookie preferences",{"dataGaName":534,"dataGaLocation":472,"id":535,"isOneTrustButton":30},"cookie preferences","ot-sdk-btn",{"title":94,"links":537,"subMenu":546},[538,542],{"text":539,"config":540},"DevSecOps platform",{"href":76,"dataGaName":541,"dataGaLocation":472},"devsecops platform",{"text":543,"config":544},"AI-Assisted Development",{"href":83,"dataGaName":545,"dataGaLocation":472},"ai-assisted development",[547],{"title":548,"links":549},"Topics",[550,555,560,565,570,575,580,585],{"text":551,"config":552},"CICD",{"href":553,"dataGaName":554,"dataGaLocation":472},"/topics/ci-cd/","cicd",{"text":556,"config":557},"GitOps",{"href":558,"dataGaName":559,"dataGaLocation":472},"/topics/gitops/","gitops",{"text":561,"config":562},"DevOps",{"href":563,"dataGaName":564,"dataGaLocation":472},"/topics/devops/","devops",{"text":566,"config":567},"Version Control",{"href":568,"dataGaName":569,"dataGaLocation":472},"/topics/version-control/","version control",{"text":571,"config":572},"DevSecOps",{"href":573,"dataGaName":574,"dataGaLocation":472},"/topics/devsecops/","devsecops",{"text":576,"config":577},"Cloud Native",{"href":578,"dataGaName":579,"dataGaLocation":472},"/topics/cloud-native/","cloud native",{"text":581,"config":582},"AI for Coding",{"href":583,"dataGaName":584,"dataGaLocation":472},"/topics/devops/ai-for-coding/","ai for 
coding",{"text":586,"config":587},"Agentic AI",{"href":588,"dataGaName":589,"dataGaLocation":472},"/topics/agentic-ai/","agentic ai",{"title":591,"links":592},"Solutions",[593,595,597,602,606,609,613,616,618,621,624,629],{"text":136,"config":594},{"href":131,"dataGaName":136,"dataGaLocation":472},{"text":125,"config":596},{"href":108,"dataGaName":109,"dataGaLocation":472},{"text":598,"config":599},"Agile development",{"href":600,"dataGaName":601,"dataGaLocation":472},"/solutions/agile-delivery/","agile delivery",{"text":603,"config":604},"SCM",{"href":121,"dataGaName":605,"dataGaLocation":472},"source code management",{"text":551,"config":607},{"href":114,"dataGaName":608,"dataGaLocation":472},"continuous integration & delivery",{"text":610,"config":611},"Value stream management",{"href":164,"dataGaName":612,"dataGaLocation":472},"value stream management",{"text":556,"config":614},{"href":615,"dataGaName":559,"dataGaLocation":472},"/solutions/gitops/",{"text":174,"config":617},{"href":176,"dataGaName":177,"dataGaLocation":472},{"text":619,"config":620},"Small business",{"href":181,"dataGaName":182,"dataGaLocation":472},{"text":622,"config":623},"Public sector",{"href":186,"dataGaName":187,"dataGaLocation":472},{"text":625,"config":626},"Education",{"href":627,"dataGaName":628,"dataGaLocation":472},"/solutions/education/","education",{"text":630,"config":631},"Financial services",{"href":632,"dataGaName":633,"dataGaLocation":472},"/solutions/finance/","financial 
services",{"title":194,"links":635},[636,638,640,642,645,647,649,651,653,655,657,659],{"text":206,"config":637},{"href":208,"dataGaName":209,"dataGaLocation":472},{"text":211,"config":639},{"href":213,"dataGaName":214,"dataGaLocation":472},{"text":216,"config":641},{"href":218,"dataGaName":219,"dataGaLocation":472},{"text":221,"config":643},{"href":223,"dataGaName":644,"dataGaLocation":472},"docs",{"text":244,"config":646},{"href":246,"dataGaName":247,"dataGaLocation":472},{"text":239,"config":648},{"href":241,"dataGaName":242,"dataGaLocation":472},{"text":254,"config":650},{"href":256,"dataGaName":257,"dataGaLocation":472},{"text":262,"config":652},{"href":264,"dataGaName":265,"dataGaLocation":472},{"text":267,"config":654},{"href":269,"dataGaName":270,"dataGaLocation":472},{"text":272,"config":656},{"href":274,"dataGaName":275,"dataGaLocation":472},{"text":277,"config":658},{"href":279,"dataGaName":280,"dataGaLocation":472},{"text":282,"config":660},{"href":284,"dataGaName":285,"dataGaLocation":472},{"title":296,"links":662},[663,665,667,669,671,673,675,679,684,686,688,690],{"text":303,"config":664},{"href":305,"dataGaName":298,"dataGaLocation":472},{"text":308,"config":666},{"href":310,"dataGaName":311,"dataGaLocation":472},{"text":316,"config":668},{"href":318,"dataGaName":319,"dataGaLocation":472},{"text":321,"config":670},{"href":323,"dataGaName":324,"dataGaLocation":472},{"text":326,"config":672},{"href":328,"dataGaName":329,"dataGaLocation":472},{"text":331,"config":674},{"href":333,"dataGaName":334,"dataGaLocation":472},{"text":676,"config":677},"Sustainability",{"href":678,"dataGaName":676,"dataGaLocation":472},"/sustainability/",{"text":680,"config":681},"Diversity, inclusion and belonging (DIB)",{"href":682,"dataGaName":683,"dataGaLocation":472},"/diversity-inclusion-belonging/","Diversity, inclusion and 
belonging",{"text":336,"config":685},{"href":338,"dataGaName":339,"dataGaLocation":472},{"text":346,"config":687},{"href":348,"dataGaName":349,"dataGaLocation":472},{"text":351,"config":689},{"href":353,"dataGaName":354,"dataGaLocation":472},{"text":691,"config":692},"Modern Slavery Transparency Statement",{"href":693,"dataGaName":694,"dataGaLocation":472},"https://handbook.gitlab.com/handbook/legal/modern-slavery-act-transparency-statement/","modern slavery transparency statement",{"items":696},[697,700,703],{"text":698,"config":699},"Terms",{"href":524,"dataGaName":525,"dataGaLocation":472},{"text":701,"config":702},"Cookies",{"dataGaName":534,"dataGaLocation":472,"id":535,"isOneTrustButton":30},{"text":704,"config":705},"Privacy",{"href":529,"dataGaName":530,"dataGaLocation":472},[707,720],{"id":708,"title":10,"body":28,"config":709,"content":711,"description":28,"extension":27,"meta":715,"navigation":30,"path":716,"seo":717,"stem":718,"__hash__":719},"blogAuthors/en-us/blog/authors/will-chandler.yml",{"template":710},"BlogAuthor",{"name":10,"config":712},{"headshot":713,"ctfId":714},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1749659488/Blog/Author%20Headshots/gitlab-logo-extra-whitespace.png","DKiIGSSRIyO6QdTQkRkjs",{},"/en-us/blog/authors/will-chandler",{},"en-us/blog/authors/will-chandler","sK-WrYRb0JT7v9_RxRcyh5Fy5NWToYOo_uR1jkiGNqw",{"id":721,"title":11,"body":28,"config":722,"content":723,"description":28,"extension":27,"meta":727,"navigation":30,"path":728,"seo":729,"stem":730,"__hash__":731},"blogAuthors/en-us/blog/authors/sara-kassabian.yml",{"template":710},{"name":11,"config":724},{"headshot":725,"ctfId":726},"","skassabian",{},"/en-us/blog/authors/sara-kassabian",{},"en-us/blog/authors/sara-kassabian","6cCCPpjzkCDfb77tMVCyG8_FImRVQXA2VCMyiAjhFX0",[733,747,760],{"content":734,"config":745},{"title":735,"description":736,"authors":737,"heroImage":739,"date":740,"body":741,"category":13,"tags":742},"How to build CI/CD observability at 
scale","This practical guide to GitLab pipeline analytics helps self-managed users gain operational insights using Prometheus and Grafana.",[738],"Paul Meresanu","https://res.cloudinary.com/about-gitlab-com/image/upload/v1774465167/n5hlvrsrheadeccyr1oz.png","2026-04-28","CI/CD optimization starts with visibility. Building a successful DevOps platform at enterprise scale **should include** understanding pipeline performance, job execution patterns, and quantifiable operational insights — especially for organizations running GitLab self-managed instances.\n\nTo help GitLab customers maximize their platform investments, we developed the GitLab CI/CD Observability solution as part of our Platform Excellence program, which transforms raw pipeline metrics into actionable operational insights.\n\nA leading financial services organization partnered with GitLab's customer success architect to gain visibility into their GitLab self-managed deployment. Together, we implemented a containerized observability solution combining the open-source gitlab-ci-pipelines-exporter with enterprise-grade Prometheus and Grafana infrastructure.\n\nIn this article, you'll learn the challenges they faced managing pipelines at scale and how GitLab CI/CD Observability addressed them with a practical, end-to-end implementation.\n\n## The challenge: Measuring CI/CD performance\nBefore implementing any observability solution, define your measurement landscape:\n*   **What metrics matter?** Pipeline duration, job success rates, queue times, runner utilization\n*   **Who needs visibility?** Developers, DevOps engineers, platform teams, leadership\n*   **What decisions will this drive?** Infrastructure investment, bottleneck remediation, capacity planning\n\n## Solution architecture: A full set of dashboards for observability\nOnce deployed, the observability stack provides a set of Grafana dashboards that give real-time and historical visibility into your CI/CD platform. 
A typical deployment includes:\n*   **Pipeline Overview Dashboard:** A top-level view showing total pipeline runs, success/failure rates over time (as stacked bar or time-series charts), and average pipeline duration trends. Panels use color-coded status indicators (green for success, red for failure, amber for cancelled) so platform teams can spot degradation at a glance.\n*   **Job Performance Dashboard:** Drill-down panels showing individual job duration distributions (histogram), the top 10 slowest jobs by average duration, and job failure heatmaps by project and stage. This is where teams identify specific bottleneck jobs worth optimizing.\n*   **Runner & Infrastructure Dashboard:** Combines Node Exporter host metrics (CPU, memory, disk) with pipeline queue-time data to correlate infrastructure saturation with pipeline wait times. Useful for capacity planning decisions such as scaling runner pools or upgrading instance sizes.\n*   **Deployment Frequency Dashboard:** Tracks deployment count and deployment duration over time per environment, aligned with DORA metrics. Helps engineering leadership assess delivery throughput and environment drift (commits behind main).\n\nEach dashboard is provisioned automatically via Grafana's file-based provisioning, so it deploys consistently across environments. 
The dashboards can be further customized with Grafana variables to filter by project, ref/branch, or time range.\n\n![Solution architecture](https://res.cloudinary.com/about-gitlab-com/image/upload/v1777382608/Blog/Imported/blog-building-ci-cd-observability-stack-for-gitlab-self-managed/image1.png)\n\nThe solution requires two exporters:\n*   **Pipeline Exporter:** Collects CI/CD metrics via GitLab API (pipeline duration, job status, deployments)\n*   **Node Exporter:** Collects host-level metrics (CPU, memory, disk) for infrastructure correlation\n\n**Prerequisites:**\n*   GitLab Self-Managed Version 18.1+\n*   **Container orchestration platform:** A Kubernetes cluster (recommended for enterprise deployments) or a container runtime such as Docker/Podman for smaller scale or proof-of-concept environments. The primary deployment guide below targets Kubernetes; a Docker Compose alternative is provided in the appendix for local testing and evaluation\n*   GitLab Personal Access Token (**read_api** scope)\n\n## Kubernetes deployment (recommended)\nFor enterprise environments, deploy each component as a separate Deployment within a dedicated namespace. This approach integrates with existing cluster infrastructure, secrets management, and network policies.\n\n### 1. Create namespace and secret\n```bash\nkubectl create namespace gitlab-observability\n\n# Create the GitLab token secret (see Secrets Management section below\n# for enterprise-grade approaches using external secret operators)\nkubectl create secret generic gitlab-token \\\n  --from-literal=token=glpat-xxxxxxxxxxxx \\\n  -n gitlab-observability\n```\n\n\n### 2. 
Deploy the Pipeline Exporter\n```yaml\n# exporter-deployment.yaml\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: gitlab-ci-pipelines-exporter\n  namespace: gitlab-observability\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app: gitlab-ci-pipelines-exporter\n  template:\n    metadata:\n      labels:\n        app: gitlab-ci-pipelines-exporter\n    spec:\n      containers:\n        - name: exporter\n          image: mvisonneau/gitlab-ci-pipelines-exporter:latest\n          ports:\n            - containerPort: 8080\n          env:\n            - name: GCPE_GITLAB_TOKEN\n              valueFrom:\n                secretKeyRef:\n                  name: gitlab-token\n                  key: token\n            - name: GCPE_CONFIG\n              value: /etc/gcpe/config.yml\n          volumeMounts:\n            - name: config\n              mountPath: /etc/gcpe\n      volumes:\n        - name: config\n          configMap:\n            name: gcpe-config\n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: gitlab-ci-pipelines-exporter\n  namespace: gitlab-observability\nspec:\n  selector:\n    app: gitlab-ci-pipelines-exporter\n  ports:\n    - port: 8080\n      targetPort: 8080\n```\n\n### 3. Deploy Node Exporter (DaemonSet)\n```yaml\n# node-exporter-daemonset.yaml\napiVersion: apps/v1\nkind: DaemonSet\nmetadata:\n  name: node-exporter\n  namespace: gitlab-observability\nspec:\n  selector:\n    matchLabels:\n      app: node-exporter\n  template:\n    metadata:\n      labels:\n        app: node-exporter\n    spec:\n      containers:\n        - name: node-exporter\n          image: prom/node-exporter:latest\n          ports:\n            - containerPort: 9100\n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: node-exporter\n  namespace: gitlab-observability\nspec:\n  selector:\n    app: node-exporter\n  ports:\n    - port: 9100\n      targetPort: 9100\n```\n\n### 4. 
Deploy Prometheus\n```yaml\n# prometheus-deployment.yaml\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: prometheus\n  namespace: gitlab-observability\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app: prometheus\n  template:\n    metadata:\n      labels:\n        app: prometheus\n    spec:\n      containers:\n        - name: prometheus\n          image: prom/prometheus:latest\n          ports:\n            - containerPort: 9090\n          volumeMounts:\n            - name: config\n              mountPath: /etc/prometheus\n      volumes:\n        - name: config\n          configMap:\n            name: prometheus-config\n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: prometheus\n  namespace: gitlab-observability\nspec:\n  selector:\n    app: prometheus\n  ports:\n    - port: 9090\n      targetPort: 9090\n```\n\n### 5. Deploy Grafana\nThe Grafana deployment below starts with authentication disabled (`GF_AUTH_ANONYMOUS_ENABLED: true`) for initial setup convenience.\n\n**This setting allows anyone with network access to view all dashboards without logging in.** For production deployments, remove this variable or set it to false and configure a proper authentication provider (LDAP, SAML/SSO, or OAuth) to restrict access to authorized users.\n```yaml\n# grafana-deployment.yaml\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: grafana\n  namespace: gitlab-observability\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app: grafana\n  template:\n    metadata:\n      labels:\n        app: grafana\n    spec:\n      containers:\n        - name: grafana\n          image: grafana/grafana:10.0.0\n          ports:\n            - containerPort: 3000\n          env:\n            # REMOVE or set to 'false' for production.\n            # When 'true', any user with network access can\n            # view dashboards without authentication.\n            - name: GF_AUTH_ANONYMOUS_ENABLED\n              value: 'true'\n          
volumeMounts:\n            - name: dashboards-provider\n              mountPath: /etc/grafana/provisioning/dashboards\n            - name: datasources\n              mountPath: /etc/grafana/provisioning/datasources\n            - name: dashboards\n              mountPath: /var/lib/grafana/dashboards\n      volumes:\n        - name: dashboards-provider\n          configMap:\n            name: grafana-dashboards-provider\n        - name: datasources\n          configMap:\n            name: grafana-datasources\n        - name: dashboards\n          configMap:\n            name: grafana-dashboards\n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: grafana\n  namespace: gitlab-observability\nspec:\n  selector:\n    app: grafana\n  ports:\n    - port: 3000\n      targetPort: 3000\n```\n\n### 6. Set network policy\nRestrict inter-pod traffic to only the required communication paths:\n```yaml\n# network-policy.yaml\napiVersion: networking.k8s.io/v1\nkind: NetworkPolicy\nmetadata:\n  name: observability-policy\n  namespace: gitlab-observability\nspec:\n  podSelector: {}\n  policyTypes:\n    - Ingress\n  ingress:\n    # Prometheus scrapes exporter and node-exporter\n    - from:\n        - podSelector:\n            matchLabels:\n              app: prometheus\n      ports:\n        - port: 8080\n        - port: 9100\n    # Grafana queries Prometheus\n    - from:\n        - podSelector:\n            matchLabels:\n              app: grafana\n      ports:\n        - port: 9090\n```\n\n### 7. 
Validate\n```bash\nkubectl get pods -n gitlab-observability\n# port-forward blocks; run it in a separate terminal, then query the health endpoint\nkubectl port-forward svc/grafana 3000:3000 -n gitlab-observability\ncurl http://localhost:3000/api/health\n```\n\n## Configuration reference\n### Exporter configuration\n```yaml\n# gitlab-ci-pipelines-exporter.yml (ConfigMap: gcpe-config)\nlog:\n  level: info\ngitlab:\n  url: https://gitlab.your-domain.com\n  maximum_requests_per_second: 10\nproject_defaults:\n  pull:\n    pipeline:\n      jobs:\n        enabled: true\nwildcards:\n  - owner:\n      name: your-group-name\n      kind: group\n    archived: false\n```\n\n### Prometheus configuration\n```yaml\n# prometheus.yml (ConfigMap: prometheus-config)\nglobal:\n  scrape_interval: 15s\nscrape_configs:\n  - job_name: 'gitlab-ci-pipelines-exporter'\n    static_configs:\n      - targets: ['gitlab-ci-pipelines-exporter:8080']\n  - job_name: 'node-exporter'\n    static_configs:\n      - targets: ['node-exporter:9100']\n```\n\n### Grafana data sources\n```yaml\n# datasources.yml (ConfigMap: grafana-datasources)\napiVersion: 1\ndatasources:\n  - name: Prometheus\n    type: prometheus\n    access: proxy\n    url: http://prometheus:9090\n    isDefault: true\n---\n# dashboards.yml (ConfigMap: grafana-dashboards-provider)\napiVersion: 1\nproviders:\n  - name: 'default'\n    folder: 'GitLab CI/CD'\n    type: file\n    options:\n      path: /var/lib/grafana/dashboards\n```\n\n## Key metrics\n### Pipeline Exporter metrics\n| Metric | Description |\n| :---- | :---- |\n| `gitlab_ci_pipeline_duration_seconds` | Pipeline execution time |\n| `gitlab_ci_pipeline_status` | Pipeline success/failure by project |\n| `gitlab_ci_pipeline_job_duration_seconds` | Individual job execution time |\n| `gitlab_ci_pipeline_job_status` | Job success/failure status |\n| `gitlab_ci_pipeline_job_artifact_size_bytes` | Artifact storage consumption |\n| `gitlab_ci_pipeline_coverage` | Code coverage percentage |\n| `gitlab_ci_environment_deployment_count` | Deployment frequency |\n| 
`gitlab_ci_environment_deployment_duration_seconds` | Deployment execution time |\n| `gitlab_ci_environment_behind_commits_count` | Environment drift from main |\n\n### Node Exporter metrics\n| Metric | Description |\n| :---- | :---- |\n| `node_cpu_seconds_total` | CPU utilization |\n| `node_memory_MemAvailable_bytes` | Available memory |\n| `node_filesystem_avail_bytes` | Disk space available |\n| `node_load1` | 1-minute load average |\n\n## Troubleshooting\n### Air-gapped Grafana plugin installation\nFor offline environments, install plugins manually. Example for Kubernetes:\n```bash\n# Copy plugin zip into the Grafana pod\nkubectl cp grafana-polystat-panel-2.1.16.zip \\\n  gitlab-observability/grafana-\u003Cpod-id>:/tmp/\n# Extract plugin\nkubectl exec -it -n gitlab-observability deploy/grafana -- \\\n  sh -c \"unzip /tmp/grafana-polystat-panel-2.1.16.zip -d /var/lib/grafana/plugins/\"\n# Restart Grafana pod\nkubectl rollout restart deployment/grafana -n gitlab-observability\n# Verify installation\nkubectl exec -it -n gitlab-observability deploy/grafana -- \\\n  ls -al /var/lib/grafana/plugins/\n```\n\n## Enterprise considerations\nFor regulated industries, ensure:\n*   **Token security:** Store GitLab Personal Access Tokens in a dedicated secrets manager rather than hardcoded in ConfigMaps. Enforce token rotation policies and limit scope to **read\\_api** only.\n*   **Network segmentation:** Deploy behind a reverse proxy with TLS termination. In Kubernetes, use an Ingress controller with automated certificate provisioning.\n*   **Authentication:** Configure Grafana with your organization's identity provider (SAML, LDAP, or OAuth/OIDC) to enforce role-based access control on dashboards.\n\n## Why GitLab?\nGitLab's API-first design enables custom observability solutions that complement native capabilities like Value Stream Analytics and DORA metrics. 
The open architecture allows organizations to integrate proven open-source tooling — like the gitlab-ci-pipelines-exporter — directly with their existing enterprise infrastructure, without disrupting established workflows.\n\nAs your observability maturity grows, GitLab's built-in Observability capabilities provide a natural next step — offering deeper, integrated visibility without additional tooling. Learn more about what's available natively in the platform for [GitLab Observability](https://docs.gitlab.com/operations/observability/observability/).\n",[112,743,744],"product","tutorial",{"featured":16,"template":17,"slug":746},"how-to-build-ci-cd-observability-at-scale",{"content":748,"config":758},{"body":749,"title":750,"description":751,"authors":752,"heroImage":754,"date":755,"category":13,"tags":756},"Most CI/CD tools can run a build and ship a deployment. Where they diverge is what happens when your delivery needs get real: a monorepo with a dozen services, microservices spread across multiple repositories, deployments to dozens of environments, or a platform team trying to enforce standards without becoming a bottleneck.\n  \nGitLab's pipeline execution model was designed for that complexity. Parent-child pipelines, DAG execution, dynamic pipeline generation, multi-project triggers, merge request pipelines with merged results, and CI/CD Components each solve a distinct class of problems. Because they compose, understanding the full model unlocks something more than a faster pipeline. In this article, you'll learn about the five patterns where that model stands out, each mapped to a real engineering scenario with the configuration to match.\n  \nThe configs below are illustrative. The scripts use echo commands to keep the signal-to-noise ratio low. Swap them out for your actual build, test, and deploy steps and they are ready to use.\n\n\n## 1. 
Monorepos: Parent-child pipelines + DAG execution\n\n\nThe problem: Your monorepo has a frontend, a backend, and a docs site. Every commit triggers a full rebuild of everything, even when only a README changed.\n\n\nGitLab solves this with two complementary features: [parent-child pipelines](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#parent-child-pipelines) (which let a top-level pipeline spawn isolated sub-pipelines) and [DAG execution via `needs`](https://docs.gitlab.com/ci/yaml/#needs) (which breaks rigid stage-by-stage ordering and lets jobs start the moment their dependencies finish).\n\n\nA parent pipeline detects what changed and triggers only the relevant child pipelines:\n\n```yaml\n# .gitlab-ci.yml\nstages:\n  - trigger\n\ntrigger-services:\n  stage: trigger\n  trigger:\n    include:\n      - local: '.gitlab/ci/api-service.yml'\n      - local: '.gitlab/ci/web-service.yml'\n      - local: '.gitlab/ci/worker-service.yml'\n    strategy: depend\n```\n\n\nEach child pipeline is a fully independent pipeline with its own stages, jobs, and artifacts. The parent waits for all of them via [strategy: depend](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#wait-for-downstream-pipeline-to-complete) so you get a single green/red signal at the top level, with full drill-down into each service's pipeline. This organizational separation is the bigger win for large teams: each service owns its pipeline config, changes in one cannot break another, and the complexity stays manageable as the repo grows.\n\n\nOne thing worth knowing: when you pass [multiple files to a single `trigger: include:`](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#combine-multiple-child-pipeline-configuration-files), GitLab merges them into a single child pipeline configuration. This means jobs defined across those files share the same pipeline context and can reference each other with `needs:`, which is what makes the DAG optimization possible. 
If you split them into separate trigger jobs instead, each would be its own isolated pipeline and cross-file `needs:` references would not work.\n\n\nCombine this with `needs:` inside each child pipeline and you get DAG execution. Your integration tests can start the moment the build finishes, without waiting for other jobs in the same stage.\n\n```yaml\n# .gitlab/ci/api-service.yml\nstages:\n  - build\n  - test\n\nbuild-api:\n  stage: build\n  script:\n    - echo \"Building API service\"\n\ntest-api:\n  stage: test\n  needs: [build-api]\n  script:\n    - echo \"Running API tests\"\n```\n\n\nWhy it matters: Teams with large monorepos typically report significant reductions in pipeline runtime after switching to DAG execution, since jobs no longer wait on unrelated work in the same stage. Parent-child pipelines add the organizational layer that keeps the configuration maintainable as the repo and team grow.\n\n![Local downstream pipelines](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738759/Blog/Imported/hackathon-fake-blog-post-s/image3_vwj3rz.png \"Local downstream pipelines\")\n\n## 2. Microservices: Cross-repo, multi-project pipelines\n\n\nThe problem: Your frontend lives in one repo, your backend in another. When the frontend team ships a change, they have no visibility into whether it broke the backend integration and vice versa.\n\n\nGitLab's [multi-project pipelines](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#multi-project-pipelines) let one project trigger a pipeline in a completely separate project and wait for the result. The triggering project gets a linked downstream pipeline right in its own pipeline view.\n\n\nThe frontend pipeline builds an API contract artifact and publishes it, then triggers the backend pipeline. 
The backend fetches that artifact directly using the [Jobs API](https://docs.gitlab.com/api/jobs/#download-a-single-artifact-file-from-specific-tag-or-branch) and validates it before allowing anything to proceed. If a breaking change is detected, the backend pipeline fails and the frontend pipeline fails with it.\n\n```yaml\n# frontend repo: .gitlab-ci.yml\nstages:\n  - build\n  - test\n  - trigger-backend\n\nbuild-frontend:\n  stage: build\n  script:\n    - echo \"Building frontend and generating API contract...\"\n    - mkdir -p dist\n    - |\n      echo '{\n        \"api_version\": \"v2\",\n        \"breaking_changes\": false\n      }' > dist/api-contract.json\n    - cat dist/api-contract.json\n  artifacts:\n    paths:\n      - dist/api-contract.json\n    expire_in: 1 hour\n\ntest-frontend:\n  stage: test\n  script:\n    - echo \"All frontend tests passed!\"\n\ntrigger-backend-pipeline:\n  stage: trigger-backend\n  trigger:\n    project: my-org/backend-service\n    branch: main\n    strategy: depend\n  rules:\n    - if: $CI_COMMIT_BRANCH == \"main\"\n```\n\n```yaml\n# backend repo: .gitlab-ci.yml\nstages:\n  - build\n  - test\n\nbuild-backend:\n  stage: build\n  script:\n    - echo \"Building backend service...\"\n\nintegration-test:\n  stage: test\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"pipeline\"\n  script:\n    - echo \"Fetching API contract from frontend...\"\n    - |\n      curl --silent --fail \\\n        --header \"JOB-TOKEN: $CI_JOB_TOKEN\" \\\n        --output api-contract.json \\\n        \"${CI_API_V4_URL}/projects/${FRONTEND_PROJECT_ID}/jobs/artifacts/main/raw/dist/api-contract.json?job=build-frontend\"\n    - cat api-contract.json\n    - |\n      if grep -q '\"breaking_changes\": true' api-contract.json; then\n        echo \"FAIL: Breaking API changes detected - backend integration blocked!\"\n        exit 1\n      fi\n      echo \"PASS: API contract is compatible!\"\n```\n\n\nA few things worth noting in this config. 
The `integration-test` job uses `$CI_PIPELINE_SOURCE == \"pipeline\"` to ensure it only runs when triggered by an upstream pipeline, not on a standalone push to the backend repo. The frontend project ID is referenced via `$FRONTEND_PROJECT_ID`, which should be set as a [CI/CD variable](https://docs.gitlab.com/ci/variables/) in the backend project settings to avoid hardcoding it.\n\n\nWhy it matters: Cross-service breakage that previously surfaced in production gets caught in the pipeline instead. The dependency between services stops being invisible and becomes something teams can see, track, and act on.\n\n\n![Cross-project pipelines](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738762/Blog/Imported/hackathon-fake-blog-post-s/image4_h6mfsb.png \"Cross-project pipelines\")\n\n\n## 3. Multi-tenant / matrix deployments: Dynamic child pipelines\n\n\nThe problem: You deploy the same application to 15 customer environments, or three cloud regions, or dev/staging/prod. Updating a deploy stage across all of them one by one is the kind of work that leads to configuration drift. Writing a separate pipeline for each environment is unmaintainable from day one.\n\n\nGitLab's [dynamic child pipelines](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#dynamic-child-pipelines) let you generate a pipeline at runtime. A job runs a script that produces a YAML file, and that YAML becomes the pipeline for the next stage. 
The pipeline structure itself becomes data.\n\n\n```yaml\n# .gitlab-ci.yml\nstages:\n  - generate\n  - trigger-environments\n\ngenerate-config:\n  stage: generate\n  script:\n    - |\n      # ENVIRONMENTS can be passed as a CI variable or read from a config file.\n      # Default to dev, staging, prod if not set.\n      ENVIRONMENTS=${ENVIRONMENTS:-\"dev staging prod\"}\n      for ENV in $ENVIRONMENTS; do\n        cat > ${ENV}-pipeline.yml \u003C\u003C EOF\n      stages:\n        - deploy\n        - verify\n      deploy-${ENV}:\n        stage: deploy\n        script:\n          - echo \"Deploying to ${ENV} environment\"\n      verify-${ENV}:\n        stage: verify\n        script:\n          - echo \"Running smoke tests on ${ENV}\"\n      EOF\n      done\n  artifacts:\n    paths:\n      - \"*.yml\"\n    exclude:\n      - \".gitlab-ci.yml\"\n\n.trigger-template:\n  stage: trigger-environments\n  trigger:\n    strategy: depend\n\ntrigger-dev:\n  extends: .trigger-template\n  trigger:\n    include:\n      - artifact: dev-pipeline.yml\n        job: generate-config\n\ntrigger-staging:\n  extends: .trigger-template\n  needs: [trigger-dev]\n  trigger:\n    include:\n      - artifact: staging-pipeline.yml\n        job: generate-config\n\ntrigger-prod:\n  extends: .trigger-template\n  needs: [trigger-staging]\n  trigger:\n    include:\n      - artifact: prod-pipeline.yml\n        job: generate-config\n  when: manual\n```\n\n\nThe generation script loops over an `ENVIRONMENTS` variable rather than hardcoding each environment separately. Pass in a different list via a CI variable or read it from a config file and the pipeline adapts without touching the YAML. The trigger jobs use [extends:](https://docs.gitlab.com/ci/yaml/#extends) to inherit shared configuration from `.trigger-template`, so `strategy: depend` is defined once rather than repeated on every trigger job. Add a new environment by updating the variable, not by duplicating pipeline config. 
Add [when: manual](https://docs.gitlab.com/ci/yaml/#when) to the production trigger and you get a promotion gate baked right into the pipeline graph.\n\n\nWhy it matters: SaaS companies and platform teams use this pattern to manage dozens of environments without duplicating pipeline logic. The pipeline structure itself stays lean as the deployment matrix grows.\n\n\n![Dynamic pipeline](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738765/Blog/Imported/hackathon-fake-blog-post-s/image7_wr0kx2.png \"Dynamic pipeline\")\n\n\n## 4. MR-first delivery: Merge request pipelines, merged results, and workflow routing\n\n\nThe problem: Your pipeline runs on every push to every branch. Expensive tests run on feature branches that will never merge. Meanwhile, you have no guarantee that what you tested is actually what will land on `main` after a merge.\n\n\nGitLab has three interlocking features that solve this together:\n\n\n*   [Merge request pipelines](https://docs.gitlab.com/ci/pipelines/merge_request_pipelines/) run only when a merge request exists, not on every branch push. This alone eliminates a significant amount of wasted compute.\n\n*   [Merged results pipelines](https://docs.gitlab.com/ci/pipelines/merged_results_pipelines/) go further. GitLab creates a temporary merge commit (your branch plus the current target branch) and runs the pipeline against that. You are testing what will actually exist after the merge, not just your branch in isolation.\n\n*   [Workflow rules](https://docs.gitlab.com/ci/yaml/workflow/) let you define exactly which pipeline type runs under which conditions and suppress everything else. 
The `$CI_OPEN_MERGE_REQUESTS` guard below prevents duplicate pipelines firing for both a branch and its open MR simultaneously.\n\n\nWith those three working together, here is what a tiered pipeline looks like:\n\n```yaml\n# .gitlab-ci.yml\nworkflow:\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH && $CI_OPEN_MERGE_REQUESTS\n      when: never\n    - if: $CI_COMMIT_BRANCH\n    - if: $CI_PIPELINE_SOURCE == \"schedule\"\n\nstages:\n  - fast-checks\n  - expensive-tests\n  - deploy\n\nlint-code:\n  stage: fast-checks\n  script:\n    - echo \"Running linter\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"push\"\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\nunit-tests:\n  stage: fast-checks\n  script:\n    - echo \"Running unit tests\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"push\"\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\nintegration-tests:\n  stage: expensive-tests\n  script:\n    - echo \"Running integration tests (15 min)\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\ne2e-tests:\n  stage: expensive-tests\n  script:\n    - echo \"Running E2E tests (30 min)\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\nnightly-comprehensive-scan:\n  stage: expensive-tests\n  script:\n    - echo \"Running full nightly suite (2 hours)\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"schedule\"\n\ndeploy-production:\n  stage: deploy\n  script:\n    - echo \"Deploying to production\"\n  rules:\n    - if: $CI_COMMIT_BRANCH == \"main\"\n      when: manual\n```\n\nWith this setup, the pipeline behaves differently depending on context. A push to a feature branch with no open MR runs lint and unit tests only. 
Once an MR is opened, the workflow rules switch from a branch pipeline to an MR pipeline, and the full integration and E2E suite runs against the merged result. Merging to `main` queues a manual production deployment. A nightly schedule runs the comprehensive scan once, not on every commit.\n\n\nWhy it matters: Teams routinely cut CI costs significantly with this pattern, not by running fewer tests, but by running the right tests at the right time. Merged results pipelines catch the class of bugs that only appear after a merge, before they ever reach `main`.\n\n\n![Conditional pipelines (within a branch with no MR)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738768/Blog/Imported/hackathon-fake-blog-post-s/image6_dnfcny.png \"Conditional pipelines (within a branch with no MR)\")\n\n\n\n![Conditional pipelines (within an MR)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738772/Blog/Imported/hackathon-fake-blog-post-s/image1_wyiafu.png \"Conditional pipelines (within an MR)\")\n\n\n\n![Conditional pipelines (on the main branch)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738774/Blog/Imported/hackathon-fake-blog-post-s/image5_r6lkfd.png \"Conditional pipelines (on the main branch)\")\n\n## 5. Governed pipelines: CI/CD Components\n\n\nThe problem: Your platform team has defined the right way to build, test, and deploy. But every team has their own `.gitlab-ci.yml` with subtle variations. Security scanning gets skipped. Deployment standards drift. Audits are painful.\n\n\nGitLab [CI/CD Components](https://docs.gitlab.com/ci/components/) let platform teams publish versioned, reusable pipeline building blocks. Application teams consume them with a single `include:` line and optional inputs — no copy-paste, no drift. 
Components are discoverable through the [CI/CD Catalog](https://docs.gitlab.com/ci/components/#cicd-catalog), which means teams can find and adopt approved building blocks without needing to go through the platform team directly.\n\n\nHere is a component definition from a shared library:\n\n```yaml\n# templates/deploy.yml\nspec:\n  inputs:\n    stage:\n      default: deploy\n    environment:\n      default: production\n---\ndeploy-job:\n  stage: $[[ inputs.stage ]]\n  script:\n    - echo \"Deploying $APP_NAME to $[[ inputs.environment ]]\"\n    - echo \"Deploy URL: $DEPLOY_URL\"\n  environment:\n    name: $[[ inputs.environment ]]\n```\nAnd here is how an application team consumes it:\n\n```yaml\n# Application repo: .gitlab-ci.yml\nvariables:\n  APP_NAME: \"my-awesome-app\"\n  DEPLOY_URL: \"https://api.example.com\"\n\ninclude:\n  - component: gitlab.com/my-org/component-library/build@v1.0.6\n  - component: gitlab.com/my-org/component-library/test@v1.0.6\n  - component: gitlab.com/my-org/component-library/deploy@v1.0.6\n    inputs:\n      environment: staging\n\nstages:\n  - build\n  - test\n  - deploy\n```\n\nThree lines of `include:` replace hundreds of lines of duplicated YAML. The platform team can push a security fix to `v1.0.7` and teams opt in on their own schedule — or the platform team can pin everyone to a minimum version. Either way, one change propagates everywhere instead of needing to be applied repo by repo.\n\n\nPair this with [resource groups](https://docs.gitlab.com/ci/resource_groups/) to prevent concurrent deployments to the same environment, and [protected environments](https://docs.gitlab.com/ci/environments/protected_environments/) to enforce approval gates - and you have a governed delivery platform where compliance is the default, not the exception.\n\n\nWhy it matters: This is the pattern that makes GitLab CI/CD scale across hundreds of teams. Platform engineering teams enforce compliance without becoming a bottleneck. 
Application teams get a fast path to a working pipeline without reinventing the wheel.\n\n\n![Component pipeline (imported jobs)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738776/Blog/Imported/hackathon-fake-blog-post-s/image2_pizuxd.png \"Component pipeline (imported jobs)\")\n\n## Putting it all together\n\nNone of these features exist in isolation. The reason GitLab's pipeline model is worth understanding deeply is that these primitives compose:\n\n*   A monorepo uses parent-child pipelines, and each child uses DAG execution\n\n*   A microservices platform uses multi-project pipelines, and each project uses MR pipelines with merged results\n\n*   A governed platform uses CI/CD components to standardize the patterns above across every team\n\n\nMost teams discover one of these features when they hit a specific pain point. The ones who invest in understanding the full model end up with a delivery system that actually reflects how their engineering organization works, not a pipeline that fights it.\n\n## Other patterns worth exploring\n\n\nThe five patterns above cover the most common structural pain points, but GitLab's pipeline model goes further. A few others worth looking into as your needs grow:\n\n\n*   [Review apps with dynamic environments](https://docs.gitlab.com/ci/environments/) let you spin up a live preview for every feature branch and tear it down automatically when the MR closes. Useful for teams doing frontend work or API changes that need stakeholder sign-off before merging.\n\n*   [Caching and artifact strategies](https://docs.gitlab.com/ci/caching/) are often the fastest way to cut pipeline runtime after the structural work is done. 
Structuring `cache:` keys around dependency lockfiles and being deliberate about what gets passed between jobs with [artifacts:](https://docs.gitlab.com/ci/yaml/#artifacts) can make a significant difference without changing your pipeline shape at all.\n\n*   [Scheduled and API-triggered pipelines](https://docs.gitlab.com/ci/pipelines/schedules/) are worth knowing about because not everything should run on a code push. Nightly security scans, compliance reports, and release automation are better modeled as scheduled or [API-triggered](https://docs.gitlab.com/ci/triggers/) pipelines with `$CI_PIPELINE_SOURCE` routing the right jobs for each context.\n\n## How to get started\n\nModern software delivery is complex. Teams are managing monorepos with dozens of services, coordinating across multiple repositories, deploying to many environments at once, and trying to keep standards consistent as organizations grow. GitLab's pipeline model was built with all of that in mind.\n\nWhat makes it worth investing time in is how well the pieces fit together. Parent-child pipelines bring structure to large codebases. Multi-project pipelines make cross-team dependencies visible and testable. Dynamic pipelines turn environment management into something that scales gracefully. MR-first delivery with merged results ensures confidence at every step of the review process. And CI/CD Components give platform teams a way to share best practices across an entire organization without becoming a bottleneck.\n\nEach of these features is powerful on its own, and even more so when combined. 
GitLab gives you the building blocks to design a delivery system that fits how your team actually works, and grows with you as your needs evolve.\n\n> [Start a free trial of GitLab Ultimate](https://about.gitlab.com/free-trial/) to use pipeline logic today.\n\n## Read more\n\n*   [Variable and artifact sharing in GitLab parent-child pipelines](https://about.gitlab.com/blog/variable-and-artifact-sharing-in-gitlab-parent-child-pipelines/)\n*   [CI/CD inputs: Secure and preferred method to pass parameters to a pipeline](https://about.gitlab.com/blog/ci-cd-inputs-secure-and-preferred-method-to-pass-parameters-to-a-pipeline/)\n*   [Tutorial: How to set up your first GitLab CI/CD component](https://about.gitlab.com/blog/tutorial-how-to-set-up-your-first-gitlab-ci-cd-component/)\n*   [How to include file references in your CI/CD components](https://about.gitlab.com/blog/how-to-include-file-references-in-your-ci-cd-components/)\n*   [FAQ: GitLab CI/CD Catalog](https://about.gitlab.com/blog/faq-gitlab-ci-cd-catalog/)\n*   [Building a GitLab CI/CD pipeline for a monorepo the easy way](https://about.gitlab.com/blog/building-a-gitlab-ci-cd-pipeline-for-a-monorepo-the-easy-way/)\n*   [A CI/CD component builder's journey](https://about.gitlab.com/blog/a-ci-component-builders-journey/)\n*   [CI/CD Catalog goes GA: No more building pipelines from scratch](https://about.gitlab.com/blog/ci-cd-catalog-goes-ga-no-more-building-pipelines-from-scratch/)","5 ways GitLab pipeline logic solves real engineering problems","Learn how to scale CI/CD with composable patterns for monorepos, microservices, environments, and governance.",[753],"Omid Khan","https://res.cloudinary.com/about-gitlab-com/image/upload/v1772721753/frfsm1qfscwrmsyzj1qn.png","2026-04-09",[112,757,744,25],"DevOps 
platform",{"featured":30,"template":17,"slug":759},"5-ways-gitlab-pipeline-logic-solves-real-engineering-problems",{"content":761,"config":770},{"title":762,"description":763,"authors":764,"heroImage":766,"date":767,"body":768,"category":13,"tags":769},"How to use GitLab Container Virtual Registry with Docker Hardened Images","Learn how to simplify container image management with this step-by-step guide.",[765],"Tim Rizzi","https://res.cloudinary.com/about-gitlab-com/image/upload/v1772111172/mwhgbjawn62kymfwrhle.png","2026-03-12","If you're a platform engineer, you've probably had this conversation:\n  \n*\"Security says we need to use hardened base images.\"*\n\n*\"Great, where do I configure credentials for yet another registry?\"*\n\n*\"Also, how do we make sure everyone actually uses them?\"*\n\nOr this one:\n\n*\"Why are our builds so slow?\"*\n\n*\"We're pulling the same 500MB image from Docker Hub in every single job.\"*\n\n*\"Can't we just cache these somewhere?\"*\n\nI've been working on [Container Virtual Registry](https://docs.gitlab.com/user/packages/virtual_registry/container/) at GitLab specifically to solve these problems. It's a pull-through cache that sits in front of your upstream registries — Docker Hub, dhi.io (Docker Hardened Images), MCR, and Quay — and gives your teams a single endpoint to pull from. Images get cached on the first pull. Subsequent pulls come from the cache. 
Your developers don't need to know or care which upstream a particular image came from.\n\nThis article shows you how to set up Container Virtual Registry, specifically with Docker Hardened Images in mind, since that's a combination that makes a lot of sense for teams concerned about security and not making their developers' lives harder.\n\n## What problem are we actually solving?\n\nThe platform teams I usually talk to manage container images across three to five registries:\n\n* **Docker Hub** for most base images\n* **dhi.io** for Docker Hardened Images (security-conscious workloads)\n* **MCR** for .NET and Azure tooling\n* **Quay.io** for Red Hat ecosystem stuff\n* **Internal registries** for proprietary images\n\nEach one has its own:\n\n* Authentication mechanism\n* Network latency characteristics\n* Way of organizing image paths\n\nYour CI/CD configs end up littered with registry-specific logic. Credential management becomes a project unto itself. And every pipeline job pulls the same base images over the network, even though they haven't changed in weeks.\n\nContainer Virtual Registry consolidates this. One registry URL. One authentication flow (GitLab's). Cached images are served from GitLab's infrastructure rather than traversing the internet each time.\n\n## How it works\n\nThe model is straightforward:\n\n```text\nYour pipeline pulls:\n  gitlab.com/virtual_registries/container/1000016/python:3.13\n\nVirtual registry checks:\n  1. Do I have this cached? → Return it\n  2. No? → Fetch from upstream, cache it, return it\n\n```\n\nYou configure upstreams in priority order. When an image pull comes in, the virtual registry checks each upstream until it finds the image. 
The result gets cached for a configurable period (default 24 hours).\n\n```text\n┌─────────────────────────────────────────────────────────┐\n│                    CI/CD Pipeline                       │\n│                          │                              │\n│                          ▼                              │\n│   gitlab.com/virtual_registries/container/\u003Cid>/image   │\n└─────────────────────────────────────────────────────────┘\n                           │\n                           ▼\n┌─────────────────────────────────────────────────────────┐\n│            Container Virtual Registry                   │\n│                                                         │\n│  Upstream 1: Docker Hub ────────────────┐               │\n│  Upstream 2: dhi.io (Hardened) ────────┐│               │\n│  Upstream 3: MCR ─────────────────────┐││               │\n│  Upstream 4: Quay.io ────────────────┐│││               │\n│                                      ││││               │\n│                    ┌─────────────────┴┴┴┴──┐            │\n│                    │        Cache          │            │\n│                    │  (manifests + layers) │            │\n│                    └───────────────────────┘            │\n└─────────────────────────────────────────────────────────┘\n```\n\n## Why this matters for Docker Hardened Images\n\n[Docker Hardened Images](https://docs.docker.com/dhi/) are great because of the minimal attack surface, near-zero CVEs, proper software bills of materials (SBOMs), and SLSA provenance. 
If you're evaluating base images for security-sensitive workloads, they should be on your list.\n\nBut adopting them creates the same operational friction as any new registry:\n\n* **Credential distribution**: You need to get Docker credentials to every system that pulls images from dhi.io.\n* **CI/CD changes**: Every pipeline needs to be updated to authenticate with dhi.io.\n* **Developer friction**: People need to remember to use the hardened variants.\n* **Visibility gap**: It's difficult to tell if teams are actually using hardened images vs. regular ones.\n\nVirtual registry addresses each of these:\n\n**Single credential**: Teams authenticate to GitLab. The virtual registry handles upstream authentication. You configure Docker credentials once, at the registry level, and they apply to all pulls.\n\n**No CI/CD changes per-team**: Point pipelines at your virtual registry. Done. The upstream configuration is centralized.\n\n**Gradual adoption**: Since images get cached with their full path, you can see in the cache what's being pulled. If someone's pulling `library/python:3.11` instead of the hardened variant, you'll know.\n\n**Audit trail**: The cache shows you exactly which images are in active use. 
Useful for compliance, useful for understanding what your fleet actually depends on.\n\n## Setting it up\n\nHere's a real setup using the Python client from this demo project.\n\n### Create the virtual registry\n\n```python\nfrom virtual_registry_client import VirtualRegistryClient\n\nclient = VirtualRegistryClient()\n\nregistry = client.create_virtual_registry(\n    group_id=\"785414\",  # Your top-level group ID\n    name=\"platform-images\",\n    description=\"Cached container images for platform teams\"\n)\n\nprint(f\"Registry ID: {registry['id']}\")\n# You'll need this ID for the pull URL\n```\n\n### Add Docker Hub as an upstream\n\nFor official images like Alpine, Python, etc.:\n\n```python\ndocker_upstream = client.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://registry-1.docker.io\",\n    name=\"Docker Hub\",\n    cache_validity_hours=24\n)\n```\n\n### Add Docker Hardened Images (dhi.io)\n\nDocker Hardened Images are hosted on `dhi.io`, a separate registry that requires authentication:\n\n```python\ndhi_upstream = client.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://dhi.io\",\n    name=\"Docker Hardened Images\",\n    username=\"your-docker-username\",\n    password=\"your-docker-access-token\",\n    cache_validity_hours=24\n)\n```\n\n### Add other upstreams\n\n```python\n# MCR for .NET teams\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://mcr.microsoft.com\",\n    name=\"Microsoft Container Registry\",\n    cache_validity_hours=48\n)\n\n# Quay for Red Hat stuff\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://quay.io\",\n    name=\"Quay.io\",\n    cache_validity_hours=24\n)\n```\n\n### Update your CI/CD\n\nHere's a `.gitlab-ci.yml` that pulls through the virtual registry:\n\n```yaml\nvariables:\n  VIRTUAL_REGISTRY_ID: \u003Cyour_virtual_registry_ID>\n\n  \nbuild:\n  image: docker:24\n  services:\n    - docker:24-dind\n  before_script:\n    # Authenticate 
to GitLab (which handles upstream auth for you)\n    - echo \"${CI_JOB_TOKEN}\" | docker login -u gitlab-ci-token --password-stdin gitlab.com\n  script:\n    # All of these go through your single virtual registry\n    \n    # Official Docker Hub images (use library/ prefix)\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/library/alpine:latest\n    \n    # Docker Hardened Images from dhi.io (no prefix needed)\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/python:3.13\n    \n    # .NET from MCR\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/dotnet/sdk:8.0\n```\n\n### Image path formats\n\nDifferent registries use different path conventions:\n\n| Registry | Pull URL Example |\n|----------|------------------|\n| Docker Hub (official) | `.../library/python:3.11-slim` |\n| Docker Hardened Images (dhi.io) | `.../python:3.13` |\n| MCR | `.../dotnet/sdk:8.0` |\n| Quay.io | `.../prometheus/prometheus:latest` |\n\n### Verify it's working\n\nAfter some pulls, check your cache:\n\n```python\nupstreams = client.list_registry_upstreams(registry['id'])\nfor upstream in upstreams:\n    entries = client.list_cache_entries(upstream['id'])\n    print(f\"{upstream['name']}: {len(entries)} cached entries\")\n\n```\n\n## What the numbers look like\n\nI ran tests pulling images through the virtual registry:\n\n| Metric | Without Cache | With Warm Cache |\n|--------|---------------|-----------------|\n| Pull time (Alpine) | 10.3s | 4.2s |\n| Pull time (Python 3.13 DHI) | 11.6s | ~4s |\n| Network roundtrips to upstream | Every pull | Cache misses only |\n\n\n\n\nThe first pull is the same speed (it has to fetch from upstream). Every pull after that, for the cache validity period, comes straight from GitLab's storage. 
No network hop to Docker Hub, dhi.io, MCR, or wherever the image lives.\n\nFor a team running hundreds of pipeline jobs per day, that's hours of cumulative build time saved.\n\n## Practical considerations\nHere are some considerations to keep in mind:\n\n### Cache validity\n\n24 hours is the default. For security-sensitive images where you want patches quickly, consider 12 hours or less:\n\n```python\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://dhi.io\",\n    name=\"Docker Hardened Images\",\n    username=\"your-username\",\n    password=\"your-token\",\n    cache_validity_hours=12\n)\n```\n\nFor stable, infrequently-updated images (like specific version tags), longer validity is fine.\n\n### Upstream priority\n\nUpstreams are checked in order. If you have images with the same name on different registries, the first matching upstream wins.\n\n### Limits\n\n* Maximum of 20 virtual registries per group\n* Maximum of 20 upstreams per virtual registry\n\n## Configuration via UI\n\nYou can also configure virtual registries and upstreams directly from the GitLab UI—no API calls required. Navigate to your group's **Settings > Packages and registries > Virtual Registry** to:\n\n* Create and manage virtual registries\n* Add, edit, and reorder upstream registries\n* View and manage the cache\n* Monitor which images are being pulled\n\n## What's next\n\nWe're actively developing:\n\n* **Allow/deny lists**: Use regex to control which images can be pulled from specific upstreams.\n\nThis is beta software. 
It works, people are using it in production, but we're still iterating based on feedback.\n\n## Share your feedback\n\nIf you're a platform engineer dealing with container registry sprawl, I'd like to understand your setup:\n\n* How many upstream registries are you managing?\n* What's your biggest pain point with the current state?\n* Would something like this help, and if not, what's missing?\n\nPlease share your experiences in the [Container Virtual Registry feedback issue](https://gitlab.com/gitlab-org/gitlab/-/work_items/589630).\n## Related resources\n- [New GitLab metrics and registry features help reduce CI/CD bottlenecks](https://about.gitlab.com/blog/new-gitlab-metrics-and-registry-features-help-reduce-ci-cd-bottlenecks/#container-virtual-registry)\n- [Container Virtual Registry documentation](https://docs.gitlab.com/user/packages/virtual_registry/container/)\n- [Container Virtual Registry API](https://docs.gitlab.com/api/container_virtual_registries/)",[744,743,25],{"featured":16,"template":17,"slug":771},"using-gitlab-container-virtual-registry-with-docker-hardened-images",{"promotions":773},[774,788,799,811],{"id":775,"categories":776,"header":778,"text":779,"button":780,"image":785},"ai-modernization",[777],"ai-ml","Is AI achieving its promise at scale?","Quiz will take 5 minutes or less",{"text":781,"config":782},"Get your AI maturity score",{"href":783,"dataGaName":784,"dataGaLocation":247},"/assessments/ai-modernization-assessment/","modernization assessment",{"config":786},{"src":787},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138786/qix0m7kwnd8x2fh1zq49.png",{"id":789,"categories":790,"header":791,"text":779,"button":792,"image":796},"devops-modernization",[743,574],"Are you just managing tools or shipping innovation?",{"text":793,"config":794},"Get your DevOps maturity 
score",{"href":795,"dataGaName":784,"dataGaLocation":247},"/assessments/devops-modernization-assessment/",{"config":797},{"src":798},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138785/eg818fmakweyuznttgid.png",{"id":800,"categories":801,"header":803,"text":779,"button":804,"image":808},"security-modernization",[802],"security","Are you trading speed for security?",{"text":805,"config":806},"Get your security maturity score",{"href":807,"dataGaName":784,"dataGaLocation":247},"/assessments/security-modernization-assessment/",{"config":809},{"src":810},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138786/p4pbqd9nnjejg5ds6mdk.png",{"id":812,"paths":813,"header":816,"text":817,"button":818,"image":823},"github-azure-migration",[814,815],"migration-from-azure-devops-to-gitlab","integrating-azure-devops-scm-and-gitlab","Is your team ready for GitHub's Azure move?","GitHub is already rebuilding around Azure. Find out what it means for you.",{"text":819,"config":820},"See how GitLab compares to GitHub",{"href":821,"dataGaName":822,"dataGaLocation":247},"/compare/gitlab-vs-github/github-azure-migration/","github azure migration",{"config":824},{"src":798},{"header":826,"blurb":827,"button":828,"secondaryButton":833},"Start building faster today","See what your team can do with the intelligent orchestration platform for DevSecOps.\n",{"text":829,"config":830},"Get your free trial",{"href":831,"dataGaName":54,"dataGaLocation":832},"https://gitlab.com/-/trial_registrations/new?glm_content=default-saas-trial&glm_source=about.gitlab.com/","feature",{"text":510,"config":834},{"href":58,"dataGaName":59,"dataGaLocation":832},1777493649420]