How to build CI/CD observability at scale

CI/CD optimization starts with visibility. Building a successful DevOps platform at enterprise scale should include understanding pipeline performance, job execution patterns, and quantifiable operational insights — especially for organizations running GitLab self-managed instances.

To help GitLab customers maximize their platform investments, we developed the GitLab CI/CD Observability solution as part of our Platform Excellence program, which transforms raw pipeline metrics into actionable operational insights.

A leading financial services organization partnered with GitLab's customer success architect to gain visibility into their GitLab self-managed deployment. Together, we implemented a containerized observability solution combining the open-source gitlab-ci-pipelines-exporter with enterprise-grade Prometheus and Grafana infrastructure.

In this article, you'll learn the challenges they faced managing pipelines at scale and how GitLab CI/CD Observability addressed them with a practical, end-to-end implementation.

The challenge: Measuring CI/CD performance

Before implementing any observability solution, define your measurement landscape:

What metrics matter? Pipeline duration, job success rates, queue times, runner utilization
Who needs visibility? Developers, DevOps engineers, platform teams, leadership
What decisions will this drive? Infrastructure investment, bottleneck remediation, capacity planning

Solution architecture: A full set of dashboards for observability

Once deployed, the observability stack provides a set of Grafana dashboards that give real-time and historical visibility into your CI/CD platform. A typical deployment includes:

Pipeline Overview Dashboard: A top-level view showing total pipeline runs, success/failure rates over time (as stacked bar or time-series charts), and average pipeline duration trends. Panels use color-coded status indicators (green for success, red for failure, amber for cancelled) so platform teams can spot degradation at a glance.
Job Performance Dashboard: Drill-down panels showing individual job duration distributions (histogram), the top 10 slowest jobs by average duration, and job failure heatmaps by project and stage. This is where teams identify specific bottleneck jobs worth optimizing.
Runner & Infrastructure Dashboard: Combines Node Exporter host metrics (CPU, memory, disk) with pipeline queue-time data to correlate infrastructure saturation with pipeline wait times. Useful for capacity planning decisions such as scaling runner pools or upgrading instance sizes.
Deployment Frequency Dashboard: Tracks deployment count and deployment duration over time per environment, aligned with DORA metrics. Helps engineering leadership assess delivery throughput and environment drift (commits behind main).

Each dashboard is provisioned automatically via Grafana's file-based provisioning, so it deploys consistently across environments. The dashboards can be further customized with Grafana variables to filter by project, ref/branch, or time range.
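
The Deployment Frequency panels described above come down to measuring how much a deployment count grew over a time window. The sketch below illustrates that arithmetic in plain Python, assuming `gitlab_ci_environment_deployment_count` behaves as a monotonically increasing count that can restart from zero. The function names and sample values are hypothetical, and this is a simplified version of what a Prometheus counter-increase calculation does (no extrapolation at window edges).

```python
def counter_increase(values):
    """Total increase across time-ordered counter samples, tolerating resets."""
    total = 0.0
    for prev, curr in zip(values, values[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # A drop means the counter restarted from zero (for example, the
            # exporter pod was rescheduled), so the whole current value is new.
            total += curr
    return total


def deployments_per_day(values, window_days):
    """Average deployments per day over the sampled window."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return counter_increase(values) / window_days


# Hypothetical samples for one environment over three days, with one reset.
SAMPLES = [10, 12, 15, 2, 4]
```

With these samples, `counter_increase(SAMPLES)` counts 2 + 3 deployments before the reset and 2 + 2 after it, so the reset does not show up as a negative deployment rate.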

Solution architecture

The solution requires two exporters:

Pipeline Exporter: Collects CI/CD metrics via GitLab API (pipeline duration, job status, deployments)
Node Exporter: Collects host-level metrics (CPU, memory, disk) for infrastructure correlation
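
Both exporters publish their data in the Prometheus text exposition format, which Prometheus scrapes over HTTP. As a rough illustration of what a scrape contains, here is a minimal Python parser for that format. The sample lines and their label values are made up for the example, and the parser is a sketch: it skips comment lines, ignores optional timestamps, and does not unescape label values.

```python
import re

# One sample line: name{label="value",...} number [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # blank, HELP, and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        labels = {m["key"]: m["val"] for m in LABEL_RE.finditer(match["labels"] or "")}
        samples.append((match["name"], labels, float(match["value"])))
    return samples


# Illustrative scrape output; real exporters emit more labels and metrics.
EXAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
gitlab_ci_pipeline_status{project="group/app",ref="main",status="success"} 1
"""
```

Calling `parse_samples(EXAMPLE)` yields one unlabeled host sample and one labeled pipeline sample, which is the shape of data the dashboards later aggregate.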

Prerequisites:

GitLab Self-Managed Version 18.1+
Container orchestration platform: A Kubernetes cluster (recommended for enterprise deployments) or a container runtime such as Docker/Podman for smaller scale or proof-of-concept environments. The primary deployment guide below targets Kubernetes; a Docker Compose alternative is provided in the appendix for local testing and evaluation
GitLab Personal Access Token (read_api scope)

Kubernetes deployment (recommended)

For enterprise environments, deploy each component as a separate Deployment within a dedicated namespace. This approach integrates with existing cluster infrastructure, secrets management, and network policies.

1. Create namespace and secret

kubectl create namespace gitlab-observability

# Create the GitLab token secret (see the Enterprise considerations section
# below for storing tokens in a dedicated secrets manager)
kubectl create secret generic gitlab-token \
  --from-literal=token=glpat-xxxxxxxxxxxx \
  -n gitlab-observability

2. Deploy the Pipeline Exporter

# exporter-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-ci-pipelines-exporter
  namespace: gitlab-observability
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gitlab-ci-pipelines-exporter
  template:
    metadata:
      labels:
        app: gitlab-ci-pipelines-exporter
    spec:
      containers:
        - name: exporter
          image: mvisonneau/gitlab-ci-pipelines-exporter:latest
          ports:
            - containerPort: 8080
          env:
            - name: GCPE_GITLAB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gitlab-token
                  key: token
            - name: GCPE_CONFIG
              value: /etc/gcpe/config.yml
          volumeMounts:
            - name: config
              mountPath: /etc/gcpe
      volumes:
        - name: config
          configMap:
            name: gcpe-config
---
apiVersion: v1
kind: Service
metadata:
  name: gitlab-ci-pipelines-exporter
  namespace: gitlab-observability
spec:
  selector:
    app: gitlab-ci-pipelines-exporter
  ports:
    - port: 8080
      targetPort: 8080

3. Deploy Node Exporter (DaemonSet)

# node-exporter-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: gitlab-observability
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
        - name: node-exporter
          image: prom/node-exporter:latest
          ports:
            - containerPort: 9100
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: gitlab-observability
spec:
  selector:
    app: node-exporter
  ports:
    - port: 9100
      targetPort: 9100

4. Deploy Prometheus

# prometheus-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: gitlab-observability
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
      volumes:
        - name: config
          configMap:
            name: prometheus-config
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: gitlab-observability
spec:
  selector:
    app: prometheus
  ports:
    - port: 9090
      targetPort: 9090

5. Deploy Grafana

The Grafana deployment below starts with anonymous access enabled (GF_AUTH_ANONYMOUS_ENABLED: true) for initial setup convenience.

This setting allows anyone with network access to view all dashboards without logging in. For production deployments, remove this variable or set it to false and configure a proper authentication provider (LDAP, SAML/SSO, or OAuth) to restrict access to authorized users.

# grafana-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: gitlab-observability
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:10.0.0
          ports:
            - containerPort: 3000
          env:
            # REMOVE or set to 'false' for production.
            # When 'true', any user with network access can
            # view dashboards without authentication.
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: 'true'
          volumeMounts:
            - name: dashboards-provider
              mountPath: /etc/grafana/provisioning/dashboards
            - name: datasources
              mountPath: /etc/grafana/provisioning/datasources
            - name: dashboards
              mountPath: /var/lib/grafana/dashboards
      volumes:
        - name: dashboards-provider
          configMap:
            name: grafana-dashboards-provider
        - name: datasources
          configMap:
            name: grafana-datasources
        - name: dashboards
          configMap:
            name: grafana-dashboards
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: gitlab-observability
spec:
  selector:
    app: grafana
  ports:
    - port: 3000
      targetPort: 3000

6. Set network policy

Restrict inter-pod traffic to only the required communication paths:

# network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: observability-policy
  namespace: gitlab-observability
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    # Prometheus scrapes exporter and node-exporter
    - from:
        - podSelector:
            matchLabels:
              app: prometheus
      ports:
        - port: 8080
        - port: 9100
    # Grafana queries Prometheus
    - from:
        - podSelector:
            matchLabels:
              app: grafana
      ports:
        - port: 9090

7. Validate

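kubectl get pods -n gitlab-observability
# port-forward blocks the terminal, so run it in a second terminal
# and leave it open while you call the health endpoint
kubectl port-forward svc/grafana 3000:3000 -n gitlab-observability
curl http://localhost:3000/api/health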
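
If you want to script this validation step rather than read the curl output by eye, a small check like the following could work. It is a sketch that assumes the port-forward above is still running on localhost:3000 and that the health endpoint returns a JSON body with a `database` field set to `ok` when Grafana is healthy. The function names are hypothetical.

```python
import json
import urllib.request


def grafana_is_healthy(payload):
    """Return True when a Grafana /api/health body reports a working database."""
    try:
        body = json.loads(payload)
    except (TypeError, ValueError):
        return False  # empty, missing, or non-JSON response
    return isinstance(body, dict) and body.get("database") == "ok"


def check(url="http://localhost:3000/api/health", timeout=5):
    """Fetch the health endpoint; any connection or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200 and grafana_is_healthy(response.read())
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False
```

Calling `check()` returns `True` or `False`, which makes it easy to wire into a post-deployment smoke test.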

Configuration reference

Exporter configuration

# config.yml (ConfigMap: gcpe-config). The ConfigMap key must be named
# config.yml so the mounted file matches GCPE_CONFIG=/etc/gcpe/config.yml
log:
  level: info
gitlab:
  url: https://gitlab.your-domain.com
  maximum_requests_per_second: 10
project_defaults:
  pull:
    pipeline:
      jobs:
        enabled: true
wildcards:
  - owner:
      name: your-group-name
      kind: group
    archived: false

Prometheus configuration

# prometheus.yml (ConfigMap: prometheus-config)
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'gitlab-ci-pipelines-exporter'
    static_configs:
      - targets: ['gitlab-ci-pipelines-exporter:8080']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
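
Once Prometheus is running with this configuration, the built-in `up` metric shows whether each scrape job is reachable. The sketch below interprets the JSON that the Prometheus HTTP API returns for an instant query of `up` (GET /api/v1/query?query=up). The helper name and the example response are illustrative, with job names taken from the configuration above.

```python
def jobs_with_down_targets(query_response):
    """Return sorted job names that have at least one target with up == 0."""
    if query_response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    down = set()
    for series in query_response["data"]["result"]:
        _timestamp, value = series["value"]  # instant vector: [unix_time, "value"]
        if float(value) == 0.0:
            down.add(series["metric"].get("job", "unknown"))
    return sorted(down)


# Illustrative response: the pipeline exporter is up, node-exporter is down.
EXAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "up",
                    "job": "gitlab-ci-pipelines-exporter",
                    "instance": "gitlab-ci-pipelines-exporter:8080",
                },
                "value": [1700000000.0, "1"],
            },
            {
                "metric": {
                    "__name__": "up",
                    "job": "node-exporter",
                    "instance": "node-exporter:9100",
                },
                "value": [1700000000.0, "0"],
            },
        ],
    },
}
```

An empty list from `jobs_with_down_targets` means every configured target answered its last scrape.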

Grafana data sources

# datasources.yml (ConfigMap: grafana-datasources)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

# dashboards.yml (ConfigMap: grafana-dashboards-provider)
apiVersion: 1
providers:
  - name: 'default'
    folder: 'GitLab CI/CD'
    type: file
    options:
      path: /var/lib/grafana/dashboards

Key metrics

Pipeline Exporter metrics

| Metric | Description |
| --- | --- |
| `gitlab_ci_pipeline_duration_seconds` | Pipeline execution time |
| `gitlab_ci_pipeline_status` | Pipeline success/failure by project |
| `gitlab_ci_pipeline_job_duration_seconds` | Individual job execution time |
| `gitlab_ci_pipeline_job_status` | Job success/failure status |
| `gitlab_ci_pipeline_job_artifact_size_bytes` | Artifact storage consumption |
| `gitlab_ci_pipeline_coverage` | Code coverage percentage |
| `gitlab_ci_environment_deployment_count` | Deployment frequency |
| `gitlab_ci_environment_deployment_duration_seconds` | Deployment execution time |
| `gitlab_ci_environment_behind_commits_count` | Environment drift from main |
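
To show how these metrics feed a panel such as the "top 10 slowest jobs" view, here is a small Python sketch that ranks jobs by average duration. It assumes each `gitlab_ci_pipeline_job_duration_seconds` sample carries `project` and `job_name` labels; those label names, the function name, and the sample data are assumptions for illustration, so check the labels your exporter version actually emits.

```python
from collections import defaultdict
from statistics import mean


def slowest_jobs(samples, limit=10):
    """Rank (project, job) pairs by mean duration, slowest first.

    `samples` is an iterable of (labels, seconds) pairs.
    """
    durations = defaultdict(list)
    for labels, seconds in samples:
        durations[(labels["project"], labels["job_name"])].append(seconds)
    averages = [(mean(values), project, job) for (project, job), values in durations.items()]
    averages.sort(reverse=True)  # highest average duration first
    return [(project, job, avg) for avg, project, job in averages[:limit]]


# Hypothetical job duration samples, in seconds.
SAMPLES = [
    ({"project": "group/app", "job_name": "build"}, 120.0),
    ({"project": "group/app", "job_name": "build"}, 180.0),
    ({"project": "group/app", "job_name": "lint"}, 30.0),
    ({"project": "group/api", "job_name": "test"}, 400.0),
]
```

In Grafana the same ranking would normally be expressed as a query against Prometheus; the Python version only makes the aggregation explicit.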

Node Exporter metrics

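| Metric | Description |
| --- | --- |
| `node_cpu_seconds_total` | Cumulative CPU time per mode (utilization is derived from its rate of change) |
| `node_memory_MemAvailable_bytes` | Available memory |
| `node_filesystem_avail_bytes` | Disk space available |
| `node_load1` | 1-minute load average |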
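
Because `node_cpu_seconds_total` is a cumulative counter, CPU utilization has to be derived from the change between two readings. The following sketch shows that calculation in Python under simplifying assumptions: each snapshot has already been summed across cores, and every non-idle mode (including iowait) counts as busy time. The function name and the numbers are hypothetical.

```python
def cpu_utilization(before, after):
    """Fraction of CPU time spent non-idle between two counter snapshots.

    Each snapshot maps a CPU mode (idle, user, system, ...) to the
    cumulative seconds reported by node_cpu_seconds_total.
    """
    deltas = {mode: after[mode] - before.get(mode, 0.0) for mode in after}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots must cover a positive time span")
    return 1.0 - deltas.get("idle", 0.0) / total


# Hypothetical snapshots taken some time apart.
BEFORE = {"idle": 1000.0, "user": 200.0, "system": 100.0}
AFTER = {"idle": 1060.0, "user": 230.0, "system": 110.0}
```

Here 60 of the 100 elapsed CPU-seconds were idle, so `cpu_utilization(BEFORE, AFTER)` works out to roughly 0.4, or 40% busy.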

Troubleshooting

Air-gapped Grafana plugin installation

For offline environments, install plugins manually. Example for Kubernetes:

# Copy plugin zip into the Grafana pod
kubectl cp grafana-polystat-panel-2.1.16.zip \
  gitlab-observability/grafana-<pod-id>:/tmp/
# Extract plugin
kubectl exec -it -n gitlab-observability deploy/grafana -- \
  sh -c "unzip /tmp/grafana-polystat-panel-2.1.16.zip -d /var/lib/grafana/plugins/"
# Restart Grafana pod. The plugin survives the restart only if
# /var/lib/grafana/plugins is backed by a persistent volume; the Deployment
# above mounts no volume there, so a new pod starts with an empty directory.
kubectl rollout restart deployment/grafana -n gitlab-observability
# Verify installation
kubectl exec -it -n gitlab-observability deploy/grafana -- \
  ls -al /var/lib/grafana/plugins/

Enterprise considerations

For regulated industries, ensure:

Token security: Store GitLab Personal Access Tokens in a dedicated secrets manager rather than hardcoded in ConfigMaps. Enforce token rotation policies and limit scope to read_api only.
Network segmentation: Deploy behind a reverse proxy with TLS termination. In Kubernetes, use an Ingress controller with automated certificate provisioning.
Authentication: Configure Grafana with your organization's identity provider (SAML, LDAP, or OAuth/OIDC) to enforce role-based access control on dashboards.

Why GitLab?

GitLab's API-first design enables custom observability solutions that complement native capabilities like Value Stream Analytics and DORA metrics. The open architecture allows organizations to integrate proven open-source tooling — like the gitlab-ci-pipelines-exporter — directly with their existing enterprise infrastructure, without disrupting established workflows.

As your observability maturity grows, GitLab's built-in Observability capabilities provide a natural next step — offering deeper, integrated visibility without additional tooling. Learn more about what's available natively in the platform for GitLab Observability.
