GitHub Copilot's policy for AI training: A governance wake-up call

GitHub recently announced a significant change to how it handles data from Copilot users. Starting April 24, 2026, interaction data from Copilot Free, Pro, and Pro+ users, including inputs, outputs, code snippets, and associated context, will be used to train AI models by default, unless users actively opt out. Copilot Business and Enterprise customers are exempt under existing contract terms.

For organizations in regulated industries, including finance, healthcare, defense, and public sector, the policy shift raises questions that go beyond individual developer preferences. It forces a harder look at a question that engineering and security leaders should be asking every AI vendor in their stack: Do you train on our code?

GitLab's answer is no. GitLab does not train AI models on customer code at any tier, and AI vendors are contractually prohibited from using customer inputs or outputs for their own purposes. The GitLab AI Transparency Center makes that commitment auditable: a single location documenting which models power which features, how data is handled, subprocessor relationships, and data retention periods. The GitLab AI Transparency Center also lists the compliance status of each feature, including confirmation that GitLab's current AI features do not qualify as high-risk systems under the EU AI Act. It's a standard GitLab CEO Bill Staples has consistently reiterated and one reflected in GitLab's mission and Trust Center.

What the policy change actually means

GitHub's announcement also specifies that the data may be shared with GitHub affiliates, including Microsoft, for AI development purposes.

A policy change of this nature forces organizations to re-examine their AI governance posture, audit their Copilot license tiers, and confirm that the right controls are configured across their teams.
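As a rough illustration of the tier audit described above, the following Python sketch classifies a hand-maintained inventory of Copilot seats by whether the new training default applies. The `Seat` record, its field names, and the sample users are hypothetical. The tier rules come from GitHub's announcement as summarized in this article, and a real audit would draw on your organization's own license records rather than a hard-coded list.

```python
from dataclasses import dataclass

# Tiers whose interaction data is used for training by default,
# per the announcement as summarized in this article.
TRAINING_BY_DEFAULT = {"Free", "Pro", "Pro+"}
# Tiers exempt under existing contract terms.
EXEMPT = {"Business", "Enterprise"}


@dataclass(frozen=True)
class Seat:
    """Hypothetical inventory record for one developer's Copilot seat."""
    user: str
    tier: str
    opted_out: bool = False


def needs_review(seat: Seat) -> bool:
    """Return True when a seat's data could be used for model training."""
    if seat.tier in EXEMPT:
        return False
    if seat.tier in TRAINING_BY_DEFAULT:
        # Covered by the default unless the user has actively opted out.
        return not seat.opted_out
    # Unknown tier: flag it rather than assume it is safe.
    return True


def audit(seats: list[Seat]) -> list[str]:
    """Return the users whose seats should be reviewed, in inventory order."""
    return [seat.user for seat in seats if needs_review(seat)]


if __name__ == "__main__":
    # Illustrative data only (assumption, not real accounts).
    inventory = [
        Seat("alice", "Enterprise"),
        Seat("bob", "Pro"),
        Seat("carol", "Pro+", opted_out=True),
        Seat("dave", "Free"),
    ]
    for user in audit(inventory):
        print(f"review needed: {user}")
```

Flagging unknown tiers by default is a deliberate choice in this sketch: in a compliance review, an unclassified seat is treated as exposed until someone confirms otherwise.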

Why AI governance matters in regulated environments

Source code is often among an organization's most sensitive intellectual property. It may contain references to internal systems, reflect proprietary business logic, or touch data flows governed by strict retention and access policies. When that code passes through an AI assistant, questions about training data usage, model vendor relationships, and data residency become compliance concerns.

The exposure is particularly acute for financial services firms that have invested in proprietary algorithms: fraud detection logic, credit risk models, underwriting rules, and trading strategies. When AI tooling processes that code and uses it to train models serving competitors, vendor data practices become an IP concern.

Financial institutions operating under the Federal Reserve's Supervisory Guidance on Model Risk Management (SR 11-7) and the European Union's Digital Operational Resilience Act (DORA) are required to maintain documented, auditable oversight of third-party technology providers, including understanding how those providers handle data. Third-party AI tools used in development workflows increasingly fall within the scope of model risk oversight, and material changes to vendor data practices require updated documentation.

In the public sector, the National Institute of Standards and Technology Special Publication 800-53 (NIST 800-53) and the Federal Information Security Modernization Act (FISMA) underpin requirements that sensitive or classified code stay within a controlled boundary. For U.S. Department of Defense and intelligence community environments in particular, a vendor's default data posture is an operational concern. In healthcare, the Health Insurance Portability and Accountability Act (HIPAA) governs how patient-adjacent data is handled by third parties, and development environments that touch clinical systems increasingly fall within that scope.

Across all of these contexts, the common thread is the same: A vendor policy that changes data usage defaults, requires individual opt-out, and offers different protections depending on account tier introduces exactly the kind of uncontrolled variable that compliance teams cannot afford.

What regulated industries actually need from AI vendors

Regulated organizations have largely moved past debating whether to adopt AI in development workflows. The focus now is on doing so in a way they can defend to regulators, boards, and customers. That shift has surfaced a consistent set of requirements regardless of sector.

Contractual certainty. Regulated firms need to know, with specificity, what happens to their data. A clear, documented, unconditional commitment is what's required, not something that varies by plan or requires action before a deadline.

Auditability. Model risk management frameworks require organizations to understand and validate the AI systems they deploy, including the training data behind those models and the third parties involved in their development. Vendors who cannot answer these questions create documentation risk for the organizations relying on them.

Separation from vendor incentives. When an AI vendor trains models on customer usage data, code and workflows become inputs to a system that also serves competitors. For institutions with proprietary trading logic, underwriting models, or fraud detection systems, that's a genuine IP exposure.

GitLab's position on AI data governance

GitLab does not use customer code to train AI models. This commitment applies at every tier, and AI vendors are contractually prohibited from using inputs or outputs associated with GitLab customers for their own purposes.

This is a deliberate architectural and policy choice, not a feature of a particular pricing tier. As GitLab's post on enterprise independence notes, data governance has become "an increasingly critical factor in enterprise technology decisions, driven by a complex web of national and regional data protection laws and growing concern about control over sensitive intellectual property."

GitLab is also cloud-neutral and model-neutral: it supports self-hosted deployments and is not commercially tied to any single cloud provider or large language model (LLM). That independence matters for regulated organizations evaluating vendor concentration risk. The AI Continuity Plan documents how vendor changes are managed, including material changes to how AI vendors treat customer data, a direct response to the governance requirements under frameworks like DORA.

The governance gap AI teams need to close

GitHub's policy update is a reminder that for organizations in regulated industries, understanding exactly how an AI tool handles data is a prerequisite for using it at all. That means asking vendors for clear, documented answers: Is our data used for model training? Who are your AI model subprocessors? What happens if a vendor changes its data practices? Can we deploy in a way that keeps all AI processing within our own infrastructure? What indemnification do you offer for AI-generated output?

Vendors who can answer those questions clearly, and document those answers in an auditable form, are vendors you can build on. Those who cannot will create compliance debt every time they ship a policy update. And when a vendor can change its data practices with 30 days' notice, that's not a partnership built for regulated industries. That's a liability.

Learn more about GitLab's approach to AI governance at the GitLab AI Transparency Center.
