Data Governance for AI Must Be Executable
Why AI models stall between proof of concept and production, and what technology leaders can do about it
The Problem Isn’t the Model
A customer-facing AI agent confidently answers a benefits question. The answer is wrong because it retrieved a superseded policy document from an unversioned, access-uncontrolled corpus. The business now has three concurrent problems: customer harm, regulatory exposure, and an internal investigation that cannot reproduce the retrieval context that produced the response. No one can say which version of which document the model saw, because provenance was never captured. The investigation drags on for weeks; the root cause, ungoverned data, remains in place for the next incident. In AI-enabled enterprises, this is not an edge case. It is the predictable outcome of deploying AI on a data substrate built for reporting, not autonomous action. Models are only as trustworthy as the data they ingest; in most organizations, that data is neither traceable enough to explain nor governed well enough to defend.
In 2024, one of the authors was serving as CTO of a healthcare research organization when the team deployed its first RAG solution. We anticipated the usual technical challenges: problematic chunking strategies, improper ranking algorithms, nonfunctional requirements like system performance. The biggest problem was none of these. It was out-of-date and contradictory data sources, which led the system to misrepresent current scientific thinking and organizational policy. The technical architecture worked, but the data substrate beneath it was ungoverned. Now amplify this lesson across an enterprise deploying autonomous agents whose immediate responses and downstream decisions may never be reviewed by a human, and whose data sources span dozens of systems and domains. The problems we expected in 2024 were largely architectural. The problem that actually mattered was upstream: ungoverned, conflicting data quietly degraded output quality just as the AI was becoming essential to our business.
In this context, data governance and modern data management together form the trust architecture for AI-driven analytics and automation. Governance defines decision rights, standards, and accountability for data meaning, quality, provenance, and access. These policies and standards do not enforce themselves; they must be translated into technical controls that operate within the data ecosystem. Modern data management enforces those standards as controls across pipelines, catalogs, and APIs, and ideally, it produces an evidence trail that makes outputs explainable and defensible. When governance and management are disconnected, AI scales faster than trust, and small data defects become systemic failures.
Governance structures and operational risk frameworks define who is accountable and what must be monitored. This paper addresses a question raised previously: is the data infrastructure beneath those frameworks engineered to make accountability and monitoring possible? For related concepts, see AI Governance is Broken: Here’s How to Fix It and AI Risks Don’t Wait for Committees.
The central problem for most enterprises is not a lack of data but a lack of executable governance: policies that are enforced automatically at the point of data movement and model use, rather than documented in frameworks that no one operationalizes. Until governance is implemented as enforceable controls within systems, AI will continue to scale faster than trust. In working with organizations across healthcare, financial services, and insurance, we have found three structural failures that recur with striking consistency, and they persist not because leaders are unaware of data governance, but because their governance programs are designed to produce documentation rather than controls.
Three Things Most Organizations Have Wrong
Each of these recurring problems is well-understood in isolation. What makes them dangerous in combination is that they compound. Inconsistent semantic definitions feed ungoverned pipelines, which feed ungoverned AI artifacts, and the resulting errors become progressively harder to trace back to their origin. Three in particular define the gap between governance programs that exist on paper and governance that operates in production.
→ Governance is not a committee. It is a build artifact.
If a policy is not enforced automatically at the point of data movement or model inference, it is documentation. Documentation does not prevent incidents. The question is not whether the organization has a governance program. It is whether that program produces controls or produces PDFs.
→ Semantic drift is the silent failure mode of enterprise AI.
The most damaging data quality failures are not nulls and duplicates; they are semantic. “Active customer” means something different in the CRM than in the data warehouse. “Net revenue” changes definition when accounting policy changes, with no version event recorded. The model trained on the old definition produces outputs that are internally consistent and operationally wrong. Teams debug model performance when they should be debugging meaning.
THE SEMANTIC DRIFT FAILURE IN PRACTICE
A risk model in production begins producing anomalous scores. The model team investigates: the algorithm is unchanged, the pipeline is running cleanly, there are no obvious data quality issues. After a few weeks of investigation, a data engineer discovers that the feature store’s definition of “active member” was updated three months earlier to reflect a new product line. The model was trained on the old definition. The model is not broken. The semantic contract between the feature store and the model was broken, silently, with no lineage event and no downstream notification. Weeks of engineering time spent diagnosing a data governance failure.
Preventing semantic drift requires both shared definitions and structural controls. Master Data Management (MDM) creates authoritative representations of key business entities across domains. Data contracts capture the agreed expectations between data producers and consumers: schema, semantic definitions, quality standards, and refresh intervals. Embedded in data pipelines and APIs, these contracts turn governance standards into enforcement mechanisms that surface drift before it reaches models.
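To make this concrete, here is a minimal sketch of what an executable data contract could look like in a Python pipeline. Everything in it is illustrative: the `ACTIVE_MEMBER_V1` contract, its field names, and the thresholds are assumptions, not a reference to any specific tool. The key mechanism is fingerprinting the semantic definition, so a silent change in meaning fails the pipeline run instead of drifting into the model.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class DataContract:
    """Agreed expectations between a data producer and its consumers."""
    dataset: str
    schema: dict                 # column name -> expected type
    semantic_definition: str     # prose definition of the entity or metric
    min_completeness: float      # quality threshold, 0.0 to 1.0
    max_refresh_hours: int       # freshness expectation

    def definition_fingerprint(self) -> str:
        """Hash the semantic definition so silent meaning changes are detectable."""
        return hashlib.sha256(self.semantic_definition.encode()).hexdigest()

# Hypothetical contract a model's feature pipeline pinned at training time.
ACTIVE_MEMBER_V1 = DataContract(
    dataset="member_dim",
    schema={"member_id": "string", "is_active": "boolean"},
    semantic_definition="Member with a paid claim or premium in the last 12 months.",
    min_completeness=0.99,
    max_refresh_hours=24,
)

def enforce_contract(pinned_fingerprint: str, current: DataContract,
                     observed_completeness: float) -> None:
    """Fail the run, rather than drift silently, when the contract is broken."""
    if current.definition_fingerprint() != pinned_fingerprint:
        raise RuntimeError(f"Semantic drift on '{current.dataset}': "
                           "definition changed since the model was trained.")
    if observed_completeness < current.min_completeness:
        raise RuntimeError(f"Quality breach on '{current.dataset}': "
                           f"{observed_completeness:.3f} < {current.min_completeness}")
```

Had a check like this sat between the feature store and the risk model in the scenario above, the redefinition of “active member” would have surfaced as a failed pipeline run within a day, not as weeks of model debugging.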
→ RAG and agents have expanded the governance perimeter beyond tables and reports.
Production AI systems increasingly depend on retrieval corpora, prompt libraries, vector stores, and agent interaction logs—assets that exist outside the model but directly shape its outputs. These artifacts influence what the system sees, retrieves, and generates. In many enterprises, they live in ad hoc repositories with no version control, no access management, and no audit trail. Within this ungoverned perimeter, the next significant incident is incubating.
The exposure does not end at model behavior. APIs are the distribution boundary for governed data products—the point at which AI and analytics outputs cross domain or organizational boundaries. When data controls are not enforced at this boundary, sensitive data can leak, entitlements can drift, and quality inconsistencies can propagate at scale. APIs are not a parallel governance discipline; they are the enforcement surface for data governance in motion.
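As an illustration of enforcement at that surface, the sketch below gates every data product response on the caller’s entitlements and on the presence of a lineage record. The entitlement table, field names, and fail-closed behavior are assumptions made for the example, not a reference implementation.

```python
from dataclasses import dataclass

@dataclass
class DataProductResponse:
    payload: dict
    classification: str   # e.g., "public", "internal", "restricted"
    lineage_id: str       # ties the response back to its upstream lineage record

# Hypothetical entitlement table; in practice this comes from the policy store.
ENTITLEMENTS = {"partner-api-key": {"public", "internal"}}

def enforce_at_boundary(api_key: str, response: DataProductResponse) -> dict:
    """Check the caller's entitlement against the data classification, and
    fail closed when the response carries no lineage record."""
    allowed = ENTITLEMENTS.get(api_key, set())
    if response.classification not in allowed:
        raise PermissionError(
            f"Caller is not entitled to '{response.classification}' data.")
    if not response.lineage_id:
        raise RuntimeError("No lineage record attached; refusing to serve.")
    return response.payload
```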
These gaps take on new significance once AI systems move beyond pilots. Three forces are converging to make them untenable.
Why This Matters Now: Regulation, Competitive Velocity, and Production Scale
Executable data governance is no longer theoretical. Regulatory expectations are becoming more concrete, deployment cycles are shortening, and boards are asking how AI investments translate into operational results. These pressures are converging on the same requirement: governance that functions in practice, not just on paper.
→ The Window is Closing: Regulation is Arriving in Phases.
The EU AI Act applies in stages through 2026 and 2027, depending on system category and risk classification. While the Act addresses a broad set of system-level obligations—including risk management, documentation, and oversight—many of these requirements depend directly on data governance capabilities: documented training data provenance, monitoring for data and distributional drift, transparency of data sources, and the ability to demonstrate that controls are operating as designed. Organizations that cannot produce this evidence at audit will face deployment restrictions, not just fines.
For most global enterprises, EU jurisdiction is not optional. If AI systems touch EU data subjects, or if partners or customers operate under EU law, the obligations apply. The time to build the evidence infrastructure is before the audit, not during one. Organizations that invest early in lineage, version control, and control verification will move from reactive compliance to operational readiness.
→ Governance as Accelerant: Organizations with Executable Governance Ship Faster.
The competitive consequence of the governance gap is not that organizations build worse models. It is that they operate them more slowly, defend them less confidently, and spend more engineering time on incidents that governed pipelines would have prevented. Organizations that have built the trust layer (lineage tracking, quality controls embedded in pipelines, provenance for AI artifacts) report deployment cycles that are substantially faster, because sign-off is a verification step rather than an investigation. Leadership sees this in shorter and more predictable approval cycles.
The underlying driver is governance infrastructure. Mature governance and data management transform risk into strategic advantage across four dimensions: speed to trusted AI, through governed pipelines with lineage and provenance that accelerate model deployment; risk containment, through runtime enforcement that detects anomalies before they reach production; monetization enablement, through API governance frameworks that allow safe data product exposure and new revenue streams; and operational scalability, through AI-augmented governance (automated classification, quality monitoring, and policy enforcement) that grows with data and model volume without proportional headcount increases.
→ The POC-to-production gap is now a board-level question.
Enterprise AI investment has shifted from “should we invest” to “where is the return.” Organizations running AI as a portfolio of pilots are facing pressure to demonstrate that the capability can be operationalized at scale. When the honest answer is “our models work in the lab but cannot ship because legal cannot sign off on the data,” that answer has a short shelf life with a board or an investment committee. Governance is what makes operationalization possible. Without it, the answer to “why is this still in testing” eventually stops being acceptable.
These pressures ultimately reduce to a specific operational requirement: the ability to reproduce and explain system behavior.
Reproducibility And Explainability
Reproducibility is the ability to reconstruct the conditions that produced an output. Explainability, in turn, translates that reconstruction into an account that regulators, customers, and business leaders can evaluate. The EU AI Act’s transparency requirements, for instance, do not ask whether an organization can replay a pipeline; they ask whether the organization can explain why a system behaved as it did, using documented evidence. Without reproducibility, explainability is narrative. With reproducibility, explainability becomes defensible.
In practice, this means that AI outputs must be reproducible. When an answer affects a customer, a partner, or a regulatory obligation, the organization must be able to reconstruct the exact data inputs, versions, transformations, retrieval context, and model configuration that produced that result. Without this capability, incident response becomes forensic guesswork. With it, findings are evidence-based and defensible.
Reproducibility depends on versioned corpora, prompt and embedding version control, capture of retrieval context, and linkage between model versions and training data snapshots. These controls transform lineage from static documentation into enforceable operational capability. They are not enhancements; they are foundational data governance requirements for any AI system operating in production.
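What this can look like in practice: the sketch below shows the kind of provenance record a retrieval layer might emit for every response. The field names and the append-only JSONL sink are assumptions; the point is that the record pins every version the answer depended on, so the retrieval context can be reconstructed later.

```python
import json, time, uuid
from dataclasses import dataclass, field, asdict

@dataclass
class RetrievalProvenance:
    """Everything needed to reconstruct what the model saw for one response."""
    response_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)
    corpus_version: str = ""           # version tag of the retrieval corpus
    embedding_model: str = ""          # embedding model and version used to index
    prompt_template_version: str = ""
    model_version: str = ""
    retrieved_chunks: list = field(default_factory=list)  # doc id, version, score

def record_retrieval(provenance: RetrievalProvenance, sink_path: str) -> None:
    """Append the record to an immutable audit sink (a JSONL file here)."""
    with open(sink_path, "a") as sink:
        sink.write(json.dumps(asdict(provenance)) + "\n")

# At answer time, the retrieval layer emits one record per response:
record_retrieval(
    RetrievalProvenance(
        corpus_version="benefits-policies@2025-06-01",
        embedding_model="embedder-v3",
        prompt_template_version="qa-prompt-v12",
        model_version="agent-llm-2025-05",
        retrieved_chunks=[{"doc_id": "policy-482", "doc_version": "v7", "score": 0.91}],
    ),
    sink_path="retrieval_audit.jsonl",
)
```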
Engineering reproducibility requires embedding these controls across data pipelines, model deployment processes, retrieval layers, and observability tooling. The opening scenario in this article – an answer sourced from a superseded policy document with no retrievable context – is a reproducibility failure. Fortunately, it is also a preventable one.
Reproducibility and explainability establish the minimum standard for production AI. The remaining question is how much control is required before those systems are allowed to drive different types of decisions.
Scaling Control Requirements by Decision Impact
As analytics and AI outputs are used in progressively higher-impact business decisions, the required strength of data controls must scale accordingly. The business consequence of a decision determines the necessary rigor of data lineage, versioning, validation, and reproducibility.
Higher-impact decisions, such as pricing, claims routing, or credit approval, require documented semantic definitions, complete lineage coverage, controlled data versioning, and validated training data snapshots before outputs are operationalized. Lower-impact analytical use cases may tolerate lighter controls.
Consider a health insurer’s claims system. An AI agent that auto-adjudicates low-complexity claims under a dollar threshold might operate with standard lineage and periodic quality checks. The same system routing a high-value or clinically complex claim to denial requires complete semantic traceability, versioned training data, documented retrieval context, and a human-in-the-loop override. The data governance requirements are different not because the technology is different, but because the consequence of a wrong answer is different.
Each tier maps minimum required controls (semantic definitions, lineage completeness, versioned datasets, validated snapshots, and reproducible retrieval context) to decision rights: who can approve automation, what thresholds trigger human review, and what fail-closed behavior applies when controls are not met. For high-impact decisions, this means the system is engineered to halt the transaction if the data’s lineage or quality score falls below the required threshold.
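A minimal sketch of such a tiered, fail-closed gate follows. The tier names, thresholds, and outcomes are illustrative; in practice they would be set by the governance council and loaded from a policy store, not hard-coded.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TierPolicy:
    min_lineage_coverage: float   # fraction of inputs with complete lineage
    min_quality_score: float
    requires_human_review: bool

# Hypothetical tiers keyed by decision impact.
TIERS = {
    "low_impact":  TierPolicy(0.80, 0.90, requires_human_review=False),
    "high_impact": TierPolicy(1.00, 0.99, requires_human_review=True),
}

def gate_decision(tier: str, lineage_coverage: float, quality_score: float) -> str:
    """Fail closed: halt the transaction when the data behind a decision
    does not meet its tier's thresholds, instead of letting the model act."""
    policy = TIERS[tier]
    if lineage_coverage < policy.min_lineage_coverage:
        return "HALT: lineage coverage below tier threshold"
    if quality_score < policy.min_quality_score:
        return "HALT: quality score below tier threshold"
    return "ROUTE_TO_HUMAN" if policy.requires_human_review else "AUTO_EXECUTE"

# A high-value claim with a lineage gap is halted, never auto-adjudicated:
print(gate_decision("high_impact", lineage_coverage=0.97, quality_score=0.995))
```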
How are these controls implemented?
The Executable Governance Model: Policy, Controls, Evidence
Organizations have policies but lack the infrastructure to enforce them automatically or the telemetry to demonstrate that they are working. Closing that gap requires three elements operating together.
Consider a simple example. The organization defines a policy requiring documented training data lineage for every production model. The control plane translates that requirement into an automated deployment check that blocks promotion without a verified lineage record. The evidence plane records each execution of that check — what ran, what it evaluated, and whether it passed or failed.
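Here is a minimal sketch of that loop, with an in-memory registry and evidence log standing in for real platform services; the function and field names are assumptions for illustration.

```python
import json, time

def verify_lineage(model_id: str, lineage_registry: dict) -> bool:
    """Control plane: the policy 'every production model has documented
    training data lineage' becomes an automated promotion check."""
    record = lineage_registry.get(model_id)
    return bool(record and record.get("training_data_snapshot"))

def promote_model(model_id: str, lineage_registry: dict, evidence_log: list) -> None:
    """Block promotion when the control fails, and always emit evidence:
    what ran, what it evaluated, and whether it passed."""
    passed = verify_lineage(model_id, lineage_registry)
    evidence_log.append({               # evidence plane: proof the check executed
        "control": "training_lineage_check",
        "subject": model_id,
        "timestamp": time.time(),
        "result": "pass" if passed else "fail",
    })
    if not passed:
        raise RuntimeError(f"Promotion blocked: no verified lineage for {model_id}.")

# A model with a verified lineage record passes the gate; the check itself
# leaves an audit record either way.
evidence = []
registry = {"churn-model-v4": {"training_data_snapshot": "s3://snapshots/2025-05-01"}}
promote_model("churn-model-v4", registry, evidence)
print(json.dumps(evidence, indent=2))
```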
Policy defines the standard. Controls enforce it. Evidence verifies it.
The evidence plane instruments control execution across the system. Lineage graphs, audit trails, and quality dashboards are not reporting outputs; they are proof that enforcement occurred. Without this instrumentation, governance cannot demonstrate effectiveness to regulators, auditors, or boards.
This continuous evidence generation enables something more fundamental: auditability.
Auditability is the operational test of data governance. If an organization cannot answer who changed a definition, which data version trained a model, or which control executed at a given point in time, governance exists only on paper. These signals must be generated continuously within production systems — not assembled after the fact.
This model mirrors established risk frameworks: policy defines expectations, controls enforce them, and evidence demonstrates compliance. Executable governance embeds that logic directly into the data platform, turning oversight into verification rather than negotiation.
This model must also govern AI-specific artifacts.
Expanding the Data Governance Perimeter to AI Artifacts
Traditional data governance already covers both structured datasets and unstructured content. For AI use, the same governance model must extend to the additional artifacts that sit between those sources and model behavior—prompt libraries, retrieval corpora and vector stores, synthetic datasets, retraining pipelines, and agent interaction logs. Like any enterprise data asset, these artifacts require defined standards, automated enforcement, and continuous evidence that controls operated as intended.
These artifacts differ from traditional data assets not merely in scope, but in consequence. A data quality issue in a warehouse might produce an incorrect report that a human reviews before acting. By contrast, an ungoverned prompt in a production library, a stale document in a retrieval corpus, or a misconfigured agent policy shapes what the AI system does and says, autonomously and in real time. These artifacts function as behavioral control surfaces within AI systems.
Including these artifacts within governance is necessary, but inclusion alone does not create control.
The same structural failure that affected structured data—policies without enforcement—will repeat in AI artifacts unless data governance is operationalized.
Governing these assets requires alignment between data governance and data management.
Data Governance and Management Must Operate as One System
Data governance defines semantic standards, quality thresholds, access rules, and accountability for enterprise data. Data management builds and operates the systems that ingest, transform, store, and expose that data for analytics and AI. For AI systems to operate reliably, these functions cannot run independently.
In many organizations, data governance and data platforms evolve separately. Governance teams define semantic definitions and quality expectations, while engineering teams optimize pipelines and models for delivery speed. When these tracks remain disconnected, the application of data standards varies across teams and domains. The issue is not either function in isolation; it is the missing integration between data governance and data management.
Operating as one system means that defined business metrics and entity definitions—such as “active customer,” “net revenue,” or “approved claim”—are applied consistently wherever data is transformed or used in analytics and AI systems. It means quality checks, access controls, and lineage capture are implemented within data pipelines and model workflows, rather than handled through external review processes.
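One illustrative pattern for this is to wrap each pipeline step so that lineage capture and quality checks execute inside the workflow itself. The decorator below is a sketch under that assumption, with an in-memory list standing in for the enterprise lineage store and a hypothetical “net revenue” step as the example.

```python
import functools, time

LINEAGE_EVENTS = []   # stand-in for the enterprise lineage store

def governed_transform(semantic_id: str, quality_check):
    """Wrap a pipeline step so lineage capture and a quality check run with
    the transform, not as a separate review process."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(records):
            output = fn(records)
            if not quality_check(output):
                raise RuntimeError(f"Quality check failed in '{fn.__name__}'.")
            LINEAGE_EVENTS.append({
                "step": fn.__name__,
                "semantic_id": semantic_id,  # ties the step to a governed definition
                "rows_in": len(records),
                "rows_out": len(output),
                "timestamp": time.time(),
            })
            return output
        return wrapper
    return decorator

@governed_transform("metric:net_revenue:v3",
                    quality_check=lambda rows: all(r["amount"] >= 0 for r in rows))
def compute_net_revenue(records):
    return [{"amount": r["gross"] - r["refunds"]} for r in records]

compute_net_revenue([{"gross": 120.0, "refunds": 20.0}])
print(LINEAGE_EVENTS)   # one lineage event per executed step
```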
This integration requires a clear operating model for data governance at enterprise scale. In large organizations, that model is typically federated: enterprise data standards are defined once and executed within business domains.
→ The Federated Data Governance Model in Practice
Federated data governance often fails when responsibility is distributed but enforcement mechanisms are not. Enterprise data standards are defined centrally, while domain teams are expected to enforce them independently. Without shared data control infrastructure, implementation diverges. A working federated model separates responsibilities clearly:
An enterprise data governance function defines semantic standards, quality baselines, and AI-specific data policies such as provenance and explainability requirements. It also provides shared enforcement patterns that can be reused across data pipelines, feature stores, and APIs.
Domain data teams own their data products and AI use cases within those guardrails. They implement pipelines and models using shared control mechanisms rather than creating independent data definitions or control logic.
A cross-functional data governance council resolves semantic conflicts, aligns priorities, and defines escalation paths when enterprise data standards and operational realities diverge.
This structure works when data control patterns are shared and execution is distributed. It breaks down when standards are issued without implementation support or when domains reinterpret data definitions independently.
When data governance and data management operate as one system, data standards are applied consistently, controls are implemented within production workflows, and accountability for data quality and lineage is clear across domains.
With the operating model defined, the question becomes how to implement it.
How to Start: Pick One Model, Build the Controls, Then Scale
The organizations that have successfully closed the data governance gap have taken a phased approach. They identified one high-stakes model, built the trust layer it required, demonstrated that controls operated as designed and that approval became predictable, and scaled from that proof point.
Phase 1: Assess — Produce a Trust Gap Map
Map the data flows feeding your highest-priority AI model. Identify where lineage is missing, where semantic definitions are inconsistent across sources, where quality controls are manual or absent, and where AI artifacts (prompts, retrieval corpora, interaction logs) fall outside the governance perimeter. The output is a prioritized control backlog, not a maturity assessment. Each gap should have an owner, a control pattern, and an estimated effort; a sketch of one such backlog entry follows this phase.
Common derailment: Teams turn this into a broad governance review instead of a focused model-level assessment, delaying action.
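For illustration, a backlog entry can be a small structured record rather than a slide. Every field below is an assumption about what a given organization might track; the point is that each gap is concrete, owned, and sortable.

```python
from dataclasses import dataclass

@dataclass
class TrustGapEntry:
    """One row of the trust gap map: a concrete, assignable control gap."""
    data_flow: str         # e.g., "claims feed -> feature store -> risk model"
    gap: str               # what is missing
    owner: str             # accountable person or team
    control_pattern: str   # reusable pattern that closes the gap
    effort_days: int       # estimated engineering effort
    priority: int          # 1 = blocks production approval

backlog = [
    TrustGapEntry(
        data_flow="policy corpus -> vector store -> benefits agent",
        gap="retrieval corpus is unversioned",
        owner="knowledge-platform team",
        control_pattern="versioned corpus + retrieval provenance capture",
        effort_days=15,
        priority=1,
    ),
]
backlog.sort(key=lambda e: (e.priority, e.effort_days))  # smallest blockers first
```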
Phase 2: Design — Define the Operating Model and Control Backlog
Establish decision rights and accountability: who owns semantic definitions, who approves AI artifacts for production, and who is responsible for evidence generation. Build the operating model as a RACI matrix (Responsible, Accountable, Consulted, Informed), not an org chart. Then translate the highest-priority gaps from the trust gap map into specific control designs: automated lineage capture, semantic consistency checks, and programmable policy enforcement in pipelines and APIs.
Common derailment: The operating model is documented, but enforcement mechanisms are not engineered into pipelines and deployment workflows.
Phase 3: Pilot — One Governed Use Case in Production
Deploy the controls for one model end-to-end. The goal of the pilot is to demonstrate that controls operate as designed and that production approval becomes predictable. Track the metrics that matter to leadership: time from model readiness to production approval, number of sign-off cycles required, and incidents prevented by quality controls. These metrics become the business case for scaling.
Common derailment: The pilot measures technical performance but fails to measure deployment speed and approval friction.
Phase 4: Scale — Reusable Control Components
The output of the pilot should be a set of reusable governance components: pipeline templates with embedded quality checks, policy-as-code patterns for API enforcement, lineage capture configurations, semantic layer definitions that can be shared across teams. Integrate with platforms including lakehouses, semantic layers, and API gateways. The goal is that the second model to go through the process takes half the time of the first, because the infrastructure is already in place.
Common derailment: Teams scale the process rather than the infrastructure. Each new model repeats bespoke control implementation instead of reusing shared components, and the third deployment takes as long as the first.
Phase 5: Sustain — Quarterly Control Effectiveness Reviews
Governance decays without deliberate maintenance. Establish a quarterly review cadence with defined thresholds: quality metrics, lineage coverage, AI artifact compliance, policy exception rates. Set escalation criteria specifying, for each metric that falls below threshold, who is responsible for remediation and on what timeline, as sketched below. This prevents governance from reverting to periodic documentation exercises.
Common derailment: Reviews focus on documentation rather than whether controls are actually executing as designed.
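As a sketch, the review can be driven by configuration rather than narrative; the metric names, threshold values, and owners below are illustrative assumptions.

```python
# Hypothetical quarterly thresholds; values and owners are illustrative.
THRESHOLDS = {
    "lineage_coverage":        {"min": 0.98,  "owner": "data platform lead", "days": 30},
    "quality_check_pass_rate": {"min": 0.995, "owner": "domain data owner",  "days": 14},
    "ai_artifact_compliance":  {"min": 1.00,  "owner": "governance council", "days": 30},
    "policy_exception_rate":   {"max": 0.02,  "owner": "governance council", "days": 30},
}

def quarterly_review(observed: dict) -> list:
    """Compare observed control telemetry to thresholds and emit escalations
    with owners and remediation timelines."""
    escalations = []
    for metric, rule in THRESHOLDS.items():
        value = observed.get(metric)
        if value is None:
            escalations.append(f"{metric}: no telemetry; control may not be executing")
        elif ("min" in rule and value < rule["min"]) or \
             ("max" in rule and value > rule["max"]):
            escalations.append(f"{metric}={value}: out of bounds; "
                               f"{rule['owner']} remediates within {rule['days']} days")
    return escalations

print(quarterly_review({"lineage_coverage": 0.95, "policy_exception_rate": 0.01}))
```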
Implementation alone is not enough; impact must be measured.
Measuring What Matters
Governance investment earns credibility when it is linked to metrics that leadership already tracks. Establish baselines before the pilot, track the same metrics quarterly, and link the results directly to existing OKRs and incentives for delivery speed.
One concrete example of what this produces: a governed model retraining pipeline that enables 40% faster model updates, reducing forecast error by 15% and protecting measurable revenue—not as a projection, but as a documented outcome with a traceable evidence trail. That evidence trail is itself a governance artifact.
Conclusions
The opening scenario in this paper, a customer harmed by an answer retrieved from a superseded document with no way to reconstruct what the system saw or why, is not a technology failure. It is a governance failure, and a preventable one. The model worked and the pipeline ran. However, the data beneath both was ungoverned.
Organizations that close this gap will not simply build better AI. They will operate it with clearer accountability and greater regulatory defensibility. The organizations that do not will continue to build impressive proofs of concept that never reach production. Their models will not necessarily fail, but their data infrastructure is not engineered to make them trustworthy.
The question is no longer whether to invest in executable data governance. It is whether your organization will implement executable governance before the next incident. The authors advise organizations on closing the governance gap between policy and production. Reach out to discuss how these frameworks apply to your AI portfolio.