Why Federal AI Pilots Are Finally Converting to Programs of Record

Federal agencies spent five years running AI pilots that rarely converted to funded, baseline programs — a pattern the GAO documented exhaustively and the Office of Management and Budget acknowledged as a structural failure of acquisition and governance design. The 2026 environment is different: updated procurement frameworks, revised FedRAMP AI pathways, and explicit OMB guidance on AI program-of-record thresholds have produced a conversion rate of 41% from pilot to program, up from below 10% in 2022. The $14 billion in federal AI outlays for FY2026 is the downstream consequence of a bureaucratic infrastructure that finally supports the technology.

Federal AI outlays, FY2026: $14B (up from a $6.2B FY2023 baseline)

Pilots converted to programs of record: 41% (up from a sub-10% conversion rate through FY2023)

Average time-to-program: 24 months (down from 48-60 months under pre-2025 acquisition rules)

Compliance officer productivity gain: 8.7% (AI-assisted regulatory review vs. manual baseline)

The Pilot Trap That Cost Five Years

The federal government's relationship with AI from 2018 to 2023 was characterized by a structural pathology that CSET at Georgetown documented in detail: agencies had neither the acquisition vehicles nor the authorization frameworks to move AI from experimentation to baseline operations. Pilots were funded through research and development appropriations or Other Transaction Agreements that explicitly limited their scope and duration. When a pilot succeeded — and many did — there was no clear procurement pathway to a follow-on production contract, no established FedRAMP authorization process for AI systems, and no OMB guidance on what evidence of AI system reliability satisfied the requirements of the Federal Acquisition Regulation for operational deployment.

The GAO's AI Adoption Report for 2026 documents that between FY2018 and FY2022, fewer than 10% of federally funded AI pilots converted to programs of record within five years of pilot initiation. The remainder were abandoned, repeated in modified form under different funding authorities, or kept alive as permanent pilots: funded year to year, never baselined, never institutionalized. The cost to the federal government of never-converted AI pilots is not precisely calculable, but CSET's estimate of $3-4 billion in unrealized value is a reasonable order of magnitude.

What Changed in the Regulatory and Procurement Environment

Three specific changes in the 2024-2025 regulatory environment explain the conversion rate improvement from below 10% to 41%.

First, the FedRAMP AI Authorization Pathway, finalized in late 2024, established a specific authorization track for AI and machine learning systems that acknowledges the iterative, version-based nature of AI software, a departure from the static system authorization model that had made FedRAMP compliance prohibitively slow for AI vendors. Systems can now receive a provisional ATO with documented model update procedures, rather than requiring a full reauthorization for each model version. This reduced the time from pilot to authorization from an average of 28 months to approximately 11 months for systems with established security documentation.

Second, OMB's 2025 AI Acquisition Memo established specific evidence thresholds — accuracy benchmarks, bias audit requirements, and human-in-the-loop documentation standards — that agencies must demonstrate before transitioning an AI capability to a program of record. Paradoxically, this increased the conversion rate by giving agencies a clear finish line. Previously, pilots existed in regulatory ambiguity about what level of validation was sufficient for operational deployment. The OMB memo defined the bar, and agencies could plan toward it.
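
The "clear finish line" idea can be made concrete as a checklist a pilot team tracks from day one. The sketch below is illustrative only: the `PilotEvidence` fields, the `missing_evidence` helper, and the 0.95 accuracy bar are hypothetical placeholders, not values or structures taken from the OMB memo.

```python
from dataclasses import dataclass


@dataclass
class PilotEvidence:
    """Evidence a pilot has gathered so far (hypothetical fields)."""
    measured_accuracy: float        # accuracy on the agency's chosen benchmark
    required_accuracy: float        # placeholder bar; the real value comes from guidance
    bias_audit_completed: bool
    human_in_loop_documented: bool


def missing_evidence(pilot: PilotEvidence) -> list[str]:
    """List the evidence categories the pilot has not yet satisfied."""
    gaps = []
    if pilot.measured_accuracy < pilot.required_accuracy:
        gaps.append("accuracy benchmark")
    if not pilot.bias_audit_completed:
        gaps.append("bias audit")
    if not pilot.human_in_loop_documented:
        gaps.append("human-in-the-loop documentation")
    return gaps


# Example pilot: audit done, but accuracy and override documentation fall short.
example_pilot = PilotEvidence(
    measured_accuracy=0.93,
    required_accuracy=0.95,  # assumed threshold for illustration
    bias_audit_completed=True,
    human_in_loop_documented=False,
)
example_gaps = missing_evidence(example_pilot)
print(example_gaps)
```

An empty list from `missing_evidence` would mean the pilot has met all three evidence categories named in the paragraph above and can plan its transition request.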

Third, the General Services Administration's AI contract vehicle, a pre-competed IDIQ that provides acquisition-ready task order vehicles for AI professional services and platform licensing, eliminated the need for individual agencies to conduct full and open competitions for AI capabilities that had already been competed at the governmentwide level. GovExec's Federal AI Survey found that agencies using the GSA vehicle reduce time-to-contract by an average of 9 months.

The $14 Billion Baseline and What It Funds

The $14 billion in federal AI outlays for FY2026 is more than double the $6.2 billion FY2023 baseline, but the composition is as important as the magnitude. GAO's analysis shows that approximately 60% of FY2026 outlays are in four agency clusters: Department of Defense (procurement optimization, logistics AI, intelligence analysis automation), Department of Health and Human Services (Medicare and Medicaid fraud detection, clinical AI in VA healthcare), Internal Revenue Service (tax compliance and audit selection AI), and the Social Security Administration (benefits processing and case management AI).
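
The arithmetic behind these figures is easy to check. The short sketch below uses only the numbers quoted in this article. The implied annual growth rate assumes smooth compounding over three fiscal years, which is a simplification and not a figure from GAO.

```python
# Figures quoted in the article, in billions of dollars.
fy2023_outlays = 6.2
fy2026_outlays = 14.0
years = 3  # FY2023 -> FY2026

# How many times larger FY2026 outlays are than the FY2023 baseline.
growth_multiple = fy2026_outlays / fy2023_outlays

# Implied compound annual growth rate (assumes smooth year-over-year growth).
implied_cagr = growth_multiple ** (1 / years) - 1

# Approximate dollars concentrated in the four agency clusters (about 60%).
concentrated_outlays = 0.60 * fy2026_outlays

print(f"Growth multiple: {growth_multiple:.2f}x")
print(f"Implied annual growth: {implied_cagr:.1%}")
print(f"Outlays in the four clusters: ${concentrated_outlays:.1f}B")
```

The multiple comes out to roughly 2.26x, and the 60% share corresponds to about $8.4 billion of the FY2026 total.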

The concentration is not accidental. These are the agencies that had the most mature data infrastructure, the most defined use cases, and the most measurable performance baselines against which to evaluate AI system performance — the three preconditions for successful program conversion that CSET's research identified as most predictive of pilot-to-program success.

The 8.7% productivity gain in compliance officer roles — drawn from early performance data in IRS and HHS AI deployments — reflects the value available in regulatory review and classification work. AI systems that assist compliance officers in reviewing regulatory documentation, classifying submissions against applicable requirements, and flagging anomalies for human review are producing measurable throughput improvements that justify the program investment in terms the appropriations process recognizes.

The GAO Oversight Dynamic

The GAO's heightened AI oversight posture — reflected in the AI Adoption Report and in ongoing work requirements in the FY2026 National Defense Authorization Act — is a governance dynamic that program managers at converting agencies must account for. The GAO is conducting performance audits on AI programs of record with a specific focus on three risk areas: training data representativeness and bias, model performance degradation over time, and human override documentation.

Agencies that have converted pilots to programs of record without establishing monitoring and evaluation frameworks for these three risk areas are likely to receive GAO findings in FY2026 and FY2027 audits. CSET's research identifies model performance monitoring as the most common gap: agencies that validated AI system performance at authorization have not established ongoing performance measurement processes to detect degradation as the operating environment changes. A GAO finding that an agency failed to detect a degraded AI system will damage a program of record more than a measured, managed performance issue that the agency disclosed proactively.
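
One way to close the monitoring gap described above is a recurring comparison of measured accuracy against the value validated at authorization. The following is a minimal sketch, not a GAO or CSET method: the function name `detect_degradation`, the 2-point tolerance, the three-period window, and the sample accuracy figures are all illustrative assumptions.

```python
from statistics import mean


def detect_degradation(baseline_accuracy, recent_accuracies,
                       tolerance=0.02, window=3):
    """Flag degradation when the mean of the last `window` measurements
    falls more than `tolerance` below the accuracy validated at authorization."""
    if len(recent_accuracies) < window:
        return False  # not enough data yet to make a call
    recent_mean = mean(recent_accuracies[-window:])
    return (baseline_accuracy - recent_mean) > tolerance


# Hypothetical monthly accuracy measurements after deployment.
authorized_accuracy = 0.94
monthly_accuracy = [0.94, 0.93, 0.92, 0.90, 0.89]

degraded = detect_degradation(authorized_accuracy, monthly_accuracy)
print("Degradation detected:", degraded)
```

Averaging over a window rather than reacting to a single reading is a design choice that trades detection speed for fewer false alarms. A real program would set the tolerance and window from its own authorization evidence and would log each check so the result can be disclosed proactively.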

The 24-Month Time-to-Program and Its Implications for Vendor Strategy

The 24-month average time from pilot initiation to program of record status — down from 48-60 months in the pre-2025 environment — has commercial implications for vendors in the federal AI market. A 24-month cycle is fast enough for venture-backed AI companies to sustain themselves through the federal sales process; a 48-month cycle was not, which is why the federal AI market prior to 2025 was dominated by large defense contractors and system integrators rather than commercial AI product companies.

GovExec's survey shows that the fastest-converting AI pilots in FY2025 were commercial products from mid-size AI vendors that had invested in FedRAMP authorization concurrently with their first federal pilot — essentially treating FedRAMP as a sales channel qualification rather than a post-contract burden. That investment posture is now standard advice from federal market advisors, whereas three years ago it was considered prohibitively expensive for companies below $100 million in revenue.

The Takeaway

Federal agency CIOs and program managers with AI pilots in the pipeline should engage their acquisition and legal teams now on the OMB evidence thresholds and FedRAMP pathway requirements — not at the conclusion of the pilot, but at the outset. The 24-month time-to-program is achievable only for pilots that were designed from initiation with the program-of-record evidence requirements in view. Pilots designed exclusively to demonstrate technical feasibility and then retrofitted for operational authorization will not achieve the 24-month conversion timeline, and the conversion rate improvement will bypass them.

Figure 10. Federal AI pipeline funnel showing pilot initiation, FedRAMP authorization, OMB evidence threshold, and program-of-record conversion stages with timeline and conversion rates at each stage for FY20…