Prompt Version Control: Manage, Roll Back, and Govern AI Prompts Like Code

Most teams manage their AI prompts the way they managed configuration before infrastructure-as-code: scattered across documents, hardcoded in source files, edited in place with no history. Then a well-intentioned change breaks production. There is no audit trail. No rollback. Debugging happens from memory.

Prompts are code. They determine how your application behaves. A changed prompt is a changed deployment. Managing prompts less rigorously than code leaves a gap, and that gap tends to close itself painfully, usually at the worst time.

Here is the governance model that prevents that incident.

Human-Readable Slugs, Not UUIDs

Every prompt template gets a URL-safe slug generated automatically on creation:

"Optimize Code Generation" + id=abc1def2
→"optimize-code-generation-abc1def2"

Format: {title-kebab}-{8-char-uuid-prefix}. Human-readable but collision-resistant. When this slug appears in a log, you know what ran. When a UUID like 3f8a2b1c-4e9d-... appears in a log, you know nothing.
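
The slug format can be sketched in a few lines. This is a minimal illustration, not the system's actual generator: the function name make_slug and the exact normalization rules (lowercase, non-alphanumerics collapsed to hyphens) are assumptions.

```python
import re


def make_slug(title: str, template_id: str) -> str:
    """Build a {title-kebab}-{8-char-id-prefix} slug (hypothetical helper)."""
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    kebab = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # Drop hyphens first so a canonical UUID string still yields 8 hex characters.
    prefix = template_id.replace("-", "")[:8]
    return f"{kebab}-{prefix}" if kebab else prefix


print(make_slug("Optimize Code Generation", "abc1def2"))
# optimize-code-generation-abc1def2
```

Tying the suffix to the template id rather than to a random value means the same template always produces the same slug, which keeps log lines comparable across environments.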

Runtime delivery uses slugs as stable identifiers. Your application code references prompts/optimize-code-generation-abc1def2. Because the slug is generated once, at creation, it survives template title changes, environment migrations, and team handoffs.

Immutable Version Snapshots

Every update to a template snapshots the current state before overwriting. Snapshots are stored as full JSONB objects — not diffs, not deltas. The complete template content at every version is queryable.

The governance API:

GET  /api/v1/templates/{id}/versions       # full history, newest first
GET  /api/v1/templates/{id}/versions/{n}   # specific snapshot
POST /api/v1/templates/{id}/rollback/{n}   # restore from snapshot

Rollback is atomic. It creates a new version containing the restored content rather than mutating history. The incident trail is preserved — you can see that a rollback happened and when.

Nothing is ever destroyed. A production incident at 2am becomes: identify when the bad change went in, note the version number, call rollback, done in 30 seconds.
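
The snapshot-then-overwrite rule and the append-only rollback can be modeled in memory. The sketch below is an assumption-laden stand-in: TemplateHistory and its methods are hypothetical names, a dict replaces the JSONB storage, and transactional atomicity is not modeled.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class TemplateHistory:
    """In-memory stand-in for a template row plus its snapshot table."""
    current: dict
    version: int = 1
    snapshots: dict = field(default_factory=dict)  # version number -> full copy

    def update(self, new_content: dict) -> int:
        # Snapshot the complete current state before overwriting it.
        self.snapshots[self.version] = copy.deepcopy(self.current)
        self.current = copy.deepcopy(new_content)
        self.version += 1
        return self.version

    def rollback(self, n: int) -> int:
        # Restoring is just another update, so history only ever grows.
        return self.update(self.snapshots[n])

    def versions(self) -> list:
        # Newest first, mirroring the descending order of the history endpoint.
        return sorted(self.snapshots, reverse=True)


history = TemplateHistory({"body": "original wording"})
history.update({"body": "bad change"})
history.rollback(1)
print(history.version, history.current)  # 3 {'body': 'original wording'}
```

After the rollback, version 2 (the bad change) is still in the snapshot table, which is what keeps the incident trail intact.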

Environment Scoping

Templates are tagged with an environment: development, staging, or production. The delivery API honors that tag when serving templates.

Development experiments don't accidentally get served to production users. Staging iterations don't affect live workflows. The runtime knows which environment it is operating in and requests the corresponding template version.

This is the same pattern as environment variables, feature flags, and configuration management — applied consistently to prompts. Teams that have invested heavily in config management but treat prompts as plaintext strings have a gap here.
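
One way to picture environment scoping is a lookup keyed by both slug and environment. This is a sketch under stated assumptions: the TEMPLATES table, the resolve function, and the template bodies are all hypothetical, and the real delivery API may store and resolve templates differently.

```python
# Hypothetical store: the same slug can hold different content per environment.
TEMPLATES = {
    ("optimize-code-generation-abc1def2", "development"): "Experimental wording: {{task}}",
    ("optimize-code-generation-abc1def2", "production"): "Stable wording: {{task}}",
}


def resolve(slug: str, environment: str) -> str:
    """Return the template for this slug in this environment only."""
    try:
        return TEMPLATES[(slug, environment)]
    except KeyError:
        # Fail loudly instead of falling back to another environment's copy.
        raise LookupError(f"no template {slug!r} in environment {environment!r}") from None


print(resolve("optimize-code-generation-abc1def2", "production"))
# Stable wording: {{task}}
```

The design choice worth noting is the absence of a fallback: a missing staging template raises an error rather than quietly serving the development copy.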

The State Machine

Templates move through a defined lifecycle:

draft → published → archived

Draft: work in progress, not served by the delivery API. Safe to iterate without affecting running applications.

Published: live. The delivery API serves this version to requests. Publishing is an explicit action, not automatic.

Archived: retired but queryable. You don't delete prompts — you archive them. Archived templates remain in version history and can be queried for audit purposes. An archived template cannot be accidentally served.

The publish action sets the state to published and records a published_at timestamp and published_by identifier. Audit trail built in.
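
The lifecycle can be expressed as a small transition table. In this sketch the class and field layout are hypothetical, and the ALLOWED table assumes only the two forward transitions drawn above; whether the real system permits others, such as archiving a draft directly, is not stated.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class State(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    ARCHIVED = "archived"


# Assumption: only draft -> published and published -> archived are legal.
ALLOWED = {
    State.DRAFT: {State.PUBLISHED},
    State.PUBLISHED: {State.ARCHIVED},
    State.ARCHIVED: set(),
}


@dataclass
class Template:
    slug: str
    state: State = State.DRAFT
    published_at: Optional[datetime] = None
    published_by: Optional[str] = None

    def _move(self, target: State) -> None:
        if target not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {target.value}")
        self.state = target

    def publish(self, user: str) -> None:
        # Publishing is explicit and records who did it and when.
        self._move(State.PUBLISHED)
        self.published_at = datetime.now(timezone.utc)
        self.published_by = user

    def archive(self) -> None:
        self._move(State.ARCHIVED)

    def is_servable(self) -> bool:
        # Only published templates are eligible for delivery.
        return self.state is State.PUBLISHED
```

With this shape, "an archived template cannot be accidentally served" falls out of is_servable, and re-publishing an archived template raises instead of silently succeeding.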

HMAC-Signed Webhooks on Update

Every template update fires a signed webhook to the configured endpoint:

POST {webhook_url}
X-Signature-256: sha256={hex_digest}

{
  "event": "template.updated",
  "slug": "optimize-code-generation-abc1def2",
  "version": 13,
  "environment": "production"
}

The signature is computed with HMAC-SHA256 using the template's webhook secret. Downstream systems verify the signature before acting on the payload.
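
On the receiving side, verification might look like the following sketch. It assumes the digest is computed over the raw request body bytes, which the description above does not state explicitly; the function names and the example secret are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Produce the header value in the sha256={hex_digest} form shown above."""
    digest = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify(secret: bytes, body: bytes, header_value: str) -> bool:
    # compare_digest runs in constant time, so a forged signature does not
    # leak how many of its leading characters happened to be correct.
    expected = sign(secret, body).encode()
    return hmac.compare_digest(expected, header_value.encode())


secret = b"example-webhook-secret"  # hypothetical value
body = b'{"event":"template.updated","version":13}'
header = sign(secret, body)
print(verify(secret, body, header))         # True
print(verify(secret, body + b" ", header))  # False
```

Verify against the bytes exactly as received. Parsing the JSON and re-serializing it before hashing can change whitespace or key order and break an otherwise valid signature.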

Webhook delivery is fire-and-forget via asyncio.create_task — it never blocks the API response. If the webhook endpoint is down, the template update succeeds regardless.

What connects to these webhooks: CI/CD pipelines that re-run prompt tests when a template changes, dashboards that show current template state, Slack alerts when production templates are modified. Any system that needs to react to prompt changes gets a real-time signal.

Variable Interpolation

Templates use {{variable}} syntax. The delivery API interpolates on request:

POST /api/v1/prompts/optimize-code-generation-abc1def2/compiled
{
  "language": "Python",
  "task": "reverse a linked list"
}
→ compiled prompt with variables filled

The interpolation engine uses StrictUndefined — a missing variable surfaces as an error, not a silently blank string. This catches template mismatches at the API boundary, not in downstream behavior where they're harder to trace.

The template delivery endpoint also returns the variable list for any template, so client applications can discover what variables a template expects before constructing requests.
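
Strict interpolation and variable discovery can be illustrated with the standard library alone. StrictUndefined is the name Jinja2 gives this behavior; the sketch below does not use Jinja2 and is only a stand-in that reproduces the fail-on-missing rule. The function names and the error class are hypothetical.

```python
import re

# Matches {{name}} with optional inner whitespace.
_VAR = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


class MissingVariableError(KeyError):
    """Raised when a template references a variable the caller did not supply."""


def variables(template: str) -> list:
    # Unique names in first-appearance order, for client-side discovery.
    return list(dict.fromkeys(_VAR.findall(template)))


def compile_prompt(template: str, values: dict) -> str:
    missing = [name for name in variables(template) if name not in values]
    if missing:
        # Fail at the boundary instead of emitting a prompt with blanks in it.
        raise MissingVariableError(f"missing variables: {', '.join(missing)}")
    return _VAR.sub(lambda m: str(values[m.group(1)]), template)


template = "Write {{language}} code to {{task}}."
print(variables(template))
# ['language', 'task']
print(compile_prompt(template, {"language": "Python", "task": "reverse a linked list"}))
# Write Python code to reverse a linked list.
```

Calling compile_prompt(template, {"language": "Python"}) raises MissingVariableError rather than returning a prompt with an empty task.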

Why This Matters for Model-Agnostic Teams

Switching LLMs is increasingly common: pricing changes, capability improvements, and availability issues all push teams to change providers. When teams switch models, prompts often need adjustment. The question is whether you can make those adjustments with confidence.

With governance in place: create a new draft version, test in staging against the new model, publish to production, keep the previous version available for rollback if needed. Every step is auditable. Every change is reversible.

Without governance: edit the string in production, deploy, hope the new model handles it the same way, debug by observation when it doesn't.

The governance system also integrates with the evaluation framework — when a template is updated, the quick-evaluate endpoint can run the new version against a test set before publishing. Merge gates for prompts.
