3-Tier Prompt Routing: How Rules-Based Optimization Avoids Unnecessary LLM Calls

The default approach to prompt optimization: send every prompt to a frontier LLM, ask it to improve the text, return the result. It works. It also consumes API budget on prompts that didn't need an LLM call in the first place, adds 1-3 seconds of latency to simple requests, and frequently makes straightforward prompts worse by introducing unnecessary complexity.

Plenty of incoming prompts are basic: clear intent, simple structure, no real ambiguity. Sending those to GPT-4o or Claude Sonnet isn't optimization. It's overhead. (The Practical Results section below reports the measured tier distribution from a 360-prompt production run.)

The solution is intelligent routing: classify each prompt before optimizing, then send it to the cheapest tier that can handle it correctly.

The Three Tiers

Tier 1: Rules-Based (<10ms)

Deterministic pattern-matching optimization with no LLM involvement. The system applies known transformations for the detected prompt context. A Terraform prompt gets IaC-specific structure enforced. A JSON conversion prompt gets strict field preservation rules applied. A code generation prompt for a specific language gets language-idiomatic formatting enforced.

Routes here when the composite routing score is ≤ 0.40. Latency under 10ms. Zero API cost.
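A minimal sketch of what a Tier 1 pass could look like, assuming rules are plain string transformations keyed by the detected context. The context names (`terraform`, `json_conversion`), the helper names, and the rule text are hypothetical placeholders, not Prompt Optimizer's actual rule set. What matters is that the output depends only on the input and no network call happens.

```python
from typing import Callable

# Hypothetical rule type: takes a prompt, returns a rewritten prompt.
Rule = Callable[[str], str]


def add_constraint(text: str) -> Rule:
    """Build a rule that appends a constraint line exactly once (idempotent)."""
    def rule(prompt: str) -> str:
        if text in prompt:
            return prompt
        return f"{prompt.rstrip()}\n{text}"
    return rule


# Illustrative context -> rules table; a real rule set would be far richer.
RULES: dict[str, list[Rule]] = {
    "terraform": [
        add_constraint("Output valid HCL with variables, resources, and outputs in separate blocks."),
    ],
    "json_conversion": [
        add_constraint("Preserve every field name and value exactly; do not add, drop, or rename keys."),
    ],
}


def apply_rules(prompt: str, context: str) -> str:
    """Tier 1: apply each rule for the detected context in order. No LLM call."""
    for rule in RULES.get(context, []):
        prompt = rule(prompt)
    return prompt
```

Because each rule is idempotent, running the pass twice yields the same prompt, which is part of what makes this tier predictable.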

Tier 2: Hybrid (Rules + Targeted LLM Call)

Rules run first and handle the deterministic improvements. A focused LLM call then addresses the parts that genuinely need intelligence — ambiguous phrasing, missing context, structural gaps that rules can't resolve. Lighter than full LLM optimization because the rules absorb the mechanical work.

Routes here when the composite score is above 0.40 and below 0.85.
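The Hybrid flow can be sketched as rules first, then one narrow LLM request that covers only what the rules could not fix. In this sketch the gap detector is a hypothetical keyword check for vague phrasing, and `llm` is any callable you supply (a stub in the test), so nothing depends on a specific provider SDK. The function names are illustrative.

```python
import re
from typing import Callable

# Hypothetical markers of ambiguity that deterministic rules can't resolve.
VAGUE = re.compile(r"\b(something|stuff|etc|somehow|whatever)\b", re.IGNORECASE)


def find_gaps(prompt: str) -> list[str]:
    """Return the distinct vague terms in the prompt, sorted (empty list = no gaps)."""
    return sorted({m.group(0).lower() for m in VAGUE.finditer(prompt)})


def hybrid_optimize(prompt: str, rules: Callable[[str], str], llm: Callable[[str], str]) -> str:
    """Tier 2: deterministic rules first, then a focused LLM call only if gaps remain."""
    prompt = rules(prompt)
    gaps = find_gaps(prompt)
    if not gaps:
        return prompt  # rules were enough, so the LLM call is skipped entirely
    # Focused request: fix only the listed gaps instead of rewriting the whole prompt.
    request = (
        "Rewrite only the ambiguous parts of this prompt "
        f"({', '.join(gaps)}) and keep everything else verbatim:\n\n{prompt}"
    )
    return llm(request)
```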

Tier 3: Full LLM

Complete LLM optimization with context-aware system prompting. Reserved for complex, expert-level prompts where a full LLM rewrite is genuinely justified — multi-step technical workflows, nuanced meta-prompts, high-stakes content where optimization quality directly affects outcomes.

Routes here when composite score ≥ 0.85.
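The thresholds translate into a small selection function. This sketch assumes the bands are arranged so each score maps to exactly one tier: 0.40 and below goes to Rules, 0.85 and above goes to Full LLM, and everything strictly between goes to Hybrid. The `Tier` enum and constant names are hypothetical.

```python
from enum import Enum


class Tier(Enum):
    RULES = 1
    HYBRID = 2
    FULL_LLM = 3


RULES_MAX = 0.40     # score <= 0.40 -> Tier 1
FULL_LLM_MIN = 0.85  # score >= 0.85 -> Tier 3


def select_tier(score: float) -> Tier:
    """Map a composite routing score in [0, 1] to a tier."""
    if score <= RULES_MAX:
        return Tier.RULES
    if score >= FULL_LLM_MIN:
        return Tier.FULL_LLM
    return Tier.HYBRID
```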

The Routing Score Formula

Every prompt gets a composite routing score before optimization runs:

composite = (context_weight × 0.5)
      + (sophistication × 0.3)
      + (load_factor × 0.2)
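The formula is a weighted sum whose weights add up to 1.0, so if each input is normalized to [0, 1] the composite also stays in [0, 1]. Normalized inputs are an assumption of this sketch, since the input ranges aren't stated. As a worked example, context_weight = 0.7, sophistication = 0.5, and load_factor = 0.5 give 0.35 + 0.15 + 0.10 = 0.60, which falls in the Hybrid range.

```python
import math

WEIGHTS = {"context_weight": 0.5, "sophistication": 0.3, "load_factor": 0.2}
assert math.isclose(sum(WEIGHTS.values()), 1.0)  # weights form a convex combination


def composite(context_weight: float, sophistication: float, load_factor: float) -> float:
    """Weighted routing score; inputs are assumed to be normalized to [0, 1]."""
    inputs = {
        "context_weight": context_weight,
        "sophistication": sophistication,
        "load_factor": load_factor,
    }
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(inputs[name] * weight for name, weight in WEIGHTS.items())
```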

Context Weight (50% of score)

The dominant factor, derived from context detection confidence. High-confidence image generation prompts push the score toward the LLM tiers, because creative enhancement benefits from LLM reasoning. High-confidence structured output prompts pull it lower, because rules are sufficient and safer. If context detection confidence falls below 0.60, the router falls back to Tier 1 regardless of other signals: don't apply sophisticated optimization to a prompt you can't confidently categorize.
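One way to express this in code, assuming context weight starts at a neutral 0.5 and moves toward a per-context affinity in proportion to detection confidence. That gives the described behavior: more confidence pushes image generation higher and structured output lower. The affinity values, context names, and function names are hypothetical; only the 0.60 cutoff comes from the description.

```python
CONFIDENCE_FLOOR = 0.60  # below this, route to Tier 1 regardless of other signals
NEUTRAL = 0.5            # hypothetical weight when nothing is known about the context

# Hypothetical affinities: 1.0 = strongly benefits from an LLM, 0.0 = rules are best.
LLM_AFFINITY = {
    "image_generation": 1.0,   # creative enhancement benefits from LLM reasoning
    "structured_output": 0.0,  # rules are sufficient and safer
}


def context_weight(context: str, confidence: float) -> float:
    """Move from NEUTRAL toward the context's affinity, scaled by confidence."""
    affinity = LLM_AFFINITY.get(context, NEUTRAL)
    return NEUTRAL + confidence * (affinity - NEUTRAL)


def should_force_rules(confidence: float) -> bool:
    """Low-confidence classification falls back to Tier 1."""
    return confidence < CONFIDENCE_FLOOR
```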

Sophistication Score (30% of score)

Prompt complexity analysis. "Generate a hello world function" is basic. "Design a multi-region failover architecture with RPO constraints and runbook procedures" is expert-level. The sophistication detector scores lexical density, structural complexity, technical vocabulary depth, and instruction nesting.
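A toy version of such a detector, to make the four signals concrete. Everything here is an assumption: type-token ratio stands in for lexical density, word count is a crude proxy for structural complexity, a tiny hand-picked vocabulary measures technical depth, and connective counts approximate instruction nesting. The weights are hypothetical and sum to 1.0 so the score stays in [0, 1].

```python
import re

# Hypothetical technical vocabulary; a real detector would use a larger, domain-aware list.
TECH_TERMS = {"architecture", "failover", "multi-region", "rpo", "rto", "runbook", "constraints", "schema"}
CONNECTIVES = re.compile(r"\b(and|with|then|if|unless)\b|[,;:]")


def sophistication(prompt: str) -> float:
    """Toy 0..1 sophistication score built from four hypothetical signals."""
    text = prompt.lower()
    words = re.findall(r"[a-z0-9-]+", text)
    if not words:
        return 0.0
    lexical_density = len(set(words)) / len(words)                 # type-token ratio
    structure = min(1.0, len(words) / 40)                          # length as structure proxy
    technical = min(1.0, sum(w in TECH_TERMS for w in words) / 3)  # saturates at 3 terms
    nesting = min(1.0, len(CONNECTIVES.findall(text)) / 4)         # clause connectives
    return 0.1 * lexical_density + 0.25 * structure + 0.4 * technical + 0.25 * nesting
```

On the two examples above, this sketch scores the hello-world prompt around 0.13 and the failover prompt around 0.69.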

Load Factor (20% of score)

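Dynamic routing pressure from system load. Under heavy traffic, the router shifts borderline prompts toward lower tiers to maintain response time guarantees. Because load_factor is added to the composite with a positive weight, this means it has to fall as traffic rises; it behaves like available headroom rather than raw load. A prompt that would normally route to Hybrid might route to Rules under peak load.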
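A sketch of that effect, assuming load_factor is spare capacity computed as `1 - utilization`. The utilization input and both function names are hypothetical. The same borderline prompt lands in Hybrid at moderate traffic and drops to Rules at peak.

```python
def load_factor(utilization: float) -> float:
    """Hypothetical: spare capacity, 1.0 when idle and 0.0 when saturated."""
    clamped = min(max(utilization, 0.0), 1.0)
    return 1.0 - clamped


def composite(context_weight: float, sophistication: float, utilization: float) -> float:
    """Composite routing score with the load term derived from current utilization."""
    return context_weight * 0.5 + sophistication * 0.3 + load_factor(utilization) * 0.2


# The same borderline prompt under normal and peak traffic (Tier 1 is score <= 0.40).
normal = composite(0.5, 0.4, utilization=0.5)  # 0.25 + 0.12 + 0.10, about 0.47: Hybrid
peak = composite(0.5, 0.4, utilization=0.9)    # 0.25 + 0.12 + 0.02, about 0.39: Rules
```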

Value Hierarchy Routing Floors

User-defined value hierarchies can override the routing formula. When a NON-NEGOTIABLE priority is set (e.g., "output must always include security considerations"), the router floors that prompt at a minimum routing score, ensuring it reaches a tier capable of enforcing the constraint. A HIGH-priority label floors at 0.45. NON-NEGOTIABLE floors at 0.72. Both floors sit inside the Hybrid band (above 0.40, below 0.85), so a flagged prompt always gets at least a targeted LLM pass on top of the rules.

This prevents important prompts from being under-optimized by the cost-saving logic.
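The floor logic reduces to taking the larger of the computed score and the label's minimum. The floor values below come from the description; the dictionary and function names are hypothetical.

```python
from typing import Optional

# Minimum composite scores per value-hierarchy label.
PRIORITY_FLOORS = {"HIGH": 0.45, "NON-NEGOTIABLE": 0.72}


def apply_priority_floor(score: float, priority: Optional[str]) -> float:
    """Raise the score to the label's floor; unlabeled prompts keep their score."""
    floor = PRIORITY_FLOORS.get(priority, 0.0) if priority else 0.0
    return max(score, floor)
```

Using `max` means a floor can only raise a score, so a prompt that already routes to Full LLM stays there.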

Practical Results

The routing system sends most prompts through the rules or hybrid tier instead of a full frontier-model rewrite. The rules tier returns results in under 10ms compared to 1-3 seconds for LLM tiers. Actual tier distribution depends on your prompt mix — in one 360-prompt production run, 26% of prompts resolved on the rules tier with zero LLM call, 74% used the hybrid tier, and less than 1% required the full LLM tier.

The quality tradeoff is small where it applies: rules-based optimization is deterministic and domain-specific, so for prompts where rules are sufficient, it produces more consistent output than an LLM call, which may vary between requests.

Model-Agnostic Architecture

The routing system is independent of which LLM handles the optimization step. You can configure Claude 4.6 for the full LLM tier, GPT-4.1 for hybrid, and rules-only for the base tier — or use any other combination. Switching from one provider to another doesn't require changes to the routing logic.
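A sketch of that separation, assuming each tier is backed by a plain callable from prompt text to optimized text. The provider functions and the `make_router` helper are hypothetical stand-ins; in practice the Hybrid and Full LLM entries would wrap whichever SDK you use. Swapping providers means passing a different dictionary while the routing function stays the same.

```python
from typing import Callable

# A provider is any function from prompt text to optimized text.
Provider = Callable[[str], str]


def rules_only(prompt: str) -> str:
    """Stand-in for the deterministic Tier 1 pass."""
    return prompt.strip()


def make_router(
    providers: dict[str, Provider],
    pick_tier: Callable[[float], str],
) -> Callable[[str, float], str]:
    """Routing only sees tier names; which model backs each tier is configuration."""
    def route(prompt: str, score: float) -> str:
        return providers[pick_tier(score)](prompt)
    return route
```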

This matters when LLM pricing changes (which it does frequently). The routing system's cost characteristics are determined by tier distribution, not by which specific model is on the other end.

Building Routing Into Your Own Pipeline

You don't need to use Prompt Optimizer to apply this pattern. The core insight is: classify before you spend. Any LLM pipeline that sends every request to the same endpoint, regardless of complexity, is paying frontier-model cost and latency on requests that don't need it.

Start with two tiers: a fast path for prompts that match known simple patterns, and a slow path for everything else. Measure how many requests go through each path. The distribution will likely skew toward simple prompts more than you expect.
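A minimal two-tier starting point, assuming regex patterns as the "known simple" check and a `Counter` for the measurement step. The patterns, the `handle` function, and the `fast`/`slow` callables are hypothetical; grow the pattern list from what your own logs show.

```python
import re
from collections import Counter
from typing import Callable

# Hypothetical "known simple" patterns; start small and extend from real traffic.
SIMPLE_PATTERNS = [
    re.compile(r"^(convert|translate|summarize|format)\b", re.IGNORECASE),
    re.compile(r"\bhello world\b", re.IGNORECASE),
]

# Running tally of which path each request took.
path_counts: Counter = Counter()


def handle(prompt: str, fast: Callable[[str], str], slow: Callable[[str], str]) -> str:
    """Fast path for known simple patterns, slow path for everything else."""
    if any(p.search(prompt) for p in SIMPLE_PATTERNS):
        path_counts["fast"] += 1
        return fast(prompt)
    path_counts["slow"] += 1
    return slow(prompt)
```

`path_counts` holds the fast/slow distribution the paragraph suggests measuring; exporting it to an existing metrics system is the natural next step.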