
Improvement Backlog

This page turns the correctness and scalability recommendations into implementation-ready work items. Use it as the execution plan after reading Data Correctness Gotchas.

Phase 1: Correctness-critical tickets

P0/P1 backlog

| Ticket | Problem | Implementation | Acceptance criteria | Effort |
| --- | --- | --- | --- | --- |
| CE-001: State transition guard | Model output can force invalid `intent/state` jumps. | Add transition validator before final persistence; block or quarantine disallowed transitions. | Invalid transitions never persist; blocked transitions produce explicit error stage and fallback response. | M |
| CE-002: Conversation optimistic locking | Concurrent same-conversation writes produce last-write-wins corruption. | Add `version` column and optimistic lock handling on conversation updates. | Conflicting parallel updates produce deterministic conflict behavior (retry or fail with known code). | M |
| CE-003: LLM context lifecycle safety | ThreadLocal LLM context can leak across pooled threads if not cleared. | Wrap every `LlmInvocationContext.set(...)` in `try/finally` + `clear()`. | No stale context observed under stress test with mixed conversation IDs. | S |
| CE-004: Prompt variable allowlist | All input params are currently exposable to prompt rendering. | Introduce allowlist + redaction for sensitive/unexpected keys. | Only approved prompt keys appear in rendered prompt payloads. | M |
| CE-005: Stale context eviction rules | Partial schema merges can keep incompatible old fields. | Add per-intent/state field-retention policy and evict on transitions. | Transition tests show old incompatible fields are removed deterministically. | M |
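
To make the CE-003 fix concrete, the following is a minimal sketch of the `try/finally` + `clear()` pattern. The `LlmInvocationContext` class here is a hypothetical stand-in that holds only a conversation ID (the real class may carry more state and have different signatures), and `LlmContextScope.withContext` is an illustrative helper name, not an existing API.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the LlmInvocationContext named in CE-003.
// Assumption: the real context is ThreadLocal-backed; its fields may differ.
final class LlmInvocationContext {
    private static final ThreadLocal<String> CONVERSATION_ID = new ThreadLocal<>();

    static void set(String conversationId) { CONVERSATION_ID.set(conversationId); }

    static String get() { return CONVERSATION_ID.get(); }

    static void clear() { CONVERSATION_ID.remove(); }
}

// Illustrative helper (assumed name): one place that owns the set/clear pairing.
final class LlmContextScope {
    // Sets the context, runs the call, and always clears the context afterwards,
    // including when the call throws, so a pooled thread never carries the
    // previous conversation's context into its next task.
    static <T> T withContext(String conversationId, Supplier<T> call) {
        LlmInvocationContext.set(conversationId);
        try {
            return call.get();
        } finally {
            LlmInvocationContext.clear();
        }
    }
}
```

Routing every invocation through a single helper like this keeps the pairing in one place, which may make the "no stale context" acceptance criterion easier to audit than scattered `try/finally` blocks.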

Phase 2: Operability and convenience tickets

Developer and operator UX

| Ticket | Problem | Implementation | Acceptance criteria | Effort |
| --- | --- | --- | --- | --- |
| CE-006: Config lint and dry-run | Broken rules/prompts are discovered too late. | Add validator command/endpoint for response mapping coverage, unresolved vars, rule loops, MCP safety checks. | Invalid config sets fail lint in CI and are blocked from promotion. | M |
| CE-007: Deterministic replay tool | Wrong-output incidents are hard to reproduce. | Replay conversation turns against frozen config snapshot and compare expected vs actual transitions. | At least one production incident can be replayed locally with identical state progression. | M-L |
| CE-008: Scenario test harness | Manual QA misses edge-path regressions. | Add fixture-driven conversation tests (turn sequence + expected intent/state/output assertions). | Regression suite catches known sticky-intent, rule-collision, and reset-flow bugs. | M |
| CE-009: Transition map generator | State machine behavior is opaque to integrators. | Generate graph from rules/responses/schema transitions with dead-end warnings. | Docs include generated transition map and dead-end detection report. | S-M |
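
The dead-end warning in CE-009 can be illustrated with a small sketch. It assumes the transition map has already been extracted into a map from each state to its set of next states; the extraction from rules, responses, and schema transitions is out of scope here. `TransitionMapCheck`, `findDeadEnds`, and the idea of an explicit set of terminal states are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative helper (assumed name): reports states with no way out.
final class TransitionMapCheck {
    // A dead end is any state that appears in the map (as a source or a target),
    // has no outgoing transition, and is not declared terminal.
    static List<String> findDeadEnds(Map<String, Set<String>> transitions,
                                     Set<String> terminalStates) {
        // TreeSet keeps the report in a stable, sorted order.
        Set<String> allStates = new TreeSet<>(transitions.keySet());
        for (Set<String> targets : transitions.values()) {
            allStates.addAll(targets);
        }
        List<String> deadEnds = new ArrayList<>();
        for (String state : allStates) {
            Set<String> next = transitions.getOrDefault(state, Set.of());
            if (next.isEmpty() && !terminalStates.contains(state)) {
                deadEnds.add(state);
            }
        }
        return deadEnds;
    }
}
```

With transitions `GREETING -> COLLECT_DETAILS`, `COLLECT_DETAILS -> CONFIRM | STUCK`, `CONFIRM -> DONE` and `DONE` declared terminal (all state names invented for the example), the check reports `STUCK` as the only dead end.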

Phase 3: Scalability tickets

Throughput and horizontal scale

| Ticket | Problem | Implementation | Acceptance criteria | Effort |
| --- | --- | --- | --- | --- |
| CE-010: Hot-path query refactor | `findAll().stream()` in request path degrades with config size. | Replace with indexed query methods for response/template/schema selection. | P95 latency remains stable when control-plane rows scale 10x. | M |
| CE-011: Config cache with version invalidation | Repeated config reads increase latency variability. | Add cache per intent/state with invalidation on config mutation. | Cache hit ratio > 90% in steady state without stale-config incidents. | M-L |
| CE-012: Per-conversation execution serialization | Concurrent turns create races as scale increases. | Route requests by conversation key to single active worker/partition. | No race-induced state drift in concurrency stress tests. | L |
| CE-013: Canonical turn store | History reconstructed from audit can be incomplete/noisy. | Persist normalized user/assistant turns and switch history provider to it. | History quality checks pass even when audit levels change. | M-L |
| CE-014: Bounded enrichment budgets | Optional enrichments can inflate synchronous latency. | Apply strict timeout budget for container/MCP enrichments and degrade gracefully. | SLO maintained under downstream slowdown with deterministic fallback behavior. | M |
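
As one possible shape for the CE-014 timeout budget, the sketch below bounds an optional enrichment with `CompletableFuture.completeOnTimeout` (available since Java 9) and returns a caller-supplied fallback on timeout or failure. `EnrichmentBudget` and `enrichWithin` are hypothetical names, and the choice of the common pool as executor is an assumption; a production version would more likely use a dedicated executor.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustrative helper (assumed name) for bounding optional enrichment calls.
final class EnrichmentBudget {
    // Runs the enrichment asynchronously and returns its result if it finishes
    // within the budget. On timeout or on any exception the fallback is returned,
    // so the caller always gets a deterministic value.
    // Note: a timed-out enrichment is not cancelled here; it keeps running
    // on the common pool until it finishes on its own.
    static <T> T enrichWithin(Duration budget, Supplier<T> enrichment, T fallback) {
        return CompletableFuture.supplyAsync(enrichment)
                .completeOnTimeout(fallback, budget.toMillis(), TimeUnit.MILLISECONDS)
                .exceptionally(error -> fallback)
                .join();
    }
}
```

Returning a fallback value rather than throwing keeps the degraded path identical for slow and failing downstreams, which is one way to read "deterministic fallback behavior" in the acceptance criteria.
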

Recommended execution order

1. CE-003 and CE-001 first (cheap, high-impact correctness guards).
2. CE-002 before any high-concurrency scale work.
3. CE-004 and CE-005 before prompt/template expansion.
4. CE-006 and CE-008 to stop regressions while refactoring.
5. CE-010 and CE-011 to stabilize throughput.
6. CE-012 for horizontal scale and race elimination.
7. CE-013 and CE-014 for long-term quality and latency control.
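
Because CE-002 gates the later scale work, it may help to see the conflict behavior it asks for. The sketch below simulates a `version`-checked update in memory; it is not the project's persistence code. `ConversationRow`, `ConversationStore`, and `VersionConflictException` are hypothetical names, and a real implementation would perform the same check in the database, for example via an `UPDATE ... WHERE version = ?` statement or an ORM's version-column support.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical conversation row: only the fields needed for the illustration.
final class ConversationRow {
    final String id;
    final String state;
    final long version;

    ConversationRow(String id, String state, long version) {
        this.id = id;
        this.state = state;
        this.version = version;
    }
}

// The "known code" failure that CE-002's acceptance criteria refer to (assumed name).
final class VersionConflictException extends RuntimeException {
    VersionConflictException(String message) { super(message); }
}

// In-memory stand-in for the conversation table.
final class ConversationStore {
    private final Map<String, ConversationRow> rows = new ConcurrentHashMap<>();

    void insert(String id, String state) { rows.put(id, new ConversationRow(id, state, 0)); }

    ConversationRow load(String id) { return rows.get(id); }

    // Applies the update only if the stored version still matches the version
    // the caller read; otherwise the row is left untouched and the caller gets
    // a conflict it can retry or surface.
    ConversationRow update(ConversationRow read, String newState) {
        return rows.compute(read.id, (id, current) -> {
            if (current == null || current.version != read.version) {
                throw new VersionConflictException(
                        "stale version " + read.version + " for conversation " + id);
            }
            return new ConversationRow(id, newState, current.version + 1);
        });
    }
}
```

If two writers load the same row at version 0, the first `update` succeeds and bumps the version to 1, while the second throws `VersionConflictException` instead of silently overwriting the first write.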

Done criteria for the program

Exit gates

| Gate | Target |
| --- | --- |
| Correctness | No illegal transition persistence in test suite + canary runtime. |
| Concurrency | No race-induced state drift under parallel same-conversation load tests. |
| Security | Prompt exposure allowlist and MCP safety policy enforcement enabled by default. |
| Scalability | Stable p95/p99 under 10x config growth and peak expected QPS. |
| Operability | Config lint, replay, and scenario tests integrated into release workflow. |
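
The Correctness gate rests on the CE-001 transition guard, so a minimal sketch of such a guard follows. It assumes legal transitions are declared as a map from a source state to its permitted target states, and that remaining in the same state is always legal; both are assumptions for illustration, as are the names `TransitionGuard` and `Decision`.

```java
import java.util.Map;
import java.util.Set;

// Illustrative guard (assumed name) to run before final persistence.
final class TransitionGuard {
    enum Decision { ALLOW, BLOCK }

    private final Map<String, Set<String>> allowedTargets;

    TransitionGuard(Map<String, Set<String>> allowedTargets) {
        this.allowedTargets = allowedTargets;
    }

    // Allows a transition when the state is unchanged (assumption) or when the
    // target is explicitly listed for the source; everything else is blocked,
    // including transitions out of states the map does not know about.
    Decision check(String from, String to) {
        if (from.equals(to)) {
            return Decision.ALLOW;
        }
        boolean listed = allowedTargets.getOrDefault(from, Set.of()).contains(to);
        return listed ? Decision.ALLOW : Decision.BLOCK;
    }
}
```

A caller would persist only on `ALLOW` and route `BLOCK` to the explicit error stage and fallback response that CE-001 describes; a quarantine outcome could be added as a third decision value.
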

How to use this backlog

Treat each ticket as a tracked change backed by an ADR (architecture decision record). For every ticket, define an owner, rollout guardrails, a migration and rollback plan, and an evidence artifact (test report or benchmark).