
Data Correctness Gotchas (Current v2 Line)

This page is still relevant, but it has been reset for the current v2 line. The framework now adds stronger guardrails, scoped MCP configuration, verbose telemetry, and better prompt/runtime metadata. Even with those upgrades, the engine can still produce semantically wrong output without crashing.

This page focuses on the failure modes that still matter in the current repo, what the newer v2 features already improved, and what consumers should still protect at integration time.

What v2 improved already

Compared to the early v2 line, the current framework now reduces several older classes of mistakes:

  • ce_mcp_tool, ce_mcp_planner, ce_pending_action, and ce_verbose are startup-validated for scope/integrity
  • MCP scope is explicit (ANY / UNKNOWN) instead of relying on ambiguous null-as-wildcard behavior
  • CorrectionStep can keep confirmation/edit/retry turns in-place instead of forcing unnecessary reclassification
  • ce_verbose and step telemetry make degraded or skipped paths easier to detect
  • POST_SCHEMA_EXTRACTION and PRE_AGENT_MCP phases provide cleaner rule insertion points
  • ce_mcp_planner makes MCP prompt selection deterministic by intent/state scope instead of relying only on legacy config

Those are real upgrades, but they do not eliminate correctness risk. They mostly make bad behavior easier to prevent and easier to diagnose.

Highest-risk correctness failures that still exist

Current top correctness risks

| Risk | Trigger | Bad output pattern | Fast detection | Practical mitigation |
|---|---|---|---|---|
| Concurrent same-conversation writes | Two requests hit the same `conversationId` at nearly the same time. | Last-write-wins state/context drift, even though each individual turn looks valid. | Compare `updated_at`, audit ordering, and the final persisted `ce_conversation` row for overlapping turns. | Serialize by `conversationId` at ingress, or add optimistic locking around `ce_conversation`. |
| Rule collisions produce valid-but-wrong transitions | Multiple `ce_rule` rows match across phases or priorities and mutate intent/state in sequence. | The conversation lands in a reachable state, but not the one the business flow intended. | Review `RULE_MATCH`, `RULE_APPLIED`, and the final intent/state across all phases, not just `PRE_RESPONSE_RESOLUTION`. | Keep rule ownership narrow, minimize cross-intent mutations, and regression-test critical paths. |
| Stale context survives topic or state changes | A new turn merges into existing context without evicting fields that are no longer valid for the new flow. | Old facts leak into prompt rendering, schema completeness checks, or final responses. | Diff `context_json` before and after intent/state changes and look for old keys that should have been dropped. | Define explicit reset rules on transition boundaries and clear stale fields when switching flows. |
| Consumer exposes too much prompt data | Broad `inputParams` or ad hoc metadata is allowed to influence prompt rendering. | The model produces plausible output based on accidental or sensitive prompt variables. | Inspect rendered prompt inputs on the `INTENT_*`, `SCHEMA_*`, `MCP_PLAN_*`, and `RESOLVE_RESPONSE_*` LLM paths. | Treat prompt exposure as an allowlist contract and keep internal-only keys out of prompt vars. |
| Missing response coverage for reachable states | Rules, MCP, or pending actions move the session into a state with no usable `ce_response` / `ce_prompt_template` mapping. | The turn completes but returns fallback, empty, or misleading generic text. | The trace shows the state transition succeeded, but response selection fell back or missed the expected template/response rows. | Audit every reachable state and make response coverage part of config review. |
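The stale-context mitigation in the table above (explicit reset rules on transition boundaries) can be sketched as a small eviction helper. This is a minimal sketch under stated assumptions: context is modeled as a flat `Map<String, Object>` rather than the real `context_json` structure, each intent declares the keys it owns, and `ContextReset`, `FLOW_KEYS`, `GLOBAL_KEYS`, and the intent names are hypothetical, not framework APIs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: evicts context keys that the new flow does not own.
// Context is modeled as a flat map, a simplification of ce_conversation.context_json.
final class ContextReset {

    // Hypothetical per-intent key ownership; in practice this would live in config.
    static final Map<String, Set<String>> FLOW_KEYS = Map.of(
            "ORDER_STATUS", Set.of("orderId", "customerId"),
            "REFUND_REQUEST", Set.of("orderId", "customerId", "refundReason"));

    // Keys shared by every flow (e.g. identity) are never evicted.
    static final Set<String> GLOBAL_KEYS = Set.of("customerId");

    /** Returns the keys that would leak from the previous flow into the new one. */
    static Set<String> staleKeys(Map<String, Object> context, String newIntent) {
        Set<String> allowed = new HashSet<>(GLOBAL_KEYS);
        allowed.addAll(FLOW_KEYS.getOrDefault(newIntent, Set.of()));
        Set<String> stale = new HashSet<>(context.keySet());
        stale.removeAll(allowed);
        return stale;
    }

    /** Returns a copy of the context with stale keys removed; the input is not mutated. */
    static Map<String, Object> resetForTransition(Map<String, Object> context,
                                                  String oldIntent, String newIntent) {
        if (oldIntent != null && oldIntent.equals(newIntent)) {
            return new HashMap<>(context); // same flow: nothing to evict
        }
        Map<String, Object> next = new HashMap<>(context);
        next.keySet().removeAll(staleKeys(context, newIntent)); // removes the entries too
        return next;
    }
}
```

Returning a copy instead of mutating in place keeps the before/after pair available for the `context_json` diff suggested in the detection column.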

Subtle traps introduced by newer flexibility

The newer v2 line is more capable, but that flexibility also opens new ways to misconfigure the engine if consumers are careless.

Modern v2 flexibility traps

| Area | What can go wrong | Why it matters now | Recommended guardrail |
|---|---|---|---|
| `CorrectionStep` routing | A prompt row claims to allow `affirm`, `edit`, or `retry`, but the actual state contract is not safe for in-place reuse. | The engine may skip schema extraction or intent resolution when the consumer really needed a full recompute. | Use `interaction_mode` / `interaction_contract` only when the state really supports in-place continuation. |
| `SET_INPUT_PARAM` rule action | Rules can now mutate request-level values mid-pipeline. | Small config mistakes can alter downstream tool calls, prompt variables, or response behavior in non-obvious ways. | Restrict `SET_INPUT_PARAM` to tightly scoped, auditable keys and keep a change log for those rules. |
| Scoped MCP rows | A tool/planner row is syntactically valid but scoped too broadly with `ANY` when it should be intent-specific. | The engine stays deterministic, but the business blast radius of a tool becomes larger than intended. | Default to exact intent/state scope first; widen to `ANY` only when the tool is truly global. |
| `ce_verbose` messages | Verbose rows present a misleading progress story to the user or UI even while the engine remains technically correct. | Support teams may trust the progress text more than the actual state transition. | Treat `ce_verbose` as a tested UI contract, not cosmetic copy. |
| Prompt renderer power | Shared Thymeleaf rendering now applies across prompts and verbose messages. | Template bugs can affect multiple runtime surfaces instead of just one prompt row. | Lint templates before release and test variable availability per step. |
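The `SET_INPUT_PARAM` guardrail and the prompt-variable allowlist both reduce to the same pattern: an explicit set of keys that may cross a boundary. The sketch below assumes request values and prompt variables are plain `Map<String, Object>` instances; `ParamAllowlist` and its methods are hypothetical consumer-side code, not part of the framework.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical allowlist guard applied before values reach prompt rendering or a
// SET_INPUT_PARAM-style mutation. Neither the class nor its methods are framework APIs.
final class ParamAllowlist {

    private final Set<String> allowedKeys;

    ParamAllowlist(Set<String> allowedKeys) {
        this.allowedKeys = Set.copyOf(allowedKeys); // defensive, immutable copy
    }

    /** Keeps only allowlisted keys; everything else is dropped (log drops in real code). */
    Map<String, Object> filter(Map<String, Object> params) {
        Map<String, Object> out = new LinkedHashMap<>();
        params.forEach((key, value) -> {
            if (allowedKeys.contains(key)) {
                out.put(key, value);
            }
        });
        return out;
    }

    /** Rejects a rule-driven mutation of a key that is not on the allowlist. */
    void checkMutation(String key) {
        if (!allowedKeys.contains(key)) {
            throw new IllegalArgumentException("SET_INPUT_PARAM not allowed for key: " + key);
        }
    }
}
```

Filtering quietly is convenient for prompt variables, while rejecting loudly is usually the safer default for rule-driven mutations: a rejected `SET_INPUT_PARAM` surfaces in config tests instead of silently altering a tool call.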

Still-important framework limitations

These are not necessarily bugs. They are design constraints consumers should plan around.

Design limits to plan around

| Limitation | Current behavior | Consumer impact | Safer operating posture |
|---|---|---|---|
| No built-in optimistic locking on `ce_conversation` | The entity does not expose a version field for conflict detection. | Concurrent same-ID turns can overwrite each other nondeterministically. | Enforce one active turn per `conversationId` at the API boundary. |
| Turn work spans multiple step writes | A turn is not wrapped in one global ACID-style business transaction boundary. | Partial artifacts can persist across failures or stop paths. | Make trace review part of incident debugging and prefer compensating logic over hidden assumptions. |
| Data-driven behavior can still be misconfigured | The framework validates structure, but not every business semantic mistake in rules/prompts/responses. | A startup-clean system can still behave incorrectly for specific conversations. | Test seeded configurations as a product artifact, not just Java code. |
| Conversation ordering is still a consumer concern at scale | The framework does not provide distributed per-conversation queueing by itself. | Horizontal scale can amplify race and ordering ambiguity if ingress is naive. | Use request serialization, partitioned workers, or upstream coordination. |
| Tool safety is only partly framework-enforced | Scope checks, MCP next-tool guardrails, and handler models exist, but business authorization remains consumer-defined. | A correctly scoped tool can still expose data or actions beyond policy if the consumer wires it loosely. | Treat tool execution as a security boundary and add business-policy checks in handlers. |
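Enforcing one active turn per `conversationId` at the API boundary can be as simple as an atomic claim on the conversation id before the engine runs. This is a minimal single-JVM sketch: `SingleTurnGate` is hypothetical, and it does not coordinate across instances, so horizontally scaled deployments would still need partitioned workers, sticky routing, or a distributed lock.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical ingress guard: at most one in-flight turn per conversationId in this JVM.
// It does NOT coordinate across instances; use partitioning or a distributed lock for that.
final class SingleTurnGate {

    private final ConcurrentHashMap<String, Boolean> active = new ConcurrentHashMap<>();

    /**
     * Runs the turn only if no other turn for the same conversation is in flight.
     * Returns false when rejected; the caller might answer HTTP 409 and let the client retry.
     */
    boolean tryRunTurn(String conversationId, Runnable turn) {
        // putIfAbsent is atomic, so exactly one caller can claim the slot.
        if (active.putIfAbsent(conversationId, Boolean.TRUE) != null) {
            return false; // another turn for this conversation is running
        }
        try {
            turn.run();
            return true;
        } finally {
            active.remove(conversationId); // always release, even if the turn throws
        }
    }

    boolean isActive(String conversationId) {
        return active.containsKey(conversationId);
    }
}
```

Rejecting is simpler to reason about than queueing because it introduces no hidden ordering; a per-conversation queue is the alternative when clients cannot retry.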

What to watch in current traces

If a conversation looks "wrong" but did not crash, these are the fastest places to inspect:

  • final persisted ce_conversation.intent_code, state_code, context_json
  • EngineSession.stepInfos via trace output
  • RULE_MATCH / RULE_APPLIED ordering across phases
  • ROUTING_DECISION values set by CorrectionStep
  • context.mcp.lifecycle.* and context.mcp.toolExecution.*
  • TOOL_ORCHESTRATION_* and MCP_* verbose/audit events
  • RESOLVE_RESPONSE_SELECTED vs the final user-visible payload
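When reviewing RULE_MATCH / RULE_APPLIED ordering across phases, it helps to reduce the trace to the rules that actually changed state. The sketch assumes a simplified, hypothetical `TraceEvent` shape (sequence number, event type, phase, rule id, resulting state); the real trace and `stepInfos` format may differ, so adapt the field mapping accordingly. `RuleCollisionCheck` is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified trace record; the real trace/stepInfos shape may differ.
record TraceEvent(long seq, String type, String phase, String ruleId, String stateAfter) {}

final class RuleCollisionCheck {

    /** RULE_APPLIED events in execution order, across all phases. */
    static List<TraceEvent> appliedInOrder(List<TraceEvent> events) {
        List<TraceEvent> applied = new ArrayList<>();
        for (TraceEvent e : events) {
            if ("RULE_APPLIED".equals(e.type())) {
                applied.add(e);
            }
        }
        applied.sort(Comparator.comparingLong(TraceEvent::seq));
        return applied;
    }

    /** Rule ids that actually changed state, in order, starting from initialState. */
    static List<String> stateMutatingRules(List<TraceEvent> events, String initialState) {
        List<String> mutators = new ArrayList<>();
        String current = initialState;
        for (TraceEvent e : appliedInOrder(events)) {
            // A rule that re-sets the same state is not counted as a mutation.
            if (e.stateAfter() != null && !e.stateAfter().equals(current)) {
                mutators.add(e.ruleId());
                current = e.stateAfter();
            }
        }
        return mutators;
    }
}
```

More than one entry in the result means several rules rewrote state within a single turn, which is the pattern behind valid-but-wrong transitions.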

Current hardening checklist

What mature consumers should enforce

| Control | Why it matters | Where to implement |
|---|---|---|
| Single active turn per conversation | Prevents the most damaging state corruption class in the current framework. | API gateway, controller, queue, or distributed lock layer |
| Config regression tests | Most modern failures are configuration mistakes, not framework crashes. | Fixture tests around seeded `ce_*` rows |
| Explicit response coverage audit | Reachable states without responses create misleading fallback behavior. | Pre-release DML review and smoke tests |
| Tool policy review | Broad `ANY` scope or under-protected handlers can create silent data risk. | `ce_mcp_tool` design + consumer-side handler implementation |
| Prompt-variable hygiene | The current renderer is powerful enough to amplify accidental variable exposure. | Prompt seeding discipline + consumer metadata filtering |
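A response coverage audit fits naturally into the fixture-based config regression tests. In this sketch, reachable intent/state pairs and response rows are passed in as sets; loading them from seeded `ce_*` rows is left out. `Scope` and `ResponseCoverageAudit` are hypothetical, and so is the assumption that response rows may use an `ANY` wildcard the way MCP rows do; drop the wildcard branch if your response rows are always exact.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical intent/state pair. Reachable scopes would come from seeded rules and
// transitions; response scopes from ce_response / ce_prompt_template rows.
record Scope(String intent, String state) {}

final class ResponseCoverageAudit {

    // Assumption: response rows can use the same ANY wildcard convention as MCP rows.
    static final String ANY = "ANY";

    /** True if some response row matches the scope exactly or via an ANY wildcard. */
    static boolean isCovered(Scope s, Set<Scope> responseRows) {
        for (Scope r : responseRows) {
            boolean intentOk = r.intent().equals(ANY) || r.intent().equals(s.intent());
            boolean stateOk = r.state().equals(ANY) || r.state().equals(s.state());
            if (intentOk && stateOk) {
                return true;
            }
        }
        return false;
    }

    /** Reachable scopes with no matching response row, in input order. */
    static Set<Scope> uncovered(Set<Scope> reachable, Set<Scope> responseRows) {
        Set<Scope> missing = new LinkedHashSet<>();
        for (Scope s : reachable) {
            if (!isCovered(s, responseRows)) {
                missing.add(s);
            }
        }
        return missing;
    }
}
```

In a regression test, asserting that `uncovered(...)` is empty turns "audit every reachable state" into a check that fails the build instead of a review step that can be skipped.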

Relevance summary

The old page was directionally useful, but parts of it were anchored to early v2 behavior. The current version of this page is still relevant because the core correctness themes remain:

  • race conditions are still the biggest operational correctness risk
  • rule/config drift is still the biggest semantic correctness risk
  • stale context and incomplete response coverage still produce believable but wrong output

What changed is that current v2 gives you better tools to detect and constrain those issues:

  • explicit scope validation
  • richer rule phases
  • correction routing
  • verbose runtime signals
  • step telemetry
  • scoped MCP planner behavior

Bottom line

Current v2 is safer and more diagnosable than the original release line, but it is still a highly configurable engine. Most production correctness issues now come from configuration design and concurrency policy, not from missing core framework primitives.