Data Correctness Gotchas (Current v2 Line)

This page is still relevant, but it needed a reset. The current v2 framework has added stronger guardrails, scoped MCP configuration, verbose telemetry, and better prompt/runtime metadata. Even with those upgrades, there are still ways to get semantically wrong output without crashing the engine.

This page focuses on the failure modes that still matter in the current repo, what the newer v2 features already improved, and what consumers should still protect at integration time.

What v2 improved already

Compared to the early v2 line, the current framework now reduces several older classes of mistakes:

ce_mcp_tool, ce_mcp_planner, ce_pending_action, and ce_verbose are startup-validated for scope/integrity
MCP scope is explicit (ANY / UNKNOWN) instead of relying on ambiguous null-wildcard behavior
CorrectionStep can keep confirmation/edit/retry turns in-place instead of forcing unnecessary reclassification
ce_verbose and step telemetry make degraded or skipped paths easier to detect
POST_SCHEMA_EXTRACTION and PRE_AGENT_MCP phases provide cleaner rule insertion points
ce_mcp_planner makes MCP prompt selection deterministic by intent/state scope instead of relying only on legacy config
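To make "explicit scope" concrete, here is a minimal sketch of how an exact-before-`ANY` lookup over scoped rows could behave. The framework's actual matcher is not described on this page, so this rests on a few assumptions:

- `ANY` acts as a wildcard.
- Every other value, including `UNKNOWN`, is matched literally.
- The most specific matching row wins.

`ScopedRow` and `ScopeResolver` are hypothetical names, not framework types.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical shape of a scoped MCP row; real ce_mcp_tool / ce_mcp_planner columns may differ.
record ScopedRow(String name, String intentCode, String stateCode) {}

// Hypothetical resolver: illustrates exact-before-ANY matching, not the framework's code.
final class ScopeResolver {
    static final String ANY = "ANY";

    private ScopeResolver() {}

    // A scope value matches if it is the ANY wildcard or equals the session value exactly.
    // "UNKNOWN" is treated as a literal code here (an assumption), so it only matches UNKNOWN.
    static boolean matches(String scope, String actual) {
        return ANY.equals(scope) || scope.equals(actual);
    }

    // Count how many scope dimensions are exact rather than ANY; higher means more specific.
    static int specificity(ScopedRow row) {
        int score = 0;
        if (!ANY.equals(row.intentCode())) score++;
        if (!ANY.equals(row.stateCode())) score++;
        return score;
    }

    // Return the most specific row whose intent and state scope both match the session.
    static Optional<ScopedRow> resolve(List<ScopedRow> rows, String intent, String state) {
        return rows.stream()
                .filter(r -> matches(r.intentCode(), intent) && matches(r.stateCode(), state))
                .max(Comparator.comparingInt(ScopeResolver::specificity));
    }
}
```

Under this model, a global `ANY`/`ANY` row still matches everywhere. It simply loses to any row that names the intent or state explicitly, which is why over-broad `ANY` rows widen the blast radius without breaking determinism.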

Those are real upgrades, but they do not eliminate correctness risk. They mostly make bad behavior easier to prevent and easier to diagnose.

Highest-risk correctness failures that still exist

Current top correctness risks

Risk	Trigger	Bad output pattern	Fast detection	Practical mitigation
Concurrent same-conversation writes	Two requests hit the same `conversationId` at nearly the same time.	Last-write-wins state/context drift even though each individual turn looks valid.	Compare `updated_at`, audit ordering, and final persisted `ce_conversation` row for overlapping turns.	Serialize by `conversationId` at ingress or add optimistic locking around `ce_conversation`.
Rule collisions produce valid-but-wrong transitions	Multiple `ce_rule` rows match across phases or priorities and mutate intent/state in sequence.	Conversation lands in a reachable state, but not the one the business flow intended.	Review `RULE_MATCH`, `RULE_APPLIED`, and final intent/state across all phases, not just `PRE_RESPONSE_RESOLUTION`.	Keep rule ownership narrow, minimize cross-intent mutations, and regression-test critical paths.
Stale context survives topic or state changes	A new turn merges into existing context without evicting fields that are no longer valid for the new flow.	Old facts leak into prompt rendering, schema completeness checks, or final responses.	Diff `context_json` before and after intent/state changes and look for old keys that should have been dropped.	Define explicit reset rules on transition boundaries and clear stale fields when switching flows.
Consumer exposes too much prompt data	Broad `inputParams` or ad hoc metadata is allowed to influence prompt rendering.	The model produces plausible output based on accidental or sensitive prompt variables.	Inspect rendered prompt inputs during `INTENT_`, `SCHEMA_`, `MCP_PLAN_`, and `RESOLVE_RESPONSE_` LLM paths.	Treat prompt exposure as an allowlist contract and keep internal-only keys out of prompt vars.
Missing response coverage for reachable states	Rules, MCP, or pending actions move the session into a state with no usable `ce_response` / `ce_prompt_template` mapping.	The turn completes but returns fallback, empty, or misleading generic text.	Build trace shows state transition succeeded but response selection falls back or misses expected template/response rows.	Audit every reachable state and make response coverage part of config review.
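The response-coverage mitigation can be partly mechanized: compute which states are reachable, then diff them against the states that have a response mapping. This is a minimal sketch under three assumptions:

- The consumer has already extracted a state-to-next-states map from its `ce_rule`, MCP, and pending-action configuration.
- The consumer has also built a set of states that have `ce_response` / `ce_prompt_template` coverage.
- `ResponseCoverageAudit` and both inputs are hypothetical.

A real audit would likely key on intent plus state (for example `"ORDER_STATUS:CONFIRM"`) rather than on state alone.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical config-review helper; not part of the framework.
final class ResponseCoverageAudit {
    private ResponseCoverageAudit() {}

    // Breadth-first search over a consumer-built transition map: state -> states it can move to.
    static Set<String> reachable(String start, Map<String, List<String>> transitions) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String state = queue.poll();
            for (String next : transitions.getOrDefault(state, List.of())) {
                if (seen.add(next)) { // enqueue each state only the first time it is seen
                    queue.add(next);
                }
            }
        }
        return seen;
    }

    // Reachable states with no response/template mapping, sorted for stable review output.
    static Set<String> uncovered(String start, Map<String, List<String>> transitions, Set<String> covered) {
        Set<String> missing = new TreeSet<>(reachable(start, transitions));
        missing.removeAll(covered);
        return missing;
    }
}
```

Running a check like this in CI against seeded fixtures turns "audit every reachable state" into a failing build instead of a production fallback.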

Subtle traps introduced by newer flexibility

The newer v2 line is more capable, but that flexibility also opens the door to new configuration mistakes if consumers are careless.

Modern v2 flexibility traps

Area	What can go wrong	Why it matters now	Recommended guardrail
`CorrectionStep` routing	A prompt row claims to allow `affirm`, `edit`, or `retry`, but the actual state contract is not safe for in-place reuse.	The engine may skip schema extraction or intent resolution when the consumer really needed a full recompute.	Use `interaction_mode` / `interaction_contract` only when the state really supports in-place continuation.
`SET_INPUT_PARAM` rule action	Rules can now mutate request-level values mid-pipeline.	Small config mistakes can alter downstream tool calls, prompt variables, or response behavior in non-obvious ways.	Restrict `SET_INPUT_PARAM` to tightly scoped, auditable keys and keep a change log for those rules.
Scoped MCP rows	A tool/planner row is syntactically valid but scoped too broadly with `ANY` when it should be intent-specific.	The engine stays deterministic, but the business blast radius of a tool becomes larger than intended.	Default to exact intent/state scope first; widen to `ANY` only when the tool is truly global.
`ce_verbose` messages	Verbose rows tell the user or UI a misleading progress story even while the engine remains technically correct.	Support teams may trust the progress text more than the actual state transition.	Treat `ce_verbose` as a tested UI contract, not cosmetic copy.
Prompt renderer power	Shared Thymeleaf rendering now applies across prompts and verbose messages.	Template bugs can affect multiple runtime surfaces instead of just one prompt row.	Lint templates before release and test variable availability per step.

Still-important framework limitations

These are not necessarily bugs. They are design constraints consumers should plan around.

Design limits to plan around

Limitation	Current behavior	Consumer impact	Safer operating posture
No built-in optimistic locking on `ce_conversation`	The entity does not expose a version field for conflict detection.	Concurrent same-ID turns can overwrite each other nondeterministically.	Enforce one active turn per `conversationId` at the API boundary.
Turn work spans multiple step writes	A turn is not wrapped in one global ACID-style business transaction boundary.	Partial artifacts can persist across failures or stop paths.	Make trace review part of incident debugging and prefer compensating logic over hidden assumptions.
Data-driven behavior can still be misconfigured	The framework validates structure, but not every business semantic mistake in rules/prompts/responses.	A startup-clean system can still behave incorrectly for specific conversations.	Test seeded configurations as a product artifact, not just Java code.
Conversation ordering is still a consumer concern at scale	The framework does not provide distributed per-conversation queueing by itself.	Horizontal scale can amplify race and ordering ambiguity if ingress is naive.	Use request serialization, partitioned workers, or upstream coordination.
Tool safety is only partly framework-enforced	Scope checks, MCP next-tool guardrails, and handler models exist, but business authorization remains consumer-defined.	A correctly scoped tool can still expose data or actions beyond policy if the consumer wires it loosely.	Treat tool execution as a security boundary and add business-policy checks in handlers.
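For a single-node deployment, one way to enforce one active turn per `conversationId` at the API boundary is an in-process gate that rejects overlapping turns. `TurnGate` is a hypothetical consumer class, not part of the framework. It only protects turns within one JVM, so horizontally scaled deployments still need request serialization, partitioned workers, or upstream coordination.

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical ingress guard: at most one active turn per conversationId in this JVM.
final class TurnGate {
    private final Set<String> active = ConcurrentHashMap.newKeySet();

    // Runs the turn if no other turn for the same conversation is in flight; otherwise
    // returns empty so the caller can reject (e.g. HTTP 409) or retry. The turn must
    // return a non-null result.
    <T> Optional<T> runExclusive(String conversationId, Supplier<T> turn) {
        if (!active.add(conversationId)) { // atomic: only one caller can claim the ID
            return Optional.empty();
        }
        try {
            return Optional.of(turn.get());
        } finally {
            active.remove(conversationId); // release even if the turn throws
        }
    }

    boolean isActive(String conversationId) {
        return active.contains(conversationId);
    }
}
```

Rejecting rather than queuing keeps the gate simple and makes overlap visible to the client. A blocking variant would trade that visibility for smoother UX.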

What to watch in current traces

If a conversation looks "wrong" but did not crash, these are the fastest places to inspect:

final persisted ce_conversation.intent_code, state_code, context_json
EngineSession.stepInfos via trace output
RULE_MATCH / RULE_APPLIED ordering across phases
ROUTING_DECISION values set by CorrectionStep
context.mcp.lifecycle.* and context.mcp.toolExecution.*
TOOL_ORCHESTRATION_* and MCP_* verbose/audit events
RESOLVE_RESPONSE_SELECTED vs the final user-visible payload
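The `context_json` diff recommended for stale-context detection can be partly automated by comparing two persisted snapshots of a conversation. This sketch assumes the consumer can load `intent_code`, `state_code`, and a deserialized `context_json` map before and after a turn. `ConversationSnapshot`, `StaleContextCheck`, and the carry-over list are hypothetical.

The check flags keys that kept an identical value across an intent or state change. That is a heuristic: some flagged keys will be legitimate carry-overs that belong on the expected list.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical snapshot of the persisted ce_conversation columns for one point in time.
record ConversationSnapshot(String intentCode, String stateCode, Map<String, Object> context) {}

// Hypothetical trace-review helper; not part of the framework.
final class StaleContextCheck {
    private StaleContextCheck() {}

    // When intent or state changed, report keys that survived with an unchanged value and
    // are not on the expected carry-over list. Returns sorted key names for stable output.
    static Set<String> suspiciousSurvivors(ConversationSnapshot before, ConversationSnapshot after,
                                           Set<String> expectedCarryOver) {
        Set<String> survivors = new TreeSet<>();
        boolean transitioned = !before.intentCode().equals(after.intentCode())
                || !before.stateCode().equals(after.stateCode());
        if (!transitioned) {
            return survivors; // no boundary crossed, nothing to flag
        }
        for (Map.Entry<String, Object> entry : before.context().entrySet()) {
            String key = entry.getKey();
            boolean untouched = after.context().containsKey(key)
                    && Objects.equals(entry.getValue(), after.context().get(key));
            if (untouched && !expectedCarryOver.contains(key)) {
                survivors.add(key);
            }
        }
        return survivors;
    }
}
```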

Current hardening checklist

What mature consumers should enforce

Control	Why it matters	Where to implement
Single active turn per conversation	Prevents the most damaging state corruption class in the current framework.	API gateway, controller, queue, or distributed lock layer
Config regression tests	Most modern failures are configuration mistakes, not framework crashes.	Fixture tests around seeded `ce_*` rows
Explicit response coverage audit	Reachable states without responses create misleading fallback behavior.	Pre-release DML review and smoke tests
Tool policy review	Broad `ANY` scope or under-protected handlers can create silent data risk.	`ce_mcp_tool` design + consumer-side handler implementation
Prompt-variable hygiene	The current renderer is powerful enough to amplify accidental variable exposure.	Prompt seeding discipline + consumer metadata filtering
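Prompt-variable hygiene can be enforced as an explicit allowlist applied to consumer metadata before it reaches prompt rendering. `PromptVarFilter` is a hypothetical consumer-side class, not a framework hook. It records only the names of withheld keys, so sensitive values do not end up in logs.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical consumer-side filter applied before metadata is exposed as prompt variables.
final class PromptVarFilter {
    // Variables that may be rendered, plus the names that were withheld (for logging only).
    record Filtered(Map<String, Object> vars, Set<String> dropped) {}

    private final Set<String> allowed;

    PromptVarFilter(Set<String> allowed) {
        this.allowed = Set.copyOf(allowed);
    }

    // Keeps only allowlisted keys; everything else is dropped by default (deny-by-default).
    Filtered apply(Map<String, Object> candidateVars) {
        Map<String, Object> vars = new TreeMap<>();
        Set<String> dropped = new TreeSet<>();
        candidateVars.forEach((key, value) -> {
            if (allowed.contains(key)) {
                vars.put(key, value);
            } else {
                dropped.add(key); // record the name, never the value
            }
        });
        return new Filtered(vars, dropped);
    }
}
```

Deny-by-default matters here: a new metadata field added upstream stays invisible to prompts until someone deliberately adds it to the allowlist.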

Relevance summary

The old page was directionally useful, but parts of it were anchored to early v2 behavior. The current version of this page is still relevant because the core correctness themes remain:

race conditions are still the biggest operational correctness risk
rule/config drift is still the biggest semantic correctness risk
stale context and incomplete response coverage still produce believable but wrong output

What changed is that current v2 gives you better tools to detect and constrain those issues:

explicit scope validation
richer rule phases
correction routing
verbose runtime signals
step telemetry
scoped MCP planner behavior

Bottom line

Current v2 is safer and more diagnosable than the original release line, but it is still a highly configurable engine. Most production correctness issues now come from configuration design and concurrency policy, not from missing core framework primitives.
