MCP Semantic Query (`db.semantic.*`)

This page documents the semantic-query engine under: com.github.salilvnair.convengine.engine.mcp.query.semantic

It is the Text-to-Intent-to-SQL architecture for ConvEngine MCP mode.

ConvEngine utilizes a strict 2-Agent Pipeline architecture for semantic querying. This creates a powerful firewall between natural language understanding and physical schema generation, eliminating classic text-to-SQL hallucinations and increasing success rates significantly.

Agent 1 (db.semantic.interpret): Responsible strictly for understanding user intent based on your DB semantic model tables (ce_semantic_*). Yields a CanonicalIntent JSON or asks for clarification.
Agent 2 (db.semantic.query): Responsible strictly for compiling SQL safely. Takes the CanonicalIntent, maps it to database dialects with named parameters, utilizing guardrails and memory embeddings.

Runtime request flow

1Conversation API

2Engine Boot

3Intent Resolution

4McpPlanner Allocation

5db.semantic.interpret (Agent 1)

6db.semantic.query (Agent 2)

7Postgres Execution

8Synthesis & Output

Packages & Core Dependencies

Package responsibility map

Package	What it holds	Examples
contract/*	Internal cross-agent data contracts	CanonicalIntent, SemanticToolMeta, SemanticInterpretResponse, SemanticQueryResponseV2
handler/*	MCP Tool entry points	DbSemanticInterpretToolHandler, DbSemanticQueryToolHandler
service/*	Core agent logic	SemanticInterpretService, SemanticLlmQueryService
model/*	Semantic model loading and registry	SemanticModel, SemanticModelLoader, SemanticModelRegistry
feedback/*	SQL Failure/Correction embeddings loop	SemanticFailureFeedbackService, SemanticFailureRecord
embedding/*	Vector search implementations	SemanticEmbeddingDbService, SemanticQueryFailureRepository

Core libraries

DB-backed semantic overlay: Runtime semantic model is assembled from ce_semantic_* tables into SemanticModel Java objects.
JDBC/named parameters: SQL is executed entirely via bound parameters (no string concatenation of literals).

Semantic Model (DB-driven)

Semantic mode relies on DB semantic model tables (ce_semantic_*) that map business language to physical schema. Agent-1 uses this model to enforce business rules; Agent-2 uses it to map fields to real columns.

Semantic section ownership

Section	Primary consumers	Runtime effect
settings	author info	timezone, limits, SQL dialect hints
entities	Agent 1 & 2	business entities, synonyms, field validations, example questions
tables	Agent 2	physical columns and foreign keys for compile constraints
relationships	Agent 2	table traversal edges and cardinality hints
join_hints	planner/compiler	frequent join relationships
metrics	Agent 1 & 2	allowed arbitrary metric keys and SQL expressions
value_patterns	Agent 1	global value-prefix-driven field remap before validation
synonyms	Agent 1	domain query vocabulary expansion
intent_rules	Agent 1/Normalizer	deterministic query intent overrides (e.g., force_entity, force_select)
rules	policy guardrail	allowed tables, denied operations, max limit hints

Model source

Semantic model is loaded from ce_semantic_* tables (no YAML file path).

Semantic Section Breakdown

Below is a breakdown of how ConvEngine maps DB semantic sections into runtime constraints.

Important loader behavior

SemanticModelLoader is configured with FAIL_ON_UNKNOWN_PROPERTIES=false. Unknown model attributes not mapped by runtime loaders are ignored safely at load time. This allows you to store extra context in DB semantic model tables for the LLMs without breaking the Java parsers!

Model identity fields

Snippet: semantic model identity

YAML

At runtime, version is emitted in stage output metadata so operators can verify exactly which semantic model revision generated the answer. The description text is used as grounding context for the LLM interpretation.

entities block

Snippet: one entity contract

YAML

How each child tag is used:

Entity tag behavior

Tag	Used by	Effect on AST/SQL
entities.<name>	Agent-1	Becomes candidate entity for CanonicalIntent bounds.
description	Agent-1 Prompt	Improves LLM context on what the entity represents.
tables.primary	Agent-2 SQL	Primary base table candidate for FROM clause.
fields.<fieldName>.column	Agent-2 SQL	Maps semantic field key to physical table.column in SQL.
fields.<fieldName>.aliases	Agent-1	Maps LLM variants (e.g. status) to canonical semantic field keys.
fields.<fieldName>.allowed_values	Agent-1	Aids the LLM in understanding what constants live in the database so it generates cleaner filters.

tables + relationships

Snippet: tables and relationships

YAML

What this does:

Provides metadata for syntax validation.
SQL compiler uses edges to assist traversing tables reliably.

metrics

Snippet: metric definition

YAML

What this does:

Metric key becomes allowed value in AST output for aggregates.
Metric SQL string is literally inserted by the Agent-2 compiler when aggregation is requested.

value_patterns + intent_rules

Snippet: advanced mapping

YAML

What this does at runtime:

Global value_patterns apply deterministic filter-field remap based on value prefix (e.g., requestId=DON9001 -> disconnectOrderId=DON9001) before compilation saving hundreds of failed DB scans.
intent_rules gate activation through match_any + must_contain.
force_entity overrides LLM-selected entity.
enforce_where injects deterministic WHERE filters into AST without requiring LLM inference.

Full Pipeline Deep Dive

This section is intentionally written as a turn-by-turn debug timeline (like stepping through IntelliJ breakpoints) so you can see exactly where control moves and why.

UI -> ConversationController (request entry)

Frontend UI

show failed disconnect requests for DON9001 in last 24 hours

Backend API

ConversationController receives request, creates EngineContext, and passes control to ConversationalEngine.

ConversationController.java (entry point)

// REST entry: one turn in, one deterministic engine result out.
@PostMapping("/message")
public ResponseEntity<ConversationResponse> message(@RequestBody ConversationRequest request) {
    // 1) Convert transport payload -> engine-native request context.
    EngineContext ctx = engineContextFactory.from(request);
    // 2) Initial audit write so failures still keep traceability.
    auditService.stage(ctx, "REQUEST_RECEIVED");
    // 3) Hand-off to engine runtime pipeline (next breakpoint location).
    EngineResult result = conversationalEngine.process(ctx);
    // 4) Convert engine result -> API response DTO for UI.
    return ResponseEntity.ok(responseMapper.toApi(result));
}

Control then moves to DefaultConversationalEngine.process(...).

DefaultConversationalEngine -> Pipeline bootstrap

DefaultConversationalEngine.java (process)

public EngineResult process(EngineContext context) {
    // 1) Create per-turn mutable execution state.
    EngineSession session = sessionFactory.open(context);
    // 2) Load recent conversation history for multi-turn continuity.
    conversationLoader.hydrateHistory(session, 10);
    // 3) Execute canonical pipeline steps in order.
    pipelineExecutor.execute(session);
    // 4) Persist final state + payload snapshot.
    conversationPersister.persist(session);
    return session.toResult();
}

This is where pipeline stages start advancing: intent resolution, rules, planner, and semantic tool chain.

IntentResolutionStep -> MCP planner decision

IntentResolutionStep

intent: SEMANTIC_QUERYstate: ANALYZE

Classifier + rules set semantic path for this turn.

McpPlanner

intent: SEMANTIC_QUERYstate: MCP_PLAN

Planner chooses chained tools: db.semantic.interpret -> db.semantic.query -> postgres.query

At this point normal response rendering is paused and semantic tool orchestration starts.

Agent-1 (`db.semantic.interpret`) semantic retrieval + canonical intent

DbSemanticInterpretToolHandler.java -> SemanticInterpretService.java

public SemanticInterpretResponse execute(InterpretToolInput input, EngineSession session) {
    // 1) Load semantic model + DB semantic metadata.
    SemanticRuntimeContext runtime = semanticModelProvider.load(session);
    // 2) Hybrid retrieval: lexical + synonym + vector embedding blend.
    RetrievalResult retrieval = semanticRetriever.rank(input.question(), runtime);
    // 3) Build LLM prompt from top entities/query-classes + constraints.
    Prompt prompt = interpretPromptFactory.build(input.question(), retrieval, runtime);
    // 4) Force structured JSON output mapped to CanonicalIntent.
    CanonicalIntent canonicalIntent = llmClient.structured(prompt, CanonicalIntent.class);
    // 5) Clarification gate: if confidence below threshold, ask user instead of guessing.
    return responseBuilder.from(canonicalIntent, retrieval);
}

Example embedding retrieval rows (Agent-1 debug view)

table	target_type	target_name	score	why selected
ce_semantic_concept_embedding	ENTITY	DisconnectRequest	0.92	Strong vector + synonym overlap with 'disconnect request'
ce_semantic_concept_embedding	FIELD	disconnectOrderId	0.88	Value token DON9001 matches ID pattern and field aliases
ce_semantic_concept_embedding	QUERY_CLASS	FAILURE_ANALYSIS	0.83	Phrase 'failed ... last 24 hours' aligns with failure query class

Agent-1 semantic tables used in this stage

Table	Used for
ce_semantic_concept	Core business concepts and intent/entity anchors
ce_semantic_synonym	Natural language expansion to canonical terms
ce_semantic_query_class	Allowed query-shape contracts
ce_semantic_mapping	Field-level semantic -> physical mapping hints
ce_semantic_ambiguity_option	Deterministic clarification choices
ce_semantic_concept_embedding	Vector retrieval candidates and ranking
ce_semantic_entity	Entity override patches without code
ce_semantic_relationship	Relationship override patches without code
ce_semantic_join_hint	Join guidance for complex paths
ce_semantic_value_pattern	Pattern-based value-to-field reassignment

CanonicalIntent finalization + mapping resolution

Agent-1 output

intent: SEMANTIC_QUERYstate: SEMANTIC_INTERPRET

Canonical intent is emitted with entity, filters, time range, sort, and safe limit.

Field mapping selected before Agent-2 SQL generation

semantic field	physical map from ce_semantic_mapping	selected
disconnectOrderId	zp_disconnect_order.zp_disconnect_order_id	YES
requestStatus	zp_disco_request.zp_request_status	YES
requestedAt	zp_disco_request.zp_requested_at	YES

Agent-2 (`db.semantic.query`) SQL compile, guardrails, retries

DbSemanticQueryToolHandler.java -> SemanticLlmQueryService.java

public SemanticQueryResponseV2 execute(QueryToolInput input, EngineSession session) {
    // 1) Fast pre-guard: reject any write intent (INSERT/UPDATE/DELETE/etc).
    sqlReadOnlyGuard.assertReadOnlyIntent(input);
    // 2) Compile SQL + params from CanonicalIntent + mappings + query class.
    SemanticCompiledSql compiled = semanticLlmQueryService.compile(input.canonicalIntent(), session);
    // 3) Validate SQL policy (table allowlist, select-only, limits, join hops).
    guardrailService.validate(compiled, session);
    // 4) Retry loop with failure memory if DB validation/execution fails.
    for (int attempt = 1; attempt <= 3; attempt++) {
        try { return responseBuilder.success(compiled); }
        catch (Exception dbError) {
            semanticFailureFeedbackService.recordAndRetrieve(dbError, compiled, session);
            compiled = semanticLlmQueryService.rewriteWithFailureContext(compiled, dbError, session);
        }
    }
    return responseBuilder.failure("Semantic SQL compilation failed after retries");
}

Agent-2 failure-memory (`ce_semantic_query_failures`) debug snapshot

id	question	failing_sql	corrected_sql	similarity	used_in_retry
901	failed disconnect for DON IDs	join on wrong key	join zp_disco_request_id = zp_disconnect_order.zp_disco_request_id	0.86	YES
902	last 24h failed requests	missing timestamp cast	requested_at >= now() - interval '24 hours'	0.81	YES

Postgres execution + final assistant payload

postgres.query

intent: SEMANTIC_QUERYstate: POSTGRES_QUERY_EXECUTION

Parameterized read-only SQL executes with JDBC named params.

{}

{
  "params": {
    "disconnectOrderId": "DON9001",
    "status": "FAILED",
    "limit": 100
  },
  "rowCount": 7,
  "sql":
  select r.zp_disco_request_id, r.zp_request_status,
         r.zp_requested_at
  from zp_disco_request r
  JOIN zp_disconnect_order o
    ON o.zp_disco_request_id = r.zp_disco_request_id
  WHERE o.zp_disconnect_order_id = :disconnectOrderId
    AND r.zp_request_status = :status
    AND r.zp_requested_at >= NOW() - INTERVAL '24 hours'
  ORDER BY r.zp_requested_at DESC
  LIMIT :limit
}

Tables

business tables (R)
ce_audit (W)
ce_semantic_query_failures (W on failure only)

McpPlanner -> ResponseResolution

intent: SEMANTIC_QUERYstate: COMPLETED

Found 7 failed disconnect requests for DON9001 in the last 24 hours. Latest failure at 2026-03-13 10:12 UTC.

Execution is now complete and control returns to ConversationController, which returns the final API response to the UI.

Semantic Stage Matrix (who reads what)

Semantic stage -> table usage map

Stage	Primary Java classes	Key tables used
SEMANTIC_INTERPRET (Agent-1)	DbSemanticInterpretToolHandler, SemanticInterpretService	ce_semantic_concept, ce_semantic_synonym, ce_semantic_query_class, ce_semantic_mapping, ce_semantic_ambiguity_option, ce_semantic_concept_embedding, ce_semantic_entity, ce_semantic_relationship, ce_semantic_join_hint, ce_semantic_value_pattern
SEMANTIC_QUERY (Agent-2)	DbSemanticQueryToolHandler, SemanticLlmQueryService	ce_semantic_mapping, ce_semantic_query_class, ce_config, ce_semantic_query_failures
POSTGRES_QUERY_EXECUTION	PostgresQueryToolHandler	Business domain tables (read-only for semantic flow)
SEMANTIC_FAILURE_FEEDBACK	SemanticFailureFeedbackService	ce_semantic_query_failures (record + retrieve corrected SQL patterns)

The Java Data Contracts

All stage payloads map perfectly to explicit Java Records, allowing type-safe API boundaries for front-ends.

Contracts (Java records)

JAVA

Configuration Settings

`application.yml` Toggles

application.yml

YAML

convengine:
  mcp:
    enabled: true
    db:
      query:
        mode: semantic
      semantic:
        enabled: true
        tool-code: db.semantic.query
        # DB-driven semantic model tables (`ce_semantic_*`) are used; no model-path file required
        default-limit: 100
        max-limit: 500
        clarification-threshold: 0.80 # Generates Clarify intent if LLM confidence misses
        llm_retry_max_attempts: 3

`ce_config` Prompts and Constants

Semantic LLM Variables

Config Key	Effect
mcp.db.semantic.interpret.temperature	LLM temperature for Agent-1. Keep 0.00 for structural safety.
mcp.db.semantic.query.temperature	LLM temperature for Agent-2. Keep 0.00 for deterministic compilation.
mcp.db.semantic.interpret.model	Model override mapping for Agent-1 (e.g. claude-3-5-sonnet).
mcp.db.semantic.query.model	Model override mapping for Agent-2 (e.g. gpt-4o).

MCP Semantic Query (`db.semantic.*`)

Runtime request flow

Packages & Core Dependencies

Package responsibility map

Core libraries

Semantic Model (DB-driven)

Semantic section ownership

Semantic Section Breakdown

Model identity fields

entities block

Entity tag behavior

tables + relationships

metrics

value_patterns + intent_rules

Full Pipeline Deep Dive

UI -> ConversationController (request entry)

DefaultConversationalEngine -> Pipeline bootstrap

IntentResolutionStep -> MCP planner decision

Agent-1 (`db.semantic.interpret`) semantic retrieval + canonical intent

Example embedding retrieval rows (Agent-1 debug view)

Agent-1 semantic tables used in this stage

CanonicalIntent finalization + mapping resolution

Field mapping selected before Agent-2 SQL generation

Agent-2 (`db.semantic.query`) SQL compile, guardrails, retries

Agent-2 failure-memory (`ce_semantic_query_failures`) debug snapshot

Postgres execution + final assistant payload

Semantic Stage Matrix (who reads what)

Semantic stage -> table usage map

The Java Data Contracts

Configuration Settings

`application.yml` Toggles

`ce_config` Prompts and Constants

Semantic LLM Variables

Test this API with the Agent Boundaries

Agent-1 Ambiguity Testing (Trigger Clarification)

Agent-2 Security Boundary Testing (Should Reject)

Runtime request flow​

Packages & Core Dependencies​

Package responsibility map

Core libraries​

Semantic Model (DB-driven)​

Semantic section ownership

Semantic Section Breakdown​

Entity tag behavior

Full Pipeline Deep Dive​

Example embedding retrieval rows (Agent-1 debug view)

Agent-1 semantic tables used in this stage

Field mapping selected before Agent-2 SQL generation

Agent-2 failure-memory (`ce_semantic_query_failures`) debug snapshot

Semantic Stage Matrix (who reads what)​

Semantic stage -> table usage map

The Java Data Contracts​

Configuration Settings​

application.yml Toggles​

ce_config Prompts and Constants​

Semantic LLM Variables

Test this API with the Agent Boundaries

Agent-1 Ambiguity Testing (Trigger Clarification)

Agent-2 Security Boundary Testing (Should Reject)

Runtime request flow

Packages & Core Dependencies

Core libraries

Semantic Model (DB-driven)

Semantic Section Breakdown

Full Pipeline Deep Dive

Semantic Stage Matrix (who reads what)

The Java Data Contracts

Configuration Settings

`application.yml` Toggles

`ce_config` Prompts and Constants