Semantic Query Deep Dive
Text -> Intent -> SQLDB semantic model (
ce_semantic_*)Read-only + validation + retryOne-line mental model
Semantic runtime is not direct text-to-SQL.
It is Question -> CanonicalIntent -> constrained SQL -> execution -> answer.
Pipeline Architecture
Semantic Runtime
Planner -> Interpret -> Query -> Postgres -> Answer
Minimal anchors
Chat + Trace
| What to look at | Why it matters |
|---|---|
| Tool call order + stage transitions | Tells you where behavior diverged |
Canonical Intent + SQL
| What to inspect | Why it matters |
|---|---|
| Intent JSON and compiled SQL | Explains exactly why this SQL was produced |
Config + Table Dependencies
| What to inspect | Why it matters |
|---|---|
| `ce_semantic_*` rows used in this request | Most wrong outputs are metadata issues |
Prompt Suite
| What to inspect | Why it matters |
|---|---|
| Interpret prompt + SQL prompt + retry prompt | Shows what the LLM was allowed to do |
Running example
User question
TEXT
find all requests where notes text says cancelled due to duplicate
Runtime tables used in this trace (sample rows)
ce_semantic_concept (10 sample rows)
ce_semantic_concept
| ID | CONCEPT_CODE | DESCRIPTION |
|---|---|---|
| 1 | DisconnectRequest | Customer disconnect request |
| 2 | DisconnectOrder | Downstream disconnect order |
| 3 | Account | Customer account |
| 4 | SubAccount | Sub-account |
| 5 | Connection | Physical connection |
| 6 | Billing | Billing records |
| 7 | Payment | Payment records |
| 8 | SupportTicket | Support cases |
| 9 | Installation | Install request |
| 10 | ServicePlan | Service plan |
ce_semantic_synonym (10 sample rows)
ce_semantic_synonym
| ID | SYNONYM | MAPS_TO |
|---|---|---|
| 1 | request | DisconnectRequest |
| 2 | requests | DisconnectRequest |
| 3 | notes | notesText |
| 4 | note | notesText |
| 5 | remarks | notesText |
| 6 | comments | notesText |
| 7 | cancelled | CANCELLED_SIGNAL |
| 8 | duplicate | DUPLICATE_SIGNAL |
| 9 | last day | LAST_24_HOURS |
| 10 | today | TODAY |
ce_semantic_query_class (10 sample rows)
ce_semantic_query_class
| ID | QUERY_CLASS | DESCRIPTION |
|---|---|---|
| 1 | LIST | list results |
| 2 | FAILURE_ANALYSIS | failure analytics |
| 3 | COUNT | count |
| 4 | TREND | trend |
| 5 | SUMMARY | summary |
| 6 | DETAIL | detail |
| 7 | JOIN_VIEW | join-heavy lookup |
| 8 | ALERT | alert lookup |
| 9 | STATUS | status lookup |
| 10 | RECENT | recent lookup |
ce_semantic_mapping (10 sample rows)
ce_semantic_mapping
| ID | ENTITY | FIELD | TABLE | COLUMN |
|---|---|---|---|---|
| 1 | DisconnectRequest | requestId | zp_disco_request | zp_request_id |
| 2 | DisconnectRequest | status | zp_disco_request | status |
| 3 | DisconnectRequest | createdAt | zp_disco_request | created_ts |
| 4 | DisconnectRequest | subAccountId | zp_disco_request | sub_account_id |
| 5 | DisconnectRequest | city | zp_disco_request | city |
| 6 | DisconnectTransData | requestId | zp_disco_trans_data | zp_request_id |
| 7 | DisconnectTransData | notesText | zp_disco_trans_data | notes_text |
| 8 | DisconnectTransData | action | zp_disco_trans_data | action |
| 9 | DisconnectTransData | createdAt | zp_disco_trans_data | created_ts |
| 10 | DisconnectOrder | orderId | zp_disconnect_order | order_id |
ce_semantic_query_failures (10 sample rows)
ce_semantic_query_failures
| ID | BAD_SQL_PATTERN | CORRECTION_HINT | ROOT_CAUSE |
|---|---|---|---|
| 1 | r.notes_text | use t.notes_text | WRONG_TABLE_COLUMN |
| 2 | missing trans join | join on zp_request_id | MISSING_JOIN |
| 3 | SELECT without DISTINCT | use DISTINCT for request-level results | DUPLICATE_ROWS |
| 4 | wrong alias for trans table | alias trans table as t consistently | ALIAS_MISMATCH |
| 5 | unsafe write keyword | read-only select only | DENY_OPERATION |
| 6 | wrong timestamp column | use mapped created_ts | WRONG_TIME_COLUMN |
| 7 | missing limit | apply semantic max limit | LIMIT_POLICY |
| 8 | invalid field spelling | use mapped field names | FIELD_MAPPING_ERROR |
| 9 | bad join direction | follow relationship path | JOIN_PATH_ERROR |
| 10 | literal concatenation | use named params | PARAM_BINDING_POLICY |
Step-by-step deep trace (who does what)
Step 1: question enters runtime
Java entry
JAVA
// Java does this:
String question = request.getQuestion();
// LLM not called yet.
// DB not read yet.
Current state
| Java state | LLM state | DB state |
|---|---|---|
| question is present | not called | no read yet |
Step 2: planner chooses tool sequence
Planner outcome
JSON
{
"mcpAction": "CALL_TOOL",
"toolOrder": [
"db.semantic.interpret",
"db.semantic.query",
"postgres.query"
]
}
Who did what
| Component | Did | Concluded |
|---|---|---|
| Java (McpPlanner) | loaded tool metadata | semantic chain should run |
| LLM | not deciding SQL here | none |
| DB | read planner/tool config | available tools + order |
Step 3: Java loads semantic metadata for interpret stage
Java metadata load
JAVA
// Java does this:
var concepts = conceptRepo.findAllEnabled();
var synonyms = synonymRepo.findAllEnabled();
var queryClasses = queryClassRepo.findAllEnabled();
var mappings = mappingRepo.findAllEnabled();
// LLM still not called.
// DB returns rows from ce_semantic_* tables.
Tables read now
| Table | Used for |
|---|---|
| ce_semantic_concept | allowed concept vocabulary |
| ce_semantic_synonym | language normalization hints |
| ce_semantic_query_class | query shape options |
| ce_semantic_mapping | allowed semantic fields |
Step 4: Java builds interpret prompt
Interpret prompt package
TEXT
System prompt contains:
- Allowed entities from ce_semantic_concept
- Allowed query classes from ce_semantic_query_class
- Field vocabulary from ce_semantic_mapping
- Synonym hints from ce_semantic_synonym
User prompt contains:
find all requests where notes text says cancelled due to duplicate
Who did what
| Component | Did | Concluded |
|---|---|---|
| Java | assembled strict prompt + output schema | interpret call is ready |
| LLM | not run yet | none |
| DB | already provided metadata rows | constraints available |
Step 5: LLM interpret call happens
Interpret call
JAVA
// Java does this call.
var llmOutput = llmClient.generateJsonStrict(systemPrompt, userPrompt, interpretSchema);
// LLM does this inside the call:
// - chooses entity
// - chooses queryClass
// - chooses filters
// Java does NOT manually parse words into business meaning here.
LLM interpret output
JSON
{
"entity": "DisconnectRequest",
"queryClass": "LIST",
"filters": [
{
"field": "notesText",
"op": "CONTAINS",
"value": "cancelled due to duplicate"
}
],
"confidence": 0.91,
"needsClarification": false
}
Step 6: Java validates interpret output
Validation
JAVA
// Java validates that LLM output is legal against DB metadata.
boolean entityOk = conceptRepo.existsByConceptCode(intent.getEntity());
boolean classOk = queryClassRepo.existsByQueryClass(intent.getQueryClass());
boolean fieldsOk = intent.getFilters().stream()
.allMatch(f -> mappingRepo.existsByFieldCode(f.getField()));
// If invalid -> reject / clarify.
Current state
| Java state | LLM state | DB state |
|---|---|---|
| canonicalIntent accepted | completed interpret step | validated entity/class/field existence |
Step 7: CanonicalIntent becomes official runtime contract
CanonicalIntent
JSON
{
"entity": "DisconnectRequest",
"queryClass": "LIST",
"filters": [
{
"field": "notesText",
"op": "CONTAINS",
"value": "cancelled due to duplicate"
}
],
"sort": [
{
"field": "createdAt",
"dir": "DESC"
}
],
"limit": 100
}
Important
At this point, runtime has semantic meaning only. SQL is still not generated.
Step 8: Java loads query-stage metadata
Query-stage metadata load
JAVA
// Java does this before SQL generation:
var mappings = mappingRepo.findAllEnabled();
var failures = failureRepo.findRecent();
// DB does this:
// - returns field -> table.column mappings
// - returns prior SQL failure/correction memory
Tables read now
| Table | Contribution |
|---|---|
| ce_semantic_mapping | physical column resolution + join feasibility |
| ce_semantic_query_failures | retry guidance from past mistakes |
| ce_semantic_relationship | join-path preference when relevant |
Step 9: Java builds SQL prompt from CanonicalIntent + metadata
SQL prompt package
TEXT
Includes:
- CanonicalIntent JSON (entity, queryClass, filters)
- mapping rows (field -> table.column)
- failure hints (what previously failed)
- read-only requirement
- strict SQL JSON output schema
Who decided what so far
| Decision | Decider |
|---|---|
| Meaning of user text | LLM in interpret stage |
| Allowed fields/tables | DB metadata + Java validation |
| SQL not yet final | pending query-stage LLM call |
Step 10: LLM query call generates SQL candidate
SQL generation call
JAVA
// Java does this call:
var sqlOutput = llmClient.generateJsonStrict(sqlSystemPrompt, sqlUserPrompt, sqlSchema);
// LLM does this inside:
// - drafts SQL from CanonicalIntent
// - chooses joins from mapping context
// - returns sql + params + flags
SQL candidate
SQL
select
DISTINCT r.zp_request_id, r.status,
r.created_ts
from zp_disco_request r
JOIN zp_disco_trans_data t
on r.zp_request_id = t.zp_request_id
where t.notes_text ILIKE :notesText
order by r.created_ts DESC
limit :__limit; -- params -- :notesText = "%cancelled due to duplicate%" -- :__limit = 100
Step 11: Java validates SQL candidate
Validation + guardrail
JAVA
// Java safety checks before execution:
sqlGuardrail.assertReadOnly(sqlOutput.getSql());
sqlValidator.validateAgainstMappings(sqlOutput.getSql(), mappings);
// If validation fails:
// Java triggers retry with corrective feedback.
Validation responsibility
| Check | Owner |
|---|---|
| Read-only enforcement | Java guardrail |
| Column/table legality vs mapping | Java validator + ce_semantic_mapping |
| Retry decision | Java orchestration |
Step 12: Postgres executes named-parameter SQL
Execution
JAVA
// Java executes SQL against Postgres using named params.
var rows = namedParameterJdbcTemplate.queryForList(sqlOutput.getSql(), sqlOutput.getParams());
Current state
| Java state | LLM state | DB state |
|---|---|---|
| execution complete | not active in this step | business rows returned |
Step 13: final answer synthesis
Why this SQL was generated (exact reason chain)
Reason chain
| Signal | Decided by | SQL effect |
|---|---|---|
| `entity = DisconnectRequest` | Interpret-stage LLM | `FROM zp_disco_request r` |
| `field = notesText` | Interpret-stage LLM | need mapped notes column |
| `notesText -> zp_disco_trans_data.notes_text` | ce_semantic_mapping | `WHERE t.notes_text ...` |
| shared request key mapping | ce_semantic_mapping | `JOIN ... ON r.zp_request_id = t.zp_request_id` |
| duplicate risk memory | ce_semantic_query_failures | `SELECT DISTINCT` |
| read-only policy | Java guardrail + deny rules | no update/delete SQL allowed |
If output is wrong: who decided it
Root-cause map
| Observed issue | Primary decider | First place to debug |
|---|---|---|
| Wrong entity | Interpret-stage LLM | interpret prompt + `ce_semantic_concept`/`ce_semantic_synonym` |
| Wrong field in intent | Interpret-stage LLM | intent JSON + `ce_semantic_mapping` coverage |
| Missing join | Query-stage LLM | SQL prompt + mapping availability |
| Invalid column blocked | Java validator (correctly) | mapping completeness |
| Unexpected empty rows | Business data/filter | executed SQL + runtime params |
| Clarification asked unexpectedly | Interpret stage confidence logic | ambiguity options + intent confidence |
Prompt suite (concise)
Interpret prompt shape
TEXT
Goal: return CanonicalIntent JSON only.
Inputs: question + allowed entities/classes/fields/synonyms.
Constraint: choose only from allowed semantic model terms.
Output: entity, queryClass, filters, confidence, needsClarification.
SQL prompt shape
TEXT
Goal: compile read-only SQL from CanonicalIntent.
Inputs: canonicalIntent + mapping + relationship context + failure hints.
Constraint: named params only, read-only only, mapped columns only.
Output: sql, params, needsClarification, unsupported.
Retry prompt shape
TEXT
Goal: fix previous invalid SQL.
Inputs: previous SQL + exact validation error + allowed mapping.
Constraint: preserve intent meaning while fixing legality.
Output: corrected sql + params.
Drill set (clean, unique, no repetition)
Drill 1: clarification path
Drill 2: count path
Drill 3: unsupported write intent
Drill 4: retry recovery
Final checklist
Read this page and you should now be able to
| Capability | Outcome |
|---|---|
| Trace one question end-to-end | Know exactly where meaning and SQL were decided |
| Debug wrong output quickly | Jump to the right stage/table without guesswork |
| Explain ownership clearly | Java orchestrates, LLM interprets/compiles, DB constrains |
| Justify final SQL | Tie each SQL clause to intent + mapping + guardrails |
Lock this model
Java orchestrates. LLM interprets and compiles. DB semantic tables constrain and validate. Postgres executes. The answer comes from rows, not guesswork.