# Progressive Domain Architecture
An architectural hypothesis for AI-mediated systems in high-stakes domains.
## What It Is
Progressive Domain Architecture is an architectural hypothesis for AI-mediated systems that scale domain expertise to people who are not domain experts — in domains where knowledge is partially formalized, coverage is uneven, and the cost of a wrong answer justifies architectural discipline.
This is not an established standard or a proven production framework. It is a hypothesis being tested through a working prototype.
The hypothesis comes from a recurring pattern: you start with RAG + tools. You add guardrails. Then you realize you need an ontology. Then scenario contracts. Then a scoring model for clarification questions. Then partial outcomes. Then an audit trail. Each step is another structural piece that a general-purpose AI layer cannot provide on its own.
PDA is an attempt to name that recurring architecture upfront — rather than discover it one piece at a time.
## The Problem
Hallucination is a real problem. But in high-stakes domains there is a deeper structural risk:
AI can talk wider than the system knows.
A general-purpose model deployed in a domain-specific system can blend verified system knowledge with its own training data. It has no built-in mechanism to distinguish between the two. A domain expert might catch the difference. But the target users of these systems — client-facing staff, hiring managers, operations teams — are not domain experts. They have no reliable way to distinguish a grounded answer from a plausible-sounding confabulation. The architecture must catch it instead.
This risk is especially costly in domains like:
- cross-border workforce compliance
- KYC and customer onboarding
- investment suitability and wealth management
- tax and payroll compliance
- underwriting and eligibility
- trade, logistics, and sanctions screening
These domains share the same tension:
- rules are formalizable but distributed across jurisdictions, dates, and categories
- experts exist but do not scale
- non-experts need access to bounded answers
- the cost of a wrong answer is not trivial
- complete coverage is unrealistic at first
## What PDA Adds on Top of Modern Tooling
Modern AI tooling has matured. Structured outputs make tool calls reliable. Orchestration frameworks such as LangGraph give explicit, state-machine-style control over agent flow. Policy-as-code and runtime validation catch many classes of unsafe behavior. Constrained decoding eliminates malformed outputs entirely.
PDA assumes all of this is in place. It is not an alternative to modern orchestration — it is what comes next when the technical layer is solid but the semantic layer still produces structurally coherent answers to the wrong questions.
Even with the best current tooling, several limitations remain at the semantic and domain level. PDA addresses them not by validating tool calls after the fact, but by restructuring which decisions the model is allowed to make in the first place:
- Tool selection and parameters. Even when tools are scoped per workflow node, the model still decides which allowed tool to call and with which parameters. A wrong choice can lead to a technically valid call that is semantically about the wrong question. The call is visible in a trace, but detecting that the parameters were wrong requires domain knowledge that output-level guardrails typically do not have.
- Premature scenario commitment. "We need to move Anna to Germany" could mean a permanent relocation, a temporary assignment, or a remote work arrangement. These are different scenarios with different legal, tax, and immigration consequences. Without structured ambiguity management, the system picks one silently. Everything downstream is internally coherent but potentially about the wrong problem.
- Clarification question prioritization. In systems where users are not domain experts, they cannot know which clarifications matter — and neither can an unprompted model. "What is Anna's citizenship?" may unlock an immigration module — or change nothing. A domain expert knows which question to ask next; an effective system must encode that knowledge as a dependency graph with impact scoring, not rely on conversational instinct.
- Partial outcomes. The system knows tax rules for Germany (verified) but immigration rules are unknown. Producing a partial outcome — "here is what we know, here is what is missing, here is where you need an expert" — requires structural knowledge of which domain modules apply, which have sufficient coverage, and which do not. This is the kind of domain structure that output-level validation is not designed to carry.
- Two kinds of refusal. "We know enough to say no" (e.g., a hire prohibited under current sanctions rules) and "we do not know enough to answer safely" (e.g., immigration rules for this corridor are missing) are fundamentally different outcomes. They require different explanations, escalation paths, and audit posture. Most current guardrail approaches collapse both into a generic refusal.
- Deterministic reproducibility. Once the governed conversation has resolved a scenario and collected the required facts at sufficient fidelity, the deterministic engine produces the same outcome every time — in replays, audits, and regression tests. The conversational path that led to the scenario and the natural-language explanations are not guaranteed to be identical (they depend on the model), but explanations are verified against the deterministic computation results. For regulated domains, this separation makes reproducibility of core execution a verifiable property rather than a probabilistic hope.
- Business-readable audit trail. Not "which prompt was sent to the model," but: which scenario ran, which modules executed, which were blocked, why, on the basis of which facts at which fidelity.
Any production-grade system in this class of domains will likely encounter each of these limitations. The question is whether they emerge ad hoc — or start as architecture.
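The clarification-prioritization limitation above lends itself to a small sketch. This is a hypothetical Python illustration, not part of PDA itself: all question names, module names, and the `QUESTION_UNLOCKS` mapping are invented. The idea is that each clarification question is scored by how many currently blocked domain modules its answer could unlock.

```python
# Hypothetical sketch: each clarification question maps to the blocked
# modules its answer could unlock; the next question is the one with
# the highest impact, not the most conversationally natural one.
QUESTION_UNLOCKS = {
    "citizenship": {"immigration_check", "social_security_treaty"},
    "start_date":  {"tax_assessment"},
    "home_office_setup": set(),          # easy to ask, unlocks nothing
}

def next_question(blocked_modules, answered):
    """Pick the unanswered question unlocking the most blocked modules."""
    scores = {q: len(unlocks & blocked_modules)
              for q, unlocks in QUESTION_UNLOCKS.items()
              if q not in answered}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

blocked = {"immigration_check", "social_security_treaty", "tax_assessment"}
assert next_question(blocked, answered=set()) == "citizenship"
```

A real system would derive the unlock sets from the dependency graph between facts and module requirements rather than hard-coding them, but the scoring shape stays the same.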
## The Central Claim
The model must not say more than the system knows — and must not do more than it is trusted to do.
This is not a guarantee about model behavior. It is an architectural constraint PDA is designed to enforce.
The same fact may be low-risk in one scenario and high-risk in another: citizenship is background context for a cost comparison but a critical routing fact for an immigration assessment. Trust boundaries must be enforced per scenario and per module, not as global rules.
This constraint unfolds into five properties:
- Determinism guarantees reproducibility of execution, not truth of inputs.
- Runtime constraints define the minimum basis on which execution is allowed.
- Transparent traces make routing and reasoning mismatches visible.
- Authority boundaries prevent analysis from silently becoming system reality.
- Context boundaries prevent AI from supplementing system knowledge with its own.
PDA is not primarily about correctness. It is about preserving epistemic integrity under AI mediation: for every materially relevant claim the system produces, the source, basis, and reliability should be known and traceable — not delegated to user verification.
## Five Commitments
### 1. Progressive Knowledge
Coverage is not binary. A system may know some things well, other things only partially, and some things not at all.
PDA tracks knowledge quality as a matrix:
- by jurisdiction
- by knowledge element
- by applicability scope
- by effective date
- by fidelity tier (unknown, estimated, curated, verified)
Instead of "we support Germany," the system says which knowledge elements exist, for which scope, for which date range, at which quality.
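One way to make the matrix concrete is a lookup over knowledge cells. This is a minimal Python sketch under assumed names: the `KnowledgeCell` shape, the jurisdiction and element identifiers are illustrative; only the fidelity tiers come from the list above.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum

class Fidelity(IntEnum):
    UNKNOWN = 0
    ESTIMATED = 1
    CURATED = 2
    VERIFIED = 3

@dataclass(frozen=True)
class KnowledgeCell:
    jurisdiction: str        # e.g. "DE"
    element: str             # e.g. "income_tax_rates"
    scope: str               # applicability scope, e.g. "local_hire"
    effective_from: date
    effective_to: date
    fidelity: Fidelity

def lookup(cells, jurisdiction, element, scope, on):
    """Best available fidelity for one element, scope, and date."""
    matches = [c for c in cells
               if c.jurisdiction == jurisdiction
               and c.element == element
               and c.scope == scope
               and c.effective_from <= on <= c.effective_to]
    return max((c.fidelity for c in matches), default=Fidelity.UNKNOWN)

cells = [KnowledgeCell("DE", "income_tax_rates", "local_hire",
                       date(2025, 1, 1), date(2025, 12, 31), Fidelity.VERIFIED)]
# Verified knowledge for 2025 says nothing about 2026:
assert lookup(cells, "DE", "income_tax_rates", "local_hire",
              date(2026, 3, 1)) is Fidelity.UNKNOWN
```

The point of the matrix is visible in the last assertion: "we support Germany" collapses to `UNKNOWN` the moment the question moves outside the cell's effective dates.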
The same progressive model applies to scenarios themselves. A scenario may be recognized, cataloged, partially implemented, or fully executable. The system must represent this gap honestly rather than hiding it behind binary "supported / not supported."
Facts have quality too. A deterministic engine can reproduce a result exactly and still be wrong if the case facts are weak. PDA tracks fidelity for both knowledge and facts.
Progressive coverage does not need to start with full answerability. A system that can reliably identify blocking conditions — sanctioned jurisdictions, product prohibitions, policy violations — already delivers value before it can fully evaluate permissible alternatives.
Progressive coverage and deterministic execution are not opposing forces. Case facts do not change the inherent quality of knowledge — they select which knowledge applies. A relocation case resolves a corridor, a worker path class, and a timing window. These selections compose the applicable knowledge view from base layers and overlays. The engine then checks whether that view meets the module's minimum fidelity, freshness, and scope requirements, executes what is met, and blocks the rest.
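The composition step described above can be sketched in a few lines. All layer contents and selection keys here are invented for illustration; the mechanism is the part that matters: case facts select overlays, overlays refine base layers, and fidelity values travel with the knowledge rather than being altered by the case.

```python
# Hypothetical sketch: composing the applicable knowledge view from a
# base layer plus the overlays selected by resolved case facts.
base = {"income_tax_rates": "curated", "work_authorization": "unknown"}
overlays = {
    ("corridor", "PL->DE"):       {"social_security_treaty": "verified"},
    ("path_class", "local_hire"): {"work_authorization": "estimated"},
}

def compose_view(base, overlays, selections):
    """Later overlays win; case facts select knowledge, never upgrade it."""
    view = dict(base)
    for key in selections:
        view.update(overlays.get(key, {}))
    return view

view = compose_view(base, overlays,
                    [("corridor", "PL->DE"), ("path_class", "local_hire")])
assert view["work_authorization"] == "estimated"
assert view["social_security_treaty"] == "verified"
```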
### 2. Deterministic Execution
Only the engine decides whether a scenario can run. The model may propose a scenario. It does not decide executability.
The engine checks more than "do we have a value?" It checks whether the available knowledge and facts meet declared minimum requirements for fidelity, freshness, applicability, and scope. A verified rule from 2024 is not sufficient for a scenario effective in 2026.
This produces three honest outcomes:
- FULL — all mandatory checks passed on sufficient basis
- PARTIAL — some checks passed, some were blocked (visible why)
- REFUSED — cannot proceed safely
REFUSED splits into two cases:
- Deterministically blocked: the system has sufficient basis to conclude the path should not proceed
- Authority-limited: the system lacks sufficient basis to answer safely
These require different explanations, escalation paths, and audit meaning.
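A minimal sketch of the executability check, assuming invented module names, requirement shapes, and a linear tier ordering. It shows the three outcomes falling out of per-module requirement checks rather than being model decisions:

```python
from enum import Enum

class Outcome(Enum):
    FULL = "full"
    PARTIAL = "partial"
    REFUSED = "refused"

TIERS = ["unknown", "estimated", "curated", "verified"]

def meets(actual, required):
    return TIERS.index(actual) >= TIERS.index(required)

def evaluate(scenario_modules, knowledge_view):
    """Only the engine decides executability; the model merely proposed the scenario."""
    passed, blocked = [], []
    for module, requirements in scenario_modules.items():
        ok = all(meets(knowledge_view.get(element, "unknown"), tier)
                 for element, tier in requirements.items())
        (passed if ok else blocked).append(module)
    if passed and blocked:
        return Outcome.PARTIAL, passed, blocked
    return (Outcome.FULL if passed else Outcome.REFUSED), passed, blocked

scenario = {
    "tax_assessment":    {"income_tax_rates": "curated"},
    "immigration_check": {"work_authorization": "verified"},
}
view = {"income_tax_rates": "verified", "work_authorization": "estimated"}
outcome, passed, blocked = evaluate(scenario, view)
assert outcome is Outcome.PARTIAL and blocked == ["immigration_check"]
```

A fuller version would also tag a REFUSED outcome as either deterministically blocked (an explicit prohibition fired on sufficient basis) or authority-limited (required knowledge is missing), since the two carry different explanations and audit meaning.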
### 3. Governed Conversation
Users bring a situation, not a scenario. They do not know the domain map. The system must manage ambiguity rather than silently resolving it.
PDA organizes pre-case semantics into three layers:
- Mentions: what the user actually said, without forcing interpretation
- Hypotheses: plausible interpretations — explicitly non-authoritative
- Resolved domain objects: confirmed entities the workflow can bind to
The system does not commit to a scenario until it has sufficient basis. This is not intent classification — it is a structured process of narrowing ambiguity across multiple plausible interpretations until a specific scenario is justified.
The conversation moves through explicit phases — discovery, clarification, fact collection, execution, and explanation. Each phase strictly limits what the model can see, what it can do, and what must remain a non-authoritative hypothesis.
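The three pre-case layers can be sketched as distinct data shapes. Everything here is an assumed illustration built on the "move Anna to Germany" example above; the structural point is that the workflow can bind only to the resolved layer, never to a hypothesis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    text: str                 # what the user actually said, uninterpreted

@dataclass(frozen=True)
class Hypothesis:
    mention: Mention
    interpretation: str       # explicitly non-authoritative
    scenario_candidate: str

@dataclass(frozen=True)
class ResolvedObject:
    kind: str
    value: str
    confirmed_by: str         # who confirmed it, e.g. the user or an expert

mention = Mention("We need to move Anna to Germany")
hypotheses = [
    Hypothesis(mention, "permanent relocation", "permanent_relocation"),
    Hypothesis(mention, "temporary assignment", "temporary_assignment"),
    Hypothesis(mention, "remote work from Germany", "remote_work"),
]
# Only resolved objects are bindable; hypotheses stay in the ambiguity layer
# until clarification narrows them down.
resolved = ResolvedObject("destination_country", "DE", confirmed_by="user")
```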
### 4. Authority Boundary
The model never publishes durable truth directly. When it proposes a fact or decision that the current scenario classifies as high-risk, the system presents it as a structured approval card — outside the LLM, with explicit confirm and reject controls. Only after user confirmation does the fact become durable truth.
Not every fact requires this gate. If a fact is deterministically derived from previously approved facts, or if the scenario's fidelity requirements accept a lower tier (e.g., estimated is sufficient for this module), the system can publish it automatically. The gating level is determined by the scenario contract and the risk tier of the specific fact in the current context — not applied uniformly to everything.
A compliance-sensitive determination may additionally require qualified expert approval. Authority basis is distinct from confidence: a fluent answer is not publication authority.
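The gating decision described above reduces to a small routing function. This is a hypothetical sketch: the contract shape, risk tier names, and gate labels are all invented, and a real scenario contract would be far richer.

```python
TIERS = ["unknown", "estimated", "curated", "verified"]

def meets(actual, required):
    return TIERS.index(actual) >= TIERS.index(required)

def publication_gate(fact, contract):
    """Decide how a model-proposed fact may become durable case truth."""
    if fact.get("derived_from_approved"):
        return "auto_publish"            # deterministic derivation: no gate
    risk = contract["risk_tier"].get(fact["name"], "low")
    if risk == "compliance_sensitive":
        return "expert_approval"         # qualified expert sign-off required
    if risk == "high":
        return "user_approval_card"      # confirm/reject UI, outside the LLM
    accepted = contract["accepted_fidelity"].get(fact["name"], "verified")
    return ("auto_publish" if meets(fact["fidelity"], accepted)
            else "user_approval_card")

contract = {
    "risk_tier": {"citizenship": "high"},   # routing fact for immigration
    "accepted_fidelity": {"salary_estimate": "estimated"},
}
assert publication_gate({"name": "citizenship", "fidelity": "verified"},
                        contract) == "user_approval_card"
assert publication_gate({"name": "salary_estimate", "fidelity": "estimated"},
                        contract) == "auto_publish"
```

Note that the same fact name would route differently under a different contract, which is the per-scenario, per-module enforcement the central claim calls for.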
### 5. Bounded Agent Context
The primary defense against AI knowledge bleed is not instruction, and it is not a larger context window. More context means more information available — but not better focus. A model with a million tokens of domain knowledge in context can still pick up the wrong fragment, miss the critical one, or blend relevant and irrelevant material without signaling which is which.
The defense is context boundary: the model receives a compiled, focused context slice for each step, containing only what the system has determined is relevant for the current phase and scenario. Less noise, more signal, fewer opportunities for the model to supplement from the wrong source.
This is consistent with progressive disclosure patterns recommended by major model providers and supported by research on attention degradation in long contexts.
Explanation is subject to verification: user-facing output must reference system-owned artifacts, not model pretraining memory. If narration introduces unsupported claims, the system catches it or falls back to the deterministic explanation core.
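The context boundary can be sketched as a per-phase filter over case state. The phase names and artifact keys below are assumptions for illustration; the mechanism is that the slice is compiled by the system, not assembled by the model.

```python
# Hypothetical sketch: compile a phase-scoped context slice instead of
# handing the model the full case state or knowledge base.
PHASE_SCOPES = {
    "clarification": {"open_questions", "resolved_objects"},
    "explanation":   {"execution_trace", "resolved_objects"},
}

def compile_context(phase, case_state):
    """Return only the artifacts this phase is allowed to see."""
    allowed = PHASE_SCOPES[phase]
    return {k: v for k, v in case_state.items() if k in allowed}

case_state = {
    "open_questions": ["What is Anna's citizenship?"],
    "resolved_objects": {"destination_country": "DE"},
    "execution_trace": ["tax_assessment: executed"],
    "raw_transcript": "full conversation text",   # never exposed wholesale
}
slice_ = compile_context("clarification", case_state)
assert "execution_trace" not in slice_ and "raw_transcript" not in slice_
```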
## Where It Applies
PDA is relevant wherever domain expertise exists but does not scale to the people who need it. Three common situations:
- Client-facing staff. A relationship manager needs to answer a client's cross-border tax question but is not a tax expert. The expertise exists in the organization — it just cannot be in every conversation.
- Self-service. An HR manager wants to check relocation feasibility without filing a request to the global mobility team. The knowledge exists — it is just locked behind a specialist queue.
- Auxiliary domain. A hiring manager needs to understand immigration implications of a job offer, but global mobility is not their primary discipline. The domain is important to get right but peripheral to their daily work.
The pattern applies most naturally in domains where:
- rules are formalizable but distributed across jurisdictions, dates, and categories
- expert knowledge exists but does not scale to operational demand
- non-experts need access to bounded, justified answers
- partial coverage is more valuable than silence
- the cost of a wrong answer justifies architectural discipline
PDA is a weaker fit where outcomes depend on open-ended judgment, where one jurisdiction dominates and rules are stable, or where human discretion is the product itself.
The architecture is domain-agnostic. What changes between domains is the ontology, the scenario contracts, the knowledge schemas, and the projection rules. What stays the same is the authority model, the execution semantics, the conversation governance, and the fidelity framework.
## Not All-or-Nothing
Not every system needs the full rigor of PDA. But even lighter applications benefit from the same underlying ideas — canonical domain vocabulary, explicit authority boundaries, deterministic readiness checks, case truth separated from transcript memory.
The commitments described above can be adopted incrementally. A system that starts with a domain ontology and structured tool boundaries is already stronger than raw RAG + tools — even without full scenario contracts or immutable execution traces. PDA describes the strict end; the direction is useful at any point along the way.
This gradual path matters even more now, because stronger models change not only what systems can answer, but also what teams can realistically model in the first place.
Better models do not remove the need for a semantic layer — they change its economics. A model can extract candidate entities, distinctions, and scenario structures from policies, procedures, and historical workflows much faster than manual modeling alone. The value comes when those outputs become explicit, reviewable artifacts rather than implicit runtime interpretations. In that sense, better models do not replace ontology work — they make it more feasible to do well.
## Honest Limits
PDA does not eliminate all problems. It gives them explicit architectural shape.
- Domain modeling risk. The system is only as good as its domain model. If scenarios, distinctions, or knowledge boundaries are modeled incorrectly, the system will confidently navigate a poorly mapped world. This is the deepest risk and can only be addressed by expert domain modeling.
- Complexity risk. PDA introduces significant conceptual complexity. The architecture may be justified, but the user experience atop it must demonstrate that complexity can be hidden behind natural interfaces — without collapsing the governance properties that justify it.
- Human limits. PDA can govern uncertainty and guide clarification. It cannot guarantee correct outcomes when the user cannot accurately describe the situation. The system depends on the quality of collaborative fact gathering, not just on the quality of its own reasoning.
- Maintenance economics. Scenario contracts and the knowledge they reference must be kept current as regulations change, new jurisdictions are added, and effective dates expire. This is real ongoing cost. PDA is worth building only when the expected cost of wrong answers — fines, litigation, reputational damage, operational losses — exceeds the cost of maintaining the architecture. In low-stakes domains, simpler stacks win on economics. In high-stakes regulated domains where errors are expensive and recurring, PDA shifts cost from unpredictable downside to predictable upfront investment. And when maintenance does fall behind, PDA at least makes the staleness explicit through fidelity and freshness metadata, instead of hiding it behind confident answers.
Last updated April 2026 · v0.1 · [email protected]