# Decision-Grade AI — Full Framework
> A framework for executives, technology leaders, and strategy functions working with AI in 2026. Built around verification: what to demand from AI vendors, what to build inside your organization, and what to watch over the next eighteen months.
This file is the full text of all six pages of the decision-grade.ai framework, assembled into a single document for AI-assisted reading. The canonical reading experience is at https://decision-grade.ai. Source is at https://github.com/DavidVALIS/decision-grade.
Published by VALIS Systems. Content licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). Reference: https://valissystems.com.
The framework starts from a single observation: AI production cost has fallen by a factor of one hundred to one thousand against human equivalents, while the cost of verifying that the output reasons correctly has not moved. The gap between those two cost curves is the operational risk that most executive AI guidance does not yet address.
The framework is published openly because the Zero Trust posture it advocates extends to the doctrine itself. You should not have to trust the publisher. You can verify the framework, contest it, fork it, or implement it elsewhere.
---
# Introduction
**What this site is:** A framework for executives, technology leaders, and strategy functions navigating AI in 2026.
Is the guidance you receive about AI still scoped to facts and hallucinations? Will the model make things up? Can we catch it when it does?
These are reasonable questions. They are not where the operational risk lives.
The risk that matters sits one layer down: AI generates polished-looking analytical output at a production cost that has fallen by a factor of 100 to 1,000 relative to human equivalents. The output reads well, fact-checks cleanly, and contains no obvious hallucinations. It still reasons badly. It still makes load-bearing causal claims with no mechanism. It still omits the boundary conditions that would let a careful reader weigh it.
Your existing controls do not catch this. They were not designed to. Style guides formalize prose. Performance reviews reward speed and confidence. Disclosure frameworks check what tool was used, not whether the reasoning is sound. The verification deficit was already inside your organization before any model was deployed. AI did not create it. AI revealed it.
The 2026 executive question is not how to defend against hallucination. It is how to operate when polished output and sound reasoning have decoupled, and when the cost of being wrong about that decoupling has begun to compound.
## Who this is for
Three audiences. The framework serves all three because the underlying problem is universal.

- **Executives and boards:** this framework helps you tell decision-grade output from polished output that looks the same.
- **Technology and security leaders:** it maps the Zero Trust posture you already understand from security onto AI verification, and gives you a buyer's checklist for vendor selection.
- **Strategy and analysis functions:** it names what you have probably been feeling for months without having the words for it.
## How to read this
This site is a reference, not a primer or an essay. Read it in order if you want the full architecture. Jump to any page if you have a specific question.
- **The Frame:** what the problem actually is, and why current controls miss it.
- **The Doctrine:** Zero Trust as the meta-principle. The three layers: Independence, Doctrine, Accountability.
- **The Buyer's Checklist:** seven procurement questions to put to AI verification vendors. Red flags. Scoring grid.
- **Lane Discipline:** decision-grade vs. volume-grade. Classification, routing, failure modes.
- **2026 Watchlist:** dated signals that will tell you whether the framework holds. Updated as signals resolve.
**The doctrine is verifiable.** Every page is published in Markdown source at the linked GitHub repository. You can ask any AI to read the entire framework. An `llms.txt` index is published at the site root for that purpose. The site is verifiable. The doctrine is forkable. The framework is contestable. That is part of the posture, not a side feature.
## What the framework is not
This is not a list of AI tools. It is not a survey of vendor capabilities. It is not a "future of work" thesis or a "ten ways AI will transform your business" primer. There are dozens of those. They are scoped to the questions the AI conversation was asking in 2025. The conversation has moved.
Three inoculations against the most common misreadings.

- **Not a tool guide.** No vendor survey. No "top 10 platforms." If you came here for tool selection, this is the wrong site.
- **Not a jobs thesis.** No predictions about which jobs disappear. The framework is scoped to verification, not labor substitution.
- **Not last year's questions.** The questions executives asked in 2024 and 2025 produced reasonable 2024-2025 answers. The questions have moved. So has this framework.
This framework is scoped to the question executives will face in 2026 and 2027: when the cost of producing polished analytical output has collapsed, what does it mean to verify the reasoning underneath, and what should you demand from the systems and vendors you depend on?
## Where to start
Start with **The Frame**. Each page builds on the last. The full architecture in roughly thirty minutes.
Skip to **The Doctrine** if you want the conceptual spine before the diagnosis.
Skip to **The Buyer's Checklist** if you have an AI vendor evaluation this quarter.
---
# The Frame
The verification deficit is the operational problem underneath AI-augmented analysis. The executive AI conversation has been organized around facts and hallucinations. The deficit is about reasoning. This page walks from the misframing most organizations are operating under to the diagnosis the rest of the framework is built on.
**Three minute read.** Most executive AI guidance is scoped to the wrong problem. Hallucinations are detectable. The real risk has moved one layer down: AI output that reads cleanly, fact-checks, and still reasons badly. Different problem, different fix. The historical parallel is pre-2008 credit ratings.
## Most executive AI guidance is solving last year's problem
The questions executives are asking about AI in 2026 are the questions that mattered in 2024 and 2025. They were the right questions then. They are the wrong questions now.

What the 2024-2025 toolkit addressed:

- Hallucinations (detectable, mostly fixed)
- Citation grounding (improved across major models)
- Disclosure frameworks (NIST AI RMF, EU AI Act, ISO 42001)
- AI use policies inside organizations
- Designated AI risk functions

What it does not address:

- Reasoning that holds up under expert challenge
- Unsupported causal claims dressed as analysis
- Missing boundary conditions on confident predictions
- Phantom precision in numbers presented as data
- The fact that polished output and sound reasoning are now decoupled
The misframing is not anyone's fault. The remediation followed the visible failure mode. The visible failure mode changed. The remediation did not.
**The kind of sentence that passes every 2025 control and still fails:**
*"Mid-market companies that deployed generative AI tools in 2025 saw 18% productivity gains in their go-to-market functions, with the largest effects concentrated among sales development reps and account executives."*
Every fact in the sentence checks out. AI deployment is happening. Mid-market is a real segment. Productivity gains have been reported. SDRs and AEs use these tools. A fact-checker passes it.
A reasoning-checker does not.
- **"18% productivity gains"** is phantom precision. Productivity measured how? Calls dialed, pipeline created, revenue closed? Each one is a different 18%.
- **"Mid-market companies that deployed"** is self-selection. The companies that deployed gen AI were also the companies investing in better tooling, better hires, and better playbooks. The deployment took credit for the whole improvement.
- **"Concentrated among SDRs and AEs"** is an observational claim with no comparison group, no baseline, no methodology, and no source.
The sentence didn't lie. It left out everything that would let you weigh it. That is the failure no fact-checker catches.
## The verification deficit, in one comparison
Production cost has fallen by a factor of one hundred to one thousand against human equivalents. Verification cost has not moved. The gap is the operational risk.
| Cost layer | 2020 | 2026 | Change |
|---|---|---|---|
| **Producing 1,000 words of polished analytical prose** | $5 to $15 (mid-career analyst time) | $0.001 to $0.02 (mainstream model API) | ~250x to 15,000x cheaper |
| **Verifying that the 1,000 words reasons correctly** | Hours of expert review, plus access to source data | Hours of expert review, plus access to source data | Unchanged |
**The asymmetry, plainly.**
Drafting queues are empty. Review queues are full. Throughput metrics still point at the wrong door.
The scarce resource is no longer the analyst who writes. It is the reviewer who catches what the writing assumes.
The production layer has been industrialized. The verification layer has not. The result inside any organization that has adopted AI-assisted drafting:

- **Output volume:** often 10x to 40x the volume of two years ago.
- **Verification capacity:** the slow checks were the first thing cut under throughput pressure.
- **Per-document risk:** the probability that any specific deck contains an unverified load-bearing claim has not fallen with the cost. It has risen with the volume.
The verification deficit was always there. AI did not create it. AI revealed it. Before AI, the speed of human production limited the rate at which unverified claims could circulate. That speed limit was not a filter. It was a throttle. Any individual claim was not made more rigorous by the throttle. There were simply fewer claims in flight at any given moment. AI removed the throttle. The deficit became visible.
**The verification deficit was always there.**
In 2015, the [Open Science Collaboration](https://doi.org/10.1126/science.aac4716) tried to replicate 100 peer-reviewed psychology findings.
~36% replicated.
Before AI. Before models could draft. Before any of the production-cost collapse this framework describes.
AI did not create the deficit. AI made it harder to ignore.
## Why your existing controls do not catch it
The diagnosis is not that one thing is broken. It is that four things compound.
```mermaid
flowchart TD
L1["Layer 1<br/>Drafting costs collapse<br/>$0.001 to $0.02 per 1,000 words"]
L2["Layer 2<br/>Review gates check style and coherence,<br/>not structural validity"]
L3["Layer 3<br/>Analysts rewarded for speed to briefing,<br/>not forecast accuracy"]
L4["Layer 4<br/>Clients never see the claim-source map<br/>behind the polished deliverable"]
L1 --> L2 --> L3 --> L4
L4 -.->|"Feedback: opacity hides damage,<br/>incentives intensify"| L1
style L1 fill:#1E293B,color:#fff
style L2 fill:#1E293B,color:#fff
style L3 fill:#1E293B,color:#fff
style L4 fill:#7F1D1D,color:#fff
```
Each layer degrades quality on its own. Together they form a self-reinforcing loop: cheap drafting floods the queue, reviewers lack criteria to catch unanchored claims, incentives punish the slowdown that rigor demands, and opacity hides the damage from buyers. Fix one layer and the others compensate. A monocausal diagnosis produces a single-point fix. A stacked system laughs at single-point fixes.
Review systems formalize prose, presentation, and process. They do not formalize structural-reasoning checks. The asymmetry is rational, not accidental, and it explains why disclosure and review regimes do not close the gap.
| What review systems formalize | What they do not formalize |
|---|---|
| Clarity of prose | Whether causal claims are supported |
| Citation format and registration | Whether assumptions are stated |
| Data sharing and availability | Whether boundary conditions are named |
| Plagiarism detection | Whether conclusions are testable |
| Statistical method correctness | Whether the mechanism is specified |
The reason is mechanical:
**The four-second rule.**
A reviewer can check whether a sentence reads well in roughly four seconds.
A reviewer can check whether the causal claim it makes is supported in roughly four hours, plus domain expertise, plus access to source data.
When volume rises and staffing stays flat, the slow checks are the first to go.
The incentive system inside organizations reinforces the asymmetry. The failure compounds through a feedback loop:
```mermaid
flowchart LR
A["Performance reviews<br/>reward confidence"] --> B["Analysts avoid<br/>falsifiable claims"]
B --> C["Reviewers never see<br/>falsifiable claims to check"]
C --> D["Verification capacity<br/>atrophies"]
D --> A
style A fill:#1E293B,color:#fff
style B fill:#1E293B,color:#fff
style C fill:#1E293B,color:#fff
style D fill:#1E293B,color:#fff
```
When analysts are publicly named and legally exposed, the rules themselves push toward hedged language. [FINRA Rule 2241](https://www.finra.org/rules-guidance/rulebooks/finra-rules/2241), which governs U.S. equity research analysts, is a representative example. Hedged language survives legal review, keeps the client comfortable, and cannot be demonstrated to be wrong. The system selects for prose that sounds authoritative while committing to nothing testable.
**Disclosure as theater.**
Disclosure frameworks ([NIST AI RMF](https://www.nist.gov/itl/ai-risk-management-framework), [EU AI Act](https://eur-lex.europa.eu/eli/reg/2024/1689/oj), [ISO/IEC 42001](https://www.iso.org/standard/81230.html)) address a different problem.
They check what tool was used.
They do not check whether the reasoning is sound.
The completed disclosure form makes everyone feel comfortable, without anyone checking the underlying work.
Even the most disciplined formalized verification systems are partial. The U.S. intelligence community's Structured Analytic Techniques, codified in the [CIA Tradecraft Primer](https://www.cia.gov/resources/csi/static/Tradecraft-Primer-apr09.pdf) and [ICD 203](https://www.dni.gov/files/documents/ICD/ICD-203.pdf), force analysts to surface assumptions through explicit protocols. Recent scholarship questions whether these techniques reliably eliminate reasoning errors in field conditions. If the most rigorous formalized verification system in the world is partial, a checkbox disclosure regime cannot close the structural gap.
## The pattern has played out before
Cheap production. Unchanged review processes. Proxy-based trust. The architecture is not new. It produced the largest single financial collapse of the post-war era.
| | 2000-2008 credit ratings | 2024-2026 AI-augmented analysis |
|---|---|---|
| **Cheap production** | Quantitative models scoring securities faster than any human team | LLMs drafting 1,000 words for less than one cent |
| **Unchanged review** | Methodology documents and a century-old brand | Style guides, fact-checkers, disclosure labels |
| **Trust proxy** | AAA stamp | Polished prose |
| **Volume** | ~30 mortgage securities rated triple-A every working day in 2006 | Unbounded |
| **Visible until failure** | No | No |
| **Cost when it failed** | Trillions | TBD |
(Financial Crisis Inquiry Commission, [*The Financial Crisis Inquiry Report*](https://www.govinfo.gov/content/pkg/GPO-FCIC/pdf/GPO-FCIC.pdf), 2011, Chapter 7.)
**The Moody's numbers, plainly:**
- **2006:** ~30 triple-A mortgage ratings issued. Every working day.
- **2000-2007:** ~45,000 mortgage-related securities rated triple-A in total.
- Many defaulted within months of issuance.
Same architecture, different decade.
**Another pattern, same shape: 1898 Hearst and the USS Maine.**
In February 1898, editors at Hearst's *New York Journal* published accounts of the USS Maine explosion that passed every editorial standard of the day. Attribution. Sourcing. Narrative coherence.
The standard itself was the problem. Reviewers caught typos and unclear antecedents. They did not catch unanchored causal claims, absent boundary conditions, or unfalsifiable predictions about Spanish responsibility.
The gate was perfectly maintained. It was facing the wrong direction. Public consequence: war.
Same pattern as the analytical content market now. Cheap production (penny papers then, AI drafts now). Unchanged review criteria. Standards that look rigorous and check the wrong things.
The mechanism that produced the failure is the same mechanism operating in analytical content now:
```mermaid
flowchart TD
A[Direct verification is expensive] --> B[Market adopts a proxy]
B --> C["Proxy correlates with quality<br/>at low production volume"]
C --> D[Production volume surges]
D --> E[Correlation breaks]
E --> F["Market does not notice<br/>until failure event"]
style B fill:#1E293B,color:#fff
style F fill:#7F1D1D,color:#fff
```
The fix in adjacent domains has been consistent. When failure becomes visible enough, buyers demand proof artifacts.
**How the correction arrives, in adjacent domains:**
- **2008 banking** → U.S. regulators required institutions to validate every material model assumption ([SR 11-7](https://www.federalreserve.gov/boarddocs/srletters/2011/sr1107.htm), 2011; updated by SR 26-2 in 2026).
- **Cybersecurity procurement** → vendors moved from self-reported compliance to penetration-test evidence and SOC 2 attestations. The 2026 SOC 2 criteria emphasize continuous risk assessment and earlier security artifacts in procurement.
- **ESG reporting** → in the middle of the same shift right now.
The pattern: when the cost of being wrong exceeds the cost of demanding proof, buyers force the correction.
The analogy has limits. Credit ratings carried regulatory force and directly triggered capital requirements. A consulting deck does not trigger margin calls. The structural mechanism, however, is identical: proxy-based trust substitutes for verification, and the substitution is invisible until it fails.
The framework that follows on the rest of this site is built on that pattern. The next pages walk through what proof artifacts look like for analytical content, what posture executives should adopt toward AI verification, and what to demand from the systems and vendors they depend on.
## Where this goes next
- **The Doctrine:** Zero Trust as the meta-principle. The three layers (Independence, Doctrine, Accountability) and what they rule out.
- **The Buyer's Checklist:** seven procurement questions to put to any AI verification vendor. Red flags, scoring, the buyer's lever.
- **Lane Discipline:** decision-grade vs. volume-grade output. How to classify at point of production. How to prevent slippage.
---
# The Doctrine
The Frame names the problem. The Doctrine names the posture organizations should adopt in response. The posture is Zero Trust, applied to AI verification.
**In short:** Zero Trust in security means never trust by default, always verify. Applied to AI verification, it means the customer should not have to trust the verifier. Every claim a verification system makes about its own behavior should be independently checkable. The doctrine has three layers: Independence (no AI verifies its own work), Doctrine (rules enforced architecturally), Accountability (decisions survive challenge).
## The security parallel that maps directly onto AI
Zero Trust is a familiar concept in security architecture. It was articulated over the past decade as a response to a specific failure mode: perimeter-based trust models assume the inside is safe, and they fail catastrophically when the inside is breached. Security stopped relying on the perimeter and started requiring verification on every transaction.
AI verification is at the same inflection. The same shift is required.
| Domain | Default trust model | Failure mode | Fix |
| --- | --- | --- | --- |
| **Network security (pre-2015)** | Perimeter trust ("inside is safe") | Breach inside the perimeter = total loss | Zero Trust: verify every transaction |
| **AI verification (now)** | Trust the verifier ("their brand is sound") | Verifier fails = silent corruption of decisions | Zero Trust: verify the verifier's math |
**The customer should not have to trust the verifier.**
1. Every claim the verifier makes about its own behavior should be independently verifiable, by the customer, by a third party, or by a regulator.
2. The reputation of the founder, the team, the company, the doctrine, and the methodology are not inside the trust model.
3. The trust model is the math, the cryptographic anchors, the public commitments, and the records that the verifier cannot quietly alter.
Once that statement is articulated, every architectural choice that follows stops being a feature decision and starts being a consequence. The doctrine has three layers, each applying Zero Trust to a different part of the verification stack.
- **Independence:** Zero Trust applied to the **verification layer**. No single AI family verifies its own work.
- **Doctrine:** Zero Trust applied to the **analytical layer**. Rules are enforced by architecture, not by operator preference.
- **Accountability:** Zero Trust applied to the **audit layer**. Every decision survives independent challenge.
## 1. Independence: no single AI verifies its own work
The first layer is about who does the verifying. The Zero Trust commitment: never the same family that produced the output.
```mermaid
flowchart LR
subgraph antipattern["Anti-pattern: same family"]
direction LR
A1[Model A generates] --> A2[Model A checks]
A2 -.->|"Same blind spots,<br/>same biases"| A1
end
subgraph pattern["Zero Trust pattern"]
direction LR
B1[Model A generates] --> B2[Model B checks]
B1 --> B3[Model C checks]
B2 & B3 --> B4["Recorded verdict<br/>+ dissent"]
end
```
When a single AI family verifies its own output, the customer is back inside the perimeter trust model. The same model family has the same blind spots, the same training-data biases, and the same failure modes. Verification by the same family is the cognitive equivalent of a single auditor signing off on their own books.
The Zero Trust commitment: verification requires independent agreement across model families with different training data, different objectives, and different failure modes. When multiple independent providers agree, that agreement carries information no single provider can replicate. When they disagree, the disagreement is also informative, and the disagreement is recorded.
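A minimal sketch of the structural constraint, in Python. The family names, field names, and verdict shape are illustrative assumptions, not any vendor's API; the point is that a same-family check is rejected by construction, and dissent is part of the recorded result rather than discarded.

```python
from dataclasses import dataclass, field

@dataclass
class Verdict:
    family: str       # model family that issued the check
    passed: bool
    notes: str = ""

@dataclass
class VerificationRecord:
    generator_family: str
    verdicts: list = field(default_factory=list)

    def add(self, verdict: Verdict) -> None:
        # Structural constraint, not policy: the generating family never checks itself.
        if verdict.family == self.generator_family:
            raise ValueError("same-family verification violates Independence")
        self.verdicts.append(verdict)

    def result(self) -> dict:
        dissent = [v for v in self.verdicts if not v.passed]
        return {
            # A verdict requires at least two independent families in agreement.
            "verified": len(self.verdicts) >= 2 and not dissent,
            # Disagreement is informative: it is recorded, never hidden.
            "dissent": [(v.family, v.notes) for v in dissent],
        }

# Output drafted by family A is checked by families B and C.
record = VerificationRecord(generator_family="A")
record.add(Verdict(family="B", passed=True))
record.add(Verdict(family="C", passed=False, notes="causal claim lacks a mechanism"))
print(record.result())   # verified: False, with the dissent on the record
```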
**What Independence rules out:**
- A single model issuing a verdict on its own output, even with a different prompt
- A vendor claiming "we verify our work"
- A "human in the loop" who only reviews what the same model has already approved
Same family, same blind spots. A self-assessment is not a verdict.
## 2. Doctrine: rules enforced architecturally
The second layer is about where the rules live. The Zero Trust commitment: rules are enforced by the architecture, not by operator preference.
| | Anti-pattern | Zero Trust pattern |
| --- | --- | --- |
| **Where the rule lives** | In a style guide, runbook, or PDF | In code that executes deterministically |
| **What enforces it** | Reviewer memory, policy, deadline pressure | A gate that cannot be bypassed |
| **What happens when convenient to skip** | The rule is skipped | The rule fires anyway |
| **Verification claim** | "Our process is to..." | "The system cannot ship without..." |
| **Audit answer** | "We have a policy" | "Here is the code path" |
The standard failure mode for analytical processes is that the rules exist in documentation but not in execution. A style guide says reviewers must check causal claims. The reviewer is under deadline pressure. The check does not happen. The output ships, and the documentation is silent on whether the check was actually performed. The rule existed; the enforcement did not.
The Zero Trust commitment generalizes beyond evidence gates. Any rule the verification system claims to enforce should be enforced architecturally. Refusals that the system claims to log should be logged automatically, not on operator discretion. Rubric versions that the system claims to apply should be applied by hash-binding, not by operator selection. Doctrine that lives only in documentation is not doctrine. Doctrine that the architecture enforces is.
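A sketch of what hash-binding and automatic refusal logging can look like when they live in code rather than in a runbook. The file names, log format, and function names are illustrative assumptions; the mechanism is that the mismatch path both logs and raises, so skipping the rule is not an available move.

```python
import hashlib
import time

def log_refusal(**fields) -> None:
    # Append-only refusal log. Logging happens on the failure path itself,
    # not at operator discretion.
    with open("refusals.log", "a") as f:
        f.write(f"{time.time()}\t{fields}\n")

def load_rubric(path: str, published_hash: str) -> bytes:
    """Load a rubric only if it matches the hash published to the customer.

    The rule lives in the code path: a rubric that does not match the public
    commitment cannot be applied, whatever the deadline pressure.
    """
    data = open(path, "rb").read()
    actual = hashlib.sha256(data).hexdigest()
    if actual != published_hash:
        log_refusal(check="rubric-hash", expected=published_hash, actual=actual)
        raise RuntimeError("rubric does not match the published commitment")
    return data
```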
**What architectural enforcement rules out:**
- A style guide that says reviewers must check causal claims, with no mechanism that prevents a deck from shipping when the check is skipped
- A vendor saying "we require evidence for every citation" when the evidence requirement can be turned off for a particular client
- A monthly review cadence that happens when someone remembers, on a calendar that someone controls
- A doctrine that exists in a PDF on a SharePoint somewhere
If the only thing standing between the rule and a violation is operator memory or operator discretion, the rule is aspirational.
## 3. Accountability: every decision survives independent challenge
The third layer is about the record. The Zero Trust commitment: every decision the verification system makes is logged in a form the verifier cannot alter without breaking the record, and the integrity of the record is verifiable by parties outside the verifier's control.
```mermaid
flowchart LR
subgraph closed["Anti-pattern: closed log"]
direction TB
C1[Verification decision] --> C2[Vendor-hosted log]
C2 --> C3["Vendor confirms<br/>authenticity"]
C3 -.->|Trust required| C4[Customer]
end
subgraph anchored["Zero Trust pattern"]
direction TB
A1[Verification decision] --> A2["Hash committed to<br/>public chain"]
A2 --> A3["Anyone can verify<br/>without vendor"]
A3 --> A4["Customer, third party,<br/>regulator"]
end
```
The standard mechanism for "outside the verifier's control" is cryptographic anchoring: hashes of the decision ledger committed to a public chain (or equivalent infrastructure) that the verifier does not control, cannot quietly alter, and will not lose access to even if the company changes hands.
The architectural consequence is that any verification system worth taking seriously publishes commitments anyone can independently verify. The public hash of a rubric version. The public hash of a source document. The cryptographic certificate that binds an output to the specific model board, the specific rubric, and the specific evidence set that produced it. None of these require trust in the verifier. All of them produce checks the verifier cannot evade.
The accountability principle extends to internal organizational use. A C-suite reader should not have to trust the analyst, the desk lead, or the chief of staff to forward the right version. The reader should be able to verify the cryptographic match between the document on screen and the certificate attached to it. The trust model is the hash, not the messenger.
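Concretely, "the trust model is the hash" means a check like the following sketch. The certificate field name is an assumption for illustration; the mechanism is that the reader recomputes the document hash locally and compares it to the hash the certificate committed to, with no step that requires trusting the verifier or the messenger.

```python
import hashlib
import json

def document_matches_certificate(doc_path: str, cert_path: str) -> bool:
    """Compare the document on screen against the certificate attached to it.

    The check requires no trust in the analyst, the desk lead, or the
    verifier: it is a hash comparison anyone can run independently.
    """
    doc_hash = hashlib.sha256(open(doc_path, "rb").read()).hexdigest()
    certificate = json.load(open(cert_path))
    # "document_sha256" is an assumed field name; a real certificate would
    # also bind the rubric hash, the evidence set, and the issuer identity.
    return doc_hash == certificate["document_sha256"]
```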
**What Accountability rules out:**
- An audit log that the verifier hosts and could rewrite without anyone noticing
- A "trust us, our methodology is sound" claim with no third party that can independently check
- A certificate that says "approved" without anchoring the approval to the specific inputs, the specific rules, and the specific reviewers
- A version of a document on a CEO's screen that the desk team can quietly substitute for a different version
If the integrity of the record depends on the verifier behaving well, the integrity of the record is not verifiable.
## What the three principles produce, taken together
The three principles produce a set of architectural commitments any serious verification system carries. The list below is general, not specific to any vendor's implementation. Each commitment is a consequence of the Zero Trust posture. None of them is a feature. Removing any of them is a violation of the constitutional posture, not a product trade-off.
**1. Independent multi-model verification.** No single AI family verifies its own output. Verdicts require agreement across independent providers with different training data, different objectives, and different failure modes. Disagreement is informative and is recorded, not hidden.

**What to look for:** A vendor that names which model families participate in verification, what happens when they disagree, and how dissent is logged.

**2. Deterministic enforcement of rules.** Rules the system claims to enforce are enforced by deterministic gates, not operator discretion. If the system requires evidence before a citation reaches the analytical layer, the gate cannot be turned off, even by the vendor, even when commercially convenient.

**What to look for:** A vendor that can demonstrate the rule fires deterministically, not on policy. "We require X" is not a doctrine. "X cannot ship without Y, here is the code path" is.

**3. Tamper-evident, publicly anchored records.** Every verification decision is committed to a tamper-evident record. The integrity of the record is verifiable by parties outside the verifier's control. Standard implementation is a public chain (blockchain, transparency log, or equivalent infrastructure) the verifier does not control and cannot quietly alter.

**What to look for:** A vendor that can show you the public anchor for any given decision, and that anyone, including you, can independently verify the anchor without going through the vendor.

**4. Automatic refusal logging.** Refusals are logged automatically, not at operator discretion. The log is regularly reviewed and queryable. Over time, the refusal pattern becomes a discriminating signal anyone can examine, and that signal cannot be quietly curated by the vendor.

**What to look for:** A vendor that publishes the refusal log structure and review cadence, and that lets you audit specific refusals against the published policy.

**5. Hash-committed rubric versions.** The rules used to grade outputs are public-hash-committed for each customer. Customers can verify they are being graded against the rubric version they were sold, not a quietly updated one.

**What to look for:** A vendor that publishes a public hash of the active rubric version per customer, and a change log showing every rubric update with the date and the reason.

**6. Reader-verifiable certificates.** The cryptographic match between the document an end-reader sees and the certificate that attests to its provenance is verifiable without going through the verifier. A C-suite reader does not have to trust the analyst, the desk lead, or the chief of staff to forward the right version.

**What to look for:** A vendor whose certificate format includes a hash of the source document, and where the verification of that hash can be performed independently.

**7. Continuity across change of control.** Certificates issued before any future acquisition, merger, or change of control remain verifiable against the public chain. New certificates issued after a change of control carry a different signature visible in the chain. Customers can detect a regime change without the verifier having to disclose one.

**What to look for:** A vendor whose public chain entries include a stable issuer identity that cannot be silently replaced. If the issuer key changes, the change is visible in the public record.
## Why the posture is more durable than methodology
A methodology-based verification claim is contestable. A Zero Trust posture is not contestable in the same way. The doctrine produces checks that are mathematical, not interpretive. Domain experts can challenge a methodology. They cannot challenge a hash.
That durability has consequences across every audience the verification system serves.
_Why should I trust your verdict?_
"You should not have to. Here is the verification you can run yourself."
_How do we audit verifiers at scale?_
"You do not have to audit the verifier. You audit the math the verifier published."
_Where is the moat?_
"In cryptographic enforcement of doctrine. A methodology can be quietly softened. A commitment to the public chain cannot."
_What changes if we buy the company?_
"Certificates issued before the acquisition still validate. New ones carry a different signature visible in the chain. The doctrine cannot be repealed silently."
The doctrine is, in a meaningful sense, a constitutional posture rather than a corporate policy. It cannot be repealed without the repeal being visible.
## Where this goes next
- **The Buyer's Checklist:** seven procurement questions that translate the doctrine into specific commitments to demand from AI vendors.
- **Lane Discipline:** how the doctrine plays out inside your own organization: decision-grade vs. volume-grade routing.
- **2026 Watchlist:** dated signals over the next 18 months that will tell you whether the framework holds.
---
# The Buyer's Checklist
If you only read one page on this site, read this one. It translates The Doctrine into the specific questions you should put to any AI vendor claiming to verify analytical output, what a serious answer looks like, and what to walk away from.
**The single sentence test:**
> Can I verify your verdicts without having to trust you?
If the answer requires trusting the vendor, the vendor is selling perimeter security. If the answer is "yes, here is how," you are talking to a Zero Trust verifier. The seven questions below unpack what that single sentence means in procurement language.
**How to use this page.** Take the seven questions to a vendor evaluation. Each one corresponds to one of the seven architectural commitments in The Doctrine. Score each answer on a five-point scale:
- **0** No answer.
- **1** Marketing answer.
- **2** Process answer.
- **3** Architectural answer with limitations named.
- **4** Architectural answer with public commitments.
- **5** Architectural answer with public commitments and cryptographic verification you can run yourself.
A vendor that scores below 2 on any question is not a Zero Trust verifier. They may still be useful for volume-grade work. They should not be in your decision-grade lane.
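The scoring rule is mechanical enough to write down. A sketch with illustrative question keys; the shape follows the scale above: the sum gives the overall band, the minimum identifies the disqualifier.

```python
def assess_vendor(scores: dict) -> dict:
    """Apply the checklist rule: a single answer below 2 disqualifies the
    vendor from the decision-grade lane, whatever the total score."""
    assert len(scores) == 7, "one score per question, 0 to 5 each"
    weakest = min(scores, key=scores.get)
    return {
        "total": sum(scores.values()),           # overall band, out of 35
        "weakest": (weakest, scores[weakest]),   # where the architecture fails
        "zero_trust_verifier": all(s >= 2 for s in scores.values()),
    }

# A perfect anchoring answer does not compensate for absent independence.
print(assess_vendor({
    "independence": 0, "enforcement": 4, "anchoring": 5, "refusal_log": 3,
    "rubric_binding": 4, "reader_verification": 3, "continuity": 2,
}))
```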
## Why seven questions, not fewer
The failure mode this checklist addresses is stacked. Cheap drafting compounds with style-only review compounds with speed incentives compounds with buyer opacity. A vendor that addresses one layer while the others persist is not a verifier. They have fixed one of four broken things.
The seven questions test whether a vendor's architecture spans the full stack. A vendor that scores 5 on cryptographic anchoring but 0 on independent verification is solving one problem while the others compound. The score sum tells you the overall band. The weakest answer tells you where the architecture fails.
A stacked failure needs a stacked response.
## The seven questions at a glance
1. **Independence.** Which model families verify your output? What happens when they disagree?
2. **Enforcement.** Show me a rule that fires deterministically. Can it be turned off?
3. **Anchoring.** How do I verify a decision without going through you?
4. **Refusal logging.** Where is your refusal log? Show me a specific refusal.
5. **Rubric binding.** What rubric version am I being graded against? Show me the change log.
6. **Reader verification.** How does my CEO know they're looking at the version you certified?
7. **Continuity.** What happens to my certificates if you are acquired?
---
# Lane Discipline
The Buyer's Checklist tells you what to demand from vendors. Lane Discipline tells you what to build inside your own organization. It is the operational practice that separates the decision-grade lane (slow, expensive, verified) from the volume-grade lane (fast, cheap, unverified) and prevents content from crossing between them without re-verification.
If you take only one operational practice from this framework, take this one. Lane discipline is the difference between an organization that benefits from AI-augmented analysis and one that quietly poisons its own decision-making with it.
**Three minute read.** Verification is expensive. Demanding it on every output is absurd. The fix is segmentation: a decision-grade lane where buyers pay for verification, and a volume-grade lane where speed dominates. The failure mode is content sliding between lanes without re-verification. The single most expensive mistake: a volume-grade memo becoming the basis for a board decision.
## How the two lanes operate
```mermaid
flowchart TD
A["Analytical output<br/>created"] --> B{"Cost of being<br/>wrong is high?"}
B -->|Yes| C[Decision-grade lane]
B -->|No| D[Volume-grade lane]
C --> E["Verification<br/>process"]
E --> F["Decision-grade<br/>certified output"]
D --> G["Ship as-is,<br/>clearly labeled"]
F --> H["Suitable for board,<br/>capital, regulatory use"]
G --> I["Suitable for internal,<br/>reversible use"]
style C fill:#14532D,color:#fff
style D fill:#1E293B,color:#fff
style F fill:#14532D,color:#fff
style G fill:#1E293B,color:#fff
```
## Why two lanes
Verification is genuinely expensive and slow. That is precisely why it was the first thing cut under throughput pressure (see [The Frame](/the-frame)), and it is why demanding it everywhere would be absurd. Most analytical work does not need to be audited. Most internal synthesis is reversible, exploratory, or context-setting. Forcing verification on those outputs would collapse cycle time without producing proportional value.
The market segments. A decision-grade lane, where buyers pay for verification and producers invest in it. A volume-grade lane, where speed and cost dominate and everyone understands what they are getting. Two lanes can coexist. The danger is not that they exist. The danger is that organizations fail to separate them, letting volume-lane outputs slide into decision-grade use.
**Gresham's Law for reasoning.**
When all documents look equally polished (because AI-generated prose is uniformly fluent), decision-makers cannot distinguish decision-grade from volume-grade output without explicit labeling. The absence of labeling creates a market in which cheap, unverified analysis crowds out expensive, verified analysis because they look identical.
The unverified version is cheaper to produce, easier to ship, and indistinguishable on the surface. Without labels, it wins.
**The asymmetry the lanes are responding to.**
Drafting queues are empty. Review queues are full. Throughput metrics in most organizations still point at the wrong door.
Lane discipline is the operational response to this asymmetry. Without it, the lane that crowds out the other is the one that runs at the speed of drafting, not the speed of verification.
## What goes in which lane
The decision criterion is the cost of being wrong, not the importance of the topic.
**Decision-grade lane:**

- **Cost of being wrong is high.** Capital allocation, M&A targets, regulatory submissions, board memos, crisis response briefs, public-facing analytical claims, anything where being wrong moves money, lives, policy, or reputation.
- **Audience includes external parties.** Regulators, board, investors, partners, courts.
- **Decision is binding or hard to reverse.** Once acted on, you cannot quietly walk it back.
- **Reasoning will be challenged.** Litigation, audit, board pushback, regulatory review, journalist inquiry.

**Volume-grade lane:**

- **Cost of being wrong is low.** Internal context-setting, first-draft synthesis, meeting prep, learning material, brainstorming output, weekly market summaries.
- **Audience is internal.** Your team, your function, an internal working group.
- **Decision is reversible.** Whatever the output prompts, you can adjust without external consequence.
- **Reasoning is not the deliverable.** The synthesis is the value, and the synthesis is provisional.
Most output produced inside an organization is volume-grade. That is fine. The error is treating any of it as decision-grade by default, or letting it slide there without re-verification.
## How to classify at point of production
Lane assignment has to happen when content is created, not after. If classification happens after the fact, the classifier is usually the same person who would benefit from the content being treated as decision-grade. That is a corrupting incentive.
```mermaid
flowchart TD
A[Author creates output] --> B{"Could cost of<br/>being wrong exceed<br/>cost of verification?"}
B -->|Yes| C["Tag: decision-grade"]
B -->|No| D["Tag: volume-grade"]
B -->|Unsure| E["Tag: volume-grade"]
E -.->|"Re-verify before<br/>any decision-grade use"| C
C --> F["Routed to<br/>verification process"]
D --> G["Routed to<br/>volume workflow"]
style C fill:#14532D,color:#fff
style D fill:#1E293B,color:#fff
style E fill:#7C2D12,color:#fff
```
The practical rule: every analytical artifact carries a lane tag at the moment of creation. The tag is metadata, not decoration. It travels with the file, the deck, the memo, the briefing note.
**The diagnostic question for the author:**
> Could the cost of being wrong about this output exceed the cost of having it verified?
If yes, decision-grade. If no, volume-grade. If unsure, treat as volume-grade and require re-verification before any decision-grade use.
The classification needs to be visible to every downstream reader. A volume-grade memo that ends up on a CEO's desk should be obviously volume-grade. Not because the content is less rigorous (it might be perfectly rigorous), but because the reader needs to know what verification posture was applied.
## Routing rules
Three rules govern movement between lanes.

**1. Volume-grade content never moves into a decision-grade context without re-verification.** The labeling rules out the lazy path: pulling last week's volume-grade synthesis and using it as the foundation for a board memo because it is "already written." If you want to use volume-grade content in a decision-grade context, it goes through the verification process. Otherwise it does not get used.

**2. Decision-grade content can move down for volume-grade use.** Re-verification is not required. The verification you paid for once was sufficient; using the content in a lower-stakes context does not retroactively raise the bar. The lane label can be downgraded by anyone. Upgrading requires a verification step.

**3. Quoted material inherits the lane label of its source.** Every excerpt, every quoted line, every screenshot in a downstream document inherits the lane label of the source. A board memo that quotes a volume-grade analysis is, at that quoted moment, importing volume-grade reasoning into a decision-grade context. Either the quoted material was re-verified before inclusion (it becomes decision-grade for this purpose) or the board memo is now downgraded for the portions that depend on the quoted material. There is no third option.
## Four ways lane discipline fails
Each failure is invisible in the moment and only obvious in the post-mortem. Knowing the failure modes in advance is most of the defense.

**1. Silent slippage.** Volume-grade synthesis passed up the chain arrives in a decision-grade context with no label. Decision-makers treat it as decision-grade because it looks like everything else they read.

**Fix:** Labels mandatory at point of creation. Unlabeled content defaults to volume-grade. Quotes inherit source labels.

**2. Label inflation.** Everything gets labeled "decision-grade" because labeling something volume-grade looks like the author is not taking the work seriously. The lane distinction collapses.

**Fix:** Decision-grade must carry a real verification cost. If verification is not happening, the label is theater. The label must correspond to a process difference.

**3. The verification bottleneck.** Decision-grade verification becomes so slow that nothing makes it through. The organization defaults to volume-grade for decision-grade purposes because the alternative is missing the deadline.

**Fix:** Verification has to fit real cycle times. A verification system that adds three weeks to every board memo is a bottleneck, not a verifier.

**4. Attention inversion.** Volume-grade content gets more leadership attention than decision-grade because there is more of it. What gets rewarded gets repeated: speed to inbox, zero stakeholder friction, confident language. What gets ignored: forecast scoring, postmortem accuracy, explicit uncertainty. The decision-grade lane becomes vestigial.

**Fix:** Decision-grade outputs need clear routing to the decision-makers. Volume-grade outputs need clear routing away from them unless explicitly requested. Performance reviews need to score accuracy alongside speed.
## What lane discipline looks like in practice
The simplest implementation is a metadata tag, a routing rule, and a periodic audit. A sketch of the mechanics follows the three pieces.

- **The tag.** File naming convention, document header field, content management system tag, or watermark. Form does not matter as long as it is mandatory, visible, and travels with the content.
- **The routing rule.** Decision-grade outputs go through verification before they can leave the analytical layer. Volume-grade outputs do not. Software that routes content between systems respects the lane.
- **The periodic audit.** Sample recent board decisions, capital allocation memos, regulatory submissions, public statements. Trace the analytical content underneath. What fraction was decision-grade at the moment of decision?
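A minimal sketch of the tag-and-route mechanics, in Python. The dictionary schema and function names are illustrative assumptions; the two behaviors to preserve are the routing rules above: unlabeled content defaults to volume-grade, and quoting unverified material downgrades the quoting document (a finer-grained system would downgrade only the dependent portions).

```python
from enum import Enum

class Lane(str, Enum):
    DECISION = "decision-grade"
    VOLUME = "volume-grade"

def route(doc: dict) -> str:
    """The routing rule: decision-grade output cannot leave the analytical
    layer unverified. Unlabeled content defaults to volume-grade."""
    lane = doc.get("lane", Lane.VOLUME)
    if lane == Lane.DECISION and not doc.get("verified", False):
        return "verification queue"
    return "release" if lane == Lane.DECISION else "volume workflow"

def quote(parent: dict, source: dict) -> None:
    """Rule 3: quoted material inherits the lane label of its source.
    Quoting unverified volume-grade material downgrades the parent."""
    if source.get("lane", Lane.VOLUME) == Lane.VOLUME and not source.get("verified"):
        parent["lane"] = Lane.VOLUME
        parent["verified"] = False
```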
**The single board-level metric:**
> Of the analytical content that informed your last ten board-level decisions, what percentage carried a decision-grade label at the moment of decision?
| Score | Interpretation |
|---|---|
| Below 50% | Lane discipline is failing. Slippage is the norm. |
| 50% to 80% | Lane discipline is partial. Audit the gaps. |
| Above 80% | Lane discipline is working. Audit periodically. |
| Exactly 100% | Either discipline is exceptional, or the labels are theater. Audit the verification, not the labels. |
## What this is not
Three inoculations against common misreadings.

- **Not a mandate to verify everything.** The volume-grade lane is where most AI-augmented analysis appropriately lives. Forcing decision-grade verification onto everything is a different failure mode with the same downstream effect.
- **Not a substitute for the Buyer's Checklist.** The two reinforce each other. The Buyer's Checklist makes the verification you buy real. Lane Discipline makes the verification you bought useful.
- **Not a one-time classification.** What counts as decision-grade in 2025 may not in 2027. Revisit the lane criteria annually as AI capabilities, regulation, and competitive context shift.
## Where this goes next
- **2026 Watchlist:** dated signals over the next 18 months that will tell you whether the framework holds.
- **The Doctrine:** the architectural commitments that make decision-grade verification real.
- **The Buyer's Checklist:** the seven procurement questions that determine what your decision-grade lane is buying.
---
# 2026 Watchlist
A framework that does not specify how it could be wrong is not a framework. It is a marketing claim. This page lists the concrete signals over the next twelve to eighteen months that test the analysis on this site. Each scenario carries an observable. Each observable resolves one way or the other.
Read it as a stress test, not as a forecast you can outsource. The framework is directional. Timing is uncertain. The point of this page is to make the uncertainty visible.
**Three minute read.** Five scenario trajectories. Two levers that move the system this quarter. One decision window (July 1, 2026). One thesis falsifier (AI volume rises, unanchored-claim rate does not).
## The decision window: July 1, 2026
There is one date worth treating as a deadline.
```mermaid
timeline
title The 2026 decision window
section Now (May 2026)
Two levers active : Release gate (0 to 60 days)
: Procurement artifacts (0 to 90 days)
section Jul 1 2026
The door : Structural validity gate in production
: Or faster cadence becomes culture
section 12-18 months
Trajectory locks in : Polished Flood Thin Spine (base case)
: Or one of four alternatives
```
By July 1, 2026, any pipeline that labels output "decision-grade" should have a structural-validity release gate in place. Miss the date and the team will normalize the faster cadence. Later gates feel like sabotage rather than quality control. The retrofit cost rises sharply once the new cadence is calendared, staffed, and expected by clients.
**The no-regret action.** Require every decision-grade brief to include the four items below; a minimal gate enforcing them is sketched after the list.
1. A numbered list of core claims.
2. A source or computation for each claim.
3. Explicit assumptions named, not assumed.
4. At least one observation that would make the conclusion wrong.
Start with one product line. Scale once the gate proves it can hold under cycle-time pressure.
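What the gate can look like as code rather than as a checklist item. The brief schema (claims with ids and sources, assumptions, falsifiers) is an illustrative assumption; the property that matters is that the gate blocks release deterministically instead of relying on reviewer memory.

```python
def release_gate(brief: dict) -> None:
    """Structural-validity gate for the decision-grade label. It raises
    instead of warning: a brief missing any of the four items cannot ship."""
    claims = brief.get("claims", [])
    if not claims:
        raise ValueError("no numbered list of core claims")
    unsourced = [c["id"] for c in claims
                 if not c.get("source") and not c.get("computation")]
    if unsourced:
        raise ValueError(f"claims without a source or computation: {unsourced}")
    if not brief.get("assumptions"):
        raise ValueError("assumptions must be named, not assumed")
    if not brief.get("falsifiers"):
        raise ValueError("no observation named that would make the conclusion wrong")
```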
## The two levers active this quarter
The system has two enforcement points with enough friction to alter the trajectory inside ninety days. Everything else is downstream of these.

**Lever 1: the release gate (0 to 60 days).** Editorial and production teams formally require assumption registers, boundary conditions, and claim-source maps before any output carries a decision-grade label.

**Confirms:** Procurement of Proof trajectory.
**Disconfirms if:** The deadline passes without a gate in production.

**Lever 2: procurement artifacts (0 to 90 days).** At least one procurement cycle includes structural transparency artifacts as acceptance criteria, not just deliverable templates.

**Confirms:** Procurement of Proof trajectory.
**Disconfirms if:** Procurement accepts narrative-only deliverables without objection.
## Five scenario trajectories
The framework predicts five plausible futures. They are not mutually exclusive; the market can split across them. Each has a named mechanism, a named observable, and a named falsifier.
**1. Polished Flood, Thin Spine (the base case).**

**Trajectory:** The market drifts into endless glossy deliverables whose core claims lack traceable support.
**Mechanism:** Cheap drafting pushes volume up. Fixed reviewer capacity skims rather than tests. Throughput incentives treat "reads well" as "is solid."
**Observable:** Review cycles lengthening, style checks substituting for evidence checks, release gates optimized for speed rather than structural validity.
**Falsifier:** Independent sampling shows stable anchoring of core claims despite higher AI volume. If unanchored-claim rates stay at pre-AI baselines while output rises, the drift is not happening.
**2. Islands of Trust.**

**Trajectory:** A minority of producers build verification artifacts. They form islands of trust. The rest of the market becomes a narrative ocean.
**Mechanism:** Some producers sell to high-downside buyers willing to pay for traceability. Everyone else competes on speed and polish.
**Observable:** Visible split in deliverable standards. Some firms ship assumption registers and claim-source maps. Others ship slides only.
**Falsifier:** Verification artifacts become table stakes across most major producers within twelve months. Fragmentation gives way to a new market norm.
**3. Procurement of Proof.**

**Trajectory:** Purchasing departments start buying traceability, not just slides. The market shifts from brand trust to auditability.
**Mechanism:** Buyers require claim-source maps and assumption registers as acceptance criteria. Vendors fund verification capacity to protect revenue.
**Observable:** At least one major procurement cycle includes structural transparency artifacts as acceptance criteria within ninety days.
**Falsifier:** Buyers keep selecting vendors primarily on reputation and turnaround. Procurement does not pull the system toward proof.
**4. Compliance Theater.**

**Trajectory:** Badges and disclosures stack on top of the same thin verification layer. Compliance gets read as truth-testing.
**Mechanism:** Regulators demand visible action about AI-generated content. Rules focus on labeling AI use and content provenance (who made this, with what tool), not on whether the claim is true. Organizations optimize for the check-the-box audit.
**Observable:** New review boards, policies, or labels reference quality or governance but do not specify enforceable artifacts (no assumption registers, no claim-source maps, no discriminating tests).
**Falsifier:** Regulators bind certification to measurable accuracy claims, or impose real liability for false analytical claims. The trajectory changes character.
**5. The Bottleneck Flips.**

**Trajectory:** Verification gets cheap enough that the bottleneck flips. Teams scale output without hollowing it out.
**Mechanism:** Validator tools cut reviewer-minutes per claim by automatically tracing assertions, surfacing missing premises, and flagging conflicts. The cost curve for verification finally bends.
**Observable:** At least one buyer pays a premium or extends timeline specifically for validated outputs within ninety days. Organizations that build or buy genuine validation capacity gain a measurable competitive edge.
**Falsifier:** Validator tools fail on false negatives, or raise review time through noise. Organizations do not trust them for decision-grade work.
## The thesis falsifier
One observation, if true, breaks the entire analysis on this site.
**What would prove the framework wrong:**
Independent audits show AI-assisted volume rose, but the rate of unanchored core claims in released work did not rise.
Define "unanchored" narrowly: a core claim lacks a traceable source, a checkable computation, or a named primary witness.
Three observable proxies:
- **Stable rejection rates** for missing sourcing
- **Stable correction or retraction rates** after publication
- **Stable client-reported accuracy scores** over time
If those metrics stay at pre-AI baselines while volume climbs, the verification-gap story is wrong. Existing review gates are absorbing the shock. The framework should be revised, not defended.
## Regulatory signals (in force, monitor for enforcement)
These signals are already live as of May 2026. Track them for implementation severity, waiver applications, and the first material enforcement actions. The pattern: when these regimes show teeth, the buyer-side correction this framework predicts accelerates.
| Signal | Date in force | What to watch for | What it tells you |
|---|---|---|---|
| **[SR 26-2](https://www.federalreserve.gov/supervisionreg/srletters/SR2602.pdf)** Revised Guidance on Model Risk Management | April 17, 2026 | First material enforcement action against a major bank for inadequate AI/model validation | Procurement-of-proof dynamic is real and propagating from banking outward |
| **GENIUS Act** implementation | July 18, 2025 onward | First OCC enforcement against a stablecoin issuer for inadequate attestation; whether BDO attestations for federally-supervised stablecoins are substantive or perfunctory | Federal coalition is enforcing the perimeter, not rubber-stamping it |
| **SOC 2 2026 criteria** | 2026 | Enterprise procurement teams pushing 2026 SOC 2 criteria into AI-vendor RFPs | The crossover from security procurement to AI verification procurement is happening |
| **[EU AI Act](https://eur-lex.europa.eu/eli/reg/2024/1689/oj)** high-risk provisions | Staggered 2026-2027 | First regulator-level fine against an AI verification provider for inadequate transparency or oversight | European enforcement typically leads U.S. enforcement by 12-18 months in adjacent domains |
## Indicators worth instrumenting now
If you want to run the framework's tests on your own pipeline, these are the measures to collect. They distinguish a real verification deficit from a phantom one. A sketch of the core ratios follows the list.

**Throughput:**

- Documents per reviewer per period
- Average review lag from draft to signoff
- Output volume change since AI-assisted drafting was adopted

**Quality:**

- Structural-defect rate per document
- Share of documents with at least one coded defect
- Factual-error rate (separate from structural)
- Attribution time from defect flag to evidence path
- Correction latency from detection to documented fix
- Post-publication reversal rate

**Incentives:**

- What gets rewarded in performance reviews (speed, satisfaction, accuracy)
- Whether analysts are scored on forecast or claim accuracy
- Whether postmortems are run on released analysis
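If the indicators land in a dashboard, the throughput and quality ratios are straightforward to compute. A sketch over an assumed per-document record schema (the field names are illustrative, not a standard):

```python
from statistics import mean

def pipeline_indicators(docs: list) -> dict:
    """Core ratios from per-document review records. Assumes each record
    carries reviewer, review_lag_days, defects, and factual_errors fields."""
    n = len(docs)
    return {
        "docs_per_reviewer": n / max(1, len({d["reviewer"] for d in docs})),
        "avg_review_lag_days": mean(d["review_lag_days"] for d in docs),
        "structural_defects_per_doc": sum(len(d.get("defects", [])) for d in docs) / n,
        "share_with_defect": sum(1 for d in docs if d.get("defects")) / n,
        "factual_errors_per_doc": sum(d.get("factual_errors", 0) for d in docs) / n,
    }
```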
## How to use this page
**The decision window.** If your organization labels output decision-grade, the structural-validity release gate should be in production by July 1, 2026. Treat the date as a planning deadline tied to adoption lock-in, not as a hard event.

**The two levers.** Release gate activation (60 days) and procurement artifact mandates (90 days) are the only enforcement points with enough friction to alter the trajectory in the short term.

**The five scenarios.** Each of the five scenarios has a named observable. As one or more resolve, the market trajectory becomes clearer. Multiple scenarios can be true simultaneously; the market can fragment.

**The thesis falsifier.** If you can measure unanchored-claim rates over time inside your own pipeline, you have a direct test of the framework's core claim. The framework predicts the rate rises. If yours does not, the framework is wrong for your context.
## An epistemic note
This framework, including this Watchlist, is directional rather than precise. The core mechanism (drafting costs collapse faster than verification capacity scales) is well-supported by available evidence. The timing, the actor sequencing, and the relative probability of each scenario are less well-grounded.
Treat the scenarios as a way to stress-test your process, not as a timer with an alarm you can set. The framework's own grounding rate sits below where you would want it for a definitive forecast. The point of publishing it openly is that it can be contested, refined, and corrected.
The doctrine improves when it is contested. Substantive disagreements through the [repository](https://github.com/DavidVALIS/decision-grade) are welcome.
## Where this goes next
The diagnosis: why current AI controls miss the real problem.
The posture: Zero Trust applied to AI verification.
The action: seven questions to put to AI vendors.
---
# About
**Disclosure first.** This site is published by VALIS Systems. The author is the founder. VALIS builds AI verification infrastructure in the category this framework describes. That commercial interest is acknowledged up front, and it should inform how you read everything that follows.
The doctrine itself is independent of any specific product. The framework is published openly because the Zero Trust posture it advocates extends to the doctrine itself. You should not have to trust the publisher.
## Why this framework exists
The arguments on this site come from a specific path. The three pieces that produced the framework, in order.
```mermaid
timeline
    title How the framework emerged
    section 2024
        Yahoo AI protocol : Authored an AI usage protocol for Yahoo
                          : Scoped to facts, hallucinations, citation grounding
                          : Right answer for the 2024 question
    section 2024 - 2026
        Building VALIS : Two years designing and building verification infrastructure
                       : Running real analytical work through it
                       : Watching which checks held and which were theater
    section 2026
        The realization : The verification problem was human to begin with
                        : AI did not create it. AI exposed it.
                        : Published as this framework
```
### Yahoo, 2024: the protocol that was right for its moment
In 2024, I authored an AI usage protocol for Yahoo. It was scoped to what the AI conversation was actually about at the time: making sure models did not fabricate facts, that citations grounded back to sources, that AI use was disclosed, that human review was in the loop. It was a reasonable response to the AI landscape of 2024.
It was also the wrong frame for where things were going. The realization came over the following months as models improved on the hallucination axis faster than the protocols I had written assumed. The remaining failure mode was no longer getting the facts wrong. It was producing fluent, well-cited, hallucination-free output that still reasoned badly. The 2024 toolkit was solving the visible problem. The problem was changing underneath it.
### 2024 to 2026: building VALIS
From 2024 to 2026, I designed and built VALIS. The work was equal parts engineering and analysis. We ran real verification through the system. We saw which architectural commitments held under pressure and which were performative. We saw what verification at scale actually requires when you cannot quietly soften it for a deadline or a difficult client.
Three observations crystallized over those two years.
1. Verification does not get cheaper at the rate generation does. It runs on a different cost curve. That asymmetry is the operational risk most organizations have not yet priced.
2. A product that asks the customer to trust the verifier is a different category from one that produces independently checkable verification. The architectural commitments define the category.
3. AI did not create the verification deficit. AI made it impossible to ignore. The same gap existed in human-produced analytical content for decades. We just could not see it.
### 2026: the realization that drives this framework
By early 2026, the central observation crystallized.
The verification problem was human to begin with. AI exposed it. Anything we build to address AI verification has to address the deeper deficit underneath it.
That observation reframed everything. The framework on this site is the distillation of that reframing: the doctrine, the architecture, the operational practice. Published openly because the doctrine should survive the publisher, the company, and the founder.
## What this framework is, and is not
The framework is a directional reading of where AI verification is heading. It is not a guarantee, not legal advice, not investment guidance.
The procurement and contracting recommendations on this site are framing, not legal counsel. Have your counsel review any specific contract language before signing.
References to capital market signals in the [Watchlist](/watchlist) are framework-test signals, not investment recommendations. Apply your own diligence.
The framework predicts a market correction is likely within 18 months. The Watchlist names the dated signals that will test the prediction. The prediction could be wrong, and the framework specifies how it would fail.
The framework is general. Your situation is specific. Use the doctrine, the buyer's checklist, and the lane discipline practices as inputs to your own thinking, not as a substitute for it.
## What the framework owes the reader
The Zero Trust posture extends to the framework itself. Four commitments.
1. The source is public. The framework is published in AI-readable form (see [llms.txt](https://decision-grade.ai/llms.txt) and [llms-full.txt](https://decision-grade.ai/llms-full.txt)). Anyone can audit the arguments.
2. Licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/). Use, adapt, build on it, with attribution. Implement the doctrine elsewhere if you want to.
3. Substantive disagreements are welcome via [issues and pull requests](https://github.com/DavidVALIS/decision-grade) on the repository. The doctrine improves when it is contested.
4. The [2026 Watchlist](/watchlist) specifies dated signals that will tell you (and me) whether the framework holds. A framework that does not specify how it could be wrong is not a framework.
## Author
David Lundblad. Founder of VALIS Systems. Previously authored Yahoo's AI usage protocol (2024). Two years designing and building VALIS (2024-2026). Publishing this framework as the distillation of that work.
Reach me through:
- Issues and pull requests on the [repository](https://github.com/DavidVALIS/decision-grade). Open the framework, contest it, fork it.
- VALIS Systems, the reference implementation of the doctrine on this site.
## Where this goes next
- Start with the diagnosis: why current AI controls miss the real problem.
- The Zero Trust posture, in three layers.
- Seven procurement questions to put to AI vendors.
---
# MCP Server
The framework exposes a Model Context Protocol server at `https://decision-grade.ai/api/mcp`. Any MCP-aware AI client can connect, list the available tools, and read the framework directly from inside the user's chosen AI.
This is the on-doctrine surface for AI access to the framework. The user controls which AI runs the queries. The framework just publishes data with a typed interface.
**Three-minute read.** Endpoint: `https://decision-grade.ai/api/mcp`. Transport: HTTP + JSON-RPC 2.0. Four tools: `list_pages`, `get_page`, `search`, `get_full_framework`. Two readable resources: `llms.txt`, `llms-full.txt`. No authentication, no rate limit beyond Cloudflare's defaults.
## Why an MCP server
The framework's [Zero Trust posture](/the-doctrine) says the customer should not have to trust the verifier. A hosted chat agent inserts the framework's hosting choices into the trust path. An MCP server keeps the user's AI of choice in that path. The framework is just published data.
Connect from Claude Desktop, Cursor, Windsurf, or any MCP-aware client. The AI you trust does the reading and reasoning.
The server returns markdown content and search results. It does not synthesize answers, refuse questions, or modify what the AI sees.
## Tools
### `list_pages`
Returns all framework pages with `id`, `title`, `num`, and `description`. Use this first to discover what is available.
**Arguments:** none.
**Returns:** JSON array of page summaries.
### `get_page`
Fetch the full markdown content of one page.
**Arguments:** `slug` (string). One of `introduction`, `the-frame`, `the-doctrine`, `buyers-checklist`, `lane-discipline`, `watchlist`, `about`.
**Returns:** the full page markdown including callouts, tables, and mermaid diagrams.
### `search`
Text search across the entire framework. Returns matching sections grouped by page, with snippets.
**Arguments:** `query` (string), optional `limit` (number, default 8, max 20).
**Returns:** JSON with `total` count and `results` array of section matches.
### `get_full_framework`
Return the entire framework as a single document (the `llms-full.txt` bundle).
**Arguments:** none.
**Returns:** ~70 KB of plain text. Use when you want the whole framework in one fetch.
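For readers who want the contract at a glance, here is a TypeScript sketch of the result shapes these descriptions imply. Field names not listed above (`page`, `snippet`) are assumptions; the server's published source, linked below, is the canonical contract.

```typescript
// Result shapes inferred from the tool descriptions above.
// Illustrative only: the server source is the canonical contract.
interface PageSummary {
  id: string;
  title: string;
  num: number;
  description: string;
}

type ListPagesResult = PageSummary[]; // list_pages
type GetPageResult = string;          // get_page: full page markdown

interface SearchMatch {
  page: string;    // assumed: the page the section belongs to
  snippet: string; // assumed: the matching snippet
}

interface SearchResult {
  total: number;
  results: SearchMatch[];
}

type GetFullFrameworkResult = string; // get_full_framework: ~70 KB bundle
```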
## Connect from Claude Desktop
Edit your Claude Desktop config (Settings → Developer → Edit Config), then add:
```json
{
  "mcpServers": {
    "decision-grade": {
      "url": "https://decision-grade.ai/api/mcp"
    }
  }
}
```
Restart Claude Desktop. The decision-grade tools appear in the tools menu. Ask Claude anything about AI verification and it will fetch from the framework.
## Connect from Cursor
In Cursor settings, add an MCP server entry:
- **Name:** `decision-grade`
- **Type:** `HTTP`
- **URL:** `https://decision-grade.ai/api/mcp`
The four tools become available in your Cursor agent.
## Connect from any other client
The transport is plain HTTP + JSON-RPC 2.0. Any client that speaks MCP over HTTP can connect.
```bash
# List available tools
curl -X POST https://decision-grade.ai/api/mcp \
  -H "content-type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'

# Search the framework
curl -X POST https://decision-grade.ai/api/mcp \
  -H "content-type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"search","arguments":{"query":"verification deficit","limit":5}}}'
```
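The same calls work from any HTTP client. A minimal TypeScript sketch follows, assuming the server follows the standard MCP `tools/call` response convention (a `content` array whose text items carry the payload); that unwrapping is an assumption, not documented server behavior.

```typescript
// Call an MCP tool over plain HTTP + JSON-RPC 2.0.
// Endpoint and tool names are from this page; the shape of
// `result.content` follows the MCP convention and is assumed here.
async function callTool(name: string, args: Record<string, unknown>) {
  const res = await fetch("https://decision-grade.ai/api/mcp", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "tools/call",
      params: { name, arguments: args },
    }),
  });
  const rpc = await res.json();
  if (rpc.error) throw new Error(rpc.error.message);
  // MCP tools return a content array; text results carry a `text` field.
  return rpc.result.content?.[0]?.text ?? rpc.result;
}

// Usage: search the framework for "verification deficit".
const hits = await callTool("search", { query: "verification deficit", limit: 5 });
console.log(hits);
```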
## What this is not
- The server does not generate text or answer questions. It returns data. The AI you connect does the reasoning.
- It is a public read-only endpoint. No accounts, no API keys. The framework is published openly.
- Each request stands alone. There is no session, no memory, no personalization.
## Verifying the server
The same Zero Trust posture applies to this server. You can verify that it operates as documented (a runnable sketch follows the list):
- **Source is public.** The Cloudflare Pages Function code is at [github.com/DavidVALIS/decision-grade/blob/main/functions/api/mcp.ts](https://github.com/DavidVALIS/decision-grade/blob/main/functions/api/mcp.ts).
- **Output is deterministic.** Same query yields the same result.
- **Content is canonical.** All tool responses derive from `llms-full.txt` and `search-index.json`, which are also served at the site root.
- **No hidden behavior.** The server does not call external LLMs or transform content; it only reads from the published bundle.
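A minimal sketch of the determinism and canonical-content checks, again assuming the MCP `content` response convention used above: fetch the framework twice through the server, once from the published `llms-full.txt`, and compare bytes.

```typescript
// Verify the server against its own claims: canonical content
// (matches llms-full.txt) and determinism (same call, same bytes).
const MCP = "https://decision-grade.ai/api/mcp";

async function getFullFramework(): Promise<string> {
  const res = await fetch(MCP, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "tools/call",
      params: { name: "get_full_framework", arguments: {} },
    }),
  });
  // Assumed unwrapping per the MCP tools/call convention.
  return (await res.json()).result.content[0].text;
}

const viaMcp = await getFullFramework();
const canonical = await fetch("https://decision-grade.ai/llms-full.txt")
  .then((r) => r.text());

console.log("matches canonical:", viaMcp === canonical);
console.log("deterministic:", viaMcp === (await getFullFramework()));
```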
If you observe a discrepancy between what the MCP server returns and what is published on the site, [open an issue](https://github.com/DavidVALIS/decision-grade/issues).
## Where this goes next
- The diagnosis: why verification is the operational problem now.
- Zero Trust applied to AI verification.
- Seven procurement questions to put to AI vendors.