AI Execution Standards
Organizational Expectations Policy
The operating principle behind these standards is the Universal Translation Rule: replace "human produces artifact" with "human defines spec → system produces artifact."
1. Core Principle
AI is treated as an autonomous worker, not a chatbot.
All work assigned to AI must be executable without real-time supervision.
2. Mandatory Work Layers
Every AI-enabled workflow must define all four layers.
Layer 1 — Prompt Craft (baseline skill)
Employees must:
- write clear instructions
- specify format
- include examples when useful
- resolve ambiguity upfront
Minimum bar: AI output should require ≤20% correction.
Layer 2 — Context Engineering
Each team must maintain a structured context file containing:
- goals
- constraints
- terminology
- quality standards
- relevant documents
- tool access rules
Requirement: AI tasks must load this context before execution.
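As a minimal sketch of this requirement (Python, with invented example values; only the six field names come from the list above), a team context file and its load-before-execution check might look like:

```python
import json

# Hypothetical team context; field names mirror the layer-2 list,
# contents are illustrative only.
TEAM_CONTEXT = {
    "goals": ["Triage inbound support tickets with AI assistance"],
    "constraints": ["Never send replies without human review"],
    "terminology": {"SLA": "service-level agreement, 24h first response"},
    "quality_standards": ["Replies must cite the relevant KB article"],
    "relevant_documents": ["kb/refund-policy.md"],
    "tool_access_rules": ["Read-only access to the ticket database"],
}

REQUIRED_KEYS = {"goals", "constraints", "terminology",
                 "quality_standards", "relevant_documents",
                 "tool_access_rules"}

def load_context(context: dict) -> str:
    """Validate the context file and serialize it so it can be
    prepended to every AI task before execution."""
    missing = REQUIRED_KEYS - context.keys()
    if missing:
        raise ValueError(f"context file incomplete: {sorted(missing)}")
    return json.dumps(context, indent=2)
```

Rejecting incomplete context files at load time enforces the rule mechanically rather than by convention.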
Layer 3 — Intent Engineering
Every workflow must define:
- objective hierarchy
- tradeoff rules
- escalation conditions
- what AI may decide vs must escalate
No agent may run without defined intent.
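One possible encoding of a layer-3 intent definition (Python; the structure follows the list above, but every rule and threshold shown is an invented example):

```python
# Hypothetical intent definition for a single workflow.
INTENT = {
    "objective_hierarchy": ["correctness", "latency", "cost"],  # priority order
    "tradeoff_rules": {"latency_vs_cost": "accept 2x cost for sub-second latency"},
    "escalation_conditions": ["model confidence is low", "constraints conflict"],
    "may_decide": ["response wording", "output formatting"],
    "must_escalate": ["refunds over $100", "deleting customer data"],
}

def requires_escalation(decision: str) -> bool:
    """Anything not explicitly delegated under may_decide is escalated:
    the agent never acts outside its defined intent."""
    return decision not in INTENT["may_decide"]
```

Defaulting unknown decisions to escalation is the safe reading of "no agent may run without defined intent."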
Layer 4 — Specification Engineering (highest standard)
All non-trivial tasks must have a written specification.
Required spec components:
- problem statement
- scope
- inputs
- constraints
- acceptance criteria
- failure conditions
- success tests
- completion definition
Rule: If success cannot be verified objectively, the task is not spec-ready.
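The required components and the spec-ready rule can be captured in a simple template (a sketch in Python; field names follow the list above, and the readiness check is one plausible reading of the rule):

```python
from dataclasses import dataclass

@dataclass
class Specification:
    problem_statement: str
    scope: str
    inputs: list[str]
    constraints: list[str]
    acceptance_criteria: list[str]
    failure_conditions: list[str]
    success_tests: list[str]
    completion_definition: str

    def is_spec_ready(self) -> bool:
        # Encodes the rule above: success must be objectively
        # verifiable, i.e. at least one acceptance criterion and
        # one success test must be written down.
        return bool(self.acceptance_criteria and self.success_tests)
```

A task with an empty `success_tests` list fails the check, which matches the rule: unverifiable success means not spec-ready.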
2b. Specification Primitives (learnable skills)
Specification engineering is built from five primitives. Each is a distinct skill to practice.
Primitive 1 — Self-Contained Problem Statements
State the problem with enough context that the task is solvable without the agent fetching more information. Surface hidden assumptions. Articulate constraints you normally leave implicit.
Training exercise: Take a request you'd normally make conversationally and rewrite it as if the recipient has never seen your project, doesn't know your terminology, and has access to nothing beyond what you include.
Primitive 2 — Acceptance Criteria
Define what done looks like so that an independent observer can verify the output without asking questions. If you can't write three sentences that verify completion, you don't understand the task well enough to delegate it.
Primitive 3 — Constraint Architecture
Define four categories for every task:
- Must — non-negotiable requirements
- Must not — forbidden actions or outputs
- Prefer — guidance when multiple valid approaches exist
- Escalate — conditions where the agent must stop and ask
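A constraint set under this architecture, with a completeness check (a sketch in Python; the four category names come from the policy, the example rules are invented):

```python
# Hypothetical constraint set for one task.
CONSTRAINTS = {
    "must": ["cite a source for every claim"],
    "must_not": ["contact external services"],
    "prefer": ["shorter answers when quality is equal"],
    "escalate": ["request involves legal or medical advice"],
}

REQUIRED_CATEGORIES = {"must", "must_not", "prefer", "escalate"}

def constraints_complete(c: dict) -> bool:
    """All four categories must be defined and non-empty
    before the task is delegated."""
    return REQUIRED_CATEGORIES <= c.keys() and all(c[k] for k in REQUIRED_CATEGORIES)
```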
Primitive 4 — Decomposition
Break tasks into components that can be executed independently, tested independently, and integrated predictably. Target granularity: subtasks of ≤2 hours with clear input/output boundaries, each verifiable on its own.
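The granularity target can be checked mechanically. A sketch (Python; `Subtask` and the validation logic are illustrative, assuming subtasks are ordered and exchange named artifacts):

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    name: str
    inputs: list[str]    # artifacts this subtask consumes
    outputs: list[str]   # artifacts it produces
    est_hours: float

def plan_is_valid(subtasks: list[Subtask], external_inputs=frozenset()) -> bool:
    """Each subtask must fit the ≤2-hour target, and every input must be
    either external or produced by an earlier subtask, so components can
    be executed independently and integrated predictably."""
    available = set(external_inputs)
    for t in subtasks:
        if t.est_hours > 2 or not set(t.inputs) <= available:
            return False
        available |= set(t.outputs)
    return True
```

The input/output bookkeeping makes the boundaries explicit: a subtask that depends on an artifact nothing has produced is flagged before execution, not during it.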
Primitive 5 — Evaluation Design
For every recurring AI task, build 3-5 test cases with known-good outputs. Run them after model updates to catch regressions. Outputs are judged by metrics, not appearance.
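A minimal regression harness for primitive 5 (a sketch in Python; `run_task` stands in for whatever AI workflow is under test, and the exact-match metric and example cases are invented stand-ins for task-appropriate scoring):

```python
def exact_match_score(output: str, expected: str) -> float:
    """Simplest possible metric; real tasks need task-appropriate scoring."""
    return 1.0 if output.strip() == expected.strip() else 0.0

# 3-5 cases with known-good outputs, per the policy.
TEST_CASES = [
    {"input": "2+2", "expected": "4"},
    {"input": "3*3", "expected": "9"},
    {"input": "10-7", "expected": "3"},
]

def run_regression(run_task, cases=TEST_CASES, threshold=1.0) -> bool:
    """Re-run after every model update; outputs are judged by the
    metric, not by appearance."""
    scores = [exact_match_score(run_task(c["input"]), c["expected"])
              for c in cases]
    return sum(scores) / len(scores) >= threshold
```

Because the harness takes `run_task` as a parameter, the same test cases can be replayed against a new model version to catch regressions before rollout.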
3. Specification Quality Rules
A valid spec must be:
Self-contained — No external knowledge required.
Testable — An independent reviewer can verify output quality.
Constrained — Must / Must Not / Prefer / Escalate rules defined.
Decomposable — Task can be split into independently verifiable subtasks.
4. Delegation Readiness Checklist
Before assigning work to AI, employees must confirm:
- I understand the task completely
- I can define success objectively
- I can list failure cases
- I can describe constraints
- I can verify results without interpretation
If any answer = no → do not delegate yet.
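The gate in section 4 is easy to encode (a sketch in Python; question text copied from the checklist above, the gating function itself is illustrative):

```python
CHECKLIST = [
    "I understand the task completely",
    "I can define success objectively",
    "I can list failure cases",
    "I can describe constraints",
    "I can verify results without interpretation",
]

def ready_to_delegate(answers: dict) -> bool:
    """Any answer of no, or any unanswered item, blocks delegation."""
    return all(answers.get(q, False) for q in CHECKLIST)
```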
5. Evaluation Standard
Every recurring AI workflow must include evaluation tests.
Minimum:
- 3 known-good test cases
- measurable scoring criteria
- regression checks after model updates
Outputs judged by metrics, not appearance.
6. Failure Responsibility Model
Failure is attributed by layer:
| Failure Type | Root Cause |
|---|---|
| Bad output | Prompt issue |
| Irrelevant output | Context issue |
| Wrong direction | Intent issue |
| Incomplete output | Spec issue |
Teams must fix the responsible layer, not retry prompts.
7. Organizational Roles
Each production AI system must have:
- Spec Owner
- Context Owner
- Evaluation Owner
One person may hold multiple roles initially.
8. Documentation Standard
All internal documents must be written as if an agent will execute them.
Documents must:
- state assumptions
- define terms
- specify outcomes
- include constraints
- avoid implicit knowledge
9. Prohibited Practices
Employees may not:
- delegate vague tasks to AI
- rely on iterative correction loops
- assume shared context
- accept outputs without verification criteria
- run agents without specs
10. Performance Expectation
Employees using AI are evaluated on:
- clarity of instructions
- completeness of specs
- reliability of outputs
- reduction of rework
- quality of evaluation design
Not on prompt cleverness.
Executive Summary Rule
Clear thinking precedes AI execution.
If you cannot specify it, you cannot automate it.