C08 Layer 2 - Runtime Enforcement

Prompt and Content Injection Defense

Structural separation of instruction and data channels at the gateway boundary. Schema validation, guard-model scanning at higher tiers, quarantine of suspect inputs.

Why

  • Injection attacks cause the model to treat untrusted content as instructions.
  • Indirect injection (retrieved docs) is especially dangerous because it arrives “inside the context.”

What

  • A layered defense that: classifies content sources (trusted vs untrusted)
  • normalizes/strips untrusted markup and instruction-like patterns
  • enforces instruction hierarchy separation
  • escalates high-risk content to additional verification or HITL

How

  • strict input contracts: tools accept structured JSON, not free text
  • isolate system instructions from untrusted content in separate channels/fields
  • apply content normalization for HTML/PDF/email sources before model ingestion
  • optional guard scanning for known exploit patterns; throttle probing behavior

Evidence

  • detection metrics (block rate, false positives)
  • exploit success rate from adversarial test suite (C15)
  • provenance logs for retrieved sources

Failure modes

  • feeding raw HTML/PDF into the model
  • allowing untrusted content to modify system prompt
  • assuming internal documents cannot be malicious
  • no regression tests for injection scenarios

Treating C08 as a quality gate for retrieved content. C08 defends against adversarial inputs and instruction injection, not against stale, low-confidence, or unverified information. See the Memory flow scope note in the Reference Architecture and C18 (Data Quality Gates) for the boundary that covers content quality and freshness.

NIST AI RMF alignment

C08 maps to MANAGE and MEASURE. See the framework paper for the specific subcontrol mappings.

ISO/IEC alignment

C08 maps to ISO/IEC 27001. Typical evidence: see the Evidence section above.

Contents
On this page
All controls