EU AI Act Compliance Lab: Classify Systems & Write Tech Docs

AI Ethics, Safety & Governance — Intermediate

Classify your AI system and produce EU AI Act-ready documentation.

Level: Intermediate · Tags: eu-ai-act · ai-governance · compliance · technical-documentation

About this compliance lab

This course is a short technical book disguised as a hands-on lab: you will take one AI system (a real one from your organization, or a realistic case study) and walk it from “What are we building?” to an audit-friendly EU AI Act documentation package. The focus is practical execution—classification, obligations mapping, controls, and technical documentation structure—so you can collaborate effectively with legal, security, product, and engineering without getting lost in abstract policy.

Instead of treating the EU AI Act as a wall of text, you’ll learn a repeatable workflow that turns requirements into concrete artifacts: a risk classification memo, a control checklist with owners, an evidence register, and a technical file index that makes audits and internal reviews faster. By the end, you’ll have a blueprint you can reuse for future systems, model updates, and supplier-integrated components.

Who this is for

This lab is built for product managers, ML engineers, compliance owners, risk teams, and startup leaders who need to move from “we’ve heard about the EU AI Act” to “we can demonstrate compliance readiness.” It assumes you can describe an AI system’s purpose and deployment context, but it does not require a legal background.

What you will build (deliverables)

  • A clearly bounded system definition (intended purpose, users, environments, interfaces)
  • A documented EU AI Act risk classification (including uncertainty handling and escalation notes)
  • An obligations-to-controls plan (owners, timelines, evidence pointers)
  • A provider-style technical documentation outline with an evidence index
  • Human oversight measures and operator/user instructions aligned to the intended use
  • A post-market monitoring and incident workflow you can operationalize

How the 6 chapters fit together

Chapter 1 sets the foundation: the definitions that drive everything else (roles, intended purpose, boundaries) and the documentation discipline you’ll need. Chapter 2 applies a structured decision path to classify your system and capture the rationale you’ll later defend. Chapter 3 converts obligations into a manageable control framework with ownership, versioning, and traceability.

With the plan in place, Chapter 4 focuses on the technical documentation spine—architecture, data governance, evaluation evidence, robustness, logging, and how to organize a technical file for review. Chapter 5 makes the system deployable responsibly: human oversight design, transparency touchpoints, and instructions that operators can actually follow. Finally, Chapter 6 prepares you for real operations: post-market monitoring, incidents, corrective actions, and an audit-ready packaging approach.

Learning approach

Every chapter ends with milestones that produce artifacts. You’ll be encouraged to write in plain language, link every claim to evidence, and maintain a single source of truth for assumptions and version changes. This is the same style used by teams that need to move quickly while staying defensible under scrutiny.

Get started

If you want to practice the workflow immediately, register for free and begin with the Chapter 1 scoping exercises. To explore related governance and safety material, you can also browse all courses.

What You Will Learn

  • Determine whether an AI use case is prohibited, high-risk, limited-risk, or minimal-risk under the EU AI Act
  • Map roles and responsibilities across provider, deployer, importer, distributor, and product manufacturer scenarios
  • Build a risk classification record with clear assumptions, boundaries, and evidence links
  • Draft EU AI Act-style technical documentation structure and an evidence-ready index
  • Define and document data governance, model development controls, and evaluation results
  • Design human oversight measures and user instructions aligned to the intended purpose
  • Create a post-market monitoring and incident reporting plan with actionable triggers
  • Prepare an audit-friendly package that supports internal review and conformity assessment readiness

Requirements

  • Basic understanding of how ML/LLM systems are built and deployed
  • Ability to read product requirements and describe an AI system’s intended purpose
  • Access to a sample AI use case from your work (or a provided fictional case) to use throughout the lab
  • Comfort working with checklists, templates, and structured documentation

Chapter 1: EU AI Act Fundamentals for Builders

  • Milestone: Define your system’s intended purpose and boundaries
  • Milestone: Identify where your system sits in the AI value chain
  • Milestone: Create an obligations map you can maintain
  • Milestone: Set up a compliance evidence workspace and naming conventions
  • Milestone: Establish a reusable documentation template pack

Chapter 2: Classify the AI System by Risk Category

  • Milestone: Screen for prohibited practices and document the rationale
  • Milestone: Run the high-risk decision tree and record outcomes
  • Milestone: Classify transparency duties for limited-risk systems
  • Milestone: Produce a signed-off risk classification memo

Chapter 3: Build the Compliance Plan and Control Framework

  • Milestone: Convert obligations into a control checklist with owners
  • Milestone: Define your quality management and change control workflow
  • Milestone: Create a traceability matrix from requirements to evidence
  • Milestone: Draft a gap analysis and remediation plan
  • Milestone: Prepare an internal review packet for sign-off

Chapter 4: Draft the Technical Documentation (Provider-Style)

  • Milestone: Write the system description and intended purpose section
  • Milestone: Document data governance and dataset lineage
  • Milestone: Capture model development, evaluation, and performance evidence
  • Milestone: Produce the technical documentation index and cross-references
  • Milestone: Run a completeness check against your control checklist

Chapter 5: Human Oversight, Transparency, and User Instructions

  • Milestone: Specify human oversight measures and intervention points
  • Milestone: Draft user instructions and operational constraints
  • Milestone: Create transparency notices and disclosure artifacts
  • Milestone: Validate usability: can operators follow the instructions?
  • Milestone: Finalize a deployer handoff pack

Chapter 6: Post-Market Monitoring and Audit-Ready Packaging

  • Milestone: Design post-market monitoring KPIs and drift triggers
  • Milestone: Draft an incident reporting and corrective action workflow
  • Milestone: Assemble an audit-ready evidence package with an index
  • Milestone: Run a mock audit and produce an improvement backlog
  • Milestone: Create a 90-day compliance maintenance plan

Sofia Chen

AI Governance Lead & Compliance Documentation Specialist

Sofia Chen leads AI governance programs for product teams operating in regulated markets. She specializes in translating EU AI Act obligations into practical engineering workflows, technical documentation, and audit-ready evidence. Her work focuses on risk classification, post-market monitoring, and human oversight design.

Chapter 1: EU AI Act Fundamentals for Builders

The EU AI Act is not “an ethics checklist.” It is a product-and-process regulation that asks builders to define what they are shipping, who is responsible for which obligations, what risks it creates in real use, and what evidence proves you did the required work. This course is a compliance lab, so we’ll treat the Act like an engineering spec: clarify boundaries, map roles, classify risk, and then build a documentation set that can survive external scrutiny.

This chapter establishes the builder’s workflow you will repeat throughout the lab. You will (1) define your system’s intended purpose and boundaries; (2) identify where you sit in the AI value chain (provider, deployer, importer, distributor, product manufacturer); (3) create an obligations map you can maintain as the system evolves; (4) set up a compliance evidence workspace with naming conventions; and (5) establish a reusable documentation template pack that mirrors EU AI Act expectations.

Two practical principles will guide you. First, classification is only as good as your scoping assumptions—if you don’t write them down, you will keep re-litigating the same decisions. Second, “documentation” is not prose; it is an evidence index that links claims (what you say is true) to artifacts (what proves it). You will learn to write like an auditor will read: quickly, skeptically, and with a bias toward traceability.

Practice note for the Chapter 1 milestones: for each milestone above, document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 1.1: What the EU AI Act regulates (systems, models, components)

The EU AI Act regulates the placing on the market, putting into service, and use of AI systems in the EU, with obligations that depend on risk and on your role. For builders, the first practical task is to identify what you are actually delivering: an AI system, a component of a larger product, or a more general model used downstream. This is not wordplay; it changes which technical documentation, instructions, and lifecycle controls are expected and who must maintain them.

In engineering terms, think in layers. A “system” is the end-to-end capability as used in a context (inputs, processing, outputs, and integration points). A “model” is a trained artifact (or family of artifacts) that may be embedded in many systems. A “component” is a module—possibly AI-driven—within a larger product (for example, an AI-based risk scoring component inside a loan origination platform). Your scoping milestone here is to draw the boundary: what is inside your responsibility (training, evaluation, configuration, monitoring hooks) and what is outside (customer data pipelines, business rules you do not control, downstream fine-tuning).

Common mistake: treating a model card as “the documentation.” Model cards are useful, but the Act expects system-level thinking: intended purpose, foreseeable misuse, integration constraints, and operational controls. Another mistake is assuming that because your product is “just an API,” you are exempt from system obligations. If you supply an API that determines or meaningfully influences decisions in regulated contexts, you will still need to document the intended purpose, performance limits, and safe integration requirements.

  • Practical outcome: a one-page system boundary note that lists: (a) delivered artifacts (model weights, service, SDK), (b) required inputs and allowed sources, (c) outputs and how they must be interpreted, (d) deployment constraints (latency, confidence thresholds, human review requirements), and (e) explicit non-intended uses.
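As a concrete shape for this boundary note, here is a minimal Python sketch; the class, field names, and example values are illustrative assumptions, not a format the Act prescribes.

```python
from dataclasses import dataclass

@dataclass
class SystemBoundaryNote:
    """One-page system boundary note (illustrative structure, not mandated)."""
    delivered_artifacts: list      # (a) model weights, hosted service, SDK
    required_inputs: list          # (b) inputs and their allowed sources
    outputs: list                  # (c) outputs and how they must be interpreted
    deployment_constraints: list   # (d) latency, thresholds, human review
    non_intended_uses: list        # (e) explicit exclusions

# Hypothetical example for an AI-based risk scoring component
note = SystemBoundaryNote(
    delivered_artifacts=["risk-scoring API", "Python SDK"],
    required_inputs=["application form fields from the customer CRM only"],
    outputs=["risk score 0-100, advisory only, never a final decision"],
    deployment_constraints=["human review required when confidence < 0.7"],
    non_intended_uses=["fully automated rejection of applicants"],
)
```

Keeping the note as data rather than prose makes it trivial to diff across versions when the boundary changes.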
Section 1.2: Core definitions (provider, deployer, intended purpose)

EU AI Act compliance starts with vocabulary. Teams fail audits not because they lack controls, but because they cannot consistently answer “who is the provider here?” and “what is the intended purpose?” across versions, markets, and customer deployments. Your second milestone—identify where your system sits in the AI value chain—depends on these definitions.

Provider is typically the entity that develops an AI system (or has it developed) and places it on the market or puts it into service under its own name or trademark. If you ship a hosted service under your brand, you are usually the provider. If you white-label a vendor system under your brand, you can still become the provider. Deployer is the entity using the system under its authority (often your customer). Importer and distributor matter when systems enter the EU supply chain; they inherit specific duties around verification, traceability, and cooperation with authorities. Product manufacturer matters when AI is part of a regulated product ecosystem (e.g., machinery, medical devices), where AI Act obligations interact with existing conformity regimes.

Intended purpose is your anchor definition: the specific use for which the system is meant, as described by the provider in instructions and marketing. Builders should treat intended purpose like a requirements baseline: it drives risk classification, required controls, evaluation design, and the content of user instructions. If you keep it vague (“improves productivity”), you will struggle to justify why you are not in a higher-risk category, and you will be unable to specify appropriate human oversight.

  • Practical outcome: a role-and-responsibility matrix that states: who is provider/deployer in your default sales motion, what changes in an OEM/white-label deal, and which party maintains (a) post-market monitoring, (b) incident reporting pathways, (c) customer-facing instructions, and (d) change control approvals.
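One lightweight way to keep this matrix checkable is a nested mapping keyed by sales motion; the motions, parties, and duty names below are hypothetical examples, not legal advice.

```python
# Hypothetical role-and-responsibility matrix for two sales motions.
ROLE_MATRIX = {
    "default_saas": {
        "provider": "us",
        "deployer": "customer",
        "post_market_monitoring": "us",
        "incident_reporting": "us",
        "user_instructions": "us",
        "change_control_approval": "us",
    },
    "oem_white_label": {
        "provider": "OEM partner (sells under own brand)",
        "deployer": "OEM partner's customer",
        "post_market_monitoring": "shared (contractually defined)",
        "incident_reporting": "OEM partner",
        "user_instructions": "OEM partner (based on our template)",
        "change_control_approval": "us",
    },
}

def owner(motion: str, duty: str) -> str:
    """Look up who holds a given duty in a given sales motion."""
    return ROLE_MATRIX[motion][duty]
```

A lookup like `owner("oem_white_label", "incident_reporting")` answers the audit question directly instead of sending reviewers to a contract appendix.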

Common mistake: writing intended purpose from a product marketing perspective rather than an operational one. The intended purpose should specify the decision or workflow it supports, the target users, the deployment environment, and what the output should and should not be used for.

Section 1.3: Risk-based approach overview and why it matters

The Act uses a risk-based approach: prohibited practices (unacceptable risk), high-risk systems (stringent requirements), limited-risk systems (primarily transparency duties), and minimal-risk systems (few explicit obligations, but still subject to general legal requirements). Your job is not to “pick the lowest risk label.” Your job is to produce a defensible classification record with assumptions, boundaries, and evidence links—the kind of record you can maintain through product evolution.

Start classification from use case + context, not from model type. The same underlying model can be minimal-risk in one context and high-risk in another. For example, a text generator used for internal drafting may be limited-risk with transparency measures, while a system used to screen job applicants or evaluate creditworthiness can trigger high-risk obligations depending on how it influences decisions and whether it falls into listed high-risk domains.

Engineering judgement matters most in three places. (1) Foreseeable misuse: what users will realistically do, not what your terms of service hope they will do. (2) Decision influence: whether outputs materially shape outcomes in sensitive domains. (3) System boundaries: whether you control deployment configurations, thresholds, and monitoring—or whether customers can repurpose the system in ways that change the risk.

  • Practical outcome: a risk classification record containing: (a) intended purpose statement, (b) deployment scenarios considered, (c) exclusions and non-intended uses, (d) mapping to risk tier rationale, and (e) “evidence pointers” to artifacts such as evaluations, UI screenshots, user instructions, and contractual controls.
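A classification record with these five parts can be kept as structured data with a completeness check; the field names and the example system below are invented for illustration.

```python
# Illustrative risk classification record (hypothetical CV-screening system).
RISK_RECORD = {
    "intended_purpose": "Pre-screen CVs to rank relevance for recruiters",
    "deployment_scenarios": ["in-house recruiting team, EU market"],
    "non_intended_uses": ["automatic rejection without human review"],
    "risk_tier": "high-risk",
    "rationale": "Employment-related candidate evaluation (listed high-impact domain)",
    "evidence_pointers": [
        "AIAC-EVAL-02-FairnessReport-2026-01-15.md",
        "AIAC-TECHDOC-04-UserInstructions-v0.3.pdf",
    ],
}

REQUIRED_FIELDS = {
    "intended_purpose", "deployment_scenarios", "non_intended_uses",
    "risk_tier", "rationale", "evidence_pointers",
}

def is_complete(record: dict) -> bool:
    """True only if every required field exists and is non-empty."""
    return REQUIRED_FIELDS <= record.keys() and all(record[f] for f in REQUIRED_FIELDS)
```

Running the completeness check in CI keeps the record from silently degrading as the product evolves.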

Common mistake: relying on a single sentence like “not for high-stakes use” without enforcement mechanisms. If you claim a use is excluded, you should show how you prevent or discourage it (technical constraints, contractual terms, user prompts, access controls, customer vetting, monitoring triggers).

Section 1.4: Timeline, enforcement signals, and practical readiness

Compliance readiness is a scheduling problem as much as a legal one. The EU AI Act obligations phase in over time, and organizations that treat compliance as a “last month before launch” effort usually fail because evidence cannot be manufactured retroactively. Training data provenance, evaluation baselines, and change logs must exist when the work happens.

Practical readiness means watching enforcement signals: regulator guidance, harmonized standards, and the behaviors of large buyers who will demand documentation in procurement. Even before formal deadlines, enterprise customers often require a “compliance posture” package: role mapping, intended purpose statement, risk classification record, and a documentation index. If you cannot provide these, sales cycles slow down and security reviews expand.

Translate the timeline into engineering milestones. Add “compliance gates” to your product lifecycle: (1) concept gate—intended purpose and boundary; (2) data gate—data governance plan and dataset inventory; (3) model gate—evaluation report and known limitations; (4) release gate—user instructions, human oversight measures, and evidence index; (5) post-release gate—monitoring plan, incident intake, and change control.
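The five gates can be sketched as a simple checklist walker; gate and artifact names below mirror the list above, and the function is an assumption about how a team might wire this into its release process.

```python
# Compliance gates in lifecycle order, each with its required artifacts.
GATES = [
    ("concept", ["intended_purpose", "system_boundary"]),
    ("data", ["data_governance_plan", "dataset_inventory"]),
    ("model", ["evaluation_report", "known_limitations"]),
    ("release", ["user_instructions", "oversight_measures", "evidence_index"]),
    ("post_release", ["monitoring_plan", "incident_intake", "change_control"]),
]

def next_blocked_gate(existing_artifacts: set):
    """Return (gate, missing artifacts) for the first unsatisfied gate, or None."""
    for gate, required in GATES:
        missing = [a for a in required if a not in existing_artifacts]
        if missing:
            return gate, missing
    return None
```

A team that has only scoped the system (`{"intended_purpose", "system_boundary"}`) is blocked at the data gate, which is exactly the signal a release train needs.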

  • Practical outcome: a readiness roadmap aligned to your release train, showing which artifacts must exist by which sprint (e.g., evaluation protocol before model selection; user instructions before beta access; monitoring dashboards before GA).

Common mistake: assuming that “we will document later” is acceptable. Under audit pressure, teams discover gaps like missing dataset licenses, no record of why certain thresholds were chosen, or no traceability from a known issue to a corrective action. The earlier you design for evidence, the cheaper compliance becomes.

Section 1.5: Documentation mindset: evidence, traceability, audit trails

Documentation under the EU AI Act should be treated like an engineering control system: it provides repeatable traceability from requirements to implementation to verification. The milestone in this chapter is to set up a compliance evidence workspace and naming conventions, because good evidence is discoverable. If evidence cannot be found quickly, it effectively does not exist.

Think in three layers. (1) Claims: statements you make about the system (intended purpose, risk tier, performance, limitations, human oversight). (2) Controls: processes and technical measures that make those claims true (data governance procedures, evaluation pipelines, access controls, review workflows). (3) Artifacts: concrete outputs (dataset inventories, evaluation reports, model version logs, UI screenshots, incident records). Your documentation pack should connect these via stable IDs and links.

Set up an evidence index—a single table that lists artifact name, owner, version, location, and which requirement/claim it supports. Use naming conventions that survive time and team changes, such as: AIAC-TECHDOC-01-SystemDescription-v1.2.pdf, AIAC-EVAL-05-BiasAudit-2026-02-15.md, and AIAC-DATA-03-DatasetRegister.xlsx. Store immutable snapshots for releases, and keep working documents separately to avoid overwriting historical proof.
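The naming convention in these examples can be enforced with a small validator; the pattern below reverse-engineers the three sample names and treats the version/date suffix as optional, which is an assumption rather than a rule stated in the text.

```python
import re

# Pattern: AIAC-<CATEGORY>-<NN>-<Name>[-vX.Y | -YYYY-MM-DD].<ext>
NAME_RE = re.compile(
    r"^AIAC-"
    r"(?P<category>[A-Z]+)-"            # e.g. TECHDOC, EVAL, DATA
    r"(?P<seq>\d{2})-"                  # two-digit sequence number
    r"(?P<name>[A-Za-z]+)"              # CamelCase artifact name
    r"(-(?P<suffix>v\d+\.\d+|\d{4}-\d{2}-\d{2}))?"  # version or date (optional)
    r"\.(?P<ext>pdf|md|xlsx)$"
)

def valid_name(filename: str) -> bool:
    """Check a filename against the evidence-workspace naming convention."""
    return NAME_RE.match(filename) is not None
```

A pre-commit hook or a nightly sweep of the evidence workspace can flag non-conforming names before they become archaeology.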

  • Practical outcome: a reusable documentation template pack: system description, intended purpose & boundaries, risk classification record, data governance plan, model development & change control, evaluation report, human oversight plan, and user instructions—plus an evidence index that links them.

Common mistake: mixing “policy statements” with “evidence.” A policy that says “we test for bias” is not evidence; an evaluation report with methodology, results, and sign-off is. Another mistake is scattering artifacts across personal drives and chat threads. Centralize and control access; you will need to show an audit trail, not just final PDFs.

Section 1.6: Lab case selection and scoping rules

This lab works best when you pick a concrete case and keep it stable while you learn the mechanics. To close out Chapter 1, select a lab case and apply scoping rules so your classification and documentation remain coherent. Choose a system you can describe end-to-end in one page, with a clear user, a clear workflow, and at least one measurable output (score, label, recommendation, generated text) that affects an action.

Use scoping rules to prevent “compliance sprawl.” Rule 1: define one primary intended purpose and up to three supported use scenarios; list everything else as non-intended. Rule 2: freeze the version under assessment (model version, prompts, thresholds, UI flow, integration points). Rule 3: specify data boundaries: what training data you used, what runtime data you expect, and what you explicitly prohibit (e.g., special category data unless justified and controlled). Rule 4: identify human roles: who sees outputs, who can override, and what happens when the system is uncertain.
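The four scoping rules lend themselves to a structured scope record plus simple checks; everything named below (the ticket-triage system, version labels, thresholds) is a hypothetical example.

```python
# Illustrative frozen scope record for a hypothetical ticket-triage system.
SCOPE = {
    "primary_intended_purpose": "Rank inbound support tickets by urgency",
    "supported_scenarios": ["internal helpdesk triage", "email channel", "EU tenants"],
    # Rule 2: freeze the version under assessment
    "frozen_version": {
        "model": "ticket-ranker-1.4",
        "prompts": "prompt-pack-2026-01",
        "thresholds": {"urgent": 0.8},
        "ui_flow": "triage-v2",
    },
    # Rule 3: data boundaries
    "data_boundaries": {
        "training": ["historical tickets 2022-2025"],
        "runtime": ["newly submitted tickets"],
        "prohibited": ["special category data"],
    },
    # Rule 4: human roles
    "human_roles": {
        "sees_outputs": "support agent",
        "can_override": "support agent",
        "on_uncertainty": "route to manual queue",
    },
}

def violates_rule_1(scope: dict) -> bool:
    """Rule 1: one primary purpose, at most three supported scenarios."""
    return len(scope["supported_scenarios"]) > 3
```

Freezing the scope as one record means any later change (new threshold, new channel) is visibly a scope change that may reopen classification.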

As you scope, also identify where you sit in the value chain for this case. Are you the provider of the full system, or are you a component supplier? Are you also the deployer in a managed service, or does the customer operate it? These answers determine what your obligations map will include and which documents must be customer-facing (instructions, transparency notices) versus internal (development logs, evaluation protocols).

  • Practical outcome: a lab “system dossier” starter: intended purpose paragraph, boundary diagram (text is fine), role mapping for your scenario, and a first draft of your evidence workspace structure (folders, naming, owners).

Common mistake: picking a case that is too abstract (“a general chatbot”) or too broad (“the whole platform”). Pick one deployable capability and one deployment context. You can expand later, but you cannot classify what you cannot scope.

Chapter milestones
  • Milestone: Define your system’s intended purpose and boundaries
  • Milestone: Identify where your system sits in the AI value chain
  • Milestone: Create an obligations map you can maintain
  • Milestone: Set up a compliance evidence workspace and naming conventions
  • Milestone: Establish a reusable documentation template pack
Chapter quiz

1. Why does the chapter emphasize that the EU AI Act is not “an ethics checklist”?

Correct answer: Because it is a product-and-process regulation requiring defined scope, assigned responsibilities, real-use risk thinking, and evidence of required work
The chapter frames the Act as an engineering-style regulation focused on what is shipped, who is responsible, what risks occur in real use, and what evidence demonstrates compliance work.

2. What is the main purpose of defining your system’s intended purpose and boundaries at the start of the workflow?

Correct answer: To prevent repeated re-debates of classification decisions by making scoping assumptions explicit
The chapter states classification is only as good as scoping assumptions; writing them down avoids re-litigating decisions.

3. Which set of roles reflects the AI value chain positions the chapter says you must identify for your system?

Correct answer: Provider, deployer, importer, distributor, product manufacturer
The chapter lists these specific value-chain roles to locate where you sit and therefore what obligations apply.

4. What does the chapter mean by saying “documentation” is not prose?

Correct answer: Documentation is an evidence index linking claims to artifacts that prove them, optimized for traceability
The chapter defines documentation as an evidence index that ties assertions to proof, written for skeptical, fast auditor-style review.

5. Which sequence best matches the repeatable builder workflow established in Chapter 1?

Correct answer: Define intended purpose/boundaries → identify value-chain role → create maintainable obligations map → set up evidence workspace & naming conventions → establish reusable documentation templates
The chapter explicitly lists this five-step workflow in order as the process repeated throughout the lab.

Chapter 2: Classify the AI System by Risk Category

This chapter is your working method for classifying an AI system under the EU AI Act and producing a record you can defend in an audit, procurement review, or internal risk committee. The goal is not only to label a system as prohibited, high-risk, limited-risk, or minimal-risk, but to show how you reached that conclusion: what you assumed, what evidence you checked, where the boundaries are (what is “in scope” vs. “out of scope”), and who must act (provider, deployer, importer, distributor, product manufacturer).

In practice, classification is a sequence of gates. First you screen for prohibited practices and document the rationale (Milestone: prohibited screening). If you pass, you run a high-risk decision tree and record outcomes (Milestone: high-risk decision tree). If not high-risk, you assess whether transparency duties apply (Milestone: limited-risk transparency). Finally, you produce a signed-off risk classification memo that captures the decision, evidence links, and approvals (Milestone: signed memo).
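The gate sequence can be sketched as a tiny decision function; the boolean inputs stand in for the documented outcome of each milestone, and a real classification also requires the recorded rationale and sign-off, not just the label.

```python
def classify(prohibited_hit: bool, high_risk_hit: bool, transparency_duty: bool) -> str:
    """Simplified EU AI Act classification gate sequence from this chapter.

    Each argument represents the documented outcome of a milestone:
    the prohibited-practices screen, the high-risk decision tree,
    and the limited-risk transparency assessment, in that order.
    """
    if prohibited_hit:
        return "prohibited"       # gate 1: screening failed, stop here
    if high_risk_hit:
        return "high-risk"        # gate 2: decision tree triggered
    if transparency_duty:
        return "limited-risk"     # gate 3: transparency duties apply
    return "minimal-risk"         # default tier, still subject to general law
```

The ordering matters: a system that trips the prohibited screen never reaches the high-risk tree, which is why the rationale at gate 1 must stand on its own.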

Engineering judgement matters. Many systems are “almost” high-risk because they are used in a high-impact context, integrated into a regulated product, or influence a decision without being the final decision-maker. Common mistakes include: classifying based on the vendor’s marketing name rather than intended purpose; ignoring downstream integration; forgetting that “who is the provider” can shift when a deployer makes a substantial modification; and treating transparency notices as optional UX copy rather than compliance artifacts tied to user instructions and human oversight.

Use the six sections below as a repeatable workflow. Treat your classification record like a technical artifact: versioned, evidence-linked, and signed by accountable roles.

Practice note for the Chapter 2 milestones: for each milestone above, document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.


Sections in this chapter
Section 2.1: Prohibited AI practices screening checklist

Your first gate is to screen for prohibited practices. This is not a “quick sanity check”; it is a documentable decision. The EU AI Act bans certain uses outright, so your compliance posture starts by proving you are not building, supplying, or deploying one of those uses.

Work from the system’s intended purpose and real deployment context, not from model type. A generic classifier can become prohibited if deployed to manipulate vulnerable groups or to enable covert scoring. Your screening deliverable should be a short rationale with evidence links: product requirements, user stories, screenshots, contracts, and deployment policies.

  • Manipulation/deception causing harm: Does the system use subliminal techniques or deceptive interactions to materially distort behavior in a way likely to cause harm? Check growth experiments, A/B test plans, and persuasive UI patterns.
  • Exploitation of vulnerabilities: Is it targeted at people due to age, disability, or socio-economic situation in a way that is likely to cause harm? Verify target audience definitions and segmentation logic.
  • Social scoring: Are you creating or using generalized “trustworthiness” scores that lead to unjustified or disproportionate treatment? Review feature sets, risk scores, and cross-context reuse.
  • Biometric categorisation and sensitive inference: Are you classifying people by sensitive attributes from biometric data? Confirm what biometric signals exist and whether sensitive categories are inferred.
  • Real-time remote biometric identification in publicly accessible spaces (law enforcement): If relevant, confirm you are not enabling prohibited identification scenarios.

Milestone: Screen for prohibited practices and document the rationale. Your output is a one-page “Prohibited Practices Screen” in your risk classification record: each prohibited bucket, “Applicable? (Y/N)”, rationale, and evidence links. Common mistake: writing “Not applicable” without specifying the operational boundary (e.g., “No use in public spaces; no law-enforcement customers; contract prohibits use for identification”). Make your boundary explicit.
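A screen like this is easier to keep honest as structured data than as free text. Here is a minimal sketch in Python; the field names and the validation rule are illustrative conventions, not statutory wording:

```python
from dataclasses import dataclass, field

@dataclass
class ScreenEntry:
    """One row of the Prohibited Practices Screen."""
    bucket: str            # e.g. "Social scoring"
    applicable: bool       # the "Applicable? (Y/N)" column
    rationale: str         # why Y/N, with the operational boundary spelled out
    evidence: list = field(default_factory=list)  # PRDs, contracts, policies

def validate_screen(entries):
    """Flag 'Not applicable' rows that give no boundary or no evidence links."""
    problems = []
    for e in entries:
        if not e.applicable and (not e.rationale or not e.evidence):
            problems.append(f"{e.bucket}: 'N' needs an explicit boundary and evidence links")
    return problems

screen = [
    ScreenEntry("Social scoring", False,
                "No cross-context trustworthiness score; scores stay within fraud review",
                ["wiki/prd-fraud", "contract-clause-4.2"]),
    ScreenEntry("Real-time remote biometric identification", False, "", []),
]
print(validate_screen(screen))  # the bare "N" row is rejected
```

The point of the guard is exactly the common mistake above: a "No" without a stated boundary fails validation.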

Section 2.2: High-risk categories and Annex-style triggers (practical mapping)

If the system is not prohibited, you run the high-risk decision tree. High-risk classification commonly comes from two practical routes: (1) the AI is a safety component of a regulated product (or itself a regulated product), or (2) the AI is used in a listed high-impact domain (the “Annex-style” use cases such as employment, education, essential services, law enforcement, migration, or administration of justice).

Start with a mapping worksheet that ties intended purpose → decision influenced → domain → user group → impact. Then evaluate triggers. For example, an AI that screens job applicants, ranks candidates, or predicts performance is typically in the employment domain even if the system is branded as “productivity analytics.” Likewise, an AI that determines eligibility for credit or housing can fall into essential services.

  • Product route: Is the AI embedded in, or does it control, a product subject to EU product safety rules (medical devices, machinery, vehicles, aviation, etc.)? If yes, treat it as potentially high-risk and coordinate with product compliance.
  • Domain route: Does the system make or materially influence decisions in Annex-style high-impact areas (admissions, hiring, termination, creditworthiness, benefits eligibility, border control workflows, court-related decision support)?
  • Decision significance: Is the output used as a gate, ranking, recommendation, or risk score that a human typically follows? Capture how often outputs are overridden and under what conditions.
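The two routes above can be sketched as a small decision function that also records the branch taken, which is what your classification record needs. The input field names are assumptions about your mapping worksheet, not statutory terms:

```python
def classify_high_risk(system: dict) -> dict:
    """Walk the practical high-risk decision tree and record the branch taken."""
    record = {"high_risk": False, "branch": None, "trigger": None}
    # Route 1: safety component of (or itself) a product under EU product safety rules
    if system.get("safety_component") or system.get("regulated_product"):
        record.update(high_risk=True, branch="product",
                      trigger="EU product safety rules")
    # Route 2: Annex-style high-impact domain + material influence on the decision
    elif system.get("high_impact_domain") and system.get("materially_influences_decision"):
        record.update(high_risk=True, branch="domain",
                      trigger=system["high_impact_domain"])
    return record

# "Productivity analytics" that ranks job candidates is still the employment domain
print(classify_high_risk({"high_impact_domain": "employment",
                          "materially_influences_decision": True}))
```

Note that a "human in the loop" flag appears nowhere in the tree: as the milestone below stresses, it does not prevent high-risk classification.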

Milestone: Run the high-risk decision tree and record outcomes. Your record should include: the branch taken, the trigger (domain/product), the exact system function in that context, and evidence (process diagrams, SOPs, integration architecture). Common mistakes: assuming “human in the loop” prevents high-risk classification (it does not), and under-describing influence (e.g., “advisory only” when the UI presents a single recommended action). Write the operational truth.

Practical outcome: a clear yes/no high-risk determination plus the list of compliance obligations you will need next (technical documentation structure, risk management, data governance, logging, transparency, human oversight, accuracy/robustness, post-market monitoring). Even if you are not writing the full high-risk file yet, you should identify the missing evidence now.

Section 2.3: General-purpose AI and downstream integration considerations

Many teams build on general-purpose AI (GPAI) models or offer capabilities that are reused across multiple products. Classification must account for downstream integration: a base model may not itself be deployed into a high-impact decision, but a downstream system might be.

Operationally, treat your system as a chain: data → base model → adapters/fine-tunes → orchestration/prompting → tools → UI → business process. Determine who is the provider at each step. A vendor providing a base model may be the provider of the model, while your organization becomes the provider of the integrated system if you place it on the market or put it into service under your name. If you make a substantial modification (e.g., change intended purpose, materially affect compliance characteristics, or materially change performance in a regulated context), responsibilities can shift.

  • Document integration boundaries: What is supplied by the GPAI vendor vs. what you control (prompt templates, retrieval sources, tools, fine-tuning data, guardrails)?
  • Downstream use restrictions: Are there contractual and technical controls to prevent prohibited or high-risk deployments you do not support?
  • Evidence inheritance: What evidence can you rely on from the GPAI provider (model card, safety evaluations, known limitations) and what must you re-validate in your context (task performance, bias, robustness, security)?

A common mistake is to classify only the “model” and ignore the “system.” Under the EU AI Act approach, the relevant object is typically the AI system as deployed for an intended purpose. Your risk classification record should therefore include a downstream-use statement: “This classification applies to deployment in X process for Y users; if integrated into Z domain (e.g., hiring), classification must be re-run.” This is essential for teams who ship platforms and APIs.
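One way to make the downstream-use statement machine-checkable is to attach it to the classification record itself and force a re-run whenever the deployment context drifts. A sketch under assumed field names:

```python
class ClassificationRecord:
    """Risk classification scoped to a specific deployment context."""
    def __init__(self, category, process, users, reclassify_domains):
        self.category = category            # e.g. "limited"
        self.process = process              # the "X process" in the statement
        self.users = users                  # the "Y users"
        self.reclassify_domains = set(reclassify_domains)  # the "Z domains"

    def still_valid_for(self, domain):
        """False means classification must be re-run before this deployment."""
        return domain not in self.reclassify_domains

rec = ClassificationRecord("limited", "customer support drafting", "internal agents",
                           reclassify_domains={"hiring", "credit", "education"})
print(rec.still_valid_for("customer support"))  # True
print(rec.still_valid_for("hiring"))            # False: re-classify first
```

For platform and API teams, exposing a check like this in release tooling turns the downstream-use statement from a footnote into a gate.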

Section 2.4: Limited-risk transparency obligations and user-facing notices

If the system is not prohibited and not high-risk, it may still have transparency duties. Limited-risk obligations are frequently triggered by how the system interacts with people: users must be informed they are interacting with AI in certain contexts, synthetic content may need labeling, and outputs that could be mistaken for authentic content may require disclosures.

Milestone: Classify transparency duties for limited-risk systems. Do this as a set of concrete UX and documentation requirements, not abstract legal notes. Start by listing every user interaction surface: chat UI, email generation, call center scripts, image/video generation, voice output, and API responses. Then specify what notice appears, to whom, when, and how it is logged.

  • Human-facing interaction notice: If users might reasonably believe they are dealing with a human, provide a clear AI interaction disclosure at the start and in persistent UI areas.
  • Synthetic content disclosure: If you generate or materially manipulate audio/video/image/text that could mislead, add a label/watermark/metadata approach plus user instructions on permitted use.
  • Decision-support clarity: If the system provides recommendations, ensure the UI and user instructions describe limitations, confidence indicators (if used), and appropriate verification steps.

Common mistakes: burying notices in Terms of Service; using vague language (“powered by AI”) without stating what the system does; and failing to align notices with actual behavior (e.g., the system drafts a denial message that looks final, while policy says it’s only a draft). Practical outcome: a “Transparency Obligations Table” in your record: trigger, notice text, placement, owners, and evidence (mockups, screenshots, localization plan).
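The Transparency Obligations Table can start life as plain structured rows, which also makes it easy to lint for the mistakes above. The "powered by AI" check below is an illustrative rule of thumb, not a legal test:

```python
rows = [
    {"trigger": "chat UI",
     "notice": "You are chatting with an AI assistant; a human can be requested.",
     "placement": "first message + persistent header",
     "owner": "Product", "evidence": ["mockup-chat-v3.png"]},
    {"trigger": "generated email", "notice": "powered by AI",
     "placement": "footer", "owner": "Marketing", "evidence": []},
]

VAGUE_NOTICES = {"powered by ai", "ai inside"}  # illustrative blocklist

def lint(rows):
    """Flag vague notice text and rows with no linked evidence."""
    findings = []
    for r in rows:
        if r["notice"].strip().lower() in VAGUE_NOTICES:
            findings.append(f"{r['trigger']}: notice does not state what the system does")
        if not r["evidence"]:
            findings.append(f"{r['trigger']}: no evidence (mockups/screenshots) linked")
    return findings

print(lint(rows))  # two findings, both on the "generated email" row
```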

Section 2.5: Minimal-risk good practices and voluntary documentation

Minimal-risk systems still benefit from disciplined documentation because classification can change as scope expands. The best teams treat minimal-risk as “low regulatory burden,” not “no governance.” Your goal is to keep a lightweight, evidence-ready package that can scale if the system moves into a high-impact domain.

Adopt a voluntary documentation set aligned with EU AI Act-style technical documentation structure, but sized to your system. A practical minimal set includes: intended purpose statement, system architecture diagram, data sources and licenses, evaluation summary, known limitations, user instructions, and an incident/reporting pathway. This also supports procurement, security reviews, and customer trust.

  • Intended purpose & boundaries: What the system is for, what it is not for, and prohibited downstream uses.
  • Data governance snapshot: Data origin, consent/rights basis where relevant, quality checks, and retention.
  • Evaluation summary: Basic accuracy/quality metrics, red-team findings, and failure modes relevant to users.
  • Change log: Versioning of model, prompts, retrieval corpora, and guardrails.
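The change-log bullet is worth making concrete: version the model, prompts, retrieval corpora, and guardrails together, so any output can be tied back to one exact configuration. A minimal sketch (the version keys are assumptions about your stack):

```python
import hashlib
import json

def config_fingerprint(model_version, prompt_version, corpus_version, guardrails_version):
    """Stable short ID for the full generation configuration, for the change log."""
    blob = json.dumps({"model": model_version, "prompt": prompt_version,
                       "corpus": corpus_version, "guardrails": guardrails_version},
                      sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()[:12]

entry = {
    "date": "2025-01-15",
    "fingerprint": config_fingerprint("base-model-2024-11", "support-v7",
                                      "kb-2025-01", "filters-v2"),
    "summary": "Refreshed retrieval corpus; prompts and guardrails unchanged",
}
print(entry["fingerprint"])
```

Because the fingerprint is deterministic, "did anything change?" becomes a string comparison rather than an archaeology exercise.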

Common mistake: skipping evaluation because “it’s only internal.” Internal deployments can still cause harm or become high-risk if used for employment decisions, access control, or customer eligibility. Practical outcome: a minimal-risk dossier you can reuse when you later draft full technical documentation and an evidence index.

Section 2.6: Handling uncertainty: assumptions, edge cases, and escalation

Classification is rarely binary on the first pass. You will face ambiguity: mixed-use platforms, customers who can configure workflows, and systems that sit adjacent to regulated decisions. The correct response is not to guess; it is to document uncertainty, define assumptions, and escalate to the right governance forum.

Build your risk classification record as a set of testable statements. Example: “Assumption A1: Outputs are not used to make final hiring decisions; they are used only to draft interview questions.” Then define how A1 is enforced (permissions, product UI, contracts, training, monitoring). If you cannot enforce an assumption, it is not an assumption—it is a risk.
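That rule ("an unenforceable assumption is a risk") can be applied mechanically: pair each assumption with its enforcement mechanisms and demote the ones that have none. A sketch with illustrative IDs and checks:

```python
def triage_assumptions(assumptions):
    """Split classification assumptions into enforced assumptions vs open risks."""
    enforced, risks = [], []
    for a in assumptions:
        # No enforcement mechanism means it is not an assumption - it is a risk
        (enforced if a.get("enforcement") else risks).append(a["id"])
    return enforced, risks

assumptions = [
    {"id": "A1",
     "text": "Outputs are not used to make final hiring decisions",
     "enforcement": ["no export of candidate scores", "contract clause 7",
                     "quarterly usage audit"]},
    {"id": "A2",
     "text": "Only approved data sources are queried",
     "enforcement": []},
]
print(triage_assumptions(assumptions))  # (['A1'], ['A2'])
```

Anything landing in the risks list should flow into the edge-case inventory and escalation path below.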

  • Edge-case inventory: List plausible misuses and adjacent deployments (e.g., customer uses your sentiment tool to screen employees). Mark which are prevented, detected, or merely discouraged.
  • Role mapping: Identify whether you are acting as provider, deployer, importer, distributor, or product manufacturer for the specific deployment. Role determines obligations and who signs off.
  • Escalation path: Define when Legal/Compliance must review (e.g., any employment/education/benefits use, any biometric element, any safety-related integration), and when security and data protection reviews are mandatory.

Milestone: Produce a signed-off risk classification memo. This memo should include: final risk category, rationale, evidence links, assumptions and enforcement controls, open questions, and named approvers (product owner, engineering lead, compliance/legal, and—where relevant—deploying business owner). Common mistake: treating the memo as static. Re-run classification upon major changes: new markets, new customer segments, new integrations, retraining/fine-tuning, or expanded intended purpose.

Chapter milestones
  • Milestone: Screen for prohibited practices and document the rationale
  • Milestone: Run the high-risk decision tree and record outcomes
  • Milestone: Classify transparency duties for limited-risk systems
  • Milestone: Produce a signed-off risk classification memo
Chapter quiz

1. Which sequence best matches the chapter’s recommended “gates” for classifying an AI system under the EU AI Act?

Correct answer: Screen for prohibited practices → run high-risk decision tree → assess transparency duties (if not high-risk) → produce signed-off risk classification memo
The workflow is explicitly described as a sequence of gates in that order, ending with a signed memo.

2. Beyond assigning a risk label (prohibited/high/limited/minimal), what must the classification record demonstrate to be defensible in an audit or review?

Correct answer: How the conclusion was reached, including assumptions, checked evidence, in-scope vs. out-of-scope boundaries, and who must act
The chapter emphasizes documenting the rationale: assumptions, evidence, scope boundaries, and accountable actors.

3. Which situation is highlighted as a reason a system can be “almost” high-risk and requires careful engineering judgment?

Correct answer: It is used in a high-impact context or influences decisions even if it is not the final decision-maker
The chapter notes borderline cases: high-impact use, regulated product integration, or influencing decisions without being final.

4. Which is identified as a common mistake when classifying an AI system’s risk category?

Correct answer: Classifying based on the vendor’s marketing name rather than intended purpose
The chapter lists mistakes such as relying on marketing labels instead of intended purpose.

5. How does the chapter advise treating transparency notices for limited-risk systems?

Correct answer: As compliance artifacts tied to user instructions and human oversight, not optional UX copy
It warns against treating transparency notices as optional; they must connect to instructions and oversight.

Chapter 3: Build the Compliance Plan and Control Framework

In Chapters 1–2 you classified the use case and drafted the outline of technical documentation. This chapter turns that classification into an execution plan: a control framework that tells you what you must do, who must do it, when it must happen in the lifecycle, and what evidence proves it happened.

The EU AI Act is obligation-heavy by design, and most teams fail not because they disagree with the obligations, but because they never translate them into an engineerable workflow. The goal here is to convert obligations into a control checklist with owners, define a minimal but credible quality management and change control workflow, create a traceability matrix from requirements to evidence, draft a gap analysis and remediation plan, and finally prepare an internal review packet for sign-off.

Think of your compliance plan as a “control plane” over your product lifecycle: requirements flow into controls; controls flow into procedures; procedures generate evidence; evidence supports technical documentation and sign-off. If you build that pipeline early, audits become a retrieval task instead of a fire drill.

  • Output of this chapter: a role-mapped control checklist, a lightweight AI QMS workflow, a risk management routine, a change-control process, supplier controls, and an evidence register with approvals.
  • Common failure mode: writing policy text without owners, triggers, versioning, or evidence links.

Use the sections below as building blocks. Each section ends with practical outcomes you can lift directly into your lab deliverables.

Practice note for this chapter's milestones (control checklist with owners, quality management and change control workflow, traceability matrix, gap analysis and remediation plan, internal review packet): for each one, document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 3.1: Obligations-by-role: provider vs deployer responsibilities
Section 3.2: Quality management system essentials for AI (practical minimal set)
Section 3.3: Risk management process: hazards, harms, likelihood, severity
Section 3.4: Change management: model updates, prompts, data shifts, rollbacks
Section 3.5: Supplier and third-party component controls (APIs, models, data)
Section 3.6: Evidence register: what to collect, how to version, how to approve

Section 3.1: Obligations-by-role: provider vs deployer responsibilities

Under the EU AI Act, responsibilities change sharply depending on your role. Start your compliance plan by mapping the real-world actors to statutory roles: provider (develops or places on the market), deployer (uses under its authority), and sometimes importer, distributor, or product manufacturer. Do not assume "we are the provider" just because you built a model; if you integrate a third-party model into a product and place the system on the market under your name, you may still be the provider of the system.

Convert this role mapping into a control checklist with owners. A practical way is to create a table where each obligation (risk management, data governance, technical documentation, logging, transparency, human oversight, post-market monitoring) is a row, and each role is a column. Then assign a single accountable owner (a person or team) for each control in your organization, even if multiple teams contribute.

  • Provider-heavy controls: design-time risk management, training/validation data governance, model evaluation, technical documentation, conformity assessment preparation, and instructions for use.
  • Deployer-heavy controls: operational monitoring, ensuring input data matches intended purpose, applying instructions, maintaining human oversight, incident reporting pathways, and keeping logs when required.
  • Shared controls: transparency notices, user training, access control, and change management—especially when the deployer can configure prompts, thresholds, or workflows.

Engineering judgement matters when a control seems “shared.” Example: if your customer (deployer) can change prompts or decision thresholds, you as provider should specify allowed configuration ranges and test boundaries; the deployer should document the actual chosen configuration and operational checks. A common mistake is leaving configurability ungoverned and later discovering the deployer’s configuration invalidates your performance claims.

Practical outcome: produce a one-page role map plus an obligations-by-role checklist with an owner, a frequency/trigger (e.g., “before release,” “each model update,” “quarterly”), and the evidence artifact (link placeholder) for each row. This becomes the spine of your compliance plan and feeds later sections on traceability and internal sign-off.
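One checklist row from that practical outcome might look like this in structured form, with a guard that every control has exactly one named accountable owner. Field names are assumptions about your own table:

```python
checklist = [
    {"obligation": "technical documentation", "role": "provider",
     "owner": "ML Platform Lead", "trigger": "before release",
     "evidence": "tech-file-index#docs"},
    {"obligation": "operational monitoring", "role": "deployer",
     "owner": "", "trigger": "quarterly",
     "evidence": "monitoring-dashboard"},
]

def missing_owners(rows):
    """Controls without a single named accountable owner are unassigned work."""
    return [r["obligation"] for r in rows if not r["owner"].strip()]

print(missing_owners(checklist))  # ['operational monitoring']
```

Running a check like this before sign-off catches the "the company owns it, so no one does" failure mode early.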

Section 3.2: Quality management system essentials for AI (practical minimal set)

A Quality Management System (QMS) can sound heavyweight, but a minimal, credible set is enough if it is specific, used, and evidenced. For AI, the QMS must cover not only software quality but also data and model lifecycle. Your aim is to define a workflow that turns obligations into repeatable practice—then make it easy to follow.

Start with five “minimum viable QMS” procedures, each with a template and a routing rule:

  • Document control: where policies, specs, evaluations, and instructions live; how they are versioned; who can approve; and how superseded versions are archived.
  • Requirements management: how you capture EU AI Act requirements and internal product requirements; how changes are proposed and accepted.
  • Design & development: model/dev standards, data handling rules, evaluation gates, and release criteria (including safety and bias checks where relevant).
  • Nonconformity & CAPA: how issues (failures, incidents, audit findings) are recorded, triaged, corrected, and prevented from recurring.
  • Supplier management: onboarding checks, contractual clauses, monitoring, and exit/contingency plans for critical third-party components.

Integrate change control as the operational heartbeat of the QMS (expanded in Section 3.4). The best practice is to define “gates” aligned with your SDLC/ML lifecycle: design review, pre-release evaluation, release approval, and post-release monitoring review. These gates generate the evidence you will later index in the technical documentation.

Common mistake: writing a QMS that mirrors generic ISO language without AI specifics. For instance, a software-only change procedure won’t capture model retraining, dataset refreshes, prompt template updates, or evaluation drift. Another mistake: having no explicit owner for QMS artifacts; if “the company” owns it, no one does.

Practical outcome: define a minimal QMS workflow diagram (even a simple box-and-arrow) and a RACI for the five procedures. Then draft your first “internal review packet” outline: what documents are required at release gate (risk report, evaluation report, instructions, logging plan, evidence index) and who signs each.
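The release-gate part of that packet can be enforced mechanically: list the required documents and their signers, and block the gate until every slot is filled. The document names match the outline above; the signer roles and the workflow itself are an illustrative sketch:

```python
# Required documents at the release gate, mapped to the role that must sign
REQUIRED_AT_RELEASE = {
    "risk report": "compliance",
    "evaluation report": "ml_lead",
    "instructions for use": "product_owner",
    "logging plan": "engineering_lead",
    "evidence index": "compliance",
}

def gate_status(signatures):
    """signatures: {document: signer_role or None}. Returns unmet requirements."""
    unmet = []
    for doc, required_signer in REQUIRED_AT_RELEASE.items():
        if signatures.get(doc) != required_signer:
            unmet.append(f"{doc}: needs sign-off by {required_signer}")
    return unmet

sigs = {"risk report": "compliance", "evaluation report": "ml_lead",
        "instructions for use": "product_owner", "logging plan": None,
        "evidence index": "compliance"}
print(gate_status(sigs))  # one unmet item: the unsigned logging plan
```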

Section 3.3: Risk management process: hazards, harms, likelihood, severity

Risk management is where compliance becomes product thinking. The EU AI Act expects a continuous risk management process, not a one-time checklist. Your job is to define a routine that is specific to your intended purpose and operating context, and that produces traceable outputs.

Use a structured chain: hazard → hazardous situation → harm → affected stakeholder. A hazard is a source of potential harm (e.g., hallucinated medical advice, biased scoring, data leakage). The hazardous situation is how the hazard manifests in context (e.g., a user treats a generated answer as clinical instruction). The harm is the consequence (injury, discrimination, financial loss, rights infringement).

Score each risk using likelihood and severity, but define what those terms mean in your domain. Teams often copy a 1–5 matrix without calibrating it. Calibrate with examples: “Severity 5 = irreversible harm or major rights impact,” “Likelihood 5 = expected weekly in typical usage.” If you cannot justify a score, treat it as unknown and plan data collection.
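Calibration can be embedded directly in the scoring helper, so a score cannot be assigned at a level that has no anchor definition; unknowns stay unknown instead of defaulting to a number. The 5x5 matrix, anchors, and band thresholds below are illustrative:

```python
SEVERITY_ANCHORS = {
    5: "irreversible harm or major rights impact",
    3: "recoverable harm requiring intervention",
    1: "minor inconvenience, self-correcting",
}
LIKELIHOOD_ANCHORS = {
    5: "expected weekly in typical usage",
    3: "expected a few times per year",
    1: "conceivable but never observed",
}

def risk_score(severity, likelihood):
    """Return (score, band), or flag as unknown if a level has no calibrated anchor."""
    if severity not in SEVERITY_ANCHORS or likelihood not in LIKELIHOOD_ANCHORS:
        return None, "unknown - plan data collection"
    score = severity * likelihood
    band = "high" if score >= 15 else "medium" if score >= 6 else "low"
    return score, band

print(risk_score(5, 3))  # (15, 'high')
print(risk_score(4, 3))  # uncalibrated level 4 -> unknown, not a guess
```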

  • Risk controls (mitigations): data filtering, prompt constraints, confidence thresholds, abstention, policy checks, human review, UI friction, access control, logging, and user training.
  • Verification: tests and evaluations that show the control works (red teaming, bias evaluation, robustness checks, privacy tests, simulation).
  • Residual risk acceptance: who can accept, under what rationale, and what monitoring is required.

This is where you create a traceability matrix: each identified risk links to (1) a control, (2) an implementation artifact (design doc, code module, configuration), and (3) an evaluation artifact (test plan, results). Do not let mitigations live only in narrative form—make them a set of testable requirements.

Common mistake: listing only model-centric hazards. Many real harms come from workflow: poorly designed UI, unclear instructions, or incentives that push users to misuse the system. Another mistake: not updating the risk register after changes or incidents. Risk management must be tied to change control and post-market signals.

Practical outcome: establish a risk register template with columns for hazard chain, scores, mitigations, verification evidence links, residual risk, and monitoring signals. Then add a monthly/quarterly review cadence and triggers (new release, supplier change, incident, drift).

Section 3.4: Change management: model updates, prompts, data shifts, rollbacks

AI systems change in more ways than traditional software. Your change management process must cover: model version changes, prompt template changes, retrieval corpus updates, feature engineering changes, data pipeline modifications, threshold and routing changes, and even user instruction updates. Treat each as a potentially material change to performance and risk.

Define a change taxonomy with three tiers:

  • Minor change: no impact expected on intended purpose, performance claims, or risk controls (e.g., typo fixes in UI text). Requires lightweight review and evidence update.
  • Material change: plausible impact on performance, bias, robustness, transparency, or oversight (e.g., prompt changes that alter refusal behavior; new retrieval dataset). Requires re-evaluation against release criteria.
  • Major change: changes intended purpose, target users, or operating environment; introduces new capabilities or decision logic. Requires full risk reassessment and updated technical documentation package.
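The three tiers can be sketched as a routing function so that every change ticket receives a tier plus a minimum action list. The impact flags are assumptions about your ticket template:

```python
def change_tier(change: dict):
    """Map a change ticket's impact flags to its tier and required actions."""
    if change.get("alters_intended_purpose") or change.get("new_capability"):
        return "major", ["full risk reassessment",
                         "update technical documentation package"]
    if change.get("affects_performance_or_controls"):
        return "material", ["re-evaluate against release criteria"]
    return "minor", ["lightweight review", "evidence update"]

# A prompt change that alters refusal behavior is material, not minor
print(change_tier({"affects_performance_or_controls": True}))
# A new decision capability is major
print(change_tier({"new_capability": True}))
```

Note the default branch still requires an evidence update: even minor changes leave a trace.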

Then create a change control workflow: request → impact assessment → required tests → approvals → deployment → monitoring → closure. The impact assessment should explicitly ask: Does this change alter the intended purpose? Does it change input data assumptions? Does it affect any documented limitations? Does it require updated instructions for use?

Rollbacks are part of compliance. If you cannot quickly roll back a model or prompt configuration, you cannot credibly claim ongoing risk control. Maintain a rollback plan with: last-known-good version, deployment toggles, data migration considerations, and a communication plan to deployers/users when behavior changes.

Common mistakes include: updating prompts in production “quietly,” not versioning retrieval data, and not rerunning evaluations when the environment shifts (e.g., new user population, new language, seasonal data). Another frequent gap is failing to connect change tickets to evidence updates—your technical documentation index becomes stale.

Practical outcome: implement change tickets that automatically require links to the risk register items affected and the evidence artifacts produced. Add explicit sign-off gates for material/major changes, and define monitoring “watch metrics” to confirm the change behaves as expected post-release (error rates, drift indicators, complaint types).

Section 3.5: Supplier and third-party component controls (APIs, models, data)

Most AI systems are composites: a foundation model API, open-source libraries, hosted vector databases, labeling vendors, or external datasets. Under the EU AI Act, you cannot outsource accountability; you must manage supplier risk. Your control framework should identify critical suppliers—components whose failure could cause harm, compliance failure, or inability to provide evidence.

Start by building a supplier inventory with: component name, purpose, data flows, where processing occurs, versioning method, SLAs, and substitution options. Then apply tiered controls:

  • Due diligence: review supplier documentation (model cards, security whitepapers, data provenance statements), known limitations, and update policies.
  • Contractual controls: incident notification timelines, change notification, access to logs/metrics needed for monitoring, and audit/support rights where feasible.
  • Technical controls: wrapper layers that enforce your policies (rate limits, content filters, PII redaction), and “circuit breakers” if supplier behavior degrades.
  • Evaluation controls: baseline tests when onboarding; regression tests when supplier versions change; and periodic re-validation.
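The evaluation-controls bullet implies a concrete trigger: when the recorded supplier version changes, regression tests must pass before the new version is accepted, and the system stays pinned to the last approved version otherwise. A sketch with invented version strings:

```python
def on_supplier_update(inventory, component, new_version, regression_pass):
    """Gate acceptance of a supplier version change on regression results."""
    entry = inventory[component]
    if new_version == entry["approved_version"]:
        return "no change"
    if not regression_pass:
        # Circuit breaker: keep routing to the last-known-good supplier version
        return f"rejected: stay on {entry['approved_version']}"
    entry["approved_version"] = new_version
    return f"accepted: {new_version}"

inventory = {"base-model-api": {"approved_version": "2024-11-20",
                                "substitution": "fallback-model-v2"}}
print(on_supplier_update(inventory, "base-model-api", "2025-01-10",
                         regression_pass=False))  # rejected: stay on 2024-11-20
print(on_supplier_update(inventory, "base-model-api", "2025-01-10",
                         regression_pass=True))   # accepted: 2025-01-10
```

Recording the approved version per component is also what makes incident investigation possible later: you can say exactly which supplier version served a given decision.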

Engineering judgement is required when the supplier is a black box. If you cannot see training data or internal evaluations, compensate with stronger external testing, tighter constraints on use, narrower intended purpose, and more robust monitoring. Document these compensating controls explicitly; auditors are looking for reasoned decisions, not perfect visibility.

Common mistake: treating third-party model updates as “their problem.” If the provider changes behavior, your system’s risk profile can change overnight. Another mistake is failing to record which supplier version was in use for a given decision—without that, incident investigation and CAPA become guesswork.

Practical outcome: add supplier controls to your control checklist with clear owners (e.g., Vendor Manager + ML Lead), define a supplier change trigger that feeds your change management workflow, and create a standard onboarding packet that produces evidence (due diligence notes, test results, approved use constraints).

Section 3.6: Evidence register: what to collect, how to version, how to approve

Compliance lives or dies by evidence. An evidence register is not a folder of PDFs; it is a curated index that maps each obligation and each control to a versioned artifact and an approval record. This section is where you operationalize “evidence-ready” technical documentation.

Define your evidence register with three layers:

  • Requirement layer: EU AI Act requirement IDs (or your internal requirement IDs) and the system scope assumptions.
  • Control layer: the control that satisfies the requirement, with owner, frequency, and trigger.
  • Artifact layer: the actual evidence: risk register entries, evaluation reports, dataset datasheets, model cards, logging specifications, human oversight procedures, user instructions, training records, incident/CAPA records, and release approvals.

Versioning rules should be explicit. At minimum: every artifact has a unique ID, semantic version, date, owner, and status (draft/in review/approved/retired). For artifacts tied to releases, also record the product/model version and deployment environment. Store immutable snapshots for approved versions; do not rely on “latest” links.
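These versioning rules can be made non-optional in code: the register refuses "latest" links, and approved artifacts are frozen so changes force a new version. A minimal sketch with assumed fields:

```python
class EvidenceRegister:
    """Index of versioned, approvable artifacts; approved versions are immutable."""
    def __init__(self):
        self.artifacts = {}  # (artifact_id, version) -> record

    def add(self, artifact_id, version, owner, link):
        if "latest" in link:
            raise ValueError("evidence must point at an immutable snapshot, not 'latest'")
        self.artifacts[(artifact_id, version)] = {
            "owner": owner, "link": link, "status": "draft"}

    def approve(self, artifact_id, version, approver):
        rec = self.artifacts[(artifact_id, version)]
        rec.update(status="approved", approver=approver)

    def update(self, artifact_id, version, **changes):
        rec = self.artifacts[(artifact_id, version)]
        if rec["status"] == "approved":
            raise ValueError("approved artifact is immutable - create a new version")
        rec.update(changes)

reg = EvidenceRegister()
reg.add("EVAL-001", "1.0.0", "ML Lead", "s3://evidence/eval-001-v1.0.0.pdf")
reg.approve("EVAL-001", "1.0.0", "Compliance")
```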

Approvals should mirror your QMS gates. Define what requires sign-off (e.g., risk report, evaluation results, instructions for use, post-release monitoring plan), who signs (product owner, ML lead, compliance/legal, security/privacy where applicable), and what constitutes acceptance. This is how you prepare an internal review packet for sign-off: a consistent bundle of artifacts, each with its evidence register entry, ready to approve.

Now perform a gap analysis and remediation plan. Compare your current artifacts against the evidence register and mark each item as: available, incomplete, missing, or not applicable (with justification). For each gap, assign an owner, a remediation task, a due date, and the test/evidence that will close it. Common mistakes include calling items “not applicable” without a scope rationale, and “closing” gaps with plans instead of results.

Practical outcome: create your evidence register spreadsheet (or GRC tool equivalent) and populate it with at least one complete end-to-end trace: requirement → control → artifact → approval. When you can trace a single obligation all the way to approved evidence, you have a working compliance machine—not just documents.
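The end-to-end trace can be run as a check over the register itself. A minimal sketch, assuming a plain-dict schema whose field names are illustrative:

```python
# Walk requirement -> control -> artifact -> approval and report any gaps.
# The register is modelled as lists of dicts; field names are assumptions.
def trace_requirement(req_id, controls, artifacts, approvals):
    """Return a list of gap descriptions; an empty list means the trace is complete."""
    ctrl = next((c for c in controls if req_id in c["requirements"]), None)
    if ctrl is None:
        return ["%s: no control mapped" % req_id]
    art = next((a for a in artifacts if a["control_id"] == ctrl["id"]), None)
    if art is None:
        return ["%s: control %s has no artifact" % (req_id, ctrl["id"])]
    appr = next((p for p in approvals if p["artifact_id"] == art["id"]), None)
    if appr is None or appr["decision"] != "approved":
        return ["%s: artifact %s is not approved" % (req_id, art["id"])]
    return []
```

Running this over every requirement ID gives you the gap analysis from the previous paragraph for free.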

Chapter milestones
  • Milestone: Convert obligations into a control checklist with owners
  • Milestone: Define your quality management and change control workflow
  • Milestone: Create a traceability matrix from requirements to evidence
  • Milestone: Draft a gap analysis and remediation plan
  • Milestone: Prepare an internal review packet for sign-off
Chapter quiz

1. What is the primary purpose of Chapter 3 in the course workflow?

Show answer
Correct answer: Turn the AI Act classification into an execution plan that defines controls, owners, timing, and evidence
Chapter 3 focuses on translating obligations into an engineerable control framework with owners, lifecycle triggers, and evidence.

2. Which sequence best represents the “control plane” pipeline described in the chapter?

Show answer
Correct answer: Requirements → controls → procedures → evidence → technical documentation and sign-off
The chapter frames compliance as a flow: requirements into controls, controls into procedures, procedures produce evidence that supports documentation and sign-off.

3. Why do teams commonly fail at EU AI Act compliance, according to the chapter?

Show answer
Correct answer: They translate obligations into policy text but not into owned, triggered, versioned controls with evidence
The common failure mode is creating policy without owners, triggers, versioning, or evidence links—i.e., not an engineerable workflow.

4. What is the role of a traceability matrix in the Chapter 3 framework?

Show answer
Correct answer: It maps requirements to the evidence proving controls were executed
The traceability matrix connects requirements to evidence, making it possible to demonstrate compliance and support sign-off.

5. What outcome best matches the chapter’s goal of making audits “a retrieval task instead of a fire drill”?

Show answer
Correct answer: Maintaining an evidence register with approvals produced by defined procedures and controls
If procedures consistently generate and approve evidence linked to controls, audits become about retrieving existing artifacts rather than scrambling to create them.

Chapter 4: Draft the Technical Documentation (Provider-Style)

This chapter turns your classification work into provider-grade technical documentation: a technical file that can survive scrutiny by a notified body, a regulator, and your own incident response team months later. The goal is not to produce “pretty” narrative text; the goal is to produce an evidence-ready record that explains what you built, why it is allowed, what it is intended to do, how it can fail, and how you will detect and control those failures. Think of the technical documentation as a map: it lets an independent reader trace from an AI Act obligation to a concrete control, then to an artifact (policy, dataset lineage, evaluation report, logging configuration), and finally to a named owner and a date.

Provider-style documentation is different from a generic product spec. It must be bounded: clear intended purpose, defined target users, known operating environment, and explicit “out of scope” use. It must be reproducible: dataset versions, model versions, configuration, and test procedures must be identifiable. It must be auditable: evidence links, decision records, and change history. In this chapter you will complete five practical milestones: write the system description and intended purpose; document data governance and dataset lineage; capture model development, evaluation, and performance evidence; produce the documentation index with cross-references; and run a completeness check against your control checklist.

The engineering judgment here is in choosing the right level of detail. Too little detail makes the file un-auditable. Too much detail (like dumping raw training data or internal secrets) creates security and privacy risk and becomes unmaintainable. The safe middle is: document what a competent third party needs to assess compliance and safety, and provide controlled references to sensitive artifacts (with access controls) rather than embedding them directly.

As you draft, keep one discipline: every claim needs an artifact. If you write “the model is robust,” point to robustness tests and acceptance criteria. If you write “human oversight is provided,” point to UI designs, training materials, and escalation procedures. The rest of this chapter gives you a structured template and the common mistakes to avoid.

Practice note (applies to each milestone in this chapter, from the system description through the completeness check): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 4.1: Technical file structure: what goes in, what stays out

The technical file should read like a controlled dossier, not a wiki. Start with an index that mirrors your control checklist: system description and intended purpose, role mapping, risk classification record, data governance, model lifecycle evidence, human oversight and user instructions, robustness/cybersecurity, logging/traceability, and post-market monitoring hooks. Treat this as the milestone where you “produce the technical documentation index and cross-references.” Build it as a table with: document ID, title, version, owner, storage location, and which AI Act requirement(s) it supports.

What goes in: concise descriptions, process summaries, acceptance criteria, test results, and decision records. Include diagrams (architecture, data lineage, model pipeline), configuration manifests (model version, dependencies), and references to controlled repositories (e.g., an internal GRC tool, ticketing system, or model registry). What stays out: raw personal data, full training corpora, secrets (API keys), exploit details, and anything that would materially increase security risk if leaked. Instead, include hashes, dataset IDs, and access-controlled links.
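For the "hashes instead of raw data" pattern, a dataset fingerprint can be computed with the standard library and recorded in the index alongside the dataset ID; the function name is illustrative:

```python
import hashlib

def dataset_fingerprint(path, chunk_size=1 << 20):
    """SHA-256 fingerprint of a dataset file, suitable for the technical file.

    The file itself stays in its controlled repository; only the hash and a
    dataset ID go into the documentation, so the file proves which data was
    used without embedding the data.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):  # stream in chunks for large files
            digest.update(chunk)
    return digest.hexdigest()
```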

Use a two-layer pattern: (1) a narrative “front matter” that makes the intended purpose and boundaries unmissable, and (2) an evidence layer with artifacts. For the milestone “write the system description and intended purpose,” keep the main text crisp: intended purpose, target users, deployment setting, automation level, and prohibited uses. Common mistake: describing capabilities instead of intent (e.g., “can rank applicants” without stating whether it is intended to rank applicants). Regulators care about intended purpose, not what the model could be repurposed to do.

Practical outcome: by the end of this section you should have a document tree and an index that can be handed to an auditor, where each folder and file has a reason to exist. If a file doesn’t support a requirement, an assumption, or a risk control, it likely belongs in product documentation—not the technical file.

Section 4.2: System architecture, components, and interfaces documentation

Architecture documentation is where you make the system legible. Start with a component diagram that distinguishes: user interface (or API), orchestration layer, model serving, data stores, feature pipelines, monitoring/logging, and human oversight touchpoints. Then add interface contracts: what data enters, what outputs leave, and what guardrails sit on those boundaries (validation, rate limiting, content filtering, policy checks). This section supports the milestone “write the system description and intended purpose” because architecture must align to what you claim the system is for and how it is used.

Document each component with four fields: purpose, inputs/outputs, failure modes, and owners. For example, an “ingestion service” might accept CSV uploads; failure modes include schema drift and missing consent metadata; owner is the data engineering team. Include deployment topology (cloud region, on-prem nodes, mobile edge) because it affects privacy, cybersecurity, and logging retention. If you rely on third-party models or APIs, document them as dependencies with versioning and contractual constraints; don’t hide them behind “vendor service.”

Interfaces are where most compliance gaps occur. Typical mistakes include: not specifying whether prompts, uploaded documents, or telemetry are used for retraining; not documenting where personal data is stored; and not describing human-in-the-loop decisions (who can override, when, and how). Be explicit about the automation boundary: what the system recommends vs. what it decides. If the deployer can configure thresholds or policies, list every configurable parameter that changes risk (e.g., decision threshold, class labels, confidence cutoffs, escalation routes) and how changes are governed.
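One way to make "list every configurable parameter that changes risk" enforceable is a small change-control guard. The parameter names and rules below are illustrative assumptions for a classifier-style system, not a standard:

```python
# Risk-relevant parameters and their governance rules (illustrative).
GOVERNED_PARAMS = {
    "decision_threshold": {"requires_approval": True},
    "confidence_cutoff": {"requires_approval": True},
    "escalation_route": {"requires_approval": True},
}

def review_config_change(old, new):
    """Return the governed parameters whose values changed.

    A non-empty result means the change needs a change-management ticket
    before deployment; ungoverned parameters pass silently.
    """
    flagged = []
    for name, rule in GOVERNED_PARAMS.items():
        if old.get(name) != new.get(name) and rule.get("requires_approval"):
            flagged.append(name)
    return flagged
```

Wiring a check like this into your deployment pipeline turns the "how changes are governed" sentence into an actual gate.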

Practical outcome: a reviewer should be able to trace from an input (e.g., a user record) through transformations to a model output and onward to a human decision, with clear control points. If you cannot trace the path, you cannot credibly claim traceability, oversight, or data governance later.

Section 4.3: Data governance: sourcing, labeling, representativeness, privacy

This section is your data story: where data came from, what it contains, why it is appropriate, and how it is controlled. It directly covers the milestone “document data governance and dataset lineage.” Start with dataset lineage tables: dataset ID, source, collection window, legal basis/permissions, preprocessing steps, labeling method, quality checks, and retention rules. Include a diagram showing how raw sources become training/validation/test sets, and how each split is versioned and frozen.

Sourcing: document whether data is first-party, customer-provided, scraped, purchased, or synthetic. For each source, record the usage rights and constraints (e.g., “no model training,” “only internal testing,” “delete after 30 days”). Labeling: explain the label ontology, instructions, rater qualifications, inter-rater agreement metrics, and adjudication process. Common mistake: treating labels as ground truth without documenting ambiguity. If labels are subjective (toxicity, suitability, risk), write down how disagreement is resolved and how uncertainty is represented.

Representativeness: define the target population and compare it to your dataset. You do not need perfect representativeness, but you must show awareness of gaps and mitigations. Provide slice definitions relevant to your intended purpose (e.g., language, region, device type, job category) and evidence that you checked performance and error rates across slices. Privacy: record data minimization decisions, PII handling, de-identification/pseudonymization steps, access controls, and DPIA/legitimate interest assessments where applicable. If personal data is used, describe how you prevent re-identification and how you honor deletion requests.
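The slice check above can be sketched as a small aggregation; the record fields (`slice`, `correct`) are illustrative:

```python
# Per-slice error rates for the representativeness evidence described above.
def slice_error_rates(records):
    """Group evaluated predictions by slice and return the error rate per slice."""
    totals, errors = {}, {}
    for r in records:
        s = r["slice"]
        totals[s] = totals.get(s, 0) + 1
        errors[s] = errors.get(s, 0) + (0 if r["correct"] else 1)
    return {s: errors[s] / totals[s] for s in totals}
```

The output table (slice, error rate, sample size) is exactly the kind of artifact the evidence register should point to.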

Practical outcome: a reader should be able to answer “What data trained this model?” without guessing, and also answer “Should this data have been used?” without hunting for legal or policy artifacts.

Section 4.4: Model lifecycle documentation: training, tuning, testing, validation

Model lifecycle documentation is where you “capture model development, evaluation, and performance evidence.” Treat the lifecycle as a controlled pipeline: design → training → tuning → validation → release → change management. For each stage, document inputs, outputs, gates, and sign-offs. A practical format is a release record per model version: training code commit, environment (libraries, hardware), dataset versions, hyperparameters, evaluation suite version, and approval ticket.
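A release record can be modelled as an immutable record type mirroring the fields listed above. The names are illustrative; map them to your registry's schema:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)  # frozen: a release record is a snapshot, not a live doc
class ReleaseRecord:
    model_version: str
    code_commit: str               # e.g. git SHA of the training code
    environment: str               # reference to a libraries/hardware manifest
    dataset_versions: tuple        # frozen dataset IDs used for this release
    hyperparameters: dict = field(default_factory=dict)
    eval_suite_version: str = ""
    approval_ticket: str = ""      # ID of the sign-off record

    def is_releasable(self):
        """Release is gated: it needs an evaluation suite version and an approval."""
        return bool(self.eval_suite_version and self.approval_ticket)
```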

Training and tuning: specify model type, objective function, feature set or prompting strategy, and any constraints (fairness regularization, calibration, safety filters). Record why key decisions were made; auditors look for reasoned trade-offs, not just numbers. Testing and validation: define metrics aligned to the intended purpose (accuracy, F1, calibration error, false positive/negative costs) and include confidence intervals or repeated runs where variability matters. Add decision thresholds and how they were chosen (e.g., cost-based optimization, policy-based minimum recall). If you use foundation models, document adaptation method (prompting, RAG, fine-tuning), safety evaluations, and limitations.
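As one concrete instance of cost-based threshold selection, the sketch below picks the candidate threshold with the lowest total misclassification cost. The cost values are assumptions you would replace with your own harm estimates:

```python
# Cost-based threshold choice: false negatives assumed 5x costlier than
# false positives (illustrative; use your own cost model).
def pick_threshold(scores, labels, cost_fp=1.0, cost_fn=5.0, candidates=None):
    """Return the candidate threshold with the lowest total misclassification cost."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]

    def cost(t):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        return cost_fp * fp + cost_fn * fn

    return min(candidates, key=cost)
```

Recording the cost assumptions next to the chosen threshold is what makes the decision auditable later.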

Include negative testing: adversarial prompts, out-of-distribution inputs, and stress tests. Common mistakes: evaluating only on a “clean” test set; not freezing the test set; or reporting aggregate metrics while hiding failure clusters. Also document human oversight integration tests: does the UI present uncertainty, reasons, and escalation options? If humans can override, is it logged and fed into monitoring (without silently becoming training data)?

Practical outcome: you can reproduce results for a given model version and demonstrate that release decisions were gated by pre-defined acceptance criteria rather than informal approval.

Section 4.5: Robustness, cybersecurity, and failure mode documentation

This section translates “the system is safe” into concrete failure modes and defenses. Start with a failure mode catalog (often an FMEA-style table): failure mode, cause, effect, severity, likelihood, detectability, and mitigations. Tie mitigations to architecture controls (input validation, sandboxing), model controls (confidence thresholds, refusal behavior), and process controls (incident response, patching SLAs). If the AI system supports a high-stakes workflow, include fallbacks: manual processing, graceful degradation, or safe defaults.
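The FMEA-style catalog can be prioritized with the conventional risk priority number (RPN = severity × likelihood × detectability, each scored 1-10). The cutoff of 100 below is an illustrative convention, not a regulatory requirement:

```python
# Rank failure modes by risk priority number and flag those above a cutoff.
def prioritize_failure_modes(catalog, rpn_cutoff=100):
    """Return the catalog sorted by RPN (highest first) with mitigation flags."""
    ranked = []
    for fm in catalog:
        rpn = fm["severity"] * fm["likelihood"] * fm["detectability"]
        ranked.append({**fm, "rpn": rpn, "needs_mitigation": rpn >= rpn_cutoff})
    return sorted(ranked, key=lambda fm: fm["rpn"], reverse=True)
```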

Robustness evidence should include: distribution shift checks, perturbation tests (noise, missing fields, formatting changes), and resilience to data quality issues. Cybersecurity should cover: threat model, attack surfaces, dependency management, access control, secrets handling, and vulnerability monitoring. For ML-specific threats, document mitigations for prompt injection, data poisoning, model inversion, membership inference, and supply-chain compromise (malicious model artifacts). Common mistake: listing security policies without linking to system-specific controls and test results.

Also document “known limitations” plainly. A provider-style file should not oversell. If performance degrades in certain languages, or the model is not suitable for certain user groups or contexts, state it and route it into user instructions and deployer guidance. This section should connect to post-market monitoring by specifying what signals indicate degradation and what triggers a rollback or re-validation.

Practical outcome: you can show that failures were anticipated, tested where feasible, and bounded with controls—reducing both regulatory and operational risk.

Section 4.6: Logging, traceability, and record-keeping requirements

Logging is the backbone of traceability, incident investigation, and regulatory defensibility. Define what events are logged, at what granularity, and for how long—then align it to privacy and minimization. At minimum, log: model/version identifier, configuration parameters that affect outputs, timestamp, input metadata (not necessarily full content), output(s), confidence/uncertainty, user action (accept/override/escalate), and system warnings or safety filter decisions. If you cannot log raw inputs due to sensitivity, log hashed fingerprints, feature summaries, or redacted excerpts with a reproducible redaction method.
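A minimal structured log record matching the field list above might look like this; the field names are assumptions to adapt to your telemetry pipeline's schema:

```python
import json
import time

def make_inference_log(model_version, config, input_fingerprint, output,
                       confidence, user_action, warnings=()):
    """Serialize one inference event as a JSON log line."""
    record = {
        "ts": time.time(),
        "model_version": model_version,
        "config": config,                        # only params that affect outputs
        "input_fingerprint": input_fingerprint,  # hash or redacted summary, not raw content
        "output": output,
        "confidence": confidence,
        "user_action": user_action,              # accept / override / escalate
        "warnings": list(warnings),              # e.g. safety filter decisions
    }
    return json.dumps(record)
```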

Traceability means you can reconstruct “why did the system behave this way?” Create an end-to-end trace ID that links UI/API requests to downstream model calls, retrieval results (for RAG), and final responses. Maintain a record of changes: dataset updates, threshold changes, model swaps, prompt template changes, and policy/config updates. This is where you “run a completeness check against your control checklist”: verify that every required record is actually captured by your telemetry pipeline and that retention and access controls are enforced.
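The completeness check can be automated as a sampling test over emitted records; the required field set below is an illustrative subset of what your checklist mandates:

```python
# Fields every inference log record must carry (illustrative subset).
REQUIRED_LOG_FIELDS = {"ts", "model_version", "trace_id", "output", "user_action"}

def log_completeness(sample_records):
    """Fraction of sampled log records containing every required field.

    Anything below 1.0 on a representative sample means the telemetry
    pipeline is dropping records your control checklist promises.
    """
    ok = sum(1 for r in sample_records if REQUIRED_LOG_FIELDS <= r.keys())
    return ok / len(sample_records)
```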

Common mistakes include: logging too much sensitive data; logging too little to debug; missing version identifiers; and having logs that exist but are not searchable or exportable for audits. Define operational playbooks: how to retrieve logs for an incident, who can access them, how you handle deletion requests, and how you produce an audit package.

Practical outcome: your technical file can point to specific log schemas and dashboards as evidence, and your organization can investigate complaints or anomalies without guesswork.

Chapter milestones
  • Milestone: Write the system description and intended purpose section
  • Milestone: Document data governance and dataset lineage
  • Milestone: Capture model development, evaluation, and performance evidence
  • Milestone: Produce the technical documentation index and cross-references
  • Milestone: Run a completeness check against your control checklist
Chapter quiz

1. What is the primary goal of provider-style technical documentation in this chapter?

Show answer
Correct answer: Create an evidence-ready record that explains what was built, why it is allowed, intended use, failure modes, and controls
The chapter stresses an evidence-ready technical file that can withstand scrutiny and documents intent, risks, and controls.

2. Which set of characteristics best distinguishes provider-style documentation from a generic product specification?

Show answer
Correct answer: Bounded, reproducible, and auditable (with evidence links and change history)
Provider-style docs must define scope, enable reproduction (versions/tests), and support audits (evidence/decision records).

3. When deciding the level of detail to include, what “safe middle” does the chapter recommend?

Show answer
Correct answer: Include what a competent third party needs, and reference sensitive artifacts with controlled access rather than embedding them
Too little detail is unauditable; too much creates risk—so document necessary detail and link to sensitive materials securely.

4. What does the chapter mean by the discipline “every claim needs an artifact”?

Show answer
Correct answer: Each assertion (e.g., robustness or human oversight) should point to concrete evidence like tests, criteria, UI designs, or procedures
The chapter requires traceable evidence for claims, such as robustness tests or oversight training and escalation procedures.

5. Which activity best reflects the “map” function of the technical documentation described in the chapter?

Show answer
Correct answer: Tracing from an AI Act obligation to a control, to an artifact, then to a named owner and date
The documentation should let an independent reader follow obligations to controls to evidence, including ownership and timing.

Chapter 5: Human Oversight, Transparency, and User Instructions

This chapter turns your classification work into operational reality. Under the EU AI Act, it is not enough to label a system “high-risk” (or “limited-risk”) and produce a tidy technical file. You must show that people can supervise the system, understand what it is doing, intervene effectively, and use it within safe boundaries. In practice, this means designing intervention points (human oversight measures), drafting instructions that constrain use to the intended purpose, producing transparency notices, and validating that real operators can follow the guidance.

A useful mindset is: oversight, transparency, and instructions are not “documentation tasks.” They are control mechanisms. They help prevent foreseeable misuse, reduce over-reliance on model outputs, and make responsibility assignment credible across provider and deployer roles. Throughout this chapter you will draft artifacts that can be handed off to a deployer as an evidence-ready pack: an oversight plan, user instructions, disclosure artifacts, usability validation notes, and a deployment readiness checklist with clear go/no-go criteria.

The most common failure mode is writing generic policies (“a human will review outputs”) without specifying who, when, with what information, how quickly, and with what authority to override or stop the system. The second failure is writing user instructions that are correct in theory but unusable in the field: too long, too ambiguous, or missing the exact steps operators need at the moment of decision.

Practice note (applies to each milestone in this chapter, from oversight measures through the deployer handoff pack): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 5.1: Human-in-the-loop design patterns and anti-automation bias

Human oversight is an engineered workflow, not a slogan. Start by mapping the end-to-end decision chain: input capture → model inference → output presentation → operator action → downstream effect. Then mark intervention points where a human can (a) prevent harmful use, (b) detect model failure, or (c) limit impact. Your milestone in this section is to specify human oversight measures and intervention points with enough detail that another team could implement them.

Common human-in-the-loop (HITL) patterns include: pre-approval (the model suggests, a human must approve before any action), exception review (a human reviews only flagged cases, such as high-risk, adverse, or low-confidence outputs), dual control (two-person sign-off for sensitive actions), and circuit breaker (operators can pause the system, revert to a safe baseline, or switch to manual processing). Select patterns based on severity, reversibility, and time-to-harm. If an outcome is hard to reverse (e.g., denial of a service, reporting to authorities), use pre-approval or dual control; if time-critical but reversible, exception review may be acceptable.

Design explicitly against automation bias: people tend to over-trust confident outputs, especially under time pressure. Mitigations include: showing uncertainty and key contributing factors (where appropriate), presenting alternatives (top-2 options), requiring an operator rationale for high-impact decisions, and forcing “active confirmation” rather than one-click acceptance. Avoid UI patterns that nudge acceptance, such as defaulting to “approve,” hiding dissenting evidence, or using authoritative language (“the system determined…”). A practical control is to log acceptance rates and investigate teams with unusually high “accept-without-edit” behavior.
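The "log acceptance rates" control can be sketched as a per-team aggregation; the 95% flag threshold is an assumption to tune against your own baseline:

```python
# Per-team accept-without-edit rates, flagging teams that may be rubber-stamping.
def accept_without_edit_rates(decisions, flag_above=0.95):
    """decisions: iterable of {"team": str, "action": "accept"|"edit"|"override"}."""
    totals, accepts = {}, {}
    for d in decisions:
        t = d["team"]
        totals[t] = totals.get(t, 0) + 1
        accepts[t] = accepts.get(t, 0) + (1 if d["action"] == "accept" else 0)
    return {t: {"rate": accepts[t] / totals[t],
                "flag": accepts[t] / totals[t] > flag_above}
            for t in totals}
```

A flag is a prompt to investigate, not proof of automation bias: a team may simply handle easy cases.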

Document oversight in a short “Oversight Control Sheet”: roles (operator, supervisor), required checks, escalation thresholds, maximum time to intervene, tools available (audit view, data provenance, explanation view), and stop conditions (e.g., drift alerts, anomaly spikes, policy changes). Pair it with a simple RACI for override authority—who can block a decision, who can pause deployment, and who owns incident response.

Section 5.2: Operator competence, training, and access control documentation

Oversight only works when the people doing it are competent and empowered. Treat operator competence as a safety requirement: define what an operator must know, how they prove it, and how access is restricted if they are untrained. This section’s practical output is a training-and-access appendix suitable for technical documentation and for a deployer handoff pack.

Start with a role taxonomy: operator (uses outputs), supervisor (reviews escalations), admin (configures thresholds/workflows), and auditor (reads logs). For each role, specify: prerequisites (domain knowledge, legal constraints), training modules (system overview, limitations, bias/failure modes, data handling, escalation process), and competency checks (scenario-based assessment, minimum passing score, periodic re-certification). Include training frequency triggers: model updates, policy updates, or detected performance drift.

Access control should reflect the risk profile. Use least-privilege principles: operators should not be able to change thresholds that determine escalation; admins should have change control; auditors should have read-only access. Document authentication and authorization mechanisms (SSO, MFA, role-based access control), and log what matters: who accessed the system, what inputs were processed, what output was shown, what action was taken, and whether an override occurred. If your system supports “shadow mode” or limited pilots, document how access is segmented and how test outputs are prevented from affecting real decisions.
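Least-privilege for the role taxonomy above can be expressed as an explicit permission map plus a single check; the permission names are illustrative:

```python
# Role -> permissions map enforcing the separation described above:
# operators cannot change thresholds; auditors are read-only.
ROLE_PERMISSIONS = {
    "operator": {"view_output", "record_decision", "escalate"},
    "supervisor": {"view_output", "record_decision", "escalate", "review_escalation"},
    "admin": {"view_output", "change_threshold", "configure_workflow"},
    "auditor": {"read_logs"},
}

def can(role, permission):
    """True if the role is granted the permission; unknown roles get nothing."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```

Keeping the map in version control gives you a change history for access decisions, which is itself evidence.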

Common mistakes: “training available” without proof of completion; sharing admin credentials; and failing to train for edge cases (e.g., missing data, conflicting evidence, adversarial prompts). A practical check is to run a tabletop exercise: give operators five borderline cases and observe whether they follow escalation rules, identify limitations, and avoid over-reliance.

Section 5.3: User instructions: intended use, limitations, and known risks

User instructions are where you lock the system to its intended purpose. Your milestone here is to draft user instructions and operational constraints that a deployer can actually enforce. Write them as if they will be used on a busy day by someone who did not build the model.

Begin with an “Intended Use” block: domain, user group, supported decisions, and where the system may be used (channels, jurisdictions, languages). Then add “Not Intended For” items that prevent foreseeable misuse (e.g., “not for autonomous decisions,” “not for diagnosing medical conditions,” “not for selecting candidates without human review”). Follow with “Required Inputs and Preconditions”: data freshness, minimum fields, acceptable data sources, and steps when data is missing or suspected incorrect.

Make limitations concrete. Instead of “may be inaccurate,” state conditions: “Performance degrades on out-of-distribution documents (scanned images with heavy compression),” or “The system may underperform for minority dialects; escalate when confidence is below X or when the user disputes the result.” List known risks and their mitigations: hallucination risk, proxy discrimination risk, privacy leakage risk, and security risks (prompt injection, data poisoning). Provide operational constraints: maximum allowed automation level, mandatory review for certain categories, and prohibited prompts or data types.

Include step-by-step procedures: (1) verify input data, (2) run inference, (3) review output with supporting evidence, (4) decide/override with rationale, (5) record action, (6) escalate if thresholds met. Add “Intervention Playbooks”: what to do if the system outputs disallowed content, contradicts authoritative sources, or shows drift indicators. The most common mistake is burying these steps in prose. Use short numbered procedures and keep “what to do now” visible.
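
The escalation rule in steps (4) and (6) can be made unambiguous as a small decision function. This is a sketch: the confidence threshold of 0.7 is an assumed example, not a value from the text:

```python
def decide_action(confidence: float, user_disputes: bool,
                  escalation_threshold: float = 0.7) -> str:
    """Decision-time rule sketch: escalate on low confidence or a user
    dispute; otherwise the operator reviews before acting. The 0.7
    threshold is an illustrative assumption."""
    if confidence < escalation_threshold or user_disputes:
        return "escalate"
    return "review_then_act"
```

Encoding the rule this way keeps "what to do now" testable: the same function can drive a UI prompt and a tabletop exercise.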

Section 5.4: Transparency for AI interactions, content, and assistance

Transparency is not just a banner that says “AI is used.” It is a set of disclosures tailored to who needs to know what, when they need to know it, and in what format. Your milestone here is to create transparency notices and disclosure artifacts that can be deployed across UI, policies, and customer communications.

Draft three layers of transparency. Layer 1: Point-of-use notice (in the interface): concise disclosure that the user is viewing AI-assisted output, the output’s role (recommendation vs. decision), and a link to more detail. Layer 2: Operator disclosure: what signals the system uses at a high level, key limitations, confidence/uncertainty meaning, and how to challenge or override results. Layer 3: External/user-facing disclosure (when the AI affects individuals): an explanation of the AI’s involvement, contact channels, and how to contest outcomes where applicable.

For AI-generated content, add a “Content Provenance” artifact: how generated text/images are marked, when they are stored, and how downstream recipients are informed. If the system provides assistance (e.g., drafting, triage, summarization), specify boundaries: “assistant output is a draft; operator must verify against source records.” Avoid misleading anthropomorphic language. Use neutral phrasing: “The system generated a summary based on the provided documents.”

Engineering judgment matters in what you disclose: too little increases misuse; too much can expose security-sensitive details. A practical compromise is to disclose categories of features and data sources, not exact weights or exploit-relevant thresholds. Finally, connect transparency to logging: if the UI shows a confidence score or explanation, record what was displayed so later investigations can reconstruct what the operator saw.

Section 5.5: Complaint handling and user feedback capture

Complaints and feedback are not “customer support”; they are part of post-market monitoring and continuous risk control. Build a feedback loop that captures real-world failures, routes them to accountable owners, and produces evidence for corrective actions. This section’s practical outcome is a complaint-handling workflow and a feedback schema that can be implemented with minimal ambiguity.

Define intake channels: in-product “report issue” button, email/ticketing system, and (where relevant) formal channels for affected persons. For each channel, define required fields: system version, timestamp, context, input snapshot (with privacy controls), output snapshot, operator action taken, and user-reported harm type. Classify issues into categories: factual error, bias/fairness concern, unsafe content, privacy/security incident, usability/instructions confusion, and performance drift.
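
The required fields and issue categories above can be pinned down as a schema so every intake channel collects the same data. Field and category names are taken from the lists above; the exact types are implementation assumptions:

```python
from dataclasses import dataclass
from enum import Enum

class IssueCategory(Enum):
    """The six issue categories from the complaint taxonomy."""
    FACTUAL_ERROR = "factual_error"
    BIAS_FAIRNESS = "bias_fairness"
    UNSAFE_CONTENT = "unsafe_content"
    PRIVACY_SECURITY = "privacy_security"
    USABILITY = "usability"
    PERFORMANCE_DRIFT = "performance_drift"

@dataclass
class FeedbackRecord:
    """One complaint/feedback item with the required context fields."""
    system_version: str
    timestamp: str        # ISO 8601
    context: str
    input_snapshot: str   # redacted per privacy controls
    output_snapshot: str
    operator_action: str
    harm_type: str
    category: IssueCategory
```

A fixed schema is what makes issues reproducible later: a report without a version and snapshots cannot be triaged.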

Create triage SLAs based on severity and reversibility. Example: potential unlawful discrimination or safety harm gets immediate escalation to compliance and incident response; minor usability complaints go to product backlog. Link complaints to a corrective action process: reproduce, root-cause (data, model, UI, workflow), mitigate (patch, retrain, threshold change, instruction update), verify, and close with evidence. Record whether an issue indicates the instructions were unclear or whether oversight failed; this is how you improve both documentation and controls.
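
The severity-based routing can be sketched as a small triage function. The routes and SLA hours below are assumed examples, not mandated values:

```python
def triage(category: str, severity: str) -> dict:
    """Route a complaint by category and severity.
    Routes and SLA hours are illustrative assumptions."""
    urgent = {"bias_fairness", "unsafe_content", "privacy_security"}
    if category in urgent and severity == "high":
        # Potential discrimination/safety/privacy harm: immediate escalation.
        return {"route": "incident_response", "sla_hours": 4}
    if severity == "high":
        return {"route": "compliance_review", "sla_hours": 24}
    # Minor issues go to the product backlog.
    return {"route": "product_backlog", "sla_hours": 120}
```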

Common mistakes include collecting feedback without enough context to reproduce, failing to version outputs, and not informing deployers when instructions change. Your handoff pack should include a one-page “How to Report Issues” guide and a decision tree for escalation.

Section 5.6: Deployment readiness checklist and go/no-go criteria

The final milestone is to finalize a deployer handoff pack and to validate usability: can operators follow the instructions? Deployment readiness is where you turn documents into a decision: ship, pilot with constraints, or stop until gaps are closed.

Build a deployment readiness checklist that ties directly to risk classification and controls. At minimum include: oversight measures implemented (not just described), operator training completed with records, access controls configured, transparency notices placed in the right user journeys, logging and audit trails verified, feedback/complaint workflow operational, and incident response contacts confirmed. Add technical gates: model/version pinned, evaluation metrics meet thresholds, drift monitoring configured, data governance controls in place, and rollback plan tested.

Define explicit go/no-go criteria. Examples of no-go: operators cannot correctly follow escalation steps in a usability test; override authority is unclear; key disclaimers are missing at point-of-use; audit logs do not capture operator actions; known high-severity failure mode lacks a mitigation. For conditional go (pilot), list constraints: limited user group, shadow mode, manual approval required, reduced scope, increased monitoring frequency.
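
A minimal sketch of the go/no-go evaluation, treating each no-go example above as a hard gate. Gate names are assumptions introduced for illustration:

```python
# Hard gates derived from the no-go examples; any failure blocks release.
HARD_GATES = [
    "escalation_usability_pass",   # operators followed escalation in testing
    "override_authority_defined",
    "point_of_use_disclosure",
    "operator_actions_logged",
    "high_severity_mitigations",
]

def go_no_go(checks: dict) -> str:
    """Any failed (or missing) hard gate is a no-go; if all pass but pilot
    constraints apply, the result is a conditional go."""
    if not all(checks.get(gate, False) for gate in HARD_GATES):
        return "no_go"
    return "conditional_go" if checks.get("pilot_constraints") else "go"
```

Missing evidence defaults to failure, which mirrors the point that controls must be implemented, not just described.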

Run a usability validation that mirrors reality: give operators realistic cases under time limits, measure completion rates, error types (missed escalation, over-acceptance, misinterpretation of confidence), and time-to-intervention. Update instructions and UI prompts based on observed failures, then re-test. Package the final handoff as a structured bundle: Oversight Control Sheet, User Instructions, Transparency Notices, Training Plan, Access Control Summary, Logging Map, Feedback Workflow, and the signed go/no-go decision with owners and dates.

Chapter milestones
  • Milestone: Specify human oversight measures and intervention points
  • Milestone: Draft user instructions and operational constraints
  • Milestone: Create transparency notices and disclosure artifacts
  • Milestone: Validate usability: can operators follow the instructions?
  • Milestone: Finalize a deployer handoff pack
Chapter quiz

1. In Chapter 5, why are human oversight, transparency, and user instructions treated as more than “documentation tasks”?

Show answer
Correct answer: They function as control mechanisms that prevent misuse, reduce over-reliance, and support credible responsibility assignment
The chapter frames these artifacts as operational controls that shape safe use and accountability, not as paperwork.

2. Which set of deliverables best matches the evidence-ready deployer handoff pack described in the chapter?

Show answer
Correct answer: Oversight plan, user instructions, disclosure artifacts, usability validation notes, and a deployment readiness checklist with go/no-go criteria
The chapter lists these specific artifacts as the pack to hand off to deployers as evidence-ready documentation.

3. Which oversight plan element most directly addresses the chapter’s critique of generic policies like “a human will review outputs”?

Show answer
Correct answer: Define who reviews, when review happens, what information they get, required response time, and authority to override/stop the system
The chapter emphasizes specifying concrete intervention points and operational details (who/when/how/authority), not vague commitments.

4. What is the primary goal of drafting user instructions and operational constraints in this chapter?

Show answer
Correct answer: Constrain use to the intended purpose and keep operation within safe boundaries
The chapter focuses on instructions as mechanisms to ensure intended-purpose use and prevent foreseeable misuse.

5. What does the chapter identify as the second most common failure mode after vague oversight statements?

Show answer
Correct answer: Instructions that are theoretically correct but unusable in the field (too long, ambiguous, or missing decision-time steps)
It warns that instructions must be usable by real operators at the moment of decision, not merely correct on paper.

Chapter 6: Post-Market Monitoring and Audit-Ready Packaging

Shipping an AI system is not the finish line under the EU AI Act—it is the moment your compliance posture starts being tested by real-world behavior, real users, and real failure modes. Post-market monitoring is where your assumptions meet operational reality: data shifts, new user strategies, emergent misuse, changed regulations, patched dependencies, and evolving cyber threats. This chapter turns the “after launch” phase into a disciplined loop you can run every week, and package every quarter, without panic.

Two habits separate teams that pass audits from teams that scramble: (1) measuring the right signals with clear triggers and ownership, and (2) writing evidence narratives that connect those signals to decisions. You will design post-market monitoring KPIs and drift triggers, draft an incident reporting and corrective action workflow, assemble an audit-ready evidence package with an index, run a mock audit to produce an improvement backlog, and end with a practical 90-day compliance maintenance plan. The goal is not to create more paperwork; it is to make sure every meaningful decision leaves a trail that is easy to follow and hard to misunderstand.

Throughout this chapter, treat “audit-ready packaging” as an engineering product: an index, versioning, traceability, and well-defined inputs/outputs. Your future auditors (or internal reviewers) should be able to answer three questions quickly: What is the system supposed to do? How do you know it’s doing that safely and fairly? What did you do when it didn’t?

Practice note for Milestone: Design post-market monitoring KPIs and drift triggers: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Milestone: Draft an incident reporting and corrective action workflow: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Milestone: Assemble an audit-ready evidence package with an index: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Milestone: Run a mock audit and produce an improvement backlog: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Milestone: Create a 90-day compliance maintenance plan: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 6.1: Monitoring plan: performance, bias, safety, security, misuse

Start by translating your intended purpose into measurable outcomes. A monitoring plan is not a dashboard of “interesting metrics”; it is a set of KPIs tied to risk controls, plus drift triggers that force action. Build your plan as a table with: metric name, definition, data source, segmentation, threshold, trigger severity, response owner, and evidence location.

Cover five monitoring domains:

  • Performance: task success (accuracy, calibration, false positive/negative rates), latency, uptime, and failure rates. Include “business proxy” metrics only if they map to safety outcomes (e.g., appeal rates for automated decisions).
  • Bias and fairness: measure performance gaps across relevant groups and contexts. Use consistent slices (e.g., geography, device type, language, protected attributes where lawful) and justify any attributes you do not collect.
  • Safety: harmful output rate, unsafe action recommendations, or constraint violations. Define a “harm taxonomy” so reviewers label incidents consistently.
  • Security: prompt injection success rate, data exfiltration attempts detected, model inversion signals, abuse of APIs, and dependency vulnerability status.
  • Misuse: policy violation rate, anomalous usage patterns, and indications the system is being used outside intended purpose.

For the milestone “design post-market monitoring KPIs and drift triggers,” make triggers operationally specific. Avoid vague triggers like “significant drift.” Prefer: “PSI > 0.25 for two consecutive weeks on top-10 features,” “FNR increases by 20% vs baseline on segment X,” or “unsafe output rate exceeds 0.5% in any single-day window.” Define what happens next: who reviews, what gets frozen, whether you roll back, and when you notify stakeholders.
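
The PSI trigger in the example (“PSI > 0.25 for two consecutive weeks”) can be computed as follows. This is a sketch over pre-binned distributions; the smoothing epsilon is an implementation assumption to guard against empty bins:

```python
import math

def psi(expected: list[float], actual: list[float]) -> float:
    """Population Stability Index over pre-binned distributions
    (bin fractions, each list summing to 1). A common rule of thumb
    treats PSI > 0.25 as a major shift."""
    eps = 1e-6  # smoothing assumption: avoids log(0) on empty bins
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected, actual))

def drift_trigger(weekly_psi: list[float], threshold: float = 0.25) -> bool:
    """Fire only when PSI exceeds the threshold for two consecutive
    weeks, matching the example trigger in the text."""
    return any(a > threshold and b > threshold
               for a, b in zip(weekly_psi, weekly_psi[1:]))
```

Requiring two consecutive breaches is a deliberate design choice: it trades a week of latency for far fewer false alarms from one-off data spikes.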

Common mistakes: monitoring only aggregate metrics (hides segment harms), collecting metrics without response playbooks, and failing to version baselines (you can’t prove drift if you can’t prove what “normal” was). Practical outcome: a monitoring spec that can be handed to engineering and SRE, plus an “Evidence ID” for each metric pipeline and alert rule.

Section 6.2: Serious incident criteria and internal escalation pathways

Incident handling must be pre-decided. Under the EU AI Act, you should be prepared to identify and report serious incidents and malfunctions that may breach obligations. Your job is to define criteria that your team can apply consistently at 2 a.m., and an escalation pathway that does not depend on a single person’s judgment.

Define serious incident criteria using a decision tree that considers: impact severity (harm to health, safety, fundamental rights, or significant economic harm), scope (number of affected persons), persistence (one-off vs recurring), and detectability (silent failure vs obvious). Include “near misses” as a separate category: events that could have caused harm but were caught by controls. Near misses are gold for prevention and auditors often ask for them because they show learning.
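
The four-factor decision tree can be sketched as follows. The affected-persons threshold is an illustrative assumption for the sketch, not a figure from the Act:

```python
def classify_incident(severity: str, affected_persons: int,
                      recurring: bool, caught_by_controls: bool) -> str:
    """Decision-tree sketch over impact severity, scope, and persistence.
    The threshold of 100 affected persons is an assumed example."""
    if caught_by_controls:
        # Potential harm prevented by controls: track separately as a near miss.
        return "near_miss"
    if severity == "high" or affected_persons >= 100 or recurring:
        return "serious"
    return "non_serious"
```

Keeping near misses as a distinct outcome (rather than silently dropping them) is what produces the learning evidence auditors ask for.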

For the milestone “draft an incident reporting and corrective action workflow,” specify roles and timelines: L1 triage (support/on-call), L2 technical assessment (ML engineer + product), L3 compliance/legal review, and executive sign-off when needed. Add a single “Incident Coordinator” role who owns the clock and evidence capture. Your workflow should require: preserving logs, freezing relevant versions (model, prompts, data, code), capturing a minimal reproducible example, and documenting user-visible impact.

Common mistakes: over-classifying everything as “serious” (burnout and noise), under-classifying because teams fear blame, and failing to retain the exact artifacts that allow root-cause analysis. Practical outcome: an incident SOP with a clear escalation matrix (who to page, who to inform, who can approve rollback) and an incident record template that links to evidence (logs, runbooks, model card version, monitoring alerts).

Section 6.3: Corrective and preventive actions (CAPA) for AI systems

CAPA turns incidents and monitoring triggers into durable improvements. Corrective action fixes the current problem; preventive action reduces the chance it happens again. For AI systems, CAPA must cover not just code changes, but data, evaluation, human oversight, user instructions, and deployment controls.

Implement CAPA as a structured record with these fields: problem statement, impact assessment, containment action (immediate mitigation), root cause analysis, corrective action plan, preventive action plan, verification method, effectiveness check date, and closure criteria. Root cause analysis should explicitly consider: data drift, label noise, pipeline bugs, prompt/template regressions, third-party model updates, UI changes that shift user behavior, and security bypasses.
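
The CAPA record fields above map naturally onto a structured type whose closure check enforces “verify before close.” Field names follow the list above; the shape is an implementation assumption:

```python
from dataclasses import dataclass

@dataclass
class CAPARecord:
    """One corrective/preventive action record with the fields listed above."""
    problem_statement: str
    impact_assessment: str
    containment_action: str
    root_cause: str
    corrective_plan: str
    preventive_plan: str
    verification_method: str
    effectiveness_check_date: str
    closure_criteria: str
    verified_effective: bool = False  # flipped only after the effectiveness check

    def can_close(self) -> bool:
        """A CAPA may close only after effectiveness is verified."""
        return self.verified_effective
```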

Engineering judgment matters when choosing remedies. Example: if bias gaps appear only after a new user segment arrives, retraining may not be the first step—first validate whether the segment is within intended purpose and whether you have lawful, representative data. Sometimes the correct action is to constrain usage (geo-fencing, eligibility checks) rather than to optimize metrics on out-of-scope data.

Corrective actions should be tied to measurable outcomes: “Reduce unsafe output rate from 0.8% to <0.2% on red-team suite V3,” or “Restore calibration error to within 10% of baseline.” Preventive actions often include new tests in CI/CD, stronger monitoring, better reviewer guidance, or updated user instructions.

Common mistakes: closing CAPA after deploying a patch without verifying effectiveness, and failing to update documentation (model card, risk record, user instructions) to reflect the new control. Practical outcome: a CAPA log that auditors can sample, showing traceability from detection → decision → change → verification.

Section 6.4: Audit preparation: sampling strategy and evidence narratives

An audit-ready package is an index plus stories. The index tells reviewers where evidence lives; the narrative explains why that evidence proves compliance for your intended purpose. For the milestone “assemble an audit-ready evidence package with an index,” create an “Evidence Register” with: Evidence ID, title, system version, owner, location (URL/path), confidentiality level, and mapped requirement/control.
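
A minimal Evidence Register sketch with the fields listed above; it rejects duplicate Evidence IDs so narrative links stay stable. Class and field names are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class EvidenceEntry:
    """One row of the Evidence Register."""
    evidence_id: str
    title: str
    system_version: str
    owner: str
    location: str          # URL or path
    confidentiality: str
    mapped_control: str    # requirement/control this evidence supports

class EvidenceRegister:
    def __init__(self):
        self._entries: dict[str, EvidenceEntry] = {}

    def add(self, entry: EvidenceEntry) -> None:
        # Stable IDs: refuse duplicates so narrative arrows never dangle.
        if entry.evidence_id in self._entries:
            raise ValueError(f"duplicate evidence id: {entry.evidence_id}")
        self._entries[entry.evidence_id] = entry

    def lookup(self, evidence_id: str) -> EvidenceEntry:
        return self._entries[evidence_id]

    def by_control(self, control: str) -> list[EvidenceEntry]:
        """All evidence mapped to a given requirement/control."""
        return [e for e in self._entries.values()
                if e.mapped_control == control]
```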

Then design your sampling strategy. Auditors rarely review everything; they sample. You should propose your own samples to demonstrate coverage and reduce random digging. Use risk-based sampling: pick the highest-impact decisions, highest-risk segments, most recent changes, and a few routine cases. Include at least: one incident end-to-end, one monitoring alert with resolution, one model update with approval trail, and one user complaint investigation.

Write evidence narratives that connect artifacts across the lifecycle. Example narrative: “Monitoring alert A-142 detected FNR drift in segment ‘new language locale’ → triage confirmed data shift → containment applied via feature flag → CAPA-33 retrained on expanded dataset with new fairness evaluation → effectiveness verified by test suite V5 → user instructions updated with new limitations.” Each arrow should point to an Evidence ID.

For the milestone “run a mock audit and produce an improvement backlog,” simulate an auditor’s behavior: ask for a claim (“bias is monitored”), then force the team to show the exact dashboard, the threshold, the alert history, and the decision logs. Anything you cannot produce within minutes becomes a backlog item. Common mistakes: evidence scattered across tools without stable links, and narratives that describe intentions rather than executed controls. Practical outcome: a curated audit pack with a table of contents, traceability map, and a prioritized backlog of gaps.

Section 6.5: Conformity assessment readiness and stakeholder coordination

Post-market work often fails due to unclear responsibilities across provider, deployer, importer, distributor, and product manufacturer roles. Conformity assessment readiness is as much coordination as it is documentation. Make a RACI (Responsible, Accountable, Consulted, Informed) matrix for ongoing tasks: monitoring operation, incident triage, regulator communications, model updates, and user instruction changes.

Coordinate around three recurring events: (1) release approvals, (2) incident governance, and (3) evidence publication. For release approvals, require a “release note for compliance” that states what changed (model weights, prompts, training data, thresholds), the evaluation delta, and whether intended purpose or limitations changed. For incident governance, ensure deployers know how to report anomalies and what logs they must retain. For evidence publication, decide what is shared externally (to customers/partners) versus internally (full technical detail).

When third parties are involved (hosted foundation models, data providers, integrators), define contractual hooks: notification windows for upstream model changes, security incident cooperation, and access to evaluation artifacts. A common mistake is assuming that “the vendor is compliant” substitutes for your own evidence; in practice, you need vendor documentation mapped to your system’s intended purpose and integration risks.

Practical outcome: a stakeholder coordination plan that reduces surprises during conformity assessment or market surveillance—complete with named owners, recurring meeting cadence, and a shared repository structure aligned to your evidence index.

Section 6.6: Continuous compliance: updates, retraining, decommissioning

Continuous compliance is a living process: you will update models, fix vulnerabilities, retrain on new data, and sometimes retire a system. Treat each of these as a controlled change with pre-defined gates, not an ad hoc “ML refresh.” End this chapter by creating a 90-day compliance maintenance plan that includes weekly, monthly, and quarterly activities with owners and outputs.

Weekly: review monitoring alerts, review top user complaints, confirm security patches and dependency scans, and check that logging coverage meets your retention policy. Monthly: run drift reports, fairness slice reviews, red-team regressions for known misuse patterns, and a CAPA effectiveness check for recently closed actions. Quarterly: refresh risk classification assumptions, re-validate intended purpose boundaries, sample evidence narratives for audit readiness, and perform a mock audit mini-sprint to generate an improvement backlog.
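
The weekly/monthly/quarterly cadence can be encoded so nothing silently drops off the calendar. The simplified day arithmetic (weekly every 7 days, monthly every 30, quarterly on day 90) is an assumption for the sketch:

```python
# Activities copied from the cadence above; owners would be added per task.
MAINTENANCE_PLAN = {
    "weekly": ["review monitoring alerts", "review top user complaints",
               "confirm security patches and dependency scans",
               "check logging coverage vs retention policy"],
    "monthly": ["drift reports", "fairness slice reviews",
                "red-team regressions", "CAPA effectiveness checks"],
    "quarterly": ["refresh risk classification assumptions",
                  "re-validate intended purpose boundaries",
                  "sample evidence narratives", "mock audit mini-sprint"],
}

def tasks_due(day: int) -> list[str]:
    """Return the cadences due on a given day of the 90-day plan (1..90).
    Simplified calendar arithmetic, assumed for illustration."""
    due = []
    if day % 7 == 0:
        due.append("weekly")
    if day % 30 == 0:
        due.append("monthly")
    if day == 90:
        due.append("quarterly")
    return due
```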

For updates and retraining, define change categories (minor/major) based on impact to intended purpose and risk controls. Major changes should trigger expanded evaluation, updated user instructions, and an evidence bundle that can be shown as “before/after.” Always preserve the ability to roll back to a known-good version and document rollback criteria.

For decommissioning, include: notifying deployers/users, disabling endpoints, archiving evidence, retaining logs per policy, and documenting final known limitations and any open CAPAs. Common mistakes: retraining without updating baselines (breaks monitoring comparability) and failing to update user instructions when system behavior shifts. Practical outcome: a calendarized maintenance plan plus a change-control checklist that keeps your system compliant as it evolves.

Chapter milestones
  • Milestone: Design post-market monitoring KPIs and drift triggers
  • Milestone: Draft an incident reporting and corrective action workflow
  • Milestone: Assemble an audit-ready evidence package with an index
  • Milestone: Run a mock audit and produce an improvement backlog
  • Milestone: Create a 90-day compliance maintenance plan
Chapter quiz

1. In Chapter 6, what does “after launch” primarily represent for EU AI Act compliance?

Show answer
Correct answer: The start of continuous testing by real-world behavior, users, and failure modes
The chapter frames shipping as the moment compliance is tested in operational reality, requiring ongoing monitoring and response.

2. Which pair of habits does Chapter 6 say separates teams that pass audits from teams that scramble?

Show answer
Correct answer: Measuring the right signals with clear triggers/ownership, and writing evidence narratives that connect signals to decisions
The chapter emphasizes the right signals with triggers and ownership, plus evidence narratives that link monitoring to decisions.

3. Why does Chapter 6 emphasize designing KPIs and drift triggers for post-market monitoring?

Show answer
Correct answer: To detect when assumptions no longer match operational reality (e.g., data shifts, misuse, threats) and prompt action
KPIs and triggers are meant to surface changes like drift, misuse, dependency patches, and cyber threats so teams can respond.

4. How should “audit-ready packaging” be treated according to Chapter 6?

Show answer
Correct answer: As an engineering product with an index, versioning, traceability, and well-defined inputs/outputs
The chapter explicitly describes audit-ready packaging as an engineered artifact with structure and traceability.

5. What are the three questions your evidence package should help auditors (or internal reviewers) answer quickly?

Show answer
Correct answer: What the system is supposed to do; how you know it’s doing that safely and fairly; what you did when it didn’t
Chapter 6 defines these three questions as the core audit-readiness test for your evidence narratives and indexing.