AI Risk Basics for Beginners: Assess, Document, Reduce Harm

AI Ethics, Safety & Governance — Beginner

Learn a simple, repeatable process to spot and reduce AI harm.

Beginner ai-risk · ai-safety · ai-ethics · governance

Build AI risk skills from zero—without code

AI tools can help people work faster, but they can also cause harm: private data can leak, people can be treated unfairly, and confident-looking outputs can be wrong. If you are new to AI, this can feel overwhelming—especially when you’re asked to “do a risk assessment” without clear steps.

This beginner course is a short, book-style guide that teaches AI risk from first principles. You will learn a simple workflow to assess risks, document what you found, and reduce harm in practical ways. You do not need any coding, math, or data science background. Everything is explained in plain language and backed by real-world examples.

What you will be able to do by the end

By the final chapter, you will be able to describe an AI system at a high level, list its most likely harms, prioritize what matters most, and capture your work in a clear set of documents that others can review. You will also learn how to choose realistic risk controls—like human review, operating limits, user disclosures, monitoring, and a basic incident plan—so your risk work leads to action.

  • Understand what “AI risk” means and why harm can happen even with good intentions
  • Map a system: purpose, users, data in/out, and human handoffs
  • Spot common harm types (privacy, bias, reliability, security, misuse)
  • Prioritize risks with a simple likelihood × impact approach
  • Create a beginner-friendly risk register and evidence pack
  • Plan controls and monitoring so risk reduction continues after launch

How the 6 chapters fit together

The course progresses like a short technical book. Chapter 1 builds the basic vocabulary: what AI is, what risk is, and how harm shows up in the real world. Chapter 2 teaches you to “map the system,” because you can’t assess risk if you don’t know what the system does and who it affects. Chapter 3 gives you a practical way to spot harms without needing advanced technical knowledge.

Once you can list potential harms, Chapter 4 shows you how to prioritize them using a simple scoring method so you know what to tackle first. Chapter 5 turns your assessment into useful documentation: a risk register, decision log, and lightweight evidence that supports your conclusions. Finally, Chapter 6 focuses on reducing harm and staying safe over time with controls, monitoring, and a basic incident response approach.

Who this is for

This course is designed for absolute beginners: students, individual professionals, managers, policy staff, procurement teams, and anyone who needs to understand AI risks without becoming an engineer. It is also useful for small organizations that need a clear, lightweight process before using AI tools in customer-facing or employee-facing work.

Get started

If you want a simple, repeatable way to handle AI risk—without jargon—enroll and begin building your first risk documentation pack. Register free or browse all courses to find related beginner topics.

What You Will Learn

  • Explain what “AI risk” means in plain language and why it matters
  • Map an AI system at a high level: purpose, users, data in/out, and decisions
  • Identify common harm types (privacy, bias, errors, security, misuse)
  • Use a simple likelihood × impact method to prioritize risks
  • Write a beginner-friendly AI Risk Register with clear owners and due dates
  • Document key information using lightweight templates (model card–style summary, data notes, decision log)
  • Choose practical risk controls: human review, limits, monitoring, and user disclosures
  • Create a basic launch readiness checklist and ongoing monitoring plan

Requirements

  • No prior AI or coding experience required
  • Comfort using a web browser and basic office tools (docs or spreadsheets)
  • Willingness to think through simple real-world examples

Chapter 1: What AI Risk Is (and Isn’t)

  • Define AI, automation, and “risk” using everyday examples
  • Understand why AI can cause harm even with good intentions
  • Learn the difference between mistakes, harm, and responsibility
  • Set the course goal: a repeatable beginner risk workflow
  • Checkpoint: describe one AI risk scenario in your own words

Chapter 2: Map the AI System You’re Assessing

  • Write a clear purpose statement and success criteria
  • List stakeholders and high-risk user groups
  • Trace the end-to-end workflow from data to decision
  • Find where humans interact with the system (handoffs and overrides)
  • Checkpoint: complete a one-page system map

Chapter 3: Spot Common AI Harms (Beginner Threat Modeling)

  • Identify privacy and data protection risks in simple terms
  • Recognize bias and unfair outcomes without statistics
  • Find reliability risks: errors, hallucinations, and edge cases
  • Consider security and misuse: prompt injection and abuse cases
  • Checkpoint: produce a first list of risks by harm type

Chapter 4: Assess and Prioritize Risks (Likelihood × Impact)

  • Turn risk ideas into testable risk statements
  • Score likelihood and impact using a simple scale
  • Decide severity levels and escalation triggers
  • Choose what to address first with limited time and budget
  • Checkpoint: build a prioritized risk list

Chapter 5: Document the Work (Risk Register + Evidence Pack)

  • Create a simple AI Risk Register that anyone can read
  • Assign owners, due dates, and proof of completion
  • Capture key decisions and trade-offs in a decision log
  • Collect lightweight evidence: tests, reviews, user feedback notes
  • Checkpoint: assemble a “minimum viable” risk documentation pack

Chapter 6: Reduce Harm and Monitor Over Time

  • Pick practical risk controls: prevent, detect, respond
  • Add human review and safe operating limits where needed
  • Design user-facing transparency: warnings, instructions, and consent
  • Set up monitoring and an incident response starter plan
  • Final checkpoint: complete a launch readiness checklist

Sofia Chen

AI Governance Lead and Risk Educator

Sofia Chen helps teams adopt AI responsibly by turning complex safety and governance topics into practical, beginner-friendly steps. She has supported risk reviews for AI features in customer service, hiring support, and content tools, focusing on documentation, testing, and clear decision-making.

Chapter 1: What AI Risk Is (and Isn’t)

“AI risk” sounds abstract until you connect it to ordinary work: a support chatbot that gives the wrong refund policy, a résumé screener that quietly filters out qualified candidates, or a medical note summarizer that omits an allergy. In each case, the system may be built with good intentions, but it can still cause harm. This chapter builds a plain-language foundation you can reuse across projects: what AI is (and isn’t), what “risk” means, how harm happens, who is affected, and how to write your first risk statement.

A key idea for beginners: risk work is not about proving a system is “safe” forever. It is about being able to explain the system, anticipate plausible failures and misuse, prioritize what matters most, and document decisions so improvements are repeatable. By the end of this chapter, you should be able to describe one AI risk scenario in your own words using a one-sentence template, which becomes the seed for a risk register later in the course.

We will treat risk as a practical workflow, not a philosophical debate: map the system at a high level (purpose, users, data in/out, decisions), identify common harm types (privacy, bias, errors, security, misuse), estimate likelihood and impact, assign owners, and track fixes with due dates. This is engineering judgment applied with humility: you won’t know everything, but you can know enough to reduce harm.

  • AI is pattern-based prediction or generation from inputs to outputs; it is not “magic” or “understanding.”
  • Risk means uncertainty plus consequences; good intentions do not remove risk.
  • Harm often enters through data, goals, interfaces, deployment context, or misuse.
  • Responsibility is about who can prevent or reduce harm—not just who made a mistake.
  • Your goal is a repeatable, beginner-friendly risk workflow with lightweight documentation.

As you read, keep one system in mind—something you use at work or in daily life. You will use it to practice writing a clear risk statement at the end of the chapter.

Practice note: for each objective in this chapter—defining AI, automation, and “risk” with everyday examples; understanding why well-intentioned AI can still cause harm; distinguishing mistakes, harm, and responsibility; setting the course goal of a repeatable risk workflow; and the checkpoint of describing one risk scenario in your own words—document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 1.1: AI in plain language (inputs, outputs, patterns)

In plain language, an AI system takes inputs (text, images, clicks, sensor readings, database records), finds patterns based on past examples, and produces outputs (a label, score, ranking, recommendation, or generated content). A spam filter labels email; a credit model outputs a risk score; a routing system recommends a driver path; a generative assistant drafts text. The “intelligence” is mostly pattern-fitting: the system learns correlations that help it predict or generate outputs that look like training examples.

It helps to separate AI from automation. Automation means a process runs with limited human involvement (for example, automatically rejecting an application below a threshold). AI is one possible component inside automation. You can have automation with no AI (simple rules), and you can have AI with no automation (a tool that suggests but a human decides). Many real systems combine both: an AI score triggers an automated action unless a human intervenes.

For risk work, always map the system as a simple pipeline: (1) purpose, (2) users, (3) input data sources, (4) model behavior, (5) output format, (6) decision point, and (7) feedback loop (what gets logged or learned). This mapping prevents a common beginner mistake: focusing only on the model and forgetting the surrounding product. Most harms happen at the seams—where data is collected, how outputs are shown, and how people act on them.

  • Inputs: What the system reads (including hidden inputs like metadata and user history).
  • Outputs: What it produces (including confidence scores and explanations).
  • Decisions: What action happens because of the output (approve/deny, prioritize/deprioritize, warn/ignore).
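To make the seven-step mapping concrete, here is a minimal sketch in Python. The course itself requires no code; every field value below is a hypothetical example for a support-chat summarizer, not a real product.

```python
# Hypothetical one-page map following the seven-step pipeline above.
# All values are illustrative assumptions, not a real deployment.
system_map = {
    "purpose": "summarize support chats so agents respond faster",
    "users": ["support agents (primary)", "customers (affected)"],
    "input_data": ["chat transcript", "customer metadata (hidden input)"],
    "model_behavior": "drafts a summary from the transcript",
    "output_format": ["summary text", "confidence score"],
    "decision_point": "agent edits or sends the summary to the customer",
    "feedback_loop": "sent summaries and agent edits are logged",
}

REQUIRED_STEPS = ["purpose", "users", "input_data", "model_behavior",
                  "output_format", "decision_point", "feedback_loop"]

def unfilled_steps(mapping):
    """Return pipeline steps that are still missing from the map."""
    return [step for step in REQUIRED_STEPS if not mapping.get(step)]

print(unfilled_steps(system_map))  # []
```

A map with gaps fails the check, which is the point: you find missing seams before you assess risk, not after.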

Engineering judgment starts with asking: “If the output is wrong, who will notice, and how fast?” An AI that drafts marketing copy has different risk than one that influences hiring, lending, healthcare, or safety-critical operations. The same model can be low risk in one context and high risk in another.

Section 1.2: What “risk” means: uncertainty and consequences

Risk is not the same as a bug, and it is not the same as harm. A useful beginner definition is: risk = uncertainty × consequences. Uncertainty means you do not fully control what will happen—because the world changes, data is incomplete, people behave unexpectedly, and models can generalize poorly. Consequences mean there is something at stake: money, access, health, rights, trust, or safety.

This is why AI can cause harm even with good intentions. You can design a system to “help,” but if the model is uncertain in some situations (rare cases, new user groups, changed policies), the output can still lead to negative outcomes. A well-meaning fraud model might wrongly block legitimate customers; a content moderation model might silence certain dialects; a summarizer might omit key constraints in a contract draft.

To prioritize, use a simple likelihood × impact method. Likelihood asks: how often could this happen in the real deployment? Impact asks: if it happens, how bad is it for the affected people and the organization? Beginners often over-focus on spectacular but rare scenarios. Good practice is to list several plausible risks, then rank them consistently.

  • Likelihood (1–5): 1 = rare edge case; 3 = occasional; 5 = frequent/expected.
  • Impact (1–5): 1 = minor annoyance; 3 = meaningful harm or financial loss; 5 = severe safety, legal, or rights impact.
  • Priority score: likelihood × impact; treat high scores as “fix first.”
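The scoring above can be sketched in a few lines of Python. The 1–5 scales come from this section; the example risks and their scores are hypothetical.

```python
# A minimal sketch of the likelihood x impact scoring described above.
def priority_score(likelihood: int, impact: int) -> int:
    """Multiply 1-5 likelihood by 1-5 impact to get a 1-25 priority."""
    for value in (likelihood, impact):
        if not 1 <= value <= 5:
            raise ValueError("scores must be on the 1-5 scale")
    return likelihood * impact

def rank_risks(risks):
    """Sort (name, likelihood, impact) tuples so 'fix first' comes first."""
    return sorted(risks, key=lambda r: priority_score(r[1], r[2]), reverse=True)

# Hypothetical risks for a support chatbot:
risks = [
    ("summary omits allergy info", 2, 5),     # rare but severe -> 10
    ("chatbot gives outdated policy", 4, 3),  # frequent, meaningful -> 12
    ("typo in generated draft", 5, 1),        # expected, minor -> 5
]
print(rank_risks(risks)[0][0])  # "chatbot gives outdated policy"
```

Note how the ranking counters the beginner tendency described above: the frequent, moderate-impact risk outranks the spectacular rare one here, though a severe-enough impact can still dominate.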

Common mistake: treating likelihood as “model accuracy.” Accuracy is measured on a test set; likelihood is about real life: data drift, user behavior, adversarial inputs, operational failures, and how strongly the system’s output influences decisions. Another mistake is only measuring impact in dollars. Impact should include human outcomes: privacy intrusion, unfair exclusion, reputational damage, or increased vulnerability to fraud.

As you continue the course, this simple scoring becomes the backbone of a beginner risk register: each risk gets a score, an owner, a mitigation plan, and a due date—so risk management becomes a routine practice, not a one-time document.
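A single register row with those fields can be sketched as a plain dictionary; a spreadsheet works just as well. The field names, owner role, and due date below are hypothetical placeholders.

```python
# One risk-register row as described above: score, owner, mitigation,
# due date. All values are illustrative assumptions.
risk_row = {
    "risk": "Summarizer omits medication allergies from discharge notes",
    "likelihood": 2,                # 1-5
    "impact": 5,                    # 1-5
    "priority": 2 * 5,              # likelihood x impact
    "owner": "clinical product lead",          # hypothetical role
    "mitigation": "require clinician review before sending",
    "due_date": "2025-03-31",                  # illustrative date
    "status": "open",
}

def is_actionable(row) -> bool:
    """A row is actionable only if someone owns it and it has a deadline."""
    return bool(row.get("owner")) and bool(row.get("due_date"))

print(is_actionable(risk_row))  # True
```

The `is_actionable` check encodes the chapter's point: a risk with no owner or due date is a discussion, not tracked work.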

Section 1.3: The harm chain: where problems enter the system

Think of harm as a chain of events, not a single moment. If you only look for “model mistakes,” you will miss many sources of risk. A practical mental model is the harm chain: where problems enter, how they propagate, and where they become real-world consequences.

Start by locating entry points:

  • Problem framing: Are you optimizing the right goal? A proxy metric (clicks) can conflict with a real goal (informed decisions).
  • Data collection: Missing groups, biased labels, privacy-invasive collection, or stale data can embed problems early.
  • Model behavior: Overconfidence, poor calibration, hallucinations, spurious correlations, or weak performance on minority cases.
  • Interface and UX: Users may over-trust outputs, ignore uncertainty, or misunderstand what the system can do.
  • Deployment context: New policies, new populations, changing language, seasonal effects, or integration errors.
  • Misuse and abuse: Prompt injection, data exfiltration, model inversion, phishing content generation, or policy evasion.

Now connect these to the common harm types you will use throughout the course: privacy (exposing or misusing personal data), bias (unequal error rates or unfair outcomes), errors (wrong outputs leading to wrong actions), security (attacks on the model or pipeline), and misuse (harmful applications beyond the intended use). Most real incidents involve more than one type—for example, a security failure can become a privacy breach.

Distinguish mistakes, harm, and responsibility. A mistake is a technical or operational error (wrong label, broken data pipeline). Harm is the negative effect on people or systems (a customer loses access to funds, a candidate is unfairly rejected). Responsibility is about who had the ability to prevent or reduce that harm (product owner who chose automation level, engineering team who lacked monitoring, leadership who set unrealistic timelines). Risk management is the practice of making responsibility explicit and actionable.

Practical outcome: when you later fill a risk register, you will describe where in the harm chain you can intervene—data filters, model constraints, human review, rate limits, logging, monitoring, and user education. Interventions are rarely “just retrain the model.”

Section 1.4: Who is affected: users, non-users, and society

AI risk assessment fails when it only considers the intended user. A system can harm users (the people operating it), subjects (the people the system is about), non-users (people indirectly impacted), and society (broader effects like trust, misinformation, or inequality). Mapping stakeholders is not bureaucracy; it is a way to discover risks you would otherwise miss.

Example: a customer service assistant may be used by agents (users) but affects customers (subjects). If it suggests harsher language for certain names or regions, the harm is felt by customers, even if agents “just followed the tool.” A street-scene recognition model might be used by city staff but affects residents and visitors who never consented to being analyzed. A generative image tool might be used for fun but can enable harassment of targeted individuals (non-users).

  • Primary users: Who directly interacts with the system?
  • Decision subjects: Who is evaluated, ranked, approved, or denied?
  • Bystanders: Who appears in the data incidentally (voices in the background, contacts, location traces)?
  • Downstream organizations: Partners, vendors, or regulators affected by outputs.

Practical workflow: for any system map, write a short “who could be harmed” list before you write mitigations. This reduces a common mistake: designing controls that protect the company (e.g., disclaimers) but not the people most impacted (e.g., appeal paths, error correction, privacy choices).

Another common mistake is assuming that “human in the loop” automatically solves the problem. Humans can be overloaded, may trust the tool too much, or may not have authority to override. If a human is part of the safety plan, specify what they review, how often, and what happens when they disagree with the model.

Practical outcome: stakeholder mapping will later feed your documentation templates (model card–style summary, data notes, decision log). Those documents should name the affected groups and the intended protections, so risk decisions are visible and auditable.

Section 1.5: Safety vs ethics vs compliance (simple distinctions)

Beginners often treat safety, ethics, and compliance as interchangeable. They overlap, but they are not the same, and confusing them leads to gaps.

  • Safety is about preventing unacceptable harm, especially where failures can cause serious outcomes. It focuses on reliability, robustness, monitoring, and fail-safes.
  • Ethics is about what should be done, not just what can be done. It includes fairness, respect for persons, transparency, and avoiding manipulation—even if something is technically legal.
  • Compliance is about meeting external requirements: laws, regulations, contractual obligations, and internal policies. It often requires specific documentation and controls.

A compliant system can still be harmful (meeting minimum legal requirements is not the same as being responsible). An ethical intent can still produce unsafe outcomes (good goals don’t guarantee robust operation). And a “safe” system in one context can be unethical in another (a perfectly accurate surveillance model can still violate rights).

Use this distinction to make better engineering trade-offs. If a risk is primarily safety, you might invest in monitoring, rollback plans, conservative thresholds, and escalation paths. If it is primarily ethics, you might revisit whether the use case is appropriate, add user choice, or change incentives and metrics. If it is primarily compliance, you might focus on consent flows, data retention, access controls, and documentation (and involve legal early).

Common mistake: using a single control (like a disclaimer) as a universal mitigation. Disclaimers may help with communication, but they rarely reduce likelihood, and they do little for non-users. Stronger mitigations change system behavior or decision pathways: limit automation, add friction for high-risk actions, remove sensitive inputs, or implement auditing.

Practical outcome: in later chapters, your risk register will include a “risk type” tag (safety/ethics/compliance). This helps route work to the right owners and prevents “nobody owns it” problems.

Section 1.6: Your first risk statement (one sentence template)

Risk work becomes practical when you can state a risk clearly and consistently. A good beginner risk statement is one sentence that names the system behavior, the affected party, the harm type, and the consequence. Avoid vague language like “the model might be biased.” Biased how? Against whom? Leading to what?

Use this template:

  • When [context/trigger], the AI system [does what], which could [harm type] to [who], resulting in [consequence].

Three worked examples:

  • Errors + safety: “When clinicians use the note summarizer for discharge instructions, the AI system may omit medication allergies, which could cause patient safety harm to the patient, resulting in inappropriate prescriptions.”
  • Privacy + security: “When employees paste customer emails into the chatbot, the AI system may send personal data to an external service, which could create a privacy breach for customers, resulting in regulatory penalties and loss of trust.”
  • Bias: “When the screening model ranks applicants, the AI system may systematically score certain groups lower due to historical label bias, which could create unfair exclusion for qualified candidates, resulting in reduced opportunity and legal exposure.”
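If you keep statements in a script or spreadsheet, the template can be sketched as a Python format string. The filled-in values below are hypothetical, echoing the privacy example above.

```python
# The one-sentence risk statement template from this section.
TEMPLATE = ("When {context}, the AI system {behavior}, "
            "which could {harm} to {who}, resulting in {consequence}.")

# Hypothetical fill-in for a chatbot privacy risk:
statement = TEMPLATE.format(
    context="employees paste customer emails into the chatbot",
    behavior="may send personal data to an external service",
    harm="create a privacy breach",
    who="customers",
    consequence="regulatory penalties and loss of trust",
)
print(statement)
```

Forcing every risk through the same five slots is what makes statements comparable: a teammate can read any row and know the trigger, the behavior, the harm, the affected party, and the stakes.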

Now write your own scenario in your own words using the template. Keep it specific enough that a teammate could propose a mitigation without asking follow-up questions. If you get stuck, revisit your high-level map from Section 1.1: inputs, outputs, and the decision point. A risk statement usually sits exactly at that decision point.

Practical outcome: your one sentence becomes the first row in an AI Risk Register later in the course, where you will add likelihood (1–5), impact (1–5), owner, mitigation steps, and a due date. That is the course goal: a repeatable workflow that turns concerns into tracked work, not endless discussion.

Chapter milestones
  • Define AI, automation, and “risk” using everyday examples
  • Understand why AI can cause harm even with good intentions
  • Learn the difference between mistakes, harm, and responsibility
  • Set the course goal: a repeatable beginner risk workflow
  • Checkpoint: describe one AI risk scenario in your own words
Chapter quiz

1. Which best matches the chapter’s plain-language definition of AI?

Correct answer: Pattern-based prediction or generation that maps inputs to outputs
The chapter defines AI as pattern-based prediction or generation from inputs to outputs, not “magic” or human-like understanding.

2. According to the chapter, why can an AI system still cause harm even if it was built with good intentions?

Correct answer: Because good intentions do not remove uncertainty or consequences
Risk includes uncertainty plus consequences, so good intentions don’t eliminate the possibility of harmful outcomes.

3. Which statement best describes the chapter’s view of “risk work” for beginners?

Correct answer: A repeatable workflow to explain the system, anticipate plausible failures/misuse, prioritize, and document decisions
The chapter emphasizes practical, repeatable risk work—not permanent guarantees or abstract debates.

4. In the chapter’s framing, what is the key difference between mistakes, harm, and responsibility?

Correct answer: Responsibility is about who can prevent or reduce harm, not just who made a mistake
The chapter distinguishes a mistake from harm and defines responsibility as tied to the ability to prevent or reduce harm.

5. Which sequence best matches the chapter’s suggested beginner risk workflow?

Correct answer: Map the system → identify common harm types → estimate likelihood/impact → assign owners → track fixes with due dates
The chapter presents risk as a practical workflow that includes mapping, identifying harms, estimating, assigning ownership, and tracking fixes.

Chapter 2: Map the AI System You’re Assessing

Before you can reduce AI risk, you need a shared, concrete picture of what the “system” is. Beginners often jump straight to the model (e.g., “we use GPT-4” or “we trained XGBoost”), but harms usually emerge from the full workflow: data collection, feature creation, prompts, model behavior, business rules, user interfaces, human handoffs, and how decisions are acted on.

This chapter teaches you to map an AI system at a high level so you can assess risk with engineering judgment rather than guesswork. You will write a clear purpose statement and success criteria, list stakeholders and high-risk user groups, trace the end-to-end workflow from data to decision, and mark where humans interact (handoffs and overrides). By the end, you should be able to complete a one-page system map that makes later risk prioritization faster and more accurate.

As you read, keep a simple principle in mind: if you cannot describe the system in one page, you probably cannot control its risks. A good map is not about perfection; it is about clarity. You are building a reference that lets you answer practical questions like: “Where does the data come from?”, “Who sees the output?”, “What happens if it is wrong?”, and “Who can stop it?”

  • Deliverable for this chapter: a one-page system map (box-and-arrow), plus short notes for purpose, users, data, outputs, and human checkpoints.
  • Why it matters: most real-world incidents come from mismatched assumptions: unclear scope, hidden users, reused data, or outputs being treated as decisions.

We will break the map into six pieces. Each piece is small enough to fill in quickly, but together they create a durable “source of truth” for your risk register and documentation templates later in the course.

Practice note: for each objective in this chapter—writing a clear purpose statement and success criteria; listing stakeholders and high-risk user groups; tracing the end-to-end workflow from data to decision; finding where humans interact with the system (handoffs and overrides); and the checkpoint of completing a one-page system map—document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 2.1: Scope: what’s in and out of your assessment

Start by defining what you are assessing. “The AI system” is rarely just a model; it is the combination of model(s), data pipelines, prompts/configuration, business rules, user interface, and operational procedures. Your goal is to draw a boundary that is wide enough to capture real harms but narrow enough to finish the work.

Write a purpose statement in one or two sentences. Include the user benefit and the organizational intent. Example: “This system summarizes customer support chats to help agents respond faster and more consistently.” Then add success criteria that are measurable and safety-aware, not only accuracy-focused. Example criteria might include: average handling time reduced by 15%, summary omission rate below X%, and “no sensitive personal data appears in summaries unless already present in the chat.”

  • In scope: model behavior, prompts, retrieval sources, logging, agent UI, escalation rules, and any automation that triggers customer communication.
  • Out of scope (for now): upstream product decisions unrelated to the AI, or unrelated analytics dashboards—unless they feed into the AI workflow.
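The scope list pairs naturally with the ownership rule below it: every in-scope component should have someone who can change it. Here is a minimal sketch; the component names come from the example above, and the team names are hypothetical.

```python
# Scope boundary plus owners, as described in this section.
# Team names are illustrative assumptions.
scope = {
    "in": ["model behavior", "prompts", "retrieval sources",
           "logging", "agent UI", "escalation rules"],
    "out": ["unrelated analytics dashboards"],
}
owners = {
    "model behavior": "ML team",
    "prompts": "ML team",
    "retrieval sources": "platform team",
    "logging": "platform team",
    "agent UI": "support tools team",
    "escalation rules": "support operations",
}

def unowned(scope, owners):
    """Return in-scope components with no owner -- a scoping red flag."""
    return [c for c in scope["in"] if c not in owners]

print(unowned(scope, owners))  # []
```

If `unowned` returns anything, either narrow the boundary or find an owner before assessing further: a component nobody can change is a risk nobody can reduce.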

Common mistake: scoping only the model and ignoring the decision pathway. If the system’s output is used to deny a benefit, prioritize a loan, or trigger an investigation, then the decision and action belong in scope. Another mistake is scoping so broadly that no one owns anything. A useful boundary aligns with owners: each major component should have a team or person who can change it.

Practical outcome: a short scope paragraph plus a list of components included/excluded. This becomes the front page of your documentation and prevents “surprise” risks from being discovered late.
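The scope list above can be kept as structured data so the "every in-scope component has an owner" rule is checkable. This is an illustrative sketch; the component names, team names, and field layout are hypothetical, not a prescribed template.

```python
# Illustrative scope inventory: each component records whether it is in
# scope and who owns it, so no in-scope component is left ownerless.
scope = [
    {"component": "model behavior",      "in_scope": True,  "owner": "ML team"},
    {"component": "prompts",             "in_scope": True,  "owner": "ML team"},
    {"component": "retrieval sources",   "in_scope": True,  "owner": "Platform team"},
    {"component": "agent UI",            "in_scope": True,  "owner": "Frontend team"},
    {"component": "analytics dashboard", "in_scope": False, "owner": None},
]

def unowned_in_scope(items):
    """Return in-scope components with no owner -- a scoping red flag."""
    return [i["component"] for i in items if i["in_scope"] and not i["owner"]]

print(unowned_in_scope(scope))  # prints [] when every in-scope item has an owner
```

An empty result is the goal; any component that appears here either needs an owner or should be moved out of scope.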

Section 2.2: Users and use-cases (intended and likely use)

Next, list who interacts with the system and who is affected by it. These are not always the same. A hiring screening model may be used by recruiters, but it affects candidates. A teacher-facing tutoring tool affects students. Mapping both groups is essential for risk work because harm often shows up in the “affected but not present” population.

Make a stakeholder list with at least four categories: primary users (direct operators), secondary users (people who consume outputs indirectly), affected parties (people subject to decisions), and governance stakeholders (legal, security, privacy, compliance, audit, and leadership). For each, note what they want and what they fear. This helps you anticipate misuse and incentives.

Include high-risk user groups and contexts. High-risk does not mean “problematic users”; it means users who face higher consequences if the system fails. Examples: minors, non-native speakers, people with disabilities, people in financial distress, patients, or individuals subject to disciplinary action. Also consider settings like healthcare, education, housing, employment, and law enforcement—where errors or bias can have serious impact.

  • Intended use: the official, designed workflow (e.g., “agents use summaries as a draft”).
  • Likely use: what will happen under time pressure (e.g., “agents copy/paste summaries directly to customers”).
  • Foreseeable misuse: what a motivated actor could do (e.g., “submit prompts to extract sensitive data” or “use rankings to discriminate”).

Common mistake: documenting only intended use. Risk assessments fail when they ignore likely use—especially under incentives like speed, quotas, or cost-cutting. Practical outcome: a table of stakeholders and use-cases that you will later reference when identifying harm types (privacy, bias, errors, security, misuse) and assigning owners.
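A stakeholder and use-case table can be as simple as a few records. The entries below follow the support-chat summarizer example; the names, wants, and fears are illustrative placeholders, not findings.

```python
# Hypothetical stakeholder table for the support-chat summarizer example,
# using the four categories described above (primary, secondary, affected,
# governance). "wants" and "fears" help anticipate misuse and incentives.
stakeholders = [
    {"name": "support agents", "category": "primary user",
     "wants": "faster replies", "fears": "being blamed for AI errors"},
    {"name": "customers", "category": "affected party",
     "wants": "accurate answers", "fears": "private data exposure"},
    {"name": "privacy office", "category": "governance",
     "wants": "compliant data handling", "fears": "unlogged data flows"},
]

# Intended vs. likely vs. foreseeable misuse, captured side by side:
use_cases = {
    "intended": "agents use summaries as a draft",
    "likely": "agents paste summaries directly to customers",
    "foreseeable_misuse": "prompts crafted to extract sensitive data",
}

# Risk work must cover affected parties, not just direct operators:
affected = [s["name"] for s in stakeholders if s["category"] == "affected party"]
print(affected)
```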

Section 2.3: Data sources: where information comes from

Data is where many risks begin. Map every input that influences the model or the decision, including “hidden” inputs like user profiles, device metadata, or retrieved documents. For each source, record: who provides it, how it is collected, whether it is optional, how often it updates, and what quality checks exist.

Separate data into three buckets: training data (what shaped model behavior), runtime data (what the system reads during operation), and feedback data (what you collect to improve the system). This distinction matters because privacy and consent rules may differ, and because feedback loops can amplify bias. For example, if a fraud model flags certain transactions, and flagged transactions get more scrutiny, you may generate more “fraud labels” in those groups—creating a self-reinforcing pattern.

  • Origin: first-party (your app), second-party (partner), third-party (data broker), public web, or user-supplied.
  • Sensitivity: personal data, health data, financial data, location, children’s data, or confidential business data.
  • Quality risks: missing values, label noise, outdated records, measurement bias, language coverage gaps.
  • Security risks: data poisoning, prompt injection via retrieved content, unauthorized access to logs.

Common mistakes: assuming “public” means “safe,” or forgetting that logs and analytics are also data sources. Another frequent error is mixing “ground truth” with “proxy labels” without noting the limitations. Practical outcome: a short data inventory that can be reused later in a model card–style summary or “data notes” template, and that clearly indicates where privacy and bias harms could enter.
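The three buckets and the origin/sensitivity tags can be combined into one small inventory. This is a minimal sketch with made-up sources; a real inventory would also record collection method, update frequency, and quality checks as described above.

```python
# Minimal data-inventory sketch. Each source is tagged with its bucket
# (training / runtime / feedback), origin, and sensitivity so privacy
# and bias entry points are visible at a glance.
inventory = [
    {"source": "support chat transcripts", "bucket": "runtime",
     "origin": "first-party", "sensitivity": "personal"},
    {"source": "historical ticket labels", "bucket": "training",
     "origin": "first-party", "sensitivity": "personal"},
    {"source": "agent thumbs-up/down", "bucket": "feedback",
     "origin": "first-party", "sensitivity": "non-personal"},
    {"source": "public KB articles", "bucket": "runtime",
     "origin": "public web", "sensitivity": "non-personal"},
]

def sensitive_sources(items):
    """List sources carrying personal data, which need privacy review."""
    return [i["source"] for i in items if i["sensitivity"] == "personal"]
```

Remember the "public means safe" trap: `public KB articles` are non-personal here, but retrieved public content can still carry prompt-injection risk, which belongs in the security column of your notes.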

Section 2.4: Outputs: predictions, rankings, text, or actions

Now map what the system produces and how those outputs are used. Output types include classifications (approve/deny), scores (risk score), rankings (top candidates), generated text (summaries, emails), or actions (auto-block, route to human review). Risk depends heavily on whether an output is advisory or automatically executed.

Describe each output with four details: format (number, label, text), recipient (which user sees it), decision role (recommendation vs. decision), and time sensitivity (real-time vs. batch). Then list what happens next: does someone click “send,” does an API trigger an email, does it update a record that downstream systems consume?

  • Error modes: false positives/negatives, hallucinated facts, misrankings, toxic language, overconfident phrasing.
  • Interpretation risks: users treating a score as certain, confusing correlation with causation, or assuming “AI approved” means “policy approved.”
  • Downstream harms: denial of service, reputational damage, financial loss, emotional distress, discriminatory outcomes.

Common mistake: stopping at “the model outputs a score” and not documenting how the score becomes an action. Another is ignoring “soft” outputs like summaries that can still drive high-stakes decisions if copied into official records. Practical outcome: a clear output-to-action chain that later helps you pick controls: warnings, confidence indicators, human review thresholds, monitoring, and rollback procedures.
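The output-to-action chain can be written down as an ordered list of steps, each marked with its decision role and reversibility. The steps here are illustrative; the useful question is where the first irreversible step sits.

```python
# Sketch of an output-to-action chain: each step records its decision
# role and whether it is reversible, so you can see where an advisory
# output becomes a consequential action.
chain = [
    {"step": "model emits summary",      "role": "recommendation", "reversible": True},
    {"step": "agent edits and approves", "role": "human review",   "reversible": True},
    {"step": "email sent to customer",   "role": "action",         "reversible": False},
]

def first_irreversible(steps):
    """Return the first irreversible step -- where controls matter most."""
    for s in steps:
        if not s["reversible"]:
            return s["step"]
    return None
```

Controls like confirmation prompts, review thresholds, and rollback procedures should cluster just before whatever this function returns.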

Section 2.5: Context matters: environment, incentives, constraints

Two systems with the same model can have different risks because context changes behavior. Capture the operating environment: where and when the system is used, what users are optimizing for, and what constraints they face. This section is where engineering judgment becomes explicit.

Document incentives and pressure points. If agents are measured on speed, they may over-rely on AI text. If managers are rewarded for reducing headcount, automation may creep from “assist” to “auto-act.” Also note constraints: limited training time, poor UI, multilingual users, intermittent connectivity, or restricted ability to escalate. These constraints often create the conditions for misuse and errors to matter more.

  • Deployment setting: internal tool, consumer-facing product, regulated industry, or public sector.
  • Operational controls: access management, logging, monitoring, incident response, change management.
  • Policy/legal considerations: data retention limits, consent requirements, explanation obligations, appeal processes.
  • Human factors: trust calibration, automation bias, fatigue, training quality.

Common mistake: treating context as “nice to have” detail. In reality, context determines likelihood: the same hallucination is low-impact in a brainstorming tool and high-impact in a medical note generator. Practical outcome: a short context paragraph that will later help you estimate likelihood × impact realistically and argue for proportionate safeguards without overengineering.

Section 2.6: Simple diagrams: the “box-and-arrow” map

With scope, users, data, outputs, and context captured, you can produce the one-page system map. Use a simple box-and-arrow diagram—no specialized tools required. The goal is readability: a new team member should understand the workflow in two minutes.

Start left-to-right: data in (sources), processing (cleaning, feature building, retrieval, prompt assembly), model(s) (and key configuration), post-processing (business rules, thresholds, safety filters), outputs, and actions. Then add the human interaction points: where a person provides input, reviews output, approves actions, or can override. Mark these as explicit “handoff” boxes.

  • Human-in-the-loop: reviewer approves before action.
  • Human-on-the-loop: automation runs, human monitors and can intervene.
  • Human-out-of-the-loop: no routine human checkpoint (highest need for strong controls).

Include two annotations on the diagram: (1) where logs are stored (because logs can create privacy risk), and (2) where external systems connect (because integrations create security and misuse risk). If you have multiple models (e.g., a classifier plus an LLM), show them separately; mixing them into one box hides failure modes.

Checkpoint: complete your one-page system map and attach your purpose statement and success criteria at the top. If you cannot place a component on the diagram, that is a signal: either it is out of scope, or you have a “hidden dependency” that should be documented before moving on to risk identification.
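If you want a text version of the box-and-arrow map alongside the drawing, the boxes and arrows can be captured as plain edges. The node names below are illustrative; the small check finds terminal nodes, i.e., actions with no downstream checkpoint, which are candidates for human-out-of-the-loop scrutiny.

```python
# A box-and-arrow map captured as (from, to) edges. Listing edges is a
# cheap way to spot hidden dependencies and missing handoffs.
edges = [
    ("chat transcripts", "prompt assembly"),
    ("prompt assembly", "LLM"),
    ("LLM", "safety filter"),
    ("safety filter", "agent review"),   # human-in-the-loop checkpoint
    ("agent review", "send email"),      # the action at the end of the chain
]

def terminal_nodes(edge_list):
    """Nodes with no outgoing edge: final actions with nothing after them."""
    sources = {a for a, _ in edge_list}
    targets = {b for _, b in edge_list}
    return sorted(targets - sources)

def entry_nodes(edge_list):
    """Nodes with no incoming edge: your data sources."""
    sources = {a for a, _ in edge_list}
    targets = {b for _, b in edge_list}
    return sorted(sources - targets)
```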

Chapter milestones
  • Write a clear purpose statement and success criteria
  • List stakeholders and high-risk user groups
  • Trace the end-to-end workflow from data to decision
  • Find where humans interact with the system (handoffs and overrides)
  • Checkpoint: complete a one-page system map
Chapter quiz

1. Why does Chapter 2 emphasize mapping the full AI system rather than focusing only on the model?

Show answer
Correct answer: Because harms often emerge from the end-to-end workflow, including data, interfaces, business rules, and human handoffs
The chapter warns beginners not to stop at “we use GPT-4/XGBoost,” because real-world harms typically come from how the whole workflow operates.

2. Which set of questions best reflects what a good system map should help you answer?

Show answer
Correct answer: Where does the data come from, who sees the output, what happens if it is wrong, and who can stop it
A clear map supports practical risk questions about data sources, exposure, consequences of errors, and control points.

3. What is the main deliverable for Chapter 2?

Show answer
Correct answer: A one-page system map (box-and-arrow) with short notes on purpose, users, data, outputs, and human checkpoints
The chapter’s deliverable is a one-page system map plus brief notes that serve as a shared “source of truth” for later work.

4. In this chapter, what does it mean to identify “human interactions” with the system?

Show answer
Correct answer: Marking where humans hand off work to the system or can override/stop what it produces
Human checkpoints are about handoffs and overrides—who reviews, approves, blocks, or acts on outputs.

5. According to the chapter, what is a common root cause of real-world AI incidents that system mapping helps prevent?

Show answer
Correct answer: Mismatched assumptions such as unclear scope, hidden users, reused data, or outputs being treated as decisions
The chapter highlights mismatched assumptions (scope, users, data reuse, and decision overreach) as frequent drivers of incidents.

Chapter 3: Spot Common AI Harms (Beginner Threat Modeling)

In Chapter 2 you mapped your AI system at a high level: what it’s for, who uses it, what data goes in and out, and what decisions it influences. In this chapter you’ll use that map to do beginner threat modeling: systematically scanning for common harm types so you can name risks early, before they turn into incidents.

“Threat modeling” can sound like an advanced security practice, but the beginner version is simply: (1) list what could go wrong, (2) group issues by harm type, (3) write down who would be affected and how, and (4) capture enough detail to prioritize later. You are not trying to prove something will happen; you are trying to avoid being surprised.

A useful mindset is to treat AI harms as the result of three things interacting: data (what the system sees), decisions (what the system does with it), and people (who depends on those decisions). Most real-world AI failures are not “AI is evil”; they are normal engineering gaps—missing constraints, unclear ownership, weak documentation, or optimistic assumptions about users.

As you read, keep your system map open. After each section, add at least one candidate risk to a scratch list. At the end of the chapter you will have your first “by harm type” risk list, ready to convert into a simple Risk Register in the next chapter.

Practice note for Identify privacy and data protection risks in simple terms: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
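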

Practice note for Recognize bias and unfair outcomes without statistics: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Find reliability risks: errors, hallucinations, and edge cases: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Consider security and misuse: prompt injection and abuse cases: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Checkpoint: produce a first list of risks by harm type: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 3.1: Privacy basics: personal data and sensitive data

Privacy risk starts with a simple question: what information about a person could this system collect, infer, store, or reveal? “Personal data” is any data that can identify someone directly (name, email, phone) or indirectly (device IDs, unique combinations of attributes). “Sensitive data” is personal data that can cause greater harm if exposed or misused—health details, financial data, precise location, government IDs, private messages, biometrics, and information about children.

In AI systems, privacy risk often hides in places beginners don’t expect. Inputs may contain personal data (a support chat transcript). Outputs may reveal personal data (a summary that repeats a phone number). Even if you never ask for personal data, the system may infer it (guessing pregnancy status from shopping history). Treat “inference” as a form of collection: if the model can reliably derive it, it can function as sensitive data.

Use a practical workflow: first list data in, data stored, and data out. Then annotate each item with whether it is personal, sensitive, or non-personal. Next, ask four beginner threat-model questions: (1) Could the system capture more data than necessary? (2) Could it retain data longer than needed? (3) Could someone see it who shouldn’t (internal or external)? (4) Could the model output it inappropriately (e.g., quoting raw text, exposing hidden fields)?

  • Common mistake: assuming “we don’t store data” means no privacy risk. Logs, analytics, and vendor systems may still store prompts, embeddings, or chat histories.
  • Common mistake: treating “anonymized” as automatically safe. Re-identification can happen when multiple attributes are combined.
  • Practical outcome: a short list of privacy risks like “training data may include PII,” “support agents can view sensitive chat summaries,” or “outputs may echo customer identifiers.”

Finally, write risks in human terms. Instead of “PII leak,” say “a customer’s address could be revealed in an AI-generated email draft visible to the wrong recipient,” and note who is harmed and how (embarrassment, fraud risk, regulatory exposure). This clarity makes later prioritization easier.
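The annotate-then-question workflow above can be sketched as a checklist generator. The data items and classifications are hypothetical; the point is that each of the four threat-model questions becomes an explicit check against each personal or sensitive item.

```python
# Privacy scan sketch: tag each data item (in / stored / out) with a
# sensitivity class, then emit the four beginner threat-model questions
# for every item that is personal or sensitive.
data_items = [
    {"item": "chat transcript",        "flow": "in",     "class": "personal"},
    {"item": "stored summary",         "flow": "stored", "class": "personal"},
    {"item": "summary shown to agent", "flow": "out",    "class": "personal"},
    {"item": "ticket category",        "flow": "out",    "class": "non-personal"},
]

def privacy_questions(items):
    """One question line per personal/sensitive item."""
    out = []
    for i in items:
        if i["class"] in ("personal", "sensitive"):
            out.append(
                f"{i['item']} ({i['flow']}): over-collection? "
                "over-retention? wrong viewer? echoed in output?"
            )
    return out
```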

Section 3.2: Fairness basics: unequal outcomes and who bears costs

Fairness risk is not limited to formal statistics. Beginners can detect many unfair outcomes by asking: does the system work worse for some groups, and do those groups carry the cost? Unequal outcomes can come from uneven data coverage, biased labels, proxy variables (like ZIP code standing in for race or income), or different user contexts (non-native speakers, disability accommodations, older devices).

Start with your “users” list from the system map and expand it. Include: primary users (operators), affected non-users (people being evaluated or described), and edge users (people who interact indirectly, like call-center staff relying on a summary). For each group, write what “good outcome” and “bad outcome” looks like. In a hiring screener, a bad outcome is being rejected unfairly; in a medical assistant, it’s receiving misleading guidance; in content moderation, it’s being silenced or harassed.

A practical, non-statistical check is to compare pathways. Ask: do different users provide different kinds of input? Do they have different ability to correct mistakes? Do they get different levels of scrutiny? For example, an automated claims triage tool might be reviewed when high-value customers are involved but auto-denied for others. That is a fairness risk even if the model is “accurate.”

  • Common mistake: assuming fairness is only about protected classes. In practice, unfairness can also affect people with low bandwidth, low literacy, uncommon names, or atypical dialects.
  • Common mistake: focusing only on model bias and ignoring process bias (who gets an appeal, who gets human review, who can opt out).
  • Practical outcome: candidate risks like “summaries misinterpret non-native English,” “recommendations under-serve rural users,” or “the tool shifts burden onto people least able to contest decisions.”

Write fairness risks with a “cost bearer” line: “If wrong, who pays?” Often it’s the end user, not the organization. Capturing that explicitly helps you prioritize later, because harm severity depends not only on frequency but on who has power to recover from errors.

Section 3.3: Reliability basics: accuracy vs usefulness

Reliability risk is about whether the system is dependable in the real world, not whether it looks impressive in demos. Beginners often over-focus on “accuracy” as a single score. In practice you need a more grounded question: when the system is wrong, can users detect it before harm occurs? A system can be “useful” while imperfect if it fails safely and predictably.

For generative AI, hallucinations (confidently wrong statements) are the headline failure mode, but edge cases are just as important: unusual inputs, ambiguous requests, long documents, mixed languages, or missing context. Map reliability risk by looking at your system’s decision points. Where does the output become an action—sending an email, denying a refund, recommending a dosage, escalating a ticket? The closer the output is to an irreversible action, the more reliability matters.

A practical workflow is to define “minimum acceptable behavior” for three scenarios: typical case, stressful case, and worst-case. Typical case is normal user inputs. Stressful case includes time pressure, incomplete data, or high volume. Worst-case includes adversarial or confusing inputs. Then ask: does the system degrade gracefully? Does it say “I don’t know,” request clarification, or route to a human?

  • Common mistake: assuming “human-in-the-loop” automatically fixes reliability. If humans are overloaded, trust the model too much, or can’t see the evidence, errors still pass through.
  • Common mistake: mixing up “fluent output” with correctness. The more polished the text, the harder it is for users to detect mistakes.
  • Practical outcome: risks like “model invents policy citations,” “summaries omit critical exceptions,” “tool fails on rare product codes,” or “confidence cues cause over-trust.”

Document reliability risks with a concrete failure story: “In the edge case where a customer mentions two accounts in one message, the system may merge them and expose details across accounts.” This forces you to specify the triggering input, the bad output, and the downstream decision that makes it harmful.
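The "minimum acceptable behavior" exercise can be recorded as a small spec, one entry per scenario, each with a required behavior and a fallback. The behaviors below are illustrative for the support-summary example; the check encodes the graceful-degradation question directly.

```python
# Minimum-acceptable-behavior spec for the three scenarios described
# above. A spec "degrades gracefully" only if every scenario names a
# fallback that routes around the failure.
minimum_behavior = {
    "typical":    {"must": "faithful summary",
                   "on_failure": "agent edits the draft"},
    "stressful":  {"must": "flag missing context",
                   "on_failure": "ask a clarifying question"},
    "worst_case": {"must": "refuse or escalate",
                   "on_failure": "route to a human"},
}

def degrades_gracefully(spec):
    """True only if every scenario defines a non-empty fallback."""
    return all(v.get("on_failure") for v in spec.values())
```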

Section 3.4: Safety basics: physical, emotional, and financial harm

Safety risk is about harm to people, not just system performance. In beginner AI risk work, “safety” includes three broad categories: physical harm (injury, dangerous instructions), emotional harm (harassment, manipulation, distress), and financial harm (fraud enablement, bad financial guidance, wrongful charges). Your goal is to identify where your AI system could influence high-stakes decisions or sensitive moments.

Start by tagging any use case that touches health, legal status, housing, education, employment, policing, or financial decisions. Even if your product is “just content,” it may be used in a high-stakes context. A writing assistant used by a landlord to draft notices can affect housing stability; a chatbot used by a stressed user can influence mental health choices.

A simple safety scan: list the top three actions a user might take because of the output. Then ask what happens if the output is (1) wrong, (2) misread, or (3) followed too literally. For example, an AI fitness coach giving generic advice could be unsafe for users with medical conditions; a budgeting tool could recommend risky moves that trigger overdraft fees; a customer-service bot might escalate a conflict with insensitive language.

  • Common mistake: treating safety as only “self-harm content” or “weapons.” Safety includes everyday harms like coercive language, shaming tone, or advice that ignores vulnerability.
  • Common mistake: assuming disclaimers are sufficient. Disclaimers help, but design controls (guardrails, escalation, refusals, evidence display) usually matter more.
  • Practical outcome: risks like “bot provides medical-like advice,” “system generates threatening debt-collection language,” or “recommendations encourage unsafe actions without context.”

When you write safety risks, include the affected person and the plausible pathway to harm. Safety risk statements should read like short incident reports in advance: who, what trigger, what output, what action, what harm. This structure helps teams propose realistic mitigations later.

Section 3.5: Security basics: access, attacks, and data leakage

Security risk asks: can someone make the system do something it shouldn’t, or gain access to data or capabilities they shouldn’t have? In AI products, classic security issues (weak authentication, exposed databases) still apply, but there are AI-specific patterns you should learn early: prompt injection, data leakage through outputs, and tool/plugin abuse.

Prompt injection is when an attacker crafts input that causes the model to ignore instructions, reveal hidden prompts, or call tools in unsafe ways. This is especially relevant when the model can take actions (send emails, query internal docs, run code). The beginner threat-model technique is to inventory “connectors” and “tools”: databases, ticket systems, calendars, code execution, retrieval-augmented generation (RAG) over internal documents. Each connector expands the blast radius if the model is manipulated.

Data leakage can happen even without a breach. If the model has access to internal documents, it might summarize or quote sensitive content to an unauthorized user. If logs store prompts and outputs, an internal user might later access private data they shouldn’t. Treat authorization as end-to-end: user identity, retrieval filters, tool permissions, and output filtering all need to align.

  • Common mistake: relying on “the system prompt says don’t do that.” Attackers do not respect instructions, and models are not perfect enforcers.
  • Common mistake: forgetting internal threats. Over-permissioned employees, contractors, or support tooling can create major security exposure.
  • Practical outcome: risks like “prompt injection triggers retrieval of confidential docs,” “model reveals hidden system instructions,” “logs store secrets,” or “plugin calls happen without user confirmation.”

Write security risks with the asset and the attacker. Example: “An external user could use prompt injection to cause the assistant to retrieve HR policies not meant for them, leading to confidential data exposure.” This framing makes it easier to choose controls such as least-privilege permissions, output redaction, allowlists, and human confirmation for high-impact tool actions.
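Least-privilege tool gating with human confirmation, mentioned above as a control, can be sketched in a few lines. The tool names and policy table are hypothetical; the two design choices that matter are deny-by-default for unknown tools and explicit confirmation before high-impact actions.

```python
# Sketch of least-privilege gating for model-triggered tool calls.
# Unknown or disallowed tools are denied by default; high-impact tools
# additionally require explicit human confirmation.
TOOL_POLICY = {
    "search_kb":     {"allowed": True,  "needs_confirmation": False},
    "send_email":    {"allowed": True,  "needs_confirmation": True},
    "delete_record": {"allowed": False, "needs_confirmation": True},
}

def authorize(tool, human_confirmed=False):
    """Permit a tool call only if policy allows it and, where required,
    a human has confirmed the specific action."""
    policy = TOOL_POLICY.get(tool)
    if policy is None or not policy["allowed"]:
        return False  # deny-by-default: unknown or disallowed tool
    if policy["needs_confirmation"] and not human_confirmed:
        return False  # high-impact action without human sign-off
    return True
```

Note that this gate sits outside the model: a system prompt saying "don't do that" is not a control, because injected instructions can override it.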

Section 3.6: Misuse basics: dual-use and unintended users

Misuse risk is about how the system might be used in ways you did not intend. “Dual-use” means the same capability can be helpful or harmful depending on intent. A summarization model can help customer support—or help someone scale phishing. An image generator can help design—or help create deceptive content. Beginners sometimes avoid misuse because it feels speculative, but you can make it practical by grounding it in your system’s capabilities.

Start by listing your top capabilities as verbs: generate, rewrite, classify, search, persuade, impersonate, automate actions, extract entities, translate. For each verb, ask “how could this reduce someone else’s agency or safety?” Then consider unintended users: people outside your target audience who may still access the system (public endpoints, shared accounts, leaked API keys), and insiders who might use it to bypass policy.

A practical misuse scan uses three lenses: (1) scale (does AI let a bad actor do more, faster?), (2) quality (does it make harmful content more convincing?), and (3) access (does it lower skill barriers?). If your tool improves any of these for harmful tasks, capture a misuse risk.

  • Common mistake: assuming “our users are trustworthy.” Many incidents involve compromised accounts, coercion, or repurposing by third parties.
  • Common mistake: treating misuse as only “illegal.” Harmful use can be policy-violating, deceptive, or exploitative without being clearly illegal.
  • Practical outcome: risks like “system used to draft phishing messages,” “assistant helps generate disallowed content,” “API used for spam,” or “tool enables impersonation of staff.”

End this chapter by producing your first list of risks grouped by harm type: privacy, fairness, reliability, safety, security, and misuse. Don’t worry yet about perfect wording. What matters is that each risk is concrete (who/what/how), connected to your system map (data and decisions), and ready to be prioritized in the next chapter using likelihood × impact. This is the foundation of a beginner-friendly AI Risk Register.
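The chapter checkpoint, a first risk list grouped by harm type, can start as nothing more than tagged one-liners. The entries below are illustrative examples drawn from this chapter, not a complete list.

```python
from collections import defaultdict

# First risk list: (harm type, concrete risk statement) pairs, grouped
# by harm type so they are ready for likelihood x impact scoring.
risks = [
    ("privacy",     "a summary may echo a customer's address to the wrong recipient"),
    ("reliability", "the model invents policy citations in customer replies"),
    ("security",    "prompt injection triggers retrieval of confidential docs"),
    ("misuse",      "the tool is used to draft convincing phishing messages"),
    ("privacy",     "logs retain chat transcripts containing PII"),
]

by_harm = defaultdict(list)
for harm_type, statement in risks:
    by_harm[harm_type].append(statement)
```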

Chapter milestones
  • Identify privacy and data protection risks in simple terms
  • Recognize bias and unfair outcomes without statistics
  • Find reliability risks: errors, hallucinations, and edge cases
  • Consider security and misuse: prompt injection and abuse cases
  • Checkpoint: produce a first list of risks by harm type
Chapter quiz

1. In this chapter’s beginner threat modeling approach, what is the main goal?

Show answer
Correct answer: Name risks early so you’re not surprised by incidents later
The chapter emphasizes listing and grouping what could go wrong early, without trying to prove it will happen.

2. Which set best matches the four beginner threat modeling steps described in the chapter?

Show answer
Correct answer: List what could go wrong, group by harm type, note who is affected and how, capture enough detail to prioritize later
The chapter defines threat modeling (beginner version) as a simple four-step scan and documentation process.

3. The chapter suggests thinking of AI harms as an interaction of three elements. Which three?

Show answer
Correct answer: Data, decisions, and people
It frames harms as emerging from what the system sees (data), what it does (decisions), and who depends on it (people).

4. According to the chapter, why do many real-world AI failures happen?

Show answer
Correct answer: Normal engineering gaps like missing constraints, unclear ownership, weak documentation, or optimistic assumptions about users
The chapter argues failures are often due to practical engineering and process gaps, not “AI is evil.”

5. What is the recommended habit while reading each section of the chapter?

Show answer
Correct answer: Keep your system map open and add at least one candidate risk to a scratch list after each section
The chapter instructs you to use the system map as you read and build a first risk list by harm type.

Chapter 4: Assess and Prioritize Risks (Likelihood × Impact)

By now you can name common AI harm types and sketch a system at a high level. The next step is deciding what matters most, because you will never have unlimited time, budget, or access to perfect information. Risk assessment is the practical bridge between “we think something could go wrong” and “we will do these specific things next week to reduce harm.”

This chapter introduces a beginner-friendly workflow: convert risk ideas into testable risk statements, score each risk on likelihood and impact using a simple scale, translate those scores into severity levels with clear escalation triggers, and then choose what to address first. The output is a prioritized risk list you can copy into your AI Risk Register with owners and due dates.

Two reminders keep this process grounded. First, risk is about uncertainty: you’re judging what could happen, not what already happened. Second, likelihood × impact is not a prediction engine. It is a shared language for making decisions and documenting why you chose one mitigation over another.

  • Goal: a short list of the most important risks, written clearly enough that someone can test them and act on them.
  • Method: simple scoring (low/medium/high) plus confidence notes.
  • Outcome: team alignment on severity, timing, and when to pause or escalate.

You will use engineering judgment throughout. The point is not to be “objectively correct”; it is to be explicit, consistent, and useful—so the next person can understand your reasoning and improve it over time.
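The low/medium/high scoring described above can be made explicit in a few lines. The numeric mapping and the severity cutoffs here are one reasonable convention, not a standard; what matters is that your team agrees on the bands and documents them.

```python
# Likelihood x impact turned into severity bands. Scores of 1-3 per axis
# give products from 1 to 9; the cutoffs below are an illustrative
# convention that a team would agree on and record.
LEVELS = {"low": 1, "medium": 2, "high": 3}

def severity(likelihood, impact):
    """Combine low/medium/high likelihood and impact into a severity band."""
    score = LEVELS[likelihood] * LEVELS[impact]
    if score >= 6:
        return "critical"  # escalate; consider pausing before launch
    if score >= 3:
        return "major"     # needs a mitigation with owner and due date
    return "minor"         # monitor and revisit on a schedule
```

Keep a one-line confidence note next to each score ("low confidence: no usage data yet") so the next reviewer knows which judgments to revisit first.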

Practice note (applies to every milestone in this chapter, from writing testable risk statements through building the checkpoint risk list): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
  • Section 4.1: Writing clear risk statements (cause → event → harm)
  • Section 4.2: Likelihood: what makes a risk more probable
  • Section 4.3: Impact: how bad it could be (who, how many, how long)
  • Section 4.4: Risk matrix: low/medium/high without math-heavy steps
  • Section 4.5: Confidence and unknowns (what you don’t know yet)
  • Section 4.6: When to stop, pause, or escalate (basic decision rules)

Section 4.1: Writing clear risk statements (cause → event → harm)

A good risk statement is specific enough to test and specific enough to own. Beginners often write risks as vague worries (“bias,” “privacy,” “security”), which makes scoring impossible and mitigation unfocused. Convert each idea into a cause → event → harm statement. This forces you to name what triggers the risk, what happens in the system, and who is harmed.

Use this template:

  • Cause (why it could happen): data, model behavior, process gap, or attacker capability
  • Event (what happens): an AI output, decision, or exposure
  • Harm (so what): privacy loss, unfair treatment, safety incident, financial loss, legal breach, reputational damage

Example conversions:

  • Vague: “Privacy risk.” → Clear: “Because prompts are logged with user identifiers (cause), support staff can view sensitive user messages during debugging (event), leading to unauthorized disclosure of personal data (harm).”
  • Vague: “Hallucinations.” → Clear: “Because the model is not constrained to approved sources (cause), it may fabricate policy instructions (event), causing users to take incorrect actions that lead to financial loss (harm).”
  • Vague: “Bias.” → Clear: “Because training data under-represents non-native speakers (cause), the model may rate their answers as lower quality (event), leading to unfair scoring outcomes (harm).”

Make the statement testable by adding context: which user group, what decision point, what data flow, and what environment (internal tool vs. public). If you can’t imagine a test, the statement is still too abstract. A common mistake is mixing multiple risks into one sentence (“The system may leak data and be biased and be insecure”). Split them: each risk should map to a distinct mitigation and an owner.

Practical outcome: a list of 8–20 crisp risk statements, each tied to a system component (data ingestion, prompt handling, model output, human review, logging, access control). This list becomes the input to likelihood and impact scoring.
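This course requires no code, but if you keep your risk list in a script or spreadsheet, the cause → event → harm template maps naturally onto a small record type. The sketch below (in Python, with hypothetical field names) is one way to keep each statement structured and renderable as a single testable sentence; adapt the fields to your own register.

```python
from dataclasses import dataclass

@dataclass
class RiskStatement:
    """One cause -> event -> harm risk, tied to a system component."""
    risk_id: str    # unique ID, e.g. "R-001"
    component: str  # e.g. "logging", "prompt handling", "model output"
    cause: str      # why it could happen
    event: str      # what happens in the system
    harm: str       # who is harmed, and how

    def sentence(self) -> str:
        # Render the template as one testable sentence.
        return (f"Because {self.cause} (cause), {self.event} (event), "
                f"leading to {self.harm} (harm).")

r = RiskStatement(
    risk_id="R-001",
    component="logging",
    cause="prompts are logged with user identifiers",
    event="support staff can view sensitive user messages during debugging",
    harm="unauthorized disclosure of personal data",
)
print(r.sentence())
```

Keeping `component` as a separate field makes it easy to check that every risk is tied to a concrete part of the system, as this section recommends.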

Section 4.2: Likelihood: what makes a risk more probable

Likelihood is your best judgment of how often the event could happen in the real world given how the system is built and used. You are not estimating exact probabilities; you are sorting risks into an order that drives action. Use a simple three-point scale (Low/Medium/High) and keep your criteria consistent across risks.

Assess likelihood by asking “How easily can the cause lead to the event?” Consider these practical drivers:

  • Exposure: number of users and frequency of use. A daily workflow used by thousands raises likelihood.
  • Attack surface and misuse incentives: public access, valuable data, controversial topics, or known abuse patterns.
  • Controls already in place: access restrictions, rate limiting, human review, monitoring, redaction, allowlists/denylists.
  • Model and data characteristics: known failure modes (e.g., hallucinations), domain shift, noisy labels, sensitive attributes.
  • Process maturity: unclear ownership, lack of incident response, no logging, no rollback plan.

A simple scoring guide many teams find workable:

  • Low: unusual edge case; requires multiple things to go wrong; strong controls or limited access.
  • Medium: plausible under normal use; some controls but gaps remain; occurs occasionally.
  • High: likely under typical use or easy to trigger; weak controls; has happened before or is common in similar systems.

Common mistakes: scoring likelihood based on optimism (“we don’t think users will do that”) rather than observed behavior; ignoring scale (“it’s rare per user” can still be frequent at large volume); and forgetting that new features change likelihood (e.g., adding file upload or web browsing often increases it).

Practical outcome: each risk statement gets a likelihood rating with one sentence of justification (what evidence or reasoning you used). This justification is crucial for later review when the system or controls change.

Section 4.3: Impact: how bad it could be (who, how many, how long)

Impact measures the severity of harm if the event occurs. Impact is not just money. For AI systems, impact often includes unfair outcomes, privacy violations, safety consequences, and erosion of trust. A helpful way to keep impact concrete is to evaluate three dimensions: who is harmed, how many people are affected, and how long the harm persists.

  • Who: vulnerable groups, customers, employees, minors, patients, or the general public. Harm to vulnerable users generally raises impact.
  • How many: single user, a cohort, all users, or downstream partners.
  • How long: temporary annoyance vs. long-lasting consequences (identity theft, denied opportunities, durable records, legal exposure).

Also consider the “blast radius” beyond the direct user: a wrong medical instruction can affect a patient; a leaked dataset can be copied indefinitely; a biased scoring model can change hiring or admissions decisions for months. Some impacts are hard to reverse, which should push the rating upward even if likelihood is uncertain.

A simple impact scale:

  • Low: minor inconvenience, easily reversible, limited to one user, no sensitive data, no meaningful unfairness.
  • Medium: measurable harm (financial, emotional, operational), affects a group, requires support effort or remediation, possible policy breach.
  • High: serious or irreversible harm, safety risk, large-scale privacy exposure, significant discrimination, major legal/regulatory breach, or reputational crisis.

Common mistakes: rating impact based only on “average” users while ignoring worst-affected groups; treating reputational damage as the only serious harm; and confusing impact with likelihood (“it probably won’t happen, so impact is low”). Keep them separate: a rare catastrophic outcome is still high impact.

Practical outcome: each risk statement gets an impact rating plus a brief note on affected stakeholders and why the harm would be hard or easy to reverse.

Section 4.4: Risk matrix: low/medium/high without math-heavy steps

Once you have likelihood and impact, you need a consistent way to translate them into severity so you can prioritize. You do not need decimals or complicated formulas. A simple 3×3 matrix is enough for most beginner programs and keeps debates focused on the drivers rather than the arithmetic.

Use this rule-of-thumb matrix:

  • High severity: High impact with Medium/High likelihood, or Medium impact with High likelihood.
  • Medium severity: Medium impact with Medium likelihood, or High impact with Low likelihood (watch closely).
  • Low severity: Low impact with any likelihood, or Medium impact with Low likelihood.
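If you track risks in a spreadsheet or script, the rule-of-thumb matrix above reduces to a nine-entry lookup. A minimal sketch in Python (names are illustrative, not a prescribed tool):

```python
# Severity lookup for the 3x3 rule-of-thumb matrix in this section.
# Keys are (impact, likelihood); values are the resulting severity.
SEVERITY = {
    ("high",   "high"):   "high",
    ("high",   "medium"): "high",
    ("medium", "high"):   "high",
    ("medium", "medium"): "medium",
    ("high",   "low"):    "medium",  # "watch closely"
    ("medium", "low"):    "low",
    ("low",    "high"):   "low",     # Low impact with any likelihood
    ("low",    "medium"): "low",
    ("low",    "low"):    "low",
}

def severity(impact: str, likelihood: str) -> str:
    """Translate one risk's ratings into a severity level."""
    return SEVERITY[(impact.lower(), likelihood.lower())]

print(severity("High", "Medium"))  # prints "high"
```

Writing the matrix out explicitly, rather than as a formula, keeps the debate about the entries themselves — which is exactly where this section says the discussion belongs.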

Then decide what to address first. With limited time and budget, prioritize by severity, but add two practical “tie-breakers”:

  • Cost-effective fixes first: if a small change (e.g., redact logs, tighten access, add a disclaimer plus a guardrail) reduces multiple risks, do it early.
  • Dependency ordering: fix foundational controls (logging, monitoring, rollback, access control) before fine-tuning prompts, because foundations make everything else safer and easier to verify.

Common mistake: treating the matrix as a one-time classification. In reality, severity changes when you ship new features, change data sources, expand to new users, or add safeguards. Document the version/date of the assessment so you can revisit it.

Practical outcome: a prioritized risk list. At minimum, every High severity risk should have (1) an owner, (2) a near-term mitigation plan, and (3) a target date. Medium risks should have either a planned mitigation or a decision to accept/monitor. Low risks should still be recorded so they don’t disappear.

Section 4.5: Confidence and unknowns (what you don’t know yet)

Beginners often feel pressure to “get the score right.” In practice, what matters is whether you understand your uncertainty. Two teams can assign different ratings and both be reasonable if they document assumptions and unknowns. Add a confidence tag to each risk (High/Medium/Low confidence) and list what information would change the rating.

Low confidence is common when:

  • you lack real usage data (early pilots, internal-only prototypes),
  • you cannot observe failures (no monitoring, no audit logs, no user feedback loops),
  • the model is a vendor black box (limited transparency into training data and safeguards),
  • the system is deployed into a changing environment (policy changes, new adversarial tactics).

Turn unknowns into actions. For each low-confidence item, write a short “learning task” with an owner and due date, such as: run a red-team session for prompt injection, sample 200 outputs for hallucinations in a key workflow, measure error rates by user segment, or verify whether logs store personal data. These tasks are often cheaper than full mitigations and can prevent wasted effort.

Common mistake: using low confidence as a reason to ignore a risk. If potential impact is high, low confidence is itself a warning sign. You may not need to fully fix the issue immediately, but you should at least instrument the system, restrict exposure, or add review gates while you learn.

Practical outcome: your prioritized list now includes not only mitigations, but also “evidence-building” tasks. This makes your AI Risk Register realistic: it reflects what you know, what you don’t, and how you plan to close the gaps.

Section 4.6: When to stop, pause, or escalate (basic decision rules)

Risk assessment is only useful if it changes decisions. You need simple decision rules that tell the team when it’s safe to proceed, when to pause a release, and when to escalate to legal, security, privacy, or leadership. These rules prevent “analysis paralysis” on one end and reckless shipping on the other.

Use basic escalation triggers tied to severity and confidence:

  • Stop/Block release: any High severity risk with no mitigation plan, or any High impact risk with Low confidence and high exposure (public launch, sensitive users).
  • Pause and reduce scope: Medium/High risks where you can temporarily limit features (disable file upload, restrict to internal users, require human review) while you gather evidence.
  • Escalate for review: risks involving regulated data (health, finance, children), potential discrimination in consequential decisions, security vulnerabilities, or contractual compliance requirements.
  • Proceed with monitoring: Low severity risks, or Medium risks with clear controls, owners, and monitoring metrics (plus a rollback plan).
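These triggers can be written down as an ordered set of rules, strictest first. The sketch below deliberately oversimplifies (real reviews need more context than five inputs, and it uses severity as a stand-in for the "High impact" test), purely to show that the triggers compose into one deterministic decision:

```python
def release_decision(severity: str, confidence: str,
                     has_mitigation_plan: bool, high_exposure: bool,
                     touches_regulated_domain: bool) -> str:
    """Apply the escalation triggers in order; the strictest matching rule wins."""
    if severity == "high" and not has_mitigation_plan:
        return "stop"                    # High severity with no mitigation plan
    if severity == "high" and confidence == "low" and high_exposure:
        return "stop"                    # low confidence plus public/sensitive exposure
    if touches_regulated_domain:
        return "escalate"                # health, finance, children, discrimination...
    if severity in ("medium", "high"):
        return "pause-and-reduce-scope"  # limit features while gathering evidence
    return "proceed-with-monitoring"     # low severity, with owners and metrics

print(release_decision("high", "medium", False, False, False))  # prints "stop"
```

The point of encoding the rules is not automation for its own sake: it forces the team to agree, in writing, on which rule wins when several apply.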

Decide what “done for now” means. A mitigation is not complete because it was discussed; it’s complete when it is implemented, verified, and owned. Verification can be lightweight: a checklist item, a log review, a small test set, or a documented walkthrough. The key is to make it repeatable.

Common mistake: escalating too late, after the system is widely used. Build escalation into your process early: if a risk touches privacy, security, or discrimination, bring the right experts in while the design is still flexible.

Checkpoint outcome: you should now be able to produce a prioritized risk list (top 5–10) with likelihood, impact, severity, confidence, owners, and near-term next steps. This list becomes the backbone of your risk register and the starting point for Chapter 5’s documentation and assignment of actions.

Chapter milestones
  • Turn risk ideas into testable risk statements
  • Score likelihood and impact using a simple scale
  • Decide severity levels and escalation triggers
  • Choose what to address first with limited time and budget
  • Checkpoint: build a prioritized risk list
Chapter quiz

1. What is the main purpose of assessing and prioritizing AI risks in this chapter’s workflow?

Show answer
Correct answer: To decide what matters most so the team can take specific next steps to reduce harm with limited time and budget
Risk assessment bridges vague concerns to concrete actions, helping teams choose what to address first under constraints.

2. Why does the chapter emphasize turning risk ideas into testable risk statements?

Show answer
Correct answer: So someone can verify the risk and act on it, rather than keeping it as a vague concern
Testable statements make risks clear enough to check and to drive mitigations.

3. Which best describes how likelihood × impact should be used according to the chapter?

Show answer
Correct answer: As a shared language to support decisions and document why mitigations were chosen
The chapter warns it is not a prediction engine; it’s a practical, communicable decision tool.

4. What additional note should accompany simple likelihood/impact scores to keep the assessment grounded?

Show answer
Correct answer: Confidence notes about how sure you are in the scoring
The method pairs low/medium/high scoring with confidence notes to reflect uncertainty and imperfect information.

5. What is the key output of Chapter 4’s workflow?

Show answer
Correct answer: A prioritized risk list that can be copied into an AI Risk Register with owners and due dates
The chapter’s outcome is a short, prioritized list aligned on severity, timing, and escalation triggers.

Chapter 5: Document the Work (Risk Register + Evidence Pack)

Risk work that lives only in people’s heads does not scale. Teams change, models change, and assumptions quietly expire. Documentation is how you keep risk thinking attached to the system as it evolves—so you can show what you knew, what you decided, what you tested, and what you changed.

This chapter focuses on a practical “minimum viable” documentation pack: a simple AI Risk Register plus a lightweight Evidence Pack. The Risk Register is where you track what could go wrong, how bad it would be, and who is responsible for reducing it. The Evidence Pack is the supporting material—tests, reviews, notes, screenshots, and decision records—that proves the work happened.

The goal is not paperwork for its own sake. Good documentation reduces rework, accelerates approvals, and prevents repeated debates. It also makes it easier to respond when something goes wrong: you can find the owner, the last decision, and the last known test results quickly.

As you read, keep one guiding question in mind: “If I left the project tomorrow, could someone else understand the current risk posture and continue the work safely?” If the answer is no, your documentation pack is not done yet.

Practice note (applies to every milestone in this chapter, from creating the risk register through assembling the documentation pack): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
  • Section 5.1: Risk register fields (ID, description, owner, status)
  • Section 5.2: Controls vs tasks: what changes vs what you do
  • Section 5.3: Evidence basics: what counts as proof
  • Section 5.4: Model/data summaries (plain-language model card style)
  • Section 5.5: Incident and issue tracking (what to record and why)
  • Section 5.6: Keeping documents alive: versioning and review cadence

Section 5.1: Risk register fields (ID, description, owner, status)

A beginner-friendly AI Risk Register is a table that anyone on the team can read without needing specialized governance knowledge. Keep it short, consistent, and searchable. The register is not a brainstorm list—it is a tracking tool that turns “we should” into “we did,” with accountability.

Start with four required fields and only add more when you feel the pain of not having them:

  • ID: A unique identifier like R-001, R-002. IDs prevent confusion when rows are re-ordered or duplicated across documents.
  • Description: One clear sentence describing the harm and the mechanism. Example: “Customer support chatbot may reveal personal data from prior conversations when prompted with partial identifiers.” Avoid vague entries like “privacy risk.”
  • Owner: A specific person (not a team) responsible for driving the mitigation to completion. They can delegate tasks, but they remain accountable.
  • Status: A small set of states such as Open, In progress, Blocked, Mitigated, Accepted. Add a short note when changing status.

In practice, you will also want a few supporting fields for prioritization and clarity: likelihood, impact, risk score, target date (due date), and a link to evidence. But keep the “front door” simple so people actually use it.

Engineering judgment matters most in the description. Write it so a non-expert can picture the failure mode, the affected users, and the consequence. A common mistake is to document only the outcome (“bias”) without the pathway (“training data under-represents group X, causing lower approval rates”). When the pathway is captured, mitigations become obvious and testable.

Finally, treat the register as a living queue: every open risk must have an owner and next step. Rows without owners are wishes, not plans.
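A register this simple fits in any spreadsheet, but even a short script can enforce the "every open risk has an owner" rule automatically. A sketch with made-up rows (the names and descriptions are illustrative):

```python
# A minimal register: one dict per row, using the four required fields.
register = [
    {"id": "R-001",
     "description": "Chatbot may reveal personal data from prior conversations",
     "owner": "A. Chen", "status": "Open"},
    {"id": "R-002",
     "description": "Model fabricates policy instructions when unconstrained",
     "owner": "", "status": "In progress"},
    {"id": "R-003",
     "description": "Logs retain raw prompts beyond the retention policy",
     "owner": "B. Osei", "status": "Mitigated"},
]

def rows_missing_owner(rows):
    """Open risks without a named owner are wishes, not plans: flag them."""
    open_states = {"Open", "In progress", "Blocked"}
    return [r["id"] for r in rows
            if r["status"] in open_states and not r["owner"].strip()]

print(rows_missing_owner(register))  # prints ['R-002']
```

A check like this can run before every review meeting, so the meeting starts from the gaps rather than rediscovering them.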

Section 5.2: Controls vs tasks: what changes vs what you do

Many teams confuse “controls” with “tasks.” A task is an action you take (run a test, review a dataset, update a prompt). A control is the durable mechanism that reduces risk (rate limiting, access control, redaction, human review, monitoring). Tasks create or verify controls, but controls are what actually change the system’s risk profile.

This distinction matters because tasks can be “done” while risk remains unchanged. For example, “Hold a fairness meeting” is a task; it does not reduce unfair outcomes by itself. “Implement group-based performance monitoring with alert thresholds” is a control; it changes detection and response capability.

When you add mitigations to a risk register, write them in control language and then break them into tasks. A useful pattern is:

  • Control statement: “User inputs containing secrets are detected and blocked before reaching the model.”
  • Implementation tasks: Add secret-scanning regex + classifier, blocklist response template, integration tests.
  • Verification tasks: Red-team prompts, log sampling, periodic audits.

Assign owners and due dates at the task level, but track whether the control is actually in place and effective. If a control depends on ongoing behavior (like weekly review), document that cadence as part of the control, not as a one-time task.

A common mistake is to list mitigations that are not feasible in the real operating environment. For example, “human review of all outputs” may be impossible at scale; the better control might be “human review for high-risk intents” plus robust refusal behavior elsewhere. Document these trade-offs explicitly so you can defend why you chose a scalable control.

Practical outcome: your register becomes a map from risk → control → tasks → evidence, which is exactly what reviewers and future maintainers need.

Section 5.3: Evidence basics: what counts as proof

An Evidence Pack is not a giant folder of random screenshots. It is a curated set of artifacts that prove key claims: “We tested X,” “We reviewed Y,” “We made decision Z,” and “We implemented control C.” The goal is traceability: a risk row should link to the evidence that supports its status.

Evidence can be lightweight and still credible. Useful categories include:

  • Test results: evaluation reports, metrics dashboards, bias slices, prompt-attack test logs, regression test outputs.
  • Reviews and approvals: privacy review notes, security sign-off, architecture review outcomes, legal guidance summaries.
  • User feedback notes: support ticket summaries, usability sessions, complaint themes, “known confusion” notes.
  • Operational artifacts: monitoring alerts configuration, runbooks, incident response playbooks, access-control policy exports.

Good evidence has four properties: it is dated, attributable (who produced or approved it), tied to a specific system version, and easy to interpret. A one-line statement like “tested and looks good” fails on all four.

Don’t over-optimize. For beginners, a simple approach is to create an “evidence index” document with links and short descriptions: “E-014: Prompt injection test suite results v0.3 (run on build 1.8.2, 2026-03-10).” Then, in the risk register, link to E-014 in the “proof” column.

Common mistakes: collecting evidence that is not connected to a risk, storing files without version context, and failing to capture negative results. If a test failed and you fixed it, include both the failing and passing runs; this shows progress and prevents repeating the same bug later.

Practical outcome: when someone asks “How do we know this risk is mitigated?”, you can answer with a link, not a meeting.
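The evidence index described above is essentially a lookup table, and the register's "proof" column is a set of keys into it. A sketch using the E-014 example from this section (field names are illustrative):

```python
# Evidence index: ID -> dated, attributable, version-tied artifact.
evidence_index = {
    "E-014": {"description": "Prompt injection test suite results v0.3",
              "system_version": "1.8.2",
              "date": "2026-03-10",
              "produced_by": "security review"},  # attributable: who produced it
}

def unsupported_mitigations(register_rows, index):
    """Return IDs of 'Mitigated' rows whose proof links resolve to nothing."""
    return [row["id"] for row in register_rows
            if row["status"] == "Mitigated"
            and not any(e in index for e in row.get("proof", []))]

rows = [
    {"id": "R-003", "status": "Mitigated", "proof": ["E-014"]},  # supported
    {"id": "R-007", "status": "Mitigated", "proof": []},         # no evidence!
]
print(unsupported_mitigations(rows, evidence_index))  # prints ['R-007']
```

This is the traceability goal in executable form: "Mitigated" without a resolvable evidence link is a claim, not proof.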

Section 5.4: Model/data summaries (plain-language model card style)

Your documentation pack needs a plain-language summary of the model and its data—similar in spirit to a model card, but lighter. This is not marketing content. It is operational clarity: what the system is, what it is for, and what it should not be used for.

A practical template that fits on one page includes:

  • Purpose and users: Who uses it and for what decisions or actions.
  • Inputs and outputs: What data goes in (including sensitive fields) and what comes out (including confidence scores, explanations, or free text).
  • Model basics: Model type (rules/ML/LLM), provider or training origin, and deployment setting (API, on-device, internal tool).
  • Known limitations: Where it performs poorly, where it may hallucinate, unsupported languages, edge cases.
  • Safety boundaries: Disallowed use cases (e.g., medical diagnosis), refusal behavior, and escalation paths.
  • Data notes: Data sources, collection time window, label quality, known skews, retention rules, and whether personal data is included.

Write for a smart non-specialist. Avoid jargon like “distribution shift” without explanation; instead say, “If user behavior changes (new products, new slang), accuracy may drop and we may need retraining.” Include one or two example inputs/outputs that reflect normal use and one that reflects an unacceptable use case.

This summary connects directly to risk identification. If the system outputs free-form text to end users, you should expect risks like harmful content, fabricated claims, and prompt injection. If the input includes personal data, privacy and retention risks move up the list. The model/data summary becomes the “map” that makes the risk register make sense.

Common mistake: documenting what the model could do instead of what it is allowed to do. Make the intended use explicit; many harms come from capability being mistaken for permission.
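The one-page template can double as a completeness check: before the summary counts as done, every section should be filled in. A sketch with hypothetical section names mirroring the bullets above:

```python
# Required sections of the plain-language model/data summary.
REQUIRED_SECTIONS = [
    "purpose_and_users", "inputs_and_outputs", "model_basics",
    "known_limitations", "safety_boundaries", "data_notes",
]

def missing_sections(summary):
    """List template sections that are absent or left empty."""
    return [s for s in REQUIRED_SECTIONS if not summary.get(s)]

draft = {
    "purpose_and_users": "Internal support tool; agents draft replies.",
    "inputs_and_outputs": "Ticket text in; suggested reply text out.",
    "model_basics": "Vendor LLM accessed via API.",
}
print(missing_sections(draft))  # the three sections still to write
```

An empty result from a check like this is itself a small piece of evidence for the pack: the summary is structurally complete for this version of the system.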

Section 5.5: Incident and issue tracking (what to record and why)

Even with good controls, issues happen: incorrect outputs, biased behavior, data leaks, security probes, or user misuse. Incident and issue tracking is how you learn systematically instead of repeatedly “firefighting.” For beginners, you do not need a complex incident management program—just a consistent record.

Track two related streams:

  • Issues: Bugs, near-misses, or low-severity problems discovered internally or via users (e.g., “model confuses two product names,” “refusal triggers too often”).
  • Incidents: Higher-severity events with real-world impact or policy breach potential (e.g., “PII exposed,” “unsafe advice provided,” “account takeover led to misuse”).

For each issue/incident, record: date/time, reporter, affected system version, user impact, reproduction steps or example prompts, immediate containment action, root cause (when known), and follow-up tasks with owners and due dates. Link the item back to the relevant risk register row; if no row exists, create one. This closes the loop between real-world failures and planned mitigations.

Capture trade-offs in the write-up. For example, tightening content filters may reduce harm but increase false positives that frustrate users. Document what threshold you chose and why, and what monitoring will tell you if it needs adjustment.

Common mistakes: deleting “embarrassing” examples (they are often the most educational), failing to record the exact prompt/context that caused failure, and treating incidents as one-off events instead of signals of a control gap.

Practical outcome: over time you build an evidence trail that demonstrates continuous improvement, not just initial compliance.

Section 5.6: Keeping documents alive: versioning and review cadence

Documentation fails when it becomes stale. The easiest way to keep it alive is to tie updates to events you already have: releases, data refreshes, and incidents. Treat the Risk Register, model/data summary, and decision log as part of the system—not as separate “governance paperwork.”

Use simple versioning rules:

  • System version linkage: Every evidence item and summary should reference the deployed build/model version (or a timestamped configuration snapshot).
  • Change log: Add a short “What changed” section at the top of the model/data summary and decision log.
  • Immutable evidence: Store evidence artifacts in a place where edits are tracked (repo history, ticketing system, or controlled drive).

Set a review cadence that matches risk. For a low-risk internal tool, a monthly or per-release review may be enough; a customer-facing model that affects real decisions might need weekly metric checks plus a formal monthly risk review. The key is to define the cadence in writing and assign an owner for the review itself. “Everyone will keep it updated” usually means no one will.

Your decision log is especially important for keeping context. Record key choices such as selecting a dataset, changing a threshold, enabling a new feature, or accepting a residual risk. Each entry should include: the decision, alternatives considered, trade-offs, approver, and link to supporting evidence. This prevents “decision amnesia” where the same debate repeats every quarter.

Checkpoint: assemble your minimum viable documentation pack. At minimum you should have (1) a readable risk register with owners/dates/status, (2) an evidence index with links, (3) a plain-language model/data summary, and (4) a decision log. If you can hand that pack to a new teammate and they can safely operate the system, your documentation is doing its job.

Chapter milestones
  • Create a simple AI Risk Register that anyone can read
  • Assign owners, due dates, and proof of completion
  • Capture key decisions and trade-offs in a decision log
  • Collect lightweight evidence: tests, reviews, user feedback notes
  • Checkpoint: assemble a “minimum viable” risk documentation pack
Chapter quiz

1. What is the main reason Chapter 5 says documentation is necessary for AI risk work?

Show answer
Correct answer: It keeps risk thinking attached to the system as teams, models, and assumptions change
The chapter emphasizes that risk work in people’s heads doesn’t scale; documentation preserves what was known, decided, tested, and changed as the system evolves.

2. In the chapter’s “minimum viable” documentation pack, what are the two core components?

Show answer
Correct answer: An AI Risk Register and a lightweight Evidence Pack
The chapter defines a practical minimum viable pack as a simple Risk Register plus an Evidence Pack that supports it.

3. Which description best matches the purpose of the AI Risk Register in Chapter 5?

Show answer
Correct answer: A place to track what could go wrong, how severe it would be, and who is responsible for reducing it
The Risk Register is for tracking risks, their potential impact, and ownership for mitigation.

4. Which item is an example of what belongs in the Evidence Pack according to the chapter?

Show answer
Correct answer: Tests, reviews, user feedback notes, and decision records
The Evidence Pack is supporting material that demonstrates the risk work happened (e.g., tests, reviews, notes, screenshots, decision records).

5. The chapter’s guiding question for judging whether documentation is “done” focuses on whether:

Show answer
Correct answer: A new team member could understand the current risk posture and continue the work safely if you left tomorrow
The chapter frames completion as ensuring continuity: someone else should be able to understand the risk posture and proceed safely.

Chapter 6: Reduce Harm and Monitor Over Time

In the earlier chapters you learned how to name harms, estimate likelihood and impact, and write them down in a simple risk register with owners and due dates. This chapter is about doing the work that actually reduces harm: choosing practical controls, setting operating limits, communicating clearly with users, and monitoring the system after release. “Reduce harm” is not a single feature or policy—it is a chain of small decisions that makes bad outcomes less likely, less severe, easier to detect, and faster to recover from.

A common beginner mistake is treating safety as a one-time “launch gate.” Real systems change: user behavior evolves, model behavior drifts, and your product surface area grows. So you need two mindsets at once: (1) prevention and guardrails before launch, and (2) detection and response after launch. Your risk register becomes a living document that links controls to specific risks, clarifies who monitors what, and defines what “good enough” looks like for your first release.

Throughout this chapter, aim for controls you can operate. A control that sounds great but cannot be measured, monitored, or owned is effectively not a control. Prefer lightweight mechanisms: clear warnings and instructions, safe defaults, rate limits, basic logging, and a starter incident process. You can always mature them later.

  • Prevent: reduce the chance a harmful event happens (e.g., data minimization, input validation).
  • Detect: notice issues quickly (e.g., monitoring, user feedback channels, audits).
  • Respond: limit damage and recover (e.g., rollback, disable features, incident playbooks).

By the end of this chapter you should be able to convert your prioritized risks into an actionable “control plan,” add human review where it matters, design user-facing transparency, and complete a simple launch readiness checklist that includes monitoring and incident response.

Practice note: for each milestone in this chapter—picking practical controls (prevent, detect, respond), adding human review and safe operating limits, designing user-facing transparency, setting up monitoring and a starter incident plan, and completing the launch readiness checklist—document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Section 6.1: Risk controls 101: reduce likelihood vs reduce impact

Risk controls are actions you take to change the risk equation. In beginner-friendly terms: you can make a bad thing less likely to happen, or you can make it hurt less when it does happen. Both approaches matter, and the best plans usually mix them.

Reducing likelihood is about prevention. Examples: collecting less sensitive data, blocking unsafe prompts, requiring authentication, limiting model tools, or adding validation checks. Reducing impact is about containment. Examples: showing outputs as “draft,” limiting the decision scope, requiring a human to approve high-stakes actions, or offering an easy appeal path for users.

  • Prevent: prompt and input constraints; least-privilege access; training data screening; secure secrets management.
  • Detect: automated tests; anomaly alerts; sampling and review; user reporting.
  • Respond: kill switch; rollback; output disabling; customer support runbooks.

Connect controls directly to items in your risk register. For each top risk, add 2–4 controls and label which part they affect: likelihood or impact, and prevent/detect/respond. Also add an owner and a measurable check. “Add safety filter” is vague; “Block self-harm instructions using classifier X; alert on >1% blocked requests per day; owner: Safety Eng” is actionable.
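
The "actionable, not vague" rule above can be made concrete as a small record structure with a validity check. This is a sketch; the field names and the example control are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Control:
    """One control linked to a risk register item (illustrative fields)."""
    risk_id: str       # the register entry this control reduces
    action: str        # concrete mechanism, not a vague intention
    affects: str       # "likelihood" or "impact"
    control_type: str  # "prevent", "detect", or "respond"
    owner: str         # named person or role
    check: str         # measurable signal that the control is working

def validate(control: Control) -> list:
    """Return a list of problems; an empty list means the control is operable."""
    problems = []
    if control.affects not in ("likelihood", "impact"):
        problems.append("affects must be likelihood or impact")
    if control.control_type not in ("prevent", "detect", "respond"):
        problems.append("control_type must be prevent/detect/respond")
    if not control.owner:
        problems.append("control needs a named owner")
    if not control.check:
        problems.append("control needs a measurable check")
    return problems

ctrl = Control(
    risk_id="R-07",
    action="Block self-harm instructions with a classifier",
    affects="likelihood",
    control_type="prevent",
    owner="Safety Eng",
    check="Alert if blocked requests exceed 1% per day",
)
```

A control that fails this validity check is exactly the "not a control" case from the chapter introduction: nothing measurable, nothing owned.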

Engineering judgment matters when controls conflict with product goals. Beginners often over-control everything and break usability, or under-control because they fear friction. Use risk-based scope: apply strongest controls to highest impact decisions (e.g., medical, financial, employment, legal) and lighter controls to low-stakes convenience features.

Common mistake: writing controls that are not enforceable. If you rely on a policy (“Users must not…”) without technical or operational enforcement, treat it as a weak control and pair it with stronger ones (rate limits, logging, reviews, or feature restrictions).

Section 6.2: Human-in-the-loop: approvals, overrides, and guardrails

Human review is not a magic shield, but it is a practical control when the cost of a mistake is high or when automated methods are unreliable. The goal is to place humans at the points where they can meaningfully reduce impact: before an action is taken, when a decision is ambiguous, or when the system is outside its safe operating limits.

Start by defining when human approval is required. Good triggers include: high-stakes categories (credit decisions, hiring recommendations), low confidence scores, novel user segments, policy-sensitive content, or any request that would perform an irreversible action (sending an email, approving a refund, changing account access).

  • Approvals: the model drafts; a reviewer confirms before sending/acting.
  • Overrides: reviewers can correct outputs and those corrections are stored for later analysis.
  • Guardrails: hard limits the model cannot cross (e.g., cannot submit payments; cannot access raw PII).
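
The approval triggers described above can be sketched as a simple routing check that runs before any action is taken. The category names, the confidence threshold, and the action names here are example assumptions, not fixed values:

```python
# Illustrative trigger sets; tune these to your own system and stakes.
HIGH_STAKES = {"credit", "hiring", "medical", "legal"}
IRREVERSIBLE = {"send_email", "approve_refund", "change_account_access"}

def requires_human_review(category: str, confidence: float, action: str) -> bool:
    """True if this request must wait for a human before anything happens."""
    if category in HIGH_STAKES:
        return True   # high-stakes decisions always get review
    if confidence < 0.7:
        return True   # low model confidence -> escalate
    if action in IRREVERSIBLE:
        return True   # irreversible actions need approval first
    return False
```

The useful property of a routing function like this is that the triggers are written down in one place, so they can be reviewed, tested, and tightened after incidents.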

Design the workflow so humans can do the job well. Provide the right context (input, retrieved sources, model rationale if available), a clear decision to make, and an audit trail. A common mistake is creating “checkbox approvals” where reviewers lack time or information, leading to rubber-stamping. If you cannot staff real review, tighten the model’s operating scope instead.

Also define safe operating limits. Examples: only answer from a specific knowledge base; only generate summaries for documents under N pages; only operate in certain regions; only allow certain languages until tested; block medical or legal advice entirely. Limits reduce likelihood by preventing the model from entering failure-prone zones.
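
Operating limits like these are easiest to enforce as a single guard that refuses anything outside the tested envelope. A minimal sketch, with example limit values that you would replace with your own:

```python
# Illustrative safe-operating limits; all values are examples, not recommendations.
MAX_PAGES = 50
ALLOWED_LANGS = {"en"}
BLOCKED_TOPICS = {"medical_advice", "legal_advice"}

def within_limits(pages: int, lang: str, topic: str) -> tuple:
    """Return (allowed, reason). Refuse anything outside the tested envelope."""
    if pages > MAX_PAGES:
        return False, "document too long for summarization"
    if lang not in ALLOWED_LANGS:
        return False, "language not yet tested"
    if topic in BLOCKED_TOPICS:
        return False, "topic blocked by policy"
    return True, "ok"
```

Returning a reason string matters: it lets the product show the user an honest refusal message instead of a silent failure.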

Finally, be explicit about accountability. If a human approves, who is responsible for the outcome? Write it in your decision log and your risk register owner field. “The AI did it” is never an acceptable explanation in real operations.

Section 6.3: Data controls: minimization, retention, and access

Many AI harms are data harms: privacy violations, leakage of sensitive information, biased or unrepresentative datasets, and unauthorized access. Data controls are often your highest-leverage risk reductions because they reduce both likelihood (fewer chances to leak) and impact (less sensitive data exposed if something goes wrong).

Minimization means collecting only what you need. Ask: can we complete the task without names, exact addresses, or full free-text? Can we replace raw identifiers with tokens? Can we process on-device or in-memory without storing? Beginners often keep “just in case” data; that becomes a liability. If you are not using a field in a product requirement, remove it.

Retention means keeping data only as long as necessary. Set default retention periods for logs, prompts, and outputs. A practical approach is tiered retention: short retention for raw content (days/weeks), longer retention for aggregated metrics (months), and strict exceptions with approval. Make deletion real: define where the data lives (app logs, analytics tools, vendor systems) and how deletion requests propagate.
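
The tiered retention idea above can be sketched as a small policy table plus a deletion rule. The periods below are example values, not recommendations:

```python
# Tiered retention sketch; periods are illustrative, not recommendations.
RETENTION_DAYS = {
    "raw": 14,         # prompts and outputs: short retention
    "aggregate": 180,  # aggregated metrics: longer retention
}

def should_delete(tier: str, age_days: int, has_exception: bool = False) -> bool:
    """True if the record is past retention and holds no approved exception."""
    if has_exception:
        return False  # exceptions need explicit approval and their own review date
    return age_days > RETENTION_DAYS[tier]
```

The exception flag is deliberately explicit: "keep it longer" should always be a recorded decision with an approver, never a default.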

  • Access control: least privilege; separate roles for engineers, analysts, and support.
  • Encryption: in transit and at rest; rotate keys; do not hardcode secrets.
  • Vendor boundaries: know what your model provider stores; configure opt-outs where available.

Connect these to user-facing consent and transparency. If you use user inputs for product improvement or model training, say so clearly and offer meaningful choices. “By using this, you agree…” buried in legal text is not user-centered transparency. Good transparency is timely: show it at the moment the user is about to share data.

Common mistake: forgetting derived data. Even if you remove direct identifiers, embeddings, conversation histories, and analytics events can still be sensitive. Document these in your data notes template and treat them as part of your data inventory.

Section 6.4: Product controls: rate limits, content filters, and fallback

Product controls are the practical mechanisms that shape how the AI behaves in the real world. They are often easier to ship than model changes, and they are essential for misuse resistance and reliability. Think of them as “safe operating mechanics”: they control volume, scope, and failure behavior.

Rate limits reduce abuse and contain blast radius. Apply limits per user, per IP, and per organization, and consider separate limits for expensive or risky capabilities (tool use, file upload, code execution). Add backoff and clear error messages so legitimate users understand what happened. A common mistake is shipping one global limit; attackers will distribute requests across accounts.
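
A minimal fixed-window rate limiter illustrates the per-key and per-capability limits described above. This is a sketch with example limit values; production systems would add persistence and cleanup of old windows:

```python
from collections import defaultdict

class RateLimiter:
    """Fixed-window limiter keyed per user/IP/org. Limits are example values."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        window_id = int(now // self.window)  # which time window this call falls in
        bucket = (key, window_id)
        if self.counts[bucket] >= self.limit:
            return False  # over limit: surface a clear error message upstream
        self.counts[bucket] += 1
        return True

# Separate, tighter limits for risky or expensive capabilities such as tool use.
chat_limiter = RateLimiter(limit=60)
tool_limiter = RateLimiter(limit=5)
```

Keeping separate limiter instances per capability is the code-level version of the advice above: one global limit is not enough.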

Content filters can block disallowed requests or outputs (self-harm instructions, explicit hate, doxxing). Use them as a layer, not a guarantee. Pair automated filters with: category-specific warnings, constrained modes (e.g., “general information only”), and escalation paths to humans for borderline cases.

  • Safe defaults: start in the most conservative mode; require opt-in for higher-risk features.
  • Fallback behavior: when uncertain, degrade gracefully (refuse, ask clarifying questions, route to human support, or provide non-personalized generic guidance).
  • Transparency UX: label outputs as AI-generated, show limitations, and provide “how to use safely” instructions.
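
The fallback behavior in the list above can be sketched as a single decision point that degrades gracefully instead of guessing. The confidence thresholds and response labels are illustrative assumptions:

```python
# Graceful-degradation sketch: when uncertain, fall back rather than guess.
def respond(answer: str, confidence: float, high_stakes: bool) -> str:
    if high_stakes and confidence < 0.9:
        return "ROUTE_TO_HUMAN"           # risky and uncertain: human support
    if confidence < 0.5:
        return "ASK_CLARIFYING_QUESTION"  # uncertain: gather more context
    return answer                          # confident enough: answer directly
```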

User-facing transparency is a control when it changes behavior. Good warnings are specific: “This tool can be wrong; verify with official sources before submitting tax forms.” Good instructions reduce misuse: “Do not enter passwords or personal health details.” Consent prompts should be understandable and aligned with data practices described in your data notes.

Common mistake: relying on one control. For example, a single prompt instruction (“don’t do harmful things”) is not a control by itself. Layer: input checks + output filters + rate limits + logging + fallback + human escalation for high-stakes categories.

Section 6.5: Monitoring: drift, feedback loops, and red flags

Monitoring turns safety from a promise into an operational practice. You are watching for changes in the system, its environment, and its users that increase risk. The most important beginner step is to pick a small set of signals you can actually review weekly.

Drift is when inputs or outputs shift over time. Inputs drift when user prompts change (new slang, new use cases, seasonal spikes). Outputs drift when the model version changes, retrieval sources update, or prompts are tweaked. Monitor basic distributions: topic categories, language, length, refusal rates, and tool-call frequency. For a customer-support assistant, also monitor resolution rate and escalation rate.

Feedback loops happen when the system’s outputs influence its future inputs or the world around it. Examples: a recommender amplifies extreme content because it boosts engagement; an HR screener filters candidates and changes who applies; a fraud model changes attacker strategies. Watch for second-order effects: are certain groups dropping out, complaining more, or being disproportionately refused?

  • Red flags: spikes in complaints; sudden increase in blocked content; repeated hallucinations about a specific topic; unusual access patterns; higher error rates for a user segment.
  • Channels: in-product “report” button; support ticket tags; internal sampling review; security alerts.
  • Metrics with owners: each metric needs a reviewer, a threshold, and a response action.

Design monitoring with privacy in mind. You often do not need to store full raw prompts to detect issues; store hashed identifiers, categories, and minimal excerpts with strict access. Document what you log and why in your data notes, and link it back to the risks it helps detect.

Common mistake: collecting dashboards without decisions. For every monitored signal, write down: “If it crosses threshold X, we will do Y within Z hours.” That converts monitoring into an actual control.
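
That "threshold X, action Y, within Z hours" rule can be written directly into your monitoring config. A sketch, with example metric names, thresholds, and owners:

```python
from dataclasses import dataclass

@dataclass
class Signal:
    """A monitored metric with a decision attached (illustrative fields)."""
    name: str
    threshold: float
    owner: str
    action: str        # what we do if the threshold is crossed
    respond_hours: int # how quickly we commit to doing it

def triggered(signals: list, readings: dict) -> list:
    """Return the signals whose latest reading crossed the threshold."""
    return [s for s in signals if readings.get(s.name, 0) > s.threshold]

signals = [
    Signal("complaint_rate", 0.02, "Support Lead", "sample 20 tickets, file fixes", 24),
    Signal("refusal_rate", 0.10, "Safety Eng", "review blocked prompts", 48),
]
due = triggered(signals, {"complaint_rate": 0.05, "refusal_rate": 0.04})
```

Each entry encodes a decision, not just a dashboard line: who reacts, what they do, and how fast.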

Section 6.6: Incident basics: report, triage, fix, and learn

Even with strong controls, incidents happen: private data appears in outputs, the model gives dangerous instructions, a jailbreak spreads, or a tool integration performs an unintended action. A starter incident plan prevents panic and reduces impact. Keep it lightweight, but make it real.

Report: define how incidents are raised. Provide an internal channel (ticket queue or on-call alias) and a user channel (report button or support form). Require a minimal report format: what happened, when, user impact, screenshots/log IDs, and severity guess. Do not depend on “someone noticing in Slack.”
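
The minimal report format above can be enforced with a completeness check so triage never starts on a half-empty report. Field names here are illustrative paraphrases of the list above:

```python
# Required incident-report fields (illustrative names based on the format above).
REQUIRED = ("what_happened", "when", "user_impact", "evidence", "severity_guess")

def missing_fields(report: dict) -> list:
    """Return required fields that are absent or empty; [] means triage can start."""
    return [f for f in REQUIRED if not report.get(f)]
```

Wiring a check like this into the intake form or ticket template is cheap, and it keeps the "someone noticed in Slack" failure mode out of your incident pipeline.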

Triage: decide severity and immediate containment. Typical questions: Is anyone at immediate risk? Is sensitive data exposed? Is misuse ongoing? Can we disable a feature flag, tighten filters, revoke keys, or rate-limit? Assign an incident lead and record actions in a decision log for later learning.

  • Fix: patch prompts, filters, retrieval sources, permissions, or UI flows; add tests to prevent recurrence.
  • Communicate: inform affected users when appropriate; coordinate with legal/security for breaches.
  • Learn: run a blameless post-incident review; update the risk register, model card summary, and controls.

A practical “launch readiness checklist” ties this together. Before launch, confirm: top risks have controls with owners; human review triggers are implemented; safe operating limits are documented; user-facing warnings/instructions/consent are in the UI; logging and monitoring have thresholds and responders; a kill switch exists; and the incident process has named contacts. If any of these are missing for a high-impact risk, you are not “almost ready”—you are choosing to accept that risk. Make that choice explicit in the risk register, with leadership sign-off and a due date to revisit.
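
The checklist above can be turned into a mechanical readiness gate. The item names below are paraphrases of the checklist, chosen for illustration:

```python
# Launch readiness items, paraphrased from the checklist above.
CHECKLIST = (
    "controls_with_owners",
    "human_review_triggers",
    "operating_limits_documented",
    "user_warnings_in_ui",
    "monitoring_with_responders",
    "kill_switch",
    "incident_contacts",
)

def readiness_gaps(done: set) -> list:
    """Items still missing. Shipping with gaps is an explicit risk-acceptance decision."""
    return [item for item in CHECKLIST if item not in done]
```

An empty gap list does not mean zero risk; it means every known gap was either closed or explicitly accepted in the risk register with sign-off.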

Common mistake: treating incidents as failures to hide. In safety work, incidents are also signals that your controls and assumptions need updating. The fastest teams to learn are the ones that document, fix, and feed improvements back into the system design.

Chapter milestones
  • Pick practical risk controls: prevent, detect, respond
  • Add human review and safe operating limits where needed
  • Design user-facing transparency: warnings, instructions, and consent
  • Set up monitoring and an incident response starter plan
  • Final checkpoint: complete a launch readiness checklist
Chapter quiz

1. Which approach best matches the chapter’s definition of “reduce harm” for an AI system?

Show answer
Correct answer: A chain of practical decisions that make bad outcomes less likely, less severe, easier to detect, and faster to recover from
The chapter emphasizes harm reduction as an operational chain of small, practical controls across prevention, detection, and response.

2. Why does the chapter warn against treating safety as a one-time “launch gate”?

Show answer
Correct answer: Because real systems change over time, so you need both pre-launch guardrails and post-launch detection/response
User behavior evolves, models drift, and product surface area grows, so monitoring and response after release are required.

3. Which set correctly matches the chapter’s prevent/detect/respond control types to their purpose?

Show answer
Correct answer: Prevent reduces the chance of harm, Detect notices issues quickly, Respond limits damage and helps recovery
The chapter defines each control type by its role: reduce likelihood, find problems fast, and recover/limit impact.

4. According to the chapter, what makes a proposed control effectively “not a control”?

Show answer
Correct answer: It sounds good but cannot be measured, monitored, or owned
Controls must be operable—measurable, monitorable, and owned—or they won’t reliably reduce harm.

5. Which action best reflects the chapter’s recommendation for early-stage harm reduction and operations readiness?

Show answer
Correct answer: Prefer lightweight, operable mechanisms (safe defaults, rate limits, basic logging) and include monitoring plus a starter incident response plan in launch readiness
The chapter advises starting with controls you can operate now and ensuring monitoring and incident response are part of launch readiness.