AI Ethics, Safety & Governance — Beginner
Learn a simple, repeatable process to spot and reduce AI harm.
AI tools can help people work faster, but they can also cause harm: private data can leak, people can be treated unfairly, and confident-looking outputs can be wrong. If you are new to AI, this can feel overwhelming—especially when you’re asked to “do a risk assessment” without clear steps.
This beginner course is a short, book-style guide that teaches AI risk from first principles. You will learn a simple workflow to assess risks, document what you found, and reduce harm in practical ways. You do not need any coding, math, or data science background. Everything is explained in plain language and backed by real-world examples.
By the final chapter, you will be able to describe an AI system at a high level, list its most likely harms, prioritize what matters most, and capture your work in a clear set of documents that others can review. You will also learn how to choose realistic risk controls—like human review, operating limits, user disclosures, monitoring, and a basic incident plan—so your risk work leads to action.
The course progresses like a short technical book. Chapter 1 builds the basic vocabulary: what AI is, what risk is, and how harm shows up in the real world. Chapter 2 teaches you to “map the system,” because you can’t assess risk if you don’t know what the system does and who it affects. Chapter 3 gives you a practical way to spot harms without needing advanced technical knowledge.
Once you can list potential harms, Chapter 4 shows you how to prioritize them using a simple scoring method so you know what to tackle first. Chapter 5 turns your assessment into useful documentation: a risk register, decision log, and lightweight evidence that supports your conclusions. Finally, Chapter 6 focuses on reducing harm and staying safe over time with controls, monitoring, and a basic incident response approach.
This course is designed for absolute beginners: students, individual professionals, managers, policy staff, procurement teams, and anyone who needs to understand AI risks without becoming an engineer. It is also useful for small organizations that need a clear, lightweight process before using AI tools in customer-facing or employee-facing work.
If you want a simple, repeatable way to handle AI risk—without jargon—enroll and begin building your first risk documentation pack. Register free or browse all courses to find related beginner topics.
AI Governance Lead and Risk Educator
Sofia Chen helps teams adopt AI responsibly by turning complex safety and governance topics into practical, beginner-friendly steps. She has supported risk reviews for AI features in customer service, hiring support, and content tools, focusing on documentation, testing, and clear decision-making.
“AI risk” sounds abstract until you connect it to ordinary work: a support chatbot that gives the wrong refund policy, a résumé screener that quietly filters out qualified candidates, or a medical note summarizer that omits an allergy. In each case, the system may be built with good intentions, but it can still cause harm. This chapter builds a plain-language foundation you can reuse across projects: what AI is (and isn’t), what “risk” means, how harm happens, who is affected, and how to write your first risk statement.
A key idea for beginners: risk work is not about proving a system is “safe” forever. It is about being able to explain the system, anticipate plausible failures and misuse, prioritize what matters most, and document decisions so improvements are repeatable. By the end of this chapter, you should be able to describe one AI risk scenario in your own words using a one-sentence template, which becomes the seed for a risk register later in the course.
We will treat risk as a practical workflow, not a philosophical debate: map the system at a high level (purpose, users, data in/out, decisions), identify common harm types (privacy, bias, errors, security, misuse), estimate likelihood and impact, assign owners, and track fixes with due dates. This is engineering judgment applied with humility: you won’t know everything, but you can know enough to reduce harm.
As you read, keep one system in mind—something you use at work or in daily life. You will use it to practice writing a clear risk statement at the end of the chapter.
Practice note for Define AI, automation, and “risk” using everyday examples: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Understand why AI can cause harm even with good intentions: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Learn the difference between mistakes, harm, and responsibility: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Set the course goal: a repeatable beginner risk workflow: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Checkpoint: describe one AI risk scenario in your own words: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
In plain language, an AI system takes inputs (text, images, clicks, sensor readings, database records), finds patterns based on past examples, and produces outputs (a label, score, ranking, recommendation, or generated content). A spam filter labels email; a credit model outputs a risk score; a routing system recommends a driver path; a generative assistant drafts text. The “intelligence” is mostly pattern-fitting: the system learns correlations that help it predict or generate outputs that look like training examples.
It helps to separate AI from automation. Automation means a process runs with limited human involvement (for example, automatically rejecting an application below a threshold). AI is one possible component inside automation. You can have automation with no AI (simple rules), and you can have AI with no automation (a tool that suggests but a human decides). Many real systems combine both: an AI score triggers an automated action unless a human intervenes.
For risk work, always map the system as a simple pipeline: (1) purpose, (2) users, (3) input data sources, (4) model behavior, (5) output format, (6) decision point, and (7) feedback loop (what gets logged or learned). This mapping prevents a common beginner mistake: focusing only on the model and forgetting the surrounding product. Most harms happen at the seams—where data is collected, how outputs are shown, and how people act on them.
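The seven-part pipeline above can be captured as a simple checklist so you notice what you have not yet documented. This is an illustrative sketch, not a required format; the field names and the example support-bot entries are my own assumptions.

```python
# A minimal sketch of the seven-part pipeline map from this section.
# Field names and example values are illustrative, not a required format.
PIPELINE_FIELDS = [
    "purpose", "users", "input_data_sources", "model_behavior",
    "output_format", "decision_point", "feedback_loop",
]

def missing_fields(system_map):
    """Return the pipeline fields that have not been filled in yet."""
    return [f for f in PIPELINE_FIELDS if not system_map.get(f)]

support_bot = {
    "purpose": "Summarize support chats so agents respond faster",
    "users": "Support agents",
    "input_data_sources": "Chat transcripts, customer profile",
    "model_behavior": "Generates a short text summary",
    "output_format": "Free text shown in agent console",
    # decision_point and feedback_loop not yet documented
}

print(missing_fields(support_bot))  # → ['decision_point', 'feedback_loop']
```

Gaps in the checklist are exactly the "seams" where harms hide: an undocumented decision point or feedback loop is a signal to go back and ask who acts on the output.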
Engineering judgment starts with asking: “If the output is wrong, who will notice, and how fast?” An AI that drafts marketing copy has different risk than one that influences hiring, lending, healthcare, or safety-critical operations. The same model can be low risk in one context and high risk in another.
Risk is not the same as a bug, and it is not the same as harm. A useful beginner definition is: risk = uncertainty × consequences. Uncertainty means you do not fully control what will happen—because the world changes, data is incomplete, people behave unexpectedly, and models can generalize poorly. Consequences mean there is something at stake: money, access, health, rights, trust, or safety.
This is why AI can cause harm even with good intentions. You can design a system to “help,” but if the model is uncertain in some situations (rare cases, new user groups, changed policies), the output can still lead to negative outcomes. A well-meaning fraud model might wrongly block legitimate customers; a content moderation model might silence certain dialects; a summarizer might omit key constraints in a contract draft.
To prioritize, use a simple likelihood × impact method. Likelihood asks: how often could this happen in the real deployment? Impact asks: if it happens, how bad is it for the affected people and the organization? Beginners often over-focus on spectacular but rare scenarios. Good practice is to list several plausible risks, then rank them consistently.
Common mistake: treating likelihood as “model accuracy.” Accuracy is measured on a test set; likelihood is about real life: data drift, user behavior, adversarial inputs, operational failures, and how strongly the system’s output influences decisions. Another mistake is only measuring impact in dollars. Impact should include human outcomes: privacy intrusion, unfair exclusion, reputational damage, or increased vulnerability to fraud.
As you continue the course, this simple scoring becomes the backbone of a beginner risk register: each risk gets a score, an owner, a mitigation plan, and a due date—so risk management becomes a routine practice, not a one-time document.
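The likelihood × impact method above can be sketched in a few lines. The 1-5 scales and the example risks are assumptions for illustration; the point is that ranking by a consistent score tells you what to tackle first.

```python
# Illustrative likelihood x impact scoring for a beginner risk register.
# The 1-5 scales and example risks are assumptions, not prescribed values.
risks = [
    {"name": "Summarizer omits allergy info", "likelihood": 2, "impact": 5},
    {"name": "Chatbot states wrong refund policy", "likelihood": 4, "impact": 3},
    {"name": "Model endpoint outage", "likelihood": 3, "impact": 2},
]

for r in risks:
    r["score"] = r["likelihood"] * r["impact"]

# Rank consistently: highest score first, so attention goes to the top rows.
ranked = sorted(risks, key=lambda r: r["score"], reverse=True)
for r in ranked:
    print(f'{r["score"]:>2}  {r["name"]}')
```

Note how the ranking resists the "spectacular but rare" trap: the frequent, moderate-impact refund-policy error outranks the rarer, higher-impact one once both are scored the same way.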
Think of harm as a chain of events, not a single moment. If you only look for “model mistakes,” you will miss many sources of risk. A practical mental model is the harm chain: where problems enter, how they propagate, and where they become real-world consequences.
Start by locating entry points, the seams where problems can enter the chain: data collection (incomplete, outdated, or unconsented inputs), model behavior (poor generalization to rare cases or new groups), output presentation (overconfident or ambiguous displays), human action (people acting on outputs without the context or authority to question them), and feedback loops (logged outcomes that quietly reinforce earlier mistakes).
Now connect these to the common harm types you will use throughout the course: privacy (exposing or misusing personal data), bias (unequal error rates or unfair outcomes), errors (wrong outputs leading to wrong actions), security (attacks on the model or pipeline), and misuse (harmful applications beyond the intended use). Most real incidents involve more than one type—for example, a security failure can become a privacy breach.
Distinguish mistakes, harm, and responsibility. A mistake is a technical or operational error (wrong label, broken data pipeline). Harm is the negative effect on people or systems (a customer loses access to funds, a candidate is unfairly rejected). Responsibility is about who had the ability to prevent or reduce that harm (product owner who chose automation level, engineering team who lacked monitoring, leadership who set unrealistic timelines). Risk management is the practice of making responsibility explicit and actionable.
Practical outcome: when you later fill a risk register, you will describe where in the harm chain you can intervene—data filters, model constraints, human review, rate limits, logging, monitoring, and user education. Interventions are rarely “just retrain the model.”
AI risk assessment fails when it only considers the intended user. A system can harm users (the people operating it), subjects (the people the system is about), non-users (people indirectly impacted), and society (broader effects like trust, misinformation, or inequality). Mapping stakeholders is not bureaucracy; it is a way to discover risks you would otherwise miss.
Example: a customer service assistant may be used by agents (users) but affects customers (subjects). If it suggests harsher language for certain names or regions, the harm is felt by customers, even if agents “just followed the tool.” A street-scene recognition model might be used by city staff but affects residents and visitors who never consented to being analyzed. A generative image tool might be used for fun but can enable harassment of targeted individuals (non-users).
Practical workflow: for any system map, write a short “who could be harmed” list before you write mitigations. This reduces a common mistake: designing controls that protect the company (e.g., disclaimers) but not the people most impacted (e.g., appeal paths, error correction, privacy choices). Another common mistake is assuming that “human in the loop” automatically solves the problem. Humans can be overloaded, may trust the tool too much, or may not have authority to override. If a human is part of the safety plan, specify what they review, how often, and what happens when they disagree with the model.
Practical outcome: stakeholder mapping will later feed your documentation templates (model card–style summary, data notes, decision log). Those documents should name the affected groups and the intended protections, so risk decisions are visible and auditable.
Beginners often treat safety, ethics, and compliance as interchangeable. They overlap, but they are not the same, and confusing them leads to gaps.
A compliant system can still be harmful (meeting minimum legal requirements is not the same as being responsible). An ethical intent can still produce unsafe outcomes (good goals don’t guarantee robust operation). And a “safe” system in one context can be unethical in another (a perfectly accurate surveillance model can still violate rights).
Use this distinction to make better engineering trade-offs. If a risk is primarily safety, you might invest in monitoring, rollback plans, conservative thresholds, and escalation paths. If it is primarily ethics, you might revisit whether the use case is appropriate, add user choice, or change incentives and metrics. If it is primarily compliance, you might focus on consent flows, data retention, access controls, and documentation (and involve legal early).
Common mistake: using a single control (like a disclaimer) as a universal mitigation. Disclaimers may help with communication, but they rarely reduce likelihood, and they do little for non-users. Stronger mitigations change system behavior or decision pathways: limit automation, add friction for high-risk actions, remove sensitive inputs, or implement auditing.
Practical outcome: in later chapters, your risk register will include a “risk type” tag (safety/ethics/compliance). This helps route work to the right owners and prevents “nobody owns it” problems.
Risk work becomes practical when you can state a risk clearly and consistently. A good beginner risk statement is one sentence that names the system behavior, the affected party, the harm type, and the consequence. Avoid vague language like “the model might be biased.” Biased how? Against whom? Leading to what?
Use this template: "When [users/context], the AI system may [specific behavior], which could cause [harm type] to [affected party], resulting in [consequence]."
Example (errors + safety): "When clinicians use the note summarizer for discharge instructions, the AI system may omit medication allergies, which could cause safety harm to the patient, resulting in inappropriate prescriptions."
Example (privacy + security): "When employees paste customer emails into the chatbot, the AI system may send personal data to an external service, which could create a privacy breach for customers, resulting in regulatory penalties and loss of trust."
Example (bias): "When the screening model ranks applicants, the AI system may systematically score certain groups lower due to historical label bias, which could create unfair exclusion for qualified candidates, resulting in reduced opportunity and legal exposure."
Now write your own scenario in your own words using the template. Keep it specific enough that a teammate could propose a mitigation without asking follow-up questions. If you get stuck, revisit your high-level map from Section 1.1: inputs, outputs, and the decision point. A risk statement usually sits exactly at that decision point.
Practical outcome: your one sentence becomes the first row in an AI Risk Register later in the course, where you will add likelihood (1–5), impact (1–5), owner, mitigation steps, and a due date. That is the course goal: a repeatable workflow that turns concerns into tracked work, not endless discussion.
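To show how the one-sentence template becomes a register row, here is a small sketch. The helper names, owner, mitigation text, and due date are all hypothetical; the fields match the ones this section lists (likelihood and impact on 1-5 scales, owner, mitigation, due date).

```python
def risk_statement(context, behavior, harm, affected, consequence):
    """Render the one-sentence risk template from this chapter."""
    return (f"When {context}, the AI system may {behavior}, "
            f"which could cause {harm} to {affected}, "
            f"resulting in {consequence}.")

def register_row(statement, likelihood, impact, owner, mitigation, due):
    """A first risk-register row: score = likelihood x impact, both 1-5."""
    assert 1 <= likelihood <= 5 and 1 <= impact <= 5
    return {"risk": statement, "likelihood": likelihood, "impact": impact,
            "score": likelihood * impact, "owner": owner,
            "mitigation": mitigation, "due": due}

# Hypothetical example values for illustration only.
row = register_row(
    risk_statement("employees paste customer emails into the chatbot",
                   "send personal data to an external service",
                   "a privacy breach", "customers",
                   "regulatory penalties and loss of trust"),
    likelihood=3, impact=4, owner="Privacy lead",
    mitigation="Redact emails before paste; add a data-loss filter",
    due="next quarter")
print(row["score"])  # → 12
```

A row like this is the unit of the workflow: one sentence, one score, one owner, one deadline, so the concern becomes tracked work rather than discussion.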
1. Which best matches the chapter’s plain-language definition of AI?
2. According to the chapter, why can an AI system still cause harm even if it was built with good intentions?
3. Which statement best describes the chapter’s view of “risk work” for beginners?
4. In the chapter’s framing, what is the key difference between mistakes, harm, and responsibility?
5. Which sequence best matches the chapter’s suggested beginner risk workflow?
Before you can reduce AI risk, you need a shared, concrete picture of what the “system” is. Beginners often jump straight to the model (e.g., “we use GPT-4” or “we trained XGBoost”), but harms usually emerge from the full workflow: data collection, feature creation, prompts, model behavior, business rules, user interfaces, human handoffs, and how decisions are acted on.
This chapter teaches you to map an AI system at a high level so you can assess risk with engineering judgment rather than guesswork. You will write a clear purpose statement and success criteria, list stakeholders and high-risk user groups, trace the end-to-end workflow from data to decision, and mark where humans interact (handoffs and overrides). By the end, you should be able to complete a one-page system map that makes later risk prioritization faster and more accurate.
As you read, keep a simple principle in mind: if you cannot describe the system in one page, you probably cannot control its risks. A good map is not about perfection; it is about clarity. You are building a reference that lets you answer practical questions like: “Where does the data come from?”, “Who sees the output?”, “What happens if it is wrong?”, and “Who can stop it?”
We will break the map into six pieces. Each piece is small enough to fill in quickly, but together they create a durable “source of truth” for your risk register and documentation templates later in the course.
Practice note for Write a clear purpose statement and success criteria: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for List stakeholders and high-risk user groups: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Trace the end-to-end workflow from data to decision: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Find where humans interact with the system (handoffs and overrides): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Checkpoint: complete a one-page system map: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Start by defining what you are assessing. “The AI system” is rarely just a model; it is the combination of model(s), data pipelines, prompts/configuration, business rules, user interface, and operational procedures. Your goal is to draw a boundary that is wide enough to capture real harms but narrow enough to finish the work.
Write a purpose statement in one or two sentences. Include the user benefit and the organizational intent. Example: “This system summarizes customer support chats to help agents respond faster and more consistently.” Then add success criteria that are measurable and safety-aware, not only accuracy-focused. Example criteria might include: average handling time reduced by 15%, summary omission rate below X%, and “no sensitive personal data appears in summaries unless already present in the chat.”
Common mistake: scoping only the model and ignoring the decision pathway. If the system’s output is used to deny a benefit, prioritize a loan, or trigger an investigation, then the decision and action belong in scope. Another mistake is scoping so broadly that no one owns anything. A useful boundary aligns with owners: each major component should have a team or person who can change it.
Practical outcome: a short scope paragraph plus a list of components included/excluded. This becomes the front page of your documentation and prevents “surprise” risks from being discovered late.
Next, list who interacts with the system and who is affected by it. These are not always the same. A hiring screening model may be used by recruiters, but it affects candidates. A teacher-facing tutoring tool affects students. Mapping both groups is essential for risk work because harm often shows up in the “affected but not present” population.
Make a stakeholder list with at least four categories: primary users (direct operators), secondary users (people who consume outputs indirectly), affected parties (people subject to decisions), and governance stakeholders (legal, security, privacy, compliance, audit, and leadership). For each, note what they want and what they fear. This helps you anticipate misuse and incentives.
Include high-risk user groups and contexts. High-risk does not mean “problematic users”; it means users who face higher consequences if the system fails. Examples: minors, non-native speakers, people with disabilities, people in financial distress, patients, or individuals subject to disciplinary action. Also consider settings like healthcare, education, housing, employment, and law enforcement—where errors or bias can have serious impact.
Common mistake: documenting only intended use. Risk assessments fail when they ignore likely use—especially under incentives like speed, quotas, or cost-cutting. Practical outcome: a table of stakeholders and use-cases that you will later reference when identifying harm types (privacy, bias, errors, security, misuse) and assigning owners.
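The four-category stakeholder table above can be sketched as a small data structure. The hiring-screener entries and the "wants/fears" wording are illustrative assumptions; the high-risk settings list comes from this section.

```python
# Sketch of the four-category stakeholder list from this section.
# The hiring-screener entries are illustrative, not a required template.
stakeholders = {
    "primary_users": [{"who": "Recruiters", "wants": "Faster shortlists",
                       "fears": "Missing strong candidates"}],
    "secondary_users": [{"who": "Hiring managers", "wants": "Consistent rankings",
                         "fears": "Unexplainable scores"}],
    "affected_parties": [{"who": "Job candidates", "wants": "Fair consideration",
                          "fears": "Silent unfair rejection"}],
    "governance": [{"who": "Legal and privacy team", "wants": "Documented basis",
                    "fears": "Regulatory exposure"}],
}

HIGH_CONSEQUENCE_SETTINGS = {"healthcare", "education", "housing",
                             "employment", "law enforcement"}

def high_risk_context(setting):
    """Flag settings this chapter lists as higher-consequence."""
    return setting.lower() in HIGH_CONSEQUENCE_SETTINGS

print(high_risk_context("Employment"))  # → True
```

The "affected_parties" row is the one beginners most often leave empty, and it is where the "affected but not present" harms surface.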
Data is where many risks begin. Map every input that influences the model or the decision, including “hidden” inputs like user profiles, device metadata, or retrieved documents. For each source, record: who provides it, how it is collected, whether it is optional, how often it updates, and what quality checks exist.
Separate data into three buckets: training data (what shaped model behavior), runtime data (what the system reads during operation), and feedback data (what you collect to improve the system). This distinction matters because privacy and consent rules may differ, and because feedback loops can amplify bias. For example, if a fraud model flags certain transactions, and flagged transactions get more scrutiny, you may generate more “fraud labels” in those groups—creating a self-reinforcing pattern.
Common mistakes: assuming “public” means “safe,” or forgetting that logs and analytics are also data sources. Another frequent error is mixing “ground truth” with “proxy labels” without noting the limitations. Practical outcome: a short data inventory that can be reused later in a model card–style summary or “data notes” template, and that clearly indicates where privacy and bias harms could enter.
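A data inventory using the three buckets above might look like the following sketch. The source names, providers, and quality-check notes are assumptions for illustration; the useful move is flagging rows where no checks exist yet, since those are where privacy and bias harms can enter unnoticed.

```python
# Illustrative data inventory using the training/runtime/feedback buckets.
# Source names and attributes are assumptions for the sketch.
inventory = [
    {"source": "Historical support tickets", "bucket": "training",
     "provider": "CRM export", "optional": False,
     "quality_checks": "dedup, PII scrub"},
    {"source": "Live chat transcript", "bucket": "runtime",
     "provider": "Chat widget", "optional": False,
     "quality_checks": "none yet"},
    {"source": "Agent thumbs-up/down", "bucket": "feedback",
     "provider": "Agent console", "optional": True,
     "quality_checks": "none yet"},
]

# Flag sources where a privacy or bias harm could enter unchecked.
unchecked = [row["source"] for row in inventory
             if row["quality_checks"] == "none yet"]
print(unchecked)  # → ['Live chat transcript', 'Agent thumbs-up/down']
```

Keeping the bucket column explicit also makes the feedback-loop question visible: anything tagged "feedback" deserves a note about whether it could amplify the flagging pattern described above.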
Now map what the system produces and how those outputs are used. Output types include classifications (approve/deny), scores (risk score), rankings (top candidates), generated text (summaries, emails), or actions (auto-block, route to human review). Risk depends heavily on whether an output is advisory or automatically executed.
Describe each output with four details: format (number, label, text), recipient (which user sees it), decision role (recommendation vs. decision), and time sensitivity (real-time vs. batch). Then list what happens next: does someone click “send,” does an API trigger an email, does it update a record that downstream systems consume?
Common mistake: stopping at “the model outputs a score” and not documenting how the score becomes an action. Another is ignoring “soft” outputs like summaries that can still drive high-stakes decisions if copied into official records. Practical outcome: a clear output-to-action chain that later helps you pick controls: warnings, confidence indicators, human review thresholds, monitoring, and rollback procedures.
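The output-to-action chain with its four details can be sketched like this. The fraud-score and summary examples, and the rule of thumb for flagging outputs, are my own illustrative assumptions, not a prescribed policy.

```python
# Sketch of the output-to-action chain with the four details from this section:
# format, recipient, decision role, and time sensitivity, plus "what happens next".
# The example outputs and the flagging rule are illustrative assumptions.
outputs = [
    {"output": "fraud_score", "format": "number 0-1", "recipient": "risk ops",
     "decision_role": "decision", "time_sensitivity": "real-time",
     "next_action": "auto-block transaction above threshold"},
    {"output": "chat_summary", "format": "text", "recipient": "support agent",
     "decision_role": "recommendation", "time_sensitivity": "batch",
     "next_action": "agent pastes summary into ticket record"},
]

def strongest_controls_needed(o):
    """Rule of thumb: outputs that are automatically executed in real time
    deserve the strongest controls (review thresholds, monitoring, rollback)."""
    return o["decision_role"] == "decision" and o["time_sensitivity"] == "real-time"

flagged = [o["output"] for o in outputs if strongest_controls_needed(o)]
print(flagged)  # → ['fraud_score']
```

Note that the "soft" chat summary is still in the table: its next_action shows it entering an official record, which is exactly the quiet escalation path the paragraph above warns about.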
Two systems with the same model can have different risks because context changes behavior. Capture the operating environment: where and when the system is used, what users are optimizing for, and what constraints they face. This section is where engineering judgment becomes explicit.
Document incentives and pressure points. If agents are measured on speed, they may over-rely on AI text. If managers are rewarded for reducing headcount, automation may creep from “assist” to “auto-act.” Also note constraints: limited training time, poor UI, multilingual users, intermittent connectivity, or restricted ability to escalate. These constraints often create the conditions for misuse and errors to matter more.
Common mistake: treating context as “nice to have” detail. In reality, context determines likelihood: the same hallucination is low-impact in a brainstorming tool and high-impact in a medical note generator. Practical outcome: a short context paragraph that will later help you estimate likelihood × impact realistically and argue for proportionate safeguards without overengineering.
With scope, users, data, outputs, and context captured, you can produce the one-page system map. Use a simple box-and-arrow diagram—no specialized tools required. The goal is readability: a new team member should understand the workflow in two minutes.
Start left-to-right: data in (sources), processing (cleaning, feature building, retrieval, prompt assembly), model(s) (and key configuration), post-processing (business rules, thresholds, safety filters), outputs, and actions. Then add the human interaction points: where a person provides input, reviews output, approves actions, or can override. Mark these as explicit “handoff” boxes.
Include two annotations on the diagram: (1) where logs are stored (because logs can create privacy risk), and (2) where external systems connect (because integrations create security and misuse risk). If you have multiple models (e.g., a classifier plus an LLM), show them separately; mixing them into one box hides failure modes.
Checkpoint: complete your one-page system map and attach your purpose statement and success criteria at the top. If you cannot place a component on the diagram, that is a signal: either it is out of scope, or you have a “hidden dependency” that should be documented before moving on to risk identification.
1. Why does Chapter 2 emphasize mapping the full AI system rather than focusing only on the model?
2. Which set of questions best reflects what a good system map should help you answer?
3. What is the main deliverable for Chapter 2?
4. In this chapter, what does it mean to identify “human interactions” with the system?
5. According to the chapter, what is a common root cause of real-world AI incidents that system mapping helps prevent?
In Chapter 2 you mapped your AI system at a high level: what it’s for, who uses it, what data goes in and out, and what decisions it influences. In this chapter you’ll use that map to do beginner threat modeling: systematically scanning for common harm types so you can name risks early, before they turn into incidents.
“Threat modeling” can sound like an advanced security practice, but the beginner version is simply: (1) list what could go wrong, (2) group issues by harm type, (3) write down who would be affected and how, and (4) capture enough detail to prioritize later. You are not trying to prove something will happen; you are trying to avoid being surprised.
A useful mindset is to treat AI harms as the result of three things interacting: data (what the system sees), decisions (what the system does with it), and people (who depends on those decisions). Most real-world AI failures are not “AI is evil”; they are normal engineering gaps—missing constraints, unclear ownership, weak documentation, or optimistic assumptions about users.
As you read, keep your system map open. After each section, add at least one candidate risk to a scratch list. At the end of the chapter you will have your first “by harm type” risk list, ready to convert into a simple Risk Register in the next chapter.
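The four-step loop above (list, group by harm type, note who is affected, capture detail for later) can be sketched as a scratch list. The example entries are illustrative assumptions; the useful output is seeing which harm types you have not yet considered.

```python
# Sketch of the beginner threat-modeling scratch list: group candidate
# risks by harm type and note who is affected. Entries are illustrative.
HARM_TYPES = ["privacy", "bias", "errors", "security", "misuse"]

scratch = [
    {"risk": "Summaries leak customer phone numbers", "type": "privacy",
     "affected": "customers"},
    {"risk": "Summaries are worse for non-English chats", "type": "bias",
     "affected": "non-native speakers"},
    {"risk": "Pasted text overrides the system prompt", "type": "security",
     "affected": "company and customers"},
]

by_type = {t: [r for r in scratch if r["type"] == t] for t in HARM_TYPES}
not_yet_considered = [t for t, rs in by_type.items() if not rs]
print(not_yet_considered)  # → ['errors', 'misuse']
```

An empty bucket is not proof of safety; it is a prompt to reread the matching section of this chapter with your system map open.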
Practice note: as you work through this chapter's skills—identifying privacy and data protection risks, recognizing bias and unfair outcomes, finding reliability risks, and considering security and misuse—document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability, makes your learning transferable, and feeds directly into the chapter checkpoint: a first list of risks by harm type.
Privacy risk starts with a simple question: what information about a person could this system collect, infer, store, or reveal? “Personal data” is any data that can identify someone directly (name, email, phone) or indirectly (device IDs, unique combinations of attributes). “Sensitive data” is personal data that can cause greater harm if exposed or misused—health details, financial data, precise location, government IDs, private messages, biometrics, and information about children.
In AI systems, privacy risk often hides in places beginners don’t expect. Inputs may contain personal data (a support chat transcript). Outputs may reveal personal data (a summary that repeats a phone number). Even if you never ask for personal data, the system may infer it (guessing pregnancy status from shopping history). Treat “inference” as a form of collection: if the model can reliably derive it, it can function as sensitive data.
Use a practical workflow: first list data in, data stored, and data out. Then annotate each item with whether it is personal, sensitive, or non-personal. Next, ask four beginner threat-model questions: (1) Could the system capture more data than necessary? (2) Could it retain data longer than needed? (3) Could someone see it who shouldn’t (internal or external)? (4) Could the model output it inappropriately (e.g., quoting raw text, exposing hidden fields)?
Finally, write risks in human terms. Instead of “PII leak,” say “a customer’s address could be revealed in an AI-generated email draft visible to the wrong recipient,” and note who is harmed and how (embarrassment, fraud risk, regulatory exposure). This clarity makes later prioritization easier.
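The inventory-and-annotate workflow above can be sketched as a simple checklist structure. This is a minimal sketch; the field names and example data items are illustrative assumptions for a hypothetical support-chat summarizer, not part of the course material.

```python
# A minimal data inventory for the privacy workflow: list data in,
# data stored, and data out, tag each item by classification, then
# ask the four beginner threat-model questions per risky item.
inventory = [
    {"item": "support chat transcript", "stage": "in",     "cls": "personal"},
    {"item": "customer phone number",   "stage": "out",    "cls": "sensitive"},
    {"item": "product catalog text",    "stage": "in",     "cls": "non-personal"},
    {"item": "prompt/output logs",      "stage": "stored", "cls": "personal"},
]

QUESTIONS = [
    "Could the system capture more data than necessary?",
    "Could it retain data longer than needed?",
    "Could someone see it who shouldn't (internal or external)?",
    "Could the model output it inappropriately?",
]

# Review only items that carry privacy risk (personal or sensitive).
flagged = [e for e in inventory if e["cls"] in ("personal", "sensitive")]
for entry in flagged:
    print(f"[{entry['stage']}] {entry['item']} ({entry['cls']})")
    for q in QUESTIONS:
        print("  -", q)
```

Each flagged item plus question becomes a candidate risk, which you then rewrite in human terms as the next paragraph describes.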
Fairness risk is not limited to formal statistics. Beginners can detect many unfair outcomes by asking: does the system work worse for some groups, and do those groups carry the cost? Unequal outcomes can come from uneven data coverage, biased labels, proxy variables (like ZIP code standing in for race or income), or different user contexts (non-native speakers, disability accommodations, older devices).
Start with your “users” list from the system map and expand it. Include: primary users (operators), affected non-users (people being evaluated or described), and edge users (people who interact indirectly, like call-center staff relying on a summary). For each group, write what “good outcome” and “bad outcome” look like. In a hiring screener, a bad outcome is being rejected unfairly; in a medical assistant, it’s receiving misleading guidance; in content moderation, it’s being silenced or harassed.
A practical, non-statistical check is to compare pathways. Ask: do different users provide different kinds of input? Do they have different ability to correct mistakes? Do they get different levels of scrutiny? For example, an automated claims triage tool might be reviewed when high-value customers are involved but auto-denied for others. That is a fairness risk even if the model is “accurate.”
Write fairness risks with a “cost bearer” line: “If wrong, who pays?” Often it’s the end user, not the organization. Capturing that explicitly helps you prioritize later, because harm severity depends not only on frequency but on who has power to recover from errors.
Reliability risk is about whether the system is dependable in the real world, not whether it looks impressive in demos. Beginners often over-focus on “accuracy” as a single score. In practice you need a more grounded question: when the system is wrong, can users detect it before harm occurs? A system can be “useful” while imperfect if it fails safely and predictably.
For generative AI, hallucinations (confidently wrong statements) are the headline failure mode, but edge cases are just as important: unusual inputs, ambiguous requests, long documents, mixed languages, or missing context. Map reliability risk by looking at your system’s decision points. Where does the output become an action—sending an email, denying a refund, recommending a dosage, escalating a ticket? The closer the output is to an irreversible action, the more reliability matters.
A practical workflow is to define “minimum acceptable behavior” for three scenarios: typical case, stressful case, and worst-case. Typical case is normal user inputs. Stressful case includes time pressure, incomplete data, or high volume. Worst-case includes adversarial or confusing inputs. Then ask: does the system degrade gracefully? Does it say “I don’t know,” request clarification, or route to a human?
Document reliability risks with a concrete failure story: “In the edge case where a customer mentions two accounts in one message, the system may merge them and expose details across accounts.” This forces you to specify the triggering input, the bad output, and the downstream decision that makes it harmful.
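The "degrade gracefully" idea can be made concrete with a small routing sketch. The confidence thresholds, function name, and categories here are illustrative assumptions, not a prescribed implementation:

```python
# Fail-safe routing: act only when the system is confident and the
# action is reversible; otherwise ask for clarification or escalate
# to a human. Thresholds (0.5, 0.9) are illustrative assumptions.
def route(output_text: str, confidence: float, irreversible: bool) -> str:
    if confidence < 0.5:
        return "ask_clarification"   # degrade gracefully: say "I don't know"
    if irreversible and confidence < 0.9:
        return "route_to_human"      # human review before irreversible actions
    return "proceed"

# Typical case: confident, reversible action -> proceed.
print(route("draft reply", 0.8, irreversible=False))   # proceed
# Worst case: plausible-looking but irreversible -> human review.
print(route("deny refund", 0.8, irreversible=True))    # route_to_human
```

Note how the design encodes the chapter's key question: the closer the output is to an irreversible action, the higher the bar before the system acts on its own.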
Safety risk is about harm to people, not just system performance. In beginner AI risk work, “safety” includes three broad categories: physical harm (injury, dangerous instructions), emotional harm (harassment, manipulation, distress), and financial harm (fraud enablement, bad financial guidance, wrongful charges). Your goal is to identify where your AI system could influence high-stakes decisions or sensitive moments.
Start by tagging any use case that touches health, legal status, housing, education, employment, policing, or financial decisions. Even if your product is “just content,” it may be used in a high-stakes context. A writing assistant used by a landlord to draft notices can affect housing stability; a chatbot used by a stressed user can influence mental health choices.
A simple safety scan: list the top three actions a user might take because of the output. Then ask what happens if the output is (1) wrong, (2) misread, or (3) followed too literally. For example, an AI fitness coach giving generic advice could be unsafe for users with medical conditions; a budgeting tool could recommend risky moves that trigger overdraft fees; a customer-service bot might escalate a conflict with insensitive language.
When you write safety risks, include the affected person and the plausible pathway to harm. Safety risk statements should read like short incident reports in advance: who, what trigger, what output, what action, what harm. This structure helps teams propose realistic mitigations later.
Security risk asks: can someone make the system do something it shouldn’t, or gain access to data or capabilities they shouldn’t have? In AI products, classic security issues (weak authentication, exposed databases) still apply, but there are AI-specific patterns you should learn early: prompt injection, data leakage through outputs, and tool/plugin abuse.
Prompt injection is when an attacker crafts input that causes the model to ignore instructions, reveal hidden prompts, or call tools in unsafe ways. This is especially relevant when the model can take actions (send emails, query internal docs, run code). The beginner threat-model technique is to inventory “connectors” and “tools”: databases, ticket systems, calendars, code execution, retrieval-augmented generation (RAG) over internal documents. Each connector expands the blast radius if the model is manipulated.
Data leakage can happen even without a breach. If the model has access to internal documents, it might summarize or quote sensitive content to an unauthorized user. If logs store prompts and outputs, an internal user might later access private data they shouldn’t. Treat authorization as end-to-end: user identity, retrieval filters, tool permissions, and output filtering all need to align.
Write security risks with the asset and the attacker. Example: “An external user could use prompt injection to cause the assistant to retrieve HR policies not meant for them, leading to confidential data exposure.” This framing makes it easier to choose controls such as least-privilege permissions, output redaction, allowlists, and human confirmation for high-impact tool actions.
Misuse risk is about how the system might be used in ways you did not intend. “Dual-use” means the same capability can be helpful or harmful depending on intent. A summarization model can help customer support—or help someone scale phishing. An image generator can help design—or help create deceptive content. Beginners sometimes avoid misuse because it feels speculative, but you can make it practical by grounding it in your system’s capabilities.
Start by listing your top capabilities as verbs: generate, rewrite, classify, search, persuade, impersonate, automate actions, extract entities, translate. For each verb, ask “how could this reduce someone else’s agency or safety?” Then consider unintended users: people outside your target audience who may still access the system (public endpoints, shared accounts, leaked API keys), and insiders who might use it to bypass policy.
A practical misuse scan uses three lenses: (1) scale (does AI let a bad actor do more, faster?), (2) quality (does it make harmful content more convincing?), and (3) access (does it lower skill barriers?). If your tool improves any of these for harmful tasks, capture a misuse risk.
End this chapter by producing your first list of risks grouped by harm type: privacy, fairness, reliability, safety, security, and misuse. Don’t worry yet about perfect wording. What matters is that each risk is concrete (who/what/how), connected to your system map (data and decisions), and ready to be prioritized in the next chapter using likelihood × impact. This is the foundation of a beginner-friendly AI Risk Register.
1. In this chapter’s beginner threat modeling approach, what is the main goal?
2. Which set best matches the four beginner threat modeling steps described in the chapter?
3. The chapter suggests thinking of AI harms as an interaction of three elements. Which three?
4. According to the chapter, why do many real-world AI failures happen?
5. What is the recommended habit while reading each section of the chapter?
By now you can name common AI harm types and sketch a system at a high level. The next step is deciding what matters most, because you will never have unlimited time, budget, or access to perfect information. Risk assessment is the practical bridge between “we think something could go wrong” and “we will do these specific things next week to reduce harm.”
This chapter introduces a beginner-friendly workflow: convert risk ideas into testable risk statements, score each risk on likelihood and impact using a simple scale, translate those scores into severity levels with clear escalation triggers, and then choose what to address first. The output is a prioritized risk list you can copy into your AI Risk Register with owners and due dates.
Two reminders keep this process grounded. First, risk is about uncertainty: you’re judging what could happen, not what already happened. Second, likelihood × impact is not a prediction engine. It is a shared language for making decisions and documenting why you chose one mitigation over another.
You will use engineering judgment throughout. The point is not to be “objectively correct”; it is to be explicit, consistent, and useful—so the next person can understand your reasoning and improve it over time.
Practice note: as you work through this chapter's skills—turning risk ideas into testable statements, scoring likelihood and impact, deciding severity levels and escalation triggers, and choosing what to address first—document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and feeds directly into the chapter checkpoint: a prioritized risk list.
A good risk statement is specific enough to test and specific enough to own. Beginners often write risks as vague worries (“bias,” “privacy,” “security”), which makes scoring impossible and mitigation unfocused. Convert each idea into a cause → event → harm statement. This forces you to name what triggers the risk, what happens in the system, and who is harmed.
Use this template: “Because [cause], the system may [event], resulting in [harm] to [who].”
Example conversions: “bias” becomes “Because training data under-represents non-native speakers, the screener may score their applications lower, resulting in unfair rejections of qualified candidates.” “Privacy” becomes “Because support transcripts contain phone numbers, the summarizer may repeat them in drafts visible to the wrong recipient, resulting in exposure of a customer’s personal data.”
Make the statement testable by adding context: which user group, what decision point, what data flow, and what environment (internal tool vs. public). If you can’t imagine a test, the statement is still too abstract. A common mistake is mixing multiple risks into one sentence (“The system may leak data and be biased and be insecure”). Split them: each risk should map to a distinct mitigation and an owner.
Practical outcome: a list of 8–20 crisp risk statements, each tied to a system component (data ingestion, prompt handling, model output, human review, logging, access control). This list becomes the input to likelihood and impact scoring.
Likelihood is your best judgment of how often the event could happen in the real world given how the system is built and used. You are not estimating exact probabilities; you are sorting risks into an order that drives action. Use a simple three-point scale (Low/Medium/High) and keep your criteria consistent across risks.
Assess likelihood by asking “How easily can the cause lead to the event?” Consider these practical drivers: exposure (how many users or requests touch this path), ease of trigger (does it happen in normal use, or only with deliberate effort?), incentive (does anyone benefit from causing the event?), and track record (has it already happened in this system or in similar ones?).
A simple scoring guide many teams find workable: Low means you would be surprised to see it within a year of normal use; Medium means it will plausibly happen occasionally, especially at scale; High means you expect it regularly or it has already been observed.
Common mistakes: scoring likelihood based on optimism (“we don’t think users will do that”) rather than observed behavior; ignoring scale (“it’s rare per user” can still be frequent at large volume); and forgetting that new features change likelihood (e.g., adding file upload or web browsing often increases it).
Practical outcome: each risk statement gets a likelihood rating with one sentence of justification (what evidence or reasoning you used). This justification is crucial for later review when the system or controls change.
Impact measures the severity of harm if the event occurs. Impact is not just money. For AI systems, impact often includes unfair outcomes, privacy violations, safety consequences, and erosion of trust. A helpful way to keep impact concrete is to evaluate three dimensions: who is harmed, how many people are affected, and how long the harm persists.
Also consider the “blast radius” beyond the direct user: a wrong medical instruction can affect a patient; a leaked dataset can be copied indefinitely; a biased scoring model can change hiring or admissions decisions for months. Some impacts are hard to reverse, which should push the rating upward even if likelihood is uncertain.
A simple impact scale: Low means minor inconvenience that the affected person can easily reverse; Medium means real harm to individuals—money lost, unfair treatment, exposed data—that is recoverable with effort; High means serious or widespread harm, such as safety consequences, irreversible exposure, or decisions that change someone’s opportunities, that is hard or impossible to undo.
Common mistakes: rating impact based only on “average” users while ignoring worst-affected groups; treating reputational damage as the only serious harm; and confusing impact with likelihood (“it probably won’t happen, so impact is low”). Keep them separate: a rare catastrophic outcome is still high impact.
Practical outcome: each risk statement gets an impact rating plus a brief note on affected stakeholders and why the harm would be hard or easy to reverse.
Once you have likelihood and impact, you need a consistent way to translate them into severity so you can prioritize. You do not need decimals or complicated formulas. A simple 3×3 matrix is enough for most beginner programs and keeps debates focused on the drivers rather than the arithmetic.
Use this rule-of-thumb matrix: score Low = 1, Medium = 2, High = 3, and multiply likelihood by impact. A product of 1–2 is Low severity, 3–4 is Medium, and 6–9 is High. In words: two Lows stay Low; a High on either axis paired with at least a Medium on the other is High; everything in between is Medium.
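The 3×3 lookup can be written down directly. The numeric convention used here—Low = 1, Medium = 2, High = 3, with product bands 1–2 Low, 3–4 Medium, and 6–9 High—is one workable rule of thumb, not the only valid one:

```python
# 3x3 severity lookup: likelihood x impact on a Low/Medium/High scale.
# The numeric convention is an assumption; adjust the bands if your
# team draws the matrix differently.
SCALE = {"Low": 1, "Medium": 2, "High": 3}

def severity(likelihood: str, impact: str) -> str:
    product = SCALE[likelihood] * SCALE[impact]
    if product <= 2:
        return "Low"
    if product <= 4:
        return "Medium"
    return "High"

print(severity("Low", "Low"))      # Low
print(severity("Low", "High"))     # Medium (rare but catastrophic stays visible)
print(severity("High", "Medium"))  # High
```

The point of encoding the matrix is consistency, not precision: every risk gets rated by the same rule, so debates focus on the likelihood and impact inputs.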
Then decide what to address first. With limited time and budget, prioritize by severity, but add two practical “tie-breakers”: prefer risks whose harm is hard to reverse (a leak that can be copied forever outranks a correctable error), and prefer mitigations that are quick and cheap (a permissions fix this week beats a redesign next quarter).
Common mistake: treating the matrix as a one-time classification. In reality, severity changes when you ship new features, change data sources, expand to new users, or add safeguards. Document the version/date of the assessment so you can revisit it.
Practical outcome: a prioritized risk list. At minimum, every High severity risk should have (1) an owner, (2) a near-term mitigation plan, and (3) a target date. Medium risks should have either a planned mitigation or a decision to accept/monitor. Low risks should still be recorded so they don’t disappear.
Beginners often feel pressure to “get the score right.” In practice, what matters is whether you understand your uncertainty. Two teams can assign different ratings and both be reasonable if they document assumptions and unknowns. Add a confidence tag to each risk (High/Medium/Low confidence) and list what information would change the rating.
Low confidence is common when: the system is new and has little real usage data, you have not tested the failure mode directly, the model or data comes from a third party you cannot inspect, or the affected user group is one you have not actually talked to.
Turn unknowns into actions. For each low-confidence item, write a short “learning task” with an owner and due date, such as: run a red-team session for prompt injection, sample 200 outputs for hallucinations in a key workflow, measure error rates by user segment, or verify whether logs store personal data. These tasks are often cheaper than full mitigations and can prevent wasted effort.
Common mistake: using low confidence as a reason to ignore a risk. If potential impact is high, low confidence is itself a warning sign. You may not need to fully fix the issue immediately, but you should at least instrument the system, restrict exposure, or add review gates while you learn.
Practical outcome: your prioritized list now includes not only mitigations, but also “evidence-building” tasks. This makes your AI Risk Register realistic: it reflects what you know, what you don’t, and how you plan to close the gaps.
Risk assessment is only useful if it changes decisions. You need simple decision rules that tell the team when it’s safe to proceed, when to pause a release, and when to escalate to legal, security, privacy, or leadership. These rules prevent “analysis paralysis” on one end and reckless shipping on the other.
Use basic escalation triggers tied to severity and confidence: any High severity risk pauses the release until a mitigation or an explicit, documented acceptance is in place; any risk touching privacy, security, or discrimination is escalated to the relevant experts regardless of score; and any high-impact risk with Low confidence triggers a learning task before launch rather than a guess.
Decide what “done for now” means. A mitigation is not complete because it was discussed; it’s complete when it is implemented, verified, and owned. Verification can be lightweight: a checklist item, a log review, a small test set, or a documented walkthrough. The key is to make it repeatable.
Common mistake: escalating too late, after the system is widely used. Build escalation into your process early: if a risk touches privacy, security, or discrimination, bring the right experts in while the design is still flexible.
Checkpoint outcome: you should now be able to produce a prioritized risk list (top 5–10) with likelihood, impact, severity, confidence, owners, and near-term next steps. This list becomes the backbone of your risk register and the starting point for Chapter 5’s documentation and assignment of actions.
1. What is the main purpose of assessing and prioritizing AI risks in this chapter’s workflow?
2. Why does the chapter emphasize turning risk ideas into testable risk statements?
3. Which best describes how likelihood × impact should be used according to the chapter?
4. What additional note should accompany simple likelihood/impact scores to keep the assessment grounded?
5. What is the key output of Chapter 4’s workflow?
Risk work that lives only in people’s heads does not scale. Teams change, models change, and assumptions quietly expire. Documentation is how you keep risk thinking attached to the system as it evolves—so you can show what you knew, what you decided, what you tested, and what you changed.
This chapter focuses on a practical “minimum viable” documentation pack: a simple AI Risk Register plus a lightweight Evidence Pack. The Risk Register is where you track what could go wrong, how bad it would be, and who is responsible for reducing it. The Evidence Pack is the supporting material—tests, reviews, notes, screenshots, and decision records—that proves the work happened.
The goal is not paperwork for its own sake. Good documentation reduces rework, accelerates approvals, and prevents repeated debates. It also makes it easier to respond when something goes wrong: you can find the owner, the last decision, and the last known test results quickly.
As you read, keep one guiding question in mind: “If I left the project tomorrow, could someone else understand the current risk posture and continue the work safely?” If the answer is no, your documentation pack is not done yet.
Practice note: as you work through this chapter's skills—creating a readable AI Risk Register, assigning owners, due dates, and proof of completion, capturing key decisions and trade-offs in a decision log, and collecting lightweight evidence—document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline feeds directly into the chapter checkpoint: a "minimum viable" risk documentation pack.
A beginner-friendly AI Risk Register is a table that anyone on the team can read without needing specialized governance knowledge. Keep it short, consistent, and searchable. The register is not a brainstorm list—it is a tracking tool that turns “we should” into “we did,” with accountability.
Start with four required fields and only add more when you feel the pain of not having them: a plain-language description of the risk, an owner, a status (open, mitigating, accepted, or closed), and a next step.
In practice, you will also want a few supporting fields for prioritization and clarity: likelihood, impact, risk score, target date (due date), and a link to evidence. But keep the “front door” simple so people actually use it.
Engineering judgment matters most in the description. Write it so a non-expert can picture the failure mode, the affected users, and the consequence. A common mistake is to document only the outcome (“bias”) without the pathway (“training data under-represents group X, causing lower approval rates”). When the pathway is captured, mitigations become obvious and testable.
Finally, treat the register as a living queue: every open risk must have an owner and next step. Rows without owners are wishes, not plans.
Many teams confuse “controls” with “tasks.” A task is an action you take (run a test, review a dataset, update a prompt). A control is the durable mechanism that reduces risk (rate limiting, access control, redaction, human review, monitoring). Tasks create or verify controls, but controls are what actually change the system’s risk profile.
This distinction matters because tasks can be “done” while risk remains unchanged. For example, “Hold a fairness meeting” is a task; it does not reduce unfair outcomes by itself. “Implement group-based performance monitoring with alert thresholds” is a control; it changes detection and response capability.
When you add mitigations to a risk register, write them in control language and then break them into tasks. A useful pattern is: Risk → Control (the durable mechanism that reduces it) → Tasks (the work to build or verify the control) → Evidence (proof the control exists and works).
Assign owners and due dates at the task level, but track whether the control is actually in place and effective. If a control depends on ongoing behavior (like weekly review), document that cadence as part of the control, not as a one-time task.
A common mistake is to list mitigations that are not feasible in the real operating environment. For example, “human review of all outputs” may be impossible at scale; the better control might be “human review for high-risk intents” plus robust refusal behavior elsewhere. Document these trade-offs explicitly so you can defend why you chose a scalable control.
Practical outcome: your register becomes a map from risk → control → tasks → evidence, which is exactly what reviewers and future maintainers need.
An Evidence Pack is not a giant folder of random screenshots. It is a curated set of artifacts that prove key claims: “We tested X,” “We reviewed Y,” “We made decision Z,” and “We implemented control C.” The goal is traceability: a risk row should link to the evidence that supports its status.
Evidence can be lightweight and still credible. Useful categories include: test results (including failing runs that were later fixed), review notes and sign-offs, decision records with the options considered, user feedback summaries, and dated screenshots of configurations or representative outputs.
Good evidence has four properties: it is dated, attributable (who produced or approved it), tied to a specific system version, and easy to interpret. A one-line statement like “tested and looks good” fails on all four.
Don’t over-optimize. For beginners, a simple approach is to create an “evidence index” document with links and short descriptions: “E-014: Prompt injection test suite results v0.3 (run on build 1.8.2, 2026-03-10).” Then, in the risk register, link to E-014 in the “proof” column.
Common mistakes: collecting evidence that is not connected to a risk, storing files without version context, and failing to capture negative results. If a test failed and you fixed it, include both the failing and passing runs; this shows progress and prevents repeating the same bug later.
Practical outcome: when someone asks “How do we know this risk is mitigated?”, you can answer with a link, not a meeting.
Your documentation pack needs a plain-language summary of the model and its data—similar in spirit to a model card, but lighter. This is not marketing content. It is operational clarity: what the system is, what it is for, and what it should not be used for.
A practical template that fits on one page includes: what the system does and who it is for, intended uses and explicitly prohibited uses, inputs and outputs with one or two examples, data sources and whether they contain personal data, known limitations and failure modes, and the human oversight currently in place.
Write for a smart non-specialist. Avoid jargon like “distribution shift” without explanation; instead say, “If user behavior changes (new products, new slang), accuracy may drop and we may need retraining.” Include one or two example inputs/outputs that reflect normal use and one that reflects an unacceptable use case.
This summary connects directly to risk identification. If the system outputs free-form text to end users, you should expect risks like harmful content, fabricated claims, and prompt injection. If the input includes personal data, privacy and retention risks move up the list. The model/data summary becomes the “map” that makes the risk register make sense.
Common mistake: documenting what the model could do instead of what it is allowed to do. Make the intended use explicit; many harms come from capability being mistaken for permission.
Even with good controls, issues happen: incorrect outputs, biased behavior, data leaks, security probes, or user misuse. Incident and issue tracking is how you learn systematically instead of repeatedly “firefighting.” For beginners, you do not need a complex incident management program—just a consistent record.
Track two related streams: issues (smaller problems, such as incorrect or low-quality outputs, that need follow-up) and incidents (events with real user impact, such as data exposure or harmful content reaching users).
For each issue/incident, record: date/time, reporter, affected system version, user impact, reproduction steps or example prompts, immediate containment action, root cause (when known), and follow-up tasks with owners and due dates. Link the item back to the relevant risk register row; if no row exists, create one. This closes the loop between real-world failures and planned mitigations.
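The record above can be sketched as a small data structure. Field names are illustrative, and the `needs_new_risk_row` helper is an assumption showing how the “if no row exists, create one” rule could be made checkable:

```python
from dataclasses import dataclass, field

# Illustrative sketch of the minimal incident record the chapter lists.
# Use whatever fields your ticketing tool supports; names here are assumptions.

@dataclass
class IncidentRecord:
    when: str
    reporter: str
    system_version: str
    user_impact: str
    repro_steps: str                 # exact prompt/context that caused the failure
    containment: str                 # immediate action taken
    root_cause: str = "unknown"      # filled in when known
    followups: list = field(default_factory=list)  # tasks with owners/due dates
    risk_id: str = ""                # link back to the risk register row

def needs_new_risk_row(incident):
    """Per the chapter: if no register row is linked, one should be created."""
    return incident.risk_id == ""

incident = IncidentRecord(
    when="2026-03-12T14:05Z",
    reporter="support-desk",
    system_version="1.8.2",
    user_impact="one user saw another user's order summary",
    repro_steps="ask for 'my last order' while the session cache is stale",
    containment="disabled order-lookup tool via feature flag",
)
print(needs_new_risk_row(incident))  # prints True: no linked risk row yet
```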
Capture trade-offs in the write-up. For example, tightening content filters may reduce harm but increase false positives that frustrate users. Document what threshold you chose and why, and what monitoring will tell you if it needs adjustment.
Common mistakes: deleting “embarrassing” examples (they are often the most educational), failing to record the exact prompt/context that caused failure, and treating incidents as one-off events instead of signals of a control gap.
Practical outcome: over time you build an evidence trail that demonstrates continuous improvement, not just initial compliance.
Documentation fails when it becomes stale. The easiest way to keep it alive is to tie updates to events you already have: releases, data refreshes, and incidents. Treat the Risk Register, model/data summary, and decision log as part of the system—not as separate “governance paperwork.”
Use simple versioning rules: give each document a version and a date, update it whenever the system changes (a release, a data refresh, a configuration change) or an incident closes, and note what changed and who changed it.
Set a review cadence that matches risk. A low-risk internal tool might review monthly or per release; a customer-facing model affecting decisions might review weekly metrics plus a formal monthly risk review. The key is to define the cadence in writing and assign an owner for the review itself. “Everyone will keep it updated” usually means no one will.
Your decision log is especially important for keeping context. Record key choices such as selecting a dataset, changing a threshold, enabling a new feature, or accepting a residual risk. Each entry should include: the decision, alternatives considered, trade-offs, approver, and link to supporting evidence. This prevents “decision amnesia” where the same debate repeats every quarter.
Checkpoint: assemble your minimum viable documentation pack. At minimum you should have (1) a readable risk register with owners/dates/status, (2) an evidence index with links, (3) a plain-language model/data summary, and (4) a decision log. If you can hand that pack to a new teammate and they can safely operate the system, your documentation is doing its job.
1. What is the main reason Chapter 5 says documentation is necessary for AI risk work?
2. In the chapter’s “minimum viable” documentation pack, what are the two core components?
3. Which description best matches the purpose of the AI Risk Register in Chapter 5?
4. Which item is an example of what belongs in the Evidence Pack according to the chapter?
5. The chapter’s guiding question for judging whether documentation is “done” focuses on whether:
In the earlier chapters you learned how to name harms, estimate likelihood and impact, and write them down in a simple risk register with owners and due dates. This chapter is about doing the work that actually reduces harm: choosing practical controls, setting operating limits, communicating clearly with users, and monitoring the system after release. “Reduce harm” is not a single feature or policy—it is a chain of small decisions that makes bad outcomes less likely, less severe, easier to detect, and faster to recover from.
A common beginner mistake is treating safety as a one-time “launch gate.” Real systems change: user behavior evolves, model behavior drifts, and your product surface area grows. So you need two mindsets at once: (1) prevention and guardrails before launch, and (2) detection and response after launch. Your risk register becomes a living document that links controls to specific risks, clarifies who monitors what, and defines what “good enough” looks like for your first release.
Throughout this chapter, aim for controls you can operate. A control that sounds great but cannot be measured, monitored, or owned is effectively not a control. Prefer lightweight mechanisms: clear warnings and instructions, safe defaults, rate limits, basic logging, and a starter incident process. You can always mature them later.
By the end of this chapter you should be able to convert your prioritized risks into an actionable “control plan,” add human review where it matters, design user-facing transparency, and complete a simple launch readiness checklist that includes monitoring and incident response.
Practice note for each exercise in this chapter (picking practical risk controls; adding human review and safe operating limits; designing user-facing transparency; setting up monitoring and an incident response starter plan; completing the launch readiness checklist): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Risk controls are actions you take to change the risk equation. In beginner-friendly terms: you can make a bad thing less likely to happen, or you can make it hurt less when it does happen. Both approaches matter, and the best plans usually mix them.
Reducing likelihood is about prevention. Examples: collecting less sensitive data, blocking unsafe prompts, requiring authentication, limiting model tools, or adding validation checks. Reducing impact is about containment. Examples: showing outputs as “draft,” limiting the decision scope, requiring a human to approve high-stakes actions, or offering an easy appeal path for users.
Connect controls directly to items in your risk register. For each top risk, add 2–4 controls and label which part they affect: likelihood or impact, and prevent/detect/respond. Also add an owner and a measurable check. “Add safety filter” is vague; “Block self-harm instructions using classifier X; alert on >1% blocked requests per day; owner: Safety Eng” is actionable.
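One way to keep the “2–4 controls per risk, each with an owner and a measurable check” rule honest is to lint the plan. Everything below (risk IDs, field names, the `plan_gaps` helper) is an illustrative sketch, not a required schema:

```python
# Illustrative sketch: controls grouped by risk register row, each labeled
# with what it affects (likelihood/impact), its type (prevent/detect/respond),
# an owner, and a measurable check.

controls = {
    "R-001": [
        {"action": "Block self-harm instructions using a classifier",
         "affects": "likelihood", "type": "prevent",
         "owner": "Safety Eng",
         "check": "alert on >1% blocked requests per day"},
        {"action": "Route flagged outputs to human review",
         "affects": "impact", "type": "respond",
         "owner": "Ops",
         "check": "review queue cleared daily"},
    ],
}

def plan_gaps(controls_by_risk):
    """Flag risks with too few/many controls, or controls missing owner/check."""
    gaps = []
    for risk_id, items in controls_by_risk.items():
        if not 2 <= len(items) <= 4:
            gaps.append((risk_id, "needs 2-4 controls"))
        for c in items:
            if not c.get("owner") or not c.get("check"):
                gaps.append((risk_id, f"control '{c['action']}' lacks owner or check"))
    return gaps

print(plan_gaps(controls))  # prints [] : this plan passes the lint
```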
Engineering judgment matters when controls conflict with product goals. Beginners often over-control everything and break usability, or under-control because they fear friction. Use risk-based scoping: apply the strongest controls to the highest-impact decisions (e.g., medical, financial, employment, legal) and lighter controls to low-stakes convenience features.
Common mistake: writing controls that are not enforceable. If you rely on a policy (“Users must not…”) without technical or operational enforcement, treat it as a weak control and pair it with stronger ones (rate limits, logging, reviews, or feature restrictions).
Human review is not a magic shield, but it is a practical control when the cost of a mistake is high or when automated methods are unreliable. The goal is to place humans at the points where they can meaningfully reduce impact: before an action is taken, when a decision is ambiguous, or when the system is outside its safe operating limits.
Start by defining when human approval is required. Good triggers include: high-stakes categories (credit decisions, hiring recommendations), low confidence scores, novel user segments, policy-sensitive content, or any request that would perform an irreversible action (sending an email, approving a refund, changing account access).
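The triggers above can be collected into a single testable predicate, so the approval rule lives in one place instead of scattered judgment calls. The category names, action names, and the 0.6 confidence cutoff below are assumptions for illustration:

```python
# Illustrative sketch: when does a request require human approval?
# Sets and the confidence cutoff are assumptions, not recommended values.

HIGH_STAKES = {"credit", "hiring", "medical", "legal"}
IRREVERSIBLE = {"send_email", "approve_refund", "change_account_access"}

def needs_human_approval(category, confidence, action=None):
    """True when any of the chapter's triggers fires."""
    return (
        category in HIGH_STAKES        # high-stakes decision category
        or confidence < 0.6            # low-confidence output (assumed cutoff)
        or action in IRREVERSIBLE      # irreversible actions always need approval
    )

print(needs_human_approval("credit", 0.95))             # prints True
print(needs_human_approval("faq", 0.90))                # prints False
print(needs_human_approval("faq", 0.90, "send_email"))  # prints True
```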
Design the workflow so humans can do the job well. Provide the right context (input, retrieved sources, model rationale if available), a clear decision to make, and an audit trail. A common mistake is creating “checkbox approvals” where reviewers lack time or information, leading to rubber-stamping. If you cannot staff real review, tighten the model’s operating scope instead.
Also define safe operating limits. Examples: only answer from a specific knowledge base; only generate summaries for documents under N pages; only operate in certain regions; only allow certain languages until tested; block medical or legal advice entirely. Limits reduce likelihood by preventing the model from entering failure-prone zones.
Finally, be explicit about accountability. If a human approves, who is responsible for the outcome? Write it in your decision log and your risk register owner field. “The AI did it” is never an acceptable explanation in real operations.
Many AI harms are data harms: privacy violations, leakage of sensitive information, biased or unrepresentative datasets, and unauthorized access. Data controls are often your highest-leverage risk reductions because they reduce both likelihood (fewer chances to leak) and impact (less sensitive data exposed if something goes wrong).
Minimization means collecting only what you need. Ask: can we complete the task without names, exact addresses, or full free-text? Can we replace raw identifiers with tokens? Can we process on-device or in-memory without storing? Beginners often keep “just in case” data; that becomes liability. If you are not using a field in a product requirement, remove it.
Retention means keeping data only as long as necessary. Set default retention periods for logs, prompts, and outputs. A practical approach is tiered retention: short retention for raw content (days/weeks), longer retention for aggregated metrics (months), and strict exceptions with approval. Make deletion real: define where the data lives (app logs, analytics tools, vendor systems) and how deletion requests propagate.
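Tiered retention reduces to a date comparison per tier. The tier names and retention periods below are illustrative defaults, not recommendations; actual periods should follow your requirements and approvals:

```python
from datetime import date, timedelta

# Illustrative sketch of tiered retention: short retention for raw content,
# longer retention for aggregated metrics. Periods are assumptions.

RETENTION_DAYS = {"raw": 14, "aggregate": 180}

def is_expired(record_date, tier, today):
    """True when a record has outlived its tier's retention period."""
    return today - record_date > timedelta(days=RETENTION_DAYS[tier])

today = date(2026, 3, 10)
print(is_expired(date(2026, 2, 1), "raw", today))        # prints True (37 days > 14)
print(is_expired(date(2026, 2, 1), "aggregate", today))  # prints False (37 days < 180)
```

The same check would need to run everywhere the data actually lives (app logs, analytics tools, vendor systems) for deletion to be real.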
Connect these to user-facing consent and transparency. If you use user inputs for product improvement or model training, say so clearly and offer meaningful choices. “By using this, you agree…” buried in legal text is not user-centered transparency. Good transparency is timely: show it at the moment the user is about to share data.
Common mistake: forgetting derived data. Even if you remove direct identifiers, embeddings, conversation histories, and analytics events can still be sensitive. Document these in your data notes template and treat them as part of your data inventory.
Product controls are the practical mechanisms that shape how the AI behaves in the real world. They are often easier to ship than model changes, and they are essential for misuse resistance and reliability. Think of them as “safe operating mechanics”: they control volume, scope, and failure behavior.
Rate limits reduce abuse and contain blast radius. Apply limits per user, per IP, and per organization, and consider separate limits for expensive or risky capabilities (tool use, file upload, code execution). Add backoff and clear error messages so legitimate users understand what happened. A common mistake is shipping one global limit; attackers will distribute requests across accounts.
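A minimal sketch of per-key, fixed-window limiting, assuming in-process state (a production deployment would typically keep counts in shared storage such as Redis); the limits and keys are illustrative:

```python
from collections import defaultdict

# Illustrative sketch: one limiter that enforces separate limits per scope
# (user, IP, org), so attackers cannot hide behind a single global limit.

class WindowLimiter:
    def __init__(self, limits):
        self.limits = limits            # e.g. {"user": 60, "ip": 300, "org": 1000}
        self.counts = defaultdict(int)  # (scope, key, window) -> request count

    def allow(self, window, **keys):
        """keys like user='u1', ip='1.2.3.4'; every scope must be under its limit."""
        if any(self.counts[(s, k, window)] >= self.limits[s]
               for s, k in keys.items()):
            return False                # refuse before counting the request
        for scope, key in keys.items():
            self.counts[(scope, key, window)] += 1
        return True

limiter = WindowLimiter({"user": 2})
print([limiter.allow(0, user="u1") for _ in range(3)])  # prints [True, True, False]
```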
Content filters can block disallowed requests or outputs (self-harm instructions, explicit hate, doxxing). Use them as a layer, not a guarantee. Pair automated filters with: category-specific warnings, constrained modes (e.g., “general information only”), and escalation paths to humans for borderline cases.
User-facing transparency is a control when it changes behavior. Good warnings are specific: “This tool can be wrong; verify with official sources before submitting tax forms.” Good instructions reduce misuse: “Do not enter passwords or personal health details.” Consent prompts should be understandable and aligned with data practices described in your data notes.
Common mistake: relying on one control. For example, a single prompt instruction (“don’t do harmful things”) is not a control by itself. Layer: input checks + output filters + rate limits + logging + fallback + human escalation for high-stakes categories.
Monitoring turns safety from a promise into an operational practice. You are watching for changes in the system, its environment, and its users that increase risk. The most important beginner step is to pick a small set of signals you can actually review weekly.
Drift is when inputs or outputs shift over time. Inputs drift when user prompts change (new slang, new use cases, seasonal spikes). Outputs drift when the model version changes, retrieval sources update, or prompts are tweaked. Monitor basic distributions: topic categories, language, length, refusal rates, and tool-call frequency. For a customer-support assistant, also monitor resolution rate and escalation rate.
Feedback loops happen when the system’s outputs influence its future inputs or the world around it. Examples: a recommender amplifies extreme content because it boosts engagement; an HR screener filters candidates and changes who applies; a fraud model changes attacker strategies. Watch for second-order effects: are certain groups dropping out, complaining more, or being disproportionately refused?
Design monitoring with privacy in mind. You often do not need to store full raw prompts to detect issues; store hashed identifiers, categories, and minimal excerpts with strict access. Document what you log and why in your data notes, and link it back to the risks it helps detect.
Common mistake: collecting dashboards without decisions. For every monitored signal, write down: “If it crosses threshold X, we will do Y within Z hours.” That converts monitoring into an actual control.
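The “threshold X, action Y, within Z hours” rule can be stored next to each signal, so a dashboard reading maps directly to a committed response. Signal names, values, and thresholds below are illustrative:

```python
# Illustrative sketch: each monitored signal carries its threshold, the
# committed action, and a response deadline, so monitoring produces decisions.

signals = [
    {"name": "refusal_rate", "value": 0.04, "threshold": 0.02,
     "action": "review prompts and filter settings", "within_hours": 24},
    {"name": "escalation_rate", "value": 0.01, "threshold": 0.05,
     "action": "page on-call reviewer", "within_hours": 4},
]

def triggered(signals):
    """Signals that crossed their threshold, with the committed response."""
    return [
        f"{s['name']}: {s['action']} within {s['within_hours']}h"
        for s in signals if s["value"] > s["threshold"]
    ]

print(triggered(signals))
# prints ['refusal_rate: review prompts and filter settings within 24h']
```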
Even with strong controls, incidents happen: private data appears in outputs, the model gives dangerous instructions, a jailbreak spreads, or a tool integration performs an unintended action. A starter incident plan prevents panic and reduces impact. Keep it lightweight, but make it real.
Report: define how incidents are raised. Provide an internal channel (ticket queue or on-call alias) and a user channel (report button or support form). Require a minimal report format: what happened, when, user impact, screenshots/log IDs, and severity guess. Do not depend on “someone noticing in Slack.”
Triage: decide severity and immediate containment. Typical questions: Is anyone at immediate risk? Is sensitive data exposed? Is misuse ongoing? Can we disable a feature flag, tighten filters, revoke keys, or rate-limit? Assign an incident lead and record actions in a decision log for later learning.
A practical “launch readiness checklist” ties this together. Before launch, confirm: top risks have controls with owners; human review triggers are implemented; safe operating limits are documented; user-facing warnings/instructions/consent are in the UI; logging and monitoring have thresholds and responders; a kill switch exists; and the incident process has named contacts. If any of these are missing for a high-impact risk, you are not “almost ready”—you are choosing to accept that risk. Make that choice explicit in the risk register, with leadership sign-off and a due date to revisit.
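The checklist can be kept as explicit named items, so a missing item is a visible, nameable gap rather than a feeling of being “almost ready.” Item wording follows the chapter; the data itself is a sketch:

```python
# Illustrative sketch of the launch readiness checklist as explicit booleans.

checklist = {
    "top risks have controls with owners": True,
    "human review triggers implemented": True,
    "safe operating limits documented": True,
    "warnings/instructions/consent in the UI": True,
    "logging and monitoring have thresholds and responders": True,
    "kill switch exists": False,
    "incident process has named contacts": True,
}

missing = [item for item, done in checklist.items() if not done]
if missing:
    # Per the chapter: a missing item means explicitly accepting the risk,
    # in the register, with sign-off and a date to revisit.
    print("Not launch-ready; accept explicitly or fix:", missing)
```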
Common mistake: treating incidents as failures to hide. In safety work, incidents are also signals that your controls and assumptions need updating. The fastest teams to learn are the ones that document, fix, and feed improvements back into the system design.
1. Which approach best matches the chapter’s definition of “reduce harm” for an AI system?
2. Why does the chapter warn against treating safety as a one-time “launch gate”?
3. Which set correctly matches the chapter’s prevent/detect/respond control types to their purpose?
4. According to the chapter, what makes a proposed control effectively “not a control”?
5. Which action best reflects the chapter’s recommendation for early-stage harm reduction and operations readiness?