Accessibility-First AI for Learning: Captions, Alt Text, Reading

AI In EdTech & Career Growth — Intermediate

Build inclusive AI supports—captions, alt text, and reading tools that work.

Intermediate · accessibility · ai-in-education · captioning · alt-text

Accessibility-first AI: build supports learners actually use

This course is a short, technical, book-style playbook for designing and shipping AI-assisted accessibility supports in learning experiences—starting with captions, expanding to alt text, and finishing with reading supports that improve comprehension for everyone. Instead of treating accessibility as a last-minute checklist, you’ll learn how to make it a first-class product requirement: measurable, testable, and repeatable across content types.

You will work through a practical workflow that mirrors real instructional production: assemble a small “source assets pack” (a short video, a few images, and a reading passage), generate first-pass outputs with AI, and then apply human review steps that catch the failures AI tends to introduce—timing drift, missing sound cues, hallucinated visual details, and meaning loss during simplification.

What you’ll build across 6 chapters

  • Captioning and transcripts you can publish in common formats (SRT/VTT) with clear QA criteria.
  • Alt text that is context-aware and instructionally useful—plus long descriptions for complex visuals like charts and diagrams.
  • Reading supports including summaries, plain-language rewrites, glossaries, and TTS-ready text that preserves intent.
  • Quality operations: checklists, sampling plans, error budgets, and documentation that scale to teams.

Who this is for

This course is for instructional designers, educators, edtech practitioners, content producers, and career-switchers who want practical, job-ready skills. You don’t need to code, but you should be comfortable working with typical course media (documents, images, and video) and using web-based AI tools. If you’ve ever thought “AI can generate it, but can I trust it?”—this course shows you how to make it trustworthy through process and QA.

How the course teaches (and why it works)

Each chapter builds on the prior one in a deliberate sequence. You’ll start with foundations: learner needs, assistive technology touchpoints, and acceptance criteria. Then you’ll move through three production pipelines—captions, alt text, and reading supports—each with concrete standards, prompting patterns, and review rubrics. Finally, you’ll wrap the work in governance so it can survive contact with real timelines, real stakeholders, and real compliance expectations.

By the end, you’ll complete a capstone “Accessibility-First AI Support Pack” you can keep as a portfolio artifact: before/after examples, QA logs, and implementation notes that demonstrate you can ship inclusive features—not just talk about them.

Get started

If you want to build inclusive learning faster while maintaining quality, this course will give you a repeatable system you can apply to any module or product sprint. Register free to begin, or browse all courses to compare paths in AI and edtech career growth.

What You Will Learn

  • Design an accessibility-first AI workflow for learning content production
  • Generate, edit, and QA accurate captions and transcripts for instructional video
  • Write effective alt text with AI assistance while preserving learner intent and context
  • Build reading supports (simplification, summaries, glossary, TTS-ready text) safely
  • Apply WCAG/UDL-aligned acceptance criteria and audit checklists for AI outputs
  • Create governance: prompts, rubrics, error budgets, and human review steps
  • Measure quality with spot checks, sampling plans, and learner feedback loops

Requirements

  • Basic familiarity with instructional content (slides, docs, videos) and an LMS
  • Comfort using web-based AI tools (chat, transcription, or image captioning apps)
  • Access to sample media (1 short video, 3 images, 1 reading passage) for practice
  • No coding required (optional: spreadsheets for QA sampling)

Chapter 1: Accessibility-First AI Foundations for Learning

  • Map learner needs to supports: captions, alt text, reading tools
  • Define quality and risk: where AI helps and where it fails
  • Set acceptance criteria: UDL + WCAG-inspired checks
  • Create your project scope and source assets pack
  • Choose tool categories and plan the workflow

Chapter 2: Captioning and Transcripts with AI (End-to-End)

  • Generate first-pass captions and transcripts from audio/video
  • Fix timing, speaker labels, and non-speech events
  • Improve terminology and names with custom vocabulary
  • Run a caption QA checklist and publish in multiple formats
  • Document decisions and build a repeatable template

Chapter 3: Alt Text that Teaches (Not Just Describes)

  • Classify images and decide when alt text is required
  • Draft alt text with AI using context-aware prompts
  • Handle complex visuals: charts, diagrams, equations, maps
  • Review for bias, privacy, and over-description
  • Create an alt text style guide and rubric

Chapter 4: Reading Supports with AI (Simplify, Support, Preserve Meaning)

  • Create leveled summaries and previews without losing core ideas
  • Generate glossary entries and concept scaffolds
  • Rewrite for plain language and readability targets
  • Prepare text for TTS and screen readers (structure and punctuation)
  • Validate fidelity with side-by-side checks and learner testing

Chapter 5: QA, Compliance, and Responsible AI Operations

  • Build checklists for captions, alt text, and reading supports
  • Set sampling plans, error budgets, and review roles
  • Handle privacy, consent, and data retention for media and transcripts
  • Establish incident response for accessibility defects
  • Create a lightweight accessibility report for stakeholders

Chapter 6: Capstone: Ship an Accessibility-First AI Support Pack

  • Assemble a complete support pack for one mini-lesson
  • Run final QA and publish-ready exports
  • Write implementation notes for a team handoff
  • Present outcomes: before/after examples and metrics
  • Plan iteration: backlog and continuous improvement

Sofia Chen

Learning Experience Architect & Applied AI Accessibility Specialist

Sofia Chen designs accessible learning systems for higher ed and workforce training, blending UDL, WCAG-aligned design, and practical AI workflows. She has led captioning and content accessibility programs across LMS ecosystems and helps teams ship measurable, inclusive improvements without slowing delivery.

Chapter 1: Accessibility-First AI Foundations for Learning

Accessibility-first content production treats supports like captions, alt text, and reading aids as core learning features—not optional “compliance tasks” added at the end. When you introduce AI into this work, the goal is not simply to automate; it is to build a workflow that is faster and more dependable than a manual-only process. That requires engineering judgment: knowing which parts AI can draft well, which parts need strong constraints, and where human review is non-negotiable.

In this chapter, you will map learner needs to specific supports (captions, alt text, reading tools), define quality and risk boundaries for AI assistance, and set acceptance criteria inspired by UDL and WCAG. You will also scope a small pilot project, assemble a “source assets pack” that makes AI outputs more accurate, and choose tool categories to create an end-to-end workflow (draft → edit → QA → publish).

Throughout, the mindset is practical: you are designing a production system. Your deliverables are not only accessible outputs, but also repeatable prompts, rubrics, and checks that let a team produce consistent results across modules, instructors, and formats.

Practice note (applies to each milestone in this chapter): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 1.1: Accessibility as product requirements (not a retrofit)

Accessibility-first work begins by rewriting the question from “Can we add captions later?” to “What learning supports must ship with the lesson for it to be usable by all intended learners?” Treat accessibility as a product requirement with acceptance criteria, owners, and test steps. This shift matters because retrofits are where errors hide: captions drift out of sync, alt text becomes generic, and simplified readings accidentally change meaning.

Practically, define accessibility requirements at the same time you define content requirements. For an instructional video, that usually means: accurate captions, an editable transcript, speaker identification where helpful, and text that can be reused for study guides. For images and diagrams, it means purposeful alt text and (when needed) longer descriptions. For reading supports, it means summaries, glossaries, and text formatted to work with text-to-speech (TTS) and screen readers.

Common mistake: equating “AI-generated” with “done.” AI can produce plausible text that looks correct, especially in captions and summaries, but small errors can harm comprehension or introduce misinformation. Another mistake is failing to budget time for QA. Accessibility-first AI is successful when the workflow includes explicit review steps and a definition of “publishable.”

Outcome for this section: you should be able to state, in one paragraph, the accessibility features your learning product will include and what “pass” looks like for each feature.

Section 1.2: Learner personas and assistive tech touchpoints

Supports are not generic; they exist to remove specific barriers. Build a small set of learner personas and identify the “touchpoints” where assistive technology or alternative formats intersect with your materials. Start with 3–5 personas that reflect your audience, not an abstract checklist.

Examples of practical personas: (1) a learner who is Deaf or hard of hearing and relies on captions and transcripts; (2) a learner who is blind or has low vision and uses a screen reader, magnification, and keyboard navigation; (3) a learner with ADHD who benefits from chunking, headings, and summaries; (4) a multilingual learner who needs plain language, glossaries, and consistent terminology; (5) a learner in a noisy environment using captions as a convenience feature. Each persona maps directly to supports you will produce.

Now map assets to touchpoints. Video touches captions, transcript, and sometimes audio description. Slide decks touch reading order, alt text, and exported PDFs. Interactive labs touch keyboard access and instructions that are readable by screen readers. The point is to know where a breakdown will occur: a missing label, a caption that omits “not,” a summary that removes a key constraint.

Outcome for this section: a simple table in your project notes that lists persona → barrier → support → where it appears in your content pipeline (video editor, CMS, LMS, PDF export, etc.). This table will later drive your acceptance criteria and QA checklist.

Section 1.3: UDL-aligned supports and measurable outcomes

Universal Design for Learning (UDL) is useful because it connects supports to learning outcomes rather than compliance alone. In production terms, UDL gives you categories of support you can build into your workflow: multiple means of representation (captions, transcripts, alternative text), action and expression (downloadable notes, copyable transcripts), and engagement (clear structure, reduced cognitive load, choice of format).

To make this measurable, define outcomes that your team can test. For captions, measurable outcomes include: correct terminology, correct numbers and symbols, and synchronization that allows following along without cognitive strain. For alt text, outcomes include: a learner can answer “what is this image for?” without seeing it, and the description matches the instructional intent. For reading supports, outcomes include: the simplified version preserves constraints and definitions, and the glossary covers the terms that block comprehension.

Write acceptance criteria that are observable. Instead of “captions are good,” use criteria like: “All spoken instructional steps appear in captions; no omissions of negation; domain terms match the lesson glossary; and speaker changes are indicated when it affects meaning.” For reading supports, include: “summary includes prerequisites, warnings, and edge cases; simplification maintains equations, variable names, and must/should distinctions.”
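If your team tracks these checks outside prose (optional, since no coding is required for this course), a minimal sketch like the one below shows how observable criteria can be recorded as explicit pass/fail items rather than impressions. The check names, function, and sample verdicts are illustrative, not a prescribed schema.

```python
# Minimal sketch: acceptance criteria as observable pass/fail checks.
# The check names and structure are illustrative, not a prescribed schema.
CAPTION_CRITERIA = [
    "All spoken instructional steps appear in captions",
    "No omissions of negation (e.g., 'not', 'never')",
    "Domain terms match the lesson glossary",
    "Speaker changes indicated where they affect meaning",
]

def review(criteria: list[str], results: dict[str, bool]) -> bool:
    """Return True only if every criterion was explicitly checked and passed."""
    return all(results.get(c, False) for c in criteria)

# Example: a reviewer records an explicit verdict per criterion.
verdicts = {c: True for c in CAPTION_CRITERIA}
verdicts["Domain terms match the lesson glossary"] = False
print(review(CAPTION_CRITERIA, verdicts))  # False -> needs rework before publish
```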

Outcome for this section: a draft of UDL-aligned acceptance criteria for your three support types—captions/transcripts, alt text, and reading aids—written as checks someone else could verify.

Section 1.4: AI capability map (ASR, vision, summarization, rewriting)

Designing an accessibility-first AI workflow requires a clear capability map: what each model type is good at, where it fails, and what inputs reduce risk. Four common capability buckets cover most learning workflows.

ASR (automatic speech recognition) is strong for generating a first-pass transcript and timecoded captions, especially with clean audio and known vocabulary. It fails with accents, crosstalk, low-quality mics, and domain-specific terms (chemical names, code, math notation). Risk control: provide a vocabulary list, speaker names, and reference materials; always plan a human edit pass for instructional content.

Vision models help draft alt text and identify elements in images (charts, UI screenshots). They fail when context is required: what matters in this diagram for this lesson, or which part of a chart supports the claim being taught. Risk control: give the model the surrounding paragraph, learning objective, and any labels/legend text; require a human to confirm intent and avoid hallucinated details.

Summarization can produce study aids and module overviews quickly. It fails by omitting constraints, flattening nuance, and inventing “helpful” additions. Risk control: constrain the summary format (bullets, required sections like “Warnings” and “Key terms”), and verify against the source.

Rewriting/simplification supports plain-language versions and multilingual learners. It fails by changing technical meaning, altering modal verbs (must/should), or “simplifying away” steps. Risk control: use guarded prompts (preserve terminology list; do not change variable names), and compare the simplified output to the original with targeted checks.

Outcome for this section: a workflow plan that assigns AI to drafting and transformation tasks, but assigns humans to intent verification, terminology control, and final QA.

Section 1.5: Quality dimensions: accuracy, completeness, intent, tone

Accessibility outputs fail in predictable ways. You can catch most issues by evaluating five quality dimensions. Treat these as your editing rubric for AI-assisted work.

Accuracy: Are facts, numbers, names, and terms correct? Captions often fail on proper nouns, code tokens, and homophones. Alt text can be “confidently wrong” about what an image contains. Reading supports can introduce subtle factual errors. Mitigation: reference a glossary, use source-of-truth documents, and spot-check against the original media.

Completeness: Does the output include everything a learner needs? Captions that omit side comments may be fine—unless the side comment contains the actual instruction (“Don’t click save yet”). Alt text that describes visuals but misses the purpose (“This diagram shows the causal relationship…”) is incomplete. Summaries that skip warnings and prerequisites are incomplete. Mitigation: require coverage of learning objectives, steps, and constraints.

Intent: Does the output preserve what the instructor meant? This is where AI rewriting can be risky. If an instructor says “this is an approximation,” the simplified version must keep that uncertainty. Mitigation: include a “preserve intent” rule and verify the output answers the same assessment questions as the source.

Tone: Is the language respectful, neutral, and appropriate for learners? Captions should not “clean up” dialects in a way that changes voice or identity; summaries should not become judgmental (“obviously”). Mitigation: style guidance plus quick tone review.

Usability: Is it readable and usable in assistive tech? Captions need line length limits and timing; transcripts need headings and speaker labels; TTS-ready text needs clean punctuation and expanded acronyms when appropriate. Mitigation: format checks and platform previews.

Outcome for this section: a one-page rubric (even if informal) that editors use for every AI-generated caption set, alt text batch, and reading support artifact.

Section 1.6: Governance basics: human-in-the-loop and documentation

Governance is what makes an accessibility-first AI workflow safe, repeatable, and defensible. It does not need to be heavy, but it must be explicit. Start with three building blocks: documented prompts, defined human review steps, and an audit trail of decisions.

Human-in-the-loop means you decide which steps require approval before publishing. A practical default: humans must review (1) captions for instructional accuracy and timing, (2) alt text for intent and non-hallucination, and (3) any simplification that could change meaning. The “human” may be the instructor, an editor, or a trained QA reviewer, but the role must be assigned.

Documentation includes: your prompt library (with versioning), your acceptance criteria/rubric, and a checklist used at QA time. Add an “error budget” per artifact type: what level of minor issues is acceptable before rework? For example, you might allow minor punctuation issues in transcripts but allow zero errors in safety warnings, chemical dosages, or assessment instructions.
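One way to make the error budget unambiguous is to write it down as data so every reviewer applies the same limits. The sketch below is illustrative only; the categories and numbers are placeholders to replace with thresholds your team agrees on.

```python
# Sketch: error budgets per artifact type, with zero tolerance for critical content.
# Numbers are placeholders; set your own limits with your team.
ERROR_BUDGET = {
    "transcript": {"minor_punctuation_per_1000_words": 5, "meaning_errors": 0},
    "captions": {"timing_issues_per_10_min": 2, "meaning_errors": 0},
    "alt_text": {"style_issues_per_batch": 3, "hallucinated_details": 0},
}

def within_budget(artifact: str, observed: dict) -> bool:
    """Compare observed issue counts against the budget for one artifact type."""
    budget = ERROR_BUDGET[artifact]
    return all(observed.get(k, 0) <= limit for k, limit in budget.items())

print(within_budget("captions", {"timing_issues_per_10_min": 1, "meaning_errors": 0}))  # True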

Source assets pack: reduce AI error by standardizing inputs. For a module, assemble the script or outline, slide deck, glossary of terms, speaker names, reference links, and any diagrams with labels. This pack is also what reviewers use to verify outputs.

Tool categories and workflow: plan the full pipeline, from ingestion (video/audio/image) through generation (ASR/vision/text), editing (caption editor, document editor), and QA (automated checks + human review), to publishing (LMS/CMS). Choose tools based on export formats (SRT/VTT, accessible PDF, HTML), collaboration needs, and privacy constraints.

Outcome for this section: a scoped pilot plan (one lesson or one module) with named reviewers, a checklist, a storage location for source assets, and a clear definition of “ready to publish.”

Chapter milestones
  • Map learner needs to supports: captions, alt text, reading tools
  • Define quality and risk: where AI helps and where it fails
  • Set acceptance criteria: UDL + WCAG-inspired checks
  • Create your project scope and source assets pack
  • Choose tool categories and plan the workflow
Chapter quiz

1. What does “accessibility-first content production” mean in this chapter?

Correct answer: Treating captions, alt text, and reading aids as core learning features from the start
The chapter frames supports like captions, alt text, and reading aids as central learning features, not end-stage add-ons.

2. According to the chapter, what is the primary goal of introducing AI into accessibility-focused production?

Correct answer: To build a workflow that is faster and more dependable than a manual-only process
AI is used to improve speed and dependability, with clear boundaries and required human review where needed.

3. Which approach best reflects the chapter’s guidance on where AI can fail and what to do about it?

Correct answer: Define quality and risk boundaries, constrain AI appropriately, and require human review where non-negotiable
The chapter emphasizes engineering judgment: constraints, risk boundaries, and mandatory human review for critical parts.

4. What is the purpose of setting acceptance criteria inspired by UDL and WCAG in this chapter?

Correct answer: To create checks and rubrics that verify outputs meet defined accessibility/learning standards
Acceptance criteria function as quality gates—repeatable rubrics and checks for consistent results.

5. Which workflow best matches the end-to-end production system described in the chapter?

Correct answer: Draft → edit → QA → publish
The chapter explicitly outlines an end-to-end workflow: draft, then edit, then QA, then publish.

Chapter 2: Captioning and Transcripts with AI (End-to-End)

Captions and transcripts are the most “visible” accessibility layer in instructional video: they directly affect comprehension, note-taking, search, and study workflows. AI speech-to-text (ASR) can generate a strong first pass, but publishing that first pass without standards, editing, and QA is how small errors become learning barriers. This chapter walks an end-to-end workflow you can repeat: generate captions/transcripts, fix timing and speakers, improve terminology with custom vocabulary, run a QA checklist, publish in multiple formats, and document decisions so your team can scale.

The goal is engineering judgment, not perfectionism. You will set measurable targets (accuracy, reading rate, line length), decide what gets human review, and maintain a correction log that steadily improves future runs. Think of captions as production software: you need acceptance criteria, a build pipeline, and release notes.

A practical workflow looks like this: (1) prepare audio to help ASR; (2) run ASR to create a first-pass caption file plus a transcript; (3) post-edit for correctness, timing, and accessibility cues; (4) apply custom vocabulary for names and domain terms; (5) QA using sampling and error categories; (6) publish SRT/VTT and a readable transcript; (7) document decisions and store reusable templates (prompts, rubrics, and checklists).

  • Inputs: video/audio file, slide deck (optional), course glossary, roster of names/pronunciations, and any prior transcripts.
  • Outputs: VTT (web), SRT (legacy tools), transcript (HTML/Doc/PDF), and a correction log for continuous improvement.
  • Human steps: focused editing where mistakes cause learning harm (key terms, numbers, steps, safety warnings, assessments).
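For teams that like a checklist in code form (optional), the same seven steps can be expressed as an ordered run sheet that is filled in per video. The step names and helper below are illustrative and not tied to any specific tool.

```python
# Sketch: the seven-step captioning pipeline as an ordered run sheet.
# Step names mirror the workflow above; statuses are updated as work completes.
PIPELINE_STEPS = [
    "prepare_audio",       # 1. levels, noise, speaker separation
    "run_asr",             # 2. first-pass captions + transcript
    "post_edit",           # 3. correctness, timing, accessibility cues
    "apply_vocabulary",    # 4. names and domain terms
    "qa_sample_and_log",   # 5. sampling, error categories, correction log
    "export_formats",      # 6. VTT, SRT, readable transcript
    "document_decisions",  # 7. templates, prompts, rubrics, change log
]

def new_run_sheet(video_id: str) -> dict[str, str]:
    """Create a per-video run sheet with every step marked 'pending'."""
    sheet = {"video_id": video_id}
    sheet.update({step: "pending" for step in PIPELINE_STEPS})
    return sheet

sheet = new_run_sheet("module1_lesson2")
sheet["prepare_audio"] = "done"
print(sheet)
```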

The rest of the chapter breaks this pipeline into concrete standards and tactics you can apply immediately.

Practice note (applies to each milestone in this chapter): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 2.1: Caption standards: accuracy targets, line length, reading rate

Before you generate anything, define what “good” looks like. Caption quality is not subjective: it can be measured and audited. For learning content, you should establish an accuracy target (for example, 99%+ for key instructional segments, and a lower threshold for informal discussion), plus formatting constraints that reduce cognitive load.

Accuracy targets. Use two tiers: (1) critical correctness for terminology, numbers, formulas, commands, and assessment prompts; (2) general correctness for conversational filler. AI ASR errors concentrate on names, acronyms, and domain terms—exactly what learners need most—so your rubric should weight those words more heavily than “um” or repeated phrases. If you can’t afford full human review, require full review of critical segments (demos, step-by-step instructions, safety, grading criteria) and sample the rest.

Line length and segmentation. Captions should break at natural linguistic boundaries (phrases and clauses), not mid-word or mid-idea. A practical rule is 1–2 lines per caption, keeping lines short enough for typical players. Many teams aim for roughly 32–42 characters per line depending on font and platform. Avoid placing a single word on a line unless it is intentionally emphasized and brief.

Reading rate. Learners read at different speeds, and captions compete with visuals and slides. Keep a conservative reading rate; many captioning guides target around 140–180 words per minute for instructional content, with exceptions for short bursts. When speech is too fast, do not “summarize” silently; instead, consider improving the source audio (Section 2.2) or planning pauses in the recording. If you must edit, prioritize clarity: remove repeated false starts, but do not delete meaning, steps, or qualifiers.
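If you want to automate part of this check (optional), a minimal sketch can flag captions that exceed your line-length or reading-rate targets before human review. The thresholds below mirror the ranges discussed above and should be tuned to your platform and standard.

```python
# Sketch: flag captions that exceed line-length or reading-rate targets.
# Thresholds mirror the ranges discussed above; adjust them for your platform.
MAX_CHARS_PER_LINE = 42
MAX_WORDS_PER_MINUTE = 180

def caption_flags(lines: list[str], duration_seconds: float) -> list[str]:
    flags = []
    for line in lines:
        if len(line) > MAX_CHARS_PER_LINE:
            flags.append(f"Line too long ({len(line)} chars): {line!r}")
    words = sum(len(line.split()) for line in lines)
    wpm = words / (duration_seconds / 60) if duration_seconds > 0 else 0
    if wpm > MAX_WORDS_PER_MINUTE:
        flags.append(f"Reading rate too fast: {wpm:.0f} wpm")
    return flags

print(caption_flags(["First, open the settings panel", "and select Accessibility."], 2.0))
# ['Reading rate too fast: 240 wpm']
```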

Common mistakes: (1) accepting captions with perfect word accuracy but poor segmentation, which are hard to follow; (2) enforcing strict character limits by deleting key information; (3) allowing inconsistent punctuation that changes meaning (e.g., “Let’s eat, students” vs “Let’s eat students”). Your standard should explicitly state what you will preserve verbatim and where light cleanup is acceptable.

Section 2.2: Audio prep for better ASR (noise, mic distance, pacing)

Caption quality starts upstream. If your audio is noisy, distant, or rushed, AI will produce plausible-looking text that is wrong in ways that are hard to detect. Improving audio often saves more editing time than any prompt trick.

Noise and room sound. Reduce constant noise sources (fans, HVAC, laptop hum). If you can’t control the room, record with a close mic and consider light noise reduction in an editor. Avoid aggressive noise gates that clip syllables; clipped consonants (“t”, “k”, “p”) become ASR substitutions that change meaning in technical terms.

Mic distance and consistency. Use a lapel or headset mic when possible and keep distance consistent. Moving your head away from the mic produces fluctuating volume that confuses diarization (speaker separation) and timing alignment. If recording screen-capture, do a 10-second test, listen back on headphones, and check that the waveform is not peaking or whisper-quiet.

Pacing for learning. Caption readability depends on pacing. Speak slightly slower than conversational speed and insert deliberate micro-pauses between steps (“First… Next… Then…”). Those pauses help learners and also help ASR segment captions cleanly. If multiple speakers are present, ask them not to overlap. Overlap is one of the highest-cost caption problems because even humans struggle to separate the content.

Practical prep checklist. (1) Record 10 seconds; verify levels; (2) ensure the mic is within a hand’s width of the mouth (for lapel/headset, follow manufacturer guidance); (3) reduce background noise; (4) capture separate tracks per speaker when possible; (5) keep a list of names/acronyms you will say, and say them clearly the first time.

The payoff: better first-pass captions and fewer “mystery” errors. When you later build a repeatable template, your recording guidelines are part of the accessibility workflow—not an optional best practice.

Section 2.3: Prompting and post-editing for domain terminology

ASR engines often allow custom vocabulary (phrase lists, hints, or boosted terms). Separately, LLMs can help you post-edit transcripts, but you must constrain them so they don’t “improve” the meaning. The safest approach is: use ASR vocabulary features to prevent errors, then use an LLM for targeted edits with strict rules and diff-based review.

Build a terminology package. Start a simple table: preferred term, common mishearing, and context. Include product names, course-specific jargon, people’s names, and acronyms expanded on first use. Example entries: “UDL (Universal Design for Learning)”, “WCAG (Web Content Accessibility Guidelines)”, “Khan Academy”, “Nguyễn”, “PyTorch”, “SQL”. Feed this into your ASR’s custom vocabulary tool where supported. If not supported, keep it as a post-edit reference for human editors.

Prompting for post-edit. If you use an LLM to clean a transcript, use a prompt that forbids paraphrase and requires minimal edits. Example instruction set: keep wording verbatim; only fix obvious recognition errors; preserve numbers and units; do not change meaning; flag uncertain terms instead of guessing; output a change list. This keeps the model from rewriting content into “nicer” prose that no longer matches the audio.

Post-edit method that scales. (1) Run a terminology pass: search for known mishearings and fix them; (2) run a numbers/units pass: dates, decimals, code versions, measurements; (3) run a proper-noun pass: names, titles, citations; (4) run a punctuation pass: add sentence boundaries to aid reading and TTS. Use the slide deck or script as a grounding reference, but do not force the transcript to match slides if the instructor said something different.
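Part of the terminology pass can be automated with a simple find-and-log step (optional), leaving a change list for the human editor to confirm. The mishearing table below is an example of what your own list might contain, not a standard vocabulary.

```python
# Sketch: apply known mishearing corrections and log every change for human review.
# The table entries are illustrative; build yours from your own correction log.
TERMINOLOGY = {
    "you DL": "UDL",
    "web cag": "WCAG",
    "cash invalidation": "cache invalidation",
}

def terminology_pass(transcript: str) -> tuple[str, list[str]]:
    changes = []
    for wrong, right in TERMINOLOGY.items():
        if wrong in transcript:
            transcript = transcript.replace(wrong, right)
            changes.append(f"{wrong!r} -> {right!r}")
    return transcript, changes

text, log = terminology_pass("Today we cover you DL and web cag basics.")
print(text)   # Today we cover UDL and WCAG basics.
print(log)    # ["'you DL' -> 'UDL'", "'web cag' -> 'WCAG'"]
```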

Common mistakes: letting an LLM “summarize” within captions; correcting to the wrong but plausible term (“cache” vs “cash”); standardizing acronyms incorrectly; and silently expanding jargon so captions no longer match what learners hear. Your rubric should require that captions reflect the spoken content, while transcripts may optionally include clarifying expansions in brackets if your policy allows it.

Section 2.4: Speaker identification and meaningful sound cues

Instructional videos increasingly include panels, interviews, and screen-share coaching. Learners who rely on captions need two additional layers beyond words: speaker identification and non-speech information that is essential for understanding.

Speaker labels. Use consistent labels (e.g., “Instructor:”, “TA:”, “Student:”) and keep them stable across the whole course. Label the first caption when a new speaker begins, then again when confusion could occur. Avoid over-labeling every line if a single speaker continues; the goal is clarity, not clutter. If your platform supports positioning, keep labels readable and avoid covering important on-screen text.

Diarization pitfalls. AI diarization can mis-assign speakers when voices are similar, when audio levels change, or when there is crosstalk. Treat diarization as “assistive metadata,” not truth. A fast verification technique is to spot-check each speaker transition and any segment where the content references “I” or “you” (these pronouns often reveal mis-attribution).

Meaningful sound cues. Captions should include non-speech events that matter for comprehension: [laughter] when it signals tone, [applause] when it marks an achievement, [door slams] if it interrupts, [music] if it covers speech, and crucially, instructional sounds such as [timer beeps], [error chime], or [notification sound] during demos. Do not caption irrelevant ambient noise; do caption events that affect meaning, pacing, or the learner’s ability to follow steps.

Timing and synchronization. Good timing is an accessibility feature. Ensure captions appear when the words are spoken and disappear when the phrase ends. Fixing timing often means splitting long captions, aligning start times to the first spoken syllable, and ensuring that on-screen reading is possible without racing the learner. For demos, align captions to actions (“Click Settings…”) so the learner sees the instruction when the click happens.

Common mistakes: labeling speakers inconsistently (“Host” vs “Instructor”), omitting critical cues like [silence] during a pause that signals “think time,” and using editorial cues (“[jokes]”) instead of observable events. Keep cues objective and meaningful.

Section 2.5: Formats and delivery: SRT, VTT, transcripts, LMS embedding

Publishing captions is not a single file export; it is a delivery strategy. Different platforms prefer different formats, and learners benefit from both time-synced captions and a navigable transcript.

SRT vs VTT. SRT is widely supported and simple, but WebVTT (VTT) is often better for web delivery because it supports additional metadata and is the standard for HTML5 video. If your LMS or video host supports VTT, use it as the primary web format and keep SRT as a compatibility artifact for editors and legacy tools. Always validate files after export; small formatting issues (timestamps, commas vs periods, missing blank lines) can break playback.
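For reference, the structural difference between the two formats is small; the sketch below shows a minimal conversion, assuming a well-formed SRT file. Many caption editors do this for you, and exported files should still be validated before upload.

```python
import re

def srt_to_vtt(srt_text: str) -> str:
    """Minimal SRT -> WebVTT sketch: add the header and switch decimal commas to periods.
    Assumes well-formed SRT; production files should still be validated after export."""
    # 00:01:02,500 --> 00:01:04,000  becomes  00:01:02.500 --> 00:01:04.000
    vtt_body = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", srt_text)
    return "WEBVTT\n\n" + vtt_body

sample = "1\n00:00:01,000 --> 00:00:03,200\nInstructor: Welcome to the module.\n"
print(srt_to_vtt(sample))
```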

Transcript formats. Provide a clean transcript in an accessible document format (HTML page, properly structured PDF, or a well-styled document). A transcript is more than captions without timecodes: it should be readable, searchable, and scannable with headings if you include sections. Include speaker names and meaningful sound cues. If you have timecodes, consider a “click-to-jump” interactive transcript when the platform supports it—this benefits all learners, not only those using captions.

Embedding in the LMS. Verify that captions are enabled by default or that the control is discoverable. If the LMS embeds a third-party player, test in a student view and on mobile. Ensure keyboard accessibility: learners should be able to toggle captions and navigate the transcript without a mouse. If you provide a separate transcript link, place it near the video with clear labeling (“Transcript (HTML)”).

Multiple outputs from one source. Treat your edited caption file as the “source of truth.” From that, generate: VTT for web playback, SRT for interchange, and a transcript (with or without timecodes) for reading and study. This reduces drift where the transcript says one thing and captions say another. Store versions with clear naming: course_module_lesson_v1_en.vtt, plus a change log entry for each publish.

Common mistakes: exporting only one format and discovering late that the platform needs another; providing a transcript that is just a raw ASR dump without paragraphs; and failing to test playback after upload (some players silently drop malformed files).

Section 2.6: QA workflow: sampling, error types, and correction logs

Quality assurance is where accessibility-first teams distinguish themselves. The objective is not to “trust the model” or “fix everything manually,” but to implement a repeatable QA workflow with clear acceptance criteria and a feedback loop.

Sampling strategy. If you cannot review 100% of runtime, sample intelligently: (1) always review critical segments (definitions, instructions, assessments, safety, demos); (2) randomly sample at least 3–5 short clips per video (beginning, middle, end); (3) sample any segment containing dense terminology, rapid speech, or multiple speakers. Track the sample coverage so you can justify release decisions.
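If you keep runtimes in a spreadsheet or script, a small sketch like the one below can pick review clips from the beginning, middle, and end of a video so coverage is documented rather than ad hoc. The clip count and length are placeholders, not recommendations.

```python
import random

def sample_windows(duration_s: float, clips: int = 3, clip_len_s: float = 30.0) -> list[tuple[float, float]]:
    """Pick one review clip from each of `clips` equal segments (beginning/middle/end)."""
    windows = []
    segment = duration_s / clips
    for i in range(clips):
        latest_start = max(i * segment, (i + 1) * segment - clip_len_s)
        start = random.uniform(i * segment, latest_start)
        windows.append((round(start, 1), round(min(start + clip_len_s, duration_s), 1)))
    return windows

print(sample_windows(900))  # e.g. [(42.3, 72.3), (310.7, 340.7), (655.0, 685.0)] for a 15-minute video
```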

Error taxonomy. Classify issues so you can prioritize and improve: (1) meaning errors (wrong term, missing “not,” incorrect number)—highest severity; (2) timing errors (late/early captions, unreadably short durations); (3) formatting errors (line breaks, punctuation, speaker labels); (4) accessibility cue errors (missing sound cues, unclear speaker changes); (5) style consistency (acronym expansion, capitalization). Assign an “error budget” per minute (e.g., zero meaning errors allowed; limited minor formatting issues) aligned to your standards from Section 2.1.

Correction log. Maintain a lightweight log with: video ID, timestamp, error type, original text, corrected text, root cause (noise, fast speech, missing vocabulary), and prevention action (add term to custom vocabulary; update recording guidance; adjust prompt). Over time, this becomes your governance artifact: it tells you which content types need more human review and which vocabulary items should be preloaded.
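The log itself can live in a shared spreadsheet; the sketch below just makes the fields explicit by writing one entry as a CSV row. The field names are suggestions aligned to the list above, not a standard.

```python
import csv
import sys

# Sketch: one correction-log entry per fixed error; in practice append to a shared CSV.
FIELDS = ["video_id", "timestamp", "error_type", "original", "corrected", "root_cause", "prevention"]

entry = {
    "video_id": "module1_lesson2",
    "timestamp": "00:04:12",
    "error_type": "meaning",
    "original": "do click save",
    "corrected": "don't click save",
    "root_cause": "fast speech",
    "prevention": "add negation check to QA pass",
}

writer = csv.DictWriter(sys.stdout, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(entry)
```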

Release checklist. Confirm captions toggle on/off, sync is acceptable, speakers are identified, non-speech cues are present where meaningful, and exports validate. Record who reviewed, what sampling was performed, and what exceptions were accepted. The practical outcome is a repeatable template: a standard operating procedure with prompts, rubrics, and a QA checklist that supports consistent, auditable accessibility.

Common mistakes: relying on a single end-to-end “accuracy score” instead of reviewing meaning-critical segments; fixing errors without logging them (so they recur); and allowing last-minute edits that introduce drift between captions, transcript, and video versions. Treat QA as part of publishing, not a final optional step.

Chapter milestones
  • Generate first-pass captions and transcripts from audio/video
  • Fix timing, speaker labels, and non-speech events
  • Improve terminology and names with custom vocabulary
  • Run a caption QA checklist and publish in multiple formats
  • Document decisions and build a repeatable template
Chapter quiz

1. Why does the chapter warn against publishing AI-generated captions/transcripts without editing and QA?

Correct answer: Small errors can become learning barriers that hurt comprehension and study workflows
The chapter emphasizes that unreviewed first-pass ASR can introduce errors that directly affect learning and accessibility.

2. Which sequence best matches the end-to-end workflow described in the chapter?

Correct answer: Prepare audio → run ASR → post-edit → apply custom vocabulary → QA → publish formats → document decisions/templates
The chapter lays out a repeatable pipeline from preparation through documentation, with QA before publishing.

3. What is the main purpose of using a custom vocabulary in the captioning process?

Correct answer: To improve recognition of names and domain-specific terminology
Custom vocabulary is used to boost accuracy for proper nouns and specialized terms that ASR often mishears.

4. According to the chapter, where should human review be focused to reduce learning harm?

Correct answer: Key terms, numbers, steps, safety warnings, and assessments
The chapter recommends prioritizing human effort on content where errors most damage understanding or safety.

5. What does the chapter mean by treating captions like “production software”?

Correct answer: Use acceptance criteria, a repeatable pipeline, and release-note-style documentation (e.g., correction logs)
The analogy highlights setting measurable standards, running a workflow, and documenting decisions rather than pursuing perfectionism.

Chapter 3: Alt Text that Teaches (Not Just Describes)

Alt text is often treated as a compliance checkbox: “describe what’s in the picture.” In learning content, that mindset produces alt text that is either too vague (“a chart”) or too literal (“a blue line going up”), neither of which helps a learner build understanding. Accessibility-first AI changes the goal: alt text should deliver equivalent learning value for learners who can’t see the image, without overwhelming them or duplicating the surrounding text.

This chapter turns alt text into a teachable, repeatable production workflow. You will classify images to decide whether alt text is required, generate drafts with context-aware prompts, and apply engineering judgment for complex visuals like charts, diagrams, equations, and maps. You’ll also learn how to QA for bias, privacy, and over-description, and then consolidate decisions into a style guide and rubric so your course stays consistent across lessons and authors.

A practical framing: alt text is a micro-explanation that sits at the intersection of instruction and interface. It must be accurate, minimal, and aligned to the learning objective—especially when AI is involved. AI can accelerate drafting, but it can also hallucinate details, inject assumptions, or expose sensitive information. Your workflow should therefore include: (1) image classification, (2) context-first prompting, (3) pattern-based drafting, (4) long descriptions where needed, and (5) a review rubric with acceptance criteria.

Practice note (applies to each milestone in this chapter): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 3.1: Decision tree: decorative vs informative vs functional images

Start by deciding whether the image needs alt text at all. This is not a writing step; it’s a content design step. A simple decision tree prevents two common errors: adding noisy alt text for decorative images (which creates cognitive load for screen reader users), and omitting essential information from informative or functional images.

Decorative images do not contribute meaning. Examples: background textures, purely aesthetic stock photos, divider icons, and repeated branding marks. Best practice is to use empty alt text (alt="") so assistive tech can skip it. If an image is decorative but still announced because of the UI, fix the implementation rather than writing descriptive filler.

Informative images carry learning content: a diagram of the water cycle, a photo demonstrating a lab setup, a chart showing trends, or a screenshot showing where to click. These require alt text that communicates the instructional point. If the same information is already stated in adjacent text, alt text can be shorter (or may be redundant), but you still need to ensure non-visual users aren’t missing information that sighted learners get “for free.”

Functional images trigger an action: an icon button, a linked image, an image-based control. The alt text should name the action, not the appearance: “Download worksheet (PDF)” rather than “down arrow icon.” For a linked logo, the alt text should be the destination or purpose (“Home” or “Company homepage”), not “Company logo.”

  • Rule of thumb: If removing the image changes what a learner can do or learn, it is not decorative.
  • Workflow tip: Add an “image role” field in your content production template (decorative / informative / functional / complex). Make it mandatory before drafting alt text.
  • AI tip: Ask AI to classify first, then draft—two separate steps reduce over-confident writing.

This classification also sets up how you handle complex visuals: if an image carries dense data (chart, map), you may need both short alt text and a structured long description. Don’t force everything into one cramped sentence.
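For teams that want the decision tree captured in their authoring checklist (optional), a tiny helper can map each image role to what must exist before publish. The role names mirror the "image role" field suggested above; the wording of each requirement is illustrative.

```python
# Sketch: what each image role requires before publish (roles mirror the template field above).
REQUIREMENTS = {
    "decorative": 'empty alt text (alt="") so assistive tech skips it',
    "informative": "short alt text stating the instructional point",
    "functional": "alt text naming the action or destination, not the appearance",
    "complex": "short alt text plus a structured long description",
}

def alt_requirement(role: str) -> str:
    if role not in REQUIREMENTS:
        raise ValueError(f"Unknown image role: {role}; classify before drafting alt text")
    return REQUIREMENTS[role]

print(alt_requirement("functional"))  # alt text naming the action or destination, not the appearance
```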

Section 3.2: Context-first prompting: learning objective, audience, tone

AI can describe pixels; it cannot automatically infer instructional intent. Context-first prompting supplies what the model lacks: why the image exists in the lesson, who the learner is, and what level of detail is helpful. Without context, AI tends to list visual details (colors, positions) rather than teach the concept (relationships, comparisons, steps).

Use a prompt template that always includes: (1) learning objective, (2) audience, (3) surrounding text or narration, (4) what to emphasize, and (5) constraints (length, tone, avoid guessing). Example prompt you can reuse in a pipeline:

  • Learning objective: What should learners understand or do after seeing the image?
  • Audience: Grade level, domain familiarity, language proficiency, accommodations (e.g., screen reader users, early readers).
  • Context: Paste the paragraph before/after the image, plus any callouts or labels.
  • Instructional emphasis: Identify key relationships (“compare A vs B”), steps (“first…then…”), or outcomes (“trend increases after…”).
  • Constraints: 50–150 characters unless complex; do not mention colors unless necessary; do not infer identity or sensitive traits; if uncertain, flag unknowns.
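Assembled into a reusable template, that context looks like the sketch below. Every field value is a placeholder you replace per image; the wording of the constraints is illustrative, not a required prompt.

```python
# Sketch: context-first alt text prompt assembled from the fields above.
# All values are placeholders; fill them from your lesson materials per image.
PROMPT_TEMPLATE = """You are drafting alt text for a learning module.
Learning objective: {objective}
Audience: {audience}
Surrounding text: {context}
Emphasize: {emphasis}
Constraints: 50-150 characters; do not guess labels or numbers; do not infer
identity or sensitive traits; if anything is unclear, say "UNCERTAIN" and list it.
Draft one alt text sentence, then list any uncertainties."""

prompt = PROMPT_TEMPLATE.format(
    objective="Interpret a bar chart comparing two study strategies",
    audience="First-year undergraduates, including screen reader users",
    context="The paragraph before the image defines 'retrieval practice'.",
    emphasis="Which group scored higher and by roughly how much",
)
print(prompt)
```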

Also specify the function of the image within the lesson: attention getter, example, evidence, step-by-step procedure, or assessment support. That single line often changes the best alt text from “Photo of…” to “Example of…” or “Step 2 shows…”

Engineering judgment matters: if the lesson already defines terms, keep alt text lean; if the image introduces a new concept (first time learners see a schematic), allow more explanation or add a long description. Treat prompts as governed assets: version them, document approved templates, and align them to your course outcomes so authors don’t improvise wildly.

Section 3.3: Concise alt text patterns (50–150 chars) and when to expand

Most instructional images work best with concise alt text—typically 50–150 characters—because screen reader users should not be forced through a paragraph when a sentence will do. Concision does not mean shallow; it means selecting the teaching point. Use patterns to keep output consistent across a course.

Reliable short patterns include:

  • Concept + key relationship: “Bar chart comparing quiz scores: practice group averages 82, control 71.”
  • Procedure step: “Step 3: Insert the resistor between nodes A and B on the breadboard.”
  • Definition example: “Example of a dependent clause: ‘because I studied’ cannot stand alone.”
  • Interface guidance: “Screenshot: Click ‘Settings’, then ‘Accessibility’ in the left menu.”
  • Equation identity (simple): “Quadratic formula: x = (−b ± √(b²−4ac)) / 2a.”

Common mistakes are (1) starting every alt text with “Image of,” (2) listing irrelevant details (“blue button, top right”), and (3) restating the caption verbatim. Instead, aim for: what it is, what it shows, why it matters—often in one sentence.
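A few of these mistakes are mechanical enough to lint automatically (optional). The sketch below treats the 50–150 character guidance above as a warning threshold, not a hard failure, and leaves judgment calls to the reviewer.

```python
# Sketch: flag mechanical alt text issues for human review (warnings, not hard failures).
def lint_alt_text(alt: str, caption: str = "") -> list[str]:
    warnings = []
    if alt.lower().startswith(("image of", "picture of", "photo of")):
        warnings.append("Starts with a redundant 'image of' style phrase")
    if not 50 <= len(alt) <= 150:
        warnings.append(f"Length {len(alt)} outside the 50-150 character guideline")
    if caption and alt.strip().lower() == caption.strip().lower():
        warnings.append("Duplicates the caption verbatim")
    return warnings

print(lint_alt_text("Image of a chart", caption="Quiz score comparison"))
# ["Starts with a redundant 'image of' style phrase", "Length 16 outside the 50-150 character guideline"]
```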

When should you expand beyond 150 characters? Expand when the image contains multiple labeled parts needed for comprehension (a biology diagram), dense data (a multi-series chart), spatial relationships critical to the task (a geometry proof diagram), or instructions that must be followed precisely (safety steps). In those cases, keep the alt text short (“Diagram of the heart with labeled chambers; long description follows.”) and move detail into a long description or nearby text. This keeps navigation efficient while still providing full access.

Section 3.4: Long descriptions for complex visuals (structured breakdowns)

Complex visuals—charts, diagrams, equations with annotations, maps—often cannot be translated into a single alt sentence without losing meaning. The accessibility-first approach is to provide: (1) a short alt text that identifies the visual and its main takeaway, and (2) a long description that delivers the underlying data or structure in a navigable format.

Long descriptions work best when structured. Avoid a wall of text. Use headings or ordered lists in nearby content (or an adjacent “Description” expandable panel) so learners can skim. A practical structure for complex visuals is:

  • Title/purpose: What the visual is about and why it’s included.
  • Layout: How it’s organized (axes, regions, components, flow direction).
  • Key elements: List labeled parts, series, nodes, or regions.
  • Data highlights: Max/min, trends, comparisons, outliers—include numbers if they matter.
  • Conclusion: One sentence linking back to the learning objective.

Charts: Provide axes labels, units, time ranges, and the main trend(s). If the chart supports calculation, include a small table of values or the dataset in text. Diagrams: Describe components and relationships (“A connects to B via…”, “inputs flow into…”). Equations: Ensure they’re in accessible math markup where possible; if not, provide a text math version and define symbols. Maps: State the region, the variable mapped, the legend categories, and the spatial pattern (“higher in the northeast, lower along the coast”).

AI is helpful for first drafts of long descriptions, especially to extract visible labels and propose a structure. But require a verification step: confirm labels, numbers, and relationships against the source data. If the chart was generated from a spreadsheet, prefer generating the long description from the underlying data rather than from an image interpretation. That shift—data-first instead of pixels-first—reduces hallucinations and improves precision.
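When the chart comes from a spreadsheet, the long description can be drafted from the data itself rather than from the pixels. The sketch below uses an invented dataset and wording purely for illustration.

```python
# Sketch: draft a chart long description from the underlying data, not the image.
# The dataset and wording are invented for illustration only.
data = {"Practice group": [71, 76, 82], "Control group": [70, 71, 71]}
weeks = ["Week 1", "Week 2", "Week 3"]

def describe_series(data: dict[str, list[int]], labels: list[str]) -> str:
    lines = ["Layout: quiz scores (0-100) by week for two groups."]
    for series, values in data.items():
        pairs = ", ".join(f"{w}: {v}" for w, v in zip(labels, values))
        lines.append(f"{series}: {pairs} (change {values[-1] - values[0]:+d}).")
    return "\n".join(lines)

print(describe_series(data, weeks))
```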

Section 3.5: Common failure modes: hallucinations, assumptions, sensitive traits

Alt text is a high-risk surface for AI errors because it sounds authoritative. Your review process should explicitly check for failure modes that appear frequently in AI-generated descriptions.

Hallucinations: The model may invent labels, numbers, or relationships (“the red line peaks at 50%”) even when the chart is unreadable or ambiguous. Mitigation: instruct the model to state uncertainty (“Text is unclear”) and require human verification for any quantitative claim. In a governed workflow, set an “error budget”: for example, zero tolerance for incorrect numbers in charts and zero tolerance for invented UI labels in screenshots.

Assumptions and mind-reading: AI may infer intent or causality (“sales increased because of the campaign”) when the chart only shows correlation. It may also infer what someone is doing or feeling (“a happy student”). Mitigation: keep descriptions observable and learning-relevant; avoid attributing emotions, motivations, or causes unless the lesson text explicitly states them.

Sensitive traits and privacy: Do not identify people by race, disability, religion, immigration status, or other sensitive characteristics. Do not guess age, gender identity, or medical conditions. Also avoid doxxing: names on badges, emails in screenshots, student faces in classroom photos, home addresses on forms. Mitigation: redact at the source when possible; otherwise write alt text that omits identifying details (“Instructor demonstrates pipetting at a lab bench”) and ensure your media policy covers consent and anonymization.

Over-description: Listing colors, clothing, decorative objects, or background scenery can bury the teaching point. Include such details only when instructionally necessary (e.g., “highlighted in yellow” if color is the only indicator—though better is to fix the visual to not rely on color alone).

These checks are not merely editorial; they are part of responsible AI governance. Treat alt text as learner-facing instructional content with the same QA rigor as explanations and assessments.

Section 3.6: Review rubric and consistency across a course module

Consistency is what turns good alt text into a scalable practice. A course module may contain dozens of images produced by different authors using different AI prompts. Without a rubric and style guide, learners experience a patchwork: some alt text is verbose, some is empty, some repeats captions, and some introduces new terminology. Build a module-level standard and enforce it in review.

A practical review rubric (score or pass/fail) can include:

  • Classification correct: Decorative uses alt=""; functional describes action; informative/complex handled appropriately.
  • Instructional alignment: Alt text supports the learning objective and matches surrounding content.
  • Accuracy: No invented labels, numbers, or causal claims; units and terms match the lesson.
  • Conciseness: Typically 50–150 characters; longer only with clear justification, or with the extra detail moved to a long description.
  • No redundancy: Does not duplicate captions or nearby text word-for-word unless required for equivalence.
  • Bias/privacy safe: No sensitive-trait guesses; no personal data from screenshots; respectful language.
  • Consistency: Terminology, tone, and patterns match the course style guide.

Turn the rubric into a style guide with “house rules,” such as: avoid “image of,” prefer present tense, include numbers only when verified, define acronyms only if not defined elsewhere, and standardize how you describe common visuals (“Screenshot: …”, “Diagram: …”, “Chart: …”). Include examples from your own course so authors can pattern-match.

Finally, operationalize the workflow: store approved prompts, track changes to alt text like any other curriculum asset, and require human sign-off for complex visuals. When AI is used, keep an audit trail: prompt, model, date, and reviewer. This governance may feel heavy at first, but it prevents small accessibility regressions from compounding across a module—and it ensures your alt text does what it should in learning: teach.
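
If you keep that audit trail in a spreadsheet, a few lines of scripting can make logging painless. The sketch below uses an assumed schema whose field names mirror the audit trail described above; the file name is hypothetical and nothing here is a prescribed format.

    import csv
    import os
    from datetime import date

    AUDIT_FIELDS = ["asset_id", "prompt_version", "model", "date", "reviewer", "outcome"]

    # Append one audit-trail row per reviewed alt text item.
    # Field names mirror the audit trail above; adjust them freely.
    def log_alt_text_review(log_path, asset_id, prompt_version, model, reviewer, outcome):
        write_header = not os.path.exists(log_path)
        with open(log_path, "a", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=AUDIT_FIELDS)
            if write_header:
                writer.writeheader()
            writer.writerow({
                "asset_id": asset_id,
                "prompt_version": prompt_version,
                "model": model,
                "date": date.today().isoformat(),
                "reviewer": reviewer,
                "outcome": outcome,
            })

    # Example (hypothetical values):
    # log_alt_text_review("alt_text_audit.csv", "img-042", "alt-prompt-v3",
    #                     "vision-model", "J. Reviewer", "approved")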

Chapter milestones
  • Classify images and decide when alt text is required
  • Draft alt text with AI using context-aware prompts
  • Handle complex visuals: charts, diagrams, equations, maps
  • Review for bias, privacy, and over-description
  • Create an alt text style guide and rubric
Chapter quiz

1. What is the chapter’s main shift in purpose for alt text in learning content?

Show answer
Correct answer: Provide equivalent learning value aligned to the learning objective, without overwhelming or duplicating nearby text
The chapter emphasizes alt text as a micro-explanation that supports learning, not a compliance-only or exhaustive description.

2. Why can “a chart” and “a blue line going up” both be ineffective alt text in instructional materials?

Show answer
Correct answer: They are either too vague or too literal to help learners build understanding
The chapter warns that vague labels and literal visual recounting often fail to convey the concept the image is meant to teach.

3. Which workflow best matches the chapter’s recommended production process for alt text when AI is involved?

Show answer
Correct answer: Image classification, context-first prompting, pattern-based drafting, long descriptions where needed, review rubric with acceptance criteria
The chapter presents a structured workflow that starts with classification and context, and ends with rubric-based review.

4. What is the key engineering judgment needed for complex visuals like charts, diagrams, equations, and maps?

Show answer
Correct answer: Deciding when a longer description is needed to convey equivalent learning value
Complex visuals may require extended description to communicate the instructional meaning, not just surface features.

5. Which risk is specifically highlighted as a reason AI-generated alt text must be reviewed with a rubric?

Show answer
Correct answer: AI can hallucinate details, inject assumptions, or expose sensitive information
The chapter notes common AI risks—hallucination, assumptions, and privacy exposure—making QA and acceptance criteria essential.

Chapter 4: Reading Supports with AI (Simplify, Support, Preserve Meaning)

Reading supports are not “easier content.” They are alternative pathways to the same learning goals. In accessibility-first production, you design supports so learners can enter the material at the right level of complexity, while preserving the author’s intent, disciplinary precision, and assessment alignment. AI can help you produce these supports at scale—leveled summaries, previews, plain-language rewrites, glossaries, and TTS-ready text—but only if you treat AI as a drafting tool with explicit constraints and rigorous verification.

This chapter focuses on a workflow mindset: generate supports, validate fidelity, and ship with guardrails. You will practice engineering judgment: deciding what must remain unchanged (definitions, formulas, claims, citations, terminology), what can be simplified (sentence structure, ordering, redundancy), and what must be added (signposting, examples, non-examples, pronunciations) to reduce cognitive load. You’ll also build a QA loop: side-by-side checks between source and AI output, uncertainty flags, and small-scale learner testing to catch subtle meaning drift.

As you work through the sections, keep one principle in view: every support should be reversible. A learner who reads a summary or simplified passage should be able to return to the original and recognize the same ideas, not a different argument. This is the core of “simplify, support, preserve meaning.”

  • Practical outcomes you should be able to ship after this chapter: leveled previews and summaries, a glossary with concept scaffolds, plain-language versions that meet readability targets, and text formatted for reliable TTS/screen reader rendering—each with acceptance criteria and a fidelity audit trail.

In the next sections, you’ll build the reading-support toolbelt: structure and chunking, summarization frameworks, constrained simplification prompts, vocabulary scaffolds, TTS-ready authoring, and fidelity checks.

Practice note for Create leveled summaries and previews without losing core ideas: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Generate glossary entries and concept scaffolds: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Rewrite for plain language and readability targets: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Prepare text for TTS and screen readers (structure and punctuation): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Validate fidelity with side-by-side checks and learner testing: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 4.1: Types of reading supports: chunking, headings, previews, questions

Reading supports begin with structure, not paraphrase. Many accessibility failures happen because content is “correct” but presented as an uninterrupted wall of text, forcing learners to hold too much in working memory. AI can help you reorganize content into chunks, but your job is to decide the pedagogical skeleton: what is the sequence of ideas, what is the minimum set of headings, and where should the learner pause to self-check?

Start by extracting a content outline from the source: main claim, sub-claims, procedures, examples, and exceptions. Then ask AI to propose headings that reflect meaning, not decoration. Good headings are action-oriented (“Identify variables in the experiment”) or concept-oriented (“Why correlation is not causation”). Poor headings restate vague topics (“Introduction,” “More information”).

  • Chunking: Break paragraphs at idea boundaries; keep each chunk to a single purpose (define, explain, demonstrate, warn).
  • Previews: Add a short “In this section you will…” that sets expectations and reduces anxiety.
  • Guiding questions: Use a few prompts that cue attention (“What changes when temperature increases?”). These are supports, not test items—avoid scoring language.
  • Progress markers: Add signposts (“Step 2 of 4”) for procedures and lab instructions.

Common mistakes: (1) chunking by length instead of meaning, which splits definitions from their constraints; (2) adding too many questions, increasing load; (3) headings that introduce new claims not in the original. Acceptance criteria should include: every heading maps to an idea in the source, and every question is answerable from nearby text without inference leaps.

Workflow tip: keep a “structure-only pass” separate from simplification. First restructure and preview; then simplify language inside each chunk. This separation makes fidelity checks easier because you can verify one kind of change at a time.

Section 4.2: Summarization frameworks: key points, takeaways, action steps

Summaries are among the highest-risk AI outputs because they compress meaning. To reduce risk, summarize with an explicit framework and require the model to ground each bullet in the source. A practical approach is to produce three layers that serve different learner needs: Key Points (what the text says), Takeaways (why it matters), and Action Steps (what to do next). This aligns with UDL by offering multiple ways to engage and plan.

Define “key point” narrowly: a claim, definition, or procedure stated in the source. Define “takeaway” as an interpretation that must be clearly marked as such and still consistent with the source. Define “action steps” as study actions or process steps explicitly supported (e.g., “review the formula,” “try the worked example”), not invented assignments.

  • Leveled summaries: Produce a 1-sentence gist, a 5-bullet short summary, and a 1-paragraph expanded summary. Require consistent terminology across levels.
  • Preview vs. recap: A preview should mention goals and scope; a recap should mention results and key conclusions.
  • Constraint control: Cap length, require inclusion of specific terms, and forbid new facts.

Engineering judgment: in technical domains, you may need to preserve exact definitions, thresholds, or conditions. Tell AI which statements are “must-keep exact,” such as safety warnings, legal language, or assessment-critical definitions. Another high-value pattern is to ask for “what is not covered” in the preview to prevent overgeneralization.

Common mistakes: summaries that swap causation for correlation, omit exceptions (“only if”), or replace a precise term with a near-synonym that changes meaning (e.g., “accuracy” vs. “precision”). Your acceptance criteria should include a coverage check (major sections represented) and a no-new-claims check (every claim traceable to the source).

Section 4.3: Simplification prompts with constraints (grade level, terms to keep)

Simplification is not dumbing down; it is reducing unnecessary complexity while retaining disciplinary accuracy. AI is helpful when you specify constraints that prevent meaning drift. The best prompts treat simplification as a controlled transformation with explicit “do not change” lists.

Start by selecting a target: grade-level band, CEFR level, or a readability metric your organization uses (e.g., shorter sentences, fewer embedded clauses). Then specify terms to keep—domain vocabulary that learners must learn and that assessments may require. If a term is advanced but essential, keep it and add a brief inline explanation or link to a glossary entry rather than replacing it.

  • Prompt pattern (practical): “Rewrite for Grade 7. Keep these terms exactly: [list]. Do not add facts. Preserve all numbers, conditions, and warnings. Use active voice when possible. Max sentence length: 20 words. If a sentence cannot be simplified without losing meaning, keep it and add a short clarification sentence after it.”
  • Decompose hard passages: Ask AI to first identify the “hardness sources” (nested clauses, dense nominalizations, jargon clusters), then rewrite.
  • Parallel outputs: Generate two versions: “plain language” and “technical but clearer.” This supports learners who want precision without extra density.

Common mistakes: (1) replacing a technical term with a broader everyday word (e.g., “work” in physics); (2) removing hedges that signal uncertainty (“may,” “suggests”); (3) rewriting examples into different scenarios, accidentally changing what is being illustrated. A good QA habit is to highlight all changed nouns/verbs and scan for conceptual substitutions.

Practical outcome: a simplification pipeline that produces consistent, leveled text while maintaining assessment alignment. You should be able to point to a prompt template, a list of protected terms, and an error budget (e.g., “0 tolerance for changed numeric values; low tolerance for altered modality; moderate tolerance for sentence reordering”).
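
Part of that error budget can be screened automatically. The sketch below is a rough, assumed check rather than a full fidelity review: it flags dropped protected terms and any change in the set of numeric values, and everything else still goes to a human reviewer.

    import re

    # Coarse screen for the error budget above: zero tolerance for changed numbers,
    # zero tolerance for dropped protected terms. A reviewer still reads both texts.
    def check_simplification(original, rewrite, protected_terms):
        issues = []
        for term in protected_terms:
            if term.lower() not in rewrite.lower():
                issues.append(f"Protected term missing or altered: {term!r}")

        def numbers(text):
            return sorted(re.findall(r"\d+(?:\.\d+)?", text))

        if numbers(original) != numbers(rewrite):
            issues.append("Numeric values differ between source and rewrite")
        return issues

    # Example (hypothetical protected-term list):
    # check_simplification(source_text, simplified_text, ["precision", "accuracy", "hypothesis"])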

Section 4.4: Vocabulary support: glossaries, examples, analogies, non-examples

Vocabulary support is where AI can dramatically increase learner access—especially for multilingual learners and novices—if you treat it as concept scaffolding rather than a dictionary dump. A useful glossary entry includes: a learner-friendly definition, the technical definition (if different), pronunciation guidance when needed, an example, a non-example, and common confusions.

Generate glossary candidates by having AI extract high-utility terms: those that appear frequently, carry heavy conceptual load, or are prerequisites for later lessons. Then decide which are teach terms (must be learned) versus support terms (help comprehension but not assessed). Teach terms should be stable across the course; standardize their definitions and reuse them to avoid inconsistent phrasing.

  • Examples: Require examples that mirror the lesson context (same domain, same variable types) to avoid accidental curriculum drift.
  • Analogies: Use cautiously. Ask AI to label analogies explicitly and list where the analogy breaks to prevent misconceptions.
  • Non-examples: Highly effective for boundary setting (e.g., “A hypothesis is not a guess; it is a testable explanation”).
  • Concept scaffolds: Add “related terms,” “prerequisites,” and “when you will use this next.”

Common mistakes: definitions that are circular, analogies that introduce new technical constructs, and examples that contradict the original text’s constraints. Acceptance criteria: each definition is consistent with course usage, each example is plausible and aligned, and each non-example clarifies a boundary without introducing a competing misconception.

Practical workflow: maintain a glossary database (spreadsheet or CMS) with fields for term, definition (plain), definition (technical), example, non-example, pronunciation, and source lesson. Use AI to draft, but require human approval for teach terms. Over time, this becomes governance: a controlled vocabulary that makes future AI rewriting safer and more consistent.
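
Once the glossary lives in a spreadsheet, small consistency checks are easy to script. The sketch below assumes hypothetical column names (term, definition_plain, source_lesson, teach_term) and flags teach terms that are defined differently in different lessons.

    import csv
    from collections import defaultdict

    # Flag teach terms whose plain definitions differ across lessons.
    # Column names are hypothetical; adapt them to your own glossary sheet.
    def find_inconsistent_terms(glossary_csv):
        definitions = defaultdict(set)
        with open(glossary_csv, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                if row.get("teach_term", "").strip().lower() == "yes":
                    definitions[row["term"].strip().lower()].add(row["definition_plain"].strip())
        return {term: defs for term, defs in definitions.items() if len(defs) > 1}

    # Example: print(find_inconsistent_terms("glossary.csv"))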

Section 4.5: TTS-ready authoring: formatting, abbreviations, math and symbols

Text-to-speech (TTS) and screen readers are unforgiving about structure and punctuation. AI can help you rewrite content to be “speakable,” but you must understand what assistive tech needs: clear headings, predictable lists, unambiguous abbreviations, and math expressed in a readable form. A TTS-ready version is not a different lesson; it is the same lesson encoded for reliable audio rendering.

Begin with formatting: use semantic headings (consistent levels), short paragraphs, and lists for steps. Avoid visual-only cues such as “see above” or “in the box on the right.” Replace them with explicit references (“In the previous section, ‘Variables’…”). Ensure link text is descriptive (“Read the rubric” instead of “Click here”).

  • Abbreviations: Expand on first use (“WCAG (Web Content Accessibility Guidelines)”), and consider adding pronunciation hints for uncommon acronyms.
  • Numbers and units: Keep units adjacent (“5 mm”), avoid ambiguous ranges (“3–5” should become “3 to 5”).
  • Math and symbols: Provide a spoken equivalent (e.g., “x squared” for x²; “greater than or equal to” for ≥). For complex equations, include a linearized version and, when necessary, a separate “math description” line.
  • Punctuation for prosody: Insert commas to prevent run-on speech; use colons before lists; avoid excessive parentheses that can confuse TTS pacing.

Common mistakes: leaving raw symbols (→, ∑, ≤) without text equivalents, using inconsistent list punctuation, and embedding critical meaning in bold/italics only. Acceptance criteria: headings are navigable, lists read as lists, abbreviations are expanded at first mention, and math has an accessible reading.

Practical outcome: a “TTS pass” checklist you run after simplification. In production, this can be partially automated (regex checks for symbols/abbreviations) plus a quick screen-reader smoke test to catch awkward phrasing before learners do.
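
As a concrete illustration of that partial automation, the sketch below runs a few regex checks for raw symbols, possibly unexpanded acronyms, and ambiguous numeric ranges. The symbol list and heuristics are assumptions to adapt, and none of this replaces the screen-reader smoke test.

    import re

    # Symbols that often lack a spoken equivalent; extend with ones from your content.
    RAW_SYMBOLS = ["→", "∑", "≤", "≥", "±", "×"]

    def tts_pass(text):
        issues = []
        for symbol in RAW_SYMBOLS:
            if symbol in text:
                issues.append(f"Raw symbol without text equivalent: {symbol}")
        # Acronyms of 3+ capitals that never appear next to a parenthetical expansion.
        for acronym in sorted(set(re.findall(r"\b[A-Z]{3,}\b", text))):
            if f"{acronym} (" not in text and f"({acronym})" not in text:
                issues.append(f"Acronym possibly unexpanded at first use: {acronym}")
        # Ranges written with a dash read poorly; prefer "3 to 5".
        for rng in re.findall(r"\b\d+\s*[–-]\s*\d+\b", text):
            issues.append(f"Ambiguous numeric range, prefer 'X to Y': {rng}")
        return issues

    # Example: tts_pass(lesson_text)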

Section 4.6: Fidelity checks: meaning preservation, citations, and uncertainty flags

Fidelity is the difference between helpful supports and harmful misinformation. Treat every AI-generated reading support as a transformation that must be verified against the source. The most reliable method is a side-by-side check: source on the left, AI output on the right, with a reviewer confirming that each claim in the output is supported and that all critical constraints from the source are preserved.

Operationalize fidelity with a rubric. At minimum, check: (1) meaning preservation (no changed relationships, conditions, or causality), (2) completeness for the intended purpose (key steps not dropped in procedures; core definitions present), (3) language integrity (hedges, modality, and tone not altered in ways that change certainty), and (4) terminology consistency (protected terms unchanged; glossary-aligned definitions).

  • Citations and traceability: For summaries, require the model to include “source anchors” (e.g., section heading, timestamp, paragraph ID) for each bullet. Even if you remove anchors in the learner-facing version, keep them in your editorial record.
  • Uncertainty flags: Instruct AI to mark any statement it cannot ground with “UNCERTAIN” rather than guessing. Reviewers either correct, cite, or remove flagged items.
  • Error budgets: Define non-negotiables (numbers, names, safety steps, legal language) and tolerances (sentence order, synonym choice outside protected terms).
  • Learner testing: Run quick comprehension checks with representative learners (including screen reader users). Look for misinterpretations caused by simplified phrasing or missing context.

Common mistakes: accepting a “fluent” rewrite without verifying subtle shifts (especially in scientific claims), allowing AI to invent examples that feel plausible, and failing to record decisions for later audits. Practical governance: store prompts, version history, reviewer notes, and known failure modes. Over time, your prompts become safer because they incorporate real errors you’ve seen, and your supports become more consistent across a course catalog.
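
The uncertainty flags and source anchors described above can also be pre-screened mechanically before the human pass. The sketch below scans a draft for UNCERTAIN markers and for bullets without a source anchor; the bracketed anchor format is an assumption, so swap in whatever convention your editorial record uses.

    import re

    # Pre-screen for the fidelity review: UNCERTAIN flags and bullets with no
    # source anchor. The "[§...]" anchor format is an assumption, not a standard.
    def fidelity_screen(draft):
        findings = []
        for number, line in enumerate(draft.splitlines(), start=1):
            stripped = line.strip()
            if "UNCERTAIN" in stripped:
                findings.append((number, "uncertainty flag", stripped))
            if stripped.startswith(("-", "•")) and not re.search(r"\[§[^\]]+\]", stripped):
                findings.append((number, "bullet missing source anchor", stripped))
        return findings

    # Reviewers then correct, cite, or remove each flagged line, as described above.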

When you can demonstrate traceability, clear uncertainty handling, and learner-informed improvements, AI becomes a scalable accessibility ally rather than a hidden risk. That is the standard for accessibility-first reading supports.

Chapter milestones
  • Create leveled summaries and previews without losing core ideas
  • Generate glossary entries and concept scaffolds
  • Rewrite for plain language and readability targets
  • Prepare text for TTS and screen readers (structure and punctuation)
  • Validate fidelity with side-by-side checks and learner testing
Chapter quiz

1. In this chapter, what is the core purpose of reading supports?

Show answer
Correct answer: Provide alternative pathways to the same learning goals while preserving meaning
Reading supports are not “easier content”; they help learners access the same goals at an appropriate complexity level without changing the ideas.

2. Which workflow best matches the chapter’s accessibility-first approach to using AI for reading supports?

Show answer
Correct answer: Generate supports, validate fidelity, then publish with guardrails and an audit trail
The chapter emphasizes AI as a drafting tool plus rigorous verification: side-by-side checks, flags for uncertainty, learner testing, and guardrails.

3. Which set best represents what must remain unchanged during simplification to preserve fidelity?

Show answer
Correct answer: Definitions, formulas, claims, citations, and key terminology
Engineering judgment requires protecting precision elements (e.g., definitions, formulas, claims, citations, terminology) while simplifying structure.

4. What does the chapter mean by the principle that every support should be “reversible”?

Show answer
Correct answer: A learner can return to the original and recognize the same ideas, not a different argument
Reversibility means the support preserves the author’s intent and core ideas so the original remains recognizable and aligned.

5. Which practice is part of the chapter’s QA loop for catching subtle meaning drift?

Show answer
Correct answer: Side-by-side checks of source vs. AI output plus small-scale learner testing
The chapter calls for side-by-side fidelity checks, uncertainty flags, and learner testing to detect subtle changes in meaning.

Chapter 5: QA, Compliance, and Responsible AI Operations

Accessibility-first production does not end when an AI tool generates captions, alt text, or simplified reading supports. The work becomes operational: verifying quality, documenting decisions, protecting learners, and making improvements predictable. In practice, teams fail not because they “don’t care,” but because they lack a shared definition of done, review roles, and a plan for what happens when defects are found after publication.

This chapter treats accessibility as a system you operate. You will set acceptance criteria for captions, transcripts, alt text, and reading supports; decide when to review everything versus sampling; manage privacy risks in audio/video and transcripts; and create lightweight reporting that helps stakeholders understand progress without drowning the team in paperwork. You’ll also establish an incident response path—because accessibility defects are not hypothetical, and learners should not be the first to discover them.

Think like an engineering lead: you are balancing risk, time, and learner impact. You will create checklists that are specific enough to catch common errors (names, numbers, equations, speaker changes, image intent) and flexible enough to support different course types. You will set error budgets and sampling plans so review effort is proportional to risk. You’ll clarify who approves what, and what evidence you keep to demonstrate compliance with WCAG and alignment with UDL goals.

The practical outcome: a repeatable workflow where AI accelerates production, humans safeguard meaning, and your team can answer tough questions—“How do you know captions are accurate?” “What do you do with minors’ voices?” “Can you show an audit trail?”—with confidence.

Practice note for Build checklists for captions, alt text, and reading supports: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Set sampling plans, error budgets, and review roles: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Handle privacy, consent, and data retention for media and transcripts: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Establish incident response for accessibility defects: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Create a lightweight accessibility report for stakeholders: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 5.1: Accessibility acceptance criteria as “definition of done”

Acceptance criteria are your accessibility “definition of done.” Without them, teams debate quality after the fact, and accessibility becomes subjective. Your criteria should be testable, tied to learner impact, and mapped to the asset type: captions/transcripts, alt text, and reading supports (summaries, simplifications, glossary, and TTS-ready text).

Start by turning WCAG/UDL principles into concrete checks. For captions and transcripts, criteria typically include: correct words and numbers (especially assessments, dates, prices, and measurements), correct speaker attribution, synchronized timing, punctuation that supports comprehension, and inclusion of meaningful non-speech audio (e.g., “laughter,” “door slams,” “music begins”) when relevant. A common mistake is “pretty good” captions that miss domain terms; fix this by requiring a glossary pass: all key terms must match course vocabulary exactly.

For alt text, acceptance criteria must emphasize intent and context: describe what matters for learning, not every pixel. Require that charts include the takeaway (trend, comparison, outliers) and that decorative images are marked as decorative (or given empty alt text) to reduce noise for screen reader users. Common mistakes include repeating nearby text verbatim, guessing identities, and including subjective judgments (“beautiful,” “scary”) without instructional purpose.

For reading supports, define what “safe simplification” means: preserve learning objectives, keep definitions consistent, avoid changing technical meaning, and avoid fabricating citations. Add a TTS-ready criterion: expand ambiguous abbreviations, remove odd punctuation that breaks speech, and ensure headings and lists read cleanly.

  • Deliverable-based checklist: create a short checklist per asset type (10–15 items) that reviewers can complete quickly.
  • Stop-ship items: define defects that block release (e.g., missing captions, wrong answer options, miscaptioned safety instructions, or incorrect chart interpretation).
  • Evidence: decide what you keep (reviewer initials, tool used, version, timestamp) so compliance does not rely on memory.

When acceptance criteria are explicit, AI becomes safer: prompts can target each criterion (“verify numbers,” “flag uncertain terms”), and humans can review with consistency.

Section 5.2: Human review models: full review vs sampling vs risk-based

Human review is non-negotiable, but “review everything” is not always feasible. Choose a review model that matches risk, maturity, and the stakes of the content. The three common models are full review, sampling, and risk-based review—and most teams blend them.

Full review means a human checks every asset before release. Use it when you are onboarding a new AI workflow, launching a new course with high visibility, publishing high-stakes content (certification prep, compliance training, health/safety), or working with minors. Full review also makes sense when source audio is noisy, speaker accents vary widely, or when content includes many numbers, formulas, or names.

Sampling review checks a subset to estimate overall quality. Sampling works when production volume is high and content risk is moderate. Define a sampling plan with clear rules: for example, review 10% of videos per module, with at least one from each instructor; review the first two items from a new vendor; and always sample items with low model confidence or high edit distance from previous versions. A common mistake is “random sampling” that misses edge cases; include targeted sampling for known failure modes (math, code, multiple speakers, heavy jargon).

Risk-based review allocates effort where harm is likely. Build a risk score using factors like: learner stakes (graded vs optional), audience (ESL learners, deaf/hard-of-hearing, screen reader users), media complexity (multiple speakers, background noise), and content sensitivity (medical, legal, identity). High-risk items get full review; low-risk items get sampling plus automated checks.
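
A risk score does not need to be sophisticated to be useful. The sketch below turns the factors above into a review decision; every weight and threshold is an illustrative assumption to tune against your own catalog.

    # Map the risk factors above to a review level. Weights and thresholds are
    # illustrative starting points, not recommendations.
    def review_level(graded, sensitive_audience, multiple_speakers, noisy_audio, sensitive_topic):
        score = (
            2 * graded                # learner stakes: graded vs optional
            + 2 * sensitive_audience  # e.g., screen reader users, deaf/hard-of-hearing learners
            + 1 * multiple_speakers
            + 1 * noisy_audio
            + 2 * sensitive_topic     # medical, legal, identity
        )
        if score >= 4:
            return "full review"
        if score >= 2:
            return "targeted sampling"
        return "random sampling plus automated checks"

    # Example: review_level(graded=True, sensitive_audience=True, multiple_speakers=False,
    #                       noisy_audio=False, sensitive_topic=False)  # "full review"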

Make roles explicit. Typical roles include: Producer (runs AI generation), Accessibility QA (checks against criteria), Subject Matter Expert (validates technical meaning), and Approver (signs off for release). Pair this with error budgets: e.g., “caption word error rate must be below X% for release” and “zero tolerance for wrong numbers in assessments.” Error budgets clarify trade-offs and prevent endless polishing while still protecting learners.

Section 5.3: Bias and harm review for language and images

Accessibility quality is not only about accuracy; it is also about harm prevention. AI-generated captions, alt text, and reading supports can introduce bias through word choice, assumptions about identity, or omission of context. A responsible operation includes a bias and harm review that is lightweight but consistent.

For language outputs (captions, transcripts, summaries, simplifications), check for: stigmatizing terms, unnecessary mention of protected characteristics, and “tone drift” that changes how a speaker is perceived (e.g., making a hesitant speaker sound incompetent). Watch for over-simplification that removes agency (“they can’t understand”) or changes modality (“must” vs “may”). Another common mistake is “correcting” dialect or accent in a way that erases identity; captions should be readable, but they should not rewrite meaning or mock speech patterns.

For image descriptions, avoid guessing sensitive attributes (race, gender identity, disability, religion) unless it is clearly relevant to the learning objective and evident from the content. Prefer observational, verifiable descriptions: “a person using a wheelchair” (if visible and relevant) rather than inferred medical conditions. For classroom photos, be careful: alt text can unintentionally identify students (“Sarah in the front row”) or disclose a minor’s identity; default to non-identifying descriptions.

Operationalize harm review with a short rubric and escalation path:

  • Rubric categories: identity assumptions, stereotypes, demeaning language, privacy leakage, and content sensitivity.
  • Required flags: any mention of protected traits, any uncertain identification, any “editorializing” adjectives without instructional purpose.
  • Escalation: if flagged, route to an accessibility lead or DEI reviewer, not only the content author.

The goal is not to eliminate all risk—no process can—but to make risk visible, reviewed, and learnable. Each flagged issue should feed back into prompt guidance (e.g., “do not infer identity,” “describe only what is needed for the learning point”) and into your training data choices (custom term lists and style guides).

Section 5.4: Privacy and security: PII in audio, faces, classrooms, minors

Media accessibility workflows handle sensitive data by default: voices, faces, names, and sometimes full classroom scenes. Captions and transcripts can amplify privacy risk because they turn hard-to-search audio into searchable text. Responsible AI operations require privacy-by-design decisions before you upload content to any tool.

Start with a data inventory: what media you collect, where it is stored, who accesses it, and which vendors process it. Identify PII and sensitive content: student names, email addresses, student IDs, health information, location references, and any content involving minors. For classroom recordings, assume incidental capture (students speaking off-camera, name tags, laptop screens). Decide whether you can avoid capturing it (camera framing, audio policies) rather than trying to “clean it later.”

Consent must be explicit and age-appropriate. For minors, obtain guardian consent and follow school or district policy. Even with consent, apply minimization: do not include faces or names in publicly distributed materials unless necessary. In transcripts, consider redaction rules: replace identifiers with neutral tokens (“[Student]”) when the identity is not instructionally relevant.

Operational controls to implement:

  • Retention limits: set how long raw uploads, generated transcripts, and intermediate files are kept. Delete what you don’t need.
  • Access control: least-privilege permissions; separate roles for editors vs approvers; log access to sensitive assets.
  • Vendor review: confirm whether uploaded media is used for model training, where it is processed geographically, and whether you can opt out. Ensure encryption in transit and at rest.
  • Secure handling: avoid pasting full transcripts containing PII into unmanaged tools; use approved environments.

A common mistake is treating captions as “just text.” Treat them as derived personal data when they include identifiable speech. Align your workflow with your institution’s privacy policy and applicable laws (e.g., FERPA or local equivalents), and document the decisions so teams can follow them consistently.

Section 5.5: Metrics: accuracy rates, turnaround time, learner satisfaction

Metrics make QA and compliance manageable at scale. You do not need a complex dashboard to start, but you do need a few measures that connect directly to learner experience and operational performance. Good metrics also support your sampling plans and error budgets: you can tighten review when quality drops and relax it when the process is stable.

For captions and transcripts, track an accuracy metric that matches your tools and capacity. Word Error Rate (WER) is common, but even a simpler “critical error count per minute” can be effective if consistently applied. Define what counts as critical: wrong numbers, wrong negation (“can” vs “can’t”), incorrect key terms, or missing speaker changes. Track formatting compliance too: are captions present, properly synchronized, and readable (line length, timing)?
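
If full WER tooling is out of reach, the simpler metric can be computed straight from your review notes. The sketch below assumes reviewers log each finding as a (category, description) pair; both the log format and the category names are assumptions, mirroring the critical errors listed above.

    # Critical caption errors per minute, computed from reviewer findings.
    # The (category, description) log format is an assumption for illustration.
    CRITICAL_CATEGORIES = {"number", "negation", "key_term", "speaker_change"}

    def critical_errors_per_minute(error_log, video_minutes):
        count = sum(1 for category, _ in error_log if category in CRITICAL_CATEGORIES)
        return count / video_minutes if video_minutes else 0.0

    # Example:
    # critical_errors_per_minute([("number", "2.2 heard as 2.1"),
    #                             ("filler", "um removed")], 6.5)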

For alt text, measure coverage (percentage of images with appropriate alt text or marked decorative) and rubric pass rate (intent captured, no identity guessing, chart takeaway present). A frequent operational issue is “alt text exists but is unhelpful”; a rubric-based review catches this better than a binary check.

For reading supports, track factual consistency issues found in review, and whether learners use the supports. Useful measures include: number of glossary lookups, time-on-page changes, and support feature adoption (TTS usage, summary views). Pair usage with learner satisfaction: short feedback prompts embedded in the UI (“Was this summary accurate and helpful?”) can surface failures that internal QA misses.

Operational metrics matter as well: turnaround time from upload to publish; percent of items meeting SLA; and rework rate (how often assets require a second review). Tie these to your review model: if sampling finds rising error rates, increase sample size or shift categories to full review. Finally, publish a monthly “top defects” list (e.g., misrecognized domain terms, missing non-speech cues) to drive prompt updates, vocabulary injection, and targeted training for editors.

Section 5.6: Documentation: prompt libraries, rubrics, and audit trails

Documentation is what turns a one-off success into an operation you can trust, delegate, and defend. The goal is lightweight governance: enough structure to produce consistent outputs and satisfy audits, without creating a bureaucratic bottleneck. Focus on three artifacts: prompt libraries, rubrics, and audit trails.

Prompt libraries are versioned templates for recurring tasks: caption cleanup, speaker labeling, alt text for charts, safe simplification, and glossary generation. Store prompts with context: intended use, constraints (“do not infer identity,” “do not add facts”), examples of good outputs, and known failure modes. When a defect occurs, you should be able to update the prompt and know which assets were generated under the old version.

Rubrics translate acceptance criteria into scoring guides reviewers can apply consistently. Keep rubrics short and observable: “captures instructional intent,” “accurate key terms,” “no privacy leakage,” “chart takeaway included,” “TTS-ready formatting.” Attach “stop-ship” conditions so reviewers don’t negotiate critical issues under deadline pressure. A common mistake is rubrics that are too abstract; include concrete examples from your own content types (math lecture vs coding demo vs discussion-based seminar).

Audit trails are your evidence of compliance and responsible AI use. At minimum, log: source asset ID, tool/vendor, model/version (if available), prompt version, reviewer(s), review outcome, defects found, and final publish date. If you support incident response, add a link to the ticket and remediation notes.

Plan for incidents explicitly. An accessibility defect—missing captions, incorrect transcript for an assessment, harmful alt text—should trigger a predictable response: severity classification, temporary mitigation (unpublish or replace), correction timeline, and a post-incident review that updates checklists, prompts, and sampling rules. This is also where you create a lightweight stakeholder accessibility report: coverage rates, quality metrics, known risks, incidents resolved, and next improvements. Stakeholders don’t need every detail; they need to see that accessibility is measured, owned, and continuously improved.

Chapter milestones
  • Build checklists for captions, alt text, and reading supports
  • Set sampling plans, error budgets, and review roles
  • Handle privacy, consent, and data retention for media and transcripts
  • Establish incident response for accessibility defects
  • Create a lightweight accessibility report for stakeholders
Chapter quiz

1. What is the main shift described in Chapter 5 after AI generates captions, alt text, or reading supports?

Show answer
Correct answer: The work becomes operational: verifying quality, documenting decisions, protecting learners, and improving predictably
Chapter 5 emphasizes operating accessibility as an ongoing system with QA, documentation, and safeguards—not a one-time generation step.

2. Which practice most directly prevents teams from failing due to a lack of shared expectations and post-publication defect handling?

Show answer
Correct answer: Define a shared definition of done, assign review roles, and establish a plan for defects found after publication
The chapter notes teams fail when they lack clear acceptance criteria, role clarity, and a plan for defects after release.

3. How does the chapter recommend balancing review effort with risk and learner impact?

Show answer
Correct answer: Use sampling plans and error budgets so review is proportional to risk
Sampling plans and error budgets are presented as tools to scale review effort based on risk, time, and impact.

4. What is the purpose of making checklists both specific and flexible?

Show answer
Correct answer: To catch common errors (e.g., names, numbers, equations, speaker changes, image intent) while supporting different course types
The chapter calls for checklists detailed enough to catch frequent failures but adaptable to varied course contexts.

5. Why does Chapter 5 emphasize incident response and lightweight reporting together?

Show answer
Correct answer: To ensure defects are handled through a defined path and stakeholders can see progress and evidence without excessive paperwork
The chapter advocates an incident response path so learners aren’t the first to find defects, plus lightweight reporting to communicate progress and compliance evidence efficiently.

Chapter 6: Capstone: Ship an Accessibility-First AI Support Pack

This capstone is where your accessibility-first workflow stops being “good practice” and becomes a ship-ready artifact a team can adopt. You will assemble a complete support pack for one mini-lesson (a 3–7 minute instructional segment), run final QA and publish-ready exports, and write implementation notes for a handoff. You’ll also present outcomes with before/after examples and a small set of metrics, then plan iteration with a backlog and continuous improvement loop.

Think like a production engineer and an accessibility specialist at the same time: your job is not just to generate outputs, but to control risk. AI can accelerate captions, alt text, and reading supports—but it also introduces predictable errors (misheard terms, hallucinated definitions, incorrect speaker labels, overly chatty summaries). Your capstone must show governance: prompts, rubrics, acceptance criteria, human review steps, and an “error budget” that defines what must be fixed before release versus what can wait.

Throughout this chapter, you’ll build three deliverables—(1) captions + transcript package, (2) alt text + one long description, and (3) a reading supports bundle—then package them for an LMS/repository. Finally, you’ll frame your work as a portfolio-ready case study: not “I used AI,” but “I shipped an accessibility-first support pack with measurable quality controls.”

Practice note for Assemble a complete support pack for one mini-lesson: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Run final QA and publish-ready exports: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Write implementation notes for a team handoff: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Present outcomes: before/after examples and metrics: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Plan iteration: backlog and continuous improvement: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 6.1: Capstone brief: assets, constraints, and success criteria

Your capstone starts with a brief that makes the work real. Choose one mini-lesson: a short video with at least one visual aid (slide, diagram, or screenshot) and a short accompanying reading (script, handout, or outline). List all assets you will touch: source video, raw audio (if available), slide deck/images, any on-screen text, instructor notes, and a glossary of domain terms if the course already has one.

Next, define constraints. Common ones: tight turnaround (e.g., 2 hours), a target platform (YouTube, Kaltura, Panopto, LMS), and a baseline accessibility standard (WCAG 2.2 AA where applicable; UDL principles for multiple means of representation). Also note learner context: are they novices, multilingual learners, or professionals skimming for reference? This context should influence your reading supports and alt text detail.

Write success criteria as acceptance tests, not aspirations. Example criteria you can reuse:

  • Captions: ≥99% accuracy for key terms/names; no missing sentences; correct speaker identification where needed; reading rate typically ≤20 cps (characters per second) unless a platform-specific guideline differs.
  • Transcript: includes speaker labels, meaningful non-speech audio cues (e.g., [laughter], [music fades]), and matches the final captions.
  • Alt text: describes purpose and essential information, avoids “image of,” avoids subjective judgments, and reflects instructional intent.
  • Reading supports: summary is faithful (no new claims), glossary definitions are brief and correct, plain-language rewrite preserves technical meaning.
  • Governance: every AI-generated artifact has a reviewer, a QA log, and version history.

Common mistake: starting with AI prompts before writing acceptance criteria. Your criteria act as a guardrail when the AI output feels fluent but is wrong. End this brief with your “definition of done” and who must sign off (you, SME, accessibility reviewer).

Section 6.2: Deliverable 1: captions + transcript package with QA log

For the captions package, your goal is publish-ready captions plus evidence that you checked them. Start by generating a first-pass transcript using your preferred tool (ASR, then LLM cleanup). Immediately lock in a terminology list: product names, acronyms, instructor names, and any specialized vocabulary. Feed that list back into your editing pass so the model has a controlled reference.

Edit in two layers: (1) semantic accuracy and (2) captioning conventions. Semantic accuracy means the words match the audio and intent—no paraphrasing, no “helpful” expansions. Captioning conventions include line breaks, punctuation that supports readability, and timing that avoids flashing too quickly. If you can’t edit timing directly, at minimum ensure the text is segmented into caption-friendly chunks.

Create a QA log. This is the artifact that proves your workflow is accessibility-first rather than AI-first. Include:

  • Source info: video title, duration, date, version, tool used.
  • Known risks: accents, crosstalk, background noise, domain terms.
  • Checks performed: term list applied, numbers verified, proper nouns verified, speaker labels verified, non-speech cues added, reading rate spot-checked.
  • Findings and fixes: e.g., “ASR misheard ‘WCAG’ as ‘double-u tag’ (10 instances) — corrected.”

Export formats should match publishing needs: SRT or VTT for captions, and a clean transcript (TXT/DOCX/PDF) for learners who prefer reading. Keep the transcript TTS-friendly: avoid unusual punctuation, keep speaker labels consistent, and ensure headings are meaningful if you format it.

Common mistakes: letting the model “clean up” filler words into different meaning; failing to verify numbers (2.2 vs 2.1), URLs, or code snippets; and forgetting that captions are not just a transcript—they are timed reading supports.
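
The reading-rate spot-check can be scripted against the exported caption file. The sketch below is a simplified SRT/VTT-style parser that assumes well-formed timestamps with hours; the 20 cps default mirrors the reading-rate criterion in the capstone brief and should follow your platform's guideline if it differs.

    import re

    TIMESTAMP = r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"

    def _seconds(h, m, s, ms):
        return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

    # Flag cues whose reading rate exceeds max_cps (characters per second).
    # Assumes well-formed "start --> end" lines; a real QA pass still checks playback.
    def fast_cues(caption_text, max_cps=20):
        pattern = TIMESTAMP + r"\s*-->\s*" + TIMESTAMP + r"\s*\n(.+?)(?:\n\s*\n|\Z)"
        flagged = []
        for cue in re.findall(pattern, caption_text, flags=re.S):
            start, end = _seconds(*cue[0:4]), _seconds(*cue[4:8])
            text = re.sub(r"\s+", " ", cue[8]).strip()
            cps = len(text) / max(end - start, 0.001)
            if cps > max_cps:
                flagged.append((round(cps, 1), text))
        return flagged

    # Example (hypothetical file): fast_cues(open("captions.srt", encoding="utf-8").read())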

Section 6.3: Deliverable 2: alt text set + long description for one complex visual

Your second deliverable is an alt text set for all visuals in the mini-lesson plus one long description for a complex visual (a chart, process diagram, or multi-step interface screenshot). Start by inventorying visuals and classifying them: decorative, informative, functional (buttons/links), and complex. Decorative items get null alt (empty alt attribute) in HTML contexts; informative and functional must communicate purpose; complex visuals need both succinct alt text and a longer structured description nearby.

Use AI to draft, but you must supply context: “What is the learning objective of this visual?” and “What should the learner be able to do after seeing it?” Provide surrounding narration or slide notes to prevent AI from describing irrelevant details.

For standard informative images, aim for one to two sentences. Example pattern: What it is + what matters. Avoid redundancies like “image of.” For functional images (icons), name the action: “Search” or “Download transcript (PDF).”

For the complex visual, write a long description that is scannable and complete. A practical structure:

  • Title/caption: what the visual represents.
  • High-level takeaway: the main relationship or trend.
  • Ordered breakdown: steps in a flow, axes and key values in a chart, or regions in a screenshot.
  • Exceptions/edge cases: where the visual warns or constrains behavior.

QA the alt text against intent: if a learner only had the text, could they complete the learning task? Common mistakes: over-describing colors or layout while missing the point; inventing data in a chart; and failing to align with on-screen text (if text is already present, don’t duplicate it verbatim—summarize its function).

Include reviewer notes in your QA log: what changed from AI draft to final, and why. This demonstrates engineering judgment, not just output generation.

Section 6.4: Deliverable 3: reading supports bundle (summary, glossary, plain language)

Reading supports are where AI can help the most—and also mislead the fastest. Your bundle should include: a short summary, a glossary of key terms, and a plain-language version of the lesson text (or a simplified companion). Tie each artifact to a learner need: quick review, vocabulary support, and reduced cognitive load.

Start from a single “source of truth”: the final transcript or instructor script. Generate the summary from that text, not from memory, and enforce a “no new claims” rule. A useful governance technique is a traceability check: pick 5–10 sentences from the summary and confirm each is directly supported by the transcript. If any sentence can’t be traced, revise or remove it.
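
The traceability check can be semi-automated with standard-library string matching. The sketch below scores each summary sentence against the closest transcript sentence; the 0.6 cutoff and the sentence splitting are rough assumptions, and low-scoring sentences go back to a human reviewer rather than being rejected automatically.

    import re
    from difflib import SequenceMatcher

    def _sentences(text):
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # For each summary sentence, find its best match in the transcript.
    # Sentences below the cutoff are returned for manual traceability review.
    def traceability_check(summary, transcript, cutoff=0.6):
        source = _sentences(transcript)
        unsupported = []
        for sentence in _sentences(summary):
            best = max(
                (SequenceMatcher(None, sentence.lower(), src.lower()).ratio() for src in source),
                default=0.0,
            )
            if best < cutoff:
                unsupported.append((round(best, 2), sentence))
        return unsupported

    # Example: traceability_check(summary_text, transcript_text)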

For the glossary, limit to terms that matter for comprehension (usually 6–12 for a mini-lesson). Require definitions to be:

  • Brief (one sentence when possible).
  • Correct in this course’s context.
  • Free of circular definitions (“Accessibility is being accessible”).
  • Consistent with course terminology and prior chapters.
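Of these rules, only brevity and circularity are mechanically checkable; correctness and consistency still need an SME. A quick lint sketch for the mechanical part:

    # Lint sketch for the mechanical rules above (brevity and circularity).
    # The word-count cap and the circularity test are rough heuristics; an SME
    # still verifies correctness and consistency.
    def glossary_issues(term: str, definition: str) -> list[str]:
        issues = []
        if len(definition.split()) > 30:  # rough proxy for "one sentence when possible"
            issues.append("definition is longer than roughly one sentence")
        if term.lower() in definition.lower():
            issues.append("possible circular definition (term reappears in its own definition)")
        return issues

    print(glossary_issues("Accessibility", "Accessibility is being accessible."))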

For plain language, do not “dumb down.” Preserve technical meaning while improving clarity: shorter sentences, active voice, defined acronyms, and explicit referents (replace “this” with “this checklist”). Keep structure: headings, bullets, and numbered steps. Ensure TTS readiness: avoid odd punctuation, keep abbreviations expanded on first use, and avoid ambiguous symbols.

Common mistakes: AI summaries that sound polished but overgeneralize; glossaries that hallucinate confident-sounding definitions; and simplifications that remove critical constraints (“always” vs “often”). Your QA should include a quick SME spot-check and a readability pass (e.g., grade level as a signal, not a requirement).
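If you want readability as a signal without adding dependencies, a few cheap checks go a long way. A sketch that flags long sentences and acronyms that never appear with an expansion; the thresholds and the acronym heuristic are assumptions to tune:

    # Dependency-free readability and TTS-readiness signals, not pass/fail gates.
    # The sentence-length threshold and the acronym heuristic are assumptions to tune.
    import re

    def reading_signals(text: str, max_words: int = 25) -> dict:
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        long_sentences = [s for s in sentences if len(s.split()) > max_words]
        acronyms = set(re.findall(r"\b[A-Z]{2,}\b", text))
        # Treat an acronym as "expanded" only if it ever appears in parentheses, e.g. "(WCAG)".
        unexpanded = {a for a in acronyms if f"({a})" not in text}
        return {
            "avg_sentence_length": round(sum(len(s.split()) for s in sentences) / max(len(sentences), 1), 1),
            "long_sentences": long_sentences,
            "possibly_unexpanded_acronyms": sorted(unexpanded),
        }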

Section 6.5: Packaging for LMS and content repositories (naming and versioning)

Now make it shippable. Packaging is where accessibility work often fails: files exist, but nobody can find, trust, or update them. Create a folder structure that mirrors how teams actually work (lesson-based, with a clear version). Example (a scaffolding sketch follows the list):

  • Lesson_06_MiniLessonTitle/
      • 01_source/ (original video, slides; read-only)
      • 02_captions/ (SRT, VTT, transcript)
      • 03_alt-text/ (alt-text.csv or doc; long-description.md)
      • 04_reading-supports/ (summary.md, glossary.csv, plain-language.md)
      • 05_QA/ (qa-log.md, checklist.pdf, review-notes.md)
      • CHANGELOG.md
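A scaffolding sketch for that layout, using only the Python standard library; the folder names mirror the example and can be adapted to your team's conventions:

    # Scaffolding sketch for the layout above, standard library only.
    # Folder names mirror the example; adjust them to your team's conventions.
    from pathlib import Path

    SUBFOLDERS = ["01_source", "02_captions", "03_alt-text", "04_reading-supports", "05_QA"]

    def scaffold(lesson_root: str) -> None:
        root = Path(lesson_root)
        for sub in SUBFOLDERS:
            (root / sub).mkdir(parents=True, exist_ok=True)
        changelog = root / "CHANGELOG.md"
        if not changelog.exists():
            changelog.write_text("# Changelog\n\n- v1.0: initial support pack\n", encoding="utf-8")

    scaffold("Lesson_06_MiniLessonTitle")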

Adopt naming conventions that survive handoffs: include lesson ID, locale, and version. For example: L06_MiniLesson_en-US_captions_v1.1.vtt. If your team uses Git, treat these as version-controlled text assets where possible (VTT, MD, CSV). If not, maintain a CHANGELOG that records what changed, who changed it, and why.
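A sketch that builds and validates names against this convention; the regular expression encodes the example pattern and should be adjusted if your convention differs:

    # Sketch: build and validate asset names that carry lesson ID, locale, and version.
    # The pattern encodes the example convention above; adjust it to match yours.
    import re

    NAME_PATTERN = re.compile(
        r"^L(?P<lesson>\d{2})_(?P<title>[A-Za-z0-9]+)_(?P<locale>[a-z]{2}-[A-Z]{2})_"
        r"(?P<asset>[a-z-]+)_v(?P<version>\d+\.\d+)\.(?P<ext>[a-z0-9]+)$"
    )

    def build_name(lesson: int, title: str, locale: str, asset: str, version: str, ext: str) -> str:
        name = f"L{lesson:02d}_{title}_{locale}_{asset}_v{version}.{ext}"
        if not NAME_PATTERN.match(name):
            raise ValueError(f"Name does not follow the convention: {name}")
        return name

    print(build_name(6, "MiniLesson", "en-US", "captions", "1.1", "vtt"))
    # -> L06_MiniLesson_en-US_captions_v1.1.vtt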

Create publish-ready exports for the LMS: captions uploaded to the video host, transcript linked adjacent to the video, long description placed near the complex visual, and reading supports downloadable and/or embedded below the lesson. Validate that links work and that the LMS doesn’t strip formatting needed for readability (lists, headings).
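Link validation is the easiest of these checks to script. A sketch, assuming the third-party requests package is installed and using placeholder URLs; swap in the lesson's real pages and downloads:

    # Link-check sketch, assuming the third-party `requests` package is installed.
    # URLs are placeholders; replace them with the real LMS pages and downloads.
    import requests

    LINKS = {
        "transcript": "https://lms.example.edu/lesson-06/transcript.pdf",
        "plain_language": "https://lms.example.edu/lesson-06/plain-language",
    }

    for label, url in LINKS.items():
        status = requests.head(url, allow_redirects=True, timeout=10).status_code
        print(f"{label}: {url} -> {status}")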

Final QA before publishing should include at least: playback check with captions on, transcript download/open check, screen reader spot-check for long description placement, and a quick mobile view check. Common mistakes: mismatched caption files (old version uploaded), transcripts not updated after caption edits, and long descriptions buried in a separate file learners won’t discover.

Section 6.6: Portfolio framing: resume bullets, case study, and interview talk track

Your capstone becomes career leverage when you frame it as impact plus rigor. Capture “before/after” examples: a 15–30 second caption segment before edits (with errors highlighted) and after; an AI-drafted alt text vs final alt text; an original paragraph vs plain-language rewrite. Keep examples small and anonymized if needed.

Define lightweight metrics that signal quality without pretending to be perfect science. Examples:

  • Caption accuracy: number of corrected term errors; missing/incorrect speaker labels fixed; timing issues reduced (spot-checked segments).
  • Alt text coverage: % of informative visuals with reviewed alt text; 1 complex visual with long description placed in-context.
  • Reading supports: summary traceability check pass rate (e.g., 10/10 sentences traceable); glossary verified by SME.
  • Cycle time: time from raw video to published support pack, including review steps.

Write 2–3 resume bullets that show ownership and governance. Example patterns: “Shipped…,” “Implemented QA…,” “Reduced errors by…,” “Built a reusable prompt + rubric….” Avoid vague claims like “improved accessibility.” Name the artifacts and acceptance criteria you enforced.

For a short case study, use a clear structure: context → constraints → workflow → deliverables → QA findings → outcomes → next iteration. In interviews, your talk track should highlight engineering judgment: where you did not trust AI, how you validated outputs, and how you designed the process so others can repeat it. End with an iteration plan: a backlog of improvements (e.g., term dictionary expansion, automated reading-rate checks, localization support) and a continuous improvement cadence (monthly audits, error budget review, stakeholder feedback loop).

Chapter milestones
  • Assemble a complete support pack for one mini-lesson
  • Run final QA and publish-ready exports
  • Write implementation notes for a team handoff
  • Present outcomes: before/after examples and metrics
  • Plan iteration: backlog and continuous improvement
Chapter quiz

1. What makes the Chapter 6 capstone “ship-ready” rather than just an example of good practice?

Correct answer: It produces a complete, adoptable support pack with QA, exports, and handoff notes
The capstone emphasizes publish-ready artifacts a team can adopt, including QA, exports, and implementation notes.

2. Why does the chapter emphasize thinking like both a production engineer and an accessibility specialist?

Correct answer: To control risk by combining reliable delivery processes with accessibility quality
The chapter frames the job as shipping reliably while meeting accessibility needs and reducing predictable AI-related errors.

3. Which set of deliverables is required in the capstone support pack?

Correct answer: Captions + transcript package, alt text + one long description, and a reading supports bundle
The chapter specifies three deliverables: captions/transcript, alt text plus one long description, and reading supports.

4. Which approach best demonstrates “governance” for AI-assisted accessibility work in the capstone?

Correct answer: Using prompts, rubrics, acceptance criteria, human review steps, and an error budget
Governance is defined as structured controls: prompts, rubrics, acceptance criteria, review steps, and an error budget.

5. How should outcomes be presented to make the capstone portfolio-ready according to the chapter?

Correct answer: As a case study showing before/after examples and a small set of metrics tied to quality controls
The chapter calls for before/after examples plus metrics, framed as shipping an accessibility-first pack with measurable quality controls.