AI In EdTech & Career Growth — Intermediate
Build inclusive AI supports—captions, alt text, and reading tools that work.
This course is a short, technical, book-style playbook for designing and shipping AI-assisted accessibility supports in learning experiences—starting with captions, expanding to alt text, and finishing with reading supports that improve comprehension for everyone. Instead of treating accessibility as a last-minute checklist, you’ll learn how to make it a first-class product requirement: measurable, testable, and repeatable across content types.
You will work through a practical workflow that mirrors real instructional production: assemble a small “source assets pack” (a short video, a few images, and a reading passage), generate first-pass outputs with AI, and then apply human review steps that catch the failures AI tends to introduce—timing drift, missing sound cues, hallucinated visual details, and meaning loss during simplification.
This course is for instructional designers, educators, edtech practitioners, content producers, and career-switchers who want practical, job-ready skills. You don’t need to code, but you should be comfortable working with typical course media (documents, images, and video) and using web-based AI tools. If you’ve ever thought “AI can generate it, but can I trust it?”—this course shows you how to make it trustworthy through process and QA.
Each chapter builds on the prior one in a deliberate sequence. You’ll start with foundations: learner needs, assistive technology touchpoints, and acceptance criteria. Then you’ll move through three production pipelines—captions, alt text, and reading supports—each with concrete standards, prompting patterns, and review rubrics. Finally, you’ll wrap the work in governance so it can survive contact with real timelines, real stakeholders, and real compliance expectations.
By the end, you’ll complete a capstone “Accessibility-First AI Support Pack” you can keep as a portfolio artifact: before/after examples, QA logs, and implementation notes that demonstrate you can ship inclusive features—not just talk about them.
If you want to build inclusive learning faster while maintaining quality, this course will give you a repeatable system you can apply to any module or product sprint. Register free to begin, or browse all courses to compare paths in AI and edtech career growth.
Learning Experience Architect & Applied AI Accessibility Specialist
Sofia Chen designs accessible learning systems for higher ed and workforce training, blending UDL, WCAG-aligned design, and practical AI workflows. She has led captioning and content accessibility programs across LMS ecosystems and helps teams ship measurable, inclusive improvements without slowing delivery.
Accessibility-first content production treats supports like captions, alt text, and reading aids as core learning features—not optional “compliance tasks” added at the end. When you introduce AI into this work, the goal is not simply to automate; it is to build a workflow that is faster and more dependable than a manual-only process. That requires engineering judgment: knowing which parts AI can draft well, which parts need strong constraints, and where human review is non-negotiable.
In this chapter, you will map learner needs to specific supports (captions, alt text, reading tools), define quality and risk boundaries for AI assistance, and set acceptance criteria inspired by UDL and WCAG. You will also scope a small pilot project, assemble a “source assets pack” that makes AI outputs more accurate, and choose tool categories to create an end-to-end workflow (draft → edit → QA → publish).
Throughout, the mindset is practical: you are designing a production system. Your deliverables are not only accessible outputs, but also repeatable prompts, rubrics, and checks that let a team produce consistent results across modules, instructors, and formats.
Practice note for Map learner needs to supports: captions, alt text, reading tools: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Define quality and risk: where AI helps and where it fails: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Set acceptance criteria: UDL + WCAG-inspired checks: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Create your project scope and source assets pack: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Choose tool categories and plan the workflow: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Accessibility-first work begins by rewriting the question from “Can we add captions later?” to “What learning supports must ship with the lesson for it to be usable by all intended learners?” Treat accessibility as a product requirement with acceptance criteria, owners, and test steps. This shift matters because retrofits are where errors hide: captions drift out of sync, alt text becomes generic, and simplified readings accidentally change meaning.
Practically, define accessibility requirements at the same time you define content requirements. For an instructional video, that usually means: accurate captions, an editable transcript, speaker identification where helpful, and text that can be reused for study guides. For images and diagrams, it means purposeful alt text and (when needed) longer descriptions. For reading supports, it means summaries, glossaries, and text formatted to work with text-to-speech (TTS) and screen readers.
Common mistake: equating “AI-generated” with “done.” AI can produce plausible text that looks correct, especially in captions and summaries, but small errors can harm comprehension or introduce misinformation. Another mistake is failing to budget time for QA. Accessibility-first AI is successful when the workflow includes explicit review steps and a definition of “publishable.”
Outcome for this section: you should be able to state, in one paragraph, the accessibility features your learning product will include and what “pass” looks like for each feature.
Supports are not generic; they exist to remove specific barriers. Build a small set of learner personas and identify the “touchpoints” where assistive technology or alternative formats intersect with your materials. Start with 3–5 personas that reflect your audience, not an abstract checklist.
Examples of practical personas: (1) a learner who is Deaf or hard of hearing and relies on captions and transcripts; (2) a learner with low vision using a screen reader and keyboard navigation; (3) a learner with ADHD who benefits from chunking, headings, and summaries; (4) a multilingual learner who needs plain language, glossaries, and consistent terminology; (5) a learner in a noisy environment using captions as a convenience feature. Each persona maps directly to supports you will produce.
Now map assets to touchpoints. Video touches captions, transcript, and sometimes audio description. Slide decks touch reading order, alt text, and exported PDFs. Interactive labs touch keyboard access and instructions that are readable by screen readers. The point is to know where a breakdown will occur: a missing label, a caption that omits “not,” a summary that removes a key constraint.
Outcome for this section: a simple table in your project notes that lists persona → barrier → support → where it appears in your content pipeline (video editor, CMS, LMS, PDF export, etc.). This table will later drive your acceptance criteria and QA checklist.
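One way to keep that persona → barrier → support table from going stale is to store it as plain data so it can later drive checklists and QA scripts. A minimal sketch (all personas, supports, and stage names here are illustrative examples, not a prescribed schema):

```python
# Illustrative persona -> barrier -> support mapping, kept as plain data
# so it can later drive acceptance criteria and QA checklists.
PERSONA_MAP = [
    {"persona": "Deaf/hard of hearing", "barrier": "no audio access",
     "support": "captions + transcript", "pipeline_stage": "video editor"},
    {"persona": "low vision, screen reader", "barrier": "images unreadable",
     "support": "alt text", "pipeline_stage": "CMS"},
    {"persona": "multilingual learner", "barrier": "dense jargon",
     "support": "glossary + plain language", "pipeline_stage": "LMS"},
]

def supports_for_stage(stage):
    """Return the supports that must be verified at a given pipeline stage."""
    return [row["support"] for row in PERSONA_MAP if row["pipeline_stage"] == stage]
```

A reviewer checking the CMS export, for instance, can query `supports_for_stage("CMS")` to see exactly which supports to verify there.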
Universal Design for Learning (UDL) is useful because it connects supports to learning outcomes rather than compliance alone. In production terms, UDL gives you categories of support you can build into your workflow: multiple means of representation (captions, transcripts, alternative text), action and expression (downloadable notes, copyable transcripts), and engagement (clear structure, reduced cognitive load, choice of format).
To make this measurable, define outcomes that your team can test. For captions, measurable outcomes include: correct terminology, correct numbers and symbols, and synchronization that allows following along without cognitive strain. For alt text, outcomes include: a learner can answer “what is this image for?” without seeing it, and the description matches the instructional intent. For reading supports, outcomes include: the simplified version preserves constraints and definitions, and the glossary covers the terms that block comprehension.
Write acceptance criteria that are observable. Instead of “captions are good,” use criteria like: “All spoken instructional steps appear in captions; no omissions of negation; domain terms match the lesson glossary; and speaker changes are indicated when it affects meaning.” For reading supports, include: “summary includes prerequisites, warnings, and edge cases; simplification maintains equations, variable names, and must/should distinctions.”
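Observable criteria can often be turned into automated pre-checks. As a minimal sketch of the “no omissions of negation” criterion above (the word list is an illustrative starting point, not a complete one):

```python
# Negation words whose omission reverses meaning; extend per your style guide.
NEGATIONS = {"not", "no", "never", "don't", "doesn't", "can't", "won't"}

def negation_check(source_text, caption_text):
    """Flag negation words present in the source script but missing from
    captions. A dropped 'not' reverses meaning, so this check allows zero
    misses; an empty return list means pass."""
    source_words = source_text.lower().split()
    caption_words = set(caption_text.lower().split())
    return [w for w in source_words if w in NEGATIONS and w not in caption_words]
```

A check like this cannot replace human review, but it catches the highest-severity omission class cheaply before an editor ever opens the file.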
Outcome for this section: a draft of UDL-aligned acceptance criteria for your three support types—captions/transcripts, alt text, and reading aids—written as checks someone else could verify.
Designing an accessibility-first AI workflow requires a clear capability map: what each model type is good at, where it fails, and what inputs reduce risk. Four common capability buckets cover most learning workflows.
ASR (automatic speech recognition) is strong for generating a first-pass transcript and timecoded captions, especially with clean audio and known vocabulary. It fails with accents, crosstalk, low-quality mics, and domain-specific terms (chemical names, code, math notation). Risk control: provide a vocabulary list, speaker names, and reference materials; always plan a human edit pass for instructional content.
Vision models help draft alt text and identify elements in images (charts, UI screenshots). They fail when context is required: what matters in this diagram for this lesson, or which part of a chart supports the claim being taught. Risk control: give the model the surrounding paragraph, learning objective, and any labels/legend text; require a human to confirm intent and avoid hallucinated details.
Summarization can produce study aids and module overviews quickly. It fails by omitting constraints, flattening nuance, and inventing “helpful” additions. Risk control: constrain the summary format (bullets, required sections like “Warnings” and “Key terms”), and verify against the source.
Rewriting/simplification supports plain-language versions and multilingual learners. It fails by changing technical meaning, altering modal verbs (must/should), or “simplifying away” steps. Risk control: use guarded prompts (preserve terminology list; do not change variable names), and compare the simplified output to the original with targeted checks.
Outcome for this section: a workflow plan that assigns AI to drafting and transformation tasks, but assigns humans to intent verification, terminology control, and final QA.
Accessibility outputs fail in predictable ways. You can catch most issues by evaluating five quality dimensions. Treat these as your editing rubric for AI-assisted work.
Accuracy: Are facts, numbers, names, and terms correct? Captions often fail on proper nouns, code tokens, and homophones. Alt text can be “confidently wrong” about what an image contains. Reading supports can introduce subtle factual errors. Mitigation: reference a glossary, use source-of-truth documents, and spot-check against the original media.
Completeness: Does the output include everything a learner needs? Captions that omit side comments may be fine—unless the side comment contains the actual instruction (“Don’t click save yet”). Alt text that lists visual elements but omits the instructional purpose (e.g., it never states “This diagram shows the causal relationship…”) is incomplete. Summaries that skip warnings and prerequisites are incomplete. Mitigation: require coverage of learning objectives, steps, and constraints.
Intent: Does the output preserve what the instructor meant? This is where AI rewriting can be risky. If an instructor says “this is an approximation,” the simplified version must keep that uncertainty. Mitigation: include a “preserve intent” rule and verify the output answers the same assessment questions as the source.
Tone: Is the language respectful, neutral, and appropriate for learners? Captions should not “clean up” dialects in a way that changes voice or identity; summaries should not become judgmental (“obviously”). Mitigation: style guidance plus quick tone review.
Usability: Is it readable and usable in assistive tech? Captions need line length limits and timing; transcripts need headings and speaker labels; TTS-ready text needs clean punctuation and expanded acronyms when appropriate. Mitigation: format checks and platform previews.
Outcome for this section: a one-page rubric (even if informal) that editors use for every AI-generated caption set, alt text batch, and reading support artifact.
Governance is what makes an accessibility-first AI workflow safe, repeatable, and defensible. It does not need to be heavy, but it must be explicit. Start with three building blocks: documented prompts, defined human review steps, and an audit trail of decisions.
Human-in-the-loop means you decide which steps require approval before publishing. A practical default: humans must review (1) captions for instructional accuracy and timing, (2) alt text for intent and non-hallucination, and (3) any simplification that could change meaning. The “human” may be the instructor, an editor, or a trained QA reviewer, but the role must be assigned.
Documentation includes: your prompt library (with versioning), your acceptance criteria/rubric, and a checklist used at QA time. Add an “error budget” per artifact type: what level of minor issues is acceptable before rework? For example, you might allow minor punctuation issues in transcripts but allow zero errors in safety warnings, chemical dosages, or assessment instructions.
Source assets pack: reduce AI error by standardizing inputs. For a module, assemble the script or outline, slide deck, glossary of terms, speaker names, reference links, and any diagrams with labels. This pack is also what reviewers use to verify outputs.
Tool categories and workflow: plan the pipeline across five stages: ingestion (video/audio/image), generation (ASR/vision/text), editing (caption editor, document editor), QA (automated checks plus human review), and publishing (LMS/CMS). Choose tools based on export formats (SRT/VTT, accessible PDF, HTML), collaboration needs, and privacy constraints.
Outcome for this section: a scoped pilot plan (one lesson or one module) with named reviewers, a checklist, a storage location for source assets, and a clear definition of “ready to publish.”
1. What does “accessibility-first content production” mean in this chapter?
2. According to the chapter, what is the primary goal of introducing AI into accessibility-focused production?
3. Which approach best reflects the chapter’s guidance on where AI can fail and what to do about it?
4. What is the purpose of setting acceptance criteria inspired by UDL and WCAG in this chapter?
5. Which workflow best matches the end-to-end production system described in the chapter?
Captions and transcripts are the most “visible” accessibility layer in instructional video: they directly affect comprehension, note-taking, search, and study workflows. AI speech-to-text (ASR) can generate a strong first pass, but publishing that first pass without standards, editing, and QA is how small errors become learning barriers. This chapter walks an end-to-end workflow you can repeat: generate captions/transcripts, fix timing and speakers, improve terminology with custom vocabulary, run a QA checklist, publish in multiple formats, and document decisions so your team can scale.
The goal is engineering judgment, not perfectionism. You will set measurable targets (accuracy, reading rate, line length), decide what gets human review, and maintain a correction log that steadily improves future runs. Think of captions as production software: you need acceptance criteria, a build pipeline, and release notes.
A practical workflow looks like this: (1) prepare audio to help ASR; (2) run ASR to create a first-pass caption file plus a transcript; (3) post-edit for correctness, timing, and accessibility cues; (4) apply custom vocabulary for names and domain terms; (5) QA using sampling and error categories; (6) publish SRT/VTT and a readable transcript; (7) document decisions and store reusable templates (prompts, rubrics, and checklists).
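The steps above can be sketched as a pipeline with pluggable stages, so that ASR stays automated while post-editing and QA remain human-owned. A minimal sketch (stage and key names are illustrative):

```python
def run_caption_pipeline(audio_path, asr, post_edit, qa_check, publish):
    """Minimal sketch of the caption pipeline. Each stage is a callable so
    humans can own post_edit and qa_check while ASR stays automated."""
    draft = asr(audio_path)        # step 2: first-pass captions/transcript
    edited = post_edit(draft)      # steps 3-4: human edit + custom vocabulary
    issues = qa_check(edited)      # step 5: QA with error categories
    if issues:
        # Fail closed: nothing ships until the issue list is empty.
        return {"status": "needs_rework", "issues": issues}
    return {"status": "published", "artifacts": publish(edited)}  # steps 6-7
```

The design choice that matters is the fail-closed branch: a non-empty QA issue list blocks publishing, which is what makes “ready to publish” a testable state rather than a judgment call.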
The rest of the chapter breaks this pipeline into concrete standards and tactics you can apply immediately.
Practice note for Generate first-pass captions and transcripts from audio/video: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Fix timing, speaker labels, and non-speech events: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Improve terminology and names with custom vocabulary: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Run a caption QA checklist and publish in multiple formats: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Document decisions and build a repeatable template: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Before you generate anything, define what “good” looks like. Caption quality is not subjective: it can be measured and audited. For learning content, you should establish an accuracy target (for example, 99%+ for key instructional segments, and a lower threshold for informal discussion), plus formatting constraints that reduce cognitive load.
Accuracy targets. Use two tiers: (1) critical correctness for terminology, numbers, formulas, commands, and assessment prompts; (2) general correctness for conversational filler. AI ASR errors concentrate on names, acronyms, and domain terms—exactly what learners need most—so your rubric should weight those words more heavily than “um” or repeated phrases. If you can’t afford full human review, require full review of critical segments (demos, step-by-step instructions, safety, grading criteria) and sample the rest.
Line length and segmentation. Captions should break at natural linguistic boundaries (phrases and clauses), not mid-word or mid-idea. A practical rule is 1–2 lines per caption, keeping lines short enough for typical players. Many teams aim for roughly 32–42 characters per line depending on font and platform. Avoid placing a single word on a line unless it is intentionally emphasized and brief.
Reading rate. Learners read at different speeds, and captions compete with visuals and slides. Keep a conservative reading rate; many captioning guides target around 140–180 words per minute for instructional content, with exceptions for short bursts. When speech is too fast, do not “summarize” silently; instead, consider improving the source audio (Section 2.2) or planning pauses in the recording. If you must edit, prioritize clarity: remove repeated false starts, but do not delete meaning, steps, or qualifiers.
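The line-length and reading-rate targets above are mechanical enough to check per cue. A minimal sketch, assuming cue times in seconds (the thresholds are policy knobs your team sets, not fixed standards):

```python
def caption_ok(text, start_s, end_s, max_lines=2, max_chars=42, max_wpm=180):
    """Check one caption cue against line-length and reading-rate targets
    (1-2 lines, ~32-42 chars/line, ~140-180 wpm for instructional content)."""
    lines = text.split("\n")
    if len(lines) > max_lines or any(len(line) > max_chars for line in lines):
        return False
    duration_min = (end_s - start_s) / 60
    if duration_min <= 0:
        return False  # zero/negative duration is a timing error, not a pass
    words_per_min = len(text.split()) / duration_min
    return words_per_min <= max_wpm
```

Cues that fail should be flagged for splitting or re-timing, never silently shortened: the fix is presentation, not content.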
Common mistakes: (1) using perfect word accuracy but poor segmentation (hard to follow); (2) enforcing strict character limits by deleting key information; (3) allowing inconsistent punctuation that changes meaning (e.g., “Let’s eat, students” vs “Let’s eat students”). Your standard should explicitly state what you will preserve verbatim and where light cleanup is acceptable.
Caption quality starts upstream. If your audio is noisy, distant, or rushed, AI will produce plausible-looking text that is wrong in ways that are hard to detect. Improving audio often saves more editing time than any prompt trick.
Noise and room sound. Reduce constant noise sources (fans, HVAC, laptop hum). If you can’t control the room, record with a close mic and consider light noise reduction in an editor. Avoid aggressive noise gates that clip syllables; clipped consonants (“t”, “k”, “p”) become ASR substitutions that change meaning in technical terms.
Mic distance and consistency. Use a lapel or headset mic when possible and keep distance consistent. Moving your head away from the mic produces fluctuating volume that confuses diarization (speaker separation) and timing alignment. If recording screen-capture, do a 10-second test, listen back on headphones, and check that the waveform is not peaking or whisper-quiet.
Pacing for learning. Caption readability depends on pacing. Speak slightly slower than conversational speed and insert deliberate micro-pauses between steps (“First… Next… Then…”). Those pauses help learners and also help ASR segment captions cleanly. If multiple speakers are present, ask them not to overlap. Overlap is one of the highest-cost caption problems because even humans struggle to separate the content.
Practical prep checklist. (1) Record 10 seconds; verify levels; (2) ensure the mic is within a hand’s width of the mouth (for lapel/headset, follow manufacturer guidance); (3) reduce background noise; (4) capture separate tracks per speaker when possible; (5) keep a list of names/acronyms you will say, and say them clearly the first time.
The payoff: better first-pass captions and fewer “mystery” errors. When you later build a repeatable template, your recording guidelines are part of the accessibility workflow—not an optional best practice.
ASR engines often allow custom vocabulary (phrase lists, hints, or boosted terms). Separately, LLMs can help you post-edit transcripts, but you must constrain them so they don’t “improve” the meaning. The safest approach is: use ASR vocabulary features to prevent errors, then use an LLM for targeted edits with strict rules and diff-based review.
Build a terminology package. Start a simple table: preferred term, common mishearing, and context. Include product names, course-specific jargon, people’s names, and acronyms expanded on first use. Example entries: “UDL (Universal Design for Learning)”, “WCAG (Web Content Accessibility Guidelines)”, “Khan Academy”, “Nguyễn”, “PyTorch”, “SQL”. Feed this into your ASR’s custom vocabulary tool where supported. If not supported, keep it as a post-edit reference for human editors.
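Where your ASR tool has no custom-vocabulary feature, the terminology table can still drive an automated first correction pass before human review. A minimal sketch (the table entries are illustrative; a real table comes from your correction log):

```python
import re

# Illustrative terminology package: preferred term -> common mishearings.
TERMS = {
    "PyTorch": ["pie torch"],
    "WCAG": ["w cag", "wick ag"],
    "UDL": ["you dee ell"],
}

def terminology_pass(transcript):
    """Replace known mishearings with the preferred term. Word boundaries
    prevent matches inside longer words; humans still review the diff."""
    fixed = transcript
    for preferred, variants in TERMS.items():
        for variant in variants:
            pattern = r"\b" + re.escape(variant) + r"\b"
            fixed = re.sub(pattern, preferred, fixed, flags=re.IGNORECASE)
    return fixed
```

Because every substitution comes from a reviewed table, this pass cannot “improve” meaning the way an unconstrained LLM edit can; it only applies corrections you have already approved.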
Prompting for post-edit. If you use an LLM to clean a transcript, use a prompt that forbids paraphrase and requires minimal edits. Example instruction set: keep wording verbatim; only fix obvious recognition errors; preserve numbers and units; do not change meaning; flag uncertain terms instead of guessing; output a change list. This keeps the model from rewriting content into “nicer” prose that no longer matches the audio.
Post-edit method that scales. (1) Run a terminology pass: search for known mishearings and fix them; (2) run a numbers/units pass: dates, decimals, code versions, measurements; (3) run a proper-noun pass: names, titles, citations; (4) run a punctuation pass: add sentence boundaries to aid reading and TTS. Use the slide deck or script as a grounding reference, but do not force the transcript to match slides if the instructor said something different.
Common mistakes: letting an LLM “summarize” within captions; correcting to the wrong but plausible term (“cache” vs “cash”); standardizing acronyms incorrectly; and silently expanding jargon so captions no longer match what learners hear. Your rubric should require that captions reflect the spoken content, while transcripts may optionally include clarifying expansions in brackets if your policy allows it.
Instructional videos increasingly include panels, interviews, and screen-share coaching. Learners who rely on captions need two additional layers beyond words: speaker identification and non-speech information that is essential for understanding.
Speaker labels. Use consistent labels (e.g., “Instructor:”, “TA:”, “Student:”) and keep them stable across the whole course. Label the first caption when a new speaker begins, then again when confusion could occur. Avoid over-labeling every line if a single speaker continues; the goal is clarity, not clutter. If your platform supports positioning, keep labels readable and avoid covering important on-screen text.
Diarization pitfalls. AI diarization can mis-assign speakers when voices are similar, when audio levels change, or when there is crosstalk. Treat diarization as “assistive metadata,” not truth. A fast verification technique is to spot-check each speaker transition and any segment where the content references “I” or “you” (these pronouns often reveal mis-attribution).
Meaningful sound cues. Captions should include non-speech events that matter for comprehension: [laughter] when it signals tone, [applause] when it marks an achievement, [door slams] if it interrupts, [music] if it covers speech, and crucially, instructional sounds such as [timer beeps], [error chime], or [notification sound] during demos. Do not caption irrelevant ambient noise; do caption events that affect meaning, pacing, or the learner’s ability to follow steps.
Timing and synchronization. Good timing is an accessibility feature. Ensure captions appear when the words are spoken and disappear when the phrase ends. Fixing timing often means splitting long captions, aligning start times to the first spoken syllable, and ensuring that on-screen reading is possible without racing the learner. For demos, align captions to actions (“Click Settings…”) so the learner sees the instruction when the click happens.
Common mistakes: labeling speakers inconsistently (“Host” vs “Instructor”), omitting critical cues like [silence] during a pause that signals “think time,” and using editorial cues (“[jokes]”) instead of observable events. Keep cues objective and meaningful.
Publishing captions is not a single file export; it is a delivery strategy. Different platforms prefer different formats, and learners benefit from both time-synced captions and a navigable transcript.
SRT vs VTT. SRT is widely supported and simple, but WebVTT (VTT) is often better for web delivery because it supports additional metadata and is the standard for HTML5 video. If your LMS or video host supports VTT, use it as the primary web format and keep SRT as a compatibility artifact for editors and legacy tools. Always validate files after export; small formatting issues (timestamps, commas vs periods, missing blank lines) can break playback.
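The mechanical difference between the two formats is small: WebVTT requires a `WEBVTT` header and uses periods, not commas, as the millisecond separator. A minimal conversion sketch (validation against your platform is still required after export):

```python
import re

def srt_to_vtt(srt_text):
    """Convert SRT caption text to basic WebVTT: prepend the required WEBVTT
    header and switch timestamp decimal separators from commas to periods.
    SRT cue numbers are kept; WebVTT treats them as cue identifiers."""
    body = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", srt_text)
    return "WEBVTT\n\n" + body
```

This direction (SRT → VTT) is lossless; going the other way can drop VTT-only metadata such as cue settings, which is another reason to treat the edited VTT as primary.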
Transcript formats. Provide a clean transcript in an accessible document format (HTML page, properly structured PDF, or a well-styled document). A transcript is more than captions without timecodes: it should be readable, searchable, and scannable with headings if you include sections. Include speaker names and meaningful sound cues. If you have timecodes, consider a “click-to-jump” interactive transcript when the platform supports it—this benefits all learners, not only those using captions.
Embedding in the LMS. Verify that captions are enabled by default or that the control is discoverable. If the LMS embeds a third-party player, test in a student view and on mobile. Ensure keyboard accessibility: learners should be able to toggle captions and navigate the transcript without a mouse. If you provide a separate transcript link, place it near the video with clear labeling (“Transcript (HTML)”).
Multiple outputs from one source. Treat your edited caption file as the “source of truth.” From that, generate: VTT for web playback, SRT for interchange, and a transcript (with or without timecodes) for reading and study. This reduces drift where the transcript says one thing and captions say another. Store versions with clear naming: course_module_lesson_v1_en.vtt, plus a change log entry for each publish.
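As a sketch of the "one source, many outputs" idea, the following derives SRT and a plain transcript from an edited VTT file. It assumes simple cues only (no cue identifiers, settings, or styling blocks), which is why the edited VTT must stay the single source of truth:

```python
# Sketch: derive SRT and a transcript from an edited VTT "source of truth"
# so captions and transcript cannot drift apart.
# Assumption: simple cues only (no cue settings, styling, or identifiers).

def vtt_to_srt(vtt: str) -> str:
    blocks = [b for b in vtt.strip().split("\n\n") if "-->" in b]
    out = []
    for i, block in enumerate(blocks, 1):
        lines = block.splitlines()
        t = next(n for n, line in enumerate(lines) if "-->" in line)
        lines[t] = lines[t].replace(".", ",")  # VTT periods -> SRT commas
        out.append("\n".join([str(i)] + lines[t:]))
    return "\n\n".join(out) + "\n"

def vtt_to_transcript(vtt: str) -> str:
    blocks = [b for b in vtt.strip().split("\n\n") if "-->" in b]
    text = []
    for block in blocks:
        text.extend(l for l in block.splitlines() if "-->" not in l)
    return " ".join(text)

SOURCE_VTT = (
    "WEBVTT\n\n"
    "00:00:01.000 --> 00:00:03.000\nHello there.\n\n"
    "00:00:03.000 --> 00:00:05.000\nWelcome back."
)
print(vtt_to_srt(SOURCE_VTT))
print(vtt_to_transcript(SOURCE_VTT))
```

For a publishable transcript you would still add paragraphs, headings, and speaker labels; the point is that the words themselves come from one edited file.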
Common mistakes: exporting only one format and discovering late that the platform needs another; providing a transcript that is just a raw ASR dump without paragraphs; and failing to test playback after upload (some players silently drop malformed files).
Quality assurance is where accessibility-first teams distinguish themselves. The objective is not to “trust the model” or “fix everything manually,” but to implement a repeatable QA workflow with clear acceptance criteria and a feedback loop.
Sampling strategy. If you cannot review 100% of runtime, sample intelligently: (1) always review critical segments (definitions, instructions, assessments, safety, demos); (2) randomly sample at least 3–5 short clips per video (beginning, middle, end); (3) sample any segment containing dense terminology, rapid speech, or multiple speakers. Track the sample coverage so you can justify release decisions.
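One way to make the sampling plan reproducible, and therefore auditable, is to seed the random choices. In this sketch, segments are (start, end, tag) tuples and the tag vocabulary is illustrative, not a fixed taxonomy:

```python
import random

# Sketch: build a review sample for one video. Critical segments are always
# included; the rest are sampled randomly with a recorded seed so the plan
# can be reconstructed later. Tags here are illustrative assumptions.

CRITICAL_TAGS = {"definition", "instruction", "assessment", "safety", "demo"}

def build_sample(segments, n_random: int = 3, seed: int = 0):
    critical = [s for s in segments if s[2] in CRITICAL_TAGS]
    rest = [s for s in segments if s[2] not in CRITICAL_TAGS]
    rng = random.Random(seed)  # seeded: same seed -> same sample
    return critical + rng.sample(rest, min(n_random, len(rest)))

segments = [
    (0, 30, "intro"), (30, 90, "definition"), (90, 150, "example"),
    (150, 210, "demo"), (210, 270, "recap"),
]
plan = build_sample(segments)
print(plan)
```

Recording the seed alongside the release decision lets a later reviewer see exactly which clips were (and were not) checked.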
Error taxonomy. Classify issues so you can prioritize and improve: (1) meaning errors (wrong term, missing “not,” incorrect number)—highest severity; (2) timing errors (late/early captions, unreadably short durations); (3) formatting errors (line breaks, punctuation, speaker labels); (4) accessibility cue errors (missing sound cues, unclear speaker changes); (5) style consistency (acronym expansion, capitalization). Assign an “error budget” per minute (e.g., zero meaning errors allowed; limited minor formatting issues) aligned to your standards from Section 2.1.
Correction log. Maintain a lightweight log with: video ID, timestamp, error type, original text, corrected text, root cause (noise, fast speech, missing vocabulary), and prevention action (add term to custom vocabulary; update recording guidance; adjust prompt). Over time, this becomes your governance artifact: it tells you which content types need more human review and which vocabulary items should be preloaded.
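A minimal sketch of that log as CSV, with field names mirroring the list above. In production you would append to a shared file or sheet rather than an in-memory buffer:

```python
import csv
import io

# Sketch: the lightweight correction log as CSV. Field names mirror the
# chapter's list; the example row content is illustrative.

FIELDS = ["video_id", "timestamp", "error_type", "original", "corrected",
          "root_cause", "prevention"]

def log_row(writer: csv.DictWriter, **entry) -> None:
    """Write one correction, filling any missing fields with blanks."""
    writer.writerow({f: entry.get(f, "") for f in FIELDS})

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
log_row(writer,
        video_id="mod2_lesson3", timestamp="00:04:12",
        error_type="meaning", original="can", corrected="cannot",
        root_cause="fast speech",
        prevention="flag negations for mandatory review")
print(buf.getvalue())
```

Because the schema is fixed, you can later group rows by `root_cause` or `error_type` to decide which prevention actions actually pay off.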
Release checklist. Confirm captions toggle on/off, sync is acceptable, speakers are identified, non-speech cues are present where meaningful, and exports validate. Record who reviewed, what sampling was performed, and what exceptions were accepted. The practical outcome is a repeatable template: a standard operating procedure with prompts, rubrics, and a QA checklist that supports consistent, auditable accessibility.
Common mistakes: relying on a single end-to-end “accuracy score” instead of reviewing meaning-critical segments; fixing errors without logging them (so they recur); and allowing last-minute edits that introduce drift between captions, transcript, and video versions. Treat QA as part of publishing, not a final optional step.
1. Why does the chapter warn against publishing AI-generated captions/transcripts without editing and QA?
2. Which sequence best matches the end-to-end workflow described in the chapter?
3. What is the main purpose of using a custom vocabulary in the captioning process?
4. According to the chapter, where should human review be focused to reduce learning harm?
5. What does the chapter mean by treating captions like “production software”?
Alt text is often treated as a compliance checkbox: “describe what’s in the picture.” In learning content, that mindset produces alt text that is either too vague (“a chart”) or too literal (“a blue line going up”), neither of which helps a learner build understanding. Accessibility-first AI changes the goal: alt text should deliver equivalent learning value for learners who can’t see the image, without overwhelming them or duplicating the surrounding text.
This chapter turns alt text into a teachable, repeatable production workflow. You will classify images to decide whether alt text is required, generate drafts with context-aware prompts, and apply engineering judgment for complex visuals like charts, diagrams, equations, and maps. You’ll also learn how to QA for bias, privacy, and over-description, and then consolidate decisions into a style guide and rubric so your course stays consistent across lessons and authors.
A practical framing: alt text is a micro-explanation that sits at the intersection of instruction and interface. It must be accurate, minimal, and aligned to the learning objective—especially when AI is involved. AI can accelerate drafting, but it can also hallucinate details, inject assumptions, or expose sensitive information. Your workflow should therefore include: (1) image classification, (2) context-first prompting, (3) pattern-based drafting, (4) long descriptions where needed, and (5) a review rubric with acceptance criteria.
Practice note: for each skill in this chapter (classifying images and deciding when alt text is required, drafting alt text with context-aware prompts, handling complex visuals such as charts, diagrams, equations, and maps, reviewing for bias, privacy, and over-description, and creating an alt text style guide and rubric), document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Start by deciding whether the image needs alt text at all. This is not a writing step; it’s a content design step. A simple decision tree prevents two common errors: adding noisy alt text for decorative images (which creates cognitive load for screen reader users), and omitting essential information from informative or functional images.
Decorative images do not contribute meaning. Examples: background textures, purely aesthetic stock photos, divider icons, and repeated branding marks. Best practice is to use empty alt text (alt="") so assistive tech can skip it. If an image is decorative but still announced because of the UI, fix the implementation rather than writing descriptive filler.
Informative images carry learning content: a diagram of the water cycle, a photo demonstrating a lab setup, a chart showing trends, or a screenshot showing where to click. These require alt text that communicates the instructional point. If the same information is already stated in adjacent text, alt text can be shorter (or may be redundant), but you still need to ensure non-visual users aren’t missing information that sighted learners get “for free.”
Functional images trigger an action: an icon button, a linked image, an image-based control. The alt text should name the action, not the appearance: “Download worksheet (PDF)” rather than “down arrow icon.” For a linked logo, the alt text should be the destination or purpose (“Home” or “Company homepage”), not “Company logo.”
This classification also sets up how you handle complex visuals: if an image carries dense data (chart, map), you may need both short alt text and a structured long description. Don’t force everything into one cramped sentence.
AI can describe pixels; it cannot automatically infer instructional intent. Context-first prompting supplies what the model lacks: why the image exists in the lesson, who the learner is, and what level of detail is helpful. Without context, AI tends to list visual details (colors, positions) rather than teach the concept (relationships, comparisons, steps).
Use a prompt template that always includes: (1) learning objective, (2) audience, (3) surrounding text or narration, (4) what to emphasize, and (5) constraints (length, tone, avoid guessing). Package these fields into one reusable, versioned template so every author in the pipeline supplies the same context.
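A sketch of such a template as a governed asset. Every field name and the exact wording here are illustrative assumptions to adapt to your house style; the structure (objective, audience, context, emphasis, constraints) is the part that matters:

```python
# Sketch: a versioned, reusable alt-text prompt template.
# Assumption: field names and wording are illustrative, not a standard.

ALT_PROMPT_V1 = """\
You are writing alt text for a course image.
Learning objective: {objective}
Audience: {audience}
Surrounding text: {context}
Emphasize: {emphasis}
Constraints: at most {max_chars} characters; describe only what is visible;
if any label or number is unreadable, say "Text is unclear" instead of guessing.
"""

prompt = ALT_PROMPT_V1.format(
    objective="Identify the phases of the water cycle",
    audience="Grade 7 science students",
    context="The paragraph above defines evaporation and condensation.",
    emphasis="the direction of arrows between phases",
    max_chars=150,
)
print(prompt)
```

Versioning the constant (here `_V1`) gives you the audit trail the chapter asks for: which prompt produced which draft, and when it changed.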
Also specify the function of the image within the lesson: attention getter, example, evidence, step-by-step procedure, or assessment support. That single line often changes the best alt text from “Photo of…” to “Example of…” or “Step 2 shows…”
Engineering judgment matters: if the lesson already defines terms, keep alt text lean; if the image introduces a new concept (first time learners see a schematic), allow more explanation or add a long description. Treat prompts as governed assets: version them, document approved templates, and align them to your course outcomes so authors don’t improvise wildly.
Most instructional images work best with concise alt text—typically 50–150 characters—because screen reader users should not be forced through a paragraph when a sentence will do. Concision does not mean shallow; it means selecting the teaching point. Use patterns to keep output consistent across a course.
Reliable short patterns include “Chart: [measure] by [category], showing [main takeaway]”; “Diagram: [system] with [key parts], illustrating [relationship]”; “Screenshot: [screen or dialog] with [the control to use] highlighted”; and “Photo: [subject] demonstrating [step or concept].” Fill the brackets from the lesson context, not from incidental visual detail.
Common mistakes are (1) starting every alt text with “Image of,” (2) listing irrelevant details (“blue button, top right”), and (3) restating the caption verbatim. Instead, aim for: what it is, what it shows, why it matters—often in one sentence.
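These house rules are easy to lint automatically before human review. The thresholds and banned prefixes below are assumptions; align them with your own style guide:

```python
# Sketch: a tiny lint for short alt text, encoding the guidance above.
# Assumptions: 50-150 character range and the banned-prefix list are
# defaults to tune, not fixed rules.

BANNED_PREFIXES = ("image of", "picture of", "graphic of")

def lint_alt(alt: str, caption: str = "", lo: int = 50, hi: int = 150):
    """Return a list of style issues; empty list means the draft passes."""
    issues = []
    low = alt.strip().lower()
    if any(low.startswith(p) for p in BANNED_PREFIXES):
        issues.append("starts with a redundant prefix")
    if not lo <= len(alt) <= hi:
        issues.append(f"length {len(alt)} outside {lo}-{hi} chars")
    if caption and low == caption.strip().lower():
        issues.append("restates the caption verbatim")
    return issues

print(lint_alt("Image of a chart"))
print(lint_alt("Chart: average temperature by month, peaking in July."))
```

A lint pass like this frees reviewers to spend their attention on meaning, which no regex can check.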
When should you expand beyond 150 characters? Expand when the image contains multiple labeled parts needed for comprehension (a biology diagram), dense data (a multi-series chart), spatial relationships critical to the task (a geometry proof diagram), or instructions that must be followed precisely (safety steps). In those cases, keep the alt text short (“Diagram of the heart with labeled chambers; long description follows.”) and move detail into a long description or nearby text. This keeps navigation efficient while still providing full access.
Complex visuals—charts, diagrams, equations with annotations, maps—often cannot be translated into a single alt sentence without losing meaning. The accessibility-first approach is to provide: (1) a short alt text that identifies the visual and its main takeaway, and (2) a long description that delivers the underlying data or structure in a navigable format.
Long descriptions work best when structured. Avoid a wall of text. Use headings or ordered lists in nearby content (or an adjacent “Description” expandable panel) so learners can skim. A practical structure for complex visuals is: (1) a one-sentence overview with the main takeaway, (2) the components, axes, or legend (with units), (3) the data, relationships, or spatial pattern, and (4) any exceptions or notable points.
Charts: Provide axis labels, units, time ranges, and the main trend(s). If the chart supports calculation, include a small table of values or the dataset in text. Diagrams: Describe components and relationships (“A connects to B via…”, “inputs flow into…”). Equations: Ensure they’re in accessible math markup where possible; if not, provide a text math version and define symbols. Maps: State the region, the variable mapped, the legend categories, and the spatial pattern (“higher in the northeast, lower along the coast”).
AI is helpful for first drafts of long descriptions, especially to extract visible labels and propose a structure. But require a verification step: confirm labels, numbers, and relationships against the source data. If the chart was generated from a spreadsheet, prefer generating the long description from the underlying data rather than from an image interpretation. That shift—data-first instead of pixels-first—reduces hallucinations and improves precision.
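A sketch of that data-first approach: the description is computed from the underlying rows, so the numbers cannot be hallucinated. The sentence template itself is an assumption to adapt to your style guide:

```python
# Sketch: generate a chart long-description sentence from the underlying
# data (data-first) instead of asking a model to interpret pixels.
# Assumption: the phrasing template is illustrative; numbers come from rows.

def describe_series(label: str, unit: str, points: list[tuple[str, float]]) -> str:
    first, last = points[0], points[-1]
    if last[1] > first[1]:
        trend = "rose"
    elif last[1] < first[1]:
        trend = "fell"
    else:
        trend = "held steady"
    peak = max(points, key=lambda p: p[1])
    return (f"{label} {trend} from {first[1]} {unit} in {first[0]} "
            f"to {last[1]} {unit} in {last[0]}; peak {peak[1]} {unit} in {peak[0]}.")

points = [("Jan", 12.0), ("Feb", 15.5), ("Mar", 14.0)]
print(describe_series("Average temperature", "C", points))
```

A human still reviews the sentence for fluency and instructional fit, but the verification burden drops to checking the template, not every number.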
Alt text is a high-risk surface for AI errors because it sounds authoritative. Your review process should explicitly check for failure modes that appear frequently in AI-generated descriptions.
Hallucinations: The model may invent labels, numbers, or relationships (“the red line peaks at 50%”) even when the chart is unreadable or ambiguous. Mitigation: instruct the model to state uncertainty (“Text is unclear”) and require human verification for any quantitative claim. In a governed workflow, set an “error budget”: for example, zero tolerance for incorrect numbers in charts and zero tolerance for invented UI labels in screenshots.
Assumptions and mind-reading: AI may infer intent or causality (“sales increased because of the campaign”) when the chart only shows correlation. It may also infer what someone is doing or feeling (“a happy student”). Mitigation: keep descriptions observable and learning-relevant; avoid attributing emotions, motivations, or causes unless the lesson text explicitly states them.
Sensitive traits and privacy: Do not identify people by race, disability, religion, immigration status, or other sensitive characteristics. Do not guess age, gender identity, or medical conditions. Also avoid doxxing: names on badges, emails in screenshots, student faces in classroom photos, home addresses on forms. Mitigation: redact at the source when possible; otherwise write alt text that omits identifying details (“Instructor demonstrates pipetting at a lab bench”) and ensure your media policy covers consent and anonymization.
Over-description: Listing colors, clothing, decorative objects, or background scenery can bury the teaching point. Include such details only when instructionally necessary (e.g., “highlighted in yellow” if color is the only indicator—though better is to fix the visual to not rely on color alone).
These checks are not merely editorial; they are part of responsible AI governance. Treat alt text as learner-facing instructional content with the same QA rigor as explanations and assessments.
Consistency is what turns good alt text into a scalable practice. A course module may contain dozens of images produced by different authors using different AI prompts. Without a rubric and style guide, learners experience a patchwork: some alt text is verbose, some is empty, some repeats captions, and some introduces new terminology. Build a module-level standard and enforce it in review.
A practical review rubric (score or pass/fail) can include: accuracy (no invented labels, numbers, or relationships), instructional relevance (communicates the image’s teaching point, aligned to the objective), concision (one sentence where possible, with detail moved to a long description), non-redundancy (does not restate the caption or adjacent text), correct classification (decorative images get empty alt text; functional images name the action), and privacy (no sensitive traits or identifying details).
Turn the rubric into a style guide with “house rules,” such as: avoid “image of,” prefer present tense, include numbers only when verified, define acronyms only if not defined elsewhere, and standardize how you describe common visuals (“Screenshot: …”, “Diagram: …”, “Chart: …”). Include examples from your own course so authors can pattern-match.
Finally, operationalize the workflow: store approved prompts, track changes to alt text like any other curriculum asset, and require human sign-off for complex visuals. When AI is used, keep an audit trail: prompt, model, date, and reviewer. This governance may feel heavy at first, but it prevents small accessibility regressions from compounding across a module—and it ensures your alt text does what it should in learning: teach.
1. What is the chapter’s main shift in purpose for alt text in learning content?
2. Why can “a chart” and “a blue line going up” both be ineffective alt text in instructional materials?
3. Which workflow best matches the chapter’s recommended production process for alt text when AI is involved?
4. What is the key engineering judgment needed for complex visuals like charts, diagrams, equations, and maps?
5. Which risk is specifically highlighted as a reason AI-generated alt text must be reviewed with a rubric?
Reading supports are not “easier content.” They are alternative pathways to the same learning goals. In accessibility-first production, you design supports so learners can enter the material at the right level of complexity, while preserving the author’s intent, disciplinary precision, and assessment alignment. AI can help you produce these supports at scale—leveled summaries, previews, plain-language rewrites, glossaries, and TTS-ready text—but only if you treat AI as a drafting tool with explicit constraints and rigorous verification.
This chapter focuses on a workflow mindset: generate supports, validate fidelity, and ship with guardrails. You will practice engineering judgment: deciding what must remain unchanged (definitions, formulas, claims, citations, terminology), what can be simplified (sentence structure, ordering, redundancy), and what must be added (signposting, examples, non-examples, pronunciations) to reduce cognitive load. You’ll also build a QA loop: side-by-side checks between source and AI output, uncertainty flags, and small-scale learner testing to catch subtle meaning drift.
As you work through the sections, keep one principle in view: every support should be reversible. A learner who reads a summary or simplified passage should be able to return to the original and recognize the same ideas, not a different argument. This is the core of “simplify, support, preserve meaning.”
In the next sections, you’ll build the reading-support toolbelt: structure and chunking, summarization frameworks, constrained simplification prompts, vocabulary scaffolds, TTS-ready authoring, and fidelity checks.
Practice note: for each skill in this chapter (creating leveled summaries and previews without losing core ideas, generating glossary entries and concept scaffolds, rewriting for plain language and readability targets, preparing text for TTS and screen readers, and validating fidelity with side-by-side checks and learner testing), document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Reading supports begin with structure, not paraphrase. Many accessibility failures happen because content is “correct” but presented as an uninterrupted wall of text, forcing learners to hold too much in working memory. AI can help you reorganize content into chunks, but your job is to decide the pedagogical skeleton: what is the sequence of ideas, what is the minimum set of headings, and where should the learner pause to self-check?
Start by extracting a content outline from the source: main claim, sub-claims, procedures, examples, and exceptions. Then ask AI to propose headings that reflect meaning, not decoration. Good headings are action-oriented (“Identify variables in the experiment”) or concept-oriented (“Why correlation is not causation”). Poor headings restate vague topics (“Introduction,” “More information”).
Common mistakes: (1) chunking by length instead of meaning, which splits definitions from their constraints; (2) adding too many questions, increasing load; (3) headings that introduce new claims not in the original. Acceptance criteria should include: every heading maps to an idea in the source, and every question is answerable from nearby text without inference leaps.
Workflow tip: keep a “structure-only pass” separate from simplification. First restructure and preview; then simplify language inside each chunk. This separation makes fidelity checks easier because you can verify one kind of change at a time.
Summaries are among the highest-risk AI outputs because they compress meaning. To reduce risk, summarize with an explicit framework and require the model to ground each bullet in the source. A practical approach is to produce three layers that serve different learner needs: Key Points (what the text says), Takeaways (why it matters), and Action Steps (what to do next). This aligns with UDL by offering multiple ways to engage and plan.
Define “key point” narrowly: a claim, definition, or procedure stated in the source. Define “takeaway” as an interpretation that must be clearly marked as such and still consistent with the source. Define “action steps” as study actions or process steps explicitly supported (e.g., “review the formula,” “try the worked example”), not invented assignments.
Engineering judgment: in technical domains, you may need to preserve exact definitions, thresholds, or conditions. Tell AI which statements are “must-keep exact,” such as safety warnings, legal language, or assessment-critical definitions. Another high-value pattern is to ask for “what is not covered” in the preview to prevent overgeneralization.
Common mistakes: summaries that swap causation for correlation, omit exceptions (“only if”), or replace a precise term with a near-synonym that changes meaning (e.g., “accuracy” vs. “precision”). Your acceptance criteria should include a coverage check (major sections represented) and a no-new-claims check (every claim traceable to the source).
Simplification is not dumbing down; it is reducing unnecessary complexity while retaining disciplinary accuracy. AI is helpful when you specify constraints that prevent meaning drift. The best prompts treat simplification as a controlled transformation with explicit “do not change” lists.
Start by selecting a target: grade-level band, CEFR level, or a readability metric your organization uses (e.g., shorter sentences, fewer embedded clauses). Then specify terms to keep—domain vocabulary that learners must learn and that assessments may require. If a term is advanced but essential, keep it and add a brief inline explanation or link to a glossary entry rather than replacing it.
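Parts of this are checkable by machine. The sketch below verifies that protected terms survived the rewrite and flags sentences that overshoot a length target; the 20-word ceiling is an assumption standing in for whatever readability target your organization uses:

```python
import re

# Sketch: post-simplification checks. Assumptions: a 20-word sentence
# ceiling as a stand-in readability target, and naive sentence splitting.

def check_simplification(text: str, protected: set[str], max_words: int = 20):
    """Return issues: dropped protected terms and over-long sentences."""
    issues = []
    for term in protected:
        if term.lower() not in text.lower():
            issues.append(f"protected term dropped: {term}")
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = sentence.split()
        if len(words) > max_words:
            issues.append(f"long sentence ({len(words)} words): {sentence[:40]}...")
    return issues

draft = "Evaporation turns liquid water into vapor. The vapor rises and cools."
print(check_simplification(draft, {"evaporation", "vapor"}))
print(check_simplification(draft, {"condensation"}))
```

This catches mechanical regressions only; conceptual substitutions (“work” replaced by “effort”) still require the human highlight-and-scan habit described below.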
Common mistakes: (1) replacing a technical term with a broader everyday word (e.g., “work” in physics); (2) removing hedges that signal uncertainty (“may,” “suggests”); (3) rewriting examples into different scenarios, accidentally changing what is being illustrated. A good QA habit is to highlight all changed nouns/verbs and scan for conceptual substitutions.
Practical outcome: a simplification pipeline that produces consistent, leveled text while maintaining assessment alignment. You should be able to point to a prompt template, a list of protected terms, and an error budget (e.g., “0 tolerance for changed numeric values; low tolerance for altered modality; moderate tolerance for sentence reordering”).
Vocabulary support is where AI can dramatically increase learner access—especially for multilingual learners and novices—if you treat it as concept scaffolding rather than a dictionary dump. A useful glossary entry includes: a learner-friendly definition, the technical definition (if different), pronunciation guidance when needed, an example, a non-example, and common confusions.
Generate glossary candidates by having AI extract high-utility terms: those that appear frequently, carry heavy conceptual load, or are prerequisites for later lessons. Then decide which are teach terms (must be learned) versus support terms (help comprehension but not assessed). Teach terms should be stable across the course; standardize their definitions and reuse them to avoid inconsistent phrasing.
Common mistakes: definitions that are circular, analogies that introduce new technical constructs, and examples that contradict the original text’s constraints. Acceptance criteria: each definition is consistent with course usage, each example is plausible and aligned, and each non-example clarifies a boundary without introducing a competing misconception.
Practical workflow: maintain a glossary database (spreadsheet or CMS) with fields for term, definition (plain), definition (technical), example, non-example, pronunciation, and source lesson. Use AI to draft, but require human approval for teach terms. Over time, this becomes governance: a controlled vocabulary that makes future AI rewriting safer and more consistent.
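A minimal sketch of that glossary record as CSV, so it can live in a spreadsheet or CMS export. The field names mirror the list above, with one added `teach_term` flag (an assumption) to mark entries that require human approval:

```python
import csv
import io

# Sketch: one glossary row in the schema described above.
# Assumption: the example entry and the teach_term flag are illustrative.

FIELDS = ["term", "definition_plain", "definition_technical", "example",
          "non_example", "pronunciation", "source_lesson", "teach_term"]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerow({
    "term": "precision",
    "definition_plain": "How close repeated measurements are to each other.",
    "definition_technical": "Low variability among repeated measurements.",
    "example": "Five shots landing in a tight cluster.",
    "non_example": "A tight cluster far from the bullseye (precise, not accurate).",
    "pronunciation": "prih-SIH-zhun",
    "source_lesson": "module2_lesson1",
    "teach_term": "yes",  # teach terms require human sign-off
})
print(buf.getvalue())
```

Because the schema is stable, later AI rewriting passes can be fed the approved `definition_plain` text directly, which keeps phrasing consistent across lessons.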
Text-to-speech (TTS) and screen readers are unforgiving about structure and punctuation. AI can help you rewrite content to be “speakable,” but you must understand what assistive tech needs: clear headings, predictable lists, unambiguous abbreviations, and math expressed in a readable form. A TTS-ready version is not a different lesson; it is the same lesson encoded for reliable audio rendering.
Begin with formatting: use semantic headings (consistent levels), short paragraphs, and lists for steps. Avoid visual-only cues such as “see above” or “in the box on the right.” Replace them with explicit references (“In the previous section, ‘Variables’…”). Ensure link text is descriptive (“Read the rubric” instead of “Click here”).
Common mistakes: leaving raw symbols (→, ∑, ≤) without text equivalents, using inconsistent list punctuation, and embedding critical meaning in bold/italics only. Acceptance criteria: headings are navigable, lists read as lists, abbreviations are expanded at first mention, and math has an accessible reading.
Practical outcome: a “TTS pass” checklist you run after simplification. In production, this can be partially automated (regex checks for symbols/abbreviations) plus a quick screen-reader smoke test to catch awkward phrasing before learners do.
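A sketch of the automatable slice of that TTS pass. The symbol-to-phrase map and the all-caps heuristic for spotting abbreviations are assumptions to extend for your own content:

```python
import re

# Sketch: partial automation of the "TTS pass" described above.
# Assumptions: the symbol map is a small illustrative subset, and any
# all-caps token of 2+ letters is treated as a candidate abbreviation.

SYMBOLS = {"\u2192": "leads to", "\u2211": "sum of",
           "\u2264": "at most", "\u2265": "at least"}

def tts_flags(text: str, known_abbrevs=frozenset()):
    """Flag raw symbols and abbreviations not in the expanded-terms set."""
    flags = []
    for sym, spoken in SYMBOLS.items():
        if sym in text:
            flags.append(f"raw symbol {sym!r}: consider '{spoken}'")
    for abbr in re.findall(r"\b[A-Z]{2,}\b", text):
        if abbr not in known_abbrevs:
            flags.append(f"unexpanded abbreviation: {abbr}")
    return flags

print(tts_flags("Scores \u2264 70 trigger a TTS review.", known_abbrevs={"TTS"}))
```

The screen-reader smoke test still matters; regexes catch symbols, not awkward phrasing.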
Fidelity is the difference between helpful supports and harmful misinformation. Treat every AI-generated reading support as a transformation that must be verified against the source. The most reliable method is a side-by-side check: source on the left, AI output on the right, with a reviewer confirming that each claim in the output is supported and that all critical constraints from the source are preserved.
Operationalize fidelity with a rubric. At minimum, check: (1) meaning preservation (no changed relationships, conditions, or causality), (2) completeness for the intended purpose (key steps not dropped in procedures; core definitions present), (3) language integrity (hedges, modality, and tone not altered in ways that change certainty), and (4) terminology consistency (protected terms unchanged; glossary-aligned definitions).
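One slice of this rubric, the intolerance for changed numbers, can run as an automatic first filter before human review. This is a sketch only; it flags numbers in the output that never appear in the source, and it will not catch reworded claims or changed causality:

```python
import re

# Sketch: numeric-fidelity filter for the side-by-side check.
# Every number in the AI output must already appear in the source.

def numbers(text: str) -> set[str]:
    """Extract integer and decimal tokens as strings."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def unverified_numbers(source: str, output: str) -> set[str]:
    """Numbers present in the output but absent from the source."""
    return numbers(output) - numbers(source)

source = "The reaction completes in 4.5 hours at 60 degrees."
faithful = "It takes 4.5 hours at 60 degrees."
drifted = "It takes 5 hours at 60 degrees."
print(unverified_numbers(source, faithful))
print(unverified_numbers(source, drifted))
```

Anything this filter flags goes straight to a reviewer; anything it passes still gets the normal sampled side-by-side check.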
Common mistakes: accepting a “fluent” rewrite without verifying subtle shifts (especially in scientific claims), allowing AI to invent examples that feel plausible, and failing to record decisions for later audits. Practical governance: store prompts, version history, reviewer notes, and known failure modes. Over time, your prompts become safer because they incorporate real errors you’ve seen, and your supports become more consistent across a course catalog.
When you can demonstrate traceability, clear uncertainty handling, and learner-informed improvements, AI becomes a scalable accessibility ally rather than a hidden risk. That is the standard for accessibility-first reading supports.
1. In this chapter, what is the core purpose of reading supports?
2. Which workflow best matches the chapter’s accessibility-first approach to using AI for reading supports?
3. Which set best represents what must remain unchanged during simplification to preserve fidelity?
4. What does the chapter mean by the principle that every support should be “reversible”?
5. Which practice is part of the chapter’s QA loop for catching subtle meaning drift?
Accessibility-first production does not end when an AI tool generates captions, alt text, or simplified reading supports. The work becomes operational: verifying quality, documenting decisions, protecting learners, and making improvements predictable. In practice, teams fail not because they “don’t care,” but because they lack a shared definition of done, review roles, and a plan for what happens when defects are found after publication.
This chapter treats accessibility as a system you operate. You will set acceptance criteria for captions, transcripts, alt text, and reading supports; decide when to review everything versus sampling; manage privacy risks in audio/video and transcripts; and create lightweight reporting that helps stakeholders understand progress without drowning the team in paperwork. You’ll also establish an incident response path—because accessibility defects are not hypothetical, and learners should not be the first to discover them.
Think like an engineering lead: you are balancing risk, time, and learner impact. You will create checklists that are specific enough to catch common errors (names, numbers, equations, speaker changes, image intent) and flexible enough to support different course types. You will set error budgets and sampling plans so review effort is proportional to risk. You’ll clarify who approves what, and what evidence you keep to demonstrate compliance with WCAG and alignment with UDL goals.
The practical outcome: a repeatable workflow where AI accelerates production, humans safeguard meaning, and your team can answer tough questions—“How do you know captions are accurate?” “What do you do with minors’ voices?” “Can you show an audit trail?”—with confidence.
Practice note for Build checklists for captions, alt text, and reading supports: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Set sampling plans, error budgets, and review roles: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Handle privacy, consent, and data retention for media and transcripts: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Establish incident response for accessibility defects: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Create a lightweight accessibility report for stakeholders: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Acceptance criteria are your accessibility “definition of done.” Without them, teams debate quality after the fact, and accessibility becomes subjective. Your criteria should be testable, tied to learner impact, and mapped to the asset type: captions/transcripts, alt text, and reading supports (summaries, simplifications, glossary, and TTS-ready text).
Start by turning WCAG/UDL principles into concrete checks. For captions and transcripts, criteria typically include: correct words and numbers (especially assessments, dates, prices, and measurements), correct speaker attribution, synchronized timing, punctuation that supports comprehension, and inclusion of meaningful non-speech audio (e.g., “laughter,” “door slams,” “music begins”) when relevant. A common mistake is “pretty good” captions that miss domain terms; fix this by requiring a glossary pass: all key terms must match course vocabulary exactly.
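The glossary pass can be automated as a first filter before human review. A minimal sketch in Python, assuming a hand-maintained term map (the terms below are illustrative, not from any real course):

```python
import re

# Illustrative course vocabulary: lowercase key -> canonical spelling.
COURSE_TERMS = {"wcag": "WCAG", "udl": "UDL", "word error rate": "Word Error Rate"}

def glossary_pass(caption_lines):
    """Flag caption lines where a key term's spelling deviates from the
    canonical course vocabulary. Returns (line_no, found, expected) tuples."""
    issues = []
    for i, line in enumerate(caption_lines, start=1):
        for key, canonical in COURSE_TERMS.items():
            pattern = r"\b" + re.escape(key) + r"\b"
            for match in re.finditer(pattern, line, re.IGNORECASE):
                if match.group(0) != canonical:
                    issues.append((i, match.group(0), canonical))
    return issues

print(glossary_pass(["Wcag 2.2 sets the baseline.", "Word Error Rate is tracked."]))
# → [(1, 'Wcag', 'WCAG')]
```

A human still decides whether a flagged spelling is actually wrong in context; the script only guarantees that no mismatch slips through unreviewed.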
For alt text, acceptance criteria must emphasize intent and context: describe what matters for learning, not every pixel. Require that charts include the takeaway (trend, comparison, outliers) and that decorative images are marked as decorative (or given empty alt text) to reduce noise for screen reader users. Common mistakes include repeating nearby text verbatim, guessing identities, and including subjective judgments (“beautiful,” “scary”) without instructional purpose.
For reading supports, define what “safe simplification” means: preserve learning objectives, keep definitions consistent, avoid changing technical meaning, and avoid fabricating citations. Add a TTS-ready criterion: expand ambiguous abbreviations, remove odd punctuation that breaks speech, and ensure headings and lists read cleanly.
When acceptance criteria are explicit, AI becomes safer: prompts can target each criterion (“verify numbers,” “flag uncertain terms”), and humans can review with consistency.
Human review is non-negotiable, but “review everything” is not always feasible. Choose a review model that matches risk, maturity, and the stakes of the content. The three common models are full review, sampling, and risk-based review—and most teams blend them.
Full review means a human checks every asset before release. Use it when you are onboarding a new AI workflow, launching a new course with high visibility, publishing high-stakes content (certification prep, compliance training, health/safety), or working with minors. Full review also makes sense when source audio is noisy, speaker accents vary widely, or when content includes many numbers, formulas, or names.
Sampling review checks a subset to estimate overall quality. Sampling works when production volume is high and content risk is moderate. Define a sampling plan with clear rules: for example, review 10% of videos per module, with at least one from each instructor; review the first two items from a new vendor; and always sample items with low model confidence or high edit distance from previous versions. A common mistake is “random sampling” that misses edge cases; include targeted sampling for known failure modes (math, code, multiple speakers, heavy jargon).
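A sampling plan is easier to defend when the selection logic is written down and reproducible. One possible sketch, assuming illustrative item flags (`low_confidence`, `known_failure_mode`) that your tooling would have to supply:

```python
import math
import random

def sample_for_review(items, rate=0.10, seed=42):
    """Targeted sampling: always include flagged items (low model confidence,
    known failure modes), then randomly sample the remainder at `rate`,
    reviewing at least one item. A fixed seed keeps selection reproducible."""
    must_review = [it for it in items
                   if it.get("low_confidence") or it.get("known_failure_mode")]
    rest = [it for it in items if it not in must_review]
    k = min(len(rest), max(1, math.ceil(len(rest) * rate)))
    return must_review + random.Random(seed).sample(rest, k)

items = [{"id": i} for i in range(10)] + [{"id": 99, "low_confidence": True}]
print(len(sample_for_review(items)))  # flagged item plus the random sample
```

The key design choice is that flagged items bypass randomness entirely: targeted sampling for known failure modes sits on top of, not inside, the random draw.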
Risk-based review allocates effort where harm is likely. Build a risk score using factors like: learner stakes (graded vs optional), audience (ESL learners, deaf/hard-of-hearing, screen reader users), media complexity (multiple speakers, background noise), and content sensitivity (medical, legal, identity). High-risk items get full review; low-risk items get sampling plus automated checks.
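The risk factors above can be combined into a simple additive score that routes each asset to a review model. The weights and threshold below are illustrative assumptions you would tune with your own defect data:

```python
def risk_score(asset):
    """Additive risk score from the factors named in the chapter.
    Weights are illustrative, not calibrated."""
    score = 0
    if asset.get("graded"):          score += 3  # learner stakes
    if asset.get("minors"):          score += 3  # audience sensitivity
    if asset.get("sensitive_topic"): score += 2  # medical, legal, identity
    if asset.get("multi_speaker"):   score += 1  # media complexity
    if asset.get("noisy_audio"):     score += 1
    return score

def review_route(asset, full_review_threshold=4):
    """High-risk items get full review; everything else gets sampling."""
    return "full review" if risk_score(asset) >= full_review_threshold else "sampling"

print(review_route({"graded": True, "multi_speaker": True}))  # → full review
print(review_route({"noisy_audio": True}))                    # → sampling
```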
Make roles explicit. Typical roles include: Producer (runs AI generation), Accessibility QA (checks against criteria), Subject Matter Expert (validates technical meaning), and Approver (signs off for release). Pair this with error budgets: e.g., “caption word error rate must be below X% for release” and “zero tolerance for wrong numbers in assessments.” Error budgets clarify trade-offs and prevent endless polishing while still protecting learners.
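An error budget is most useful when it is enforced as a mechanical release gate rather than a debate. A sketch, assuming an illustrative 5% WER budget and the chapter's zero-tolerance rule for assessment numbers:

```python
def release_gate(metrics, wer_budget=0.05):
    """Return (ok, blockers). The 5% WER budget is an illustrative default;
    zero tolerance for wrong assessment numbers mirrors the chapter's rule."""
    blockers = []
    if metrics.get("wer", 0.0) > wer_budget:
        blockers.append(
            f"caption WER {metrics['wer']:.1%} exceeds {wer_budget:.1%} budget")
    if metrics.get("wrong_assessment_numbers", 0) > 0:
        blockers.append("wrong number(s) in assessment captions: zero tolerance")
    return len(blockers) == 0, blockers

ok, why = release_gate({"wer": 0.08, "wrong_assessment_numbers": 0})
print(ok, why)  # blocked on WER
```

The gate gives the Approver role something concrete to sign off against, and makes "good enough to ship" a recorded decision instead of a feeling.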
Accessibility quality is not only about accuracy; it is also about harm prevention. AI-generated captions, alt text, and reading supports can introduce bias through word choice, assumptions about identity, or omission of context. A responsible operation includes a bias and harm review that is lightweight but consistent.
For language outputs (captions, transcripts, summaries, simplifications), check for: stigmatizing terms, unnecessary mention of protected characteristics, and “tone drift” that changes how a speaker is perceived (e.g., making a hesitant speaker sound incompetent). Watch for over-simplification that removes agency (“they can’t understand”) or changes modality (“must” vs “may”). Another common mistake is “correcting” dialect or accent in a way that erases identity; captions should be readable, but they should not rewrite meaning or mock speech patterns.
For image descriptions, avoid guessing sensitive attributes (race, gender identity, disability, religion) unless it is clearly relevant to the learning objective and evident from the content. Prefer observational, verifiable descriptions: “a person using a wheelchair” (if visible and relevant) rather than inferred medical conditions. For classroom photos, be careful: alt text can unintentionally identify students (“Sarah in the front row”) or disclose a minor’s identity; default to non-identifying descriptions.
Operationalize harm review with a short rubric and escalation path:
- Scan language outputs for stigmatizing terms, tone drift, and changed modality ("must" vs "may").
- Confirm image descriptions do not infer identity or sensitive attributes.
- Route flagged items to an accessibility reviewer or SME for a decision.
- Treat confirmed harms as stop-ship conditions until corrected.
The goal is not to eliminate all risk—no process can—but to make risk visible, reviewed, and learnable. Each flagged issue should feed back into prompt guidance (e.g., “do not infer identity,” “describe only what is needed for the learning point”) and into your training data choices (custom term lists and style guides).
Media accessibility workflows handle sensitive data by default: voices, faces, names, and sometimes full classroom scenes. Captions and transcripts can amplify privacy risk because they turn hard-to-search audio into searchable text. Responsible AI operations require privacy-by-design decisions before you upload content to any tool.
Start with a data inventory: what media you collect, where it is stored, who accesses it, and which vendors process it. Identify PII and sensitive content: student names, email addresses, student IDs, health information, location references, and any content involving minors. For classroom recordings, assume incidental capture (students speaking off-camera, name tags, laptop screens). Decide whether you can avoid capturing it (camera framing, audio policies) rather than trying to “clean it later.”
Consent must be explicit and age-appropriate. For minors, obtain guardian consent and follow school or district policy. Even with consent, apply minimization: do not include faces or names in publicly distributed materials unless necessary. In transcripts, consider redaction rules: replace identifiers with neutral tokens (“[Student]”) when the identity is not instructionally relevant.
Operational controls to implement:
- Restrict access to raw media and transcripts to the people who need them.
- Set retention periods and deletion schedules for source recordings.
- Review vendor data-processing terms before uploading media to any tool.
- Apply redaction rules consistently and log when identifiers are replaced.
A common mistake is treating captions as “just text.” Treat them as derived personal data when they include identifiable speech. Align your workflow with your institution’s privacy policy and applicable laws (e.g., FERPA or local equivalents), and document the decisions so teams can follow them consistently.
Metrics make QA and compliance manageable at scale. You do not need a complex dashboard to start, but you do need a few measures that connect directly to learner experience and operational performance. Good metrics also support your sampling plans and error budgets: you can tighten review when quality drops and relax it when the process is stable.
For captions and transcripts, track an accuracy metric that matches your tools and capacity. Word Error Rate (WER) is common, but even a simpler “critical error count per minute” can be effective if consistently applied. Define what counts as critical: wrong numbers, wrong negation (“can” vs “can’t”), incorrect key terms, or missing speaker changes. Track formatting compliance too: are captions present, properly synchronized, and readable (line length, timing)?
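Word Error Rate is a standard word-level edit distance (substitutions + insertions + deletions, divided by the reference length). A self-contained sketch for small transcripts:

```python
def word_error_rate(reference, hypothesis):
    """Classic WER via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / max(1, len(ref))

print(word_error_rate("captions must match the audio",
                      "captions must watch the audio"))  # → 0.2
```

Note that WER treats all words equally; that is exactly why the chapter pairs it with a "critical error count" so that a wrong number weighs more than a missed filler word.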
For alt text, measure coverage (percentage of images with appropriate alt text or marked decorative) and rubric pass rate (intent captured, no identity guessing, chart takeaway present). A frequent operational issue is “alt text exists but is unhelpful”; a rubric-based review catches this better than a binary check.
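Both alt text measures can come from one small function over your image inventory. The field names (`decorative`, `alt`, `rubric_pass`) are illustrative assumptions about how your inventory is recorded:

```python
def alt_text_metrics(images):
    """Coverage: images either marked decorative or carrying non-empty alt text.
    Rubric pass rate: among rubric-reviewed images, the fraction that passed."""
    if not images:
        return {"coverage": 1.0, "rubric_pass_rate": 1.0}
    covered = sum(1 for img in images
                  if img.get("decorative") or (img.get("alt") or "").strip())
    reviewed = [img for img in images if "rubric_pass" in img]
    passed = sum(1 for img in reviewed if img["rubric_pass"])
    return {
        "coverage": covered / len(images),
        "rubric_pass_rate": passed / len(reviewed) if reviewed else 1.0,
    }

print(alt_text_metrics([
    {"alt": "Bar chart: Q3 enrollments up 12%", "rubric_pass": True},
    {"decorative": True},
    {"alt": ""},  # counts against coverage: neither decorative nor described
]))
```

Tracking the two numbers separately is the point: coverage alone would mark "alt text exists but is unhelpful" as a success.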
For reading supports, track factual consistency issues found in review, and whether learners use the supports. Useful measures include: number of glossary lookups, time-on-page changes, and support feature adoption (TTS usage, summary views). Pair usage with learner satisfaction: short feedback prompts embedded in the UI (“Was this summary accurate and helpful?”) can surface failures that internal QA misses.
Operational metrics matter as well: turnaround time from upload to publish; percent of items meeting SLA; and rework rate (how often assets require a second review). Tie these to your review model: if sampling finds rising error rates, increase sample size or shift categories to full review. Finally, publish a monthly “top defects” list (e.g., misrecognized domain terms, missing non-speech cues) to drive prompt updates, vocabulary injection, and targeted training for editors.
Documentation is what turns a one-off success into an operation you can trust, delegate, and defend. The goal is lightweight governance: enough structure to produce consistent outputs and satisfy audits, without creating a bureaucratic bottleneck. Focus on three artifacts: prompt libraries, rubrics, and audit trails.
Prompt libraries are versioned templates for recurring tasks: caption cleanup, speaker labeling, alt text for charts, safe simplification, and glossary generation. Store prompts with context: intended use, constraints (“do not infer identity,” “do not add facts”), examples of good outputs, and known failure modes. When a defect occurs, you should be able to update the prompt and know which assets were generated under the old version.
Rubrics translate acceptance criteria into scoring guides reviewers can apply consistently. Keep rubrics short and observable: “captures instructional intent,” “accurate key terms,” “no privacy leakage,” “chart takeaway included,” “TTS-ready formatting.” Attach “stop-ship” conditions so reviewers don’t negotiate critical issues under deadline pressure. A common mistake is rubrics that are too abstract; include concrete examples from your own content types (math lecture vs coding demo vs discussion-based seminar).
Audit trails are your evidence of compliance and responsible AI use. At minimum, log: source asset ID, tool/vendor, model/version (if available), prompt version, reviewer(s), review outcome, defects found, and final publish date. If you support incident response, add a link to the ticket and remediation notes.
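The minimum audit-trail entry can be a flat record with exactly those fields, serialized wherever your team keeps logs. A sketch (field names follow the list above; values are placeholders):

```python
import json
from datetime import date

AUDIT_FIELDS = ["asset_id", "tool", "model_version", "prompt_version",
                "reviewers", "review_outcome", "defects_found", "published_at"]

def audit_record(asset_id, tool, model_version, prompt_version,
                 reviewers, review_outcome, defects_found, published_at=None):
    """One audit-trail entry with the minimum fields named in the chapter."""
    values = [asset_id, tool, model_version, prompt_version,
              reviewers, review_outcome, defects_found,
              published_at or date.today().isoformat()]
    return dict(zip(AUDIT_FIELDS, values))

rec = audit_record("L06_video", "asr-tool", "model-2024-05", "captions_v1.1",
                   ["qa_reviewer"], "approved", [])
print(json.dumps(rec, indent=2))
```

Because `prompt_version` is a required field, answering "which assets were generated under the old prompt?" becomes a simple filter over the log rather than an archaeology project.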
Plan for incidents explicitly. An accessibility defect—missing captions, incorrect transcript for an assessment, harmful alt text—should trigger a predictable response: severity classification, temporary mitigation (unpublish or replace), correction timeline, and a post-incident review that updates checklists, prompts, and sampling rules. This is also where you create a lightweight stakeholder accessibility report: coverage rates, quality metrics, known risks, incidents resolved, and next improvements. Stakeholders don’t need every detail; they need to see that accessibility is measured, owned, and continuously improved.
1. What is the main shift described in Chapter 5 after AI generates captions, alt text, or reading supports?
2. Which practice most directly prevents teams from failing due to a lack of shared expectations and post-publication defect handling?
3. How does the chapter recommend balancing review effort with risk and learner impact?
4. What is the purpose of making checklists both specific and flexible?
5. Why does Chapter 5 emphasize incident response and lightweight reporting together?
This capstone is where your accessibility-first workflow stops being “good practice” and becomes a ship-ready artifact a team can adopt. You will assemble a complete support pack for one mini-lesson (a 3–7 minute instructional segment), run final QA and publish-ready exports, and write implementation notes for a handoff. You’ll also present outcomes with before/after examples and a small set of metrics, then plan iteration with a backlog and continuous improvement loop.
Think like a production engineer and an accessibility specialist at the same time: your job is not just to generate outputs, but to control risk. AI can accelerate captions, alt text, and reading supports—but it also introduces predictable errors (misheard terms, hallucinated definitions, incorrect speaker labels, overly chatty summaries). Your capstone must show governance: prompts, rubrics, acceptance criteria, human review steps, and an “error budget” that defines what must be fixed before release versus what can wait.
Throughout this chapter, you’ll build three deliverables—(1) captions + transcript package, (2) alt text + one long description, and (3) a reading supports bundle—then package them for an LMS/repository. Finally, you’ll frame your work as a portfolio-ready case study: not “I used AI,” but “I shipped an accessibility-first support pack with measurable quality controls.”
Practice note for Assemble a complete support pack for one mini-lesson: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Run final QA and publish-ready exports: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Write implementation notes for a team handoff: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Present outcomes: before/after examples and metrics: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Plan iteration: backlog and continuous improvement: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Your capstone starts with a brief that makes the work real. Choose one mini-lesson: a short video with at least one visual aid (slide, diagram, or screenshot) and a short accompanying reading (script, handout, or outline). List all assets you will touch: source video, raw audio (if available), slide deck/images, any on-screen text, instructor notes, and a glossary of domain terms if the course already has one.
Next, define constraints. Common ones: tight turnaround (e.g., 2 hours), a target platform (YouTube, Kaltura, Panopto, LMS), and a baseline accessibility standard (WCAG 2.2 AA where applicable; UDL principles for multiple means of representation). Also note learner context: are they novices, multilingual learners, or professionals skimming for reference? This context should influence your reading supports and alt text detail.
Write success criteria as acceptance tests, not aspirations. Example criteria you can reuse:
- All numbers, names, and key terms in captions match the audio and the course glossary.
- Speaker changes and meaningful non-speech audio are captured.
- Every informative image has alt text that states what matters for learning; the complex visual includes its takeaway.
- The summary and glossary contain no claims that cannot be traced to the transcript.
Common mistake: starting with AI prompts before writing acceptance criteria. Your criteria act as a guardrail when the AI output feels fluent but is wrong. End this brief with your “definition of done” and who must sign off (you, SME, accessibility reviewer).
For the captions package, your goal is publish-ready captions plus evidence that you checked them. Start by generating a first-pass transcript using your preferred tool (ASR, then LLM cleanup). Immediately lock in a terminology list: product names, acronyms, instructor names, and any specialized vocabulary. Feed that list back into your editing pass so the model has a controlled reference.
Edit in two layers: (1) semantic accuracy and (2) captioning conventions. Semantic accuracy means the words match the audio and intent—no paraphrasing, no “helpful” expansions. Captioning conventions include line breaks, punctuation that supports readability, and timing that avoids flashing too quickly. If you can’t edit timing directly, at minimum ensure the text is segmented into caption-friendly chunks.
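The conventions layer is easy to spot-check mechanically before human review. A sketch that flags over-long lines and reading rates too fast to follow; the 42-character and 17-characters-per-second limits are common captioning defaults, used here as illustrative assumptions rather than a standard this course mandates:

```python
def caption_qc(cues, max_chars_per_line=42, max_cps=17):
    """Flag cues with over-long lines or reading rates too fast to follow.
    cues: list of (start_seconds, end_seconds, text)."""
    issues = []
    for n, (start, end, text) in enumerate(cues, start=1):
        for line in text.splitlines():
            if len(line) > max_chars_per_line:
                issues.append((n, "line too long"))
        duration = max(end - start, 0.001)  # guard against zero-length cues
        if len(text.replace("\n", " ")) / duration > max_cps:
            issues.append((n, "reading rate too fast"))
    return issues

print(caption_qc([(0.0, 0.5, "This caption flashes by far too quickly to read.")]))
```

Semantic accuracy still requires a human ear against the audio; this check only catches the mechanical layer.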
Create a QA log. This is the artifact that proves your workflow is accessibility-first rather than AI-first. Include:
- Errors found in the AI draft (misheard terms, wrong numbers, speaker labels) and the fixes applied.
- Terminology corrections made against your controlled vocabulary list.
- Timing or segmentation issues and how you resolved them.
- Who reviewed each pass and when they signed off.
Export formats should match publishing needs: SRT or VTT for captions, and a clean transcript (TXT/DOCX/PDF) for learners who prefer reading. Keep the transcript TTS-friendly: avoid unusual punctuation, keep speaker labels consistent, and ensure headings are meaningful if you format it.
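If your caption editor cannot export directly, a VTT file is simple enough to serialize yourself. A minimal sketch (timestamp math only, no captioning library assumed):

```python
def to_vtt(cues):
    """Serialize (start_seconds, end_seconds, text) cues to WebVTT."""
    def ts(seconds):
        hours, rem = divmod(seconds, 3600)
        minutes, secs = divmod(rem, 60)
        return f"{int(hours):02d}:{int(minutes):02d}:{secs:06.3f}"

    blocks = ["WEBVTT", ""]
    for start, end, text in cues:
        blocks.append(f"{ts(start)} --> {ts(end)}")
        blocks.append(text)
        blocks.append("")  # blank line terminates each cue
    return "\n".join(blocks)

print(to_vtt([(0.0, 2.5, "Welcome to the mini-lesson.")]))
```

SRT differs only in the timestamp separator (comma instead of period) and numbered cues, so the same cue data can feed both exports.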
Common mistakes: letting the model “clean up” filler words into different meaning; failing to verify numbers (2.2 vs 2.1), URLs, or code snippets; and forgetting that captions are not just a transcript—they are timed reading supports.
Your second deliverable is an alt text set for all visuals in the mini-lesson plus one long description for a complex visual (a chart, process diagram, or multi-step interface screenshot). Start by inventorying visuals and classifying them: decorative, informative, functional (buttons/links), and complex. Decorative items get null alt (empty alt attribute) in HTML contexts; informative and functional must communicate purpose; complex visuals need both succinct alt text and a longer structured description nearby.
Use AI to draft, but you must supply context: “What is the learning objective of this visual?” and “What should the learner be able to do after seeing it?” Provide surrounding narration or slide notes to prevent AI from describing irrelevant details.
For standard informative images, aim for one to two sentences. Example pattern: What it is + what matters. Avoid redundancies like “image of.” For functional images (icons), name the action: “Search” or “Download transcript (PDF).”
For the complex visual, write a long description that is scannable and complete. A practical structure:
- One sentence stating what the visual is and its purpose in the lesson.
- Its structure: axes and units for a chart, steps for a process diagram, regions for a screenshot.
- The key data, relationships, or steps the learner needs.
- The takeaway, aligned with the learning objective.
QA the alt text against intent: if a learner only had the text, could they complete the learning task? Common mistakes: over-describing colors or layout while missing the point; inventing data in a chart; and failing to align with on-screen text (if text is already present, don’t duplicate it verbatim—summarize its function).
Include reviewer notes in your QA log: what changed from AI draft to final, and why. This demonstrates engineering judgment, not just output generation.
Reading supports are where AI can help the most—and also mislead the fastest. Your bundle should include: a short summary, a glossary of key terms, and a plain-language version of the lesson text (or a simplified companion). Tie each artifact to a learner need: quick review, vocabulary support, and reduced cognitive load.
Start from a single “source of truth”: the final transcript or instructor script. Generate the summary from that text, not from memory, and enforce a “no new claims” rule. A useful governance technique is a traceability check: pick 5–10 sentences from the summary and confirm each is directly supported by the transcript. If any sentence can’t be traced, revise or remove it.
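The traceability check can be partly automated with a crude lexical filter that flags summary sentences whose content words mostly do not appear in the transcript. This is a deliberately simple stand-in, not a substitute for reading the flagged sentences yourself:

```python
def traceability_check(summary_sentences, transcript, min_overlap=0.6):
    """Flag summary sentences poorly supported by the transcript, using
    content-word overlap. The 0.6 threshold is an illustrative assumption."""
    transcript_words = set(transcript.lower().split())
    flagged = []
    for sentence in summary_sentences:
        # Ignore short function words; compare the rest against the transcript.
        words = [w for w in sentence.lower().split() if len(w) > 3]
        if not words:
            continue
        overlap = sum(w in transcript_words for w in words) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged
```

Anything flagged goes back to the human rule: trace it to the transcript, or revise or remove it.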
For the glossary, limit to terms that matter for comprehension (usually 6–12 for a mini-lesson). Require definitions to be:
- Short (one to two sentences) and written in plain language.
- Consistent with how the term is used in the lesson itself.
- Traceable to the source transcript or script, with no invented facts.
- Free of circularity (a term should not be defined using itself).
For plain language, do not “dumb down.” Preserve technical meaning while improving clarity: shorter sentences, active voice, defined acronyms, and explicit referents (replace “this” with “this checklist”). Keep structure: headings, bullets, and numbered steps. Ensure TTS readiness: avoid odd punctuation, keep abbreviations expanded on first use, and avoid ambiguous symbols.
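Parts of the TTS-readiness pass can be scripted. A sketch that expands abbreviations on first use and strips punctuation that commonly breaks speech output; the expansion map is illustrative, and a real project would draw it from the course glossary:

```python
import re

# Illustrative expansions: short form -> spoken form.
EXPANSIONS = {"e.g.": "for example", "i.e.": "that is", "WER": "word error rate"}

def tts_ready(text):
    """Expand each abbreviation at its first occurrence (acronyms keep the
    short form in parentheses) and strip markup-style punctuation."""
    for abbr, full in EXPANSIONS.items():
        if abbr in text:
            replacement = f"{full} ({abbr})" if abbr.isupper() else full
            text = text.replace(abbr, replacement, 1)
    return re.sub(r"[*~_]+", "", text)

print(tts_ready("WER should stay low, e.g. under 5%."))
# → word error rate (WER) should stay low, for example under 5%.
```

Run a script like this before the human readability pass, not instead of it: only a listener can tell you whether the result actually reads cleanly aloud.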
Common mistakes: AI summaries that sound polished but overgeneralize; glossaries that hallucinate confident-sounding definitions; and simplifications that remove critical constraints (“always” vs “often”). Your QA should include a quick SME spot-check and a readability pass (e.g., grade level as a signal, not a requirement).
Now make it shippable. Packaging is where accessibility work often fails—files exist, but nobody can find, trust, or update them. Create a folder structure that mirrors how teams actually work (lesson-based, with a clear version). Example:

L06_MiniLesson/
  captions/
  transcripts/
  alt-text/
  reading-supports/
  qa/
  CHANGELOG
Adopt naming conventions that survive handoffs: include lesson ID, locale, and version. For example: L06_MiniLesson_en-US_captions_v1.1.vtt. If your team uses Git, treat these as version-controlled text assets where possible (VTT, MD, CSV). If not, maintain a CHANGELOG that records what changed, who changed it, and why.
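A naming convention only survives handoffs if it is validated, not just documented. A sketch that parses and checks names against the pattern in the example above (the field layout mirrors that example; adjust the regex if your convention differs):

```python
import re

# lessonID_name_locale_assetType_version.ext, e.g.
# L06_MiniLesson_en-US_captions_v1.1.vtt
NAME_RE = re.compile(
    r"^(?P<lesson>L\d{2})_(?P<name>[A-Za-z0-9]+)_(?P<locale>[a-z]{2}-[A-Z]{2})"
    r"_(?P<asset>[a-z]+)_v(?P<version>\d+\.\d+)\.(?P<ext>[a-z0-9]+)$"
)

def parse_asset_name(filename):
    """Return the name's fields as a dict, or None if it breaks the convention."""
    match = NAME_RE.match(filename)
    return match.groupdict() if match else None

print(parse_asset_name("L06_MiniLesson_en-US_captions_v1.1.vtt"))
```

Running this over a delivery folder before handoff catches the stray `final_FINAL_v2.vtt` files that otherwise surface months later.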
Create publish-ready exports for the LMS: captions uploaded to the video host, transcript linked adjacent to the video, long description placed near the complex visual, and reading supports downloadable and/or embedded below the lesson. Validate that links work and that the LMS doesn’t strip formatting needed for readability (lists, headings).
Final QA before publishing should include at least: playback check with captions on, transcript download/open check, screen reader spot-check for long description placement, and a quick mobile view check. Common mistakes: mismatched caption files (old version uploaded), transcripts not updated after caption edits, and long descriptions buried in a separate file learners won’t discover.
Your capstone becomes career leverage when you frame it as impact plus rigor. Capture “before/after” examples: a 15–30 second caption segment before edits (with errors highlighted) and after; an AI-drafted alt text vs final alt text; an original paragraph vs plain-language rewrite. Keep examples small and anonymized if needed.
Define lightweight metrics that signal quality without pretending to be perfect science. Examples:
- Critical caption errors per minute (wrong numbers, negation, key terms) before vs. after review.
- Alt text coverage and rubric pass rate across the lesson's visuals.
- Percentage of summary sentences traceable to the transcript.
- Turnaround time from source upload to publish, and rework rate.
Write 2–3 resume bullets that show ownership and governance. Example patterns: “Shipped…,” “Implemented QA…,” “Reduced errors by…,” “Built a reusable prompt + rubric….” Avoid vague claims like “improved accessibility.” Name the artifacts and acceptance criteria you enforced.
For a short case study, use a clear structure: context → constraints → workflow → deliverables → QA findings → outcomes → next iteration. In interviews, your talk track should highlight engineering judgment: where you did not trust AI, how you validated outputs, and how you designed the process so others can repeat it. End with an iteration plan: a backlog of improvements (e.g., term dictionary expansion, automated reading-rate checks, localization support) and a continuous improvement cadence (monthly audits, error budget review, stakeholder feedback loop).
1. What makes the Chapter 6 capstone “ship-ready” rather than just an example of good practice?
2. Why does the chapter emphasize thinking like both a production engineer and an accessibility specialist?
3. Which set of deliverables is required in the capstone support pack?
4. Which approach best demonstrates “governance” for AI-assisted accessibility work in the capstone?
5. How should outcomes be presented to make the capstone portfolio-ready according to the chapter?