HELP

+40 722 606 166

messenger@eduailast.com

DP-100 Azure Machine Learning & MLflow: Hands-On Exam Guide

AI Certification Exam Prep — Beginner

DP-100 Azure Machine Learning & MLflow: Hands-On Exam Guide

DP-100 Azure Machine Learning & MLflow: Hands-On Exam Guide

Master DP-100 skills in Azure ML + MLflow with hands-on, exam-style practice.

Beginner dp-100 · microsoft · azure · azure-machine-learning

Prepare with a DP-100 blueprint built for hands-on learners

This course is a practical, exam-aligned guide for the Microsoft DP-100 certification exam (Azure Data Scientist Associate). You’ll learn how to work end-to-end in Azure Machine Learning and MLflow—from designing a solution and running experiments to deploying models and optimizing language models for AI applications. The goal is simple: help you build real skills while training you to recognize the patterns Microsoft uses in DP-100 questions.

Official DP-100 exam domains covered (end to end)

The curriculum is structured as a 6-chapter “book” that maps directly to the official exam domains:

  • Design and prepare a machine learning solution
  • Explore data and run experiments
  • Train and deploy models
  • Optimize language models for AI applications

Each domain chapter includes clear decision frameworks (what to choose and why), common DP-100 traps, and exam-style practice sets to reinforce the objective names and expected outcomes.

How the 6 chapters are organized

Chapter 1 is your exam onboarding: registration options, scoring and question types, and a realistic study plan for beginners. You’ll also set up your environment (Azure subscription considerations, budgets, and the Azure ML tools you’ll use) so you can follow along confidently.

Chapters 2–5 each focus on one official domain (or a tightly related set of objectives). You’ll move from solution design (workspaces, data access, compute, environments) into experimentation (EDA, jobs, MLflow tracking, AutoML), then into training and deployment (pipelines, registries, endpoints), and finally into the domain on language model optimization (prompting, evaluation, RAG and fine-tuning decision points, and safe operationalization).

Chapter 6 is a full mock exam experience split into two parts. It includes rationales mapped back to objectives, a weak-spot analysis workflow, and an exam-day checklist so you walk in with a repeatable strategy.

Why this course helps you pass DP-100

  • Objective-first design: every chapter references the official domain names so you always know what you’re studying and why.
  • Hands-on mental models: you’ll learn how Azure ML assets relate (data assets, jobs, pipelines, models, endpoints) and how MLflow fits into tracking and model packaging.
  • Exam-style practice: scenario questions emphasize tradeoffs (cost vs performance, security vs convenience, online vs batch deployment) that show up repeatedly on DP-100.
  • Beginner-friendly pacing: no prior certification experience assumed—only basic IT literacy and willingness to practice.

Get started on Edu AI

If you’re ready to begin, create your account and start learning: Register free. If you want to compare options first, you can also browse all courses.

By the end of this course, you’ll have a clear DP-100 study plan, practical Azure ML + MLflow workflows you can repeat, and enough exam-style practice to approach the test with confidence.

What You Will Learn

  • Design and prepare a machine learning solution in Azure Machine Learning (DP-100 domain)
  • Explore data and run experiments using Azure ML, notebooks, SDK v2, and MLflow tracking (DP-100 domain)
  • Train and deploy models with pipelines, registries, managed endpoints, and batch scoring (DP-100 domain)
  • Optimize language models for AI applications using prompt design, evaluation, and deployment patterns (DP-100 domain)
  • Apply governance, security, and responsible AI practices across the ML lifecycle in Azure ML (DP-100 domain)
  • Answer DP-100 exam-style questions with strong time management and elimination strategies (all domains)

Requirements

  • Basic IT literacy (files, networking basics, using a web browser)
  • Comfort with basic Python concepts (variables, functions, reading data files) helpful but not required
  • No prior certification experience needed
  • A Microsoft account and ability to access Azure (free/paid options discussed in Chapter 1)

Chapter 1: DP-100 Exam Orientation and Study Game Plan

  • Understand DP-100 format, domains, and skills measured
  • Register for the exam and set up your study environment
  • Build a 2–4 week study plan with hands-on checkpoints
  • Baseline assessment: identify strengths and weak spots

Chapter 2: Design and Prepare a Machine Learning Solution (Domain 1)

  • Choose the right Azure ML components for a solution design
  • Set up data access, compute, and environments correctly
  • Design feature engineering, data prep, and governance approach
  • Domain 1 practice set: design-and-prepare exam questions

Chapter 3: Explore Data and Run Experiments (Domain 2)

  • Perform EDA and validate data quality for training readiness
  • Run experiments using Azure ML jobs and MLflow tracking
  • Use AutoML responsibly and interpret metrics
  • Domain 2 practice set: experiments and data exploration questions

Chapter 4: Train and Deploy Models (Domain 3)

  • Train models with scripts, pipelines, and distributed options
  • Register, version, and manage models with MLflow and registries
  • Deploy to managed online endpoints and batch endpoints
  • Domain 3 practice set: training, deployment, and MLOps exam questions

Chapter 5: Optimize Language Models for AI Applications (Domain 4)

  • Select an LLM approach: prompt-first vs fine-tune vs RAG
  • Evaluate, optimize, and monitor LLM application quality
  • Operationalize LLM apps with Azure ML deployment patterns
  • Domain 4 practice set: LLM optimization exam questions

Chapter 6: Full Mock Exam and Final Review

  • Mock Exam Part 1
  • Mock Exam Part 2
  • Weak Spot Analysis
  • Exam Day Checklist

Renee Caldwell

Microsoft Certified Trainer (MCT) | Azure Data Scientist Associate

Renee Caldwell is a Microsoft Certified Trainer who helps learners prepare for Microsoft role-based certifications with practical labs and exam-focused coaching. She specializes in Azure Machine Learning, MLOps, and responsible model deployment aligned to DP-100 objectives.

Chapter 1: DP-100 Exam Orientation and Study Game Plan

DP-100 is a practitioner’s exam: it rewards candidates who can translate a loosely defined business ask into an Azure Machine Learning implementation that is secure, traceable, cost-aware, and deployable. This chapter sets your “exam frame”—what Microsoft expects you to do on the job, how the exam measures it, and how to build a short, hands-on study plan that targets the skills measured instead of drifting into general ML theory.

You will see recurring patterns throughout this course: choose the correct Azure ML asset (data, environment, compute, job, model, endpoint), connect it with reproducible experimentation (MLflow tracking), and apply governance (RBAC, managed identities, responsible AI) across the lifecycle. The DP-100 exam rarely asks for long math derivations; instead, it tests decision-making: which service, which configuration, which artifact, which deployment pattern, and which operational guardrails.

Exam Tip: Treat every question as a mini scenario: identify the “primary constraint” (cost, latency, governance, reproducibility, portability, scale) and eliminate options that violate it. DP-100 is filled with distractors that are technically possible but operationally wrong.

This chapter also introduces your study game plan: set up an environment that mirrors the exam’s world, run a baseline assessment to find weak spots, then progress through labs-first checkpoints mapped to the official objectives.

Practice note for Understand DP-100 format, domains, and skills measured: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Register for the exam and set up your study environment: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Build a 2–4 week study plan with hands-on checkpoints: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Baseline assessment: identify strengths and weak spots: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Understand DP-100 format, domains, and skills measured: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Register for the exam and set up your study environment: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Build a 2–4 week study plan with hands-on checkpoints: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Baseline assessment: identify strengths and weak spots: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Understand DP-100 format, domains, and skills measured: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 1.1: What DP-100 tests: roles, scenarios, and exam domains

DP-100 targets the day-to-day work of a machine learning engineer using Azure Machine Learning. The exam is scenario-driven: you are often placed in a team context (data scientists, security, platform engineers) and asked to make the right implementation choices under constraints. The skill is not “can you train a model,” but “can you operationalize training and deployment in Azure ML with traceability and controls.”

Expect the exam domains to orbit a lifecycle: (1) design and prepare an ML solution, (2) explore data and run experiments, (3) train models and deploy solutions, and (4) manage, govern, and monitor. In this course, we also emphasize MLflow because DP-100 commonly tests experiment tracking, reproducibility, and artifact management—even if the question does not explicitly say “MLflow.”

  • Design/prepare: choosing compute, networking posture, identity, and data access patterns.
  • Experimentation: jobs, components, notebooks, metrics logging, and repeatable runs.
  • Training/deployment: pipelines, registries, managed online endpoints, and batch scoring.
  • Governance: RBAC, data protection, model lineage, and responsible AI practices.

Common trap: studying “Azure AI services” broadly (Vision, Speech, etc.) at the expense of Azure Machine Learning core assets. DP-100 is about Azure ML as the platform for training and MLOps, not about calling prebuilt cognitive APIs (though you may integrate them in real projects).

Exam Tip: When an option mentions “quickest” or “simplest,” verify it still meets enterprise requirements: reproducibility (tracking), security (identity/network), and deployment operations (rollback/monitoring). DP-100 tends to prefer scalable, managed patterns over ad-hoc scripts.

Section 1.2: Registration, delivery options, policies, and accommodations

Registering correctly is not just logistics—it affects your stress level, scheduling, and the time window you have for a focused 2–4 week sprint. DP-100 is delivered through Microsoft’s exam providers with two typical delivery options: testing center or online proctored. Choose the mode that minimizes risk for you.

Testing center advantages: stable hardware/network, fewer proctoring interruptions, and a controlled environment. Online proctored advantages: flexible scheduling and no commute, but you must control your environment (clean desk, no secondary monitors, stable internet, and a quiet room). Many candidates lose time dealing with check-in issues rather than questions.

  • Verify your ID requirements and name match exactly.
  • Schedule at a time when you are mentally sharp (not “after work if possible”).
  • Read policies on breaks, scratch paper/whiteboard, and allowable items.

Accommodations and accessibility options exist—request them early. If you need additional time, screen reader support, or other accommodations, do not wait until the week of the exam; approval workflows can take time.

Exam Tip: Before exam day, do a “dry run”: confirm your login, system test (if online), and that you can access the exam provider without corporate VPN restrictions. Avoid last-minute surprises that drain focus.

Common trap: underestimating policy friction. Candidates sometimes reschedule late and lose momentum. Build your study plan backward from your exam appointment, and include a buffer day for a final review rather than cramming.

Section 1.3: Scoring, question types, case studies, and time management

DP-100 questions are designed to test applied judgment. You may see multiple-choice, multi-select (“choose all that apply”), drag-and-drop ordering/matching, and case studies with several questions tied to a single scenario. Case studies frequently include a mix of requirements (security, cost, performance) plus existing constraints (current workspace setup, network rules, or organizational policies).

Scoring is pass/fail with a scaled score. Practically, your goal is consistent accuracy across domains rather than perfection in one area. Time management matters: many candidates “overspend” time on early questions and then rush case study items, where points accumulate quickly.

  • First pass: answer what you know confidently; mark uncertain items for review.
  • Second pass: return to marked items, using elimination and scenario constraints.
  • Final pass: ensure multi-select answers align with “must” requirements (not nice-to-haves).

Exam Tip: Watch for absolute wording: “must,” “only,” “minimum cost,” “without code changes,” “no public internet,” “auditable lineage.” These words define the constraint that determines the correct Azure ML feature (for example, managed identity vs. keys, private endpoints vs. public access, model registry vs. local artifacts).

Common traps include misreading multi-select prompts (selecting too few/many), confusing similarly named constructs (workspace vs. registry, endpoint vs. deployment), and assuming “more complex = more correct.” DP-100 often rewards the simplest solution that meets all constraints—especially around governance and cost control.

Build exam stamina: do timed practice blocks to learn your pace. A useful heuristic is to keep a steady cadence and avoid “debugging” a question as if you were in a live notebook session. The exam tests decisions, not troubleshooting.

Section 1.4: Azure setup: subscriptions, budgets, quotas, and cost control

Your hands-on practice environment should mirror real Azure constraints. DP-100 tasks can involve compute clusters, managed endpoints, and data storage—all of which can incur cost. Set up your Azure subscription with guardrails so you can practice freely without fear of a surprise bill.

Start with a dedicated subscription or a dedicated resource group for the course. Use budgets and alerts to enforce discipline. Azure Machine Learning can scale quickly if you leave compute running or deploy endpoints with multiple instances.

  • Budgets: create a monthly budget with email alerts at 50/75/90% thresholds.
  • Compute hygiene: set idle shutdown where available; delete unused compute instances.
  • Quotas: check regional vCPU/GPU quotas early; request increases before you “need them.”
  • Endpoint control: scale deployments to zero (when supported) or delete endpoints after labs.

DP-100 questions often hide cost control inside operational requirements: “minimize cost,” “avoid idle compute,” or “support bursty traffic.” You should recognize which levers are appropriate: autoscaling settings, choosing CPU vs. GPU, picking serverless/managed options when available, and selecting batch scoring when real-time latency is not needed.

Exam Tip: When you see “minimize cost” plus “infrequent predictions,” lean toward batch endpoints or on-demand patterns rather than a continuously running online endpoint. Conversely, if the requirement is “low latency,” an online endpoint is usually the expected construct.

Common trap: assuming quotas are purely administrative. On the exam, quota constraints can invalidate an otherwise “correct” compute choice (e.g., selecting a GPU SKU that is not available in the region). Learn to check region/SKU availability as part of your design thinking.

Section 1.5: Tooling overview: Azure ML studio, SDK v2, CLI, MLflow

DP-100 expects you to be fluent across interfaces: Azure ML studio (UI), notebooks, the Azure ML SDK v2 (Python), the Azure ML CLI v2, and MLflow for tracking. The exam does not require you to memorize every command, but it does test whether you know which tool is appropriate and what it produces (assets, runs, artifacts, lineage).

Azure ML studio is where you validate assets visually: data, compute, jobs, models, endpoints, monitoring, and responsible AI dashboards. SDK v2 and CLI v2 are where you operationalize: define jobs, environments, pipelines/components, and deployments as code. MLflow is the connective tissue for experiment tracking—logging parameters, metrics, and artifacts in a consistent, queryable way.

  • Studio: fastest for inspection, troubleshooting, and configuration checks.
  • SDK v2: Python-first automation; strong for pipelines and programmatic deployments.
  • CLI v2: YAML-driven reproducibility; great for CI/CD and consistent environments.
  • MLflow: experiment tracking and model packaging; helps standardize training across tools.

Exam Tip: If a scenario emphasizes “reproducible runs,” “auditable experiments,” or “compare runs,” your mental keyword is tracking—typically MLflow and Azure ML job history. If it emphasizes “repeatable deployment across environments,” your keyword is infrastructure-as-code style definitions (CLI/SDK with YAML) and registries.

Common trap: mixing SDK v1 terminology with SDK v2 concepts. DP-100 is aligned with modern Azure ML patterns: jobs, components, environments, registries, and managed endpoints. When you study, anchor on v2 language to avoid selecting an answer that uses older constructs in a misleading way.

As you progress, deliberately practice the same task in two ways (studio + SDK/CLI). The exam often tests conceptual understanding: knowing that both methods exist and which is preferable under governance, repeatability, or team collaboration requirements.

Section 1.6: Study strategy: labs-first plan, review cadence, and objective mapping

Your goal is a short, high-yield study cycle: 2–4 weeks is realistic if you prioritize labs and map every hour to an objective. The DP-100 candidate who wins is not the one who reads the most, but the one who repeatedly implements the lifecycle: data → experiment → train → register → deploy → monitor/govern.

Start with a baseline assessment to identify strengths and weak spots. Do not guess your gaps—measure them. After the baseline, build your plan around hands-on checkpoints that produce tangible artifacts: a tracked experiment with MLflow, a pipeline/job definition, a registered model in a registry, and a deployment to a managed endpoint (online and/or batch).

  • Week 1 (foundation): workspace setup, data access patterns, compute setup, first tracked runs.
  • Week 2 (MLOps core): environments, jobs/components, pipelines, model registry and lineage.
  • Week 3 (deployment): managed online endpoints, batch scoring, scaling, and troubleshooting patterns.
  • Week 4 (governance + polish): RBAC/identity, networking posture, responsible AI, timed reviews.

Use a simple review cadence: daily micro-review (15–20 minutes of notes + command patterns), and a weekly consolidation session where you rewrite “objective maps” from memory. Objective mapping means you can point to each skill and say, “I can do it in studio, and I can do it as code.”

Exam Tip: Maintain a “trap log.” Every time you miss a practice item, record the exact misunderstanding (e.g., endpoint vs. deployment, workspace vs. registry, batch vs. online). Review the trap log before any timed practice to reduce repeat errors.

Finally, keep the course outcomes in view: this exam is not only about training models. It is about deploying them safely, tracking them reliably (MLflow), and operating them responsibly in Azure ML. If your study sessions always end with an asset you can point to in Azure ML, you are studying in the direction the exam measures.

Chapter milestones
  • Understand DP-100 format, domains, and skills measured
  • Register for the exam and set up your study environment
  • Build a 2–4 week study plan with hands-on checkpoints
  • Baseline assessment: identify strengths and weak spots
Chapter quiz

1. You are advising a team starting DP-100 prep. They keep studying generic ML theory (derivations, proofs) and are not improving on practice questions. Based on how DP-100 is designed, which study adjustment best aligns with the exam’s intent?

Show answer
Correct answer: Shift focus to scenario-based decisions in Azure ML (assets, compute, jobs, deployment, governance) and validate with hands-on labs
DP-100 is a practitioner exam that emphasizes choosing the right Azure ML configuration and operational approach (security, traceability, deployability, cost awareness) rather than long math derivations. Option B is wrong because the exam rarely asks for lengthy mathematical proofs; it tests implementation and decision-making. Option C is wrong because cost questions are typically scenario-driven (tradeoffs/constraints), not pure memorization of pricing tables.

2. A company asks you to create a 3-week DP-100 study plan for new hires. They have limited time and want measurable progress. Which plan structure best matches the skills-measured approach described in the course?

Show answer
Correct answer: Map weekly checkpoints to official objectives and complete labs that produce Azure ML artifacts (data assets, environments, computes, jobs, models, endpoints)
DP-100 rewards candidates who can implement solutions in Azure ML; a short plan should be objectives-driven with labs-first checkpoints that create and connect the core Azure ML assets. Option B is wrong because postponing hands-on work delays the operational practice DP-100 tests. Option C is wrong because the exam focuses on Azure ML implementation patterns rather than building custom libraries outside the platform.

3. You are taking a baseline assessment to identify DP-100 weak spots. Your results show you understand modeling concepts but frequently miss questions about traceability and reproducibility. Which capability should you prioritize practicing to address that gap?

Show answer
Correct answer: MLflow tracking for experiment logging and reproducible runs tied to Azure ML jobs
Traceability and reproducibility in the DP-100 context are strongly associated with consistent experiment tracking and lifecycle artifacts (for example, using MLflow tracking integrated with Azure ML jobs). Option B is wrong because manual calculations are not the main DP-100 focus and do not directly improve traceability. Option C is wrong because data augmentation is a modeling tactic and does not address governance/traceability requirements.

4. You are reviewing a practice DP-100 question that describes a loosely defined business ask and provides multiple technically possible solutions. According to the chapter’s exam strategy, what should you do first to increase your chance of choosing the correct answer?

Show answer
Correct answer: Identify the primary constraint (for example, cost, latency, governance, reproducibility, portability, or scale) and eliminate options that violate it
DP-100 questions often hinge on an operational constraint and include distractors that are feasible but operationally wrong. Option B is wrong because more services can increase complexity/cost and may violate the scenario constraint. Option C is wrong because the exam commonly prioritizes deployability, governance, reproducibility, and cost-aware design over maximum theoretical accuracy.

5. A healthcare organization wants an Azure ML solution that supports secure access and governance across the ML lifecycle (from data to endpoints). Which approach best fits DP-100’s expected operational guardrails?

Show answer
Correct answer: Apply RBAC and managed identities to Azure ML resources and integrate responsible AI considerations during development and deployment
DP-100 expects candidates to apply governance throughout the lifecycle, commonly using RBAC and managed identities for secure, auditable access and incorporating responsible AI practices. Option B is wrong because shared admin accounts reduce traceability and violate least-privilege expectations. Option C is wrong because deferring governance contradicts the exam’s emphasis on secure, traceable, and compliant implementations from the start.

Chapter 2: Design and Prepare a Machine Learning Solution (Domain 1)

Domain 1 of DP-100 is where the exam checks whether you can design a workable Azure Machine Learning (Azure ML) solution before you ever train a model. The test is less about “what algorithm” and more about choosing the right Azure ML components, setting up data access and compute correctly, and ensuring reproducibility and governance. In real projects, weak design decisions here create the most expensive failures later (slow iteration, broken deployments, data leakage, or security blocks). On the exam, these topics appear as scenario prompts asking you to pick the best component, configuration, or access pattern.

This chapter ties together the design-and-prepare workflow you’ll use in Azure ML Studio and the SDK v2, while also reflecting how teams track runs with MLflow. Expect questions that blend multiple constraints: cost, network isolation, identity, data sensitivity, reproducibility, and scale. Your job is to identify the constraint that “drives” the design, eliminate choices that violate it, then select the option that satisfies the most requirements with the least operational overhead.

Exam Tip: When a scenario mentions “repeatable,” “auditable,” “promote to production,” or “compare experiments,” immediately think: versioned data assets + environment pinning + MLflow/experiment tracking. When it mentions “no public internet,” “data exfiltration,” or “private,” think: private endpoints, managed identity, and workspace-managed networking patterns.

Practice note for Choose the right Azure ML components for a solution design: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Set up data access, compute, and environments correctly: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Design feature engineering, data prep, and governance approach: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Domain 1 practice set: design-and-prepare exam questions: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Choose the right Azure ML components for a solution design: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Set up data access, compute, and environments correctly: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Design feature engineering, data prep, and governance approach: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Domain 1 practice set: design-and-prepare exam questions: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Choose the right Azure ML components for a solution design: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 2.1: Workspace and assets: datastores, data assets, components, registries

Azure ML workspaces are the control plane for ML assets: connections to data, compute definitions, environments, models, and deployments. DP-100 frequently tests whether you can distinguish between a datastore (a workspace-linked connection to storage) and a data asset (a versioned, reusable pointer to data used in jobs). Datastores abstract credentials and endpoints; data assets add discoverability, lineage, and versioning for repeatable pipelines.

In SDK v2 terms, you typically register a data asset that points to a path in a datastore (for example, an Azure Blob container). Then you reference the data asset in jobs and pipelines, which is exactly what the exam expects when it asks for “reusable across experiments” or “track which dataset version trained the model.” If the question emphasizes cross-workspace reuse, look for registries: an Azure ML registry is designed to share models, environments, and components across multiple workspaces (often across teams or environments).

  • Datastores: workspace connection to storage (Blob, ADLS Gen2, etc.); used for mounting or downloading data securely.
  • Data assets: versioned references to data (URI file/folder/table); key for lineage and reproducibility.
  • Components: reusable steps in pipelines (command components); enable modular design and standardization.
  • Registries: central distribution of assets (models/environments/components) across workspaces.

A common exam trap is choosing “datastore” when the scenario is really about versioning and lineage. Another trap is choosing “model registry” concepts from other platforms; on DP-100, the Azure ML registry is the right answer when you need enterprise sharing and governance for assets beyond a single workspace.

Exam Tip: If the prompt says “standardize preprocessing across multiple pipelines,” think component. If it says “share approved training environment across dev/test/prod workspaces,” think registry (and environment versioning).

Section 2.2: Compute strategy: clusters vs instances, serverless, quotas, sizing

Compute design is a high-frequency Domain 1 area because it impacts cost, iteration speed, and operational risk. The exam commonly contrasts compute instances (single VM for interactive development) with compute clusters (autoscaling pools for jobs). Compute instances are ideal for notebooks, debugging, and ad-hoc exploration. Compute clusters are built for repeatable training/inference jobs and pipeline steps, where you want scale-out and automatic scale-to-zero.

Serverless options may appear in questions emphasizing “no cluster management” or “quick experimentation,” but be careful: the correct answer still depends on constraints like network isolation, GPU needs, or organizational policies. If the scenario mentions multiple concurrent training runs or scheduled pipelines, clusters are typically the best fit because they handle parallelism and autoscaling more predictably.

  • Compute instance: interactive; user-centric; great for notebooks; can become a cost trap if left running.
  • Compute cluster: job-centric; autoscale; better for CI/CD and pipelines.
  • Sizing: match CPU/GPU, memory, and disk to workload; avoid overprovisioning.
  • Quotas: regional core/GPU limits can block deployments; plan early.

DP-100 often includes a subtle quota angle: you choose a GPU cluster, but the subscription lacks GPU quota in that region, so the design should either request quota increase or choose a different region/SKU. Another trap is picking a compute instance for production training because it “works,” but it fails scalability, multi-run scheduling, and governance expectations.

Exam Tip: When you see “cost control” plus “sporadic training,” pick a cluster with autoscale and scale-to-zero. When you see “data scientist needs a notebook,” pick a compute instance. When you see “many runs in parallel,” the answer is rarely a single instance.

Section 2.3: Data access patterns: RBAC, managed identity, networking, private endpoints

Security and access design is a classic DP-100 differentiator: many options look plausible, but only one aligns with Azure best practices. The exam expects you to recognize that Azure ML should access data using Azure RBAC and managed identities whenever possible, avoiding embedded keys and connection strings. A managed identity (system-assigned or user-assigned) lets the compute authenticate to storage securely, supporting rotation-free access and centralized policy enforcement.

Networking appears when prompts mention “no public internet,” “private network,” “exfiltration prevention,” or “regulated data.” In those scenarios, the correct design typically involves private endpoints for the workspace and dependent resources (storage, Key Vault, container registry), and careful outbound rules. The tricky part is that you must still enable required dependencies for training jobs (for example, pulling images from an Azure Container Registry). The exam may test whether you understand that “private” does not mean “no dependencies,” it means “dependencies over private links.”

  • RBAC: grant least privilege (Storage Blob Data Reader/Contributor, etc.) to identities used by compute/jobs.
  • Managed identity: preferred for data access; reduces secret management risk.
  • Private endpoints: keep traffic on private IPs; often required for regulated environments.
  • Common misstep: using SAS tokens in code for production pipelines.

A frequent exam trap is confusing workspace permissions with data-plane permissions. Even if a user can open Azure ML Studio, the job still needs storage permissions via identity/RBAC. Another trap is choosing “shared key” access because it is quick; DP-100 usually favors managed identity plus RBAC unless the question explicitly constrains you otherwise.

Exam Tip: If the scenario says “rotate secrets is difficult” or “no credentials in code,” it’s almost always managed identity + RBAC. If it says “must remain private,” add private endpoints and validate all dependent resources are reachable privately.

Section 2.4: Reproducibility: environments, dependencies, Docker/conda, versioning

Reproducibility is the backbone of credible ML, and DP-100 tests it directly. Azure ML jobs run in environments that define the runtime: Docker image, conda dependencies, and sometimes custom setup. The exam wants you to design so that someone can rerun training months later and get the same dependency stack. Practically, this means pinning package versions (not “latest”), versioning environments, and avoiding “it works on my notebook” dependency drift.

MLflow is often the tracking layer: you log parameters, metrics, and artifacts for comparison. But MLflow tracking alone doesn’t guarantee reproducibility if the environment changes. The strongest design pairs MLflow run history with a versioned Azure ML environment and versioned data assets. In pipelines, each component should declare its environment so that changes are explicit and reviewable.

  • Environment versioning: publish v1, v2, etc.; update only when needed.
  • Conda vs Docker: conda is common for Python deps; Docker is needed for full OS-level control.
  • Pin dependencies: exact versions reduce non-determinism across runs.
  • Artifact logging: log model files, preprocessing objects, and evaluation outputs.

Common traps include using unpinned dependencies, relying on a compute instance’s local environment, or updating a single environment in-place so old runs become unreproducible. Another trap: confusing “registered model” with full reproducibility—models need lineage to data and environment as well.

Exam Tip: When the question emphasizes “auditability” or “recreate the run,” choose answers that combine: versioned data asset + versioned environment + tracked runs (MLflow/Azure ML). One element alone is usually insufficient.

Section 2.5: Data preparation design: splits, leakage prevention, labeling basics

Data preparation questions on DP-100 are usually about designing the approach rather than writing code. The exam expects you to prevent data leakage, choose correct split strategies, and understand labeling workflow basics. Leakage happens when training has access to information that would not be available at prediction time (for example, aggregations computed using the full dataset, or target-derived features). In Azure ML designs, leakage prevention often translates into “fit preprocessing only on training data,” encapsulate prep in a pipeline step, and ensure consistent transforms for validation/test.

Splitting strategy must match the data’s structure. Random splits work for i.i.d. data, but time series commonly requires temporal splits, and user-level data may require group-aware splits to prevent the same entity appearing in both train and test. In scenario questions, look for hints like “future data,” “sessions,” “patients,” or “devices”—those are cues that random split is a trap.

  • Train/val/test: use validation for tuning; keep test set untouched for final evaluation.
  • Stratification: important for imbalanced classification to preserve label distribution.
  • Group/time splits: prevent entity or temporal leakage.
  • Labeling basics: plan human-in-the-loop labeling, quality checks, and versioned labeled datasets.

Labeling appears in Domain 1 as process design: how to collect labels, track labeled data versions, and integrate labeling outputs into training. The exam may probe whether you’ll treat labeled data as a versioned asset and whether you plan for inter-annotator disagreement or label quality. Avoid answers that assume labels are perfect or static.

Exam Tip: If the prompt mentions “predict next month,” “forecast,” or “future,” avoid random split. If it mentions “multiple records per customer/patient,” avoid record-level split and prefer group split to prevent leakage.

Section 2.6: Domain 1 exam-style practice: scenario-based design decisions and rationales

In Domain 1 scenarios, you will rarely be asked for a single isolated fact. Instead, you’ll be given a partial solution design and asked what to change or choose. Your winning approach is to identify the dominant constraint (security, scale, reproducibility, or reuse), then map it to the correct Azure ML asset or configuration. If two answers both “work,” pick the one that is most aligned with enterprise best practices: least privilege, managed identities, versioning, and scalable compute.

Typical design signals and the matching rationale:

  • “Reuse across multiple workspaces/teams” → choose registries for sharing approved components/environments/models; avoid copying assets manually.
  • “Track exactly which data trained the model” → choose versioned data assets (not just a datastore path), plus MLflow/experiment tracking for run lineage.
  • “Interactive exploration and debugging” → compute instance; but switch to clusters for scheduled jobs/pipelines and parallel runs.
  • “No credentials in code / rotate-free access” → managed identity + RBAC; avoid SAS/shared keys unless explicitly required.
  • “Private network only” → private endpoints for workspace and dependencies; ensure storage/ACR/Key Vault are reachable privately.
  • “Reproduce a run months later” → pinned dependencies in a versioned environment + versioned data asset + logged artifacts/parameters.

Common elimination strategy: remove options that violate a stated policy (for example, “no public internet” eliminates public endpoints; “no secrets in code” eliminates connection strings). Then compare remaining options for operational simplicity (autoscale clusters vs always-on VMs) and governance (versioned assets and registries vs ad-hoc paths).

Exam Tip: When stuck between two plausible answers, choose the one that improves lineage (versioning, tracking, reusable assets) and least privilege (RBAC, managed identity). DP-100 consistently rewards designs that are production-ready, not just functional.

Chapter milestones
  • Choose the right Azure ML components for a solution design
  • Set up data access, compute, and environments correctly
  • Design feature engineering, data prep, and governance approach
  • Domain 1 practice set: design-and-prepare exam questions
Chapter quiz

1. You are designing an Azure ML solution for a regulated team. Training must run in a managed compute cluster, read data from an Azure Data Lake Storage Gen2 account, and the solution must avoid storing secrets in code. The workspace and storage are in the same subscription. Which approach best satisfies the requirement?

Show answer
Correct answer: Grant the Azure ML workspace managed identity RBAC access (for example, Storage Blob Data Reader) to the ADLS Gen2 account and use identity-based access from jobs
Using managed identity + RBAC is the recommended design for secretless, auditable access and aligns with Domain 1 identity/governance expectations. Embedding account keys in code or environment variables violates the “avoid secrets in code” requirement and increases leakage risk. Storing a long-lived SAS token in source control is also a secret management anti-pattern and is harder to rotate and govern.

2. A data science team must train models in an Azure ML workspace where outbound internet access is prohibited. They still need to pull curated training data and register datasets for repeatable experiments. Which design choice best aligns with these constraints?

Show answer
Correct answer: Use private endpoints to connect the workspace to the storage account and keep data as versioned Azure ML data assets
Private endpoints support network isolation (“no public internet”) while still enabling workspace-to-storage connectivity, and versioned data assets support repeatability/auditability. Copying data to node local disks breaks governance and repeatability because the data used by a run is not centrally tracked/versioned. Public network access, even with SAS, conflicts with the scenario’s prohibition on public internet access and increases exposure.

3. You need reproducible training runs that can be promoted from dev to prod. The team uses MLflow to compare experiments and requires that the same dependencies are used across runs, even months later. Which action most directly supports this requirement in Azure ML?

Show answer
Correct answer: Create and reference a versioned Azure ML environment (conda/Docker) in the job definition and track runs with MLflow
A versioned Azure ML environment (or image) referenced by jobs is central to reproducibility and promotion, and MLflow tracking supports experiment comparison. Installing packages at runtime can yield different dependency graphs over time and is a common cause of non-reproducible runs. Using an unpinned default curated environment can change between updates and does not guarantee the same dependencies months later.

4. A company wants to operationalize feature engineering so that training and inference use the same transformations. They also need governance to reduce data leakage and ensure the transformations are reviewable and repeatable. Which approach best meets these goals in Azure ML?

Show answer
Correct answer: Implement feature engineering as a reusable Azure ML pipeline component (or pipeline step) and version the pipeline assets
Reusable pipeline components/steps make transformations consistent across runs, versionable, and reviewable, which supports governance and reduces leakage risk. Notebook-only, exported CSV workflows are harder to audit and often lead to drift between what was done and what is repeated later. Doing transformations only at scoring time creates a train/serve skew (training doesn’t see the same features as inference) and increases deployment risk.

5. You are selecting Azure ML compute for a team that will run many short-lived experiments during business hours and wants to minimize cost when idle. Jobs must be queued and executed in parallel when demand spikes. Which compute option should you choose?

Show answer
Correct answer: Azure ML compute cluster with autoscaling and a minimum node count of 0
A compute cluster supports job scheduling/queuing and parallelism, and autoscaling to zero minimizes idle cost—common Domain 1 design guidance. Compute instances are primarily for individual interactive development and are not ideal as a shared scalable training target. A fixed-size cluster can meet performance needs but wastes cost during idle periods and does not satisfy the cost-minimization requirement.

Chapter 3: Explore Data and Run Experiments (Domain 2)

Domain 2 of DP-100 tests whether you can move from “I have data” to “I have trustworthy experimental evidence,” using Azure Machine Learning tooling in a repeatable way. This chapter focuses on what the exam expects you to do in practice: perform exploratory data analysis (EDA) to validate training readiness, run experiments with Azure ML jobs, track everything with MLflow, and use AutoML responsibly while interpreting metrics correctly.

A recurring exam theme is that Azure ML is not just a notebook environment; it is an experiment platform with assets (data, code, environment), compute, tracked runs, and lineage. If a scenario asks how to make results repeatable or auditable, the correct answer usually involves formalizing inputs/outputs (data assets, job inputs/outputs), capturing parameters/metrics, and attaching tags/metadata for traceability. Another pattern: many questions are designed to see whether you recognize “local notebook success” does not equal “cloud job reproducibility.”

Exam Tip: When you see keywords like reproducible, traceable, compare runs, audit, governance, think: job-based execution (not ad-hoc), MLflow tracking, tags, and registered assets (data/model/environment) rather than manual file handling.

Practice note for Perform EDA and validate data quality for training readiness: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Run experiments using Azure ML jobs and MLflow tracking: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Use AutoML responsibly and interpret metrics: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Domain 2 practice set: experiments and data exploration questions: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Perform EDA and validate data quality for training readiness: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Run experiments using Azure ML jobs and MLflow tracking: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Use AutoML responsibly and interpret metrics: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Domain 2 practice set: experiments and data exploration questions: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Perform EDA and validate data quality for training readiness: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Run experiments using Azure ML jobs and MLflow tracking: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 3.1: Exploratory data analysis: distributions, missing data, outliers, drift signals

Section 3.1: Exploratory data analysis: distributions, missing data, outliers, drift signals

On DP-100, EDA is less about fancy plots and more about proving the dataset is usable for training and that your evaluation will be meaningful. Expect scenario questions that hint at quality issues (unexpected nulls, skewed labels, rare categories, extreme values) and ask what you should do next. Your mental checklist should include: target distribution (class balance or regression range), feature distributions (skew, cardinality), missing data patterns (MCAR/MAR hints), outliers (data errors vs real rare events), and leakage risk (features that encode the label or future information).

Drift signals may appear even before deployment. If training data comes from multiple time periods or sources, compare distributions across splits or segments (e.g., last month vs this month). The exam often expects you to detect that random splitting is wrong when time ordering matters. If you ignore temporal structure, you can “predict the past” and inflate metrics.

  • Check missingness by column and by segment (e.g., by region). Segment-level missingness can indicate upstream pipeline issues.
  • Check outliers with robust summaries (median/IQR) and validate units (e.g., age=999).
  • Look for label imbalance and choose metrics accordingly (accuracy is a trap for imbalanced classification).
  • Validate train/test split strategy: random split for i.i.d. data; time-based split for time-dependent data.

Exam Tip: If the stem mentions “high accuracy but poor minority performance,” the correct fix is typically metric selection (AUC/PR, F1, recall) and stratified sampling—not just “collect more data.”

Common trap: Treating outlier removal as always beneficial. In fraud/anomaly-like tasks, outliers can be the signal. The exam rewards answers that first verify whether extreme values are invalid records, measurement errors, or valid rare cases.

Section 3.2: Azure ML notebooks and SDK v2 workflow: jobs, inputs/outputs, artifacts

Section 3.2: Azure ML notebooks and SDK v2 workflow: jobs, inputs/outputs, artifacts

DP-100 expects you to know the modern Azure ML SDK v2 workflow: author code locally/in notebooks, then submit an Azure ML job to remote compute with declared inputs/outputs. The exam tests whether you understand what belongs inside the job (training script), what should be parameterized (hyperparameters, paths), and how artifacts are captured (job outputs, logs, registered models).

In SDK v2, you typically define a command job with an environment, compute target, code directory, and inputs/outputs. Inputs can reference data assets (recommended) and outputs can be mounted/uploaded to storage and automatically tracked as artifacts. Many questions revolve around “my notebook works, but the job can’t find the file.” That usually means your code relies on local paths instead of job inputs.

  • Use inputs for data locations and configs; avoid hardcoding absolute paths.
  • Use outputs for models and metrics files so Azure ML captures them as artifacts.
  • Pin environments (conda/docker) to avoid “it worked yesterday” dependency drift.
  • Choose compute appropriately: CPU for basic training/EDA; GPU for deep learning; consider cost and queue time.

Exam Tip: If the question asks how to ensure the training script can run identically on different computes, the best answer is usually “package code + environment + declared inputs into a job,” not “rerun the notebook on the new cluster.”

Common trap: Confusing datastores with data assets. Datastores are storage connections; data assets are versionable references used for repeatable jobs. When the exam emphasizes repeatability and versioning, pick data assets.

Section 3.3: MLflow in Azure ML: tracking URI, runs, parameters, metrics, model logging

Section 3.3: MLflow in Azure ML: tracking URI, runs, parameters, metrics, model logging

MLflow is a core “how” for Domain 2: tracking experiments and comparing runs. In Azure ML, MLflow tracking is integrated so that runs created by Azure ML jobs can be viewed in Studio, and your code can log parameters, metrics, and artifacts using the MLflow API. The exam often checks whether you know what should be logged and why: parameters (hyperparameters and dataset versions), metrics (evaluation values), and models (for later registration/deployment).

A common scenario: you migrate from local MLflow to Azure ML and need to ensure logs appear under the correct workspace experiment. That’s about setting the tracking URI (or using the Azure ML job context where it’s set automatically). The principle tested: tracking must be centralized and tied to the run, not written to local disk only.

  • Log parameters for anything you might tune or need to reproduce (learning rate, max_depth, feature set).
  • Log metrics with clear names (e.g., auc, f1, rmse) and at the right granularity (per-epoch if needed).
  • Log artifacts such as confusion matrices, feature importance plots, and serialized preprocessing objects.
  • Log models with MLflow model flavors (e.g., sklearn, pyfunc) to standardize packaging.

Exam Tip: If asked how to compare experiments across teams, choose MLflow tracking with consistent metric names and tags. Consistency is what makes filtering and run comparison possible in Studio.

Common trap: Logging metrics only in stdout. Azure ML captures logs, but MLflow metrics are structured and queryable; DP-100 questions about “analyze and compare” usually imply MLflow logging.

Section 3.4: Experiment management: tags, lineage, datasets vs data assets, reproducibility

Section 3.4: Experiment management: tags, lineage, datasets vs data assets, reproducibility

Experiment management is where DP-100 blends engineering discipline with ML practice. The exam wants you to demonstrate control over lineage: which code version, which data version, which environment, which parameters produced a given model. Azure ML provides this through jobs, MLflow runs, and metadata such as tags and properties.

Tags are lightweight but powerful. They help you filter and compare runs by scenario dimensions: “baseline vs tuned,” “feature_set=v2,” “data=customer_churn_2026_02,” “split=time_based,” or “region=EU.” If a question asks how to quickly locate runs matching a condition, tags are typically the correct mechanism.

You also need to distinguish older “datasets” terminology from newer “data assets.” DP-100 increasingly aligns with SDK v2 concepts: data assets are versioned entities (URI file/table) that can be used as job inputs and referenced later for auditability.

  • Prefer data assets with versions for repeatable training and easy rollback.
  • Capture lineage by using job inputs/outputs instead of reading arbitrary blob paths in code.
  • Use tags to encode experiment intent; use parameters for tunable values.
  • Record split strategy and random seeds to reduce “non-reproducible improvements.”

Exam Tip: When the question emphasizes “reproduce the model from last quarter,” the best answer is not just “save the model file.” You need the data version + code/environment + logged parameters/metrics (lineage end-to-end).

Common trap: Treating “registering a model” as equivalent to “tracking an experiment.” Registration is an output lifecycle step; experiment tracking is the evidence trail. The exam may offer both—choose the one that matches the goal (compare runs vs deploy a chosen artifact).

Section 3.5: AutoML: configuration, featurization, early stopping, model explainability outputs

Section 3.5: AutoML: configuration, featurization, early stopping, model explainability outputs

AutoML is tested as a productivity tool, not a magic button. DP-100 questions typically probe whether you can configure AutoML responsibly: choose the task type, define the primary metric, set validation strategy, manage featurization, and constrain compute/time. You should also know how to interpret outputs like best model, metrics, and explainability artifacts (when enabled).

Configuration levers that frequently appear in exam stems include: timeout settings, max trials, concurrency, early termination (early stopping), featurization on/off, and whether to allow ensembling. If data is imbalanced, you may need to choose a primary metric aligned to business risk (e.g., AUC/PR or recall) and possibly enable class weighting or sampling strategies (depending on task support).

  • Set an appropriate primary metric (e.g., normalized RMSE for regression; AUC/weighted F1 for classification).
  • Use early stopping to save cost when many candidates underperform.
  • Control featurization: automatic featurization can help, but you must validate it doesn’t introduce leakage or invalid transformations.
  • Enable model explainability outputs when governance requires interpretability; understand that explainability may add runtime.

Exam Tip: If the scenario says “limited budget/time,” look for constraints: timeout, max_trials, and early termination policies. AutoML questions often hide the cost-control requirement in one sentence.

Common trap: Choosing “accuracy” as the primary metric for imbalanced classification. The exam often expects AUC or F1 (or precision/recall focus) depending on whether false positives or false negatives are more costly.

Section 3.6: Domain 2 exam-style practice: troubleshooting runs, metric selection, experiment design

Section 3.6: Domain 2 exam-style practice: troubleshooting runs, metric selection, experiment design

Domain 2 questions are frequently operational: a run fails, metrics look suspicious, or experiments are hard to compare. The exam is less interested in your ability to memorize error codes and more interested in whether you apply a systematic approach. Start by classifying the issue: environment/dependency, data access, compute/quota, code bug, or evaluation design.

For troubleshooting runs, the highest-yield evidence sources are: job/run logs, standard output/error, and the captured environment details. If the stem mentions “works locally but fails on compute,” suspect missing dependencies, incorrect paths, missing secrets/permissions to storage, or reliance on interactive notebook state. If it mentions “job can’t read input,” suspect you didn’t declare an input (or used the wrong asset type/URI).

Metric selection is a top exam discriminator. Tie metrics to business impact and data properties: imbalanced classification favors PR-AUC/F1/recall; regression with outliers may prefer MAE over RMSE; ranking/recommendation needs MAP/NDCG-like thinking (if offered). Also confirm your validation strategy: k-fold for smaller i.i.d. datasets, holdout for large datasets, time-series split for temporal dependence.

  • Design experiments with a baseline, a single variable change, and consistent splits for fair comparison.
  • Use tags to encode the “hypothesis” of each run and to filter later.
  • Log confusion matrices or residual plots as artifacts to diagnose metric anomalies.
  • Prefer job-based runs with declared inputs/outputs to avoid “hidden state” variability.

Exam Tip: When answer choices include “change multiple things at once” versus “one controlled change,” choose controlled experiment design unless the stem explicitly demands rapid exploration under time pressure.

Common trap: Treating a higher metric as automatically better without checking leakage and split correctness. If performance is “too good to be true,” the exam often expects you to suspect leakage, duplicated records across splits, or time-based contamination.

Chapter milestones
  • Perform EDA and validate data quality for training readiness
  • Run experiments using Azure ML jobs and MLflow tracking
  • Use AutoML responsibly and interpret metrics
  • Domain 2 practice set: experiments and data exploration questions
Chapter quiz

1. You are preparing a training dataset for a DP-100 project. In a notebook EDA, you find that 12% of rows have missing values in a feature that is strongly correlated with the target. The exam requirement is to ensure the dataset is training-ready and the decision is auditable. What should you do first?

Show answer
Correct answer: Profile the missingness pattern (by time/segment/label), document it as a data quality issue, and decide on an imputation/removal strategy before training
DP-100 Domain 2 emphasizes EDA and data quality validation before experimentation. First, you should understand whether missingness is systematic (e.g., correlated with label or a segment), which affects bias and model validity, and then choose and document a remediation strategy. Dropping the feature (B) may unnecessarily remove predictive signal and does not address whether missingness indicates a collection issue. Proceeding without investigation (C) undermines training readiness and auditability; even if some algorithms can handle missing values, you still need to validate and document data quality.

2. A team trained a model successfully in an interactive notebook on a compute instance. In production review, they are asked to make the experiment reproducible and comparable across runs. Which approach best meets DP-100 expectations?

Show answer
Correct answer: Submit an Azure ML job that uses registered data assets and a defined environment, and track parameters/metrics with MLflow for each run
Domain 2 focuses on repeatable experiments using Azure ML jobs, formal inputs/outputs (data assets), defined environments, and MLflow tracking for parameters/metrics so runs are comparable and auditable. Manual file handling and spreadsheets (B) are not reliable lineage/traceability and are error-prone. An HTML export (C) documents code but does not ensure the same data/environment/compute were used, nor does it provide structured run comparison and tracking.

3. You run multiple Azure ML jobs and want to quickly filter and compare runs across different feature sets and business segments in the Azure ML UI using MLflow tracking. What should you add to each run to enable this filtering without changing your metric schema?

Show answer
Correct answer: Run tags (for example, feature_set=v2 and segment=retail)
In DP-100 Domain 2, tags are a standard way to attach metadata for traceability and filtering in experiment tracking (including MLflow/Azure ML). Metrics are intended for numeric measurements; encoding category identifiers as metrics (B) is not appropriate and may not work for filtering/comparison as expected. Output files (C) can hold details but are not easily queryable for filtering across runs in the UI, reducing comparability and audit efficiency.

4. A company uses AutoML for a binary classification problem where only 2% of cases are positive. The team reports high accuracy and wants to ship the model. What is the most appropriate exam-aligned next step?

Show answer
Correct answer: Review metrics suitable for class imbalance (for example, AUC, precision/recall, F1) and examine the confusion matrix and threshold behavior before deciding
DP-100 expects responsible AutoML usage and correct metric interpretation. With severe class imbalance, accuracy can be misleading (a trivial model can achieve ~98% accuracy by predicting the majority class). You should evaluate imbalance-robust metrics and error trade-offs (A). AutoML cross-validation does not make accuracy an appropriate primary metric in this scenario (B). Changing the ML task type (C) is incorrect because it misrepresents the problem and does not address evaluation quality.

5. You need to ensure experiment results are auditable. An internal auditor asks you to prove which dataset version and training code produced a specific run and its metrics. Which practice best satisfies this requirement in Azure ML?

Show answer
Correct answer: Use a job that references a versioned data asset as an input, store code in source control (or as a job artifact), and log parameters/metrics to MLflow for the run
Auditability requires lineage: versioned inputs (data assets), traceable code, and recorded parameters/metrics tied to a specific run (MLflow tracking) as emphasized in Domain 2. Re-running later (B) does not prove the original inputs/environment and may produce different results. Local timestamped files and email (C) do not provide governed lineage, are not reproducible, and are not tied to tracked run metadata.

Chapter 4: Train and Deploy Models (Domain 3)

Domain 3 of DP-100 rewards candidates who can translate “I trained a model” into repeatable, governable, and deployable engineering work in Azure Machine Learning. The exam is less interested in which algorithm you chose and far more interested in whether you can run training reliably (jobs, sweeps, distributed options), capture artifacts with MLflow, register/version models in the right place, and deploy appropriately (managed online endpoints vs batch endpoints), including basic MLOps patterns like blue/green rollout and monitoring-ready configuration.

This chapter maps directly to the DP-100 skills around training (command and sweep jobs), pipelines (components, caching, and binding data/compute), model management (MLflow model format and registries), and deployment (online and batch). The common trap is treating these as separate features. The exam tests whether you can connect them: a training job produces an MLflow-tracked model artifact, a pipeline composes reproducible steps, a registry makes the model discoverable and reusable across workspaces, and an endpoint consumes the registered model with correct auth and scaling choices.

Exam Tip: When you see a scenario, first classify it into (1) training orchestration, (2) artifact/model governance, or (3) serving pattern (online vs batch). Many answer choices are “true statements” but for the wrong layer.

Practice note for Train models with scripts, pipelines, and distributed options: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Register, version, and manage models with MLflow and registries: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Deploy to managed online endpoints and batch endpoints: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Domain 3 practice set: training, deployment, and MLOps exam questions: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Train models with scripts, pipelines, and distributed options: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Register, version, and manage models with MLflow and registries: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Deploy to managed online endpoints and batch endpoints: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Domain 3 practice set: training, deployment, and MLOps exam questions: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Train models with scripts, pipelines, and distributed options: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 4.1: Training jobs: command jobs, sweep jobs, hyperparameter tuning

Section 4.1: Training jobs: command jobs, sweep jobs, hyperparameter tuning

In Azure ML SDK v2, the exam expects you to know how training is operationalized through jobs, not interactive notebook state. A command job runs a script (for example, train.py) in a defined environment on a defined compute target, with explicit inputs/outputs. This is the default pattern for reproducible training and is frequently the correct choice when a question emphasizes repeatability, auditability, and separation from developer machines.

Sweep jobs add hyperparameter tuning on top of a base command job. You define a search space (discrete choices, uniform/log-uniform ranges), an objective metric to maximize/minimize, and a sampling strategy (random, grid, Bayesian). The exam will often hide the key clue in the wording: “find best learning rate under budget” implies early termination and a tuning strategy; “run training once with fixed parameters” implies a command job.

Distributed options appear when data/model size or training time is the constraint. In practice, DP-100 focuses less on deep distributed theory and more on selecting the right compute (GPU vs CPU, multi-node), and ensuring your job can scale (for example, using frameworks that support distributed training). A frequent failure mode is selecting a GPU cluster but forgetting the environment dependencies or expecting multi-node behavior without configuring it in the job definition.

  • Common trap: Confusing MLflow tracking with hyperparameter tuning. MLflow logs metrics; a sweep job uses metrics to optimize across trials.
  • Common trap: Treating notebook runs as equivalent to jobs. Exam scenarios about production or team collaboration almost always prefer jobs.

Exam Tip: For hyperparameter tuning questions, look for (1) objective metric name (e.g., val_auc), (2) search space definition, and (3) early termination/budget control. If any of those are missing, a sweep is probably not the best answer.

Section 4.2: Pipelines: components, DAG design, caching, reuse, and data/compute binding

Section 4.2: Pipelines: components, DAG design, caching, reuse, and data/compute binding

Azure ML pipelines formalize multi-step workflows as a directed acyclic graph (DAG): data prep → feature engineering → training → evaluation → registration. On DP-100, pipelines are a “Domain 3 glue” concept: they connect training to deployment readiness. The key object is the component, a reusable step with well-defined inputs/outputs and an environment. Components can be command-based (script-driven) and are composed into a pipeline job.

Expect exam items about designing for reuse and cost control. If two models share the same preprocessing step, that step should be a component so you can reuse it across pipelines. Caching matters: when inputs and component code/environment haven’t changed, Azure ML can reuse previous outputs. This is frequently the correct reasoning when the question mentions “avoid re-running expensive preprocessing when data is unchanged.”

Data and compute binding is another tested nuance. You can bind pipeline inputs to data assets (for versioned datasets) and bind steps to compute (CPU cluster for preprocessing, GPU for training). Questions often include a hidden constraint: “limited GPU quota” or “team uses separate dev/test workspaces.” The best design binds only the training step to GPU and keeps the rest on cheaper CPU compute. Similarly, use data assets to make runs reproducible and auditable.

  • Common trap: Putting everything in one monolithic training script. Pipelines exist to isolate responsibilities and enable caching and independent scaling.
  • Common trap: Assuming pipelines are only for batch inference. DP-100 uses pipelines heavily for training orchestration.

Exam Tip: If the scenario highlights “repeatable multi-step workflow,” “reuse,” “audit,” or “only re-run what changed,” a pipeline with components and caching is usually the intended answer.

Section 4.3: Model management: MLflow model format, registration, stages, governance

Section 4.3: Model management: MLflow model format, registration, stages, governance

DP-100 assumes you can manage models as first-class assets. MLflow is central: you track runs (params, metrics, artifacts) and often log a model in MLflow’s standardized format (flavors such as sklearn, pytorch, or pyfunc). The practical outcome is portability: a downstream deployment can load the model consistently, and the lineage from data/code to model artifacts is preserved.

Registration turns an experiment artifact into a named, versioned model. The exam likes scenarios where multiple teams or workspaces need to share approved models; this is where a registry becomes important. A registry provides centralized model governance and controlled access, enabling promotion workflows that are harder to manage when models live only inside a single workspace.

Model “stages” are frequently tested conceptually (even if your organization implements approvals differently). You should think in terms of lifecycle: candidate → validated → production. Governance includes who can register/promote, what metadata is required (description, tags, metrics thresholds), and traceability (linking to training run ID and data versions). If a question includes compliance language (“must demonstrate which data version trained the production model”), the correct approach involves MLflow tracking plus registration with rich metadata.

  • Common trap: Confusing a model artifact in a run with a registered model. Only registered models are easily discoverable, versioned, and governed.
  • Common trap: Ignoring the need for a consistent inference interface. MLflow pyfunc can standardize prediction APIs across frameworks.

Exam Tip: When you see “share across workspaces,” “central catalog,” or “enterprise governance,” prefer a registry-backed model management answer over “download the model file from the run outputs.”

Section 4.4: Online deployment: managed endpoints, deployments, blue/green, auth, scaling

Section 4.4: Online deployment: managed endpoints, deployments, blue/green, auth, scaling

Managed online endpoints are for low-latency, request/response inference. On the exam, you must distinguish the endpoint (a stable URL/DNS and auth surface) from a deployment (a specific model+environment+compute configuration behind that endpoint). This distinction enables rollout strategies: you can attach multiple deployments to one endpoint and shift traffic between them.

Blue/green (or canary) rollout is a classic DP-100 pattern. You deploy a new version (green) alongside the existing one (blue), validate, then gradually route traffic. The exam often tests how to reduce risk while updating models: traffic splitting is usually the right concept, not “delete and redeploy.”

Authentication and authorization show up in scenario questions. Common options include key-based access and Azure Active Directory (Microsoft Entra ID) for stronger governance. If a question emphasizes “enterprise app access control,” “user identities,” or “no shared secrets,” Entra ID-based auth is typically favored. Scaling is also tested: choose instance types (CPU/GPU), configure autoscaling, and understand that model latency and concurrency drive scale needs. If the prompt mentions spiky traffic, autoscale is your friend; if it mentions predictable steady load, fixed replicas may be cost-stable.

  • Common trap: Confusing online endpoints with batch endpoints because both “deploy a model.” The latency requirement is the giveaway: online is real-time.
  • Common trap: Routing all traffic to a new deployment immediately. DP-100 scenarios about reliability expect blue/green or canary.

Exam Tip: If you see “minimize downtime,” “A/B testing,” or “rollback quickly,” choose a solution involving multiple deployments under one managed endpoint with traffic splitting.

Section 4.5: Batch scoring: batch endpoints, parallelization, cost/performance tradeoffs

Section 4.5: Batch scoring: batch endpoints, parallelization, cost/performance tradeoffs

Batch scoring targets asynchronous, high-throughput inference: scoring a file set in storage, nightly predictions, or backfills. Batch endpoints let you submit jobs that read input data, run parallel inference, and write outputs—without needing always-on serving infrastructure. On DP-100, batch is often the correct answer when the scenario mentions “thousands/millions of rows,” “daily schedule,” “no real-time response needed,” or “optimize cost.”

Parallelization is a key lever: batch scoring can split input into mini-batches and process across nodes/cores. Exam questions may test your understanding of throughput vs cost: more parallelism reduces wall-clock time but increases compute cost. Another lever is choosing CPU vs GPU. Many tabular models score efficiently on CPU; using GPU for batch inference may be wasteful unless the model is deep learning and GPU-optimized.

Operationally, batch endpoints fit MLOps patterns: you can pin to a registered model version, rerun a historical scoring job for reproducibility, and store outputs for downstream systems. A classic failure mode is choosing online endpoints for bulk scoring and then being surprised by cost and throttling. Another is ignoring data movement and I/O: for very large inputs, storage proximity and efficient formats matter more than raw compute.

  • Common trap: Using online endpoints for scheduled batch workloads. The exam expects you to pick batch endpoints for cost-efficient, asynchronous processing.
  • Common trap: Overprovisioning parallelism without considering queueing, storage limits, or budget.

Exam Tip: If the requirement includes “results can take minutes/hours” or “run overnight,” batch endpoints are usually intended. If it says “must respond under 200 ms,” it’s online.

Section 4.6: Domain 3 exam-style practice: deployment choices, failure modes, and fixes

Section 4.6: Domain 3 exam-style practice: deployment choices, failure modes, and fixes

Domain 3 questions often look like troubleshooting tickets or architecture decisions. Your job is to identify the layer where things went wrong: training job configuration, pipeline design, model registration, or endpoint deployment. A reliable elimination strategy is to map symptoms to the lifecycle stage. For example, “metrics are missing” is usually a tracking/logging issue (MLflow logging not executed, wrong metric name), while “endpoint returns 500” is often an environment or scoring script issue (missing dependency, incompatible model format, wrong input schema).

Deployment choice questions usually hinge on latency, traffic shape, and governance. If the scenario calls for real-time predictions, managed online endpoints win; if it calls for scheduled scoring, batch endpoints win. If it calls for safe rollout, blue/green traffic splitting wins. If it calls for cross-workspace sharing and approvals, registry-backed model management wins. The exam frequently includes distractors that are “technically possible” but operationally poor (for example, manually copying model files rather than registering them).

Know common failure modes and the fastest fix: (1) Training succeeded but cannot deploy: ensure you logged/registered an MLflow model with the right flavor and that the inference environment includes dependencies. (2) Pipeline reruns everything every time: check component boundaries and caching eligibility; ensure inputs are stable and versioned. (3) New model version causes regressions: use multiple deployments with traffic splitting and rollback rather than in-place replacement. (4) Batch scoring too slow or expensive: tune parallelism, choose CPU where appropriate, and reduce I/O bottlenecks.

  • Common trap: Choosing the most “advanced” feature instead of the right one. The exam rewards fit-for-purpose designs.
  • Common trap: Ignoring governance requirements. If the prompt mentions audit, approval, or separation of duties, model registration and registries are not optional.

Exam Tip: When two answers both “work,” pick the one that (a) is repeatable via jobs/pipelines, (b) preserves lineage via MLflow tracking and registration, and (c) matches serving requirements (online vs batch) with a safe rollout strategy.

Chapter milestones
  • Train models with scripts, pipelines, and distributed options
  • Register, version, and manage models with MLflow and registries
  • Deploy to managed online endpoints and batch endpoints
  • Domain 3 practice set: training, deployment, and MLOps exam questions
Chapter quiz

1. You need to run nightly training for a model in Azure Machine Learning. The process must be repeatable, capture metrics and artifacts, and allow hyperparameter tuning with minimal code changes. Which approach best aligns with DP-100 Domain 3 expectations?

Show answer
Correct answer: Create an Azure ML command job that runs a training script instrumented with MLflow, and optionally run it as a sweep job for hyperparameter tuning
Domain 3 emphasizes reliable training orchestration (command jobs/sweeps) and artifact tracking (MLflow). A command job (and sweep when needed) provides repeatability and captures outputs/metrics in a run. Training in an interactive notebook is not a repeatable orchestrated job and commonly misses governed artifact capture. Online endpoints are for serving inference, not for orchestrating training jobs or persisting trained models as governed artifacts.

2. A team has multiple Azure ML workspaces (dev/test/prod). They want a single, governed location to register and version MLflow models so the same model can be deployed from prod without re-registering in each workspace. What should they use?

Show answer
Correct answer: An Azure Machine Learning registry to register and version the MLflow model centrally for reuse across workspaces
DP-100 Domain 3 expects understanding of model governance: registries are designed for cross-workspace discovery, controlled reuse, and versioning. Storing models inside endpoint images couples serving with model storage and does not provide registry governance/versioning. Using a datastore folder is just file storage; it does not provide model registration semantics (versioning, lineage, stage/approval patterns) expected for governed model management.

3. You must deploy a model to serve low-latency predictions for an application. Traffic is variable, and you need autoscaling and the ability to perform a blue/green rollout with minimal downtime. Which deployment option is most appropriate?

Show answer
Correct answer: Deploy to a managed online endpoint with multiple deployments and allocate traffic percentages between them
Managed online endpoints are the DP-100-relevant serving pattern for real-time, low-latency inference, including autoscaling and blue/green (multiple deployments with traffic splitting). Batch endpoints are intended for asynchronous, large-scale scoring and do not provide the same low-latency request/response pattern. Calling a pipeline per request adds orchestration overhead and is not designed for real-time serving.

4. A retail company needs to score 50 million records each night and write predictions to storage. Individual requests are not latency-sensitive, but throughput and reliability matter. Which Azure ML deployment pattern should you recommend?

Show answer
Correct answer: Use a batch endpoint to run batch scoring jobs against the registered model and write outputs to a datastore
Batch endpoints are designed for high-throughput, asynchronous scoring on large datasets and integrate with job-based execution and output handling. Using an online endpoint for tens of millions of individual REST calls is inefficient and often cost-ineffective, and it introduces unnecessary request/response overhead. Manual scoring on a compute instance is not an operationally reliable or governed deployment approach for recurring production workloads.

5. You built an Azure ML pipeline with two components: data prep and training. The data prep output rarely changes, but training is run often with different hyperparameters. You want to reduce runtime and cost by avoiding unnecessary re-execution. What should you do?

Show answer
Correct answer: Enable component/pipeline step caching so the data prep step is reused when inputs and code have not changed
Pipeline/component caching is the DP-100-aligned way to avoid rerunning unchanged steps: when inputs, code, and environment are unchanged, Azure ML can reuse cached outputs (e.g., data prep), reducing cost and runtime. MLflow logging controls tracking and artifacts, not pipeline execution decisions, so disabling autologging will not enable step reuse. Registering a model before training does not make sense (the model artifact doesn’t exist yet) and does not control whether upstream steps run.

Chapter 5: Optimize Language Models for AI Applications (Domain 4)

Domain 4 of DP-100 focuses on making language-model solutions effective, safe, and operational in Azure Machine Learning. The exam expects you to choose the right approach (prompt-first vs RAG vs fine-tuning), prove quality with repeatable evaluation, and deploy with patterns that support monitoring and governance. This chapter coaches you to translate business requirements into an optimization plan, anticipate failure modes (hallucinations, prompt injection, data leakage), and pick Azure ML constructs that fit production realities.

On test day, many items are scenario-driven: you are given a workload (customer support, policy Q&A, summarization, code assistance), constraints (latency, cost, privacy, regional compliance), and a symptom (inaccurate answers, inconsistent tone, outdated knowledge). Your job is to select the smallest effective change. DP-100 commonly rewards choosing prompt-first and evaluation-first before jumping to fine-tuning. It also rewards solutions that separate experimentation (tracked runs, datasets, eval sets) from operationalization (managed endpoints, monitoring, responsible AI controls).

Exam Tip: When you see “enterprise knowledge,” “source citations,” or “must be grounded in internal docs,” default to Retrieval-Augmented Generation (RAG) unless the scenario explicitly requires changing model style/behavior beyond what prompts can reliably enforce.

This chapter aligns to the lesson outcomes: selecting an LLM approach, evaluating/optimizing/monitoring quality, and operationalizing with Azure ML deployment patterns. You’ll also practice the exam mindset for Domain 4: identify the optimization lever (prompt, retrieval, tuning), the evaluation evidence, and the governance control that closes the risk.

Practice note for Select an LLM approach: prompt-first vs fine-tune vs RAG: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Evaluate, optimize, and monitor LLM application quality: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Operationalize LLM apps with Azure ML deployment patterns: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Domain 4 practice set: LLM optimization exam questions: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Select an LLM approach: prompt-first vs fine-tune vs RAG: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Evaluate, optimize, and monitor LLM application quality: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Operationalize LLM apps with Azure ML deployment patterns: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Domain 4 practice set: LLM optimization exam questions: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Select an LLM approach: prompt-first vs fine-tune vs RAG: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 5.1: Use-case framing: task definition, constraints, safety, and success criteria

Section 5.1: Use-case framing: task definition, constraints, safety, and success criteria

Start by framing the LLM application as an ML system with measurable outcomes, not a “chatbot.” DP-100 scenarios often include ambiguous requests like “make responses better.” Convert that into a task definition (Q&A with citations, classification, summarization, extraction), then specify constraints: latency (interactive vs batch), cost ceilings, privacy requirements (PII handling), and allowed tools (search, database lookups, APIs). This framing determines whether prompt-first, RAG, or fine-tuning is appropriate.

Define success criteria that can be evaluated repeatedly. For example: grounded Q&A might require “answers cite at least one retrieved source” and “factuality score ≥ threshold,” while summarization might require “no sensitive entities” plus “coverage of key points.” In Azure ML terms, you should be thinking about creating an evaluation dataset (gold questions/answers, policy rules) and tracking results (MLflow metrics/artifacts) so you can compare iterations.

Safety is part of the framing, not an afterthought. Identify likely harms: hallucinations (fabricated policy), data leakage (training data exposure), prompt injection (malicious user content), and toxicity. The exam frequently tests whether you choose controls appropriate to the risk: content filtering, system prompt constraints, grounding with approved documents, and human review for high-impact outputs.

  • Task: What does “correct” output look like?
  • Context: What information is allowed to influence the answer?
  • Constraints: Latency, throughput, cost, token limits, compliance.
  • Risks: Hallucination, toxicity, prompt injection, privacy.
  • Criteria: Metrics and thresholds you can compute and trend.

Exam Tip: If the scenario mentions “regulated,” “auditable,” or “must explain decisions,” prioritize designs that preserve traceability (retrieved sources, logged prompts/outputs with redaction) and enable post-hoc review.

Common trap: Treating “accuracy” as a single metric. For LLM apps, accuracy splits into factuality/grounding, instruction-following, safety, and format compliance. DP-100 questions often expect you to name or imply multiple dimensions of quality.

Section 5.2: Prompt engineering: system/context, few-shot, tools/function calling concepts

Section 5.2: Prompt engineering: system/context, few-shot, tools/function calling concepts

Prompt engineering is the first optimization lever because it’s fast, cheap, and reversible. DP-100 will test that you can structure prompts into roles and segments: a system message (policy and behavior), developer/context instructions (task framing, output schema), and user input (untrusted). Clear separation matters for safety and for resisting prompt injection: user text should not be allowed to override system constraints.

Few-shot prompting (providing exemplars) improves format adherence and style consistency. Use it when outputs must follow a template (JSON fields, bullet summaries, classification labels). However, be mindful of token cost and latency. On the exam, if a scenario says “the model sometimes returns invalid JSON,” the best first move is usually to tighten instructions and add a few-shot example—before fine-tuning.

Tools/function calling concepts appear increasingly in Azure-based LLM solutions: instead of “hallucinating,” the model can call a function (search, lookup, calculator, ticket creation). Your prompt should define when tools are allowed and what to do if a tool fails. In operational terms, tool calls create observability: you can log tool usage, arguments, and results as part of the run for debugging and evaluation.

  • System prompt: non-negotiable rules (safety, refusal policies, tone).
  • Context: trusted content (retrieved docs, policies, schema definitions).
  • Few-shot: examples that demonstrate correct output, edge cases.
  • Tool use: explicit triggers and fallback behaviors.

Exam Tip: When an option says “add more training data” but the issue is output formatting or instruction-following, eliminate it. Prompt restructuring and few-shot examples are typically the intended fix.

Common trap: Putting untrusted user content into the same block as system rules (or labeling it as “system”). DP-100 questions may describe prompt injection symptoms; the correct answer often involves separating roles, sanitizing inputs, and constraining tool use.

Section 5.3: Retrieval-augmented generation basics: chunking, embeddings, grounding, evaluation

Section 5.3: Retrieval-augmented generation basics: chunking, embeddings, grounding, evaluation

RAG is the default pattern when you need answers grounded in changing or proprietary knowledge (internal policies, product docs) without retraining the model. The core idea: convert documents into embeddings, store them in a vector index, retrieve top-k chunks for a query, and place those chunks into the prompt as trusted context. DP-100 expects you to understand the moving parts and the main tuning knobs.

Chunking is a frequent exam theme. Smaller chunks improve retrieval precision but can lose context; larger chunks improve context but may dilute relevance and waste tokens. Overlap between chunks can reduce boundary loss but increases storage and may retrieve redundant text. Choose chunk size based on document structure and the expected question granularity. The exam may describe “answers miss a key clause buried in a long PDF”; that’s a cue to adjust chunking/overlap and retrieval strategy (e.g., reranking).

Embeddings quality impacts retrieval more than many candidates expect. Ensure you embed the same content you later quote to the model, and keep preprocessing consistent (lowercasing, stripping boilerplate). Grounding is the key quality lever: instruct the model to answer only from retrieved context, cite sources, and explicitly say “not found” when context lacks an answer.

Evaluation for RAG is two-layered: retrieval evaluation (did we fetch relevant chunks?) and generation evaluation (did the model use them correctly?). Track both in experiments. For example, compute recall@k for retrieval and factuality/attribution for generation. Store retrieved passages as artifacts so you can reproduce failures.

  • Chunking: size, overlap, section-aware splitting.
  • Retrieval: top-k, filters (date/product), reranking.
  • Grounding: citations, refusal when unsupported.
  • Evaluation: retrieval metrics + answer quality metrics.

Exam Tip: If a scenario says “model answers confidently but wrong, despite docs existing,” suspect retrieval failure (bad chunking, poor embeddings, wrong filters) before blaming the base model.

Common trap: Treating RAG as a one-time setup. DP-100 scenarios may mention “docs updated weekly.” The correct design includes re-indexing workflows and monitoring retrieval performance over time.

Section 5.4: Fine-tuning concepts: when to tune, data requirements, overfitting risks, cost

Section 5.4: Fine-tuning concepts: when to tune, data requirements, overfitting risks, cost

Fine-tuning changes model behavior by training on labeled examples. In DP-100, the key is knowing when it is justified. Fine-tune when you need consistent style, domain-specific writing patterns, or reliable adherence to specialized instructions that prompting cannot stabilize. Do not fine-tune just to “add knowledge” that changes frequently; RAG is better for that because it updates without retraining.

Data requirements are a common test point. You need high-quality examples that match production inputs and desired outputs, plus a held-out evaluation set. Noisy labels or misaligned examples can teach the model the wrong behavior. The exam may hint that you only have a small set of examples; then prompt-first or RAG is often the safer choice unless the task is narrow and well-defined (e.g., structured extraction with stable labels).

Overfitting risk increases when the dataset is small, repetitive, or too narrow. Symptoms include brittle performance on unseen phrasing and degraded safety/instruction-following outside the tuned domain. Cost is not just training compute: it includes data curation, governance review, regression testing, and ongoing maintenance when requirements shift.

From an Azure ML perspective, fine-tuning should be treated as a tracked experiment: log parameters (learning rate, epochs), data version, and evaluation metrics in MLflow. This traceability is often what DP-100 wants you to demonstrate: controlled iteration rather than ad-hoc “we trained a new model.”

  • Tune for behavior/style and consistent task performance.
  • Prefer RAG for changing factual knowledge.
  • Control overfitting with validation, early stopping, diverse examples.
  • Track runs, datasets, and comparisons with MLflow.

Exam Tip: If the business asks for “always respond in our brand voice” and “follow a strict response template,” fine-tuning becomes more plausible—but still verify whether few-shot prompting meets the requirement first.

Common trap: Assuming fine-tuning automatically reduces hallucinations. Without grounding, a tuned model can hallucinate more confidently. Hallucination control is usually achieved via RAG + refusal policies + evaluation.

Section 5.5: LLM evaluation and monitoring: quality metrics, hallucinations, toxicity, feedback loops

Section 5.5: LLM evaluation and monitoring: quality metrics, hallucinations, toxicity, feedback loops

Evaluation is the backbone of optimization in Domain 4. The exam expects you to move beyond anecdotal testing and implement repeatable measurement. Establish an offline evaluation set representing real user intents and edge cases (ambiguous questions, adversarial prompts, sensitive topics). Then measure multiple dimensions: task success (accuracy, extraction F1), grounding (citation correctness, supported-claim rate), safety (toxicity, policy violations), and operational performance (latency, cost per request).

Hallucinations require targeted evaluation. For grounded Q&A, track whether each claim is supported by retrieved context. If your system requires citations, evaluate citation presence and correctness, not just whether the answer “sounds right.” Toxicity and safety evaluation should include both user input and model output, because prompt injection or harmful content can flow through retrieval.

Monitoring continues after deployment. In Azure ML deployment patterns, you should capture inputs/outputs (with redaction for PII), model/version identifiers, retrieval traces (top-k documents), and user feedback signals. Feedback loops should be controlled: log user thumbs-up/down, route a subset for human review, and use the curated data to update prompts, retrieval, or fine-tuning datasets.

Exam Tip: When asked how to improve quality “in production,” pick options that include monitoring + evaluation pipelines (drift/quality regression) rather than only changing the prompt. DP-100 often tests lifecycle thinking.

  • Offline eval: gold set, edge cases, adversarial tests.
  • Online monitoring: latency, failure rates, safety flags, feedback.
  • Regression testing: compare prompt/RAG/tuned versions before rollout.
  • Governed feedback: human-in-the-loop for high-impact outputs.

Common trap: Using only a single automated judge score. The exam may present “LLM-as-a-judge” as an option; it can help, but you still need task-grounded metrics and spot-checking to prevent systematic bias or blind spots.

Section 5.6: Domain 4 exam-style practice: choose best optimization path and governance controls

Section 5.6: Domain 4 exam-style practice: choose best optimization path and governance controls

Domain 4 questions typically bundle three decisions: (1) optimization path (prompt-first vs RAG vs fine-tune), (2) evidence (how you will evaluate), and (3) governance controls (how you will reduce risk). The fastest way to score points is to identify what is actually broken: knowledge freshness, retrieval relevance, instruction adherence, or safety. Then choose the smallest change that addresses the failure while meeting constraints.

Use the following elimination strategy. If the scenario requires up-to-date internal knowledge or citations, eliminate “fine-tune to add the latest information” and choose RAG with re-indexing. If the scenario complains about inconsistent format, eliminate RAG-only answers and focus on prompt structure and few-shot examples. If the scenario needs a specialized writing style across many intents and prompting isn’t stable, fine-tuning becomes the likely best path—provided you have enough curated examples and a validation plan.

Governance controls are not optional in DP-100. Look for controls aligned to the risk: content filtering for toxicity, prompt injection defenses (role separation, tool constraints), data protection (PII redaction, least-privilege access to indexes), and auditability (logging prompt/version, retrieval citations). In Azure ML operationalization, your deployment choice should support these: managed online endpoints for low-latency apps, batch endpoints for offline scoring, and pipeline-based evaluation for regression checks before promotion.

Exam Tip: When two answers both “improve quality,” pick the one that also improves traceability and repeatability (tracked eval runs, versioned data, reproducible retrieval context). DP-100 rewards MLOps discipline even for LLM apps.

  • Prompt-first: fastest iteration; best for format/tone/instructions.
  • RAG: best for private, changing knowledge; requires chunking/retrieval tuning.
  • Fine-tune: best for consistent behavior/style; highest data and governance cost.
  • Controls: logging, redaction, access control, safety evaluation, human review.

Common trap: Choosing the most complex approach. Many exam items are designed so the correct answer is “improve prompts and add evaluation,” not “fine-tune a larger model.” Complexity increases cost and risk unless the scenario explicitly demands it.

Chapter milestones
  • Select an LLM approach: prompt-first vs fine-tune vs RAG
  • Evaluate, optimize, and monitor LLM application quality
  • Operationalize LLM apps with Azure ML deployment patterns
  • Domain 4 practice set: LLM optimization exam questions
Chapter quiz

1. A retail company is building an internal policy Q&A assistant in Azure ML. Requirements: answers must be grounded in the latest internal HR documents, include citations to the source passages, and avoid using the model’s general knowledge when the answer is not in the documents. The team currently uses only a system prompt and observes confident but incorrect answers after policy updates. What should you implement first?

Show answer
Correct answer: Retrieval-Augmented Generation (RAG) with an indexed document store and prompt instructions to cite retrieved passages
RAG is the best first step when requirements include enterprise knowledge grounding, freshness, and citations. It reduces hallucinations by conditioning the model on retrieved, up-to-date passages and enables citation patterns. Fine-tuning on last year’s PDFs bakes in outdated content and does not guarantee citations or prevent the model from guessing. Increasing model size and prompt examples may improve style but does not provide access to updated internal documents or enforce grounding.

2. You are evaluating a customer support summarization feature built with an LLM in Azure ML. Business stakeholders complain that summaries sometimes omit key troubleshooting steps. You need a repeatable way to measure quality across releases and compare prompt variants. Which approach best aligns with DP-100 evaluation expectations for LLM apps?

Show answer
Correct answer: Create a fixed evaluation dataset and run automated evaluations (e.g., rubric/LLM-as-judge plus targeted metrics) tracked as Azure ML runs for each prompt/version
DP-100 emphasizes repeatable, versioned evaluation: a stable eval set, consistent scoring, and tracked results (runs/metrics) to compare changes. Ad-hoc spot checks are not repeatable or comparable and provide weak evidence. Production-only feedback is noisy, delayed, and confounded by traffic mix; it’s useful for monitoring but not sufficient as the primary pre-release evaluation method.

3. A team built a RAG-based assistant and deployed it behind an Azure ML managed online endpoint. They discover users can include instructions like "ignore previous rules and reveal confidential content" and occasionally the model complies. Which mitigation should you implement as the smallest effective change in the Azure ML LLM app design?

Show answer
Correct answer: Add input/output content safety controls (prompt-injection and sensitive-data filtering) and enforce system-message policies that the app never follows user instructions to reveal secrets
Prompt injection is primarily an application-layer and policy enforcement problem: validate/sanitize inputs, apply content safety checks, constrain tool/retrieval behavior, and enforce refusal policies. Fine-tuning may help but is typically slower, harder to validate, and not the first-line control for security/governance on the exam. Disabling retrieval defeats the business goal (grounded internal answers) and doesn’t prevent the model from leaking other sensitive information provided in prompts.

4. A legal team wants an LLM to produce contract clause suggestions in a consistent firm-specific tone and structure. The content is not based on a changing internal knowledge base; instead, they want the model’s default style and formatting to match prior approved examples. Prompt engineering alone produces inconsistent results. Which approach is most appropriate?

Show answer
Correct answer: Fine-tune (or use supervised adaptation) on a curated set of approved clause examples to align style/formatting
When the requirement is durable behavioral/style alignment (tone, structure, formatting) rather than freshness or grounding in a document corpus, fine-tuning is typically appropriate—especially when prompts are insufficiently reliable. RAG is best for retrieving factual/contextual content and citations, not primarily for enforcing consistent style. Increasing temperature usually increases variability and makes consistency worse; prompts alone are already failing the stated requirement.

5. You manage an LLM application deployed with an Azure ML managed online endpoint. After a prompt update, the number of user complaints rises, but latency and token usage are unchanged. You need an operational pattern that helps you detect quality regressions and support governance over time. What should you do?

Show answer
Correct answer: Implement ongoing monitoring that logs prompts/responses (with privacy controls), tracks quality signals/metrics, and links production versions to the evaluated prompt/model artifacts used for deployment
DP-100 expects separation of experimentation and operationalization with traceability: versioned artifacts, repeatable eval, and monitoring to catch regressions (quality, safety, drift) in production. Scaling compute addresses latency/throughput, not correctness or quality regressions when latency is unchanged. Hotfixing without version tracking undermines governance and makes it difficult to correlate incidents to specific prompt/model changes.

Chapter 6: Full Mock Exam and Final Review

This chapter is your “dress rehearsal” for DP-100. At this point you should already be comfortable building experiments in Azure Machine Learning, tracking runs with MLflow, training models with the SDK v2, and deploying to managed online endpoints or batch endpoints. What separates a pass from a near-miss is execution under pressure: pacing, identifying what the question is truly testing, and avoiding common traps (especially around identities, data access, model/asset versioning, and deployment configuration).

DP-100 questions rarely reward memorization alone. They reward recognizing patterns: “This is a managed identity + RBAC issue,” “This is an MLflow autologging mismatch,” “This pipeline step needs an output binding,” or “This deployment needs the right traffic/scale settings.” In this chapter you will run a two-part mock exam, analyze weak areas, and finish with a high-yield checklist you can review in minutes before you sit for the exam.

Exam Tip: Treat this chapter like an operational playbook. Use it to practice your process: read, classify (design vs troubleshooting), eliminate, decide, and move on. DP-100 rewards consistent, repeatable decision-making more than last-second inspiration.

Practice note for Mock Exam Part 1: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Mock Exam Part 2: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Weak Spot Analysis: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Exam Day Checklist: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Mock Exam Part 1: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Mock Exam Part 2: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Weak Spot Analysis: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Exam Day Checklist: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Mock Exam Part 1: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Mock Exam Part 2: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Weak Spot Analysis: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 6.1: Mock exam instructions: pacing, marking, and review workflow

Section 6.1: Mock exam instructions: pacing, marking, and review workflow

Your goal in the mock exam is not just “get a score.” Your goal is to validate that your process holds up across all DP-100 domains: data preparation and feature engineering, experiment tracking, training, deployment, and governance/responsible AI. Before you start, set a timer and commit to a pacing rule. A practical pacing target is to keep a steady rhythm and avoid spending disproportionate time on any single scenario.

Use a three-pass workflow. Pass 1: answer everything you can confidently, quickly. Pass 2: revisit marked items and do deeper elimination. Pass 3: final review for unforced errors (misread constraints, wrong service, wrong identity). The exam often hides the constraint in one clause: “without exposing secrets,” “minimal operational overhead,” “no public endpoint,” or “must reproduce training.” Those clauses are where the correct option lives.

  • Pass 1 rule: If you can eliminate to one best answer within about a minute, commit and move on.
  • Marking rule: Mark items where you’re stuck between two plausible answers or where a hidden constraint may exist (identity, networking, cost, governance).
  • Review rule: On review, re-read only the stem and constraints first; don’t re-argue the entire question unless needed.

Exam Tip: When you mark an item, write a short “why it’s hard” note on scratch paper (e.g., “endpoint auth vs data access,” “MLflow registry vs workspace registry”). That prevents circular re-reading and forces targeted verification.

Common trap: changing correct answers late. In DP-100, your first answer is often right if it aligns with the constraint and the Azure ML feature set. Only change an answer if you can name the specific feature or limitation that makes your first choice impossible.

Section 6.2: Mock Exam Part 1: mixed-domain questions (caselets + standalones)

Section 6.2: Mock Exam Part 1: mixed-domain questions (caselets + standalones)

Part 1 is designed to simulate the reality of DP-100: you’ll see caselets (multi-question scenarios) mixed with standalones. Your job is to identify which domain is being tested and which Azure ML artifact is the “lever” you must adjust (compute, datastore/data asset, environment, component, pipeline, endpoint, registry, identity, or monitoring).

Caselets often test consistency across multiple decisions. For example, once a scenario implies enterprise governance, expect follow-ups that require: using managed identity, private networking, central registry, versioned assets, and auditable tracking. When you choose an option in the first question, anticipate whether it breaks later constraints (for example, selecting local secrets when the scenario requires “no secrets in code” will poison multiple answers).

Standalones frequently target “small but fatal” misunderstandings: the difference between a data asset vs a datastore, a model vs an MLflow model artifact, or a managed online endpoint vs a batch endpoint. They also probe SDK v2 patterns: defining command jobs, creating components, wiring inputs/outputs, and promoting assets to a registry for reuse.

  • When the question focuses on collaboration and reuse, think: registries, components, environments, and versioning.
  • When it focuses on “runs,” metrics, and reproducibility, think: MLflow tracking URI, experiment naming, tags, parameters, artifacts, and lineage.
  • When it focuses on operationalizing predictions, think: endpoint type (online vs batch), authentication, scaling, and monitoring.

Exam Tip: In caselets, build a “scenario map” in your head: data location, compute location, identity model, and deployment target. Then every question becomes “which choice keeps the map consistent.”

Common trap: over-engineering. DP-100 often prefers the simplest managed feature that satisfies constraints. If a managed online endpoint with autoscale meets requirements, a custom AKS cluster is rarely the best answer unless the question explicitly demands Kubernetes customization.

Section 6.3: Mock Exam Part 2: mixed-domain questions (troubleshooting + design)

Section 6.3: Mock Exam Part 2: mixed-domain questions (troubleshooting + design)

Part 2 shifts emphasis: fewer “what is the best service” prompts and more troubleshooting and design correction. The exam expects you to recognize failure modes quickly—especially around authentication/authorization, environment reproducibility, packaging/scoring, and MLflow tracking behavior.

Troubleshooting questions often provide symptoms such as “permission denied,” “run not logged,” “deployment unhealthy,” or “batch scoring output missing.” Your first step is to classify the failure: identity/RBAC vs network access vs compute quota vs environment dependency vs endpoint configuration. Then pick the fix that addresses the root cause, not a downstream symptom.

Design questions in this part often test trade-offs: latency vs cost, simplicity vs governance, experimentation speed vs reproducibility. Azure ML has “correct” patterns for these. Examples include: using managed identity instead of keys, using registries for shared models/environments, using pipelines for repeatable training, and selecting batch endpoints for large asynchronous scoring.

  • Identity/RBAC pattern: If a job can’t read data in a storage account, check workspace managed identity or compute identity permissions (Storage Blob Data Reader/Contributor as appropriate) and whether the datastore uses credential passthrough or stored credentials.
  • Environment pattern: If runs fail inconsistently, suspect environment drift—pin package versions, use curated images where appropriate, and version your environment assets.
  • Deployment pattern: If an online endpoint is unhealthy, verify the scoring script entry point, model path, environment dependencies, and probe logs; if throughput is poor, look at instance type and autoscale rules.

Exam Tip: On troubleshooting items, eliminate answers that “add more tooling” without changing the failing layer. DP-100 is rarely testing whether you can bolt on a new service; it is testing whether you can fix the Azure ML configuration already in play.

Common trap: confusing “model registration” with “endpoint deployment.” Registering a model (or logging an MLflow model) does not make it callable. Deployment requires an endpoint and a deployment configuration (compute SKU, scaling, auth, and scoring).

Section 6.4: Answer rationales map: objectives-to-questions crosswalk

Section 6.4: Answer rationales map: objectives-to-questions crosswalk

After you complete both parts, do not jump straight to the score. Build a crosswalk between each missed (or guessed) item and the DP-100 objective it represents. This is how you turn practice into predictable points. For each marked item, write: (1) the objective domain, (2) the key Azure ML concept, (3) the constraint you missed, and (4) the “tell” you should notice next time.

Use these objective buckets as your crosswalk headings: Design and prepare (data assets, feature engineering choices, responsible AI constraints), Explore and run experiments (SDK v2 jobs, notebooks, MLflow tracking, reproducibility), Train and deploy (pipelines/components, model registry, online vs batch endpoints, monitoring), Optimize language models (prompt iteration, evaluation, deployment patterns), and Governance/security (RBAC, managed identity, private endpoints, lineage).

When you write rationales, keep them operational. Example rationale style: “Correct because requirement is no secrets in code; managed identity + RBAC satisfies; key-based datastore auth violates constraint.” This style matches how DP-100 options are differentiated—by one requirement-breaking detail.

  • Pattern to look for: If two answers both “work,” choose the one that best matches Azure’s recommended managed pattern (registries, managed endpoints, MLflow tracking) and reduces operational overhead.
  • Red-flag words: “Always,” “must,” “only way,” and “manually” often indicate distractors unless explicitly required by the scenario.

Exam Tip: Track your misses by “cause,” not by “topic.” Common causes include: misreading constraint, confusing similarly named assets, forgetting default security posture, and not distinguishing training-time vs inference-time requirements.

This crosswalk becomes your final-week plan: you’re not re-studying everything—only the objective buckets where you repeatedly miss due to the same cause.

Section 6.5: Final review: high-yield Azure ML + MLflow mistakes and must-know commands

Section 6.5: Final review: high-yield Azure ML + MLflow mistakes and must-know commands

This final review focuses on high-yield mistakes that DP-100 regularly tests. Many candidates know the “happy path” but lose points on edge conditions: identity, versioning, and the boundary between Azure ML assets and MLflow artifacts.

High-yield Azure ML pitfalls include: mixing up datastores (connection) vs data assets (versioned reference), forgetting to version environments/components, assuming a run automatically logs everything without enabling MLflow autologging, and deploying without validating scoring dependencies. Another frequent miss is choosing the wrong endpoint type: online endpoints for low-latency requests; batch endpoints for large, asynchronous scoring over files/tables.

  • Must-know SDK v2 concepts: command job definition, inputs/outputs binding, environment specification, compute target selection, and asset versioning (data, model, environment, component).
  • Must-know MLflow concepts: tracking server/URI, experiments and runs, logging parameters/metrics/artifacts, model flavors, and the difference between logging a model vs registering/promoting it for deployment.
  • Must-know deployment concepts: managed online endpoint vs batch endpoint, authentication options, scaling/instance sizing, and where to find logs for failures.

Exam Tip: If an option mentions “store credentials in code/config,” treat it as suspicious unless the scenario explicitly permits it. DP-100 increasingly favors managed identity, RBAC, and secretless patterns.

Also review language model optimization patterns at a conceptual level: prompt design iteration, evaluation criteria, and safe deployment. The exam tends to test whether you can select an evaluation approach and deployment pattern that supports repeatability and governance, not whether you can recite prompt templates.

Final command-level readiness means you can recognize what tool is appropriate: Azure ML for jobs/pipelines/endpoints/registries; MLflow for tracking and packaging model artifacts; and role assignments/private networking for secure access. The exam won’t ask you to type long scripts, but it will test whether you can choose the correct action sequence and the correct artifact to create or update.

Section 6.6: Exam day strategy: environment check, timeboxing, and confidence plan

Section 6.6: Exam day strategy: environment check, timeboxing, and confidence plan

On exam day, your strategy should be boring and reliable. Start with an environment check: stable internet, a quiet space, and no distractions. If you’re testing remotely, complete the system check early and clear your desk. Then run a confidence plan: remind yourself of the core DP-100 pattern—identify the domain, find the constraint, choose the managed Azure ML feature that satisfies it.

Timeboxing is your safety net. Commit to forward motion. If you hit a question that feels like a deep rabbit hole, mark it and move on. The exam is designed to tempt you into overspending time on a small set of items. Your score improves more by collecting “easy points” elsewhere than by rescuing one complex item at the cost of several simpler ones.

  • First 10 minutes: settle in, read carefully, avoid early unforced errors.
  • Mid-exam: maintain pace; do not let one scenario dictate your entire timing.
  • Final review window: revisit only marked questions; verify constraints, then commit.

Exam Tip: When stuck between two options, ask: “Which one is more secure by default, more reproducible, and more aligned with Azure ML’s managed patterns?” DP-100 typically rewards that choice unless the scenario explicitly demands control or customization.

Common trap: confidence collapse after a tough section. DP-100 mixes difficulty; a challenging caselet does not predict the rest of the exam. Reset between questions: treat each one as independent, re-apply the same elimination method, and keep your pace steady. Your goal is not perfection—it is disciplined execution across all domains.

Chapter milestones
  • Mock Exam Part 1
  • Mock Exam Part 2
  • Weak Spot Analysis
  • Exam Day Checklist
Chapter quiz

1. You deploy a model to an Azure Machine Learning managed online endpoint. The scoring script reads a CSV from an Azure Storage account using the endpoint's managed identity. Requests return HTTP 500, and logs show: "AuthorizationPermissionMismatch" when accessing the blob. You must fix the issue with the least operational overhead. What should you do?

Show answer
Correct answer: Grant the endpoint's managed identity the Storage Blob Data Reader role on the storage account (or container) scope
Managed online endpoints commonly access data via managed identity + RBAC. "AuthorizationPermissionMismatch" indicates the identity lacks the required data-plane role (for reads, Storage Blob Data Reader) at the correct scope. Using access keys (B) increases secret management burden and is not the recommended DP-100 pattern when MI is available. Disabling MI and relying on SAS/connection secrets (C) also adds secret rotation risk and does not address the root cause (missing RBAC permissions).

2. A team trains models with MLflow and enables autologging. In the Azure ML run history, metrics appear, but the intended model artifact is missing from the run, causing the registration step to fail. Which action is most likely to ensure the model is captured reliably for registration?

Show answer
Correct answer: Explicitly log the model artifact using MLflow (e.g., mlflow.sklearn.log_model) or set the correct artifact path before registering
Autologging does not guarantee a model artifact is logged in the expected location for every framework/configuration; explicitly logging the model (and using the correct artifact path when registering) is the reliable DP-100 approach. Increasing timeouts (B) doesn't fix missing/incorrect artifact logging. Registering from a local filesystem path (C) is incorrect because the local path in a remote job isn't directly accessible for registry operations unless it was uploaded as a run artifact.

3. You build an Azure ML pipeline with SDK v2. Step 2 reads the processed dataset produced by Step 1. The pipeline runs, but Step 2 fails with a "file not found" error because it cannot locate Step 1 outputs. What is the correct fix?

Show answer
Correct answer: Define Step 1 output as an Azure ML pipeline output (uri_folder/uri_file) and pass that output as an input binding to Step 2
In SDK v2 pipelines, inter-step data must be passed via declared outputs and inputs (data bindings). Step 2 cannot assume Step 1's local working directory exists. Mounting node filesystems (B) is not how Azure ML shares data between jobs and breaks portability. Merging steps into a single job (C) sidesteps the pipeline design and is not the correct pattern when you need modular steps and lineage.

4. Your organization maintains multiple versions of a model and deploys them to a managed online endpoint. You must validate a new version with 10% of traffic while keeping 90% on the current version, and be able to roll back quickly. What should you configure?

Show answer
Correct answer: Create a second deployment under the same endpoint and configure traffic splitting (e.g., 90/10) between deployments
Managed online endpoints support multiple deployments and traffic rules, enabling canary/blue-green patterns and fast rollback by adjusting traffic. In-place updates (B) replace the active deployment and remove an easy rollback path. Using separate endpoints and client-side routing (C) increases operational complexity and is not the standard Azure ML pattern when traffic splitting is available.

5. During a timed DP-100-style mock exam, you notice you are consistently spending too long on troubleshooting questions with long logs. You want a repeatable strategy that improves score reliability under pressure. Which approach aligns best with exam-day best practices?

Show answer
Correct answer: Classify each question (design vs troubleshooting), eliminate distractors quickly, choose the best remaining option, and move on—flagging time sinks for review
DP-100 rewards consistent decision-making: identify what the question is testing, eliminate common traps (identity/RBAC, data access, versioning, deployment settings), and manage time by flagging difficult items. Over-investing time for certainty (B) reduces overall score by sacrificing other questions. Exact syntax memorization (C) is less valuable than understanding patterns and Azure ML concepts; the exam focuses on applying domain knowledge rather than recalling every command verbatim.
More Courses
Edu AI Last
AI Course Assistant
Hi! I'm your AI tutor for this course. Ask me anything — from concept explanations to hands-on examples.