AI Certification Exam Prep — Beginner
Timed DP-100 mock exams + clear explanations to build real exam confidence.
This course is built for beginners preparing for the Microsoft DP-100 exam (Azure Data Scientist Associate). Instead of passive reading, you’ll train the way the exam demands: timed, scenario-based practice tests with clear explanations that map back to official objectives. You’ll learn how Microsoft-style questions are written, how to eliminate distractors, and how to manage time across mixed question types.
If you’re new to certification exams, Chapter 1 removes the guesswork: how to register, what the scoring means, what to expect on exam day, and a practical study strategy that uses repetition and review—not cramming. You can also start your learning journey immediately on Edu AI: Register free.
The course is structured as a six-chapter book that mirrors the DP-100 skill areas.
Chapters 2–5 each focus on one domain (or a tightly related set of objectives) and finish with a timed practice set. Every practice question includes a full explanation: why the correct option is right, why the other options are wrong, and what exam objective it’s testing. This helps you turn mistakes into a targeted plan for improvement rather than repeated guessing.
DP-100 questions often look straightforward but test deeper understanding: choosing the right Azure Machine Learning capability for a scenario, selecting the correct experiment or deployment pattern, or applying responsible AI and safety practices. This course emphasizes the decision points that appear in the exam—tradeoffs, constraints, and operational considerations—so you build “exam judgment,” not just vocabulary.
After the exam orientation in Chapter 1, you’ll move through domain-focused chapters with deep coverage and targeted practice. Chapter 6 culminates in a full mock exam split into two parts, followed by a structured review workflow and an exam-day checklist to help you perform under pressure.
When you’re ready to explore more certification prep on Edu AI, you can browse all courses and build a full learning path across Azure and AI.
This course is for anyone aiming to pass DP-100 (career switchers, students, analysts moving into ML, and Azure beginners) who wants realistic practice tests and explanations that directly support exam performance.
Microsoft Certified Trainer (MCT) | Azure AI & Data Science
Nadia El-Amin is a Microsoft Certified Trainer who helps learners prepare for Microsoft Azure AI and data science certifications. She specializes in DP-100 exam readiness through hands-on Azure Machine Learning workflows, exam-style practice, and clear explanations that build confidence.
This course is built to make you faster and more accurate on DP-100 by combining timed mock exams with explanations that teach the “why” behind each correct choice. DP-100 is not a general machine learning test; it is a role-based exam that checks whether you can operate effectively inside Azure Machine Learning (Azure ML) and deliver an end-to-end solution: translate a business request into an ML approach, prepare data assets, run and track experiments, train models at scale, deploy and monitor endpoints, and maintain governance and reproducibility. The exam rewards candidates who can connect Azure ML concepts (workspaces, data assets, compute, jobs, registries, endpoints, monitoring) to practical scenarios under time pressure.
Your goal for Chapter 1 is to build a working mental model of the exam: what it measures, how questions are presented, how scoring feels in practice, and how to use the practice tests to eliminate weak spots. Throughout, you’ll see common traps—answers that look plausible because they use correct words, but don’t satisfy the scenario constraints. The best DP-100 strategy is to learn the objective-level requirements, then rehearse decision-making with timed practice so your “first pass” accuracy rises and you stop bleeding minutes on avoidable ambiguity.
Exam Tip: DP-100 questions often hide the real requirement in one phrase (for example, “reproducible,” “lowest operational overhead,” “near real-time,” “governance,” or “must work across workspaces”). Train yourself to underline (mentally) the constraint and choose the option that satisfies it with the fewest assumptions.
Practice note for Understand DP-100 format, question types, and domain weighting: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Registration, scheduling, and exam rules (remote vs test center): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Scoring, passing criteria mindset, and time management plan: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for How to use this course: mock exams, explanations, and review loop: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
DP-100 validates the day-to-day competencies of an Azure Data Scientist: you are expected to use Azure Machine Learning as the primary platform to build, train, evaluate, deploy, and govern ML solutions. The exam is less interested in deriving equations and more interested in whether you can choose the right Azure ML capability for a requirement and implement it in a maintainable way. Think of it as “production-aware applied ML on Azure,” not “model theory.”
On the test, you’ll routinely see scenarios starting from business goals (“reduce churn,” “detect fraud,” “summarize support tickets”) and you must translate them into ML tasks and operational design choices. That includes selecting an approach (classification/regression/forecasting/NLP), choosing where work runs (compute instances vs clusters vs serverless), deciding how to track experiments, registering models, and deploying endpoints with monitoring. Governance shows up as repeatability and auditability: versioned data and models, consistent environments, and clear lineage between code, data, and results.
Common trap: confusing what you can do in “pure Python” with what the exam wants you to do in Azure ML. Many options will describe a valid ML step but ignore platform best practices—e.g., running training locally without tracking, or saving artifacts in an ad hoc storage account. DP-100 typically rewards solutions that use Azure ML assets (data assets, environments, model registry) so the lifecycle is reproducible and manageable.
Exam Tip: When a scenario mentions collaboration, reusability, or audit requirements, bias toward first-class Azure ML resources (assets, registries, managed endpoints) rather than one-off scripts and manual storage paths. The “correct” answer often aligns to enterprise operations, not just getting a notebook to run.
DP-100 is organized around lifecycle domains that mirror real delivery. First, Design and prepare a machine learning solution: interpret the business objective, define success metrics, and design the Azure ML architecture (workspace, compute, networking/security considerations, and asset strategy). This is where the exam tests whether you can select the right Azure ML components and plan for constraints like data access, cost, and governance.
Second, Explore data and run experiments: using notebooks and jobs, creating and using Azure ML data assets, and tracking runs so results are comparable. Expect questions about experiment tracking, how to capture parameters/metrics/artifacts, and how to validate hypotheses efficiently. A classic exam mistake is treating experimentation like a one-off activity; DP-100 expects repeatable experimentation with clear lineage.
Third, Train and deploy models: choosing a training approach (script, AutoML, pipelines/jobs), registering models, and deploying real-time or batch endpoints with monitoring. Watch for operational clues: “low latency” suggests online endpoints; “daily scoring” suggests batch; “model updates frequently” suggests a pipeline and registry discipline.
Fourth, Optimize language models for AI applications: prompt engineering, evaluation, and safety techniques for LLM-based solutions. This domain often tests whether you can set up evaluation loops, choose appropriate metrics, reduce prompt injection risk, and implement content safety and responsible AI practices. The trap is answering with generic “LLM best practices” that aren’t actionable; the exam favors concrete evaluation and governance steps.
Exam Tip: Map every question to a domain before choosing an answer. If the question is about repeatability or traceability, you’re probably in “Explore/Experiment” or governance aspects of “Train/Deploy,” and the correct option will mention tracking, asset versioning, or managed resources.
DP-100 uses multiple item types designed to test applied judgment. You should expect standard multiple-choice, multi-select (“choose all that apply”), drag-and-drop matching, and case studies. Case studies are especially important: they present a longer scenario with multiple questions, and the constraints in the case study remain consistent across those questions. Your efficiency depends on extracting the key constraints once and reusing them.
Multi-select items are a major time sink when candidates hunt for “the one best answer” instead of verifying each choice independently. Treat each option as a true/false statement against the scenario. Drag-and-drop items frequently test correct ordering of steps (for example, register assets before deployment, or evaluate before promotion), or mapping tools to use cases (batch vs online endpoints, compute instance vs cluster). Labs-style or “interactive” items can appear as UI-like decision prompts; the core skill is recognizing which Azure ML capability solves the requirement.
Common trap: over-reading into the question and inventing requirements not stated. Microsoft items are usually precise; if it doesn’t say “real-time,” don’t assume online endpoints. If it doesn’t say “no-code,” don’t force AutoML. Another trap is ignoring wording like “minimize management overhead,” which often points to managed endpoints and reusable assets rather than custom infrastructure.
Exam Tip: For case studies, write (mentally) a three-bullet “constraint card”: (1) goal/metric, (2) operational requirement (latency, frequency, scale), (3) governance/security constraint. Use that card to answer every question in the case study without re-parsing the full text.
Exam readiness includes logistics. Register through Microsoft’s certification portal and schedule via the authorized provider (often Pearson VUE). Decide early whether you will test remotely or at a test center. Remote exams add constraints: a clean desk, stable internet, acceptable ID, and strict room rules. Test centers reduce technical risk but require travel planning and arrival time buffers.
Policies matter because violating them can end your attempt. Remote proctoring typically disallows extra monitors, phones, paper notes, and wandering out of camera view. Even “innocent” behaviors—reading questions aloud, looking off-screen repeatedly, or using a smartwatch—can trigger warnings. If you need accommodations (extra time, assistive technology), request them well in advance; approvals can take time and may affect scheduling options.
Exam-day readiness is partly technical (system check, updated browser, camera/mic permissions) and partly cognitive. Sleep and hydration affect speed and accuracy more than last-minute cramming. Know your check-in timeline and ID requirements. If you’re testing remotely, run the system test the day before and again 30–60 minutes prior, and have a backup plan for connectivity if possible.
Exam Tip: Treat remote exam rules as part of your prep. Do a “dry run” in your test room: clear the desk, remove extra devices, and position the camera so you won’t need to adjust it mid-exam—camera adjustments can look suspicious and waste time.
If you’re new to Azure ML or role-based Microsoft exams, the fastest path is a two-phase plan: build baseline skills first, then sharpen performance with timed practice. In phase one, prioritize “exam-usable” skills: creating and using data assets, running jobs, tracking metrics, registering models, deploying endpoints, and interpreting monitoring signals. You don’t need to memorize every service limit, but you must recognize which feature matches a scenario constraint.
In phase two, start timed mock exams early—before you feel fully ready. Timing pressure exposes gaps that reading won’t reveal: slow question parsing, confusion between similar terms, and inconsistent decision rules. Use spaced repetition by revisiting your weak objectives at increasing intervals (for example, 1 day, 3 days, 7 days, 14 days). This prevents “familiarity illusion,” where you recognize a term but can’t apply it in a new scenario.
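The interval schedule above (1, 3, 7, 14 days) can be turned into a tiny scheduling helper, which makes it easy to put review dates on a calendar. A minimal sketch; the intervals simply mirror the example in the text:

```python
from datetime import date, timedelta

# Spaced-repetition intervals (in days) from the study plan above.
REVIEW_INTERVALS = (1, 3, 7, 14)

def review_dates(missed_on: date, intervals=REVIEW_INTERVALS) -> list[date]:
    """Return the dates on which a missed objective should be revisited."""
    return [missed_on + timedelta(days=d) for d in intervals]

# An objective missed on March 1 gets revisited on Mar 2, Mar 4, Mar 8, and Mar 15.
schedule = review_dates(date(2024, 3, 1))
```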
Common trap: taking too many practice tests without closing the loop. Score improvements come from diagnosis and remediation, not repetition alone. Another trap is studying only the topics you enjoy (often modeling) and skipping governance, deployment, and monitoring—areas that can heavily impact your score.
Exam Tip: Build “decision shortcuts.” For example: online endpoint for low latency; batch endpoint for scheduled scoring; registries/assets for reuse across projects; tracked runs for comparability. These shortcuts reduce cognitive load during timed sections.
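Shortcuts like these can even be captured as a small lookup table for drilling. A minimal sketch in Python; the cue phrases and pattern names are illustrative study notes, not official exam mappings:

```python
# Illustrative "decision shortcut" table (a personal study aid, not official guidance).
# Keys are scenario cues; values are the Azure ML pattern they usually point to.
DECISION_SHORTCUTS = {
    "low latency": "managed online endpoint",
    "scheduled scoring": "batch endpoint",
    "reuse across projects": "registries and versioned assets",
    "compare experiments": "tracked runs with logged metrics",
    "interactive exploration": "compute instance",
    "scalable training": "compute cluster (scale to zero when idle)",
}

def shortcut_for(cue: str) -> str:
    """Look up the usual pattern for a scenario cue, case-insensitively."""
    return DECISION_SHORTCUTS.get(cue.lower(), "no shortcut recorded; reason from constraints")
```

Drilling the table both directions (cue to pattern, pattern to cue) is what makes the shortcut automatic under time pressure.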
This course’s value is in the explanations—use them to transform wrong answers into durable exam skill. After each timed mock, review in two passes. Pass one: categorize misses as (A) knowledge gap, (B) misread constraint, (C) two-option confusion, or (D) time management. Pass two: map each miss to a DP-100 objective area (Design/Prepare; Explore/Experiment; Train/Deploy; Optimize LLMs; plus governance/reproducibility themes). This mapping prevents random studying and ensures coverage.
Your error log should be a living document. For each missed item, record: the objective, the scenario constraint you missed, the correct Azure ML feature, why your chosen option fails, and a “recognition cue” for next time (a phrase that should trigger the right pattern). Keep entries short but specific. Over time, patterns emerge—such as repeatedly mixing up batch vs online endpoints, or forgetting that reproducibility implies versioned data and environments.
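An error log like this can be kept as structured records so the patterns surface automatically instead of by rereading. A minimal sketch; the field names are illustrative, not an official template:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Miss:
    """One missed practice item, mirroring the fields suggested above."""
    objective: str        # DP-100 objective area
    constraint: str       # the scenario constraint that was missed
    correct_feature: str  # the Azure ML feature that satisfies it
    why_wrong: str        # why the chosen option fails
    cue: str              # recognition cue for next time

def weakest_objectives(log: list[Miss], top: int = 3) -> list[tuple[str, int]]:
    """Rank objective areas by miss count to target remediation."""
    return Counter(m.objective for m in log).most_common(top)
```

Sorting misses by objective is exactly the "pass two" mapping described earlier, done mechanically.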
How to identify correct answers faster: learn to eliminate options that violate constraints. If the question says “must be traceable and repeatable,” eliminate ad hoc scripts without tracking. If it says “minimize operational overhead,” eliminate self-managed infrastructure. If it mentions “evaluate and compare,” eliminate options that don’t capture metrics or lineage. This is where explanations help: they teach the rule, not just the result.
Exam Tip: Rewrite missed explanations into one-sentence rules (e.g., “Use Azure ML data assets to standardize access and versioning across jobs”). Review these rules with spaced repetition; they become your quick recall toolkit under time pressure.
1. You are preparing for DP-100 and want to focus your study time on what the exam actually measures. Which statement best describes the DP-100 exam focus?
2. You take a timed DP-100 practice test and notice you often spend several minutes debating between two plausible options. Based on common DP-100 traps, what is the MOST effective adjustment to improve first-pass accuracy?
3. A team is using this course to prepare for DP-100. They want a repeatable approach that turns missed questions into measurable improvements over time. Which method aligns best with how timed mock exams and explanations should be used?
4. During an internal study session, a colleague claims that DP-100 questions can be answered reliably by memorizing Azure ML feature definitions without considering operational constraints. Which response is MOST accurate?
5. A company schedules DP-100 for a remote proctored session. The candidate wants to maximize their score under time pressure. Which approach is MOST consistent with DP-100 scoring mindset and time management strategy described in the course orientation?
DP-100 tests whether you can translate an ambiguous business request into an implementable machine learning (ML) solution in Azure Machine Learning (Azure ML). This chapter focuses on the “design and prepare” decisions that come before model training: shaping the problem statement, defining success metrics, choosing workspace resources and compute, planning data ingestion and labeling, and setting governance and MLOps foundations.
The exam rarely asks you to “invent” architecture from scratch. More commonly, it gives a scenario (data location, security constraints, cost limits, latency targets) and asks which Azure ML feature or pattern best satisfies the constraints. Your edge comes from recognizing what the question is truly optimizing for: cost, compliance, reproducibility, throughput, time-to-first-results, or operational safety.
Exam Tip: When a question mixes business goals with technical options, pause and restate the goal as (1) ML task type, (2) success metric, and (3) non-functional constraints (security, latency, cost). Many wrong answers are “technically possible” but violate one constraint.
Practice note for Convert business requirements into ML problem statements and success metrics: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Select Azure ML workspace resources and secure access patterns: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Prepare data ingestion/labeling strategy and feature planning: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Domain practice set (timed): design & preparation questions with explanations: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
DP-100 expects you to convert business requirements into an ML problem statement you can test. Start by naming the use case in operational terms: “predict churn within 30 days,” “classify support tickets,” “forecast demand,” or “detect anomalies.” Then map it to an ML task (classification, regression, time series forecasting, clustering, anomaly detection) and define the unit of prediction (customer-day, transaction, device-hour). This removes ambiguity and drives your data and labeling plan.
Next, define success metrics that match the business cost of errors. For classification, don’t default to accuracy: imbalance and asymmetric costs often demand precision/recall, F1, ROC-AUC, PR-AUC, or a threshold-based metric like recall at a fixed false-positive rate. For regression, choose RMSE/MAE and align with tolerance ranges. For ranking/recommendations, think NDCG or MAP. The exam will reward you for choosing metrics aligned to the scenario (for example, “minimize missed fraud” implies prioritizing recall, not accuracy).
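The imbalance point can be made concrete with a small worked example. A minimal sketch computing the metrics from raw confusion counts; the fraud numbers are invented for illustration:

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Imbalanced fraud scenario: 990 legitimate, 10 fraudulent; the model catches only 2.
m = classification_metrics(tp=2, fp=0, fn=8, tn=990)
# accuracy is 0.992 and looks excellent, but recall is 0.2: 80% of fraud is missed.
```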
Constraints are where most exam traps live. Capture data constraints (availability, missingness, drift, label delay), compute constraints (GPU availability, quota, training window), and serving constraints (latency, throughput, offline batch vs real-time). Compliance constraints often imply restricted networking, least-privilege access, and auditable lineage. Responsible AI constraints may require explainability, fairness checks, and safety mitigations—especially when the model influences people’s outcomes (credit, hiring, healthcare) or handles sensitive data.
Exam Tip: If a question mentions “regulatory,” “PII,” “auditable,” or “human impact,” assume you must address governance: lineage/versioning, controlled access, and documented evaluation (including bias/fairness where relevant). Ignoring these is a common wrong-answer pattern.
Finally, translate requirements into measurable KPIs and acceptance criteria for experimentation: baseline model, target lift, and monitoring signals. DP-100 scenarios may hint at business KPIs (reduced call time, fewer returns) that you must convert into model metrics plus operational metrics (endpoint latency, cost per 1,000 predictions, data freshness). This becomes the scoreboard for experiments and for deciding whether to deploy.
An Azure ML workspace is the control plane for experiments, assets, jobs, and deployments. DP-100 frequently tests whether you can pick the right compute target for the job: Compute instance for interactive development (notebooks, debugging); Compute cluster for scalable training (autoscaling, job scheduling); Serverless/managed compute where supported for simplified execution; and inference compute (managed online endpoints, batch endpoints, or Kubernetes) based on latency and throughput needs.
Cost-aware design is a recurring theme. Clusters can scale to zero when idle; compute instances cannot autoscale and can run up costs if left on. GPU nodes are expensive and quota-limited; use them only when justified (deep learning, large embeddings, LLM fine-tuning) and consider smaller SKUs for feature engineering or classical ML. Batch scoring can be dramatically cheaper than always-on real-time endpoints if low latency is not required.
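The batch-versus-real-time cost gap is easy to see with back-of-the-envelope arithmetic. A sketch with a made-up hourly rate (a placeholder, not an actual Azure price):

```python
# Hypothetical cost comparison: an always-on online endpoint vs. a batch job
# that runs one hour per night. The rate below is illustrative only.
HOURLY_RATE = 0.50  # assumed VM cost per hour (not a real Azure price)

def monthly_cost(hours_per_day: float, rate: float = HOURLY_RATE, days: int = 30) -> float:
    """Simple linear cost model: hours/day * rate * days."""
    return hours_per_day * rate * days

always_on = monthly_cost(24)     # real-time endpoint running continuously
nightly_batch = monthly_cost(1)  # batch scoring, one hour per night
# The always-on endpoint costs 24x the nightly batch job at the same rate.
```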
Quotas and region capacity appear in scenario questions: you may be blocked by GPU quota, vCPU quota, or SKU availability. The exam often expects the “most direct” fix: request quota increases, choose a different VM size, change region, or reduce parallelism. Also watch for the difference between workspace-level limits and subscription/region limits—incorrectly assuming one when the other is true is a classic trap.
Exam Tip: If the question emphasizes experimentation speed, choose autoscaling compute clusters and parallel runs; if it emphasizes interactive exploration, choose a compute instance. If it emphasizes cost control, pick scale-to-zero and batch approaches.
Architecturally, identify which resources must be shared across a team (workspace, datastores, registries) versus isolated per project (compute, endpoints, environments). DP-100 commonly tests “right-sized” setups: one workspace per environment (dev/test/prod) or per business unit depending on governance, with consistent naming, tags, and resource group strategy.
Designing the data path is central to “prepare a machine learning solution.” In Azure ML, you typically connect storage through datastores (backed by Azure Blob, ADLS Gen2, etc.), define data assets (so datasets are discoverable and versioned), and use connections for external services (like databases) where appropriate. The exam wants you to choose patterns that produce repeatable training and scoring inputs.
Data ingestion strategy depends on freshness and volume. For large historical training, store curated parquet/CSV in ADLS Gen2 and reference it via a datastore and versioned data asset. For incremental updates, consider partitioning by date and building pipelines/jobs that materialize a new version of the curated dataset. If the scenario highlights “multiple teams reuse the same dataset,” data assets with clear versioning and documentation are favored over ad hoc file paths in notebooks.
Labeling strategy matters when ground truth is missing. DP-100 scenarios may mention manual labeling, weak supervision, or delayed labels (for example, churn labels appear after 30 days). Your plan should address how labels are captured, stored, and joined, and how you prevent leakage (using information that wouldn’t exist at prediction time). Leakage is a top exam pitfall: features computed from future events can inflate offline metrics and fail in production.
Exam Tip: When you see “time-based data,” immediately think about train/validation splits that respect time order and feature computation that only uses past data. Random splits are often wrong in forecasting and many user-behavior problems.
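A time-respecting split can be as simple as filtering on a cutoff date rather than shuffling. A minimal sketch, assuming records are (timestamp, payload) tuples:

```python
from datetime import date

def time_split(records, cutoff: date):
    """Train on everything strictly before the cutoff; validate on the rest.

    Unlike a random split, this guarantees no validation record predates
    any training record's future, which is what leakage prevention requires.
    """
    train = [r for r in records if r[0] < cutoff]
    valid = [r for r in records if r[0] >= cutoff]
    return train, valid
```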
Feature planning bridges data and modeling. Decide what transformations will be standardized (encoding, scaling, imputation) and where they live: in training code, in reusable components, or as part of a pipeline. The exam often rewards answers that keep transformations consistent between training and inference. Versioning is not optional: version data assets, code, and environments so you can reproduce a model later and explain which data snapshot produced it.
Security questions in DP-100 tend to be practical: who can access the workspace, how data access is granted, and how secrets are handled. Role-based access control (RBAC) is the default. You grant least privilege at the right scope (workspace/resource group/subscription) and avoid sharing keys. Know the difference between “can run jobs” versus “can manage the workspace,” and expect scenario prompts like “data scientists can experiment but not change networking.”
Managed identity is a frequent best answer when a compute resource (compute instance/cluster/endpoint) must access storage or other Azure services securely. It reduces secret sprawl and improves auditability. If a question mentions “avoid storing credentials” or “rotate secrets,” managed identity or Azure Key Vault integration is typically the direction.
Secrets should be stored in Key Vault, not in notebooks, pipeline YAML, or environment variables committed to Git. DP-100 commonly baits with options like "store the connection string in code" or "share a SAS token." Prefer Key Vault references and identity-based access (Microsoft Entra ID, formerly Azure AD).
Exam Tip: If the scenario says “no public internet” or “private access only,” look for private endpoints, VNet integration, and disabling public network access where supported. Choosing “IP allowlist” may be insufficient if the requirement is fully private.
Network options include public access (simpler, faster to start) versus private networking (more secure, more setup). Governance foundations also include auditing and traceability: ensure assets are tracked in the workspace, runs are logged, and approvals exist for production changes. On the exam, the “most secure” answer is not always correct—choose what satisfies the stated requirement without over-engineering, unless compliance language implies strict isolation.
Even in a design-and-prepare chapter, DP-100 expects MLOps awareness: how you’ll keep experiments reproducible and deployments reliable. Start with source control: keep training code, inference code, and pipeline definitions in a repo, with branching aligned to environments (dev/test/prod) or trunk-based development depending on the organization. The exam often emphasizes collaboration and repeatability—ad hoc notebook-only workflows are rarely the best answer for production scenarios.
Environment management is a core reproducibility lever. Use curated environments or define your own with pinned dependencies (Conda/Docker). If the question mentions “the model behaves differently between training and deployment,” suspect environment drift and pick an answer involving a shared environment definition or container image. Also capture random seeds and data versions to make runs comparable.
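A pinned environment definition is the concrete artifact behind this advice. A minimal Conda sketch follows; the environment name and the specific package versions are illustrative, not prescribed by the exam:

```yaml
name: train-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - scikit-learn=1.3.0
  - pandas=2.1.0
  - pip:
      - mlflow==2.9.2
      - azureml-mlflow==1.54.0
```

Pinning exact versions (rather than ranges) is what makes "the model behaves differently between training and deployment" answerable: the same definition can build the training environment and the inference container.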
CI/CD touchpoints typically include: lint/unit tests for feature code, training job submission in a pipeline, model registration conditioned on evaluation thresholds, and deployment to a staging endpoint with smoke tests. While DP-100 doesn’t require deep DevOps tooling specifics, it does expect you to recognize where automation reduces risk. “Register the model” and “promote only if metrics pass” are common motifs.
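The "promote only if metrics pass" motif reduces to a small gate in a pipeline. A sketch, with hypothetical metric names and thresholds:

```python
def should_register(metrics: dict, thresholds: dict) -> bool:
    """Promote the candidate model only if every gated metric
    meets or beats its threshold."""
    return all(metrics.get(name, 0.0) >= floor
               for name, floor in thresholds.items())

# Hypothetical evaluation output and gating policy
candidate = {"auc": 0.91, "recall": 0.80}
gates = {"auc": 0.85, "recall": 0.75}
print(should_register(candidate, gates))                        # True
print(should_register({"auc": 0.91, "recall": 0.60}, gates))    # False
```

In a real CI/CD pipeline this check would run after evaluation and condition the model-registration step.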
Exam Tip: If you see “reproduce results from last month” or “audit how this model was built,” the correct answer usually includes versioned data assets, tracked runs/metrics, and registered models tied to run IDs—not just saving a file to storage.
Plan for monitoring early: what signals indicate drift (feature distribution changes), performance decay (label-based metrics when labels arrive), and operational issues (latency, error rates). Even if monitoring is implemented later, the design should include where logs/metrics go and who owns alerts. DP-100 questions often reward designs that keep training, evaluation, and deployment as consistent, traceable steps rather than one-off manual actions.
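One way to sketch a drift signal is a mean-shift check against training-time statistics. This is a deliberately simple illustration (the two-standard-deviation threshold is an arbitrary choice, and production systems typically use richer tests per feature):

```python
import statistics

def mean_shift_alert(baseline, current, threshold=2.0):
    """Flag drift when the current window's mean departs from the
    baseline mean by more than `threshold` baseline std deviations."""
    base_mean = statistics.mean(baseline)
    base_std = statistics.pstdev(baseline) or 1.0
    shift = abs(statistics.mean(current) - base_mean) / base_std
    return shift > threshold

baseline = [10.0] * 5 + [12.0] * 5               # training-time feature values
print(mean_shift_alert(baseline, [15.0] * 10))   # True (drifted)
print(mean_shift_alert(baseline, [11.5] * 10))   # False (stable)
```

The design point is where the `baseline` statistics come from: they should be computed once from the versioned training data and stored as an artifact, so the same reference is used every time the check runs.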
This chapter’s domain practice set is designed to simulate DP-100’s “design and prepare” thinking under time pressure. You will see scenario-based prompts that mix requirements framing, Azure ML resource selection, data strategy, and security/governance constraints. The purpose is not memorization of menu paths; it’s rapid identification of the controlling constraint and selecting the Azure ML feature or pattern that satisfies it.
Your pacing goal: read the scenario once for context, then re-read the last line to confirm what is being asked (metric choice, compute choice, data/versioning choice, or security pattern). Many candidates lose time because they start validating all options before deciding what the question is optimizing for. Train yourself to label each scenario with one primary driver: cost, speed, compliance, latency, or reproducibility.
Exam Tip: When two answers look plausible, eliminate options that introduce unmanaged credentials, non-versioned data paths, or non-scalable compute for repeated jobs. DP-100 favors managed, repeatable, least-privilege patterns.
After each timed set, do remediation by objective, not by question. If you missed a metric question, revisit task-to-metric mapping and leakage pitfalls. If you missed a compute question, revisit when to use compute instance vs cluster vs batch/online endpoints and the implications of autoscaling and quotas. If you missed a governance question, revisit managed identity, Key Vault usage, and private networking triggers. This approach improves your score faster than re-taking the same questions without diagnosing the underlying skill gap.
1. A retail company asks you to "reduce wasted marketing spend". They have historical campaign data with a binary outcome column (purchase within 7 days: yes/no). The business success criterion is: "maximize profit by targeting only customers likely to purchase while avoiding too many missed buyers." Which ML problem statement and primary success metric best match the requirement for an Azure ML solution?
2. You must design an Azure ML workspace for a regulated team. Data scientists authenticate with Microsoft Entra ID and must not use shared keys or personal access tokens. Workspace traffic must remain on the corporate network; public internet access should be blocked. Which access pattern best meets these requirements?
3. A company has 5 TB of raw logs arriving daily in Azure Data Lake Storage Gen2. They want to start experimenting quickly, but also need reproducible training runs that can be audited later. You are planning data ingestion and feature preparation in Azure ML. Which approach best supports both time-to-first-results and reproducibility?
4. You are building an image classification solution in Azure ML. The customer has 200,000 images stored in blob storage but no labels. They need a labeling workflow where multiple labelers can annotate images, and you must track label quality and progress. Which Azure ML capability should you use?
5. A team is selecting Azure ML compute for model experimentation. They want to minimize cost when no jobs are running, but still be able to scale out to multiple nodes for training jobs submitted during business hours. Which compute choice best fits?
DP-100 expects you to prove that you can move from “I have data” to “I can justify a first experiment and interpret results” using Azure Machine Learning (Azure ML). This chapter maps directly to the exam skill area Explore data and run experiments: performing EDA and data quality checks aligned to ML objectives, running iterative experiments, tracking runs, and producing responsible baseline comparisons. You are not being tested on fancy charts; you are being tested on correct reasoning, reproducibility, and using Azure ML features (data assets, notebooks/jobs, and tracking) to validate hypotheses.
In practice tests, many wrong answers are “technically possible” but miss a key exam expectation: traceability. DP-100 questions often hide requirements such as “re-run the experiment later,” “compare multiple runs,” “avoid leakage,” or “scale to a cluster.” Keep that mindset as you work through the sections: you are building evidence that your model development is measurable, repeatable, and governed.
Exam Tip: When a question includes words like reproducible, lineage, audit, or compare runs, the correct solution almost always involves Azure ML experiments, MLflow tracking, registered data assets, and saved artifacts—not only local notebook outputs.
Practice note for Perform EDA and data quality checks aligned to ML objectives: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Use Azure ML experiments, runs, and tracking for iteration: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Build baseline models and compare results responsibly: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Domain practice set (timed): exploration & experimentation questions with explanations: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
EDA on DP-100 is about identifying patterns that change how you design experiments. Start with missingness: quantify null rates per column and look for patterns correlated with the target or time. Missing values that are not at random can encode business processes (e.g., “no lab test ordered” may indicate low-risk patients). The exam often frames this as a data quality check aligned to ML objectives: if the prediction must work for all users, you must detect whether missingness will bias evaluation.
Leakage is the most common trap. Leakage includes features created after the prediction time, direct proxies of the label, or target-derived aggregations that were computed using the full dataset (including validation/test). In questions, look for suspicious columns like “status_after_claim,” “resolution_date,” “days_until_churn,” or any post-event logging fields. Leakage also happens when you do preprocessing (imputation, scaling, encoding) on the full dataset before splitting. The correct fix is to split first, then fit transforms on training only (or use pipelines that do this correctly).
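The split-first rule can be sketched in a few lines. The feature values below are made up; in practice a pipeline abstraction (e.g., scikit-learn's Pipeline) enforces the same discipline automatically:

```python
import statistics

def fit_standardizer(train_values):
    """Fit scaling statistics on the TRAINING split only."""
    mean = statistics.mean(train_values)
    std = statistics.pstdev(train_values) or 1.0
    return mean, std

def standardize(values, mean, std):
    return [(v - mean) / std for v in values]

feature = [10, 12, 11, 13, 50, 55, 52, 54]   # hypothetical feature values
train, test = feature[:6], feature[6:]        # 1) split FIRST
mean, std = fit_standardizer(train)           # 2) fit stats on train only
train_z = standardize(train, mean, std)
test_z = standardize(test, mean, std)         # 3) reuse train statistics
```

Computing `mean` and `std` over all of `feature` before splitting would leak test-set information into training, which is exactly the pattern DP-100 distractors describe.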
Class imbalance affects both metrics and split strategy. A dataset with 98% negatives can make accuracy meaningless; you’ll need stratified splits and metrics like AUC, precision/recall, F1, or PR AUC depending on the business cost. Also check label prevalence per subgroup (fairness risk) and per time window (drift signal). Drift signals can appear as changing feature distributions over time, rising missingness, or shifting label rates. DP-100 may ask what to log or monitor; the answer often includes tracking dataset versions, time-based splits, and storing summary statistics as artifacts.
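A tiny illustration of why accuracy misleads at 98% negatives:

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

# 98% negatives: a model that always predicts "negative"
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100
print(accuracy(y_true, y_pred))  # 0.98 (looks impressive)
print(recall(y_true, y_pred))    # 0.0  (misses every positive)
```

A trivial majority-class model scores 98% accuracy while catching zero positives, which is why the exam expects recall, F1, or PR AUC for rare-event problems.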
Exam Tip: If a scenario says “predict next month” or “real-time scoring,” treat time as a first-class constraint. Randomly shuffling across time is often the wrong answer because it hides drift and leakage.
DP-100 questions frequently test whether you know when common transformations are necessary and how to apply them without contaminating evaluation. Scaling (standardization/min-max) matters for distance-based or gradient-based methods (k-NN, SVMs, logistic regression, neural nets). Tree-based models (random forests, gradient-boosted trees) are generally insensitive to monotonic scaling, so scaling is not always required—an exam distractor is “always scale features.” Choose scaling because the algorithm needs it, not because it’s fashionable.
Encoding categorical variables is another test favorite. One-hot encoding is common for low-to-medium cardinality features. High cardinality can explode dimensionality; alternatives include target encoding (high leakage risk if done incorrectly), hashing trick, or learned embeddings. For DP-100, the safe exam posture is: use transformations inside a pipeline and fit on training only, especially for target encoding.
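A minimal one-hot sketch showing the vocabulary fit on training data only (the category values are illustrative):

```python
def fit_one_hot(train_values):
    """Build the category vocabulary from TRAINING data only."""
    return sorted(set(train_values))

def encode(value, categories):
    # Categories unseen at training time map to an all-zero vector
    return [1 if value == c else 0 for c in categories]

cats = fit_one_hot(["red", "blue", "red", "green"])  # ['blue', 'green', 'red']
print(encode("red", cats))     # [0, 0, 1]
print(encode("purple", cats))  # [0, 0, 0] (unseen category)
```

Fitting the vocabulary on the full dataset would silently encode validation/test categories, which is the same leakage pattern as fitting a scaler before splitting.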
Text/vector basics appear more often now: bag-of-words/TF-IDF creates sparse vectors suitable for linear models; embeddings (from language models) yield dense vectors that can feed classical or neural models. The exam is not asking you to implement an LLM; it’s testing whether you can represent text appropriately and keep the process reproducible (log the vectorizer configuration, vocabulary/version, or embedding model reference).
Finally, splits: feature engineering must respect split boundaries. If you compute statistics (mean/variance, vocab, imputation values) using the full dataset before splitting, you leak information from validation/test into training. DP-100 expects you to avoid this by (a) splitting first and (b) using pipelines so transforms are fit only on the training fold during cross-validation.
Exam Tip: When the question mentions “cross-validation,” assume transformations must be inside the CV loop (pipeline) to prevent leakage—this is a classic DP-100 trap.
Azure ML supports interactive development (notebooks) and repeatable execution (jobs). The exam often asks what to use given constraints like scale, reproducibility, or scheduling. Notebooks are ideal for quick EDA, hypothesis exploration, and debugging. Jobs are the unit of scalable, traceable execution: you submit a script or command to a compute target, and Azure ML captures inputs, outputs, logs, environment, and status.
A common trap is choosing notebooks when the prompt requires an auditable run history or re-running on a cluster. If a scenario says “run nightly,” “share with team,” “compare experiments,” or “rerun with the same environment,” jobs are the more defensible answer. Jobs also work better for MLOps pipelines where steps are chained.
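A job is described declaratively, which is what makes it repeatable and auditable. The sketch below follows the Azure ML CLI v2 command-job style; the script, environment, and compute names (train.py, train-env, cpu-cluster) are assumptions for illustration:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
type: command
experiment_name: churn-baseline
code: ./src
command: python train.py --learning-rate ${{inputs.learning_rate}}
inputs:
  learning_rate: 0.01
environment: azureml:train-env:1
compute: azureml:cpu-cluster
```

Because the code path, inputs, environment version, and compute target are all captured in the job definition, rerunning next month with the same configuration is a submit, not a reconstruction.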
Logging and artifacts are the bridge between the two. Whether you run locally or as a job, you should log metrics (e.g., accuracy, AUC, RMSE), parameters (learning rate, regularization, preprocessing choices), and artifacts (confusion matrix plots, feature importance, model files, and dataset profiling reports). In Azure ML, MLflow is the standard mechanism for tracking; artifacts are stored with the run and accessible later for comparison and governance.
Exam Tip: If a question mentions “capture outputs,” “store artifacts,” or “compare runs,” look for job + MLflow tracking rather than “save a file to local disk in the notebook.” Local disk is not durable or shareable in the exam’s framing.
Experiment tracking is central to DP-100 because it enables iteration with evidence. On the exam, “experiment,” “run,” and “tracking” usually imply MLflow-backed logging in Azure ML. You should understand what to log and why: parameters (inputs you control), metrics (outputs you evaluate), and artifacts (files that support analysis). Parameters might include train/validation split seed, vectorizer settings, or regularization strength. Metrics might include AUC, F1, MAE, or log loss. Artifacts might include a trained model file, preprocessing pipeline, plots, or a data profile report.
Lineage is the “chain of custody” for ML: which dataset version, code, environment, and hyperparameters produced which model. Azure ML can link runs to registered data assets and to code stored in a repo, and it records the compute target and environment. DP-100 commonly tests whether you can reproduce a run later—meaning you must version data (data assets), pin environments (conda/docker), and record parameters.
MLflow concepts that appear in Azure ML contexts include experiments, runs, logging (log_metric, log_param), and model logging/registration. Even when you do not explicitly call MLflow APIs, many Azure ML workflows integrate with MLflow under the hood. The key exam behavior: choose answers that create comparable runs and preserve metadata rather than one-off outputs.
Exam Tip: When you see “lineage” or “which dataset produced this model,” prioritize answers that use Azure ML data assets and tracked runs. “Upload a CSV to the VM” is almost never sufficient for lineage requirements.
Validation is where DP-100 separates “trained a model” from “ran a defensible experiment.” Choose a strategy that matches the data shape and business objective. Holdout validation (train/validation/test) is simple and fast; it’s often correct for large datasets where variance is low. Cross-validation (k-fold) is useful for smaller datasets to reduce variance and to compare models more reliably. However, cross-validation increases compute cost—if the scenario emphasizes speed or limited compute, holdout may be preferred.
Stratification is critical when classes are imbalanced; it ensures each split preserves label proportions. A frequent exam trap is using random split without stratification for rare-event classification, which can produce folds with zero positives and meaningless metrics. Another trap is shuffling time series data: if the target is future behavior, you should use a time-based split to avoid training on “future” information.
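A stratified holdout can be sketched with the standard library (real workflows typically use scikit-learn's stratified splitters; the 4% positive rate here is illustrative):

```python
import random

def stratified_split(items, labels, test_frac=0.25, seed=7):
    """Hold out test_frac of EACH class so label proportions
    survive the split."""
    rng = random.Random(seed)
    by_label = {}
    for item, label in zip(items, labels):
        by_label.setdefault(label, []).append(item)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_frac))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

items = list(range(100))
labels = [1 if i < 4 else 0 for i in items]   # rare positives (4%)
train, test = stratified_split(items, labels)
# A plain random 25% split could easily contain zero positives;
# splitting per class guarantees the rare class appears in both parts.
```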
Metric selection must align to the ML task and business cost. For regression, consider MAE/RMSE (RMSE penalizes large errors more) and R². For binary classification, accuracy can be misleading under imbalance; use AUC for ranking performance, precision/recall and F1 for decision thresholds, and PR AUC when positives are rare. For multiclass, use macro/micro averaging depending on whether you care equally about each class or about overall frequency-weighted performance.
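The threshold-dependent metrics can be computed directly from confusion-matrix counts; the counts below are hypothetical:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical fraud detector at one threshold: 8 TP, 12 FP, 2 FN
p, r, f1 = precision_recall_f1(tp=8, fp=12, fn=2)
print(p)   # 0.4: 60% of alerts are false alarms (bad if alerts are expensive)
print(r)   # 0.8: catches 80% of fraud (good if false negatives are costly)
```

Reading the same counts through both lenses is the exam skill: whether this threshold is "good" depends on whether the scenario prices false negatives or false alarms higher.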
Exam Tip: If the scenario mentions “minimize false negatives” (e.g., fraud, safety), look for recall-sensitive metrics and threshold tuning. If it mentions “alerts are expensive,” precision often matters more. The best answer links metric choice to business impact.
This chapter’s domain practice set (timed) will focus on rapid recognition of the patterns you just learned: spotting leakage, choosing the correct validation approach, and selecting Azure ML features that create reproducible experimentation. In timed conditions, your goal is not to remember every API call; it’s to identify the exam’s hidden requirement and eliminate distractors that violate it.
Expect questions that describe an EDA observation (missingness, skew, class imbalance, drift) and ask what to do next. The correct option usually ties back to an ML objective (e.g., “predict at inference time,” “generalize to next quarter,” “reduce false negatives”) and includes a responsible experiment design (proper splits, correct metrics, pipeline-based preprocessing). You will also see scenarios that contrast notebooks with jobs. If the prompt includes collaboration, repeatability, scaling, or scheduled runs, pick jobs and tracked experiments.
Exam Tip: If two answers seem plausible, choose the one that improves traceability (tracked experiment + versioned data + logged artifacts). DP-100 rewards lifecycle thinking more than clever modeling.
After you complete the timed set, review explanations specifically for why wrong options fail (often due to leakage, wrong metric, or non-reproducible workflow). This is the fastest way to raise your score in the “explore and run experiments” domain.
1. You are exploring a tabular dataset in an Azure ML notebook to build a first baseline model. The team requires that your EDA findings can be reproduced later and that the exact input dataset version used is auditable. Which approach best meets this requirement?
2. A data scientist runs the same training script multiple times in Azure ML to compare feature sets. They must be able to query metrics across runs, compare them side-by-side, and identify which code/config produced each result. What should they do?
3. You are building a baseline model for a binary classification problem. During EDA you discover a feature that is populated only after the event you are trying to predict (for example, a 'resolution_time' column recorded after a support case is closed). What is the most appropriate action before training the baseline?
4. A team wants to rerun a baseline training job in Azure ML next month and get comparable results. They require that the training environment and dependencies are consistent between runs. Which action best supports this?
5. You run two baseline experiments in Azure ML using different train/test splits and notice that one split produces much higher performance. Your manager asks you to pick the 'best' baseline. What is the most defensible approach aligned with DP-100 expectations for responsible comparison?
This chapter maps to the DP-100 skills around training, registering, deploying, and operating models in Azure Machine Learning (Azure ML). Expect exam questions to test whether you can choose the right training method (custom script vs AutoML vs pipelines), tune runs efficiently, register models with proper lineage, and deploy to the correct endpoint type with practical monitoring and troubleshooting steps. The “gotcha” on DP-100 is that many answers sound plausible unless you recognize which Azure ML asset (job, environment, model, endpoint) provides reproducibility, governance, and operational readiness.
As you study, keep an eye on the verbs the exam uses: “train,” “track,” “register,” “deploy,” “monitor,” and “troubleshoot” each map to a different set of Azure ML objects. You’ll score higher when you can identify the minimal correct set of steps rather than selecting extra-but-unnecessary actions.
Exam Tip: If a question mentions repeatability or auditability, look for answers involving Azure ML jobs, environments, and registered assets with versions/lineage—rather than ad-hoc notebook execution.
Practice note for Choose training method (script, AutoML, pipelines) and tune effectively: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Register and manage models with lineage and versioning: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Deploy to managed endpoints and batch scoring targets: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Domain practice set (timed): training & deployment questions with explanations: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Operations essentials: monitoring, troubleshooting, and iteration: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
DP-100 expects you to understand how training is executed and tracked in Azure ML using jobs, with dependencies captured in environments and resources provided by compute. In exam terms, a “job” is the unit of execution that records parameters, code, metrics, and outputs. An “environment” pins the runtime (Docker image/conda dependencies), which is essential for reproducibility and later deployment. Compute selection (CPU vs GPU; cluster vs instance) is often the deciding factor in scenario questions that mention cost, speed, or scale.
You’ll typically choose between: (1) custom training (script-based command jobs), (2) AutoML, and (3) pipelines (to orchestrate multi-step workflows). Custom training is best when you control the algorithm, training loop, feature engineering, and want full flexibility (e.g., PyTorch, scikit-learn, distributed training). AutoML is best when you need strong baselines quickly for standard tasks (classification/regression/forecasting) and want built-in featurization and model selection. Pipelines are best when the scenario emphasizes repeatable workflows: data prep → train → evaluate → register → deploy.
Exam Tip: If the prompt says “compare many algorithms quickly” or “automatically tune and select,” AutoML is usually the intended answer. If it says “custom loss function,” “custom architecture,” or “bring your own training loop,” pick custom script training.
Common trap: choosing a compute instance for scalable training. Compute instances are great for notebooks and development, but scalable training across multiple nodes typically points to compute clusters and job submission. Another trap: assuming AutoML is the default for every task—DP-100 often tests your ability to recognize when custom code is required.
Hyperparameter tuning is frequently assessed through “sweep” concepts: running many trials with different hyperparameter combinations and selecting the best based on a primary metric. DP-100 questions often focus less on the math and more on operational choices: how to reduce time/cost while maintaining search quality. In Azure ML, tuning typically means a sweep job with a sampling strategy (random, grid, Bayesian) and a termination policy to stop underperforming runs.
Efficiency cues: if the scenario mentions limited budget, long training times, or wanting faster iteration, look for early termination (also called bandit/median stopping in many contexts). Early termination stops trials that are unlikely to beat the current best, which can save significant compute. Parallelism is another lever: run multiple trials at once using a compute cluster. DP-100 expects you to align the number of concurrent trials with cluster capacity (nodes/cores/GPUs), rather than selecting an arbitrarily high parallel count.
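A sweep with random sampling and a bandit early-termination policy can be sketched in Azure ML CLI v2 style YAML. All names (train.py, train-env, cpu-cluster, the AUC primary metric) and the numeric limits are illustrative assumptions:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/sweepJob.schema.json
type: sweep
trial:
  code: ./src
  command: python train.py --learning-rate ${{search_space.learning_rate}}
  environment: azureml:train-env:1
compute: azureml:cpu-cluster
sampling_algorithm: random
search_space:
  learning_rate:
    type: uniform
    min_value: 0.001
    max_value: 0.1
objective:
  goal: maximize
  primary_metric: AUC
early_termination:
  type: bandit
  slack_factor: 0.1
  evaluation_interval: 2
limits:
  max_total_trials: 20
  max_concurrent_trials: 4
```

Note how the exam levers map to fields: parallelism is `max_concurrent_trials` (sized to cluster capacity), and wasted compute is cut by `early_termination`, which only works if the training script logs the primary metric at intervals rather than once at the end.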
Exam Tip: When you see “minimize wasted compute,” pick early termination policies. When you see “shorten wall-clock time,” pick parallel trials on a scalable cluster (assuming budget allows).
Common trap: selecting grid search for a high-dimensional hyperparameter space because it “sounds thorough.” On the exam, grid search is usually the least efficient option unless the space is tiny and explicitly constrained. Another trap is forgetting that early termination policies require intermediate metrics—if a job only logs metrics at the end, early stopping can’t help.
After training, DP-100 expects you to manage the model as a governed asset, not just a file in storage. In Azure ML, “registering a model” creates a model asset with versioning and metadata. This matters because deployments reference model assets, and governance relies on traceability: which data, code, and environment produced the model.
Lineage is a key exam concept: the platform can link a registered model back to the training job, including parameters, metrics, and outputs. If a scenario mentions audit, compliance, reproducibility, or “identify which run produced the deployed model,” model registration with lineage is the intended direction. You should also recognize the difference between: (1) model artifacts (the serialized model files), (2) model metadata/tags (business context, intended use, approval state), and (3) versions (immutable snapshots of the model asset).
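Registering a model from a training job's output can also be described declaratively. This model-asset YAML is a sketch; the model name, tags, and the `<training-job-name>` placeholder are assumptions for illustration:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: churn-classifier
version: 1
type: mlflow_model
path: azureml://jobs/<training-job-name>/outputs/artifacts/paths/model/
tags:
  stage: candidate
  approved_by: pending
```

Pointing `path` at a job's output (rather than uploading a file from local disk) is what preserves the run linkage the exam calls lineage.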
Exam Tip: If the question is about “promoting” models across environments (dev/test/prod) or rolling back, model versioning is the core feature to select—avoid answers that rely on manual file copying.
Model artifacts may include files such as model.pkl, model.onnx, tokenizer files, or preprocessing objects. Common trap: assuming “registering” is only for MLflow models. Azure ML can register many artifact types; the exam tests whether you understand the platform concept rather than a single framework. Another trap is confusing dataset versioning with model versioning—both matter, but model lifecycle questions usually require the model asset and the training run linkage.
Deployment questions on DP-100 often come down to choosing the correct endpoint type: managed online endpoints for real-time low-latency inference, and batch endpoints for asynchronous, high-throughput scoring over large datasets. If the scenario mentions interactive apps, APIs, or immediate responses, it points to online endpoints. If it mentions nightly scoring, large backlogs, scoring millions of rows, or writing outputs to storage, it points to batch endpoints.
Scoring components are a frequent exam target. You must connect the trained model artifact to an inference entry point (often a scoring script) that loads the model and handles requests. Even when the platform can auto-generate pieces, DP-100 expects you to know the responsibilities: initialization/loading (performed once per replica) and request handling (performed per call). For batch, the “request” is typically an input dataset/URI and output destination rather than per-record HTTP calls.
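The init-once/run-per-request split can be sketched as follows. This is not the exact Azure ML template: the linear "model" here is a stand-in so the sketch runs anywhere, whereas a real scoring script would load registered artifacts (e.g., a serialized scikit-learn model) from the AZUREML_MODEL_DIR directory inside init():

```python
import json
import os

model = None  # populated once per replica by init()

def init():
    """Called once when the replica starts: load model artifacts here."""
    global model
    _ = os.getenv("AZUREML_MODEL_DIR", ".")  # where real artifacts would live
    model = {"weights": [0.5, -0.25], "bias": 0.1}  # toy stand-in model

def run(raw_data):
    """Called per request: parse input JSON, score, return JSON."""
    rows = json.loads(raw_data)["data"]
    scores = [
        sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
        for row in rows
    ]
    return json.dumps({"scores": scores})

init()
print(run('{"data": [[2.0, 4.0]]}'))
```

The split matters operationally: anything expensive (deserializing the model, loading a tokenizer) belongs in init() so it is paid once per replica, not once per request, which is also why oversized work in run() shows up as the latency problems the exam describes.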
Exam Tip: If a question emphasizes “low latency” and “autoscale based on traffic,” choose managed online endpoints. If it emphasizes “cost-effective processing of a large dataset,” choose batch endpoints.
Common traps: deploying batch workloads to online endpoints (expensive, and prone to timeouts) or expecting batch endpoints to satisfy strict real-time SLAs. Another trap is ignoring preprocessing artifacts—if training included a scaler/encoder/tokenizer, deployment must include those artifacts or replicate the transformations identically, or performance will degrade.
Operations essentials are part of “train and deploy” on DP-100: deploying a model isn’t the finish line. You should be comfortable with what to monitor, where to look when something fails, and how to iterate safely. Monitoring typically includes service health (availability, error rate), performance (latency, throughput), and model quality signals (data drift, prediction distribution shifts, and—where applicable—responsible AI indicators such as fairness or explainability outputs).
For troubleshooting, the exam frequently tests whether you know to inspect logs and job/endpoint events to locate root causes. Examples include dependency import failures (environment mismatch), serialization issues (wrong model file path), schema mismatches (input JSON vs expected features), and resource constraints (insufficient CPU/memory causing timeouts). Latency issues often map to model size, cold start time, or inefficient preprocessing inside the request handler.
Exam Tip: When a scenario mentions “sudden accuracy drop” without code changes, think data drift or upstream feature changes. When it mentions “errors after deployment,” think environment/serialization/schema and check logs first.
Common trap: treating monitoring as only “endpoint up/down.” DP-100 expects you to connect operational telemetry to ML iteration: detect drift → retrain → register new version → deploy with controlled rollout. Another trap is “fixing” by manually patching the running container; exam-correct answers favor updating the environment/model version and redeploying for traceability.
This chapter’s domain practice set targets the DP-100 training and deployment objectives under timed conditions. The goal isn’t memorizing commands—it’s recognizing patterns in the scenario and selecting the Azure ML feature that best satisfies constraints like latency, cost, governance, and reproducibility. Under time pressure, many candidates over-select “kitchen sink” answers (pipelines + AutoML + custom endpoints + extra services) when the question is testing a single concept such as sweep early termination or model version rollback.
Use a two-pass strategy. Pass 1: identify what the question is really testing (training method, tuning, registration/lineage, online vs batch deployment, or monitoring). Pass 2: eliminate choices that violate a stated constraint (e.g., “must be real-time,” “must be reproducible,” “must minimize compute cost,” “must allow rollback”). This mirrors how DP-100 is written: one option will satisfy the key constraint cleanly, while distractors will be partially correct but miss the main requirement.
Exam Tip: Watch for the hidden noun that reveals the domain: “endpoint” implies deployment, “model version” implies lifecycle/rollback, “trial” implies hyperparameter sweeps, and “lineage” implies registered assets tied to jobs.
After completing the timed set, remediate by categorizing misses into: (1) selection errors (online vs batch, AutoML vs custom), (2) governance errors (not registering/versioning), and (3) ops errors (not using logs/metrics/drift signals). Then reattempt similar questions with a strict time box to build exam-speed pattern recognition.
1. You need to train a model in Azure Machine Learning with full reproducibility and auditability. The team currently runs training from a notebook and wants every run to capture code, environment, inputs, and metrics so it can be repeated later and compared across iterations. What should you do?
2. A data science team wants to run hyperparameter tuning efficiently and select the best model based on a primary metric. They also want to avoid manually orchestrating multiple training runs. Which approach best meets the requirement?
3. Your organization requires that deployed models are traceable to the exact training run, including the training data reference, code version, and environment. After training, you want to promote the model to production while preserving lineage. What should you do next?
4. A company needs to provide low-latency, real-time predictions to a web application. The model must be updated over time with minimal disruption, and the solution should use managed Azure ML capabilities. Which deployment target should you choose?
5. After deploying a model to a managed online endpoint, requests start failing intermittently with errors indicating missing Python packages. You need to troubleshoot and prevent recurrence. What is the most appropriate action?
DP-100 increasingly expects you to make correct architectural choices for language-model solutions in Azure: when prompting is enough, when you need retrieval-augmented generation (RAG), and when fine-tuning is justified. The exam is less about building a flashy chatbot and more about making disciplined decisions that satisfy constraints like latency, cost, governance, privacy, and evaluation rigor. In practice-test items, you’ll often be given a scenario with compliance requirements, changing knowledge, or output-format constraints—your job is to map those signals to the right approach and the right Azure ML/Azure AI components.
This chapter frames LLM optimization as an engineering workflow: start by selecting the correct approach (prompting vs RAG vs fine-tuning), then design prompts and system messages aligned to measurable evaluation criteria, validate via offline and human review, and finally implement safety controls and monitoring. As you read, keep asking: “What would DP-100 expect me to change first to improve reliability—data (RAG), instructions (prompt), or weights (fine-tune)?”
Exam Tip: When a question mentions “latest policies,” “rapidly changing content,” or “must cite sources,” the correct answer usually involves RAG and grounding—not fine-tuning. Fine-tuning is for stable patterns (tone, format, domain style) rather than frequently changing facts.
Practice note for Select LLM approach for the use case (prompting vs fine-tuning vs RAG): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Design prompts, system messages, and evaluation criteria: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Implement safety, compliance, and monitoring for LLM apps: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Domain practice set (timed): LLM optimization questions with explanations: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
DP-100 scenarios typically start with a business goal (“reduce support workload,” “summarize case notes,” “extract entities from contracts”) and constraints (“PII,” “low latency,” “must be deterministic,” “auditable output”). Your first optimization step is choosing the LLM interaction pattern that fits the task: classification, summarization, extraction, or open-ended chat. Classification and extraction often demand structured outputs (JSON, fixed labels) and high consistency; summarization needs faithfulness; chat needs conversational context management and safety boundaries.
On the exam, constraints are the tell. If the task requires strict formatting (e.g., always return a JSON object with required keys), the best improvement is usually prompt/system-message design plus evaluation with format checks—before any fine-tuning. If latency and cost are tight, smaller models with well-designed prompts, caching, and reduced context windows often beat “use the largest model.” If the business requires that answers never exceed provided documents, you’ll need grounded generation (RAG) and citation requirements.
Common trap: choosing fine-tuning when the real issue is missing information. If the model “hallucinates policy details,” adding training examples won’t fix the fact it lacks the authoritative policy text at inference time—RAG will.
Exam Tip: When you see “PII cannot leave the tenant” or “data residency,” think about data handling and architecture first (where prompts, embeddings, logs are stored) and only then model choice. DP-100 rewards risk-aware design more than clever prompt hacks.
Prompt engineering on DP-100 is tested as a reliability technique: you craft system messages, user prompts, and examples to reduce variance and enforce constraints. The exam often expects you to separate (1) system instructions (role, policies, output rules), (2) developer instructions (task-specific constraints), and (3) user inputs (untrusted text). This separation is critical for injection resistance and consistent behavior.
Few-shot prompting improves consistency by showing the model representative examples of inputs and correct outputs. Use few-shot when labels, schemas, or style need reinforcement. However, too many examples can bloat the prompt and increase cost/latency; DP-100 items may hint at context window limits, pushing you toward shorter exemplars or retrieval of examples.
Chain-of-thought handling is a subtle exam area. You want good reasoning, but you should not depend on hidden reasoning being exposed. Many best practices involve asking for concise justifications or structured steps while avoiding requests to reveal sensitive internal reasoning. In exam answers, prioritize instructions that demand verifiable artifacts (citations, extracted fields, computed values) over “show your reasoning,” especially in regulated settings.
Tools/function calling concepts appear as “call an API,” “query a database,” or “use a calculator” patterns. The model should decide when to invoke a tool and return structured arguments; your evaluation then checks both the tool call correctness and final response. This is often more reliable than forcing the model to “remember” dynamic facts.
Common trap: mixing user content and instructions in one block. If a scenario mentions prompt injection risk (users can paste instructions), the correct mitigation is to harden system messages and isolate user-provided text as data, not instructions.
Exam Tip: If the requirement is “deterministic output,” look for settings and design choices like temperature reduction, constrained output formats, explicit schemas, and post-parse validation—prompting alone without validation is rarely sufficient.
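The post-parse validation mentioned in the tip above can be a small gate that rejects any response violating the schema. This sketch assumes the {"category","severity","next_action"} schema used elsewhere in this chapter; the function names are illustrative.

```python
import json

# Hard schema gate: prompting alone rarely guarantees format, so every
# LLM response is validated before it reaches downstream code.
REQUIRED_KEYS = {"category", "severity", "next_action"}

def validate_llm_output(text: str):
    """Return (parsed_object, None) on success or (None, reason) on failure."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None, "not valid JSON"
    if not isinstance(obj, dict):
        return None, "not a JSON object"
    missing = REQUIRED_KEYS - obj.keys()
    extra = obj.keys() - REQUIRED_KEYS
    if missing:
        return None, f"missing keys: {sorted(missing)}"
    if extra:
        return None, f"unexpected keys: {sorted(extra)}"
    return obj, None

ok, err = validate_llm_output(
    '{"category": "billing", "severity": "low", "next_action": "refund"}'
)
bad, err2 = validate_llm_output('{"category": "billing", "note": "extra prose"}')
```

A failed check can trigger a retry with a corrective instruction or a fallback path; either way, malformed output never propagates silently.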
RAG is the go-to pattern when the model must answer using proprietary or frequently updated information. The pipeline is: chunk documents, compute embeddings, store them in a vector index, retrieve top-k relevant chunks at query time, then ground the generation by including retrieved text in the prompt and instructing the model to cite sources. DP-100 questions typically test whether you know what improves retrieval quality versus what improves generation quality.
Chunking controls recall and precision. Smaller chunks improve precision but can lose context; larger chunks preserve context but can dilute relevance. A common practical approach is to chunk by semantic boundaries (headings/sections) and include overlap to preserve continuity. If an exam vignette says “retrieval returns irrelevant passages,” the fix is often chunk strategy, metadata filtering, or embeddings choice—not fine-tuning the LLM.
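The overlap idea above can be sketched with a simple sliding window. This is character-based for brevity; as noted, production chunkers usually split on semantic boundaries (headings/sections) before applying overlap.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50):
    """Sliding-window chunking with overlap to preserve continuity
    across chunk boundaries (character-based, for illustration)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # each window advances by chunk_size - overlap
    for start in range(0, max(len(text), 1), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window reached the end of the document
    return chunks

doc = "".join(str(i % 10) for i in range(500))  # stand-in for a document
chunks = chunk_text(doc, chunk_size=200, overlap=50)
```

Tuning chunk_size trades precision against context, and overlap guards against an answer being split across a boundary; both are retrieval-side levers, which is why "irrelevant passages" scenarios point here rather than at fine-tuning.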
Embeddings convert text to vectors; quality depends on the embedding model and consistent preprocessing. Normalizing text, keeping the same language, and storing metadata (document id, timestamp, access control tags) enable filtered retrieval—important when documents have different confidentiality levels.
Grounding means instructing the model to answer only from retrieved context and to say “not found” when the context lacks the answer. This reduces hallucinations and supports compliance. Citations further increase auditability; the exam may describe requirements like “provide policy section references,” which strongly indicates RAG with citation formatting and source tracking.
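Grounding and citation requirements are ultimately implemented in how the prompt is assembled. The sketch below shows one way to do it; the exact instruction wording and the chunk metadata fields (source, text) are illustrative assumptions, not a fixed API.

```python
def build_grounded_prompt(question: str, chunks: list) -> str:
    """Assemble a grounded prompt: restrict answers to retrieved context,
    require numbered citations, and define explicit 'not found' behavior."""
    context = "\n\n".join(
        f"[{i + 1}] ({c['source']}) {c['text']}" for i, c in enumerate(chunks)
    )
    return (
        "Answer ONLY from the context below. Cite sources as [n]. "
        "If the answer is not in the context, reply exactly: NOT FOUND.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "What is the refund window?",
    [{"source": "policy.md#refunds", "text": "Refunds are allowed within 30 days."}],
)
```

Because each chunk carries its source identifier, the model's citations can later be verified mechanically, which supports the "policy section references" audit requirement.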
Common trap: assuming RAG automatically prevents hallucinations. Without explicit grounding instructions, context selection, and evaluation, models can still fabricate. RAG is an architecture, not a guarantee.
Exam Tip: If the scenario includes “must respect document-level permissions,” your answer should mention security trimming/metadata filters in retrieval and careful logging practices—DP-100 expects governance thinking, not just vector search basics.
LLM optimization is not “prompt until it feels good.” DP-100-style questions reward candidates who turn quality into measurable evaluation. Start with a golden set: a curated collection of representative prompts with expected behaviors (and, for RAG, expected source documents). Golden sets should cover normal cases, edge cases, and policy-sensitive cases.
Offline metrics depend on the task. For extraction/classification, you can compute exact match, F1, and schema validity rates. For summarization, you may use faithfulness checks, coverage heuristics, and human scoring rubrics. For RAG, measure retrieval metrics (hit rate, precision@k) separately from generation metrics (citation correctness, groundedness). The exam often tests whether you can decompose the problem: if answers are wrong because retrieval misses the right chunk, tune retrieval; if retrieval is correct but answers omit key points, tune the prompt or response format.
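Separating retrieval metrics from generation metrics is easier to see with a concrete computation. This sketch computes precision@k and hit rate over a hypothetical golden set of (retrieved ids, relevant ids) pairs; the document ids are made up for illustration.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top = retrieved[:k]
    return sum(1 for doc in top if doc in relevant) / k

def hit_rate(eval_set):
    """Fraction of queries where at least one relevant document was retrieved."""
    hits = sum(
        1 for retrieved, relevant in eval_set
        if any(doc in relevant for doc in retrieved)
    )
    return hits / len(eval_set)

# Hypothetical golden set: per query, retrieved doc ids and the relevant set.
eval_set = [
    (["doc3", "doc7", "doc1"], {"doc7"}),  # relevant doc retrieved -> hit
    (["doc2", "doc5", "doc9"], {"doc4"}),  # relevant doc missed -> no hit
]
p = precision_at_k(["doc3", "doc7", "doc1"], {"doc7"}, k=3)
h = hit_rate(eval_set)
```

If hit rate is low, the fix lives in retrieval (chunking, embeddings, filters); if hit rate is high but answers are still wrong, the fix lives in the prompt or response format, exactly the decomposition the exam rewards.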
Human review remains essential for subjective quality, safety, and business acceptability. DP-100 items may frame this as “human-in-the-loop approval” or “spot-checking,” especially for high-stakes outputs like medical or financial guidance.
Regression testing protects you when you change prompts, chunking, embedding models, or model versions. In Azure ML terms, think experiment tracking, versioned assets, and repeatable evaluation pipelines so you can compare runs. A strong answer mentions maintaining baselines and re-running the golden set after every change.
Common trap: using only one metric like “average rating” and missing failure modes (format errors, missing citations, policy violations). The exam likes multi-metric evaluation aligned to requirements.
Exam Tip: When asked “how to improve reliability,” prefer answers that add automated checks (schema validation, citation verification, refusal compliance) rather than purely subjective review. Automation scales and is test-aligned.
Safety is not an optional add-on; it is an exam-relevant requirement. DP-100 expects you to incorporate responsible AI controls across design, deployment, and monitoring. Start with content filtering to reduce harmful outputs and to implement category-based policies (hate, violence, sexual content, self-harm). Combine filters with prompt-level policies (system rules) and refusal behavior for disallowed requests.
Data privacy shows up in scenarios involving PII, PHI, secrets, or internal documents. Practical controls include redaction before logging, limiting what is stored in prompts/traces, and minimizing retention. In RAG, avoid embedding highly sensitive fields if not required, and apply access controls to the vector index. For DP-100, the key is demonstrating that you recognize data exposure paths: prompts, retrieved chunks, logs, evaluations, and monitoring dashboards.
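Redaction before logging can be sketched with simple pattern substitution. The two regexes below (email and US-style SSN) are illustrative only; production systems typically use a dedicated PII-detection service rather than hand-rolled patterns.

```python
import re

# Hedged sketch: redact obvious PII patterns BEFORE text reaches logs or
# traces. Patterns are illustrative and deliberately simple.
PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def redact(text: str) -> str:
    """Replace each matched PII pattern with a placeholder token."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

logged = redact("Contact jane.doe@example.com, SSN 123-45-6789, about her claim.")
```

The same redaction step belongs at every exposure path listed above: prompts, retrieved chunks, logs, evaluation records, and monitoring dashboards.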
Jailbreak awareness refers to attempts to override system instructions (“ignore previous instructions…”). Mitigations include isolating user content, using allowlisted tools, restricting tool arguments, and running adversarial test prompts as part of evaluation. Don’t claim jailbreaks can be eliminated; the exam favors layered defenses and monitoring.
Monitoring should track both performance and safety: refusal rates, policy violations, hallucination signals (e.g., missing citations), latency/cost, and drift in user query patterns. Alerting and audit logs support incident response and compliance.
Common trap: assuming “private endpoint” equals “safe.” Network isolation helps, but you still need content policies, logging controls, and evaluation for harmful behaviors.
Exam Tip: If the scenario mentions “regulated industry,” look for controls that produce evidence: audit trails, documented evaluation results, and repeatable safety tests—not just “we added a disclaimer.”
This domain is frequently tested with scenario-based multiple choice where several options sound plausible. Under timed conditions, use a decision checklist: (1) identify the task type (classification/extraction/summarization/chat), (2) identify the primary failure mode (missing knowledge vs inconsistent format vs unsafe output), (3) map to the correct lever (RAG vs prompt vs fine-tune), and (4) confirm governance/safety constraints (privacy, citations, monitoring).
Expect distractors that over-prescribe heavy solutions. For example, “fine-tune a large model” is often a trap when the scenario calls for retrieval of fresh documents, or when the requirement is citations and auditability. Another common distractor is “increase temperature for creativity” when the requirement is consistency and structured output. Under time pressure, prioritize deterministic settings and validation mechanisms.
Exam Tip: When two answers both include an LLM approach, choose the one that also includes evaluation and monitoring. DP-100 rewards end-to-end thinking: build, measure, govern, and iterate.
Use your practice runs to track which signal words you miss (citations, permissions, retention, regression). Those missed signals are typically the difference between a correct and an almost-correct choice in this chapter’s objective area.
1. A healthcare provider is building an internal assistant that answers clinician questions using the latest internal treatment protocols. Protocols change weekly, and the assistant must cite the exact policy section used in each response. Which approach should you choose first to meet these requirements with minimal retraining overhead?
2. You are deploying an LLM-based customer support feature. The business requires responses to be in strict JSON with fields: {"category","severity","next_action"}. You notice the model occasionally adds extra keys and prose. What is the best first step to improve reliability without changing the model weights?
3. A financial services company wants an assistant to generate explanations of loan decisions. They must prevent disclosure of personally identifiable information (PII) and need ongoing monitoring for policy violations in production. Which set of controls best addresses safety and compliance requirements for an LLM app?
4. A retailer wants an LLM to write product descriptions in a consistent brand voice and format. The product facts come from a catalog database that updates daily. The business does not require citations, but the descriptions must always reflect current product attributes (price, availability, specs). Which architecture best fits this use case?
5. You are asked to define evaluation criteria for a RAG-based internal policy assistant before pilot deployment. The assistant must answer only using the provided policy documents and must refuse when the answer is not present. Which evaluation approach best aligns with these requirements?
This chapter is your capstone: you will run a full, timed mock exam in two parts, then convert the results into a targeted remediation plan aligned to DP-100 objectives. DP-100 rewards disciplined workflow knowledge more than memorization: you must recognize which Azure Machine Learning (Azure ML) feature fits the scenario, identify the minimal set of steps to implement it, and avoid “almost-right” answers that violate governance, security, or reproducibility expectations.
You will complete Mock Exam Part 1 and Part 2 under strict timing, then perform a structured review that maps every miss (and every lucky guess) to the relevant domain: solution design, data/experimentation, training/deployment, LLM optimization, and lifecycle governance. The final sections give you a last-pass cram sheet and an exam-day checklist so you can execute with calm pacing and consistent answer selection.
Throughout, treat each question as a mini-consulting engagement: what is the business goal, what is the constraint (cost, latency, compliance, reproducibility), what Azure ML component is implied (workspace, compute, datastore, data asset, environment, pipeline, endpoint, registry), and what action is being tested (configure, secure, monitor, troubleshoot). Your goal is not just to “get it right,” but to get it right for the reason the exam writers expect.
Practice note for Mock Exam Part 1: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Mock Exam Part 2: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Weak Spot Analysis: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Exam Day Checklist: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Run the mock exam like the real DP-100: one sitting, closed notes, and a strict timebox. The exam often blends scenario-based items with configuration and troubleshooting prompts; success depends on pacing and avoiding deep rabbit holes. Plan for two passes: an “answer-now” pass to secure easy and medium points, then a “review/resolve” pass for flagged items.
Suggested pacing targets: allocate ~75% of time for the first pass and ~25% for review. If you find yourself rereading a scenario three times, flag it and move on. Most items contain a small number of decisive keywords (for example: “reproducible,” “regulated,” “low-latency,” “managed identity,” “private endpoint,” “online endpoint,” “batch scoring,” “feature drift,” “responsible AI,” “prompt injection”). Train yourself to hunt these words and map them to Azure ML primitives.
Exam Tip: Use a “stop-loss” rule: if you cannot reduce to two options within 60–90 seconds, flag and defer. The DP-100 is designed so you can recover points later by staying on schedule.
Mock rules: no internet, no documentation, and no code execution. This constraint forces you to internalize patterns: when to use pipelines versus notebooks, where to register models, how to choose managed online endpoints versus Kubernetes, and what monitoring artifacts look like in Azure ML. Track your confidence per item (high/medium/low) so your review time focuses on high-impact uncertainty.
Part 1 should feel “bread-and-butter” DP-100: translating requirements into Azure ML architecture, setting up data and compute, and running experiments with tracked lineage. Expect scenarios where the correct answer is less about ML theory and more about operationalizing the right Azure ML object: Data assets for managed datasets, Environments for dependency control, and Jobs (command/components) for repeatable execution.
Focus on these exam-tested patterns. First, identity and access: if a scenario mentions enterprise security, assume managed identity, least privilege, and avoiding embedded secrets. Second, reproducibility: if they mention auditability or repeatable training, favor pipelines, versioned data assets, pinned environments, and model registry usage. Third, experimentation: if they emphasize comparison across runs, interpret it as a prompt to use MLflow tracking, run metrics, and tags to capture hyperparameters and dataset versions.
Exam Tip: In Microsoft-style scenarios, the “right” option often uses the smallest number of services while still meeting governance. If two answers both work, the exam typically rewards the one that stays inside Azure ML managed capabilities (assets, jobs, pipelines, endpoints) rather than custom glue.
Common traps in Part 1: confusing “datastore” (storage connection) with “data asset” (versioned, reusable dataset reference), and confusing “compute instance” (interactive dev) with “compute cluster” (scalable training). Also watch for answers that ignore cost controls (for example, leaving compute running) or that break isolation (for example, sharing credentials in code). Your job is to spot the hidden constraint and pick the option that respects it.
Part 2 should feel sharper: deployment edge cases, monitoring, data/model drift, and LLM solution optimization and safety. Here the exam often tests whether you can select between online vs batch endpoints, manage rollouts, and reason about operational telemetry. When latency and real-time inference are explicit, managed online endpoints are usually implied; when scoring large historical datasets, batch endpoints or pipeline-based scoring workflows are a better fit.
Expect items where the “trap” is choosing a technically correct approach that fails the stated non-functional requirements: private networking, regulated data handling, or reproducibility. For example, if the scenario mentions private access, look for private endpoints, VNet integration, and disabling public network access in the workspace and dependent resources. If it mentions rollback or safe deployment, think blue/green deployments, traffic splitting, and separate deployments under one endpoint.
LLM-related items can appear as “optimize language models for AI applications” objectives: prompt design, evaluation, and safety. The exam tends to reward structured evaluation (golden datasets, offline evaluation, and continuous monitoring) and safety mitigations (input/output filtering, prompt injection defenses, and grounded responses with citations where applicable). Avoid answers that treat prompt tweaks as a substitute for evaluation or that ignore content safety requirements.
Exam Tip: If an answer choice sounds like “just do it manually,” be skeptical. DP-100 repeatedly favors automated, repeatable processes: pipelines for training/scoring, registered models for deployment consistency, and monitoring hooks that produce measurable signals (latency, failure rate, drift metrics) rather than ad hoc checks.
Edge-case traps: mixing up Azure ML registries with workspace-level model registration, assuming that “AKS is required” for all production workloads (managed online endpoints are often sufficient), and overlooking environment pinning (unversioned dependencies can invalidate reproducibility claims). Part 2 is where exam writers reward operational maturity.
Your score is not the main output; your error log is. Immediately after finishing both parts, do a structured review while your reasoning is fresh. For every missed item and every guess, write: (1) what keyword(s) in the prompt should have guided you, (2) what DP-100 objective domain it maps to, (3) what you chose and why, and (4) what the correct reasoning pattern is. The goal is to convert confusion into a repeatable decision rule.
Map each item to the course outcomes: solution design, experimentation, training/deployment, LLM optimization, and governance. You should end with a ranked list of weak spots. For example, if you repeatedly miss “data asset vs datastore,” schedule a focused drill on Azure ML asset types and versioning. If you miss “online endpoint rollout,” drill traffic splitting, deployment slots, and monitoring signals.
Exam Tip: Do not stop at “I forgot.” Replace it with “I will recognize it next time because…” and write the recognition cue (for example: “auditability” → versioned assets + pipeline + environment pinning; “no public access” → private endpoint/VNet + managed identity).
Update your remediation plan: 30–45 minutes on the top two weak domains, then a short targeted re-test of only those concepts (not a full exam). Also annotate any “distractor patterns” that fooled you (for example, answers that mention many services, or answers that sound advanced but ignore the scenario constraint). This method is how you convert a mock exam into score growth.
Use this cram sheet as a last review pass. Keep it tight: definitions, decision rules, and traps.
Exam Tip: When you see “must be repeatable,” “audit,” or “traceability,” your answer should mention versioning (data/model/environment) and tracked execution (jobs/pipelines), even if the question is framed as a simple configuration choice.
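Recognition cues like these can be kept as an explicit lookup for cram review. The mapping below is a study aid assembled from the cues mentioned in this chapter, not official Microsoft guidance; the keyword strings and construct names are illustrative.

```python
# Hypothetical recognition cues: scenario keyword -> constructs the correct
# answer usually mentions. A personal study aid, not official guidance.
recognition_cues = {
    "repeatable": ["versioned data/model/environment", "pipelines"],
    "audit": ["versioned assets", "tracked jobs"],
    "no public access": ["private endpoint/VNet", "managed identity"],
    "drift": ["data drift monitoring signals"],
}

def cues_for(prompt):
    """Return every construct triggered by keywords found in the prompt."""
    hits = []
    for keyword, constructs in recognition_cues.items():
        if keyword in prompt.lower():
            hits.extend(constructs)
    return hits

print(cues_for("The solution must be repeatable and support audit."))
```

During the final sweep, scanning a question stem against a cue list like this is faster and more reliable than re-deriving the reasoning from scratch.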
Finally, rehearse how you eliminate distractors: remove options that violate a stated constraint, require unnecessary services, or describe a process that cannot be operationalized (no monitoring, no identity model, no versioning). DP-100 is as much about disciplined engineering as it is about modeling.
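The elimination discipline above can be rehearsed as an explicit checklist. This sketch encodes the three checks in order; the option fields and example data are invented for illustration, and real exam options obviously arrive as prose, not dictionaries.

```python
# Sketch of the distractor-elimination checklist as code. Fields are
# invented for illustration; only the order of the checks matters.
def eliminate(option, stated_constraints):
    """Return a reason to discard the option, or None if it survives."""
    if option["violates"] & stated_constraints:
        return "violates a stated constraint"
    if option["extra_services"] > 0:
        return "requires unnecessary services"
    if not (option["versioned"] and option["monitored"]):
        return "cannot be operationalized"
    return None

constraints = {"private network only"}
options = [
    {"name": "A", "violates": {"private network only"}, "extra_services": 0,
     "versioned": True, "monitored": True},
    {"name": "B", "violates": set(), "extra_services": 0,
     "versioned": True, "monitored": True},
]
for opt in options:
    reason = eliminate(opt, constraints)
    print(opt["name"], "discard:" if reason else "keep", reason or "")
```

Running the checks in this order matters: a hard constraint violation discards an option immediately, before you waste time weighing its other merits.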
Go into exam day with a checklist you can execute under pressure.
Exam Tip: If two answers both sound plausible, ask: “Which one produces a versioned, monitored, least-privilege, repeatable outcome with fewer moving parts?” That question aligns closely to DP-100 scoring intent.
Confidence plan: expect a few unfamiliar phrasings. Do not spiral—DP-100 often repeats the same underlying patterns with new wording. Anchor on the keywords (latency, batch, private, audit, drift, rollout) and select the option that satisfies the requirement with Azure ML-native constructs. Finish with a final sweep of flagged items, then submit without second-guessing answered-and-supported choices.
1. You are reviewing results from a timed DP-100 mock exam. Several missed questions involved deploying models with consistent dependencies across dev and prod. You want a remediation task that directly improves reproducibility for training and deployment in Azure ML. What should you focus on first?
2. A team must run a full mock exam in two parts and then perform a structured weak-spot analysis mapped to DP-100 domains (design, data/experimentation, training/deployment, LLM optimization, governance). Which review approach best aligns with exam expectations?
3. A financial services company is preparing for production deployment and must ensure only approved models are deployed across multiple workspaces while maintaining auditability. Which Azure ML feature best supports this governance requirement?
4. You are creating an exam-day checklist for DP-100 and want to reduce errors caused by choosing "almost-right" answers. In a scenario question that includes compliance and reproducibility constraints, what should you verify before selecting an answer involving data usage in Azure ML?
5. During the mock exam, you consistently miss questions where the correct answer involves "minimal steps" to operationalize an ML workflow. A scenario asks you to automate repeatable training and evaluation with clear lineage between steps. Which Azure ML component is the most appropriate starting point?