Supply Chain Stress Testing: AI Inventory Scenario Modeling

Career Transitions Into AI — Intermediate

Turn planning instincts into AI-driven inventory simulations in 6 chapters.

Intermediate supply-chain · inventory · scenario-modeling · monte-carlo

Why this course exists

Most supply chain planning work already involves scenario thinking: “What if demand spikes?”, “What if lead time stretches?”, “What safety stock actually protects service?”. The gap is that many teams answer these questions with static spreadsheets, point forecasts, and single-number buffers—tools that hide uncertainty instead of measuring it. This book-style course teaches you how to become an AI-adjacent scenario modeler by building inventory stress tests using simulations.

You’ll learn how to convert planner instincts into a structured modeling workflow: define decisions, quantify uncertainty, run Monte Carlo experiments, and present results as ranges and trade-offs that leaders can act on. The focus is practical and career-oriented: you will finish with a reusable notebook template and a portfolio-ready scenario analysis.

What you’ll build

Across six chapters, you will progressively create an inventory simulation model that can answer questions like:

  • What is our stockout risk under longer lead times?
  • How much working capital is tied up to achieve a target service level?
  • Which parameters (demand volatility, supplier reliability, MOQ) drive our outcomes the most?
  • What happens under disruptions like port delays, supplier failures, or demand shocks?

By the end, you will have a scenario library (baseline, upside, downside, disruption), consistent experiment outputs, and stakeholder-ready summaries of service vs cost trade-offs.

How the learning progresses (the “book” arc)

Chapter 1 turns your planning knowledge into a modelable system: clear KPIs, system boundaries, assumptions, and a scenario charter. Chapter 2 builds data foundations—clean inputs, distribution choices, and validation checks—so your simulation doesn’t become “garbage in, garbage out.”

Chapter 3 is where you implement Monte Carlo inventory simulations in Python: inventory position logic, pipeline orders, backorders, and service metrics over many trials. Chapter 4 adds experiment design and disruption scenarios so you can compare policy choices fairly and communicate uncertainty with the right visuals and summaries.

Chapter 5 introduces validation and model risk: backtesting, sensitivity analysis, and assumption stress testing—exactly what stakeholders will ask before trusting results. Chapter 6 helps you package everything into a portfolio project and career narrative, connecting scenario modeling to broader AI paths like forecasting and optimization.

Who this is for

This course is designed for supply chain planners, inventory analysts, buyers, operations analysts, and planners-in-transition who want a credible entry point into AI and data science without losing the business context. If you can explain service levels and replenishment policies, you already have the domain advantage—this course teaches you how to quantify uncertainty and ship a simulation-driven decision tool.

How to get started

You’ll need a laptop that can run Python notebooks and basic comfort with spreadsheets. If you’re ready to formalize your scenario work into repeatable simulations, start here: Register free. Prefer to compare options first? You can also browse all courses.

Outcome

After completing this course, you will be able to build and defend inventory stress tests using simulation, explain results in business language, and present a portfolio artifact that supports a career transition into AI-adjacent supply chain roles.

What You Will Learn

  • Translate supply chain planning questions into simulation-ready problem statements
  • Build Monte Carlo simulations for inventory stress testing (demand and lead time variability)
  • Model key policies: reorder point, order-up-to level, safety stock, and review cycles
  • Estimate service levels, fill rate, stockout risk, and working-capital impact from simulations
  • Design scenario libraries (baseline, upside, disruption) and compare outcomes consistently
  • Validate assumptions, perform sensitivity analysis, and communicate uncertainty to stakeholders
  • Create a reusable Python notebook template for scenario modeling and reporting
  • Position your experience for AI-adjacent roles: scenario modeler, analytics translator, planning data analyst

Requirements

  • Comfort with basic supply chain concepts (demand, lead time, safety stock, service level)
  • Basic Excel skills (filters, pivots, simple formulas) for quick checks
  • Intro-level Python familiarity (variables, loops, pandas basics) or willingness to learn
  • A laptop with Python available (Anaconda or similar) and ability to run notebooks

Chapter 1: From Planner to Scenario Thinker

  • Define the business decision and KPI: service vs cash vs risk
  • Map the inventory system: item, location, policy, constraints
  • Choose the uncertainty sources to simulate (demand, lead time, yield)
  • Set the baseline dataset and data dictionary
  • Draft your scenario charter and acceptance criteria

Chapter 2: Data Foundations for Inventory Simulation

  • Assemble item-location history and clean the time series
  • Estimate distributions for demand variability
  • Estimate lead time distributions and supplier reliability inputs
  • Create a parameter table for policies and constraints
  • Build sanity checks and back-of-the-envelope validations

Chapter 3: Monte Carlo Inventory Stress Tests in Python

  • Implement a single-item simulation loop end-to-end
  • Add randomness for demand and lead time with seeded runs
  • Track inventory position, pipeline, and backorders correctly
  • Compute service metrics across many trials
  • Package the simulation into reusable functions

Chapter 4: Scenario Design and Comparative Experiments

  • Build a scenario library with controlled parameter changes
  • Run experiments and generate comparable result tables
  • Quantify trade-offs: cost vs service vs volatility
  • Add disruption events (port delay, supplier failure, demand shock)
  • Create executive-ready scenario summaries

Chapter 5: Validation, Sensitivity, and Model Risk

  • Validate the baseline against historical performance
  • Run sensitivity analysis on key drivers and parameters
  • Stress test model assumptions and document limitations
  • Calibrate safety stock and reorder points from simulation outputs
  • Create a model risk checklist for stakeholders

Chapter 6: Ship a Scenario Modeler Portfolio Project

  • Finalize a complete notebook project with documentation
  • Create a lightweight results dashboard and narrative
  • Prepare stakeholder readouts: ops, finance, and leadership
  • Define how this fits AI: roadmap to forecasting and optimization
  • Translate the project into resume bullets and interview stories

Sofia Chen

Supply Chain Data Scientist, Simulation & Risk Modeling

Sofia Chen is a supply chain data scientist who builds simulation-based decision tools for inventory, service levels, and disruption risk. She has led scenario modeling initiatives across retail and manufacturing, translating planner workflows into reproducible analytics in Python. Her teaching focuses on practical models, clear assumptions, and business-ready outputs.

Chapter 1: From Planner to Scenario Thinker

Traditional supply chain planning rewards decisiveness: pick a forecast, place an order, hit a service target. AI-enabled stress testing rewards a different skill: turning ambiguous planning questions into simulation-ready problem statements that make uncertainty explicit. In this course, you will move from “What should I order?” to “What ordering policy is robust across demand and lead time variability, and what is the service/cash/risk trade-off?”

This chapter establishes the working mindset and the practical workflow you will use throughout the course. You will (1) define the business decision and KPI—usually a tension between service, working capital, and risk; (2) map the inventory system—item, location, replenishment policy, constraints; (3) select uncertainty sources worth simulating—demand, lead time, and sometimes yield; (4) establish a baseline dataset and data dictionary so models are repeatable; and (5) draft a scenario charter with acceptance criteria so stakeholders know what “good enough” looks like before you build.

The output of Chapter 1 is not code. It is a crisp scenario design that an engineer can implement and a planner can defend: a shared definition of the system, the KPIs, the uncertainties, and the comparison method for baseline versus stress scenarios. That is what makes Monte Carlo simulations useful, rather than interesting-but-irrelevant.

Practice note for the Chapter 1 milestones (define the business decision and KPI; map the inventory system; choose the uncertainty sources; set the baseline dataset and data dictionary; draft the scenario charter and acceptance criteria): for each milestone, document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 1.1: Planner mental models vs probabilistic models

Planners often operate with a deterministic mental model: a single demand number for next week, a standard lead time, and an implicit assumption that “exceptions” will be handled when they occur. That approach works when variability is low, expediting is easy, and the cost of being wrong is modest. Stress testing starts where deterministic planning breaks: variability is not an exception, it is the system.

A probabilistic model does not replace planning judgment; it formalizes it. Instead of asking “What is the forecast?”, you ask “What distribution of demand should we assume, and how does it change under different scenarios?” Instead of “Lead time is 14 days,” you ask “What is the probability lead time exceeds 21 days, and what happens to service when it does?” Monte Carlo simulation is the practical bridge: you run thousands of plausible futures and summarize outcomes (service, fill rate, stockouts, cash) as distributions, not single values.

Common mistake: treating simulation as a forecasting contest. The goal is not to predict the next month perfectly; it is to evaluate decisions and policies under uncertainty. A good simulation is decision-centric: it has a clear replenishment policy (reorder point, order-up-to, review cycle), clear constraints (MOQ, pack, capacity), and clear measures of success.

  • Planner question: “Should we increase safety stock?”
  • Simulation-ready statement: “For item X at DC Y with (s,S) weekly review, compare safety stock multipliers 0.8×–1.5× under demand and lead time variability; report fill rate, stockout days, and average inventory value.”

That translation—from vague intent to a testable, auditable model—is the first career-transition skill you are building.
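A simulation-ready statement like the one above can also be captured as a small machine-readable spec, so the model, the experiment runner, and the readout all agree on what is being tested. This is a sketch; the field names are illustrative, not a required schema:

```python
from dataclasses import dataclass

@dataclass
class ScenarioSpec:
    """A simulation-ready statement as a machine-readable spec.

    Field names here are illustrative assumptions, not a standard schema.
    """
    item: str
    location: str
    policy: str           # e.g. "(s,S) weekly review"
    lever: str            # the decision variable under test
    lever_values: tuple   # values of the lever to compare
    uncertainties: tuple  # sources of variability to simulate
    kpis: tuple           # metrics reported for every run

# The example from the bullet list, encoded once and reused everywhere.
spec = ScenarioSpec(
    item="X",
    location="DC Y",
    policy="(s,S) weekly review",
    lever="safety_stock_multiplier",
    lever_values=(0.8, 1.0, 1.2, 1.5),
    uncertainties=("demand", "lead_time"),
    kpis=("fill_rate", "stockout_days", "avg_inventory_value"),
)
```

Keeping the spec as a data object (rather than prose in a slide) is what makes the model auditable later: anyone can read exactly which lever values and KPIs were in scope.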

Section 1.2: Inventory KPIs that matter under uncertainty

Stress testing starts by defining the business decision and the KPI. Under uncertainty, “improve service” is not specific enough because service has multiple definitions, and each pushes policy choices differently. Align on the KPI set before you simulate, or you will optimize the wrong thing with impressive rigor.

Three tensions show up repeatedly: service (customer impact), cash (working capital), and risk (tail events). Service is commonly measured as cycle service level (probability of no stockout in a cycle) and fill rate (fraction of demand immediately filled). These diverge under lumpy demand: you can have frequent small stockouts (low cycle service) but still fill most units (high fill rate). Cash is typically captured by average on-hand and on-order inventory value, sometimes converted to working capital or carrying cost. Risk is captured by stockout probability, expected stockout days, backorder magnitude, and tail percentiles (e.g., 95th percentile of backorders).

Common mistake: reporting averages only. Averages hide stress. Two policies with the same average inventory can have very different worst-week outcomes. In stakeholder language, translate distributions into decision-relevant statements: “Policy A meets 98% fill rate on average but has a 10% chance of exceeding 5 days of stockout in a month; Policy B reduces that to 2% at the cost of $120k more inventory.”

Practical workflow: define (1) primary KPI (e.g., fill rate), (2) guardrail KPIs (e.g., max stockout days, budget cap), and (3) the time basis (daily vs weekly). Your simulation outputs should match these definitions exactly, including how you treat lost sales vs backorders—an assumption that dramatically changes service calculations.
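These KPI definitions can be pinned down in a few lines so the simulation reports exactly what was agreed. A minimal sketch (function and variable names are illustrative) that also shows how lumpy demand makes cycle service level and fill rate diverge:

```python
import numpy as np

def service_kpis(demand, filled):
    """Cycle service level and fill rate from per-period arrays.

    demand: units demanded in each period; filled: units immediately filled.
    A period with filled < demand counts as a stockout period.
    """
    demand = np.asarray(demand, dtype=float)
    filled = np.asarray(filled, dtype=float)
    stockout = filled < demand
    cycle_service = 1.0 - stockout.mean()    # share of periods with no stockout
    fill_rate = filled.sum() / demand.sum()  # share of units immediately filled
    return cycle_service, fill_rate

# Lumpy demand example: frequent small misses, but most units still fill.
demand = [100, 5, 5, 5, 5]
filled = [100, 4, 4, 4, 4]
cs, fr = service_kpis(demand, filled)  # cs = 0.2, fr ≈ 0.967
```

The toy numbers make the divergence concrete: four of five periods stock out (cycle service 20%), yet 116 of 120 units ship immediately (fill rate about 97%). Which number you report is a business decision, not a modeling detail.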

Section 1.3: System boundaries and assumptions checklist

Before building scenarios, map the inventory system in a way that is precise enough to simulate. Start with boundaries: what item(s), what location(s), what time horizon, and what replenishment loop. A “system map” for stress testing is less about drawing arrows and more about encoding rules: when do you review inventory, when do you place orders, what constraints apply, and how do receipts arrive.

At minimum, document: item (SKU, unit of measure), location (DC/store/plant), policy (reorder point, order-up-to level, safety stock logic, review cycle), and constraints (MOQ, order multiples, capacity, supplier allocation, shelf life). Then define what your simulation will and will not include. For example, are you modeling transfers between DCs? Are you modeling expediting as an emergency lead time reduction with added cost? Are you modeling partial shipments?

Uncertainty sources should be explicit: demand variability, lead time variability, and possibly yield variability (especially for manufacturing, perishable goods, or quality-sensitive suppliers). A useful rule: simulate any uncertainty that can plausibly change the decision. If lead time is stable but demand is highly volatile, focus there. If demand is steady but port delays drive tail risk, lead time tails matter more than mean.

  • Lost sales vs backorders: decide and document.
  • Time granularity: daily demand with weekly ordering behaves differently from weekly demand with weekly ordering.
  • Inventory position definition: on-hand + on-order − backorders; ensure the code matches the definition.
  • Warm-up period: decide whether to start at current on-hand or run a burn-in to steady state.

Common mistake: silently changing boundaries between scenarios. If the baseline assumes no expediting but the disruption scenario includes expediting, the comparison mixes policy changes with environmental changes. Keep boundaries stable; vary one lever at a time unless your charter explicitly allows combined changes.
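The inventory-position definition from the checklist is worth enforcing as a runtime invariant, so the code cannot silently drift from the documented assumption. A minimal sketch (state keys are illustrative):

```python
def inventory_position(on_hand, on_order, backorders):
    """Inventory position per the checklist: on-hand + on-order - backorders."""
    return on_hand + on_order - backorders

def check_position(state):
    """Assert the simulated state matches the documented definition.

    This is a simulation invariant, not an ERP field; the dict keys
    here are illustrative names.
    """
    expected = inventory_position(
        state["on_hand"], state["on_order"], state["backorders"]
    )
    assert state["position"] == expected, (
        f"inventory position {state['position']} != definition {expected}"
    )

state = {"on_hand": 40, "on_order": 60, "backorders": 15, "position": 85}
check_position(state)  # passes: 40 + 60 - 15 == 85
```

Running a check like this at every simulated period costs almost nothing and catches an entire class of bookkeeping bugs before they reach a stakeholder chart.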

Section 1.4: Data you need (and what to do when you don’t have it)

Monte Carlo simulation is only as credible as its baseline dataset and its data dictionary. The goal is not perfect data; the goal is to make assumptions visible, consistent, and revisable. Start by listing what is required to run the replenishment logic and compute KPIs.

Baseline dataset essentials typically include: historical demand by time bucket; lead time history (order date to receipt date) or at least mean and variability; current on-hand and open purchase orders; item cost (for working capital); policy parameters (reorder point, order-up-to, safety stock, review cycle); and constraints (MOQ, multiples, supplier capacity). Your data dictionary should define each field, unit of measure, time zone/calendar, and how missing values are handled. This sounds administrative, but it prevents silent errors like mixing cases and units, or interpreting lead time in business days when the system uses calendar days.

When you don’t have data, use structured substitutes rather than guesses. For demand, if history is short, fit a simple distribution by category (e.g., normal for stable items, negative binomial for intermittent, empirical bootstrapping for lumpy). For lead time, if only standard lead time exists, introduce variability using a plausible coefficient of variation and test sensitivity. For yield, start with a pass-rate assumption and model occasional failures if quality issues are known.

Common mistake: using a single “average demand” and “average lead time” as inputs and still calling the result a stress test. The point is to preserve variability. Even a rough distribution (e.g., bootstrap historical demand and a triangular lead time) is better than a deterministic average—provided you label it clearly in the scenario charter and test sensitivity.

Practical outcome of this section: a baseline table you can hand to any teammate, plus a data dictionary that makes your simulation repeatable across tools and months.
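The structured substitutes above can be sketched in a few lines: bootstrap demand from history (zeros included), and spread a plausible triangular distribution around the standard lead time when no lead-time history exists. The spread factors here are assumptions to be labeled in the charter and sensitivity-tested, not recommendations:

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded so runs are reproducible

def draw_demand(history, n):
    """Bootstrap demand: resample observed buckets, zeros included."""
    return rng.choice(np.asarray(history), size=n, replace=True)

def draw_lead_time(standard, n, low_factor=0.8, high_factor=1.8):
    """No lead-time history: a triangular distribution around the
    standard lead time. low_factor/high_factor are assumptions."""
    lt = rng.triangular(low_factor * standard, standard,
                        high_factor * standard, size=n)
    return np.rint(lt).astype(int)  # whole days

history = [0, 12, 7, 0, 30, 9, 14, 0, 11, 8]  # illustrative demand buckets
demand = draw_demand(history, 1000)
lead_times = draw_lead_time(14, 1000)
```

Even this rough setup preserves variability, which is the whole point; a deterministic "average demand, average lead time" run would erase exactly the risk you are trying to measure.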

Section 1.5: Scenario framing: baseline, upside, downside, disruption

A scenario library is how you keep stress testing from becoming an endless one-off analysis. Scenarios should be named, parameterized, and comparable. The simplest library that still drives good decisions includes: baseline, upside, downside, and disruption. Each scenario is a controlled change to the uncertainty model and/or constraints, while the inventory policy under test remains the decision variable.

Baseline is not “business as usual” in prose; it is a specific dataset version and distribution choice. Upside might mean higher mean demand with similar variability, or increased promotion-driven volatility. Downside could mean demand drop plus higher intermittency (more zero-demand periods) that increases obsolescence risk. Disruption usually targets lead time tail risk: longer delays, higher variance, or occasional extreme events (port closure, allocation). Yield disruption can be modeled similarly (higher defect rate, occasional batch failures).

Write a scenario charter for each scenario: purpose, changes from baseline, parameters, time horizon, and acceptance criteria. Acceptance criteria define what decisions the scenario must support (e.g., “Compare (s,S) weekly vs continuous review on fill rate and working capital; results must be stable within ±0.2% fill rate across random seeds”). This prevents “analysis drift,” where stakeholders keep adding requirements after results appear.

  • Keep comparability: same KPI definitions, same horizon, same service calculation rules.
  • Control randomness: use common random numbers (same random streams) across policy comparisons when possible.
  • Document levers: demand mean shift, demand variance multiplier, lead time distribution change, capacity cap, MOQ change.

Common mistake: mixing a scenario (environment) with a mitigation (decision). For example, “disruption with expediting” is two changes. Split it: one scenario for disruption, and a policy variant that adds expediting logic. Then you can quantify how much the mitigation helps.
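A scenario library that keeps environments and mitigations separate can be as simple as a baseline parameter set plus named overrides. This is a sketch; the parameter names and multiplier values are illustrative:

```python
# Baseline is a specific parameterization, not prose.
BASELINE = {
    "demand_mean_mult": 1.0,
    "demand_var_mult": 1.0,
    "lead_time_dist": "empirical_v3",   # a dataset/distribution version tag
}

# Each scenario is a controlled change to the environment only.
SCENARIOS = {
    "baseline": {},
    "upside": {"demand_mean_mult": 1.15, "demand_var_mult": 1.3},
    "downside": {"demand_mean_mult": 0.85, "demand_var_mult": 1.2},
    "disruption_lt_tail": {"lead_time_dist": "empirical_v3_tail_x2"},
}

def resolve(name):
    """Baseline plus this scenario's overrides; everything else stays fixed."""
    params = dict(BASELINE)
    params.update(SCENARIOS[name])
    return params

# Mitigations such as expediting are policy variants, evaluated per scenario,
# never folded into the scenario definition itself.
POLICIES = ["sS_weekly", "sS_weekly_expedite"]
```

Because `resolve` always starts from the same baseline, any two scenarios differ only by their documented overrides, which is exactly the comparability discipline the charter demands.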

Section 1.6: Reproducibility: notebooks, versioning, and audit trails

Scenario modeling is decision support, which means your work will be questioned—often months later, sometimes by people who were not in the room. Reproducibility is not optional. Treat every simulation like a lightweight product: it needs inputs, configuration, outputs, and an audit trail.

Use a notebook (or scripted pipeline) to make the workflow explicit: load baseline dataset (versioned), validate schema, generate simulated demand and lead time paths, run the inventory policy, compute KPIs, and save results. Store configuration in a human-readable file (YAML/JSON): scenario name, parameter values, random seed(s), time horizon, and KPI definitions. This allows you to rerun “Baseline v3 + Disruption LT_tail v1” exactly, rather than recreating it from memory.

Version everything that changes outcomes: raw extracts, cleaned tables, and code. If you cannot use a full data versioning tool, at least timestamp and hash input files and record them in the output metadata. Build simple validation checks: no negative demand, lead times within expected bounds, inventory position consistency, and reconciliation of starting inventory with ERP snapshots.

Common mistake: presenting a single chart without the run context. Always attach: scenario charter reference, dataset version, seed, number of trials, and policy parameters (reorder point, order-up-to, safety stock method, review cycle). Practical outcome: stakeholders gain confidence because they can trace results to assumptions, and you can iterate quickly—changing one assumption at a time and showing how sensitive the decision is to uncertainty.
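The audit trail described above can be generated automatically at the end of every run. A minimal sketch, assuming the input data lives in a file you can hash (paths and field names are illustrative):

```python
import hashlib
import json
import os
import tempfile
import time

def file_hash(path):
    """SHA-256 of an input file, so results trace back to exact data."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def run_metadata(scenario, dataset_path, seed, n_trials, policy_params):
    """Context to attach to every saved result table and chart."""
    return {
        "scenario": scenario,
        "dataset": dataset_path,
        "dataset_sha256": file_hash(dataset_path),
        "seed": seed,
        "n_trials": n_trials,
        "policy": policy_params,
        "run_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }

# Demo: a tiny stand-in dataset written to a temp location.
demo = os.path.join(tempfile.gettempdir(), "baseline_v3_demo.csv")
with open(demo, "w") as f:
    f.write("item,week,demand\nX,1,12\n")

meta = run_metadata("baseline", demo, seed=7, n_trials=5000,
                    policy_params={"s": 40, "S": 120})
print(json.dumps(meta, indent=2))  # human-readable audit record
```

Saving this metadata alongside every output means "Baseline v3 + Disruption LT_tail v1" can be rerun exactly months later, which is the difference between a chart and a defensible result.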

Chapter milestones
  • Define the business decision and KPI: service vs cash vs risk
  • Map the inventory system: item, location, policy, constraints
  • Choose the uncertainty sources to simulate (demand, lead time, yield)
  • Set the baseline dataset and data dictionary
  • Draft your scenario charter and acceptance criteria
Chapter quiz

1. How does AI-enabled supply chain stress testing change the core planning question compared with traditional planning?

Correct answer: It shifts from choosing a single best forecast/order to testing which ordering policy is robust under uncertainty and understanding service/cash/risk trade-offs.
The chapter emphasizes moving from a single-point decision to a simulation-ready question that makes uncertainty explicit and evaluates trade-offs.

2. When defining the business decision and KPI in a scenario design, what tension is most commonly surfaced?

Correct answer: Service level versus working capital versus risk.
Chapter 1 frames KPI definition as balancing service, cash (working capital), and risk.

3. What does it mean to “map the inventory system” for scenario modeling?

Correct answer: Specify the item, location, replenishment policy, and constraints that define the system being simulated.
System mapping is about defining the scope and mechanics of inventory behavior: item/location, policy, and constraints.

4. Which set of uncertainty sources aligns with what Chapter 1 recommends simulating?

Correct answer: Demand and lead time variability, and sometimes yield.
The chapter highlights demand and lead time as core uncertainties, with yield included in some cases.

5. Why does Chapter 1 require a baseline dataset, data dictionary, and a scenario charter with acceptance criteria before building models?

Correct answer: To make simulations repeatable and ensure stakeholders agree on what “good enough” means and how baseline vs stress scenarios will be compared.
The workflow is designed to produce a defendable, implementable scenario design (not code) with clear data definitions and success criteria.

Chapter 2: Data Foundations for Inventory Simulation

Inventory stress testing lives or dies on data foundations. Monte Carlo simulation can absorb uncertainty in demand and lead time, but it cannot rescue misaligned time buckets, inconsistent units, or policy parameters that were never made explicit. This chapter turns “planning questions” into simulation-ready inputs: clean item-location histories, realistic distributions, and a parameter table that encodes how the business actually orders and receives goods.

Your goal is not perfect forecasting. Your goal is a dataset and set of assumptions that allow you to ask: “If demand and lead time vary like this, and we follow policy X under constraints Y, what service level, fill rate, stockout risk, and working-capital exposure should we expect?” To answer that, you will assemble an item-location time series, estimate demand variability distributions, estimate lead-time distributions (including supplier reliability), and create a parameter table for reorder points, order-up-to levels, safety stock, and review cycles. You will also build sanity checks—back-of-the-envelope validations that catch obvious errors before you simulate thousands of scenarios.

Two engineering judgments come up repeatedly. First: decide what variability belongs in the model versus what should be cleaned away (e.g., one-time data entry mistakes). Second: decide what level of aggregation preserves decision relevance (daily vs weekly, SKU-store vs SKU-region). The sections below give a practical workflow and the pitfalls that typically mislead new practitioners—especially those transitioning into AI roles and relying on automated pipelines.

Practice note for the Chapter 2 milestones (assemble item-location history and clean the time series; estimate distributions for demand variability; estimate lead time distributions and supplier reliability inputs; create a parameter table for policies and constraints; build sanity checks and back-of-the-envelope validations): for each milestone, document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 2.1: Time buckets, calendars, and alignment pitfalls

Before cleaning demand or fitting distributions, choose a time bucket that matches the replenishment decision. If orders are reviewed weekly, a weekly bucket is usually the most simulation-ready. If you simulate daily demand but only place orders weekly, you must model both the daily consumption and the weekly review cycle explicitly—otherwise service metrics will be biased.

Build an item-location calendar table first. It should include every bucket (day/week), even when demand is zero, and it should carry business calendar flags: weekends, holidays, fiscal weeks, and store closures. A common mistake is letting missing rows imply missing data rather than true zero demand; intermittent SKUs will look artificially “dense,” inflating average demand and deflating stockout risk.

Alignment is where many inventory simulations quietly go wrong. Demand is often recorded by ship date or POS date, while inventory is recorded by end-of-day on-hand snapshots, and receipts are recorded by posting date. Pick a consistent convention (e.g., events within a day affect end-of-day on-hand) and shift series as needed. If you subtract demand from inventory snapshots without aligning timestamps, you will infer negative inventory or phantom receipts.

Practical workflow: (1) define bucket size; (2) create a complete calendar per item-location; (3) left join demand, receipts, and on-hand snapshots; (4) add a “data completeness” flag per bucket; (5) store timezone and cutoff rules. This is the foundation that makes Monte Carlo outputs credible to operations teams.
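The five-step workflow above can be sketched with pandas. This is an illustrative fragment for one item-location at a weekly bucket (column names are assumptions, not a required schema):

```python
import pandas as pd

# Illustrative raw demand feed: week 2 has no row at all.
demand = pd.DataFrame({"week": [1, 3, 4], "demand": [12.0, 7.0, 9.0]})

# Steps 1-2: a complete weekly calendar, so missing rows become explicit.
calendar = pd.DataFrame({"week": range(1, 5)})

# Step 3: left join the demand feed onto the calendar.
ts = calendar.merge(demand, on="week", how="left")

# Step 4: a completeness flag, then treat absent rows as true zero demand.
# Only valid if you have verified the feed actually covers these weeks.
ts["was_recorded"] = ts["demand"].notna()
ts["demand"] = ts["demand"].fillna(0.0)
```

The `was_recorded` flag is the key design choice: it separates "we observed zero demand" from "we have no row," so later distribution fitting can decide which buckets to trust instead of silently inflating density.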

Section 2.2: Outliers, promotions, and intermittent demand handling

Cleaning is not about making the data look smooth; it is about making the variability represent what could happen again. Start by separating three classes of anomalies: (a) true spikes (promotions, one-time bulk buys), (b) operational artifacts (stockouts that censor demand), and (c) errors (unit-of-measure mistakes, duplicate orders).

Promotions and events should be tagged, not blindly removed. If the business plans promotions, they belong in an “upside” scenario library; if they were one-time liquidation events, they may be excluded from baseline but retained for stress tests. A robust practice is to create a demand series with a companion “event_type” column so you can include/exclude segments consistently across scenarios.

Intermittent demand requires special handling because many buckets are zeros. If you fit a continuous distribution to nonzero values only, you will overstate demand frequency. A simple and effective approach is a two-part model: a Bernoulli process for whether demand occurs in a bucket, plus a distribution for the demand size given it occurs. Alternatively, use an empirical distribution that includes zeros explicitly.

Do not ignore stockout-censored demand. If inventory hits zero, observed demand may drop even though true demand exists. If you treat those zeros as real, your model will understate demand variability and produce overly optimistic service levels. Practical fix: flag “censored buckets” where on-hand was zero (or below a threshold) and either exclude them from distribution fitting or impute demand using nearby comparable periods, then test sensitivity by bracketing low/medium/high demand assumptions.
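The two-part model described above can be sampled in a few lines. The parameters here (`p_occur`, the lognormal `mu`/`sigma`) are illustrative stand-ins for values you would estimate from non-censored history.

```python
import numpy as np

# Two-part intermittent demand model: a Bernoulli draw for whether demand
# occurs in a bucket, plus a lognormal size given that it occurs.
rng = np.random.default_rng(42)

p_occur = 0.3          # share of non-censored buckets with any demand (assumed)
mu, sigma = 1.2, 0.6   # lognormal parameters for nonzero demand sizes (assumed)

n_periods = 10_000
occurs = rng.random(n_periods) < p_occur
sizes = rng.lognormal(mu, sigma, n_periods)
demand = np.where(occurs, np.round(sizes), 0.0)

zero_share = (demand == 0).mean()   # should sit near 1 - p_occur
```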

Section 2.3: Distribution choices (normal, lognormal, Poisson, empirical)

Inventory simulation needs distributions that produce plausible draws and preserve tail risk. You typically estimate demand variability per item-location per bucket, then draw demand in each simulated period. The right distribution depends on volume, intermittency, and whether negative values are possible (they are not for demand).

Normal is tempting because it is easy, but it can generate negative demand and often underrepresents skew. It can work for high-volume, stable items when you truncate at zero, but truncation changes mean/variance and must be handled carefully.

Lognormal is a strong default for positive, right-skewed demand sizes, especially when variability grows with the mean. It preserves non-negativity and creates realistic spikes. The downside is that it can exaggerate tail events if fit on small samples, so pair it with sanity checks and consider capping extreme quantiles for baseline scenarios.

Poisson (and negative binomial variants) fits count-like demand—useful when demand is small integers and the bucket is short (daily). Pure Poisson assumes mean equals variance; if your data is overdispersed, Poisson will understate volatility and stockout risk. In that case, empirical or negative binomial is safer.

Empirical distributions (resampling historical buckets) are often the most defensible for stress testing because they retain seasonality and zero-mass naturally. They require enough history and careful treatment of structural change. A practical hybrid is: resample within season (e.g., by month) and overlay a rare “shock” distribution for disruptions.

Engineering judgement: fit multiple candidates and compare not just AIC/BIC, but operational metrics—95th percentile demand, frequency of zeros, and implied safety stock. Always document what the distribution is meant to represent: baseline variability, upside demand, or disruption. That clarity is what allows consistent scenario libraries later.
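One way to act on that judgement is to compare candidates on an operational metric rather than only on fit statistics. The sketch below contrasts a moment-fitted lognormal against an empirical bootstrap on 95th-percentile demand; the synthetic "history" and sample sizes are assumptions for illustration.

```python
import numpy as np

# Compare a fitted lognormal against the empirical distribution on an
# operational metric: 95th-percentile demand per bucket.
rng = np.random.default_rng(7)

history = rng.lognormal(mean=2.0, sigma=0.8, size=200)  # stand-in history

# Fit lognormal by moments on the log scale.
mu_hat = np.log(history).mean()
sigma_hat = np.log(history).std(ddof=1)

fitted_draws = rng.lognormal(mu_hat, sigma_hat, 50_000)
empirical_draws = rng.choice(history, size=50_000, replace=True)

p95_fitted = np.quantile(fitted_draws, 0.95)
p95_empirical = np.quantile(empirical_draws, 0.95)
```

If the two p95 values diverge materially on real data, that gap is exactly the conversation to have before choosing a baseline distribution.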

Section 2.4: Lead time decomposition: processing, transit, receiving

Lead time is rarely a single number; it is a process with stages, each with its own variability. For simulation, decompose lead time into at least three parts: processing (order entry, pick/pack, production), transit (carrier movement, customs), and receiving (dock congestion, put-away, system posting). This decomposition improves realism and makes reliability levers actionable.

Start with event timestamps: PO created, PO confirmed, shipped, arrived, received/posted. If your systems do not capture all milestones, do not invent precision—use what exists and aggregate stages accordingly. The key is consistency: define lead time as “order date to available-for-sale date” (or similar) and stick to it across all items and locations.

Supplier reliability inputs often matter more than average lead time. Capture (a) on-time rate, (b) partial shipment frequency, and (c) expediting probability. In simulation terms, you may model lead time as a mixture distribution: with probability p it follows a “normal” lead time distribution, and with probability (1-p) it follows a “late” distribution representing disruptions.

Common mistake: fitting a normal distribution to lead time. Lead times are nonnegative and often right-skewed; lognormal, gamma, or empirical are typically better. Another mistake is ignoring receiving delay. Many teams simulate “arrival” but measure service based on “available inventory,” which makes results look better than reality.

Practical outcome: when you later test policies (reorder point, order-up-to), you can attribute service failures to specific lead-time stages, enabling targeted mitigations like carrier changes, PO cutoff adjustments, or dock scheduling—rather than blanket increases in safety stock.
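The mixture idea from this section can be sampled directly. Everything numeric below (the on-time probability, the gamma shapes and scales) is an illustrative assumption, not a fitted value.

```python
import numpy as np

# Lead time as a mixture: with probability p the supplier is "on time"
# (a short gamma), otherwise a "late" distribution represents disruptions.
rng = np.random.default_rng(0)

p_on_time = 0.85
n = 100_000
on_time = rng.random(n) < p_on_time

normal_lt = rng.gamma(shape=9.0, scale=1.0, size=n)   # mean ~9 days, tight
late_lt = rng.gamma(shape=4.0, scale=5.0, size=n)     # mean ~20 days, fat tail

lead_time = np.where(on_time, normal_lt, late_lt)
lead_time_days = np.maximum(1, np.round(lead_time)).astype(int)  # nonnegative, discrete

mean_lt = lead_time_days.mean()
p95_lt = np.percentile(lead_time_days, 95)
```

Note how the mean hides the tail: the p95 is driven largely by the "late" component, which is exactly the behavior a stress test needs to preserve.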

Section 2.5: Parameterization: MOQ, pack size, capacity, review cycle

Simulation requires a parameter table that encodes replenishment policy and constraints per item-location. Treat this as productized data, not a spreadsheet artifact. At minimum, include: reorder point (ROP), order-up-to level (S), safety stock, review cycle, minimum order quantity (MOQ), pack size or case multiple, supplier capacity limits, and any shelf-life or maximum inventory constraints.

Be explicit about the ordering logic. In a periodic review system, you order every R days and target S; in a continuous review system, you order when inventory position drops below ROP. Many businesses use hybrids (e.g., “review daily, order only on Mon/Thu”). Encode this as a rule table so the simulator can apply it deterministically.

Constraints change outcomes as much as variability. MOQ and pack size create lumpy inventory that inflates average on-hand and can paradoxically increase stockouts if orders are delayed waiting to reach MOQ. Capacity constraints (supplier allocation, warehouse throughput, budget caps) create rationing behavior—if you ignore them, your simulation will overstate achievable service levels in disruption scenarios.

Practical workflow: (1) define canonical units (each, case, pallet) and convert all parameters; (2) store both business inputs (e.g., MOQ in cases) and computed equivalents (MOQ in eaches); (3) record parameter provenance (system, contract, planner override); (4) version the table by effective date. This turns policy modeling into reproducible engineering rather than tribal knowledge.
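Constraint logic like MOQ and pack-size rounding is easy to get subtly wrong, so it is worth encoding once as a tested function. The function name and parameter values below are hypothetical; units are canonical eaches per step (1) above.

```python
import math

def constrain_order(raw_qty_each, moq_each, pack_size_each, capacity_each=None):
    """Round a raw order up to MOQ and full packs; optionally ration to capacity."""
    if raw_qty_each <= 0:
        return 0
    qty = max(raw_qty_each, moq_each)                       # enforce MOQ
    qty = math.ceil(qty / pack_size_each) * pack_size_each  # full packs only
    if capacity_each is not None:
        # Ration down to capacity, still in whole packs.
        qty = min(qty, (capacity_each // pack_size_each) * pack_size_each)
    return qty

order = constrain_order(raw_qty_each=130, moq_each=144, pack_size_each=24)
```

Keeping this as one function makes the simulator's ordering step auditable: every lumpy order can be traced to an explicit constraint.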

Section 2.6: Data validation: balancing, units, and reasonableness tests

Before running Monte Carlo, run deterministic validations that catch the 80% of problems that create misleading results. Start with a basic inventory balance check per item-location per bucket: ending_on_hand should equal starting_on_hand + receipts - demand (plus/minus adjustments). You will rarely match perfectly due to shrink, cycle count corrections, and timing differences—but large unexplained residuals are a red flag that alignment or units are wrong.
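A minimal version of the balance check, assuming pandas and illustrative column names (`start_oh`, `receipts`, `demand`, `end_oh`):

```python
import pandas as pd

# Deterministic balance check per bucket: ending == starting + receipts - demand,
# with a residual column to surface unexplained differences.
df = pd.DataFrame({
    "start_oh": [100, 90, 115],
    "receipts": [0, 40, 0],
    "demand": [10, 15, 12],
    "end_oh": [90, 115, 100],   # last row carries an unexplained residual
})

df["expected_end"] = df["start_oh"] + df["receipts"] - df["demand"]
df["residual"] = df["end_oh"] - df["expected_end"]
flagged = df[df["residual"].abs() > 0]   # review these before simulating
```

In practice you would set a tolerance rather than flagging any nonzero residual, since shrink and cycle counts create small legitimate differences.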

Units and sign conventions are the most common silent failure. Confirm whether demand is recorded as positive sales or negative issues; whether receipts include returns; whether transfers are double-counted across locations. Create unit tests: pick 10 random item-locations, manually compute one month of balances, and compare to the pipeline output.

Back-of-the-envelope reasonableness tests make you faster and more credible. Examples: (a) implied turns = annual demand / average on-hand should be within a plausible range; (b) average demand during known off-season should be near zero for seasonal items; (c) lead time percentiles should respect physical reality (transit cannot be 0 days for overseas freight). For each test, define thresholds and generate a data quality report.

Finally, validate that fitted distributions reproduce key historical moments. Simulate a short historical period using observed ordering rules and compare simulated service level and average on-hand to actuals. You are not calibrating a perfect digital twin; you are checking that the model is directionally correct and that uncertainty bands are believable.

When these validations pass, your simulation outputs—service level, fill rate, stockout risk, and working-capital impact—become defensible. The remainder of the course will build the Monte Carlo engine and scenario library on top of these foundations.

Chapter milestones
  • Assemble item-location history and clean the time series
  • Estimate distributions for demand variability
  • Estimate lead time distributions and supplier reliability inputs
  • Create a parameter table for policies and constraints
  • Build sanity checks and back-of-the-envelope validations
Chapter quiz

1. Why does Chapter 2 emphasize that Monte Carlo simulation cannot “rescue” certain data problems?

Show answer
Correct answer: Because simulation depends on aligned time buckets, consistent units, and explicit policy parameters; otherwise results are meaningless
The chapter stresses that uncertainty can be modeled, but foundational issues like misaligned buckets, inconsistent units, or missing policy inputs invalidate simulations.

2. What is the primary goal of the data work in Chapter 2?

Show answer
Correct answer: To produce a dataset and assumptions that enable service level, fill rate, stockout risk, and working-capital exposure to be evaluated under variability
Chapter 2 frames the goal as simulation-ready inputs and assumptions for stress testing outcomes, not perfect forecasting or immediate optimization.

3. Which set of inputs best matches the “planning questions into simulation-ready inputs” workflow described in the chapter?

Show answer
Correct answer: Clean item-location histories, estimate demand and lead time distributions (including supplier reliability), and create a policy/constraint parameter table
The chapter highlights assembling item-location time series, estimating distributions for demand and lead time (plus reliability), and encoding policies/constraints in a parameter table.

4. What is the purpose of building sanity checks and back-of-the-envelope validations before running many scenarios?

Show answer
Correct answer: To catch obvious errors in inputs and assumptions before scaling up simulation runs
Sanity checks are meant to detect clear issues early (e.g., bad units or implausible parameters) before thousands of scenarios amplify the damage.

5. Which pair of engineering judgments does Chapter 2 say comes up repeatedly when preparing data for inventory simulation?

Show answer
Correct answer: Deciding what variability to model vs clean away, and choosing an aggregation level that preserves decision relevance (e.g., daily vs weekly, SKU-store vs SKU-region)
The chapter emphasizes judgment calls about separating true variability from errors and selecting an aggregation level that supports the decisions being modeled.

Chapter 3: Monte Carlo Inventory Stress Tests in Python

In Chapter 2 you learned to turn planning questions into simulation-ready statements. Now you will implement those statements as a working Monte Carlo inventory stress test in Python. The goal is not to build a “perfect” supply chain digital twin; it is to create a correct, auditable single-item engine that you can reuse across scenarios and policies.

A Monte Carlo stress test answers questions like: “If demand becomes more volatile and suppliers slip, how often do we stock out?”, “How much working capital might we tie up under an aggressive service target?”, and “Which policy parameter is most sensitive: reorder point or order-up-to level?” The workflow is consistent: define the time step and events, model randomness for demand and lead time, simulate inventory flows with correct accounting, record outcomes, and repeat across many trials with controlled seeds.

Throughout this chapter, treat correctness as a product feature. Most mistakes in inventory simulation are not mathematical—they are bookkeeping errors: receiving orders at the wrong time, double-counting on-order inventory, or calculating service levels from the wrong denominator. We will build the loop end-to-end, then package it into reusable functions that can run scenario libraries (baseline, upside, disruption) reliably.

  • Single-item focus: master one SKU before scaling to many.
  • Discrete time: daily or weekly buckets; choose one and be explicit.
  • Monte Carlo: many trials to quantify uncertainty, not just one “expected” run.

By the end of the chapter you will have a minimal simulation kernel you can trust, and a set of metrics you can explain to stakeholders without hand-waving.

Practice note for Implement a single-item simulation loop end-to-end: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Add randomness for demand and lead time with seeded runs: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Track inventory position, pipeline, and backorders correctly: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Compute service metrics across many trials: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Package the simulation into reusable functions: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 3.1: Simulation architecture: events, states, and time steps

Inventory simulations become manageable when you separate events (things that happen) from state (what you track) and commit to a single time step. For a first stress test, use a discrete time model: each period (day or week) executes the same ordered sequence of events. This makes the simulation loop easy to audit and aligns with how most planning data is stored.

A practical event order for each period t is: (1) receive any shipments due at t, (2) realize demand and allocate from on-hand, creating backlog if needed, (3) update costs/metrics, (4) perform review (continuous or periodic) and place orders, (5) advance time. Pick an order and keep it consistent. Many “off-by-one” errors come from placing an order before demand occurs (or receiving after demand) without intending that behavior.

Your state variables typically include: on-hand inventory, backlog (or lost sales), a pipeline of outstanding orders with due dates, and any policy parameters (reorder point, order-up-to level, review cycle). You also need a record of history for metrics: demand, shipments, backorders, and inventory position over time.

Engineering judgement: decide whether you are modeling backorders (demand is delayed and eventually filled) or lost sales (demand disappears if not immediately served). The choice materially changes service metrics and working-capital behavior. For career transitioners, start with backorders because accounting is explicit and it highlights stockout pain clearly.

Implementation tip: represent the pipeline as a list of “open orders” with fields {quantity, arrival_time}. Each period, pop or sum those with arrival_time == t and add to on-hand. This structure is simple and avoids complex queues while you learn.
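Putting the event order and the pipeline tip together, here is a minimal single-item loop under an (s,S) policy. The policy parameters, horizon, and Poisson demand are illustrative assumptions, and lead time is fixed to keep the sketch short.

```python
import numpy as np

# Discrete-time loop following the event order above:
# (1) receive -> (2) demand/allocate with backorders -> (3) record -> (4) review/order.
rng = np.random.default_rng(1)

T, s, S, lead_time = 52, 40, 120, 2   # weekly buckets; parameters assumed
on_hand, backlog = 80, 0
pipeline = []                          # open orders: (arrival_time, qty)
history = []

for t in range(T):
    # 1) receive shipments due at t
    arrived = sum(q for due, q in pipeline if due == t)
    pipeline = [(due, q) for due, q in pipeline if due != t]
    on_hand += arrived

    # 2) realize demand; backlog has priority, unmet demand backorders
    demand = int(rng.poisson(20))
    owed = backlog + demand
    shipped = min(on_hand, owed)
    on_hand -= shipped
    backlog = owed - shipped

    # 3) record inputs for metrics
    history.append({"t": t, "demand": demand,
                    "on_hand": on_hand, "backlog": backlog})

    # 4) review: order up to S when inventory position <= s
    on_order = sum(q for _, q in pipeline)
    position = on_hand + on_order - backlog
    if position <= s:
        pipeline.append((t + lead_time, S - position))
```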

Section 3.2: Random sampling and reproducibility (seeds, trials)

Monte Carlo means you run the same simulation many times with different random draws. In Python, use numpy.random.Generator (not the legacy global RNG) so you can control seeds and reproduce results exactly. Reproducibility is not a luxury: when a stakeholder asks why a scenario looks worse this week, you need to know whether the model changed or the randomness changed.

In each trial you sample stochastic inputs: period demand and order lead times (or shipment delays). Demand might be Poisson, Normal (truncated at 0), Negative Binomial (overdispersion), or empirical bootstrap from history. Lead time might be discrete (e.g., 1–7 days) or modeled as a base plus random delay under disruptions.

Use a consistent pattern: rng = np.random.default_rng(seed), then inside a trial draw arrays for demand (length T) and optionally lead times (per order). For stress testing you usually want seeded runs so that when you compare policies you can use common random numbers: run policy A and policy B with the same random demand/lead time sequences. This reduces noise in comparisons and makes policy differences clearer.

Common mistake: reseeding inside the loop (e.g., every period). That destroys randomness and can create repeating patterns. Seed once per trial (or once per scenario) and draw sequentially.

Practical outcome: by structuring randomness as inputs, your simulation becomes a pure function of (policy, scenario parameters, random draws). That makes debugging easier, supports sensitivity analysis, and allows you to store “scenario libraries” as parameter sets plus seeds.
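The common-random-numbers pattern looks like this in practice. The function name `demand_path` and the Poisson rate are assumptions for the sketch.

```python
import numpy as np

# Draw one demand path per trial seed; evaluate every policy on the same
# path so that differences reflect the policy, not the noise.
def demand_path(seed, T=365):
    rng = np.random.default_rng(seed)   # seed once per trial, draw sequentially
    return rng.poisson(lam=12, size=T)

path_a = demand_path(seed=2024)
path_b = demand_path(seed=2024)   # identical draws: fully reproducible
path_c = demand_path(seed=2025)   # a different trial
```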

Section 3.3: Inventory accounting: on-hand, on-order, position, backlog

Correct inventory accounting is the heart of a credible stress test. You must distinguish on-hand (physically available), on-order (pipeline not yet received), inventory position (on-hand + on-order − backlog), and backlog (unfilled demand for backorder models). Policies typically trigger on inventory position, not on-hand, because pipeline inventory matters.

A reliable period update looks like this: first, add receipts to on-hand. Second, realize demand. If on-hand ≥ demand + backlog, you can clear backlog and satisfy demand fully; otherwise you ship what you can and increase backlog. Explicitly track shipments (units served) each period, because fill rate depends on shipped units, not on-hand levels.

Pipeline tracking: when you place an order of quantity Q with lead time L, create an entry arriving at time t+L. Each period, compute on-order as the sum of all pipeline quantities. In the event order recommended here, inventory position is computed after demand allocation and backlog update but before ordering; whichever convention you choose, write it down in comments, because future-you will thank you.

  • On-hand can never be negative. If it is, you have double-subtracted demand.
  • Backlog can grow without bound under severe disruptions; plan for it.
  • Inventory position can be negative; that is not an error under backorders.

Common mistakes to watch for: (1) counting receipts in both on-hand and on-order after arrival, (2) computing service level from inventory position instead of actual shipments, (3) forgetting backlog when allocating new demand (backlog has priority in most settings).

Practical outcome: once accounting is correct for one item, you can trust any policy logic layered on top. If accounting is wrong, every metric you compute will be misleading, no matter how sophisticated the distribution assumptions look.
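A concrete one-period walk-through makes the pitfalls above easier to see. The numbers are arbitrary; the point is that receipts move from the pipeline to on-hand exactly once, backlog is served first, and position can legitimately differ from on-hand.

```python
# One-period accounting under backorders, with illustrative starting state.
on_hand, backlog = 5, 3
pipeline = [(2, 50), (4, 30)]          # (arrival_time, qty)
t = 2

arrived = sum(q for due, q in pipeline if due == t)
pipeline = [(due, q) for due, q in pipeline if due != t]  # no double count (pitfall 1)
on_hand += arrived

demand = 20
owed = backlog + demand                 # backlog has priority (pitfall 3)
shipped = min(on_hand, owed)            # service metrics use shipped, not on-hand
on_hand -= shipped
backlog = owed - shipped

on_order = sum(q for _, q in pipeline)
position = on_hand + on_order - backlog  # may legitimately go negative
```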

Section 3.4: Policies: (s,S), reorder point, min-max, periodic review

Inventory policies are decision rules that turn state into orders. In stress testing, you typically compare a small set of policies across scenarios to see which is robust. The most common are reorder point (ROP), (s,S), min-max, and periodic review (order-up-to every R periods). These are closely related; your implementation should unify them rather than branching into many special cases.

Reorder point with fixed order quantity (Q, ROP): if inventory position ≤ ROP, place an order of size Q. This is easy to implement but can be inefficient under variable demand because Q may not match the gap to your target. (s,S) improves that: if inventory position ≤ s, order up to S (i.e., order quantity = S − position). A min-max policy is effectively (s,S) with names that business users recognize.

Periodic review (R, S): every R periods, compute inventory position and order up to S. This matches real operations where planners order weekly. In simulation, periodic review introduces “review cycle risk”: you may dip below target between reviews even if S is high. That effect is exactly what stress tests should reveal.

Safety stock is not a separate policy; it is a parameterization. For example, set ROP = expected demand during lead time + safety stock, where safety stock might be z * σ (under a normal approximation) or tuned directly via simulation to meet a target service level.

Implementation pattern: write a single function that receives (policy_type, parameters, state, t) and returns an order quantity (possibly 0) and a sampled lead time. Keep “review due?” as a boolean (t % R == 0) for periodic review. This packaging makes it easy to run a scenario library consistently.

Common mistake: ordering based on on-hand instead of inventory position. That tends to over-order during long lead times because you ignore pipeline inventory, inflating working capital and masking stockout risk.
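The unified-function pattern from this section can be sketched as follows; the policy labels and parameter names are assumptions, and lead-time sampling is omitted to keep the example focused on order sizing.

```python
# One function that unifies the policies above: order quantity (possibly 0)
# given policy type, parameters, inventory position, and the current period.
def order_quantity(policy, params, position, t):
    if policy == "rop_q":        # continuous review, fixed order quantity
        return params["Q"] if position <= params["ROP"] else 0
    if policy == "s_S":          # continuous review, order up to S
        return params["S"] - position if position <= params["s"] else 0
    if policy == "periodic":     # review every R periods, order up to S
        if t % params["R"] == 0:
            return max(0, params["S"] - position)
        return 0
    raise ValueError(f"unknown policy: {policy}")

q1 = order_quantity("rop_q", {"ROP": 40, "Q": 100}, position=35, t=0)
q2 = order_quantity("s_S", {"s": 40, "S": 120}, position=35, t=0)
q3 = order_quantity("periodic", {"R": 7, "S": 120}, position=90, t=14)
```

Passing `position` (not on-hand) into this function enforces the section's main warning at the interface level.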

Section 3.5: Metrics: CSL, fill rate, expected stockouts, cycle time

Metrics turn simulated histories into decisions. In stress tests you should compute metrics at two levels: per-trial (one possible future) and aggregated across trials (expected performance and risk). Define metrics precisely and compute them from the correct signals.

Cycle Service Level (CSL) is often interpreted as “probability of no stockout in a replenishment cycle.” In discrete simulation, a practical proxy is: fraction of cycles with zero backlog (or zero stockout events) between order placements/receipts, depending on your definition. Be explicit: CSL can be measured per cycle or per period, and those numbers differ.

Fill rate is unit-based: total units shipped on time divided by total demand units. Under backorders, “on time” typically means shipped in the same period as demanded; backlog shipped later does not count as immediate fill. Track shipped units each period so fill rate is computed as sum(immediate_shipments) / sum(demand). For lost sales models, fill rate equals 1 − (lost_units / demand_units).

Expected stockouts can mean (a) expected number of stockout periods, (b) probability of any stockout in the horizon, or (c) expected lost units/backordered units. Choose the one that matches the business question. Executives often hear “stockout risk” as probability of any stockout event; planners often care about expected backorder units.

Cycle time and responsiveness: you can estimate average time to clear backlog after a disruption, or average time inventory position stays below zero. These operational metrics are useful when comparing disruption scenarios: two policies may have similar fill rate, but one recovers much faster.

Aggregation across trials: report mean and percentile bands (e.g., p50/p90) for each metric. Percentiles communicate uncertainty better than a single average. Common mistake: averaging time series first and then computing metrics; instead compute metrics per trial, then summarize.
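The "metrics per trial, then summarize" rule can be sketched like this. The shipment logic is a deliberately toy proxy (a fixed on-hand cap) so the example stays short; in your engine, `shipped` comes from the simulation loop.

```python
import numpy as np

# Compute fill rate per trial first, then aggregate with percentiles.
rng = np.random.default_rng(3)

def trial_fill_rate(rng, T=100):
    demand = rng.poisson(10, T)
    shipped = np.minimum(demand, 13)   # toy stand-in for immediate shipments
    return shipped.sum() / demand.sum()

fill_rates = np.array([trial_fill_rate(rng) for _ in range(500)])
summary = {
    "mean": fill_rates.mean(),
    "p50": np.percentile(fill_rates, 50),
    "p10": np.percentile(fill_rates, 10),   # downside risk band
}
```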

Section 3.6: Performance and scaling basics (vectorization, batching)

A correct single-item simulation loop can be written in pure Python and still run thousands of trials quickly for typical horizons (e.g., 365 days). But stress testing often becomes a scaling problem: more scenarios, more SKUs, more sensitivity runs. You do not need premature optimization; you need a few practical techniques.

First, batch trials. Instead of running one trial at a time with ad-hoc prints, write a function run_trial(params, rng) that returns a compact dictionary of metrics and (optionally) a few key time series. Then write run_monte_carlo(n_trials, seed) that loops over trials and stores only what you need. Keeping full daily histories for every trial will explode memory and slow analysis.

Second, use vectorized random draws even if the state update is a loop. For example, draw the entire demand array upfront with NumPy. This reduces Python overhead and makes your randomness reproducible and inspectable (you can plot the demand path for a given seed).

Third, profile before optimizing. If the pipeline structure becomes a hotspot, replace “list of orders” with a small array indexed by arrival time (a receipt schedule). For bounded lead times, you can maintain a ring buffer of receipts: receipts[t % max_L] += Q. This removes per-order objects and speeds up long runs.
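The ring-buffer receipt schedule looks like this; the lead-time bound and quantities are illustrative.

```python
import numpy as np

# Receipt schedule as a ring buffer for bounded lead times:
# placing an order is receipts[(t + L) % max_L] += Q;
# receiving at period t reads (and clears) slot t % max_L.
max_L = 10
receipts = np.zeros(max_L)

t = 3
L, Q = 4, 120
receipts[(t + L) % max_L] += Q        # schedule arrival at t + L = 7

# ... later, when the loop reaches period 7:
t = 7
arrived = receipts[t % max_L]
receipts[t % max_L] = 0.0             # clear the slot so it can be reused
```

Clearing the slot after receiving is essential; otherwise the buffer wraps around and an old receipt reappears `max_L` periods later.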

Finally, package reusable components: a scenario object (distributions and parameters), a policy object (s,S,R), and a simulator function. This organization enables consistent scenario comparisons, sensitivity sweeps, and stakeholder-ready reporting. The main scaling bottleneck in real projects is not CPU—it is inconsistent assumptions across runs. Clean function boundaries help you avoid that.

Chapter milestones
  • Implement a single-item simulation loop end-to-end
  • Add randomness for demand and lead time with seeded runs
  • Track inventory position, pipeline, and backorders correctly
  • Compute service metrics across many trials
  • Package the simulation into reusable functions
Chapter quiz

1. What is the primary goal of the Monte Carlo inventory stress test engine built in this chapter?

Show answer
Correct answer: Create a correct, auditable single-item simulation kernel that can be reused across scenarios and policies
The chapter emphasizes correctness and auditability for a reusable single-item engine, not a perfect digital twin or a single deterministic run.

2. Why does the chapter emphasize running many trials with controlled (seeded) randomness?

Show answer
Correct answer: To quantify uncertainty across outcomes while keeping experiments reproducible
Monte Carlo uses many trials to measure uncertainty, and seeding makes runs repeatable for comparison and auditing.

3. Which set of bookkeeping elements must be tracked correctly to avoid common simulation errors?

Show answer
Correct answer: Inventory position, pipeline (on-order), and backorders
The chapter notes most mistakes are bookkeeping errors, especially around receipts timing, on-order inventory, and backorders.

4. What is the consistent workflow described for building the stress test?

Show answer
Correct answer: Define time step and events → model randomness for demand and lead time → simulate flows with correct accounting → record outcomes → repeat across many trials
The chapter outlines a repeatable workflow from time-step definition through Monte Carlo repetition with recorded outcomes.

5. What is the main reason to package the simulation loop into reusable functions?

Show answer
Correct answer: To run scenario libraries (baseline, upside, disruption) reliably and consistently
Functions support reuse across scenario sets and policies while maintaining consistent, auditable behavior.

Chapter 4: Scenario Design and Comparative Experiments

Stress testing inventory policies is only useful if scenarios are designed so results are comparable. In practice, teams often run “one-off” simulations with a handful of parameter tweaks, then struggle to explain why metrics changed. This chapter turns ad hoc simulation into an experimental discipline: build a scenario library with controlled parameter changes, run comparative experiments, quantify trade-offs (cost vs service vs volatility), incorporate disruption events, and produce executive-ready summaries that can stand up to scrutiny.

The core idea is simple: treat every simulation run as an experiment with a documented setup, a consistent measurement plan, and an auditable record of assumptions. You will separate what stays fixed (the baseline model and measurement) from what changes (a small, intentional set of parameters). You will also standardize outputs: service level, fill rate, stockout risk, inventory investment (working capital proxy), and cost components. Finally, you will communicate uncertainty by showing distributions, percentiles, and “typical vs tail” outcomes rather than a single average.

By the end of this chapter, you should be able to build a scenario library (baseline, upside, disruption) that can be re-run whenever forecasts, suppliers, or policies change, while preserving comparability over time.

Practice note (applies to every milestone in this chapter: building the scenario library with controlled parameter changes, running experiments with comparable result tables, quantifying cost vs service vs volatility trade-offs, adding disruption events, and creating executive-ready summaries): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 4.1: Experimental design: what changes, what stays fixed
Section 4.2: Scenario naming, metadata, and traceability
Section 4.3: Cost modeling: holding, ordering, expediting, stockout
Section 4.4: Disruption modeling: shock magnitude, duration, recovery
Section 4.5: Visualizing uncertainty: fan charts, distributions, percentiles
Section 4.6: Decision thresholds and policy recommendations

Section 4.1: Experimental design: what changes, what stays fixed

Good scenario design starts with discipline: decide the “control” elements that must remain constant across scenarios. Typically, the control set includes the simulation horizon (e.g., 52 weeks), warm-up period, random seed strategy, SKU/location scope, demand/lead-time distribution families, and the exact metric definitions (cycle service level vs fill rate, how backorders are counted, what constitutes a stockout). If any of these move, you are no longer comparing scenarios—you are comparing models.

Next, define the “treatment” variables: the minimum set of parameters you intentionally change to answer a planning question. Examples include: demand mean (+10% upside), demand volatility (+50% CV), lead time mean (+7 days), lead time variance (fatter tail), review cycle (daily vs weekly), safety stock factor, reorder point, order-up-to level, or expediting rules. Keep treatments small and purposeful. A common mistake is to change five things at once and then argue about causality; treat that as exploratory work, not a decision experiment.

Use a simple experimental grid. For example, hold policy fixed and vary uncertainty first (demand CV × lead-time CV). Then hold uncertainty fixed and vary policy levers (reorder point multiplier × order-up-to level). This separation lets you quantify: (1) how fragile the current policy is under variability, and (2) which policy change provides the best mitigation. In Monte Carlo terms, run enough iterations to stabilize tail metrics (e.g., 95th percentile stockout days). If your executive question is “How bad can it get?”, do not stop when the mean stabilizes; check convergence on the percentile that matters.

  • Practical rule: One scenario answers one primary question. Additional tweaks belong in sensitivity analysis, not the main comparison table.
  • Common pitfall: Reusing different random seeds per scenario can inflate differences; prefer common random numbers (same seeds) when comparing policies so noise cancels out.
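The common-random-numbers idea can be sketched with a deliberately minimal periodic-review simulator. This is an illustrative toy, not the course's full model: demand, lead time, and policy parameters below are assumed values, and unmet demand is treated as lost sales.

```python
import random

def simulate_fill_rate(reorder_point, order_up_to, seed,
                       weeks=52, lead_time=2, demand_mean=100, demand_sd=30):
    """Minimal weekly-review simulation; returns fill rate for one seed."""
    rng = random.Random(seed)            # the seed is the common-random-numbers lever
    on_hand, pipeline = order_up_to, []  # pipeline holds (arrival_week, qty) pairs
    filled = demanded = 0
    for week in range(weeks):
        on_hand += sum(q for due, q in pipeline if due == week)  # receive arrivals
        pipeline = [(due, q) for due, q in pipeline if due > week]
        demand = max(0, round(rng.gauss(demand_mean, demand_sd)))
        filled += min(on_hand, demand)
        demanded += demand
        on_hand = max(0, on_hand - demand)                       # lost sales, no backorders
        position = on_hand + sum(q for _, q in pipeline)
        if position <= reorder_point:                            # weekly review policy
            pipeline.append((week + lead_time, order_up_to - position))
    return filled / demanded if demanded else 1.0

# Compare two reorder points under identical demand draws (same seeds per pair):
seeds = range(20)
low_rp = [simulate_fill_rate(150, 400, s) for s in seeds]
high_rp = [simulate_fill_rate(250, 400, s) for s in seeds]
paired_gain = [h - l for l, h in zip(low_rp, high_rp)]  # demand noise cancels pair-by-pair
```

Because both policies see the same demand sequence per seed, `paired_gain` isolates the policy effect; with independent seeds, the same comparison would need far more runs to separate signal from noise.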
Section 4.2: Scenario naming, metadata, and traceability

A scenario library becomes valuable when it is searchable, repeatable, and defensible. That requires consistent naming and metadata. Your scenario name should encode: scope (SKU group, lane, plant/DC), uncertainty regime (baseline/upside/disruption), and policy variant. Avoid vague labels like “Test 3.” Prefer something like DC01_SkuA_Baseline_RP1.2_OUTS1.0_RevWeekly. The name should be readable by humans, but also parseable by code to auto-generate tables and plots.
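A name like the one above can be made machine-parseable with a small helper. The naming convention here is the hypothetical one from the example, so the field layout is an assumption you would adapt to your own scheme:

```python
import re

def parse_scenario_name(name: str) -> dict:
    """Parse a name like DC01_SkuA_Baseline_RP1.2_OUTS1.0_RevWeekly into fields."""
    pattern = (r"(?P<location>[^_]+)_(?P<sku>[^_]+)_(?P<regime>[^_]+)"
               r"_RP(?P<rp_mult>[\d.]+)_OUTS(?P<outs_mult>[\d.]+)_Rev(?P<review>\w+)")
    m = re.fullmatch(pattern, name)
    if m is None:
        raise ValueError(f"Scenario name does not follow the convention: {name}")
    fields = m.groupdict()
    fields["rp_mult"] = float(fields["rp_mult"])    # reorder point multiplier
    fields["outs_mult"] = float(fields["outs_mult"])  # order-up-to multiplier
    return fields

parsed = parse_scenario_name("DC01_SkuA_Baseline_RP1.2_OUTS1.0_RevWeekly")
# parsed["regime"] == "Baseline", parsed["rp_mult"] == 1.2
```

With this in place, result tables and plot labels can be generated from names alone, which is what keeps a growing library searchable.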

Metadata is what turns a simulation into an auditable artifact. At minimum store: parameter dictionary (all inputs, not just changed ones), model version/commit hash, data snapshot date, distribution choices and fit diagnostics, seed strategy, run timestamp, and a short scenario intent statement (“Assess whether weekly reviews sustain 98% fill rate under +30% demand CV”). When stakeholders ask “Why did last month’s results differ?”, you can point to a change in data snapshot, a policy assumption, or a model version—not guess.

Traceability also means recording what stayed fixed. Many teams only log deltas; later, no one can reconstruct the exact baseline. Treat baseline as a first-class scenario with the same metadata rigor. For comparative experiments, generate a scenario manifest (CSV/JSON) that lists all scenarios, their treatments, and their tags (baseline, upside, port_delay, supplier_failure). Your experiment runner can iterate the manifest, execute runs, and append results in a single standardized schema.

  • Engineering judgment: If a parameter is “obvious,” log it anyway. Hidden assumptions are the fastest way to lose trust.
  • Operational outcome: A traceable library supports continuous stress testing—new forecast, same scenario set, updated results in hours.
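A manifest that records full parameters, not just deltas, makes the "what stayed fixed" question answerable in code. The entries below are illustrative placeholders (scenario IDs, commit hash, snapshot date, and parameter values are all invented for the sketch):

```python
import json

# Hypothetical manifest: every scenario, including the baseline, carries full metadata.
manifest = [
    {"id": "DC01_SkuA_Baseline_RP1.2_OUTS1.0_RevWeekly",
     "tags": ["baseline"],
     "intent": "Reference point for all comparisons",
     "params": {"demand_mean": 100, "demand_cv": 0.3, "lt_mean_days": 14,
                "rp_mult": 1.2, "outs_mult": 1.0, "review": "weekly"},
     "model_version": "a1b2c3d", "data_snapshot": "2024-06-01",
     "seed_strategy": "common:0-499"},
    {"id": "DC01_SkuA_PortDelay_RP1.2_OUTS1.0_RevWeekly",
     "tags": ["disruption", "port_delay"],
     "intent": "Assess service under +10 day lead time for 6 weeks",
     "params": {"demand_mean": 100, "demand_cv": 0.3, "lt_mean_days": 24,
                "rp_mult": 1.2, "outs_mult": 1.0, "review": "weekly"},
     "model_version": "a1b2c3d", "data_snapshot": "2024-06-01",
     "seed_strategy": "common:0-499"},
]

def treatments(manifest, baseline_id):
    """Derive, per scenario, only the parameters that differ from the baseline."""
    base = next(s for s in manifest if s["id"] == baseline_id)["params"]
    return {s["id"]: {k: v for k, v in s["params"].items() if base.get(k) != v}
            for s in manifest if s["id"] != baseline_id}
```

Because deltas are derived from full records rather than logged directly, the baseline can always be reconstructed, and an experiment runner can simply iterate the manifest.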
Section 4.3: Cost modeling: holding, ordering, expediting, stockout

Service metrics alone can mislead. A policy that drives fill rate from 96% to 99% might double working capital and quietly increase obsolescence risk. Comparative experiments need a cost model that is consistent across scenarios and honest about trade-offs: cost vs service vs volatility.

Start with four cost buckets. Holding cost is typically modeled as average on-hand inventory × unit cost × annual holding rate (or weekly equivalent). Include warehousing, insurance, shrink, and cost of capital; if you cannot defend a single holding rate, run a sensitivity band (e.g., 18–30% annual). Ordering cost includes fixed per-order admin/transport costs; even if small, it matters when comparing review cycles and lot-sizing behavior. Expediting cost captures premium freight or supplier rush fees triggered by low inventory. Model it as an optional policy: expedite when projected stockout risk within lead time exceeds a threshold, with a defined premium and reduced lead time. Stockout cost is the hardest and most politically charged: it can be lost margin, contractual penalties, line stoppage, or customer churn proxy. If stakeholders resist a dollar value, present stockout in operational units (days stocked out, backorder volume) alongside a range of implied costs.

Volatility matters because it drives stress: two scenarios can have the same average inventory but very different peak inventory and cash swings. Track distributional metrics such as 95th percentile on-hand, peak backorder, and week-to-week order variability. Those metrics help connect inventory policy to warehouse capacity, cash planning, and supplier stability.

  • Common mistake: Mixing “fill rate” (volume-based) with a stockout cost calibrated to “order lines” without aligning units. Ensure your cost uses the same demand granularity as your service metric.
  • Practical outcome: Your result table should show a total cost number, but also its components so executives can see what they are buying (service) and what they are paying (capital, expedite, variability).
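The four buckets can be computed from standard simulation outputs. The rates below (unit cost, holding rate, order cost, expedite premium, lost margin) are illustrative assumptions, not benchmarks; in practice you would run a sensitivity band on the holding rate as noted above:

```python
def scenario_costs(on_hand_by_week, orders, expedites, stockout_units,
                   unit_cost=25.0, holding_rate=0.25, order_cost=50.0,
                   expedite_premium=200.0, stockout_margin=8.0):
    """Decompose total cost into holding, ordering, expediting, and stockout."""
    weeks = len(on_hand_by_week)
    avg_on_hand = sum(on_hand_by_week) / weeks
    # Annual holding rate scaled to the simulated horizon:
    holding = avg_on_hand * unit_cost * holding_rate * (weeks / 52)
    ordering = orders * order_cost
    expediting = expedites * expedite_premium
    stockout = stockout_units * stockout_margin  # lost-margin proxy; report units too
    return {"holding": holding, "ordering": ordering,
            "expediting": expediting, "stockout": stockout,
            "total": holding + ordering + expediting + stockout}

costs = scenario_costs(on_hand_by_week=[400, 320, 260, 410],
                       orders=2, expedites=1, stockout_units=15)
```

Returning the components alongside the total is what lets the result table show executives what they are buying and what they are paying.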
Section 4.4: Disruption modeling: shock magnitude, duration, recovery

Disruptions are not just “higher lead time.” They have structure: a start time, magnitude, duration, and recovery path. Modeling that structure is how you turn generic risk talk into actionable policy recommendations.

Define disruption events explicitly. A port delay might add +10 days to lead time for a 6-week window, with a gradual recovery where delays decay over the next 4 weeks. A supplier failure might reduce supply capacity to 0% for 2 weeks, then 50% for 4 weeks, then normal. A demand shock might increase demand level by +25% immediately, but also increase volatility (CV) and intermittency. Each event should specify: trigger (calendar week or probability), magnitude (additive or multiplicative), duration distribution (fixed vs random), and recovery function (step, linear, exponential, or staged).

In Monte Carlo, disruptions can be deterministic scenarios (“port delay occurs in week 20”) or probabilistic (“10% chance per quarter”). Use deterministic disruptions for planning playbooks and probabilistic ones for risk quantification (e.g., annual expected cost). Keep the rest of the model fixed so you can attribute outcomes to the event. If you combine multiple disruptions, treat that as a separate scenario tier (“compound disruption”) and label it clearly.

Engineering judgment shows up in how you avoid unrealistic extremes. For example, if you sample lead times from a lognormal with a heavy tail, then also add a port delay shock, you may double-count tail risk. Decide whether disruptions replace the tail, add to it, or represent a different mode (mixture distribution). Document that choice in metadata so future analysts do not “improve” it accidentally.

  • Practical outcome: A disruption library lets you test resilience levers: extra safety stock, an alternate supplier, a faster expedite trigger, or a more frequent review cycle during disruptions.
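The port-delay example above (start week, +10 days for 6 weeks, linear recovery over 4 weeks) can be sketched as a lead-time profile. The shape and numbers mirror that example; treat them as one possible event definition, not a template for all disruptions:

```python
def port_delay_profile(base_lt_days=14, shock_days=10, start_week=20,
                       shock_weeks=6, recovery_weeks=4, horizon=52):
    """Weekly lead time under a port delay: full shock for shock_weeks,
    then a linear decay back to baseline over recovery_weeks."""
    lead_times = []
    for week in range(horizon):
        if start_week <= week < start_week + shock_weeks:
            extra = shock_days                                   # full-magnitude shock
        elif start_week + shock_weeks <= week < start_week + shock_weeks + recovery_weeks:
            steps_in = week - (start_week + shock_weeks)
            extra = shock_days * (1 - (steps_in + 1) / recovery_weeks)  # linear recovery
        else:
            extra = 0                                            # normal operations
        lead_times.append(base_lt_days + extra)
    return lead_times

lt = port_delay_profile()  # lt[20:26] sit at 24 days, then decay back to 14
```

Swapping the recovery branch (step, exponential, staged) changes only one line, which makes the event structure easy to document in metadata.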
Section 4.5: Visualizing uncertainty: fan charts, distributions, percentiles

Executives often ask for “the number,” but decisions require seeing the range. Your job is to make uncertainty legible without overwhelming. Three visuals do most of the work.

Fan charts show inventory position (or backorders) over time with percentile bands (e.g., 10th–90th shaded, median line). This exposes when risk concentrates—often during seasonal peaks or long-lead-time reorder windows. Fan charts also reveal whether a policy fails abruptly (thin band then sudden blow-up) versus gradually degrades (bands widen steadily). Use consistent y-axis scales across scenarios; otherwise, the chart becomes a marketing tool rather than an analysis tool.

Distribution plots (histograms or violin plots) summarize outcomes like total cost, fill rate, and peak inventory. Show median and key percentiles (5th/50th/95th). Many supply chain metrics are skewed; the mean can hide a fat-tail risk that matters operationally. For stockout risk, a complementary view is the probability of exceeding a threshold (e.g., “P(stockout days > 3)”).

Comparable result tables anchor the narrative. Standardize columns: service level, fill rate, stockout probability, average on-hand, 95th percentile on-hand, order variability, holding/ordering/expedite/stockout costs, and total cost. Include confidence intervals or at least Monte Carlo standard error for primary metrics, especially when differences are small. A common mistake is to present two scenarios with a 0.2% fill-rate difference as meaningful when it is within simulation noise.

  • Engineering judgment: Pick percentiles that match the decision. For a high-penalty stockout environment, show 95th/99th tail outcomes, not just 90th.
  • Practical outcome: Stakeholders can see both expected performance and “bad week” behavior, enabling policy decisions that tolerate reality.
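Percentile bands and Monte Carlo standard error are both straightforward to compute from raw run output. The simulated runs below are a synthetic stand-in for real simulator output (Gaussian draws with a mild drift), used only so the sketch is self-contained:

```python
import random
import statistics

random.seed(7)
# Hypothetical raw output: on-hand inventory per week for 500 Monte Carlo runs.
runs = [[max(0.0, random.gauss(300 - 2 * w, 60)) for w in range(52)]
        for _ in range(500)]

def percentile(values, p):
    """Linear-interpolation percentile, p in [0, 1]."""
    s = sorted(values)
    k = (len(s) - 1) * p
    lo, hi = int(k), min(int(k) + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

# Fan chart bands: per-week P10 / median / P90 across all runs.
bands = [{"p10": percentile(week, 0.10), "p50": percentile(week, 0.50),
          "p90": percentile(week, 0.90)}
         for week in zip(*runs)]

# Monte Carlo standard error of a scalar metric (here: horizon-average inventory),
# for judging whether small scenario differences exceed simulation noise.
per_run_avg = [statistics.fmean(r) for r in runs]
mc_se = statistics.stdev(per_run_avg) / len(runs) ** 0.5
```

Comparing a 0.2-point scenario difference against `mc_se` is exactly the noise check the paragraph above calls for.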
Section 4.6: Decision thresholds and policy recommendations

Scenario analysis becomes actionable when you convert distributions into decisions. Start by defining decision thresholds before you look at results. Examples: “Fill rate must be ≥ 98% in the median case and ≥ 96% at the 10th percentile,” or “Probability of more than 2 stockout days per quarter must be ≤ 5%,” or “Working capital cannot exceed $3M at the 95th percentile.” Pre-committing to thresholds reduces the temptation to choose a scenario because it “looks reasonable.”

Then map policy levers to outcomes. If a baseline policy fails service thresholds under the upside demand scenario but passes under baseline, your recommendation might be conditional: adopt a trigger-based adjustment (increase reorder point multiplier when forecast error rises beyond X, or when lead-time percentile shifts). If disruptions drive unacceptable tail risk, consider layered mitigations: a modest safety stock increase plus an expedite trigger can outperform a large safety stock increase alone by reducing both average capital and extreme stockouts.

Produce an executive-ready scenario summary with three parts: (1) Headline (what decision is needed), (2) Options (2–4 policy variants) with a compact comparison table, and (3) Recommendation tied to thresholds, including when it might change (sensitivity notes). Explicitly call out trade-offs: “Option B improves 95th percentile stockout days from 6 to 2, but increases median inventory by 18% and raises expedite spend variability.”

Common mistakes at this stage include overfitting to one disruption story, ignoring operational feasibility (supplier MOQ, warehouse capacity, review process limits), and presenting a single “optimal” policy without explaining robustness. Your goal is not to win an argument—it is to provide a defensible policy recommendation under uncertainty with clear guardrails and monitoring signals.

  • Practical outcome: A repeatable scenario library plus thresholds yields a governance process: rerun monthly, compare against the same scorecard, and adjust policy only when thresholds are at risk.
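Pre-committed thresholds can be encoded as data and checked mechanically, which is what turns the scorecard into a governance process. The threshold values below mirror the examples in this section; the metric names are hypothetical keys for a scenario's result row:

```python
import operator

def passes_thresholds(metrics, thresholds):
    """Evaluate a scenario's metrics against pre-committed thresholds.
    Each threshold is (metric_key, comparator, limit); returns the failures."""
    ops = {">=": operator.ge, "<=": operator.le}
    return [(key, cmp, limit) for key, cmp, limit in thresholds
            if not ops[cmp](metrics[key], limit)]

thresholds = [("fill_rate_p50", ">=", 0.98),
              ("fill_rate_p10", ">=", 0.96),
              ("p_stockout_days_gt2", "<=", 0.05),
              ("working_capital_p95", "<=", 3_000_000)]

metrics = {"fill_rate_p50": 0.985, "fill_rate_p10": 0.955,
           "p_stockout_days_gt2": 0.04, "working_capital_p95": 2_700_000}
failed = passes_thresholds(metrics, thresholds)
# failed lists only the P10 fill-rate breach: [("fill_rate_p10", ">=", 0.96)]
```

Because the thresholds live in a data structure rather than in someone's head, the monthly rerun compares every scenario against the same scorecard.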
Chapter milestones
  • Build a scenario library with controlled parameter changes
  • Run experiments and generate comparable result tables
  • Quantify trade-offs: cost vs service vs volatility
  • Add disruption events (port delay, supplier failure, demand shock)
  • Create executive-ready scenario summaries
Chapter quiz

1. Why does Chapter 4 emphasize designing scenarios so results are comparable?

Correct answer: Because comparable scenarios make it easier to attribute metric changes to a controlled set of parameter changes
The chapter warns that ad hoc, one-off tweaks make it hard to explain why metrics changed; controlled changes enable clear attribution.

2. In the chapter’s experimental approach, what should be held fixed versus changed across scenario runs?

Correct answer: Hold fixed the baseline model and measurement plan; change a small, intentional set of parameters
Comparability comes from keeping the baseline and measurement consistent while varying only a documented, limited set of inputs.

3. Which set of outputs best matches the chapter’s recommended standardized metrics for comparing scenarios?

Correct answer: Service level, fill rate, stockout risk, inventory investment, and cost components
The chapter lists these inventory and service/cost metrics as standard outputs for scenario comparison.

4. How does Chapter 4 recommend communicating uncertainty in scenario results to executives?

Correct answer: Show distributions and percentiles, highlighting typical versus tail outcomes rather than a single average
The chapter explicitly advises using distributions/percentiles and typical vs tail outcomes to convey uncertainty.

5. Which scenario element is an example of a disruption event that should be incorporated into stress tests, according to the chapter?

Correct answer: Port delay, supplier failure, or demand shock
The chapter calls out disruption events such as port delays, supplier failures, and demand shocks as key scenario types.

Chapter 5: Validation, Sensitivity, and Model Risk

By Chapter 4 you can run Monte Carlo inventory simulations and compare scenario libraries. Chapter 5 is where those results become trustworthy enough to inform decisions. In real supply chains, the biggest failures are not coding errors—they are validation gaps, brittle assumptions, and “pretty” outputs that hide uncertainty. This chapter gives you a practical workflow to validate a baseline against history, probe sensitivity to key drivers, stress test assumptions, and then calibrate reorder points and safety stock using simulation outputs rather than static formulas.

Validation is not about proving the model is “correct.” It is about building confidence that the model is useful for the decision at hand, within defined limits. You will learn to match historical service levels and stockout patterns, to run sensitivity analysis that identifies which inputs truly matter, and to document limitations so stakeholders can sign off with clear eyes. You will also build a model risk checklist—what could go wrong, how you would detect it, and how you would contain the damage if the model is used outside its intended scope.

Throughout the chapter, keep an engineering mindset: treat every assumption as a component with a known tolerance. If a parameter is uncertain, your job is to (1) quantify that uncertainty, (2) test the decision’s robustness to it, and (3) communicate residual risk without false precision. The practical outcome is a simulation you can defend: one that supports policy calibration (reorder points, order-up-to levels, safety stock, review cycles) and communicates uncertainty in a way that enables action.

Practice note (applies to every milestone in this chapter: validating the baseline against historical performance, running sensitivity analysis on key drivers and parameters, stress testing assumptions and documenting limitations, calibrating safety stock and reorder points from simulation outputs, and creating a model risk checklist for stakeholders): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 5.1: Backtesting: matching service levels and stockout patterns
Section 5.2: Sensitivity methods: one-at-a-time, tornado, Sobol-lite
Section 5.3: Correlation and dependency (demand across items, lead times)
Section 5.4: Non-stationarity: seasonality, trend, regime shifts
Section 5.5: Governance: assumptions log, validation report, sign-offs
Section 5.6: Ethics and decision safety: avoiding false precision

Section 5.1: Backtesting: matching service levels and stockout patterns

Backtesting is baseline validation: run the model over a historical period and see whether it reproduces the performance you actually experienced. The goal is not to match every day’s inventory position—your simulation is stochastic—but to match key metrics and patterns: service level (cycle service level), fill rate, stockout frequency, average backorders (if modeled), and the distribution of days with low cover.

A practical workflow is: (1) freeze the historical policy parameters (review cycle, reorder point, order-up-to, minimum order quantity, case pack rules), (2) feed the model the same demand history (or a fitted distribution with the same moments), (3) use lead time histories or a fitted lead time distribution, and (4) compare simulated vs. actual metrics. Look beyond averages. Plot stockout “runs” (consecutive days out of stock) and compare the distribution of run lengths; many organizations under-estimate the pain of clustered disruptions.

  • Metric alignment: match fill rate and stockout days within a tolerance band (e.g., ±1–2 percentage points) before trusting scenario outputs.
  • Pattern alignment: check whether stockouts occur in the same operational situations (after promotions, before holidays, during supplier variability spikes).
  • Face validity: review a few “storyline” weeks with planners: do the simulated order signals look like what they would have done?

Common mistakes include validating only on aggregate KPIs (hiding SKU-level failures), ignoring the impact of order constraints (MOQ, case pack, capacity), and “double counting” safety stock (keeping a safety factor in both the demand distribution and the policy). If the model misses history, do not immediately tune parameters to force a fit. First ask: is the policy implemented in practice as documented? Many gaps are process gaps, not model gaps.
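The metric-alignment step can be automated as a small report. The tolerance band follows the ±1–2 percentage-point guidance above; the metric names and values are illustrative stand-ins for your actual and simulated KPI tables:

```python
def backtest_report(actual, simulated, tol_pct_points=2.0):
    """Compare simulated vs actual service metrics (in percentage points)
    and flag any metric outside the tolerance band."""
    report = {}
    for metric, actual_value in actual.items():
        gap = simulated[metric] - actual_value
        report[metric] = {"actual": actual_value,
                          "simulated": simulated[metric],
                          "gap": round(gap, 2),
                          "within_tol": abs(gap) <= tol_pct_points}
    return report

report = backtest_report(
    actual={"fill_rate": 96.5, "cycle_service_level": 94.0, "stockout_days_pct": 3.1},
    simulated={"fill_rate": 97.2, "cycle_service_level": 96.8, "stockout_days_pct": 2.6},
)
# cycle_service_level is 2.8 points off and gets flagged; the others pass
```

A flagged metric should prompt the process questions in the paragraph above before any parameter tuning.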

Section 5.2: Sensitivity methods: one-at-a-time, tornado, Sobol-lite

Sensitivity analysis answers: “Which inputs drive the decision?” This is the bridge from simulation to action because it tells you where to spend effort improving data and where robust policies matter more than precision. Start with one-at-a-time (OAT): vary one parameter (e.g., mean demand, demand CV, mean lead time, lead time CV, review period, MOQ) while holding others constant, and measure changes in outputs (fill rate, working capital, stockout risk).

OAT is easy but can miss interactions. A tornado chart improves communication: pick a high/low range for each parameter (often a percentile range from historical data or an expert range), run the model, and sort parameters by impact on a target metric (e.g., fill rate). This quickly shows stakeholders that, for example, lead time variability may dominate mean demand error.

For a “Sobol-lite” approach (interaction-aware without full global sensitivity math), sample parameters from distributions (Latin hypercube if you have it; otherwise stratified random), run many simulations, then compute rank correlations (Spearman) between inputs and outputs and optionally fit a simple surrogate model (linear or gradient-boosted tree) to estimate feature importance. This reveals non-linearities like “service level collapses when lead time CV exceeds 0.6.”

  • Define ranges with discipline: use data-derived intervals where possible; document if ranges are expert judgment.
  • Keep the decision fixed: sensitivity can be on outcomes under a fixed policy, or on the optimal policy—do not mix them without stating it.
  • Report robustness: show “policy A beats policy B in 85% of parameter draws,” not just average differences.

Practical outcome: you can prioritize initiatives (e.g., supplier lead time stabilization vs. demand forecasting improvements) based on quantified impact rather than intuition.
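The Sobol-lite step can be sketched end to end with stdlib Python: sample parameter draws, run the model, then rank-correlate inputs with the output. The "model" here is a deliberately simple stand-in (stockout risk rising sharply with lead-time CV) so the mechanics are visible; in practice you would call your simulator:

```python
import random

def spearman(xs, ys):
    """Spearman rank correlation via Pearson on ranks (assumes no ties)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Sample inputs from data-derived ranges, run the model, rank-correlate.
rng = random.Random(0)
draws = [{"lt_cv": rng.uniform(0.1, 0.8), "demand_cv": rng.uniform(0.1, 0.6)}
         for _ in range(200)]
# Stand-in for the real simulator: output dominated by lead-time variability.
outputs = [d["lt_cv"] ** 2 + 0.2 * d["demand_cv"] + rng.gauss(0, 0.02) for d in draws]
importance = {k: spearman([d[k] for d in draws], outputs)
              for k in ("lt_cv", "demand_cv")}
```

Rank correlation tolerates the non-linearity (`lt_cv ** 2`) that would flatten a plain Pearson estimate, which is why it suits "service level collapses beyond a threshold" behavior.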

Section 5.3: Correlation and dependency (demand across items, lead times)

Independence assumptions are a frequent hidden risk. Many inventory simulations assume each SKU’s demand is independent and lead times are independent of demand conditions. In reality, dependencies drive the worst outcomes: multiple items spike together (promotions, weather, economic shocks), and lead times stretch precisely when demand surges (capacity constraints, port congestion).

Start by diagnosing dependency. Compute correlations of weekly demand between related SKUs (substitutes, shared customers, shared channels) and across a product family. Also check autocorrelation: demand this week depends on last week. For lead times, check whether lead time is correlated with order size or calendar period (peak season). If you see dependencies, reflect them in scenario modeling.

Practical modeling options, from simplest to richer: (1) impose a shared “market factor” that multiplies demand across a group in each period; (2) use correlated random draws via a correlation matrix (copula approach if available, but even a Gaussian correlation approximation is better than none); (3) model lead time regimes—normal vs. disrupted—with probabilities that increase during high demand periods.

  • Cross-item stress testing: in disruption scenarios, increase correlation to represent “everyone spikes together.”
  • Shared constraints: if multiple SKUs share a supplier or lane, treat lead time shocks as common shocks.
  • Substitution effects: if item A is out of stock, demand may transfer to item B; ignoring this can misstate both stockout risk and inventory needs.

Document dependency choices explicitly. Stakeholders often accept simplified dependence (a shared factor) when they see it materially changes tail risk. This section ties directly to “stress test assumptions”: most “black swan” stockouts are actually correlated gray rhinos.
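The simplest option above, a shared market factor, can be sketched directly. `rho` below is the share of each SKU's variance driven by the common factor, so pairwise demand correlation within the group is approximately `rho`; means, CV, and group membership are assumed values:

```python
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def correlated_group_demand(n_weeks=52, skus=("A", "B", "C"), rho=0.6,
                            mean=100, cv=0.3, seed=1):
    """Shared-market-factor approximation: one common shock per week moves
    every SKU in the group; rho is the common share of variance."""
    rng = random.Random(seed)
    sd = mean * cv
    demand = {s: [] for s in skus}
    for _ in range(n_weeks):
        market = rng.gauss(0, 1)                      # common factor
        for s in skus:
            idio = rng.gauss(0, 1)                    # SKU-specific noise
            shock = (rho ** 0.5) * market + ((1 - rho) ** 0.5) * idio
            demand[s].append(max(0.0, mean + sd * shock))
    return demand

d = correlated_group_demand()
r_ab = pearson(d["A"], d["B"])  # close to rho, up to sampling noise
```

Raising `rho` in disruption scenarios implements the "everyone spikes together" stress test with a single parameter change.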

Section 5.4: Non-stationarity: seasonality, trend, regime shifts

Baseline validation and sensitivity results can be misleading if your model assumes stationarity—constant demand and lead time distributions—when the business is changing. Non-stationarity shows up as seasonality (predictable cycles), trend (growth/decline), and regime shifts (step changes due to new channels, supplier changes, policy changes, or macro shocks).

Operationally, treat the year as a set of planning regimes. For seasonality, fit separate demand distributions by month or by peak vs. off-peak, and simulate using the appropriate period distribution. For trend, incorporate a forecast path (deterministic or stochastic) and simulate around it; otherwise your “service level” may look great simply because the model assumes last year’s lower volume. For regime shifts, create explicit scenario library branches: baseline, upside (growth), and disruption (supply shock), but also “process shift” scenarios such as moving from weekly to biweekly reviews or switching carriers.

A useful stress test is to re-run backtesting on multiple historical windows: stable periods and turbulent periods. If the model only matches in calm times, it is not ready for stress testing. Another practical technique is “rolling calibration”: re-estimate parameters quarterly and measure parameter drift; large drift indicates your assumptions need dynamic updating.

  • Common mistake: calibrating safety stock from an annual average demand CV, then using it in peak season where CV and mean both change.
  • Decision linkage: if the policy is changed seasonally in practice, model that explicitly; do not evaluate a static policy against dynamic reality.

Outcome: your simulation becomes a decision tool for planning transitions—how policies should adapt across seasons and during regime changes—rather than a fragile snapshot.
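Treating the year as planning regimes can be sketched as a regime table that owns each week. The regime boundaries, means, and CVs below are illustrative; in practice each regime's distribution would be fitted from the corresponding historical periods:

```python
import random

# Hypothetical regime table: a distribution per planning regime instead of
# one annual distribution (all values illustrative).
REGIMES = {
    "off_peak": {"weeks": range(0, 40), "mean": 90, "cv": 0.25},
    "peak":     {"weeks": range(40, 52), "mean": 160, "cv": 0.35},
}

def seasonal_demand(seed=3):
    """Draw one year of weekly demand from the regime that owns each week."""
    rng = random.Random(seed)
    by_week = {}
    for regime in REGIMES.values():
        for w in regime["weeks"]:
            by_week[w] = max(0.0, rng.gauss(regime["mean"],
                                            regime["mean"] * regime["cv"]))
    return [by_week[w] for w in range(52)]

demand = seasonal_demand()
off_mean = sum(demand[:40]) / 40
peak_mean = sum(demand[40:]) / 12
# A safety stock calibrated to the annual average would sit between the two regimes,
# under-protecting peak weeks -- the common mistake flagged above.
```

Regime shifts and rolling calibration fit the same structure: swap the table, rerun, and compare parameter drift between quarterly fits.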

Section 5.5: Governance: assumptions log, validation report, sign-offs

Model risk is reduced more by governance than by clever algorithms. A simple, repeatable documentation package lets stakeholders understand what the model can and cannot do, and it protects your analysis from misuse after you hand it off.

Maintain an assumptions log with versioning. Each assumption should include: description, rationale, data source, date, owner, and “impact if wrong.” Examples: how lost sales vs. backorders are treated; whether orders can be expedited; whether lead time includes receiving and put-away; how MOQs and case packs are enforced; and whether inventory accuracy issues are ignored. Pair this with a validation report that includes baseline backtesting results (Section 5.1), sensitivity findings (Section 5.2), and a summary of stress tests and limitations.

  • Sign-offs: require at least three roles: planner/business owner (policy realism), data owner (data quality), and analytics owner (method correctness).
  • Reproducibility: log random seeds, parameter files, and scenario IDs so results can be recreated.
  • Change control: define what triggers re-validation (new supplier, network redesign, forecast methodology change).

Common mistake: presenting simulation outputs without clarifying governance, leading to “model drift” where people tweak inputs ad hoc. Governance enables calibration too: when you adjust reorder points and safety stock based on simulation, you can tie the new settings to documented assumptions and validation evidence.
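An assumptions-log entry can be a plain, versioned record validated before it is appended. The field set follows the minimum list above; the specific values (ID, owner, date) are illustrative placeholders:

```python
import json
import datetime

# One entry in a versioned assumptions log (all field values illustrative):
assumption = {
    "id": "A-012",
    "description": "Unmet demand is backordered, not lost",
    "rationale": "Contract terms require eventual fulfillment",
    "data_source": "Order management system extract",
    "date": "2024-06-01",
    "owner": "demand_planning",
    "impact_if_wrong": "Understates true stockout cost and customer impact",
    "version": 3,
}

def validate_entry(entry):
    """Return the required fields missing from a log entry (empty = valid)."""
    required = {"id", "description", "rationale", "data_source",
                "date", "owner", "impact_if_wrong", "version"}
    datetime.date.fromisoformat(entry["date"])  # raises on a malformed date
    return sorted(required - entry.keys())

missing = validate_entry(assumption)
log_line = json.dumps(assumption, sort_keys=True)  # append-friendly JSONL record
```

An append-only JSONL file of such entries gives sign-off reviewers a diffable history of what was assumed and when.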

Section 5.6: Ethics and decision safety: avoiding false precision

Inventory stress testing influences real outcomes: customer experience, working capital, and sometimes patient safety or food availability. Ethical modeling here is not abstract—it is about decision safety. The main ethical failure mode is false precision: reporting a fill rate of 96.37% as if it were known, when it depends on uncertain parameters and unmodeled constraints.

Adopt “decision-safe” communication. Report ranges and probabilities: “Under baseline assumptions, fill rate is 95–97% (P10–P90); under disruption, it drops to 88–93%.” Show confidence intervals and tail metrics (e.g., probability of more than 5 stockout days per month). If you calibrate reorder points from simulation outputs, do it to meet targets with a buffer: choose policies that hit service goals in most simulations, not only on average. This is how calibration becomes robust rather than optimistic.

Also be explicit about who bears the risk. A policy that reduces inventory may look good financially but increases stockout risk for critical customers. If some SKUs are safety-critical, use asymmetric loss functions or explicit constraints (e.g., “probability of stockout during lead time must be below 1%”).

  • Common mistake: tuning the model until it matches one KPI while ignoring others (e.g., matching fill rate but creating unrealistic order volatility).
  • Limitation discipline: if you did not model capacity, expediting, substitution, or allocation rules, say so prominently.
  • Model risk checklist: list top misuse risks (stationarity assumed, independence assumed, policy constraints ignored), detection signals (KPI drift, parameter drift), and mitigations (re-validation, scenario expansion, governance sign-off).

Outcome: stakeholders can act on results without being misled, and your simulation becomes a reliable support tool—one that respects uncertainty and avoids overconfident recommendations.

Chapter milestones
  • Validate the baseline against historical performance
  • Run sensitivity analysis on key drivers and parameters
  • Stress test model assumptions and document limitations
  • Calibrate safety stock and reorder points from simulation outputs
  • Create a model risk checklist for stakeholders
Chapter quiz

1. In this chapter, what is the primary purpose of validating a baseline inventory simulation against historical performance?

Correct answer: To build confidence that the model is useful for a specific decision within defined limits
Validation focuses on decision-usefulness and clearly bounded applicability, not proving absolute correctness.

2. Which validation approach best aligns with the chapter’s guidance for comparing the baseline model to history?

Correct answer: Match historical service levels and stockout patterns to see if the model reproduces key behaviors
The chapter emphasizes matching service levels and stockout patterns rather than overfitting to a narrow slice of history.

3. What is the main goal of sensitivity analysis in the workflow described in Chapter 5?

Correct answer: Identify which key drivers and parameters materially affect outcomes and decision robustness
Sensitivity analysis is used to determine which inputs truly matter and how robust decisions are to uncertainty.

4. According to Chapter 5, why should safety stock and reorder points be calibrated using simulation outputs rather than static formulas?

Correct answer: Simulation-based calibration incorporates uncertainty and scenario behavior instead of assuming a single fixed world
The chapter stresses using simulation outputs to reflect uncertainty and real-world variability when setting policies.

5. Which item best fits the purpose of a model risk checklist for stakeholders as described in the chapter?

Correct answer: What could go wrong, how to detect it, and how to contain damage if the model is used outside its intended scope
The checklist is about risks, detection, and containment tied to assumptions and scope, enabling informed sign-off.

Chapter 6: Ship a Scenario Modeler Portfolio Project

A portfolio project only “counts” when someone else can run it, trust it, and use it to make a decision. In supply chain stress testing, that means your scenario modeler must be reproducible (same inputs → same outputs), explainable (assumptions and policies are explicit), and decision-oriented (outputs map to service risk and working capital, not just charts). This chapter turns your Monte Carlo inventory simulation into a shippable artifact: a documented notebook (or small codebase), a lightweight dashboard, and stakeholder readouts tailored to operations, finance, and leadership.

Your goal is to translate planning questions into simulation-ready problem statements, run consistent scenario libraries (baseline, upside, disruption), validate assumptions with sensitivity analysis, and communicate uncertainty with ranges. You will also position the project as “AI-adjacent” by outlining a roadmap from simulation to forecasting and optimization. Finally, you will convert the work into resume bullets and interview stories that prove you can build decision support, not just models.

As you work through this chapter, keep a single test question in mind: “If I emailed this repository to a hiring manager, could they reproduce my results in 15 minutes and understand the decision implications in 5?” Everything you ship should serve that bar.

Practice note (applies to each chapter milestone — finalizing the documented notebook, creating the dashboard and narrative, preparing the ops/finance/leadership readouts, defining the AI roadmap, and translating the project into resume bullets and interview stories): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Section 6.1: Project structure: folders, configs, and reproducible runs

Start by treating your notebook as a product. The most common portfolio failure is a “works on my machine” notebook with hidden state, hard-coded paths, and undocumented assumptions. Your structure should make the simulation easy to rerun for a new SKU, lane, or policy with minimal edits.

Use a simple repository layout: data/ (small sample inputs), notebooks/ (one main notebook plus exploration), src/ (simulation and metrics functions), configs/ (scenario definitions), outputs/ (generated artifacts), and docs/ (README, assumptions, decision memo templates). Keep raw data out of version control if it is proprietary; include a synthetic dataset generator so reviewers can run end-to-end.

  • Config-driven scenarios: define demand distribution parameters, lead time variability, review cycle, reorder point/order-up-to policy, safety stock method, and cost inputs (holding, stockout, expedite). Store these in YAML/JSON so scenarios are consistent and comparable.
  • Reproducibility controls: set random seeds per scenario run, log versions (Python, key libraries), and write outputs with a run ID. Include a single command or notebook cell that runs all scenarios and regenerates the same tables and plots.
  • Clear interfaces: implement simulate(policy, demand_model, lead_time_model, horizon, n_sims) and return tidy event-level or period-level results. Avoid mixing plotting code inside simulation loops.
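A sketch of what a config-driven scenario plus a traceable run ID might look like — every field name and value here is illustrative, not a prescribed schema:

```python
import json
import hashlib

# Hypothetical scenario config as it might live in configs/baseline.json
baseline = {
    "scenario_id": "baseline",
    "demand": {"dist": "normal", "mean": 120, "std": 35, "units": "units/week"},
    "lead_time": {"dist": "lognormal", "mean_weeks": 2.0, "sigma": 0.4},
    "policy": {"type": "order_up_to", "reorder_point": 300, "order_up_to": 800},
    "costs": {"holding_per_unit_week": 0.15, "stockout_per_unit": 4.0},
    "seed": 7,
}

def run_id(config: dict) -> str:
    """Derive a stable run ID from the config so outputs stay traceable."""
    blob = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha1(blob).hexdigest()[:10]

rid = run_id(baseline)
print(f"outputs/{rid}/")  # write all artifacts for this run under a unique folder
```

Hashing the config means identical inputs always map to the same output folder, which makes accidental silent changes to a scenario immediately visible.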

Engineering judgment shows up in defaults. Pick conservative, explainable defaults (e.g., 10,000 Monte Carlo trials for stable tail estimates; weekly periods; warm-up period if you model on-hand initialization). Document why. A frequent mistake is forgetting unit consistency: daily demand with weekly review cycles, lead time in days, and reorder point in units can silently misalign. Add validation checks (assertions) that stop the run if units or parameter ranges are inconsistent.
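The interface and guardrails described above might be sketched as follows — a deliberately simplified periodic-review, order-up-to simulator with assertion checks, not the course's reference implementation:

```python
import random
from statistics import mean

def simulate(reorder_point, order_up_to, demand_mean, demand_std,
             lead_time_weeks, horizon=52, n_sims=1000, seed=0):
    """Weekly-period order-up-to simulation; returns per-trial fill rates."""
    # Guardrails: stop the run if parameters are inconsistent
    assert order_up_to > reorder_point > 0, "policy thresholds misordered"
    assert demand_std >= 0 and lead_time_weeks >= 0, "negative variability"
    rng = random.Random(seed)  # seed per scenario run for reproducibility
    fill_rates = []
    for _ in range(n_sims):
        on_hand, pipeline, served, demanded = order_up_to, [], 0, 0
        for _ in range(horizon):
            # receive orders whose lead time has elapsed
            pipeline = [(q, t - 1) for q, t in pipeline]
            on_hand += sum(q for q, t in pipeline if t <= 0)
            pipeline = [(q, t) for q, t in pipeline if t > 0]
            d = max(0, round(rng.gauss(demand_mean, demand_std)))
            served += min(d, on_hand)
            demanded += d
            on_hand = max(0, on_hand - d)
            position = on_hand + sum(q for q, _ in pipeline)
            if position <= reorder_point:
                pipeline.append((order_up_to - position, lead_time_weeks))
        fill_rates.append(served / demanded if demanded else 1.0)
    return fill_rates

rates = simulate(300, 800, demand_mean=120, demand_std=35,
                 lead_time_weeks=2, n_sims=200, seed=7)
print(f"mean fill rate: {mean(rates):.1%}")
```

Note the separation the section recommends: the function returns tidy per-trial results and does no plotting, so reporting code can consume them independently.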

Section 6.2: Reporting: KPIs, scenario tables, and insights narrative

Your model’s credibility depends on reporting that matches how supply chain decisions are evaluated. A “lightweight dashboard” can be a single notebook section that produces standardized KPI tables and a few decision-grade plots. The key is consistency across scenarios: same metrics, same time horizon, same definitions.

At minimum, report service and risk metrics alongside capital impact. Typical KPIs include cycle service level (probability of no stockout in a cycle), fill rate (fraction of demand satisfied immediately), stockout probability per period, expected backorders, average on-hand, and working capital tied up in inventory (units × unit cost). For executives, translate these into business terms: “What’s the chance we miss customer demand next month?” and “How much cash is locked in stock?”
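Because fill rate and per-period stockout probability are easy to conflate, a small sketch computing both from one period-level path — the demand and on-hand numbers are synthetic, chosen only to make the difference visible:

```python
def service_kpis(demand, on_hand_start):
    """Compute fill rate and stockout probability from one simulated path.
    demand / on_hand_start: per-period lists (synthetic for illustration)."""
    served = [min(d, oh) for d, oh in zip(demand, on_hand_start)]
    fill_rate = sum(served) / sum(demand)           # fraction served immediately
    stockout_periods = sum(d > oh for d, oh in zip(demand, on_hand_start))
    stockout_prob = stockout_periods / len(demand)  # chance of any shortfall
    return fill_rate, stockout_prob

demand = [100, 140, 90, 180, 110]
on_hand = [120, 120, 120, 120, 120]
fr, sp = service_kpis(demand, on_hand)
print(f"fill rate {fr:.1%}, stockout probability {sp:.0%}")
```

Here most demand is still served (fill rate ~87%) even though two of five periods stock out — exactly why reporting only one of the two metrics can mislead.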

  • Scenario table: rows = scenarios (baseline/upside/disruption), columns = KPIs (mean, P50, P90 or P95). Include deltas versus baseline.
  • Distribution views: plot histogram/violin for fill rate and working capital; plot tail risk for stockouts (e.g., P95 backorders). Avoid only plotting averages; stress testing exists to see the tails.
  • Policy comparison: compare reorder point vs order-up-to adjustments, safety stock changes, or review cycle changes using the same scenario library.

Write an “insights narrative” directly under your tables: 5–8 sentences that interpret results and suggest actions. Example structure: (1) what changed, (2) what it does to service, (3) what it costs in capital, (4) what risks remain, (5) recommended policy choice and why. Common mistakes include mixing KPIs (e.g., calling fill rate “service level”), reporting point estimates without ranges, or presenting too many plots without a decision. Your narrative should answer: “So what should we do?”

Section 6.3: Packaging outputs: CSVs, plots, and decision memos

Stakeholders rarely read code. To make your project usable, package outputs in formats others can open and reuse: CSVs for tables, PNG/PDF for plots, and a one-page decision memo that captures assumptions, results, and recommendation. Treat these as build artifacts: generated every run into outputs/{run_id}/.

Start with tidy CSVs: a scenario summary table (one row per scenario-policy combination), a sensitivity table (one row per parameter perturbation), and optionally a time-series export (period-level inventory position, on-hand, backorders) for analysts. Name columns clearly and include units (e.g., avg_on_hand_units, working_capital_usd, fill_rate_pct). A subtle but valuable practice is to include the full scenario config embedded as JSON in a column or separate file so results remain traceable.
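A minimal sketch of that traceability practice — writing a scenario summary CSV with the full config embedded as a JSON column. Paths, field names, and numbers are illustrative; a temp directory stands in for outputs/{run_id}/:

```python
import csv
import json
import os
import tempfile

# Hypothetical scenario results and the config that produced them
rows = [
    {"scenario": "baseline", "fill_rate_pct": 96.2, "working_capital_usd": 410000},
    {"scenario": "disruption", "fill_rate_pct": 90.5, "working_capital_usd": 455000},
]
config = {"demand": {"mean": 120, "std": 35}, "seed": 7}

run_id = "run_0007"  # in practice, derive from a config hash or timestamp
out_dir = os.path.join(tempfile.mkdtemp(), run_id)  # stands in for outputs/{run_id}/
os.makedirs(out_dir)

path = os.path.join(out_dir, "scenario_summary.csv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["scenario", "fill_rate_pct", "working_capital_usd", "config_json"]
    )
    writer.writeheader()
    for row in rows:
        # embed the full config so every row stays traceable to its inputs
        writer.writerow({**row, "config_json": json.dumps(config, sort_keys=True)})

with open(path) as f:
    back = list(csv.DictReader(f))
print(back[0]["scenario"], back[0]["fill_rate_pct"])
```

Because each run writes to its own directory, reruns never overwrite earlier artifacts — the packaging mistake the section warns about below.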

  • Plots that travel: include (a) KPI distribution plot, (b) trade-off curve (fill rate vs working capital), (c) scenario comparison bar chart with error bars (P10–P90), and (d) a time-series example path for intuition. Keep them readable in a slide.
  • Decision memo template: sections for question, scope/SKUs, assumptions, scenarios, results table, sensitivity highlights, recommendation, and “what would change my mind.”
  • Documentation: a README with “How to run,” “Data dictionary,” “Policy definitions,” and “Metric definitions.”

Common packaging mistakes are overwriting outputs (no run IDs), not pinning dependencies, and omitting metric definitions. Remember: your portfolio reviewer is testing whether you can ship decision support. If a finance partner can open your CSV and reproduce the headline numbers, you’ve won half the battle.

Section 6.4: Stakeholder communication: uncertainty, ranges, and trade-offs

Stress testing is inherently about uncertainty, so your communication must be explicit about ranges and trade-offs. The biggest stakeholder mistake is to present Monte Carlo results as a single “answer,” which invites false precision and erodes trust when reality differs. Instead, present a range (P50/P90) and explain what drives it (demand volatility, lead time variability, review cycle cadence, and policy thresholds).

Prepare three readouts tailored to different audiences:

  • Ops readout: focuses on feasibility and execution. Highlight expected stockout frequency, backorder sizes, and how reorder points or review cycles affect firefighting (expedites, schedule changes). Provide operational levers: increase safety stock, reduce review period, qualify alternate suppliers.
  • Finance readout: focuses on working capital, carrying cost, and downside exposure. Present distributions of inventory investment and the cost-of-service curve. Be precise about cost assumptions and what is excluded (e.g., obsolescence, lost sales penalties).
  • Leadership readout: focuses on risk posture and decision. Lead with “If disruption scenario happens, we have X% chance of stockout; mitigating to Y% costs $Z in inventory.” Keep details in an appendix.

Use consistent language: “There is a 10% chance outcomes are worse than this” (P90 risk), not “we expect.” Pair every service improvement with its capital trade-off. When stakeholders challenge assumptions, welcome it and point to your sensitivity analysis: show which parameters matter most and where uncertainty dominates. A common mistake is to run sensitivity by changing many things at once; instead, do one-factor-at-a-time plus a small set of structured combined shocks (e.g., demand up + lead time up) to reflect realistic disruptions.
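The one-factor-at-a-time-plus-combined-shock pattern can be sketched like this; `stockout_risk` is a toy stand-in for a full simulator run, with made-up coefficients chosen only to show the mechanics:

```python
def stockout_risk(params):
    """Toy stand-in for the simulator: more volatility/lead time -> more risk."""
    return 0.04 * (params["demand_std"] / 35) * (params["lead_time_weeks"] / 2)

baseline = {"demand_std": 35, "lead_time_weeks": 2}
shocks = {
    "demand_std +20%": {"demand_std": 42},
    "lead_time +50%": {"lead_time_weeks": 3},
    # combined shock: a structured, realistic disruption
    "demand + lead time": {"demand_std": 42, "lead_time_weeks": 3},
}

base_risk = stockout_risk(baseline)
results = {}
for name, override in shocks.items():
    risk = stockout_risk({**baseline, **override})
    results[name] = risk - base_risk  # report the delta vs baseline
    print(f"{name}: +{results[name]:.1%} stockout risk vs baseline")
```

Reporting deltas against the baseline, one shock at a time plus a small set of combined shocks, is what lets stakeholders see which lever dominates.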

Section 6.5: Next steps: integrating forecasting/ML and optimization

Your project is already “AI-ready” because it formalizes the problem and creates a simulation environment where better predictions and better policies can be evaluated safely. The roadmap to AI should be presented as incremental upgrades, not a rewrite.

First, forecasting integration: replace static demand distributions with forecast distributions. In practice, this means producing a predictive distribution (mean and uncertainty) per period and sampling from it inside the Monte Carlo simulation. Start simple: exponential smoothing with residual bootstrapping, then graduate to ML models (gradient boosting, temporal fusion transformers) if you have features. The key is to carry uncertainty forward, not just point forecasts—otherwise the stress test becomes optimistic.
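A sketch of that "start simple" path — simple exponential smoothing with residual bootstrapping, so each Monte Carlo trial draws a different plausible demand path rather than a point forecast. The history series is synthetic:

```python
import random

def ses_fit(history, alpha=0.3):
    """Simple exponential smoothing; returns final level and in-sample residuals."""
    level = history[0]
    residuals = []
    for y in history[1:]:
        residuals.append(y - level)          # one-step-ahead forecast error
        level = alpha * y + (1 - alpha) * level
    return level, residuals

def sample_demand_path(level, residuals, horizon, rng):
    """Bootstrap a demand path: point forecast + resampled historical errors."""
    return [max(0, level + rng.choice(residuals)) for _ in range(horizon)]

history = [110, 130, 95, 140, 120, 150, 105, 135]  # synthetic weekly demand
level, resid = ses_fit(history)
rng = random.Random(1)
paths = [sample_demand_path(level, resid, horizon=4, rng=rng) for _ in range(500)]
# each Monte Carlo trial now consumes one bootstrapped path
print(f"level {level:.1f}, {len(paths)} bootstrapped paths")
```

Because the sampled errors come from the model's own residuals, the spread of the paths reflects how wrong the forecast has actually been — the uncertainty the section says must be carried forward.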

Second, lead time modeling: use historical purchase order data to estimate a lane- or supplier-specific lead time distribution with seasonality and risk of extreme delays. A practical step is a mixture model (normal for typical lead times + heavy-tail component for disruptions) or a regime-switching approach that toggles between “normal” and “disrupted” states in scenarios.
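The mixture idea might be sketched like this; the regime probability and distribution parameters are illustrative, not fitted to real purchase order data:

```python
import random

def sample_lead_time(rng, p_disrupt=0.1):
    """Mixture lead-time draw: typical normal regime + heavy-tail disrupted regime.
    Parameters are hypothetical; fit them lane-by-lane from PO history."""
    if rng.random() < p_disrupt:
        # disrupted regime: long, right-skewed delays
        return 14 + rng.expovariate(1 / 10)   # roughly 14-40+ days
    return max(1, rng.gauss(7, 1.5))          # typical regime: ~7-day lead time

rng = random.Random(3)
draws = [sample_lead_time(rng) for _ in range(10_000)]
typical = sum(d < 12 for d in draws) / len(draws)
print(f"share of draws under 12 days: {typical:.0%}")
```

A regime-switching variant would replace the per-draw coin flip with a persistent state, so disrupted periods cluster together the way real port delays do.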

  • Optimization layer: once your simulator is stable, you can search for policies that meet a service constraint at minimum working capital. Start with grid search over reorder points and order-up-to levels; then introduce Bayesian optimization or evolutionary search for higher-dimensional policies.
  • Closed-loop evaluation: compare “current policy,” “optimized policy,” and “forecast-informed policy” under the same scenario library to avoid cherry-picking.
  • Governance: define what requires sign-off (metric definitions, scenario library, cost inputs) so changes don’t invalidate comparisons.
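The grid-search step in the optimization layer can be sketched as below; `evaluate` is a toy stand-in that a real project would replace with Monte Carlo runs from the simulator:

```python
def evaluate(reorder_point):
    """Toy stand-in for the simulator: returns (P90 service level, working capital).
    In practice both numbers come from Monte Carlo output, not closed forms."""
    service_p90 = min(0.999, 0.80 + reorder_point / 2000)  # more stock -> more service
    working_capital = reorder_point * 25.0                 # units x unit cost
    return service_p90, working_capital

target = 0.95  # service constraint at the P90 level, not the average
best = None
for rop in range(100, 1001, 50):  # grid search over reorder points
    service, capital = evaluate(rop)
    if service >= target and (best is None or capital < best[2]):
        best = (rop, service, capital)

rop, service, capital = best
print(f"reorder point {rop}: P90 service {service:.1%} at ${capital:,.0f} capital")
```

Constraining on the P90 rather than the mean is what makes the chosen policy robust in most simulations, not just on average — the calibration discipline from Chapter 5.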

A common mistake is jumping to sophisticated ML before the simulation assumptions and KPIs are agreed. Hiring managers and stakeholders value a clear evaluation harness more than a fancy model. Your simulator is that harness.

Section 6.6: Career transition kit: role mapping, keywords, case walkthrough

To make this project convert into interviews, package it as a “case.” You need a crisp story: problem, approach, results, and decision. Then map it to roles and keywords so recruiters understand the fit.

Role mapping: For supply chain analytics roles, emphasize inventory policy modeling, scenario planning, service/capital trade-offs, and stakeholder communication. For data science roles, emphasize probabilistic simulation, uncertainty quantification, experiment design (scenario libraries), and reproducible pipelines. For operations research roles, emphasize policy optimization and constraints. For ML engineering or analytics engineering, emphasize configs, reproducible runs, packaging outputs, and data contracts.

  • Keywords to include: Monte Carlo simulation, inventory optimization, (s, S) / reorder point, safety stock, service level vs fill rate, lead time variability, scenario analysis, sensitivity analysis, uncertainty quantification, working capital, reproducible pipelines, decision memo.
  • Resume bullets (pattern): “Built X to achieve Y measured by Z.” Example: “Built a Monte Carlo inventory stress-test simulator (reorder point/order-up-to, stochastic demand/lead time) to quantify fill-rate risk across baseline/upside/disruption scenarios; reduced required working capital by 12% at constant P90 stockout risk via policy tuning.”
  • Interview walkthrough: open with the business question, define KPIs, explain assumptions, show scenario library design, then show one trade-off chart and one table with P50/P90. Finish with sensitivity findings and what you’d improve next (forecast integration, optimization).

Common interview mistakes are describing the project as “a notebook with simulations” (too vague), failing to define metrics, or not explaining why scenarios were chosen. Your advantage is demonstrating end-to-end thinking: from planning question → simulation-ready statement → policy modeling → uncertainty-aware recommendation → stakeholder-ready artifacts. That is exactly what hiring teams mean when they ask for “applied AI” in operations.

Chapter milestones
  • Finalize a complete notebook project with documentation
  • Create a lightweight results dashboard and narrative
  • Prepare stakeholder readouts: ops, finance, and leadership
  • Define how this fits AI: roadmap to forecasting and optimization
  • Translate the project into resume bullets and interview stories
Chapter quiz

1. A portfolio scenario modeler “counts” only when it meets which set of criteria described in the chapter?

Correct answer: Reproducible, explainable, and decision-oriented
The chapter defines a shippable project as one others can run and trust: same inputs→same outputs, explicit assumptions/policies, and outputs that support decisions.

2. Which output framing best matches the chapter’s guidance for decision-oriented communication?

Correct answer: Translate results into service risk and working capital impacts
The chapter emphasizes mapping simulation outputs to decision implications (service risk and working capital), not just visualizations.

3. What is the purpose of running consistent scenario libraries like baseline, upside, and disruption?

Correct answer: To compare decisions across standardized conditions and reduce ad-hoc analysis
Using a consistent scenario library enables reliable comparisons and repeatable decision analysis across common planning conditions.

4. How does the chapter recommend validating assumptions in a Monte Carlo inventory simulation project?

Correct answer: Use sensitivity analysis to test how assumptions affect outcomes
The chapter calls for validating assumptions with sensitivity analysis and communicating uncertainty with ranges.

5. What does it mean to position the scenario modeler as “AI-adjacent” in this chapter?

Correct answer: Outline a roadmap from simulation to forecasting and optimization
The chapter’s AI framing is a roadmap: simulation as decision support now, with a path toward forecasting and optimization later.