From Excel to AI Analyst: Forecasting Models in 10 Scenarios

Career Transitions Into AI — Beginner


Turn spreadsheets into forecasting models you can defend in real meetings.

Beginner · excel-to-ai · forecasting · time-series · business-analytics

Course Overview

This book-style course is designed for Excel-powered analysts who want to become AI-ready forecasting practitioners—without jumping straight into black-box models or vague theory. You’ll learn a repeatable forecasting workflow that starts with the same instincts you already use in spreadsheets (sanity checks, baselines, and clear logic) and upgrades them into a modern, defensible modeling pipeline.

By the end, you’ll be able to build, evaluate, and communicate forecasting models that hold up in real business conversations: “What is the horizon?”, “What happens if promotions change?”, “How accurate is this compared to our current method?”, and “How do we know the model won’t break next month?”

What Makes This Different

Most forecasting content either stays in Excel forever or jumps directly to advanced time-series tooling. This course bridges the gap with a practical approach: start with strong baselines, build leak-free datasets, add business drivers, then train ML models you can explain. You’ll practice the same core template across 10 business scenarios so you can generalize—not memorize.

  • Excel-to-AI translation: concepts mapped to a workflow you can reuse
  • Business-first evaluation: metrics, backtesting, and error analysis
  • Driver-aware forecasting: promotions, pricing, inventory, and calendars
  • Stakeholder delivery: scorecards, visuals, uncertainty, and handoff

Who This Is For

If you’re in finance, operations, sales ops, marketing analytics, or business analysis—and you’re known as “the Excel person”—this course is your structured transition plan. It’s also a good fit if you’ve tried machine learning before but struggled to make results credible, repeatable, and decision-ready.

How You’ll Learn (Chapter-by-Chapter)

You’ll begin by reframing forecasting problems in business terms: horizon, cadence, granularity, and success metrics. Next, you’ll focus on data preparation that prevents leakage—one of the most common reasons forecasts look great in a notebook and fail in production. Then you’ll build baselines that set a trustworthy benchmark, followed by feature engineering that brings in real drivers like promotions and seasonality. Finally, you’ll train explainable ML models, add uncertainty and scenario planning, and deliver a portfolio-ready report based on 10 practical scenarios.

Outcome: Job-Ready Forecasting Confidence

When you finish, you won’t just have “a model.” You’ll have a process: a modeling table that’s safe to backtest, a baseline leaderboard, a champion model with documented assumptions, and a clear story about tradeoffs and limitations. That’s exactly what hiring managers and stakeholders look for in an AI-capable analyst.

Ready to start? Register free or browse all courses.

What You Will Learn

  • Translate Excel forecasting habits into an end-to-end AI forecasting workflow
  • Frame forecasting problems with horizons, granularity, and business constraints
  • Prepare time-series data: timestamps, missingness, outliers, and leakage prevention
  • Build strong baselines (naive, seasonal naive, moving average) and compare fairly
  • Engineer time-series features: lags, rolling stats, calendar effects, and promotions
  • Train and tune regression-based and tree-based forecasting models
  • Evaluate with MAE, RMSE, MAPE/sMAPE, WAPE, and backtesting strategies
  • Create prediction intervals and scenario-based forecasts for planning
  • Apply the same model template across 10 real business scenarios
  • Package results into a stakeholder-ready forecast report and handoff checklist

Requirements

  • Comfort with Excel formulas, tables, and pivot tables
  • Basic statistics intuition (averages, variance, seasonality concepts)
  • No prior machine learning required
  • A computer capable of running Python locally or in a cloud notebook

Chapter 1: From Spreadsheet Forecasts to AI Thinking

  • Define the business question, forecast horizon, and decision threshold
  • Map Excel forecasting workflows to an AI pipeline you can repeat
  • Set up a simple dataset and a reproducible project structure
  • Choose success metrics and a baseline before modeling
  • Milestone project: build and document your first naive baseline

Chapter 2: Time-Series Data Prep Without Leaks

  • Clean timestamps, align calendars, and standardize granularity
  • Handle missing data, stockouts, and anomalies with business logic
  • Create train/validation/test splits with backtesting
  • Build a data quality checklist you can reuse across scenarios
  • Milestone project: produce a leak-free modeling table

Chapter 3: Baselines That Beat Overfitting

  • Implement naive, seasonal naive, and moving-average baselines
  • Add exponential smoothing and understand when it works
  • Run backtests and summarize results in a model scorecard
  • Select a champion baseline to beat in later chapters
  • Milestone project: baseline leaderboard across multiple series

Chapter 4: Feature Engineering for Business Drivers

  • Create lag and rolling-window features safely
  • Add calendar features and encode holidays and seasonality
  • Incorporate business drivers: price, promo, inventory, and marketing
  • Design a feature store template for repeated scenario work
  • Milestone project: driver-aware dataset ready for ML

Chapter 5: Train Forecasting Models You Can Explain

  • Train regression-based models (regularized linear) as a strong first ML step
  • Train tree-based models (random forest/gradient boosting) for nonlinearity
  • Tune models with time-series cross-validation and avoid overfitting
  • Generate prediction intervals and scenario forecasts for planning
  • Milestone project: champion ML model that beats the baseline

Chapter 6: 10 Real Business Scenarios + Portfolio-Ready Delivery

  • Apply the template to 10 scenarios: sales, demand, staffing, churn proxy, and more
  • Create a reusable forecasting notebook/report structure for stakeholders
  • Add monitoring signals: drift, accuracy decay, and data quality alerts
  • Package a portfolio case study and interview-ready talking points
  • Milestone project: final forecast report + model handoff checklist

Sofia Chen

Analytics Lead, Forecasting & Decision Intelligence

Sofia Chen leads forecasting and decision-intelligence projects across retail and SaaS, bridging finance teams and data science. She trains analysts to move from spreadsheet workflows to reproducible, stakeholder-ready models with clear evaluation and communication.

Chapter 1: From Spreadsheet Forecasts to AI Thinking

Most Excel-based forecasting is already “modeling”—it’s just informal, hard to reproduce, and often evaluated with intuition instead of consistent metrics. This chapter turns familiar spreadsheet habits (filters, pivot tables, trendlines, FORECAST functions) into an AI-ready workflow you can repeat across scenarios. The goal is not to abandon Excel thinking, but to upgrade it: define the business decision first, translate the work into a clean dataset, choose a baseline and success metric before you touch a complex model, and document the process so someone else can rerun it next month.

By the end of this chapter you will be able to (1) frame a forecasting problem using horizon, cadence, and granularity, (2) map Excel workflows to a pipeline that works with code and dataframes, and (3) complete a milestone project: build and document your first naïve baseline forecast. If you can do those three things, you’re already thinking like an AI analyst—even before you train a single “AI” model.

Practice note for Define the business question, forecast horizon, and decision threshold: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Map Excel forecasting workflows to an AI pipeline you can repeat: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Set up a simple dataset and a reproducible project structure: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Choose success metrics and a baseline before modeling: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Milestone project: build and document your first naive baseline: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.


Section 1.1: What forecasting is (and isn’t) in business settings

In business, forecasting is not about guessing the future; it’s about making better decisions under uncertainty. A forecast becomes valuable only when it changes an action: how much to order, how many support agents to schedule, which regions need extra inventory, or whether revenue is likely to miss a target. That’s why the first step is to define the business question and decision threshold. “Forecast demand” is vague; “Forecast next week’s unit sales by store to keep stockouts under 2%” is actionable. The threshold (e.g., stockout rate, service level, budget limit) determines how you will evaluate the forecast.

Forecasting also isn’t a single number. Many stakeholders implicitly want a plan, not a forecast: they ask for “the number” to plug into budgeting. As an analyst, you should separate the forecast (what’s likely to happen) from the plan (what we will do). A good workflow often produces multiple outputs: a point forecast, a range (uncertainty), and exceptions (items with high risk).

  • Forecast: estimate future values given historical patterns and known drivers.
  • Decision: the operational action that depends on the estimate.
  • Constraint: capacity, lead time, budget, policy, or SLA that shapes the acceptable error.

Common mistake: jumping straight to “Which model is best?” without specifying the decision. In interviews and on the job, you will stand out by starting with: Who uses this forecast? When do they need it? What happens if we’re wrong by 10%? Those answers determine the horizon, cadence, and metric before you build anything.

Section 1.2: Horizon, cadence, granularity, and aggregation traps

Excel forecasting often hides key design choices behind a chart. AI-style forecasting makes those choices explicit: horizon (how far ahead), cadence (how often you update), and granularity (the unit you predict—SKU, store, region). These choices determine the shape of your dataset and the difficulty of the task. Predicting tomorrow’s total website visits is not the same as predicting next quarter’s visits by channel.

Start by writing the problem as a sentence: “Every Monday at 9 a.m., forecast daily demand for the next 14 days per store-SKU.” This one sentence forces alignment: the horizon is 14 days, cadence weekly, granularity store-SKU, and the time unit is daily. It also reveals business constraints such as lead time (if replenishment takes 10 days, a 14-day horizon is sensible; a 2-day horizon is too late to act).

  • Aggregation trap #1: averaging away variability. Forecasting at a weekly level may look accurate, but daily operations can still fail (stockouts on weekends, staffing spikes).
  • Aggregation trap #2: mixing calendars. Summing across regions with different holidays or promo schedules can create “false seasonality.”
  • Aggregation trap #3: hierarchy mismatch. Finance wants a total; operations need item-level detail. Decide whether you forecast bottom-up, top-down, or reconcile both.

A practical guideline: pick the smallest granularity that the decision truly requires, then verify data quality at that level. If promotions are set per store, forecasting only at the national level will miss the very signal the business controls. Conversely, if data is extremely sparse at SKU-store-day, you may need aggregation or methods designed for intermittent demand. The key AI habit: treat these as engineering decisions, not afterthoughts.

Section 1.3: Translating Excel tools to AI equivalents (tables to dataframes)

If you can build a clean forecasting worksheet, you can build a clean forecasting dataset. The difference is that AI workflows expect tidy data: each row is one observation, each column is a variable, and timestamps are first-class. In Excel, the same analysis might be spread across multiple tabs, pivot tables, and manual fill-down formulas. In AI, you want one canonical table (often called a “fact table”) plus optional lookup tables (dimensions) for product, store, or calendar attributes.

Map what you already do in Excel to repeatable operations:

  • Filters / Sort → dataframe filtering and ordering by timestamp.
  • Pivot tables → group-by aggregations (e.g., sum sales by week and store).
  • VLOOKUP / XLOOKUP → joins/merges to attach attributes (category, region, promo flags).
  • Fill down → computed columns like lags and rolling means.
  • Manual “fixes” → explicit rules for missingness and outliers, documented and rerunnable.

Set up a simple starter dataset for this chapter’s milestone: a single series such as daily sales for one store (columns: date, sales). Make timestamps unambiguous (ISO format, one timezone), and ensure there are no duplicate dates. Then address the classic time-series hazards: missing days (gaps), outliers (returns, one-time events), and leakage (using information from the future). Leakage is especially common when Excel users compute a “rolling average” that accidentally includes today’s value or future values due to misaligned ranges.

Engineering judgment matters: don’t “clean” away real demand spikes if they correspond to known events (a promotion) that will recur. Instead, label them with features later. The mindset shift is: data preparation is part of the model, so it must be consistent, testable, and explainable.
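The Excel-to-dataframe mapping above can be sketched in pandas. The sales figures and promo lookup table here are hypothetical placeholders for your own worksheet; note how `shift(1)` keeps today's value out of the rolling window, avoiding the leakage described above:

```python
import pandas as pd

# Hypothetical daily sales for one store (date, sales) -- the chapter's starter dataset.
df = pd.DataFrame({
    "date": pd.date_range("2026-01-01", periods=10, freq="D"),
    "sales": [12, 15, 14, 20, 18, 25, 30, 22, 17, 16],
})

# Filters / Sort -> ordering by timestamp
df = df.sort_values("date").reset_index(drop=True)

# VLOOKUP / XLOOKUP -> merge against a lookup (dimension) table of promo flags
promos = pd.DataFrame({
    "date": pd.to_datetime(["2026-01-06", "2026-01-07"]),
    "promo": [1, 1],
})
df = df.merge(promos, on="date", how="left").fillna({"promo": 0})

# Pivot table -> group-by aggregation (e.g., weekly totals)
weekly = df.groupby(df["date"].dt.isocalendar().week)["sales"].sum()

# Fill down -> lags and rolling means. shift(1) before rolling keeps
# today's value out of the window, so the feature is leak-free.
df["lag_1"] = df["sales"].shift(1)
df["rolling_mean_7"] = df["sales"].shift(1).rolling(7, min_periods=1).mean()
```

The same five operations (filter, join, aggregate, lag, roll) cover most of what a forecasting worksheet does, but now each step is explicit and rerunnable.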

Section 1.4: The forecasting lifecycle: data → model → evaluation → deploy

An Excel workflow often ends at “produce a chart.” An AI forecasting workflow ends at “produce a forecast that can run again next cycle.” That means you need a lifecycle: data → model → evaluation → deploy, with feedback at each step. Even if you never deploy to production, you should practice as if you will—because stakeholders will ask for the same forecast again, with updated data, under time pressure.

Data. Define the dataset boundary: what tables feed it, what time range is included, and what the prediction target is. Confirm that features used at forecast time are actually known at forecast time. For example, “units sold” is not known for tomorrow; “promotion planned” might be known, but “promo effectiveness” is not.

Model. Start with baselines, then move to regression or tree-based models later in the course. The model is only one component; the bigger win is a stable pipeline that can generate training data with lags, rolling stats, and calendar effects without manual edits.

Evaluation. Use time-aware validation, not random splits. A simple method is a holdout window: train on the past, test on the most recent period. For more reliability, use rolling-origin backtesting (multiple train/test splits moving forward in time). This prevents the common mistake of overestimating performance because the model has “seen” the future pattern.
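A rolling-origin backtest can be sketched in a few lines. The synthetic series and the naive last-value forecast below are placeholders for your own data and model; the structure (expanding train window, fixed test window, moving forward) is what matters:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series; in practice, load your (date, sales) table.
rng = np.random.default_rng(0)
s = pd.Series(100 + rng.normal(0, 5, 120))

# Rolling-origin backtest: several expanding train windows, each followed
# by a fixed-length test window, moving forward in time.
horizon = 14
fold_maes = []
for cutoff in range(60, len(s) - horizon + 1, horizon):
    train, test = s[:cutoff], s[cutoff:cutoff + horizon]
    forecast = np.repeat(train.iloc[-1], horizon)  # naive: repeat last observed value
    fold_maes.append(np.mean(np.abs(test.values - forecast)))

print(f"Mean MAE across {len(fold_maes)} folds: {np.mean(fold_maes):.2f}")
```

Averaging error across several forward-moving folds gives a far more honest estimate than a single lucky (or unlucky) holdout window.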

Deploy. In analyst terms, “deploy” may simply mean: a script or notebook that reruns weekly, exports a CSV, and writes a short report. The lifecycle is complete when another person can reproduce your results from raw data, without your hidden Excel steps.

Section 1.5: Metrics primer: accuracy vs cost, and why baselines win interviews

Choosing a success metric is not a technical detail—it’s the contract between the forecast and the business. Accuracy metrics summarize error, but businesses experience cost: stockouts, waste, overtime, missed revenue. A forecast can be “accurate on average” and still expensive if it under-forecasts during peaks. That’s why you should align metrics with the decision threshold defined earlier.

  • MAE (mean absolute error): easy to interpret, robust, good default for units.
  • RMSE: penalizes large misses more; useful when spikes are costly.
  • MAPE: percent error; can explode when actuals are near zero.
  • WAPE/sMAPE: often more stable than MAPE for business reporting.

Before any advanced model, build a baseline and compare fairly. Baselines are not “toy models”; they are the standard you must beat. In interviews, strong candidates always start with a naive baseline because it proves they understand evaluation and leakage. Common baselines include:

  • Naive: tomorrow equals today (or next period equals last period).
  • Seasonal naive: next Monday equals last Monday (captures weekly seasonality).
  • Moving average: forecast equals the mean of the last k periods (smooths noise).

Milestone project (this chapter): build and document a naive baseline. Using your simple dataset (date, sales), create predictions for a test window (e.g., last 28 days) where pred[t] = sales[t-1]. Compute MAE and WAPE on that window. Then write a short “model card” paragraph: what the baseline is, why it’s reasonable, what it fails to capture (trend, weekly seasonality), and when it might still be the best choice (very short horizons, stable processes). This one-page artifact becomes your reference point for every model you build next.
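The milestone can be sketched as follows; the synthetic (date, sales) data stands in for your own series, and the 28-day test window matches the example above:

```python
import numpy as np
import pandas as pd

# Hypothetical daily (date, sales) data with weekly seasonality; replace with your own.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "date": pd.date_range("2025-10-01", periods=120, freq="D"),
    "sales": (100 + 10 * np.sin(np.arange(120) * 2 * np.pi / 7)
              + rng.normal(0, 5, 120)).round(),
})

# Naive baseline: pred[t] = sales[t-1]
df["pred_naive"] = df["sales"].shift(1)

# Evaluate on the last 28 days only (the holdout window)
test = df.tail(28)
err = test["sales"] - test["pred_naive"]
mae = err.abs().mean()
wape = err.abs().sum() / test["sales"].abs().sum()

print(f"Naive baseline over last 28 days -- MAE: {mae:.2f}, WAPE: {wape:.1%}")
```

Record the MAE and WAPE in your model card; every model you build later must beat these two numbers on the same window to earn its complexity.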

Section 1.6: Reproducibility: files, naming, and versioning for analysts

Excel is powerful, but it encourages “silent changes”: a cell edit here, a pasted range there, and suddenly last week’s results can’t be recreated. Reproducibility is the AI analyst’s advantage. It doesn’t require heavy infrastructure—just consistent project structure, naming, and versioning. The goal is simple: if someone opens your folder, they can understand what happened and rerun it.

Use a lightweight structure like:

  • /data/raw: immutable extracts (never edit manually).
  • /data/processed: cleaned, gap-filled, feature-ready datasets.
  • /notebooks or /src: analysis code (one notebook for baseline is fine).
  • /reports: charts, tables, and a short narrative of assumptions.
  • /models: saved forecasts or serialized models (even CSV outputs count).

Name files with dates and intent: sales_daily_store_01_raw_2026-03-01.csv, baseline_naive_test_last28days.csv. Keep a simple changelog in a README: what data range you used, how you handled missing dates, and which metric you reported. If you use Git, commit when the baseline is working and again when you add evaluation; if you don’t, at least duplicate your “processed” dataset with versioned filenames so you can backtrack.
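One lightweight way to avoid silent overwrites is a small helper that stamps output filenames with a date. The naming convention below is just one possible scheme, not a standard; the function and its arguments are hypothetical:

```python
from datetime import date
from pathlib import Path

# Hypothetical helper: build a versioned filename so processed datasets
# are never silently overwritten. Adapt the convention to your team's.
def versioned_path(folder: str, name: str, intent: str) -> Path:
    stamp = date.today().isoformat()  # e.g., 2026-03-01
    return Path(folder) / f"{name}_{intent}_{stamp}.csv"

p = versioned_path("data/processed", "sales_daily_store_01", "clean")
# p points at something like data/processed/sales_daily_store_01_clean_<today>.csv
```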

Finally, document assumptions that Excel usually leaves implicit: the forecast horizon, the refresh cadence, the time zone, and the rule for missing days (drop, impute zero, forward-fill). These details prevent accidental leakage and ensure that when you later move to feature engineering (lags, rolling stats, calendar effects, promotions), you’re building on a stable foundation rather than a one-off spreadsheet.

Chapter milestones
  • Define the business question, forecast horizon, and decision threshold
  • Map Excel forecasting workflows to an AI pipeline you can repeat
  • Set up a simple dataset and a reproducible project structure
  • Choose success metrics and a baseline before modeling
  • Milestone project: build and document your first naive baseline
Chapter quiz

1. What is the key upgrade Chapter 1 recommends over typical Excel-based forecasting practices?

Correct answer: Make the workflow reproducible by defining the decision first, creating a clean dataset, choosing a baseline and metric, and documenting the process
The chapter emphasizes upgrading familiar Excel habits into a repeatable, documented workflow with clear metrics and baselines before complex modeling.

2. Why does the chapter argue that many spreadsheet forecasts are already a form of “modeling”?

Correct answer: Because they use structured steps like trendlines and FORECAST functions, even if the process is informal and hard to reproduce
Excel tools create implicit models, but they are often informal, not reproducible, and evaluated inconsistently.

3. When framing a forecasting problem, which set of elements does the chapter highlight as essential?

Correct answer: Horizon, cadence, and granularity
The chapter states you should frame the problem using horizon, cadence, and granularity.

4. According to the chapter’s workflow, what should you do before touching a complex model?

Correct answer: Choose a success metric and build a baseline forecast
The chapter stresses selecting a baseline and success metric first to evaluate improvements consistently.

5. What is the milestone project you should be able to complete by the end of Chapter 1?

Correct answer: Build and document your first naïve baseline forecast
The milestone explicitly focuses on building and documenting a naïve baseline forecast as the first repeatable benchmark.

Chapter 2: Time-Series Data Prep Without Leaks

In Excel, forecasting often starts with a tidy table: dates in column A, values in column B, and maybe a couple of helper columns for month or weekday. In AI forecasting work, the same instincts apply—clean dates, sensible aggregation, and “don’t cheat”—but the stakes are higher because automated pipelines will happily scale your mistakes. This chapter turns your spreadsheet habits into a repeatable, leak-free workflow for time-series prep.

The goal is a modeling table where each row represents one time step for one entity (store, SKU, region, customer segment), with a timestamp, a target you want to forecast, and only features that would have been known at prediction time. You will: (1) clean timestamps and align calendars, (2) handle missing data with business logic (including stockouts), (3) flag anomalies and special events, (4) split data using backtesting rather than random sampling, and (5) build a reusable data quality checklist you can carry through all 10 scenarios in this course.

The milestone project at the end of this chapter is simple but powerful: produce a leak-free modeling table ready for baselines and feature engineering in Chapter 3+. If you can do that reliably, you will outperform many “model-first” attempts that collapse under real-world evaluation.

Practice note for Clean timestamps, align calendars, and standardize granularity: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Handle missing data, stockouts, and anomalies with business logic: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Create train/validation/test splits with backtesting: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Build a data quality checklist you can reuse across scenarios: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Milestone project: produce a leak-free modeling table: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.


Section 2.1: Time index fundamentals: time zones, week definitions, fiscal calendars

Your time index is the spine of the dataset. If it is bent—even slightly—everything built on top (lags, rolling means, seasonality, backtests) will be wrong in subtle ways. Start by defining a canonical timestamp column and a canonical granularity: hourly, daily, weekly, or monthly. Make this decision based on the business: replenishment cycles, reporting cadence, and forecast horizon. A daily forecast for perishable inventory makes sense; a monthly forecast for long-cycle B2B revenue often does not need daily noise.

Time zones are a common source of silent misalignment. Transaction systems may log in UTC while business reporting uses local time. Decide which “business time” you forecast in, then convert everything to that timezone before aggregating. For example, if a store closes at 10 p.m. local time, sales at 1 a.m. UTC might belong to the prior local day. If you skip this step, you create artificial dips/spikes at day boundaries that models learn as fake patterns.

Weeks are another trap. “Week” might mean Monday–Sunday (ISO) or Sunday–Saturday (retail). It might also mean “fiscal week,” where weeks roll into a 4-4-5 calendar. You must encode and aggregate using the same definition your stakeholders use to judge success. If finance reports weekly sales on a 4-4-5 calendar, then your weekly granularity must follow that calendar; otherwise backtests will compare different week buckets and your error metrics won’t match the business dashboard.

  • Practical workflow: parse raw timestamps → convert to business timezone → create a normalized period key (date, week_id, fiscal_period) → aggregate target and inputs to that key.
  • Common mistake: mixing transaction-time aggregation with reporting-time evaluation (e.g., forecasting “daily” using UTC days but evaluated on local days).

By the end of this section, you should have a complete calendar table for your chosen granularity that includes every expected time step (even if no data exists yet), plus columns like holiday flags, fiscal periods, and week-of-year computed consistently.
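The practical workflow above might look like this in pandas, assuming UTC-stamped transactions and a US Eastern “business time” (both assumptions for illustration):

```python
import pandas as pd

# Hypothetical UTC-stamped transactions; note that 1 a.m. UTC on Jan 2
# still belongs to Jan 1 for a store operating in US Eastern time.
tx = pd.DataFrame({
    "ts_utc": pd.to_datetime(["2026-01-01 18:30", "2026-01-02 01:00",
                              "2026-01-02 15:00"], utc=True),
    "units": [3, 1, 5],
})

# 1) Convert to business timezone, then derive the local business day
tx["ts_local"] = tx["ts_utc"].dt.tz_convert("America/New_York")
tx["day"] = tx["ts_local"].dt.date

# 2) Aggregate to the canonical daily grain
daily = tx.groupby("day")["units"].sum()

# 3) Build a complete calendar so missing days are explicit, not absent
calendar = pd.date_range("2026-01-01", "2026-01-05", freq="D").date
daily = daily.reindex(calendar)  # days with no rows become NaN, visibly
```

The 1 a.m. UTC transaction lands on January 1 locally; without the conversion it would inflate January 2 and create exactly the fake day-boundary pattern described above.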

Section 2.2: Missingness patterns: gaps, late arrivals, and imputation choices

Missing data in time series is rarely random. It usually signals a process: the store was closed, the sensor failed, the product was out of stock, or the data arrived late. Treating all missingness as “fill with zero” (a common Excel instinct) can poison a model because it changes the meaning of the target.

Start by categorizing missingness into three buckets. Gaps are missing time steps in the calendar (no row at all). Fix this by reindexing to the complete calendar and making missing explicit. Nulls are rows present but target/value missing; these often indicate late-arriving data or extraction issues. Structural zeros are true zeros (e.g., store closed), which should be represented as zero but ideally accompanied by a “closed” flag so the model learns the right reason.
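A minimal pandas sketch of these three buckets, using a toy series and a placeholder closure calendar (the column names and closure dates are illustrative assumptions):

```python
import pandas as pd

# Toy daily series with a gap (no row for Jan 3) and a null on Jan 5.
raw = pd.DataFrame(
    {"date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05"]),
     "sales": [10.0, 12.0, 11.0, None]}
).set_index("date")

full_index = pd.date_range("2024-01-01", "2024-01-05", freq="D")
grid = raw.reindex(full_index)                  # the gap becomes an explicit NaN row
grid["was_gap"] = ~grid.index.isin(raw.index)   # missing time step entirely
grid["is_null"] = grid.index.isin(raw.index) & grid["sales"].isna()  # row existed, value missing

# Placeholder closure calendar: true zeros get a flag, not silent imputation.
closed_days = {pd.Timestamp("2024-01-03")}
grid["is_closed"] = grid.index.isin(closed_days)
grid.loc[grid["is_closed"], "sales"] = 0.0      # structural zero, explained by the flag
```

The key move is `reindex` to the complete calendar first: it turns invisible gaps into visible NaNs that you can then classify deliberately.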

Stockouts deserve special handling. If demand exists but inventory is zero, observed sales become “censored demand,” not true demand. Filling stockout days with zero teaches the model that demand vanishes during stockouts, which is backward. Prefer a business-rule approach: keep observed sales as-is, but add an is_stockout flag and, if available, include inventory-on-hand as a feature. For some scenarios, you may exclude stockout periods from training when the objective is to forecast unconstrained demand.

  • Imputation choices: forward-fill is reasonable for slowly changing covariates (price lists, store attributes) but usually wrong for targets. Interpolation can smooth over real volatility; use it only when the target is a continuous physical measure and you have domain justification.
  • Late arrivals: if you train on data that will be revised later (e.g., returns, claim adjustments), freeze a “data as of” snapshot per training cut-off to avoid accidental future corrections leaking into the past.

The practical outcome is a dataset with a complete time grid, explicit missingness indicators, and documented rules for each variable: when you fill, when you flag, and when you drop.

Section 2.3: Outliers and special events: flags vs smoothing

Outliers in forecasting are not just “bad points.” They can be data errors, but they can also be the most important business signals: promotions, weather shocks, system migrations, or one-time contracts. The mistake many analysts make (often inherited from chart-cleaning in Excel) is to smooth first and ask questions later.

Use a two-pass approach. First, detect candidate anomalies with simple, explainable rules: z-scores on log-transformed targets, robust deviations from a rolling median, or percent changes beyond plausible thresholds. Second, classify them using business context. Ask: was there a promotion? a holiday? a store closure? a known outage? If you can explain the spike, you usually should not remove it—you should encode it.

This leads to the key judgment: flagging vs smoothing. Flagging keeps the raw value but adds features like is_promo, holiday_name, price_drop_pct, system_outage. Smoothing replaces the value (winsorization/capping, median replacement) and is appropriate mainly when you are confident it is measurement error. A practical guideline: if the business could have known the cause in advance (planned promotion, holiday calendar), flag it; if it is an unrepeatable glitch (double-counting bug), correct it.
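The two-pass approach can be sketched as follows. The 50% threshold, the 5-day window, and the event ledger contents are illustrative assumptions, not universal constants:

```python
import pandas as pd

# Pass 1: flag candidates that deviate strongly from a trailing median.
s = pd.Series([100, 102, 99, 101, 400, 98, 103, 100, 97, 102], dtype=float)
trailing_median = s.shift(1).rolling(5, min_periods=3).median()
deviation = (s - trailing_median).abs() / trailing_median
candidate = deviation > 0.5                 # >50% off the recent median

# Pass 2: classify with business context before deciding to flag vs smooth.
# Here we pretend index 4 was a known promotion (hypothetical event ledger).
event_ledger = {4: "promo"}
is_promo = s.index.isin(list(event_ledger))
keep_raw = candidate & is_promo             # explainable spike: flag it, don't remove it
```

Note the `shift(1)` before the rolling median: the detector itself must not use the point it is judging, or large spikes would mask themselves.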

  • Common mistake: removing all holiday spikes as “outliers,” then complaining the model under-forecasts holidays.
  • Another mistake: capping targets using global thresholds that distort small entities more than large ones; prefer per-entity robust rules.

End this step with an “event ledger”: a small table of special dates (and entity-specific events) that will later become features. This is one of the highest-ROI artifacts you can build because it transfers across scenarios and model types.

Section 2.4: Leakage patterns (future info) and how to detect them

Leakage is any use of information at training time that would not be available when making a real forecast. In time series, leakage is especially easy to introduce because many features are “derived” and because Excel-style workflows often compute aggregates over the full dataset.

Watch for four common leakage patterns:

  • Target leakage via rolling features: a 7-day moving average that includes “today” (or, worse, future days) when predicting today+H. Ensure all lags/rolls are computed with a strict shift so they only use prior timestamps relative to the forecast origin.
  • Global normalization: scaling using the full-series mean/standard deviation, which uses future values. Fit scalers on training windows only.
  • Post-period attributes: variables like “end-of-month inventory,” “final billed revenue,” or “delivered quantity” that are only known after the period closes. Replace them with “as-of” versions if available.
  • Backfilled corrections: revised historical data that wasn’t known at the time; if your evaluation is meant to mimic real operations, train on snapshots as they existed then.
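The strict-shift discipline for lag and rolling features looks like this in pandas (column names are illustrative; with a single toy entity here, though the grouped form generalizes):

```python
import pandas as pd

# Leak-free features: every lag/roll is shifted so it uses only data strictly
# before the row being predicted.
df = pd.DataFrame({
    "entity": ["A"] * 6,
    "y": [10.0, 12.0, 11.0, 13.0, 12.0, 14.0],
})
df["lag_1"] = df.groupby("entity")["y"].shift(1)   # yesterday, never today
df["roll_mean_3"] = (
    df.groupby("entity")["y"]
      .transform(lambda s: s.shift(1).rolling(3).mean())  # trailing 3, excluding today
)
# Leaky alternatives to avoid: rolling(3).mean() without the shift (includes
# "today"), or normalizing y by its full-series mean (uses the future).
```

The `groupby` keeps lags from bleeding across entities, and `transform` keeps the result aligned with the original rows.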

Detection is partly technical, partly investigative. Technically, run a “time-travel test”: pick a cutoff date, rebuild features using only data up to the cutoff, and compare to your existing features. If values differ, you have leakage or late-arriving updates. Investigatively, interview data owners: ask which fields are finalized later, which are estimates, and what the reporting lag is.

  • Practical rule: every feature should have a clear “availability time.” Document it. If you cannot explain when it is known, do not use it.
  • Excel-to-AI translation: avoid formulas like AVERAGE($B$2:$B$1000) over full ranges; in code, avoid groupby transforms that don’t respect time ordering.

If you take nothing else from this chapter, take this: leakage produces impressive validation metrics and disastrous live performance. Build your pipeline to make leakage hard by construction.

Section 2.5: Backtesting and rolling-origin splits

Random train/test splits are inappropriate for forecasting because they let the model learn from the future to predict the past. Instead, you evaluate using rolling-origin backtesting (also called walk-forward validation). This mirrors how the forecast will be used: train on history up to a point, predict the next horizon, then roll forward.

Define three elements clearly: horizon (how far ahead you predict, e.g., 7 days), granularity (daily/weekly), and lookback (how much history you use for features and model training). Then design splits. For example, for daily demand with a 14-day horizon: create folds where each fold trains through day T, validates on days T+1…T+14, then advances T by 7 or 14 days. This produces multiple validation segments across seasons and promotions, giving you a fair comparison among baselines and ML models.
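A split specification like the one above can be generated mechanically. This is a minimal sketch; the defaults (60-day minimum training window, 14-day horizon, 7-day stride) are illustrative:

```python
# Rolling-origin split specification: train through day T, validate on days
# T+1..T+horizon, then advance T by the stride.
def rolling_origin_folds(n_days, horizon=14, stride=7, min_train=60):
    """Return (train_end, valid_start, valid_end) tuples as 0-based day indices."""
    folds = []
    t = min_train - 1                    # last training day of the first fold
    while t + horizon < n_days:
        folds.append((t, t + 1, t + horizon))
        t += stride
    return folds

folds = rolling_origin_folds(n_days=120)
# First fold trains through day 59 and validates on days 60..73.
```

Emitting the folds as explicit tuples makes the split specification itself an artifact you can document and reuse.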

Keep a final test window untouched until the end—this is your closest proxy to future performance. If your data has strong seasonality, ensure the test window includes representative seasonal periods (or explicitly accept that you are testing a narrow regime).

  • Common mistake: tuning hyperparameters on the test set because it’s “the latest data.” Keep test sacred; tune on backtest folds.
  • Operational detail: when forecasting multiple entities, ensure each fold respects per-entity start dates; do not create folds that require lags before an entity exists.

The output of this step is a split specification you can reuse: fold cutoffs, horizon, stride, and the exact rules used to build features for each fold. This specification becomes part of your documentation and your QA checklist.

Section 2.6: Data QA: sanity checks, unit tests, and documentation

Forecasting projects fail more often from data issues than from model choice. The remedy is a lightweight, reusable QA checklist that runs every time you rebuild the modeling table. Think of it as replacing ad-hoc Excel spot-checks with systematic tests.

Start with sanity checks: confirm there is exactly one row per (entity, timestamp) at the chosen granularity; confirm timestamps are monotonic within entity; confirm the calendar is complete (no unexpected gaps) or, if gaps exist, they are flagged and explained. Check target ranges and units: negative sales might be valid if returns are included—document it—or might indicate a join error. Compare aggregates to known reports (weekly totals vs finance dashboard) to catch misaligned week definitions and timezone shifts early.

Then add unit-test style assertions that prevent leakage and feature bugs: lags must be null for the first k periods; rolling features should be null until enough history exists; features should not change when you rebuild them using a historical cutoff (“time-travel invariance”); and no feature should have a correlation pattern that screams leakage (e.g., a feature equal to the target shifted negatively).
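These checks translate directly into assertions you run on every rebuild of the modeling table. A minimal sketch (column names `entity`, `date`, `y`, `lag_7` are illustrative):

```python
import pandas as pd

# Lightweight QA assertions for a modeling table; run them on every rebuild.
df = pd.DataFrame({
    "entity": ["A"] * 10,
    "date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "y": range(10),
})
df["lag_7"] = df.groupby("entity")["y"].shift(7)

# Exactly one row per (entity, date) at the chosen granularity.
assert not df.duplicated(subset=["entity", "date"]).any()
# Timestamps monotonic within each entity.
assert df.groupby("entity")["date"].apply(lambda s: s.is_monotonic_increasing).all()
# Lags must be null for the first k periods of each entity (else: leakage bug).
assert df.groupby("entity")["lag_7"].apply(lambda s: s.head(7).isna().all()).all()
```

In a real pipeline you would wrap these in a test runner, but even bare asserts catch join errors and leakage bugs the moment they appear.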

  • Documentation artifacts: a data dictionary listing each column, definition, source, and availability time; a log of imputation rules; an event ledger for anomalies/promotions; and a split specification for backtesting.
  • Milestone project deliverable: a leak-free modeling table with: canonical time index, complete calendar, explicit missingness flags, anomaly/event flags, and fold assignments for rolling-origin evaluation.

When you carry this QA package into the remaining scenarios, you will move faster and with more confidence. You will also earn trust: stakeholders forgive model error more readily than they forgive inconsistent data definitions. This chapter’s work is how you avoid both.

Chapter milestones
  • Clean timestamps, align calendars, and standardize granularity
  • Handle missing data, stockouts, and anomalies with business logic
  • Create train/validation/test splits with backtesting
  • Build a data quality checklist you can reuse across scenarios
  • Milestone project: produce a leak-free modeling table
Chapter quiz

1. Which modeling-table design best supports leak-free time-series forecasting in this chapter?

Show answer
Correct answer: One row per time step per entity, including a timestamp, the target, and only features known at prediction time
The chapter’s goal is a table indexed by time step and entity, with features restricted to what would have been known when forecasting.

2. Why does the chapter emphasize cleaning timestamps, aligning calendars, and standardizing granularity before modeling?

Show answer
Correct answer: Because inconsistent time definitions can create misaligned rows and hidden leakage when pipelines scale
A consistent time index and granularity prevent subtle errors that automated pipelines can amplify, including accidental leakage.

3. When you encounter missing data in a sales time series, what approach does the chapter recommend?

Show answer
Correct answer: Apply business logic (including stockouts) to determine how to represent missingness and what it means
Missingness can mean different things (e.g., stockout vs. no demand), so handling should follow business logic rather than a single default rule.

4. Which split strategy aligns with the chapter’s guidance for evaluating time-series forecasts?

Show answer
Correct answer: Use backtesting with train/validation/test splits that respect time order
The chapter explicitly recommends backtesting over random sampling to avoid training on the future and to reflect real deployment.

5. What is the milestone project outcome for Chapter 2?

Show answer
Correct answer: A leak-free modeling table ready for baselines and feature engineering in later chapters
The milestone is to produce a leak-free modeling table that can be used for baseline models and feature engineering in Chapter 3+.

Chapter 3: Baselines That Beat Overfitting

In Excel, forecasting often starts with “copy last week” or “same week last year,” then you tweak numbers until the plot looks reasonable. That instinct is not naive; it’s the foundation of professional forecasting. In AI workflows, those quick heuristics become formal baselines: simple models that are cheap to run, easy to explain, and surprisingly hard to beat when data is noisy or sparse.

This chapter turns your Excel habits into a repeatable baseline workflow. You’ll implement naive, seasonal naive, and moving-average forecasts, then add exponential smoothing as a stronger “still simple” competitor. The key shift is that you won’t judge models by a single holdout month or by visual comfort alone. Instead, you’ll run backtests across multiple windows, summarize results in a scorecard, and pick a champion baseline to beat in later chapters.

By the end, you’ll produce a baseline leaderboard across multiple series (for example, one per product/store), using consistent horizons, fair comparisons, and analysis that explains not only which baseline won—but where it failed. That champion baseline becomes your guardrail: if a fancy model can’t beat it reliably, it’s not ready for production.

  • Practical outcome: a baseline “starter kit” you can reuse for any forecasting scenario.
  • Practical outcome: a scorecard and plots that support decision-making, not just model-building.

Keep one rule in mind throughout: baselines are not training wheels. They are benchmarks. A baseline that is fast, stable, and honest about uncertainty can outperform overfit models that look great on a single slice of history.

Practice note for Implement naive, seasonal naive, and moving-average baselines: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Add exponential smoothing and understand when it works: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Run backtests and summarize results in a model scorecard: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Select a champion baseline to beat in later chapters: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Practice note for Milestone project: baseline leaderboard across multiple series: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.


Sections in this chapter
Section 3.1: Why baselines matter: speed, trust, and benchmarking

Baselines matter because forecasting is as much an operational discipline as it is a modeling exercise. In business settings, you need answers quickly, you need to explain them, and you need to know whether a more complex approach is genuinely adding value. Baselines deliver all three: speed (minutes, not days), trust (they match intuitive patterns), and benchmarking (they define the “minimum acceptable” performance).

In an AI workflow, baselines protect you from overfitting in two ways. First, they set a bar: if your machine learning model doesn’t beat a seasonal naive forecast across backtests, it’s likely capturing noise or leaking information. Second, they prevent “evaluation drift,” where teams accidentally change splits, horizons, or metrics midstream and convince themselves they improved.

  • Speed: You can compute baselines for hundreds of series quickly, making them ideal for early data validation.
  • Trust: Stakeholders understand “same as last week” or “same as last year,” so baseline comparisons are persuasive.
  • Benchmarking: Baselines define the champion to beat in later chapters when you add features and more powerful models.

Common mistake: choosing a baseline that quietly uses future information. For example, using a centered moving average (which averages future points) or performing seasonal decomposition on the entire dataset before splitting will inflate baseline performance. In this chapter, you’ll treat baselines with the same leakage discipline as any ML model: fit only on the past available at each forecast origin.

Engineering judgment: invest in baselines early when the problem is underspecified (unknown seasonality, inconsistent calendars, shifting promotions). Baselines make those issues visible and help you decide whether you need better data, a better horizon definition, or a segmentation strategy before you need a more complex algorithm.

Section 3.2: Naive and seasonal naive forecasting patterns

The naive forecast is the simplest translation of an Excel habit: tomorrow equals today. Formally, for horizon h, the forecast at time t is ŷ(t+h) = y(t). This baseline is surprisingly effective when the series behaves like a random walk (high noise, little structure) or when you forecast short horizons.

Seasonal naive is the grown-up version for periodic patterns: next Monday equals last Monday; next December equals last December. If the seasonal period is s (e.g., 7 for daily with weekly seasonality, 12 for monthly with yearly seasonality), then ŷ(t+h) = y(t+h-s). This baseline often becomes the “must-beat” benchmark for retail, web traffic, staffing, and demand series with strong weekly/annual cycles.
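Both baselines fit in a few lines. This sketch assumes a gap-free history list and, for the seasonal version, repeats the last full season forward:

```python
# Naive and seasonal naive baselines, assuming a gap-free history list.
def naive_forecast(y, horizon):
    return [y[-1]] * horizon                      # yhat(t+h) = y(t)

def seasonal_naive_forecast(y, horizon, season=7):
    # yhat(t+h) = y(t+h-season): repeat the last full season forward.
    return [y[-season + (h % season)] for h in range(horizon)]

history = [100, 120, 130, 110, 90, 80, 95,        # week 1 (Mon..Sun)
           102, 118, 128, 112, 88, 82, 97]        # week 2
print(naive_forecast(history, 3))                 # [97, 97, 97]
print(seasonal_naive_forecast(history, 3))        # [102, 118, 128]
```

Note that the seasonal version needs at least one full season of history; production code should fall back to naive when it doesn’t have it.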

  • When naive works: stable level, short horizon, limited seasonality.
  • When seasonal naive works: repeating cycles, consistent calendar effects, weak trend.
  • When both fail: rapid growth/decline, regime changes, promotion-driven spikes, missing season history.

Implementation details that matter in practice: (1) align timestamps and frequency—seasonal naive only makes sense when time steps are consistent; (2) handle missing values explicitly—if last week’s observation is missing, decide whether to impute, skip, or fall back to naive; (3) define horizon and granularity before coding—forecasting “next 4 weeks” on daily data is different from “next month” on monthly aggregates.

Common mistake: setting season length incorrectly (e.g., using 12 for monthly data that has quarterly seasonality, or using 7 for daily data when the real driver is business days). Start with a domain-informed guess, then verify by plotting: overlay weeks, overlay years, and check whether “same period last cycle” is plausibly predictive.

Section 3.3: Moving averages and seasonal decomposition intuition

Moving averages smooth noise by averaging recent observations. In Excel, you might add a 7-day moving average line to “see the trend.” As a forecasting baseline, a simple moving average predicts the future as the average of the last k points: ŷ(t+h) = mean(y(t-k+1)…y(t)). This can outperform naive when single-point observations are noisy, but it can lag behind turning points because it reacts slowly to change.

Two practical variants are worth keeping in your toolkit. First, a rolling mean baseline (same average for all horizons) is stable and robust. Second, a rolling median can be better when outliers (stockouts, one-time spikes) distort averages. If your business data has promotions or incident-driven spikes, the median baseline may be a surprisingly strong contender.

  • Choose k with intent: k=7 for weekly smoothing on daily data, k=28 for “month-like” smoothing, k=3 for quarterly smoothing on monthly data.
  • Use trailing windows only: avoid centered moving averages, which leak future values into the present.
  • Fallback logic: if fewer than k historical points exist, shrink k or revert to naive.
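The bullets above can be sketched as trailing-window baselines with fallback logic. The spike value in the toy data is illustrative:

```python
import statistics

# Trailing-window baselines with fallback logic when history is short.
def moving_average_forecast(y, k=7):
    window = y[-k:] if len(y) >= k else y   # shrink k rather than fail
    return statistics.mean(window) if window else None

def moving_median_forecast(y, k=7):
    # More robust when spikes (promos, stockouts) distort the mean.
    window = y[-k:] if len(y) >= k else y
    return statistics.median(window) if window else None

y = [10, 11, 9, 10, 50, 10, 11]             # one promo-like spike
print(moving_average_forecast(y))            # ≈ 15.86, dragged up by the spike
print(moving_median_forecast(y))             # 10, ignores the spike
```

Because both use only trailing values, neither can leak the future the way a centered moving average does.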

Moving averages also connect to seasonal decomposition intuition: many series look like level + seasonality + noise. A rolling average approximates the level component, while seasonal naive approximates the seasonality component. You don’t need full decomposition to benefit from the intuition: if your rolling average beats seasonal naive, the series may be dominated by noisy fluctuations around a stable level; if seasonal naive wins, calendar structure is likely the primary signal.

Common mistake: comparing a moving average baseline to a naive forecast without ensuring identical backtest windows. Because moving averages need k points of history, they often “start later,” which can unintentionally exclude difficult early periods. For fair comparison, restrict evaluation to timestamps where all baselines can produce forecasts.

Section 3.4: Exponential smoothing: level, trend, seasonality

Exponential smoothing is the next step up: still interpretable, still fast, but more adaptive than fixed-window averages. Where a moving average weights the last k points equally, exponential smoothing weights recent observations more heavily and decays weights smoothly into the past. This makes it responsive to changes without being as jumpy as the naive forecast.

Think of exponential smoothing as a family of models. Simple exponential smoothing focuses on level only and works well when there’s no strong trend or seasonality. Holt’s method adds a trend component, helping when the series is consistently rising or falling. Holt-Winters adds seasonality (additive or multiplicative), often becoming a strong baseline for business time series with repeating cycles.

  • Level (α): how quickly the model updates its baseline estimate.
  • Trend (β): how quickly it updates the slope over time.
  • Seasonality (γ): how quickly it updates seasonal indices.

When does it work best? Use simple smoothing when the main challenge is noise. Use Holt when you see sustained growth/decline. Use Holt-Winters when there is stable seasonality and enough history to learn it (typically multiple seasonal cycles). Practical judgment: if you only have one year of monthly data, a 12-month seasonal model may be fragile; seasonal naive might be safer.
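The level-only case is simple enough to write out directly, which makes the weighting intuition concrete. This is a sketch of the core recursion only; the alpha value is illustrative, and in practice you would tune it on backtest folds (or use a library implementation that also handles trend and seasonality):

```python
# Simple exponential smoothing (level only): the core recursion.
# alpha is illustrative; tune it on backtest folds, never on the test set.
def ses_forecast(y, alpha=0.3):
    level = y[0]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level   # recent points weigh more
    return level                                      # flat forecast for all horizons

y = [100.0, 104.0, 101.0, 98.0, 103.0]
fcst = ses_forecast(y, alpha=0.3)
```

With alpha = 1 this collapses to the naive forecast; with alpha near 0 it approaches a long-run average, which is exactly the level/noise trade-off described above.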

Common mistakes: (1) fitting smoothing parameters on the full dataset before backtesting—this leaks information; (2) using multiplicative seasonality when values hit zero (it can break or explode); (3) ignoring intermittent demand—exponential smoothing can struggle when many periods are zero. In later chapters you’ll handle those cases with alternative strategies, but here your goal is to establish whether smoothing provides a meaningful, consistent lift over simpler baselines.

Section 3.5: Error analysis: bias, drift, and segment performance

A baseline leaderboard is not complete without error analysis. Two models can have similar average error, yet one systematically under-forecasts during promotions or over-forecasts during slow seasons. Your job as an AI analyst is to move beyond “lowest MAPE wins” and identify patterns that affect business decisions.

Start with bias: compute the mean signed error (forecast minus actual). A negative bias indicates chronic under-forecasting (risking stockouts), while positive bias indicates over-forecasting (risking waste and excess inventory). Bias is often more actionable than absolute error because it points to policy changes (safety stock, staffing buffers) and model improvements (trend terms, holiday handling).

Next, look for drift: plot error over time across backtest windows. If errors worsen steadily, the process generating the data is changing—new pricing, new channels, shifting customer behavior. Drift is a signal to shorten training windows, add external drivers, or re-segment the series (for example, separate “pre-promotion” and “post-promotion” periods).

  • Segment performance: evaluate by product class, store, region, or volume tier. Baselines often behave differently for high-volume versus low-volume series.
  • Horizon performance: measure error at each step ahead (t+1, t+2, …). Some baselines are strong short-term but degrade quickly.
  • Outlier sensitivity: compare mean-based vs median-based baselines on series with spikes.

Common mistake: relying on a single metric that behaves poorly with zeros or low volumes (for example, MAPE can explode). Use a small set of complementary metrics—MAE for scale-dependent interpretability, sMAPE or WAPE for comparability, and bias for directional risk. The point is not to collect metrics; it’s to explain why a baseline wins and under what conditions it fails.
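The complementary metrics named above are short enough to define inline. The toy numbers are illustrative; note the zero actual, which would make MAPE undefined:

```python
# Complementary error metrics; each answers a different business question.
def bias(forecasts, actuals):    # signed: negative = chronic under-forecasting
    return sum(f - a for f, a in zip(forecasts, actuals)) / len(actuals)

def mae(forecasts, actuals):     # scale-dependent, easy to interpret
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)

def wape(forecasts, actuals):    # robust to zeros, comparable across series
    return (sum(abs(f - a) for f, a in zip(forecasts, actuals))
            / sum(abs(a) for a in actuals))

actuals   = [100, 0, 50, 200]    # the zero would break MAPE's division
forecasts = [ 90, 5, 60, 180]
```

Reporting bias alongside MAE/WAPE is what lets you distinguish “wrong by a lot, both ways” from “systematically low,” which call for different fixes.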

Section 3.6: Reporting: scorecards, plots, and executive-ready summaries

To make baselines useful to the business, package results as a clear report: a model scorecard, a few standard plots, and a one-paragraph executive summary. This is where you translate backtests into decisions: which baseline becomes the champion to beat, and what constraints will guide later modeling.

Run backtests using rolling-origin evaluation (also called walk-forward validation). Define a training window, a forecast horizon, and a step size. For each origin, fit (or compute) the baseline using only history up to that point, forecast the next horizon, store errors, and repeat. This avoids the “one holdout period” trap and tells you whether a baseline is consistently good or just lucky.

  • Scorecard table: rows = models (naive, seasonal naive, moving average, exponential smoothing variants); columns = MAE/WAPE/sMAPE, bias, runtime, coverage (% forecasts produced).
  • Plots: actual vs forecast for representative series; error vs time (drift); error by horizon; and a leaderboard bar chart by metric.
  • Champion selection: choose the simplest model that performs strongly and consistently across segments, not only on average.

Your milestone project in this chapter is a baseline leaderboard across multiple series. Don’t cherry-pick a single “hero” product. Instead, compute metrics per series, then summarize with medians and percentile bands (p50, p90) to reflect typical and worst-case performance. This is especially important when you have many small series where averages can be misleading.

Executive-ready summary template (adapt it): “Across 120 product-store series, seasonal naive is the current champion (median WAPE 18%, low bias). Exponential smoothing improves median WAPE by 2 points but increases over-forecasting bias in low-volume items. Errors spike during promotion weeks, suggesting we need explicit promotion features in later chapters.” This kind of statement shows judgment, not just computation—and it sets up the next modeling steps with a clear baseline to beat.

Chapter milestones
  • Implement naive, seasonal naive, and moving-average baselines
  • Add exponential smoothing and understand when it works
  • Run backtests and summarize results in a model scorecard
  • Select a champion baseline to beat in later chapters
  • Milestone project: baseline leaderboard across multiple series
Chapter quiz

1. Why does Chapter 3 treat simple heuristics like “copy last week” as formal baselines in an AI forecasting workflow?

Show answer
Correct answer: They are cheap, explainable benchmarks that are often hard to beat on noisy or sparse data
The chapter frames these heuristics as strong, practical benchmarks—especially when data is limited or noisy.

2. What is the key shift in how models are evaluated in this chapter compared with judging by a single holdout month or visual comfort?

Show answer
Correct answer: Run backtests across multiple windows and summarize results in a scorecard
The chapter emphasizes repeated backtesting and scorecards for fair, reliable comparisons.

3. In the baseline workflow, what is the purpose of selecting a “champion baseline”?

Show answer
Correct answer: To serve as a guardrail benchmark that advanced models must reliably beat before production
A champion baseline is the benchmark; if a fancier model can’t beat it consistently, it isn’t production-ready.

4. What does the chapter suggest a good baseline leaderboard across multiple series (e.g., product/store) should emphasize?

Show answer
Correct answer: Consistent horizons and fair comparisons, plus analysis showing not only what won but where it failed
The chapter highlights consistency, fairness, and diagnosing failures—not just declaring a winner.

5. Which statement best reflects the chapter’s stance on baselines versus overfit models?

Show answer
Correct answer: Baselines are benchmarks, and a fast, stable baseline can outperform models that look great on one slice of history
The chapter stresses that baselines are serious benchmarks and can beat overfit models that don’t generalize.

Chapter 4: Feature Engineering for Business Drivers

In Excel, forecasting often means “pick a trend line, add seasonality, then adjust by judgment.” In an AI workflow, you keep the same instincts—trend, seasonality, known events, and business drivers—but you express them as features (inputs) built from historical data in a way that is reproducible, testable, and leakage-safe. This chapter is about turning the drivers you already discuss in business reviews—price, promos, inventory constraints, marketing, holidays—into columns that a model can learn from, while keeping your evaluation fair.

The core mindset shift is to separate three things that Excel tends to blend: (1) what is known at forecast creation time, (2) what is unknown and must be predicted, and (3) what is “decided” by the business (promotions, pricing, budgets). Good feature engineering respects this boundary. If you accidentally let future information seep into training features, you will build a model that looks brilliant in backtests and fails immediately in production.

We will engineer time-series features (lags, rolling statistics, and calendar effects), incorporate business drivers (price, promo, inventory, marketing), and end with a driver-aware dataset ready for machine learning. You will also design a feature store template so you can repeat the same pattern across the 10 scenarios in this course without reinventing your pipeline each time.

As you read, keep asking: “Would I know this value on the day I generate the forecast for the next horizon?” If the answer is no, it must either be excluded, lagged appropriately, or replaced with a planned/forecasted version of that driver (e.g., planned promo calendar instead of realized promo lift).
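That availability question can be enforced mechanically: drivers known in advance stay as-is, while realized metrics are lagged before they become features. A minimal sketch with illustrative column names:

```python
import pandas as pd

# Separating drivers by availability time. Column names are illustrative.
df = pd.DataFrame({
    "planned_promo": [0, 1, 0, 1, 0, 0],               # known in advance: usable as-is
    "realized_lift": [0.0, 0.3, 0.0, 0.4, 0.0, 0.0],   # known only after the fact
    "y":             [100, 140, 98, 150, 101, 99],
})
# Realized metrics must be lagged so only past values feed the forecast.
df["realized_lift_lag1"] = df["realized_lift"].shift(1)
features = df[["planned_promo", "realized_lift_lag1"]]
```

Encoding the rule in the pipeline (rather than in analyst memory) is what makes the boundary between known, unknown, and decided inputs survive handoffs.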

Practice note for this chapter’s milestones (lag and rolling-window features, calendar and holiday encoding, business drivers such as price, promo, inventory, and marketing, the feature store template, and the driver-aware milestone dataset): for each one, document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.


Sections in this chapter
Section 4.1: Lags, leads (what not to do), and rolling statistics

Lag features translate the Excel habit of “look at last week/month” into explicit model inputs. For a daily sales series y, common lags are y(t-1), y(t-7), y(t-14), and y(t-28). For weekly data, y(t-1), y(t-4), and y(t-8) capture short-term momentum, y(t-13) approximates a quarterly cycle, and y(t-52) serves as the annual seasonality proxy. The key rule: lags must reference only timestamps strictly earlier than the prediction time.

Rolling-window features capture the idea of “average over the last N periods” or “recent volatility.” Examples include a 7-day rolling mean of sales, a 28-day rolling sum, a rolling standard deviation, and a rolling max/min. These help models react to regime changes (e.g., sudden demand increase) and stabilize noisy series. In practice, you should compute rolling features using a shift first (e.g., roll over y shifted by 1) so the current target value is not included in its own feature. A safe pattern is: shift by one time step, then apply rolling aggregation.

What not to do: create “lead” features (future target values) or accidentally include the current period in a rolling statistic. In spreadsheets, it’s easy to select a range that includes today; in code, it’s easy to forget the shift. Either mistake produces leakage and unrealistically strong backtest performance. Another common trap is building lags after filtering to a subset (e.g., only promo weeks), which breaks time continuity. Build lags on the complete time series per entity (store/SKU), then filter if you must.

Engineering judgment: pick lags that match business rhythms. If payroll is biweekly and demand spikes near paydays, a 14-day lag can help. If replenishment cycles are weekly, include a 7-day lag. For rolling windows, include at least one short window (reactive) and one longer window (stable), such as 7 and 28 days. Finally, remember multi-entity series: compute lags within each group (store_id, sku_id) so you don’t mix histories across items.
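The shift-then-roll pattern and per-entity grouping described above can be sketched in pandas. This is a minimal toy example under stated assumptions: the store IDs, column names, and the simple counting target are all illustrative, chosen so the feature values are easy to verify by hand.

```python
import pandas as pd
import numpy as np

# Toy daily history for two stores; y is deliberately a simple counter
# so lag and rolling values can be checked by eye.
dates = pd.date_range("2024-01-01", periods=60, freq="D")
df = pd.DataFrame({
    "store_id": ["A"] * 60 + ["B"] * 60,
    "date": list(dates) * 2,
    "y": np.arange(120, dtype=float),
})
df = df.sort_values(["store_id", "date"]).reset_index(drop=True)

# Lags computed per entity so store histories never mix.
df["lag_1"] = df.groupby("store_id")["y"].shift(1)
df["lag_7"] = df.groupby("store_id")["y"].shift(7)

# Shift BEFORE rolling so the current value never enters its own feature.
df["roll_mean_7"] = df.groupby("store_id")["y"].transform(
    lambda s: s.shift(1).rolling(7).mean()
)
```

Note that the first row of each store correctly gets a missing lag value: store B’s history never borrows from store A’s.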

Section 4.2: Calendar effects: day/week/month, holidays, paydays, fiscal periods

Calendar features represent predictable seasonality that is not “learned” well from lags alone, especially when you have limited history or changing patterns. Start with basic time components aligned to your granularity: day-of-week for daily data, week-of-year for weekly data, month and quarter for monthly data. For daily series, day-of-week is often one of the strongest features; for B2B, month-end and quarter-end flags can dominate.

Holidays require more than a single date flag. Many businesses see effects before and after a holiday (stock-up and recovery). Practical features include: is_holiday, days_to_holiday (capped), days_after_holiday (capped), and holiday_name. For retail, create separate indicators for major holidays (e.g., Thanksgiving, Christmas) and optionally “holiday season” windows. If you operate across regions, holiday calendars should be keyed by geography; do not assume a single national calendar.

Paydays and fiscal periods are “business calendar” features that Excel users often apply manually. Encode payday cycles (e.g., 1st/15th, last business day, weekly payroll) as binary flags and distance-to-payday features. For fiscal calendars, add fiscal_month, fiscal_week, fiscal_quarter, and “is_fiscal_year_end.” These matter because promotions, budget flushes, and procurement often align to fiscal boundaries rather than standard months.

Common mistakes: using the same holiday effects for all categories (electronics vs groceries), ignoring holiday movement across years (e.g., Easter), and failing to align calendar features to the forecast horizon. If you forecast weekly, compute features at the weekly level (e.g., “week contains holiday”) rather than daily flags. Outcome-wise, calendar features help models generalize to future dates where you have no sales yet but you do know the calendar with certainty.
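A sketch of the calendar features above, assuming a hypothetical single-holiday calendar (real pipelines would key holidays by geography, as noted):

```python
import pandas as pd

cal = pd.DataFrame({"date": pd.date_range("2024-12-20", "2024-12-31", freq="D")})

# Basic components at daily granularity.
cal["day_of_week"] = cal["date"].dt.dayofweek          # 0 = Monday
cal["month"] = cal["date"].dt.month
cal["is_month_end"] = cal["date"].dt.is_month_end.astype(int)

# Holiday window features from a (hypothetical) holiday calendar.
holidays = pd.to_datetime(["2024-12-25"])
cal["is_holiday"] = cal["date"].isin(holidays).astype(int)

# Signed days until the nearest holiday, capped at +/- 7 so distant
# dates all look alike to the model.
cal["days_to_holiday"] = cal["date"].apply(
    lambda d: min(((h - d).days for h in holidays), key=abs)
).clip(-7, 7)
```

The capped distance feature lets a model learn pre-holiday stock-up and post-holiday recovery effects without a separate flag per day.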

Section 4.3: Causal drivers and proxy variables: selecting what matters

Business drivers are variables that plausibly influence demand and are either known in advance (planned) or can be forecasted separately. The usual shortlist includes price, promotions, inventory/availability, marketing spend or impressions, distribution changes, and competitor actions. In an AI analyst workflow, you should be explicit about which drivers are decision variables (you set them), which are constraints (inventory limits), and which are external signals (weather, macro indicators).

Price features often work better as relative changes than raw values: percent_discount vs regular_price, log(price), and price_index vs category average. Promo features should distinguish mechanics: promo_flag, promo_depth, display_flag, coupon_flag, and promo_duration. Inventory and availability are critical for separating demand from supply. Create features like in_stock_flag, on_hand_units, weeks_of_supply, and a “censored demand” indicator (sales hit a cap because you stocked out). Without these, the model may learn that “low sales” equals “low demand” when it was actually “no inventory.”

Marketing can be represented as spend, impressions, clicks, GRPs, or email sends. The practical challenge is timing: marketing effects can lag. Consider lagged marketing features and rolling sums (e.g., 7-day spend) to approximate carryover. When you lack direct driver data, use proxy variables: website traffic as a proxy for interest, store footfall as a proxy for opportunity, or category-level sales as a proxy for market trend. Proxies are not perfect, but they often improve forecasts when the true drivers are unavailable.

Selection judgment: start with drivers that are (1) measurable, (2) available historically at the same granularity, and (3) known or planned for the forecast horizon. If future promo plans are known, use planned_promo rather than realized_promo; if you only know marketing budgets monthly, do not fabricate daily precision. A strong practical outcome here is a “driver-aware” dataset where each row represents an entity-time point with target y and a vetted set of drivers that respect what would be known at prediction time.
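The price and availability features described in this section can be sketched as follows. The column names and toy values are illustrative; the key idea is separating censored demand (zero sales because of zero inventory) from genuinely low demand.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "units_sold":    [10, 12, 0, 30, 11],
    "price":         [5.0, 5.0, 5.0, 4.0, 5.0],
    "regular_price": [5.0, 5.0, 5.0, 5.0, 5.0],
    "on_hand_units": [100, 80, 0, 200, 90],
})

# Relative price usually beats raw price as a feature.
df["percent_discount"] = 1 - df["price"] / df["regular_price"]
df["log_price"] = np.log(df["price"])

# Separate demand from supply: zero sales with zero inventory is
# censored demand, not evidence of low demand.
df["in_stock_flag"] = (df["on_hand_units"] > 0).astype(int)
df["censored_demand"] = (
    (df["units_sold"] == 0) & (df["on_hand_units"] == 0)
).astype(int)
```

In a model, the censored_demand flag lets you exclude or down-weight stockout rows so the model does not learn “low sales” where the truth was “no inventory.”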

Section 4.4: Categorical encoding for time-series contexts

Many business drivers and descriptors are categorical: store_id, sku_id, region, channel, category, promo_type, holiday_name. In Excel, you might pivot by these categories; in ML, you must encode them numerically. The safest default is one-hot encoding for low-cardinality fields (e.g., channel with 3 values). For high-cardinality identifiers (thousands of SKUs), naive one-hot explodes dimensionality and can overfit.

Two practical approaches are commonly used. First, target encoding (mean encoding): replace a category with the historical average of the target for that category. In time-series contexts, you must compute this in a leakage-safe way—using only past data relative to each row (often via expanding windows) or computing encodings on training folds only and applying to validation/test. Second, hierarchical aggregation: encode SKUs using their category-level statistics (category mean, brand mean) and keep the SKU id out of the model, or include it only in models that handle high-cardinality well (some tree methods with regularization, or embeddings in deep learning).

Also consider interaction-like categorical features that reflect business structure: region×channel, store_cluster, price_tier, and lifecycle_stage (new, mature, end-of-life). Lifecycle is especially important because new items have limited history; features like “weeks_since_launch” (numeric) plus a lifecycle bucket (categorical) provide models with context that pure lags cannot.

Common mistakes: encoding categories using the full dataset (leaks future outcomes), allowing category values that only appear in the test period to crash your pipeline, and failing to stabilize rare categories. Practical mitigations include adding an “unknown” bucket, combining rare categories into “other,” and applying smoothing in target encoding. Done well, categorical encoding lets one model learn across many entities while still respecting their differences.
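The leakage-safe target encoding described above can be sketched with expanding means. This is a minimal illustration (categories and values are toy data); production versions would add smoothing for rare categories, as the section recommends.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=6, freq="D"),
    "category": ["toys", "toys", "games", "toys", "games", "games"],
    "y": [10.0, 20.0, 100.0, 30.0, 200.0, 300.0],
}).sort_values("date")

# Expanding mean of PAST targets only: shift(1) drops the current row,
# so every row is encoded with data available before its own date.
df["cat_te"] = df.groupby("category")["y"].transform(
    lambda s: s.shift(1).expanding().mean()
)

# A category's first occurrence has no history; fall back to the
# expanding global mean of past rows (the very first row stays missing).
global_prior = df["y"].expanding().mean().shift(1)
df["cat_te"] = df["cat_te"].fillna(global_prior)
```

Row 4 (the second "games" row) is encoded only with the first "games" value, never with its own target or later ones.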

Section 4.5: Scaling, transformations, and handling zero-inflation

Excel forecasters often “eyeball” a log chart or manually dampen extremes. In ML, transformations make those choices consistent. For many demand series, a log-like transform reduces the impact of spikes and makes relationships more linear. Because demand can be zero, use log1p(y) rather than log(y). For price and marketing spend, log transforms often reflect diminishing returns (a $1k to $2k increase matters more than $101k to $102k).

Scaling (standardization or min-max) is essential for some models (linear regression with regularization, neural nets) and less critical for tree-based models. However, scaling can still help when you mix features with very different magnitudes (inventory units vs binary flags). A practical workflow is: fit scalers on the training period only, then apply to validation/test to avoid leakage. Keep the scaler parameters as part of your saved pipeline so production uses the exact same transform.

Zero-inflation—many zeros in the target—shows up in intermittent demand (spare parts), low-volume SKUs, or sparse channels. Standard regression can struggle because it learns “predict near zero” too often. Practical feature tactics include: add a “was_sold_recently” flag (e.g., any sale in last 28 days), rolling count of non-zero days, and time_since_last_sale. You can also consider a two-stage modeling approach (classification for zero vs non-zero, then regression for positive demand), but even before changing model classes, these features help tree-based methods separate “no demand” regimes from “some demand” regimes.

Common mistakes: transforming the target for training but forgetting to invert predictions properly, scaling using the entire dataset, and applying aggressive clipping that removes real promo spikes. Practical outcome: a dataset and preprocessing pipeline that stabilize learning without hiding meaningful business variation.
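A short sketch of the log1p transform (with the inversion that is easy to forget) and the zero-inflation features mentioned above, on a toy intermittent-demand series:

```python
import numpy as np
import pandas as pd

y = pd.Series([0.0, 3.0, 0.0, 0.0, 12.0, 5.0, 0.0, 7.0])

# log1p handles zeros; invert predictions with expm1, never exp alone.
y_log = np.log1p(y)
y_back = np.expm1(y_log)            # round trip should recover y exactly

# Zero-inflation features; shift(1) first so they stay leakage-safe.
sold = (y > 0).astype(int)
was_sold_recently = sold.shift(1).rolling(3, min_periods=1).max()
nonzero_days_3 = sold.shift(1).rolling(3, min_periods=1).sum()
```

The rolling windows here use a 3-period lookback for brevity; the 28-day windows suggested in the text follow the same pattern.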

Section 4.6: Feature documentation: assumptions, sources, and refresh cadence

Feature engineering becomes truly valuable when it is repeatable across scenarios—this is where a lightweight feature store template comes in. You do not need enterprise infrastructure to start; you need consistent definitions and a refresh plan. Document each feature with: name, description, grain (per store-SKU-day, per region-week, etc.), data source (table/file/system), transformation logic (including window sizes and shifts), and the “availability time” rule (when the value is known relative to the forecast creation date).

Assumptions matter as much as code. For example: “promo_flag uses the planned promo calendar and is known 8 weeks ahead,” or “inventory_on_hand is only available with a 1-day delay, so we lag it by 1 day.” These statements prevent subtle leakage and align stakeholders on what the model can legitimately use. Also record exception handling: how missing price is imputed, how stockouts are flagged, and whether outliers are capped.

Refresh cadence is the operational bridge from analysis to production. Some features update daily (sales, inventory), others weekly (marketing performance), and others annually (holiday calendars, fiscal mapping). Your template should include a “last refreshed” timestamp and a dependency list so downstream pipelines know when to recompute. If you work with multiple entities, add a coverage report: % missing by feature, number of entities affected, and date ranges with gaps.

Milestone project for this chapter: produce a driver-aware modeling table with a clear schema: identifiers (entity keys), timestamp, target y, leakage-safe lags and rolling stats, calendar features, vetted business drivers, encoded categoricals, and transformation metadata. Save a data dictionary alongside it. This becomes your reusable starting point for the remaining forecasting scenarios—faster iteration, fewer mistakes, and forecasts you can defend in business terms.
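A lightweight data dictionary can start as a plain table saved next to your modeling dataset. This sketch shows one possible schema; the column names, sources, and refresh cadences are illustrative, not a fixed standard.

```python
import pandas as pd

# Each row documents one feature: grain, source, logic, and critically
# the availability rule (what is known at forecast creation time).
feature_docs = pd.DataFrame([
    {
        "name": "lag_7",
        "description": "Target value 7 days before the row date",
        "grain": "store-sku-day",
        "source": "sales_history",
        "logic": "groupby(store_id, sku_id).y.shift(7)",
        "availability": "strictly past data; known at forecast creation",
        "refresh": "daily",
    },
    {
        "name": "promo_flag",
        "description": "Planned promotion indicator",
        "grain": "sku-week",
        "source": "promo_calendar",
        "logic": "join planned calendar on sku_id + week",
        "availability": "planned calendar, known 8 weeks ahead",
        "refresh": "weekly",
    },
])
feature_docs.to_csv("feature_dictionary.csv", index=False)
```

Keeping this file under version control alongside the modeling table gives you the "last refreshed" and dependency story the section calls for.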

Chapter milestones
  • Create lag and rolling-window features safely
  • Add calendar features and encode holidays and seasonality
  • Incorporate business drivers: price, promo, inventory, and marketing
  • Design a feature store template for repeated scenario work
  • Milestone project: driver-aware dataset ready for ML
Chapter quiz

1. What is the main purpose of expressing trend, seasonality, known events, and business judgment as features in an AI forecasting workflow?

Show answer
Correct answer: To make forecasting reproducible, testable, and leakage-safe using historical data
The chapter emphasizes turning business instincts and drivers into reproducible, testable inputs while keeping evaluation fair and leakage-safe.

2. Which check best helps prevent data leakage when creating lag and rolling-window features?

Show answer
Correct answer: Ask whether you would know the value on the day you generate the forecast for the next horizon
The chapter’s rule of thumb is: if you wouldn’t know it at forecast creation time, it must be excluded, lagged, or replaced with a planned/forecasted version.

3. The chapter recommends separating which three categories that Excel workflows often blend together?

Show answer
Correct answer: What is known at forecast creation time, what is unknown and must be predicted, and what is decided by the business
A core mindset shift is explicitly distinguishing known, unknown-to-predict, and business-decided variables to avoid leakage and keep evaluation fair.

4. If a driver value is not known at forecast creation time (e.g., realized promo lift), what is the chapter’s recommended approach?

Show answer
Correct answer: Exclude it, lag it appropriately, or replace it with a planned/forecasted version (e.g., planned promo calendar)
Using unknown future information makes backtests look strong but fails in production; planned/forecasted versions keep features realistic.

5. Why does the chapter introduce a feature store template as part of the workflow?

Show answer
Correct answer: To repeat the same feature engineering pattern across scenarios without reinventing the pipeline each time
The feature store template supports consistent, reusable feature engineering across the course’s multiple scenarios.

Chapter 5: Train Forecasting Models You Can Explain

In Excel, “training a model” often means choosing a function (TREND, FORECAST.ETS), checking whether the line “looks right,” and then adjusting assumptions until the numbers feel usable. In an AI forecasting workflow, training is more disciplined: you define what a forecast is (horizon and granularity), decide how predictions will be generated over time, build a baseline, and only then introduce machine learning models that you can defend to stakeholders.

This chapter focuses on models that win in real organizations: regression-based models (especially regularized linear models) and tree-based models (random forests and gradient boosting). Both can be trained on time-series features (lags, rolling averages, calendar flags, promotions) and both can be explained—if you set them up properly.

The key shift from Excel is fairness and repeatability. You will train models using time-series backtesting folds, tune them without leaking future information, and compare them against strong baselines like seasonal naive. Your goal is not a fancy algorithm; it’s a champion model that reliably beats the baseline and produces forecasts you can explain, stress-test, and use for planning.

  • Practical outcome: a champion ML model, tuned via backtesting, with prediction intervals and a driver narrative the business can understand.
  • Common failure mode: impressive metrics on a random split that collapse when deployed because the training process peeked into the future.

As you read, keep one question in mind: “If a planning manager asks why next month is higher than usual, can I answer using model logic and data drivers—not just confidence?”

Practice note for this chapter’s milestones (regularized linear models, tree-based models, tuning with time-series cross-validation, prediction intervals and scenario forecasts, and the champion-model milestone project): for each one, document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.


Sections in this chapter
Section 5.1: Modeling framing: direct vs recursive vs multi-output forecasting

Before you pick an algorithm, decide how the model will produce a 1-step-ahead forecast versus a 12-step-ahead forecast. This is the forecasting equivalent of choosing whether your Excel model copies a formula forward or calculates each month independently.

Recursive (iterative) forecasting trains a 1-step model (predict t+1 from history), then feeds its own predictions back in to get t+2, t+3, and so on. It’s simple and works with almost any regressor, but errors can compound. It’s tempting because it resembles “drag the formula right,” yet it can drift badly on long horizons.

Direct forecasting trains a separate model for each horizon (one model for t+1, another for t+2, etc.). It often improves stability for longer horizons, but it increases complexity: more models to maintain, more tuning, and sometimes inconsistent behavior across horizons.

Multi-output (multi-horizon) forecasting trains one model that outputs a vector of future values (t+1…t+H) at once. Some algorithms support this natively; others require wrappers. The advantage is consistent horizon behavior and shared learning; the risk is that you can hide leakage if you accidentally include features that would not be available at prediction time for all horizons.

  • Rule of thumb: If you need 1–4 weeks ahead and operational accuracy matters, recursive with a strong feature set can be fine. If you need 3–12 months ahead for planning, direct or multi-output often behaves better.
  • Engineering judgment: Align framing with how the forecast is used. A weekly replenishment forecast cares about short-horizon accuracy; an annual budget forecast cares about stable long-horizon behavior and interpretability.
  • Common mistake: training a direct model for t+12 using features that only exist at t+11 (e.g., a rolling statistic computed using data beyond the cutoff date). Treat each horizon’s feature availability explicitly.

Your framing decision affects every next step: feature construction, cross-validation, interval estimation, and how you communicate uncertainty. Make the decision once, document it, and keep it consistent when comparing models.
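The recursive framing can be sketched in a few lines: train a 1-step model on lag features, then feed each prediction back in as the newest lag. The toy series, lag count, and Ridge choice here are assumptions for illustration; the compounding-error risk discussed above applies to whatever regressor you substitute.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Smooth toy series; only lag features, so the recursion is easy to follow.
rng = np.random.default_rng(0)
y = 50 + 10 * np.sin(np.arange(100) / 7) + rng.normal(0, 0.5, 100)

def make_xy(series, n_lags=2):
    # Row j has features [y[j], y[j+1]] and target y[j + n_lags].
    X = np.column_stack(
        [series[i : len(series) - n_lags + i] for i in range(n_lags)]
    )
    return X, series[n_lags:]

X, target = make_xy(y)
model = Ridge(alpha=1.0).fit(X, target)

# Recursive forecasting: each prediction becomes the newest "lag".
history = list(y[-2:])
forecasts = []
for _ in range(4):                       # 4 steps ahead
    x_next = np.array(history[-2:]).reshape(1, -1)
    pred = float(model.predict(x_next)[0])
    forecasts.append(pred)
    history.append(pred)
```

A direct framing would instead fit one such model per horizon, each trained with features shifted to respect that horizon's cutoff.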

Section 5.2: Regularized linear models and interpretability

Regularized linear models (Ridge, Lasso, Elastic Net) are often the best first ML step after baselines because they are fast, stable, and explainable. They convert your time-series feature table into a set of weighted drivers: lags, rolling means, calendar effects, price, and promotion indicators. This feels familiar to Excel users because the model is still “a weighted sum,” but with training that chooses weights optimally and guards against overfitting.

Why regularization matters: time-series feature sets can explode (many lags, many rolling windows). Without regularization, a linear model can latch onto noise and produce fragile forecasts. Ridge shrinks coefficients smoothly; Lasso can set some to exactly zero (feature selection); Elastic Net blends both.

  • Practical workflow: standardize numeric features (especially if you mix price, units, and binary flags), fit Ridge as a default, then test Elastic Net if you want sparsity for simpler narratives.
  • Interpretability win: coefficients tell a story. Example: “A 10% increase in price is associated with a 4% decrease in demand, controlling for seasonality and promotions” (after you choose an appropriate target transform, such as log).
  • Common mistake: interpreting coefficients when features are highly collinear (e.g., lag-1 and 7-day rolling mean). Regularization helps, but coefficient stories should be phrased cautiously: describe directionality and relative importance, not causal claims.

To keep the model explainable, curate features deliberately: include a compact set of lags (e.g., 1, 7, 28), a few rolling windows (7, 28), calendar effects (day-of-week, month), and business drivers (promo, price, stockouts). You are not trying to “win Kaggle”; you are building a model your future self can debug.

Finally, always compare against a baseline fairly. A regularized linear model that doesn’t beat seasonal naive on backtesting folds is a signal: either the features are weak, the horizon framing is wrong, or the series is dominated by seasonality that a baseline already captures well.
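The standardize-then-Ridge workflow above can be sketched with a scikit-learn pipeline. The feature table is synthetic (lag, promo, and price columns are hypothetical), but it shows why putting the scaler inside the pipeline matters: the scaler is refit whenever the pipeline is fit on a training fold, so validation data never influences the scaling.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# Hypothetical feature table: lag_1, lag_7, promo_flag, price.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([
    rng.normal(100, 10, n),        # lag_1
    rng.normal(100, 10, n),        # lag_7
    rng.integers(0, 2, n),         # promo_flag (binary)
    rng.normal(5.0, 0.5, n),       # price
])
# Simulated demand: promos lift it, higher price lowers it.
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 20 * X[:, 2] - 8 * X[:, 3] \
    + rng.normal(0, 1, n)

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)

coefs = model.named_steps["ridge"].coef_   # standardized effect sizes
```

Because features are standardized, the coefficients are comparable as relative importances, which is exactly the cautious "directionality and relative importance" narrative the section recommends.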

Section 5.3: Tree-based models: feature interactions and robustness

Tree-based models (Random Forest, Gradient Boosting such as XGBoost/LightGBM/CatBoost) handle nonlinearity and feature interactions that linear models cannot. In forecasting, this matters when effects change by context: promotions work differently on weekends, price sensitivity differs by season, and stockouts create threshold behavior (demand collapses when inventory hits zero).

Random forests average many trees and are robust, but can struggle to extrapolate trends because trees predict within the range they have seen. Gradient boosting often wins accuracy by correcting errors iteratively, but it can overfit if you tune aggressively without time-aware validation.

  • Feature interactions you get “for free”: promo × holiday, price × month, lag-7 × day-of-week. You don’t have to hand-engineer interaction terms as in linear regression.
  • Robustness advantage: trees can be less sensitive to outliers and monotonic transformations, and they can cope with missingness depending on the library (though you should still treat missingness intentionally, not accidentally).
  • Common mistake: feeding raw timestamps as numeric values (e.g., 20240115) and letting the tree split on them. Instead, use meaningful time features (month, week-of-year) and lags/rolling stats. Trees don’t “understand time” unless you encode it.

To keep tree models explainable, limit your feature set to drivers you can justify, and avoid “mystery” engineered features you cannot describe. Also remember the extrapolation issue: if the business expects growth beyond historical peaks, a pure tree model may under-forecast. A practical approach is to keep a linear model as a challenger or to add trend features (time index) so the model has a path to represent gradual change.

Tree-based models are excellent candidates for your milestone champion, especially when demand is driven by promotions, pricing, and calendar interactions. Just be disciplined about validation so you don’t confuse memorization with learning.
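The extrapolation limitation is easy to demonstrate on a synthetic trend: a forest predicts within the range it has seen, while a linear model follows the trend. This toy comparison is a sketch, not an argument against trees; it just shows why the section suggests keeping a linear challenger or adding trend features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# A clean upward trend: historical values run from 10 up to 208.
t = np.arange(100).reshape(-1, 1)
y = 2.0 * t.ravel() + 10

tree = RandomForestRegressor(n_estimators=50, random_state=0).fit(t, y)
lin = LinearRegression().fit(t, y)

# Predict well beyond the training range.
t_future = np.array([[150]])
tree_pred = float(tree.predict(t_future)[0])   # capped near the historical max
lin_pred = float(lin.predict(t_future)[0])     # extrapolates the trend to 310
```

If the business expects growth past historical peaks, this gap between tree_pred and lin_pred is exactly the under-forecast risk to watch for.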

Section 5.4: Hyperparameter tuning with backtesting folds

In Excel, you might tune a model by adjusting parameters until last year “fits.” In ML forecasting, tuning must respect time. The only honest evaluation is to simulate repeated “train on the past, predict the future” cycles. This is usually called backtesting or time-series cross-validation.

A backtesting setup defines: (1) an initial training window, (2) a forecast horizon H, and (3) how far you move forward between folds (step size). Each fold trains only on data available up to the cutoff date, then predicts the next H periods. You aggregate metrics across folds to estimate how the model will behave in production.

  • Tuning workflow: pick a small grid first (e.g., Ridge alpha values; boosting depth/learning rate/number of trees), run backtesting, then refine around the best region.
  • Overfitting prevention: keep a final “last chunk” of time as a holdout that you never touch during tuning. Backtesting guides selection; the holdout confirms you didn’t optimize to the folds.
  • Common mistake: using random cross-validation or shuffling. This leaks future information because the model sees patterns that should be unknown at the forecast date.

Be consistent about metrics and aggregation. If the business cares about units at the SKU-week level, evaluate there—not only on totals. If the cost of under-forecasting is higher than over-forecasting, consider asymmetric metrics or at least track bias (mean error) alongside accuracy (MAE, RMSE, MAPE/sMAPE).

Most importantly, compare tuned models to baselines within the same backtesting framework. A “champion” is not the model with the best single split; it’s the model that wins reliably across folds and produces stable errors over time.
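The backtesting loop described above (initial window, horizon H, step size) can be sketched directly. The series, lag features, and fold parameters here are toy assumptions; the structure is what matters: every fold trains only on data before its cutoff and predicts the next H points.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

# Toy series with lag-1 and lag-7 features.
rng = np.random.default_rng(2)
y = 50 + 10 * np.sin(np.arange(120) / 7) + rng.normal(0, 1, 120)
X = np.column_stack([np.roll(y, 1), np.roll(y, 7)])[7:]   # lag-1, lag-7
target = y[7:]

# Expanding-window backtest: train on the past, predict the next H points.
H, step, start = 7, 7, 60
fold_mae = []
for cutoff in range(start, len(target) - H + 1, step):
    model = Ridge(alpha=1.0).fit(X[:cutoff], target[:cutoff])
    preds = model.predict(X[cutoff:cutoff + H])
    fold_mae.append(mean_absolute_error(target[cutoff:cutoff + H], preds))

avg_mae = float(np.mean(fold_mae))     # aggregate across folds
```

For tuning, you would wrap this loop around each candidate hyperparameter setting and compare average (and worst-fold) errors, keeping a final untouched holdout period for confirmation.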

Section 5.5: Prediction intervals and uncertainty communication

Point forecasts are incomplete for planning. Inventory, staffing, and budgeting decisions need a sense of uncertainty: “How bad could it be?” and “How likely is a shortfall?” Prediction intervals turn a forecast into a range with a stated coverage (for example, an 80% or 95% interval).

There are several practical ways to generate intervals in regression and tree-based workflows:

  • Quantile regression (recommended): train models to predict specific quantiles (e.g., 10th, 50th, 90th). Many gradient boosting libraries support quantile loss directly. This yields asymmetric intervals, which is often realistic for demand.
  • Residual-based intervals: backtest the model, collect forecast errors by horizon, and use empirical percentiles of residuals to form intervals around new predictions. This is simple and explainable, but assumes the error distribution is stable.
  • Bootstrap/ensemble spread: use variability across an ensemble (or bootstrapped samples) as an uncertainty proxy. Useful, but interpret carefully: model disagreement is not the same as true uncertainty.

Intervals become even more valuable when you add scenario forecasts. Instead of one future, you create “what-if” inputs: planned promotions, price changes, or supply constraints. A scenario forecast answers: “If we run a 20% discount in week 42, what is the expected lift and what range should we plan for?”

Common mistakes: presenting intervals without stating coverage (“Is that 50% or 95%?”), using symmetric intervals when the error is skewed, or mixing scenario changes with historical features in a way that causes leakage (e.g., using realized future promo flags rather than planned promo schedules).

When communicated well, intervals reduce the pressure for fake precision. They also help teams choose decisions that are robust: plans that work under the 10th–90th percentile range, not only under the median.
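The residual-based approach above is the simplest to sketch: collect backtest errors at a given horizon, then use their empirical percentiles around new point forecasts. The residuals here are simulated with a deliberate positive bias to show how the method naturally produces an asymmetric interval.

```python
import numpy as np

# Simulated forecast errors (actual - predicted) from backtesting folds
# at one horizon, with a slight positive bias.
rng = np.random.default_rng(3)
residuals = rng.normal(2.0, 5.0, 500)

point_forecast = 100.0

# Empirical 10th/90th residual percentiles form an 80% interval.
# State the coverage when you present it.
lo, hi = np.percentile(residuals, [10, 90])
interval = (point_forecast + lo, point_forecast + hi)
```

Because the residuals are biased upward, the interval sits asymmetrically around the point forecast; a symmetric "plus or minus" band would misrepresent the risk.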

Section 5.6: Model explainability: feature importance and driver narratives

Explainability is not a single chart; it’s a discipline of linking model behavior to business drivers. Your milestone deliverable is not merely “a model that beats baseline,” but a model you can defend in a review meeting: what drives demand, how stable those drivers are over time, and when the model is likely to be wrong.

For regularized linear models, explanation starts with coefficients and standardized effects. You can group related features (all lags, all calendar flags, all promo features) and summarize their contribution. For tree-based models, use feature importance carefully: prefer permutation importance (measured on a time-respecting validation fold) over default “gain” importance, which can overweight high-cardinality features.
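Permutation importance on a time-respecting holdout can be sketched as follows; the data and feature names here are synthetic assumptions, and the point is the time-ordered split plus `sklearn.inspection.permutation_importance`:

```python
# Sketch: permutation importance measured on a held-out, time-ordered fold,
# preferred over default tree "gain" importance. Synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))                 # columns: lag_7, dow_flag, noise
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

split = int(n * 0.8)                        # time-respecting split: train on past
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[:split], y[:split])

result = permutation_importance(
    model, X[split:], y[split:], n_repeats=10, random_state=0
)
for name, imp in zip(["lag_7", "dow_flag", "noise"], result.importances_mean):
    print(f"{name:10s} importance drop: {imp:.3f}")
```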

  • Driver narrative template: (1) baseline seasonality level, (2) recent momentum (lags/rolling), (3) calendar effects, (4) promotions/pricing, (5) constraints (stockouts, capacity), (6) residual risk (what’s not explained).
  • Local explanations: for a specific forecast spike, use tools like SHAP values (especially for boosting models) to show which features pushed the prediction up or down for that date/SKU.
  • Common mistake: treating importance as causality. Importance means the model relied on a feature, not that changing it will necessarily cause the outcome to change (unless you have a causal design).

To complete the milestone project, choose a champion model using backtesting results: it must beat the seasonal naive (or your chosen baseline) on the primary metric, maintain acceptable bias, and produce coherent explanations. Document the model framing (direct/recursive/multi-output), the feature list, the validation design, and one example “forecast review” narrative with intervals and key drivers. That documentation is what turns a model into an asset your team can trust and reuse.

Chapter milestones
  • Train regression-based models (regularized linear) as a strong first ML step
  • Train tree-based models (random forest/gradient boosting) for nonlinearity
  • Tune models with time-series cross-validation and avoid overfitting
  • Generate prediction intervals and scenario forecasts for planning
  • Milestone project: champion ML model that beats the baseline
Chapter quiz

1. What is the key shift from an Excel-style forecasting approach to the AI forecasting workflow described in Chapter 5?

Correct answer: A disciplined process: define the forecast (horizon/granularity), decide how predictions are generated over time, build a baseline, then add explainable ML
The chapter emphasizes fair comparison and repeatability: start with clear forecast definitions and strong baselines before introducing explainable ML models.

2. Why does Chapter 5 recommend time-series cross-validation (backtesting folds) instead of a random split?

Correct answer: It prevents using future information during training and better reflects real deployment conditions
Random splits can leak future information; backtesting evaluates models in a time-ordered way that matches how forecasts are actually used.

3. Which model choice best matches the chapter’s guidance when you suspect nonlinear relationships in the data?

Correct answer: Tree-based models such as random forests or gradient boosting
The chapter positions tree-based models as a strong option for capturing nonlinearity while still remaining explainable when set up properly.

4. What is the “common failure mode” highlighted in Chapter 5 when evaluating forecasting models?

Correct answer: Strong metrics on a random split that collapse when deployed because the training process effectively peeked into the future
The chapter warns that random splits can create overly optimistic results that fail in production due to leakage and unrealistic evaluation.

5. In the chapter’s milestone project, what defines a “champion” ML model?

Correct answer: A model that reliably beats a strong baseline (e.g., seasonal naive), is tuned via backtesting, and produces explainable forecasts with prediction intervals
The goal is a defendable, repeatable forecasting system: better than baseline, tuned without leakage, and usable for planning with intervals and driver narratives.

Chapter 6: 10 Real Business Scenarios + Portfolio-Ready Delivery

Up to this point, you have practiced the mechanics: defining horizon and granularity, preventing leakage, building baselines, engineering lags and rolling features, and fitting regression/tree models. In the workplace, the difference between a “working model” and an analyst who is trusted with forecasting is delivery: choosing the right framing for the business question, producing stakeholder-ready artifacts, and setting up monitoring so the forecast remains reliable after you leave the notebook.

This chapter applies one reusable template to 10 common scenarios. Your goal is not to memorize 10 pipelines; it is to learn a repeatable way to: (1) translate an Excel forecasting habit (trendline, moving average, “same as last week”) into explicit baselines; (2) declare business constraints (stockouts, staffing limits, campaign schedules); (3) build a notebook/report that a non-technical stakeholder can read; and (4) hand off a model with a checklist and monitoring signals for drift, accuracy decay, and data quality.

Use a consistent “forecasting story arc” in every scenario: Problem framing → Data audit → Baselines → Feature set → Model + tuning → Backtest comparison → Decision threshold/ROI → Monitoring → Handoff. When you do this across multiple industries, your portfolio demonstrates breadth and professional judgment—not just code.

Practice note for every chapter milestone (applying the template to the 10 scenarios; the reusable notebook/report structure for stakeholders; monitoring signals for drift, accuracy decay, and data quality; the portfolio case study and interview talking points; and the final forecast report with model handoff checklist): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.

Sections in this chapter
Section 6.1: Scenario set: retail demand, e-commerce orders, call-center volume

These three scenarios share a similar demand pattern: strong seasonality, calendar effects, and event-driven spikes. Your first decision is granularity: daily for stores and orders, and 15–60 minute intervals for call-center volume if staffing is intraday. Then set the horizon based on actionability: 1–14 days for replenishment and staffing, 4–12 weeks for planning.

Template application: Start with baselines that mirror how Excel users think: naive (yesterday equals today), seasonal naive (same day last week), and moving average (7-day or 28-day). Backtest them first; many teams skip this and jump to complex models, losing the ability to explain incremental value. Then add features: lags (1, 7, 14, 28), rolling mean/median, rolling std (volatility), day-of-week, week-of-year, holidays, payday flags, and “closed store” indicators. For e-commerce, include shipping cutoff times and site outages; for call-center, include product incidents or release dates as event flags.
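The lag and rolling features above can be built with a few lines of pandas. This is a minimal sketch on synthetic daily data; the column names are illustrative. Note the `shift()` before `rolling()`, which keeps day t's features from seeing day t's own target:

```python
# Sketch: leak-free lag and rolling features for a daily demand series.
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=120, freq="D")
df = pd.DataFrame({"units": np.random.default_rng(1).poisson(50, 120)}, index=idx)

for lag in [1, 7, 14, 28]:
    df[f"lag_{lag}"] = df["units"].shift(lag)

# Rolling stats computed on shifted values so the window ends yesterday
shifted = df["units"].shift(1)
df["roll_mean_7"] = shifted.rolling(7).mean()
df["roll_std_7"] = shifted.rolling(7).std()   # volatility feature

df["dow"] = df.index.dayofweek
df["week_of_year"] = df.index.isocalendar().week.astype(int)

df = df.dropna()  # drop warm-up rows that lack full lag history
print(df.head(3))
```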

Common mistakes: (1) Leakage by using future promo calendars incorrectly—promo is valid as a known-in-advance feature only if it is truly committed at prediction time. (2) Ignoring missingness: call-center logs often drop intervals; treat missing intervals as zero volume only if the system guarantees it, otherwise mark as missing and impute carefully. (3) Overreacting to outliers: a one-off flash sale should become an event feature if it repeats; otherwise cap/winsorize for model stability.

Practical outcome: Your stakeholder report should answer: “How many units/orders/calls should we expect, and what is the uncertainty?” Even if you do not build full probabilistic forecasts, provide scenario bands by backtest residual quantiles (e.g., P10/P90) and clearly state when spikes are not predictable without event inputs.

Section 6.2: Scenario set: SaaS revenue, pipeline, and renewals forecasting

SaaS forecasting is rarely a pure time series. Revenue is the result of conversion funnels, renewal schedules, and sales cycle dynamics—so your framing must separate what is scheduled (known renewals) from what is uncertain (new bookings). A practical approach is to forecast components and recombine them: renewal revenue, expansion, churned revenue, and new business.

Template application: Choose monthly granularity for executive planning and weekly for sales operations. Baselines might include “same month last year” for seasonality, a trailing 3-month moving average, and a simple renewal schedule baseline (sum of contracts expiring next month × historical renewal rate). For features, add calendar (month, quarter-end flags), cohort age (time since customer start), product tier mix, and pipeline metrics (open opportunities by stage, weighted pipeline). Use leakage-safe features: pipeline as-of date must be snapped to the forecast creation date; avoid using opportunity fields updated after the fact (close date changes are a classic leakage source).
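The renewal schedule baseline mentioned above is simple enough to sketch directly. The contract data, column names, and renewal rate here are all assumptions for illustration:

```python
# Sketch: renewal-schedule baseline = ARR expiring next month
# times a historical renewal rate measured from past cohorts.
import pandas as pd

contracts = pd.DataFrame({
    "contract_id": [1, 2, 3, 4],
    "expiry_month": ["2025-07", "2025-07", "2025-08", "2025-07"],
    "arr": [12000.0, 8000.0, 20000.0, 5000.0],  # annual recurring revenue
})
historical_renewal_rate = 0.85  # assumed; estimate from past cohorts

expiring = contracts.loc[contracts["expiry_month"] == "2025-07", "arr"].sum()
baseline_renewal_revenue = expiring * historical_renewal_rate
print(f"Expiring ARR: {expiring:,.0f} -> baseline renewals: {baseline_renewal_revenue:,.0f}")
```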

Modeling choices: Regression-based models work well when you have stable relationships (e.g., weighted pipeline to bookings). Tree-based models handle non-linear effects such as quarter-end pushes and threshold behaviors (discounting). Tune via time-based cross-validation (rolling origin), and compare against the baselines with the same backtest windows.

Common mistakes: (1) Confusing “forecasting revenue” with “forecasting bookings” and mixing them in one target. (2) Not handling plan changes: pricing, packaging, or new sales territories can create structural breaks. Use change-point indicators and segment models (by region or product) when appropriate.

Practical outcome: Deliver a decomposition view: “Expected renewals + expected expansion + expected new bookings,” with drivers and risks. This is interview-ready because it shows business understanding, not only metrics.

Section 6.3: Scenario set: inventory replenishment and stockout-aware forecasting

Inventory forecasting is where business constraints are non-negotiable. If you forecast sales from observed sales, you may be learning censored demand: stockouts hide true demand. Your workflow must explicitly address when “zero sales” means “zero demand” versus “no inventory.”

Template application: Frame the problem as either (a) demand forecasting (what customers would buy) or (b) sales forecasting (what you will sell given constraints). For replenishment, demand is usually the right target, but you may only observe sales. Add inventory-on-hand and stockout flags as features, and consider excluding stockout periods from training or using a correction method (e.g., impute demand during stockouts using similar weeks or nearest neighbors across stores/SKUs). Always document the choice and its limitations.

Baselines: Seasonal naive per SKU-store is often surprisingly strong. A moving average baseline should be computed only on in-stock days to avoid under-forecasting. Evaluate with a cost-aware lens: under-forecasting can cause stockouts, over-forecasting can create holding costs. In your report, include an asymmetric metric or a business-weighted score (e.g., weighted absolute error where under-forecasting costs 3× over-forecasting).
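A business-weighted score like the one described (under-forecasting costs 3× over-forecasting) is a one-liner to implement; the 3× weight is the chapter's example and should be calibrated to real stockout vs holding costs:

```python
# Sketch: weighted absolute error where under-forecasting
# (forecast < actual, risking stockouts) is penalized 3x.
import numpy as np

def weighted_abs_error(actual, forecast, under_weight=3.0, over_weight=1.0):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    err = actual - forecast
    weights = np.where(err > 0, under_weight, over_weight)  # err > 0 => under-forecast
    return float(np.mean(weights * np.abs(err)))

# Under by 10, exact, and over by 10: the under-miss dominates the score
print(weighted_abs_error([100, 100, 100], [90, 100, 110]))
```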

Common mistakes: (1) Treating replenishment lead time as an afterthought. Your horizon must align with lead time + review period. (2) Not aggregating correctly: replenishment might be store-level but procurement may need vendor-level aggregation. Build forecasts at the lowest reliable level, then roll up using consistent hierarchy rules.

Practical outcome: Produce a “reorder recommendation table” that uses forecast + safety stock logic (even if simplified), and state clearly what the model assumes about lead time variability and service level.

Section 6.4: Scenario set: marketing leads, spend response, and promo planning

Marketing forecasting often mixes time-series forecasting with causal assumptions. Leads next week depend on seasonality and also on spend, channel mix, creative changes, and promotion calendars. The key judgment call is to separate forecasting under a fixed plan (what happens if we keep spend as scheduled) from planning/optimization (what spend should be).

Template application: For lead volume, choose daily or weekly granularity depending on campaign cadence. Baselines: seasonal naive by weekday (for daily) or by week-of-year (for weekly), plus a moving average. Features: lags and rolling stats of leads, spend by channel, impression/click metrics, holiday flags, and campaign start/stop indicators. For promos, include discount depth and promo type as known-in-advance inputs only if the promo calendar is locked at prediction time.

Spend response: Use models that can capture diminishing returns (trees can approximate; regression can use transforms like log(spend)). Be careful interpreting “feature importance” as causal effect. In stakeholder language: “This model predicts well given historical patterns; it does not prove causation without a designed experiment.”

Common mistakes: (1) Data latency: ad platform metrics can arrive late or be restated; build data quality checks for sudden backfills. (2) Mixing attribution windows: conversions may be reported days after clicks, creating apparent “future information.” Align metrics by the timestamp they were known.

Practical outcome: Provide a planning view: forecast leads under three spend scenarios (baseline, +10%, −10%) and show sensitivity. This turns a forecast into a decision tool.
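The three-scenario planning view can be sketched with a fitted log-spend regression, which also illustrates the diminishing-returns transform mentioned earlier. The response function and coefficients here are synthetic assumptions, not estimates from real data:

```python
# Sketch: forecast leads under baseline, +10%, and -10% spend scenarios
# using a log(spend) regression to approximate diminishing returns.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
spend = rng.uniform(1_000, 20_000, 200)
leads = 50 * np.log(spend) + rng.normal(scale=10, size=200)  # synthetic response

model = LinearRegression().fit(np.log(spend).reshape(-1, 1), leads)

planned_spend = 10_000.0
scenarios = {"baseline": 1.0, "+10%": 1.1, "-10%": 0.9}
preds = {}
for name, mult in scenarios.items():
    preds[name] = model.predict([[np.log(planned_spend * mult)]])[0]
    print(f"{name:9s} spend={planned_spend * mult:>8,.0f} -> leads ~ {preds[name]:.0f}")
```

Because the model is fit on log(spend), the predicted lift from +10% spend is smaller than the loss from -10% at the same spend level, which is the diminishing-returns shape stakeholders expect.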

Section 6.5: Scenario set: operations staffing, delivery ETAs, and capacity planning

Operations scenarios are where forecast errors become service-level failures. Staffing forecasts need interval-level accuracy and robust monitoring; ETA and capacity planning require careful definition of the target and what is “known” at prediction time.

Staffing (workforce volume): Forecast arrival volume (calls, tickets, orders) and then translate to staffing using a service model (even a simplified one). Baselines: same interval last week (seasonal naive), and a rolling average by interval-of-week. Features: interval-of-day, day-of-week, holidays, product incidents, and weather (if relevant). A frequent mistake is smoothing away peaks; for staffing, peaks matter more than averages. Consider pinball loss (quantile) targets if you can, or at least report high-percentile error behavior.
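Pinball (quantile) loss is compact enough to write by hand. The sketch below shows why it suits staffing: at a high target quantile such as 0.9, under-forecasting a peak is penalized far more than over-forecasting it:

```python
# Sketch: pinball (quantile) loss. At q=0.9, missing low costs 9x
# more than missing high by the same amount.
import numpy as np

def pinball_loss(actual, forecast, q):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    err = actual - forecast
    return float(np.mean(np.maximum(q * err, (q - 1) * err)))

print(pinball_loss([100.0], [90.0], q=0.9))   # under-forecast: 0.9 * 10 = 9.0
print(pinball_loss([100.0], [110.0], q=0.9))  # over-forecast: 0.1 * 10 = 1.0
```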

Delivery ETAs: ETAs are not purely time series; they are supervised prediction with time as an index. Define what you predict: “time from dispatch to delivery” or “arrival timestamp.” Use leakage-safe features: distance, route, hub load, traffic as-of time, driver capacity, and pickup delay. Backtesting must respect chronology: train on past deliveries, predict future ones, and avoid features updated after delivery (e.g., “actual route”).

Capacity planning: Often monthly/weekly and tied to constraints (warehouse throughput, production line limits). Your report must include constraint checks: if forecast exceeds capacity, the decision becomes “what to prioritize,” not “improve the model.”

Monitoring signals: Add three alerts: data quality (missing intervals, timestamp shifts), drift (feature distribution changes like traffic patterns), and accuracy decay (rolling MAE/MAPE by week). In ops, monitoring is part of delivery, not an optional add-on.
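The accuracy-decay alert can be sketched as a rolling-MAE check against the backtest baseline. The window and tolerance below are illustrative assumptions; tune them to your cadence and error tolerance:

```python
# Sketch: accuracy-decay alert -- fire when the latest window's MAE
# exceeds tolerance * the backtest baseline MAE.
import numpy as np

def accuracy_decay_alert(actuals, forecasts, baseline_mae, window=7, tolerance=1.2):
    """Return True if the last `window` observations' MAE breaches the threshold."""
    errs = np.abs(np.asarray(actuals, float) - np.asarray(forecasts, float))
    if len(errs) < window:
        return False  # not enough data to judge yet
    recent_mae = errs[-window:].mean()
    return bool(recent_mae > tolerance * baseline_mae)

print(accuracy_decay_alert([100] * 7, [98] * 7, baseline_mae=2.0))  # MAE 2.0: no alert
print(accuracy_decay_alert([100] * 7, [90] * 7, baseline_mae=2.0))  # MAE 10.0: alert
```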

Section 6.6: Portfolio delivery: narrative, visuals, limitations, and next steps

Your portfolio should look like something you could send to a stakeholder and then hand off to an engineering team. Use a reusable notebook/report structure with consistent headings: Executive summary (what decision this supports), Data (sources, granularity, known gaps), Method (baselines → features → model), Results (backtest table + plots), Recommendations, Monitoring, and Handoff checklist.

Visuals that interviewers recognize immediately: (1) Time-series plot with train/test split and predicted vs actual, (2) error over time (rolling MAE) to show stability, (3) residual diagnostics (by day-of-week or by promo vs non-promo), and (4) forecast decomposition or driver chart (top features, but labeled carefully as predictive drivers, not causal truth). Add a simple baseline comparison bar chart; it signals discipline.

Limitations section (required): State what the model cannot know: unplanned promotions, black swan events, inventory constraints, late-arriving data, policy changes. Mention how you prevented leakage and what assumptions you made about feature availability at prediction time. This is often the difference between a “student project” and a professional artifact.

Monitoring + handoff checklist: Include: data freshness SLA, schema checks, missingness thresholds, outlier rules, retraining trigger (e.g., MAE worsens by 20% for 3 consecutive weeks), and a rollback plan to a baseline model. Provide a model card: target definition, horizon, granularity, training window, evaluation approach, and known failure modes.
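The example retraining trigger from the checklist (MAE worsens by 20% for 3 consecutive weeks) translates directly into a streak check; the numbers below match the chapter's example but should be tuned per model:

```python
# Sketch: retraining trigger -- fire when weekly MAE exceeds the
# reference by at least `worse_by` for `consecutive` weeks in a row.
def retraining_trigger(weekly_mae, reference_mae, worse_by=0.20, consecutive=3):
    threshold = reference_mae * (1 + worse_by)
    streak = 0
    for mae in weekly_mae:
        streak = streak + 1 if mae > threshold else 0
        if streak >= consecutive:
            return True
    return False

print(retraining_trigger([10, 13, 13, 13], reference_mae=10))  # True: 3 weeks > 12
print(retraining_trigger([10, 13, 11, 13], reference_mae=10))  # False: streak broken
```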

Milestone project: Pick one scenario and deliver a final forecast report plus a model handoff checklist. Prepare talking points: why this horizon/granularity, which baseline won and why, what features mattered, what you would monitor, and what you would do next (probabilistic forecasts, hierarchical reconciliation, or causal experiments). This is your portfolio-ready delivery.

Chapter milestones
  • Apply the template to 10 scenarios: sales, demand, staffing, churn proxy, and more
  • Create a reusable forecasting notebook/report structure for stakeholders
  • Add monitoring signals: drift, accuracy decay, and data quality alerts
  • Package a portfolio case study and interview-ready talking points
  • Milestone project: final forecast report + model handoff checklist
Chapter quiz

1. According to Chapter 6, what most distinguishes a “working model” from being trusted with forecasting in a workplace setting?

Correct answer: Delivery: business framing, stakeholder-ready artifacts, and monitoring for reliability
The chapter emphasizes that trust comes from delivery: clear framing, readable outputs, and ongoing monitoring—not just a model that runs.

2. What is the intended learning goal of applying one reusable template to 10 scenarios?

Correct answer: To learn a repeatable method that adapts to different business problems
The chapter states the goal is not memorization but a repeatable approach that works across scenarios.

3. Which set best reflects the chapter’s “repeatable way” to move from Excel habits to professional forecasting delivery?

Correct answer: Translate Excel habits into explicit baselines; declare business constraints; build a stakeholder-readable notebook/report; hand off with a checklist and monitoring signals
Chapter 6 lists these four elements as the repeatable method for reliable delivery beyond the notebook.

4. Which sequence matches the chapter’s recommended “forecasting story arc” for each scenario?

Correct answer: Problem framing → Data audit → Baselines → Feature set → Model + tuning → Backtest comparison → Decision threshold/ROI → Monitoring → Handoff
The chapter provides this story arc as a consistent structure for communicating and delivering forecasts.

5. Why does Chapter 6 emphasize adding monitoring signals like drift, accuracy decay, and data quality alerts?

Correct answer: To ensure the forecast remains reliable after it leaves the notebook and after you hand it off
Monitoring is framed as essential for keeping forecasts dependable over time and after handoff.