Distribution: Cold-Chain Prediction and Demand Forecasting

📍 Where we are: Part V · Fill-Finish & Release, Learned — Chapter 20, the last of the manufacturing spine. The previous chapter, packaging and serialization, gave every vial of BATCH-2026-001 an SGTIN that can be verified anywhere in the chain. Now the product leaves the factory entirely, and the learning problem leaves with it: keep DP-001 cold enough, route it past the risky lanes, and make the right number of doses in the first place — so the molecule that began as a target in Chapter 4 finally reaches a patient with its quality intact.

The whole genealogy — WCB-CHO-001 → SEED-001 → BATCH-2026-001 → CLAR-001 → PApool-001 → DS-001 → DP-001, from working cell bank (WCB-CHO-001) through seed (SEED-001), production batch (BATCH-2026-001), clarified harvest (CLAR-001), Protein A pool (PApool-001), drug substance (DS-001) and drug product (DP-001) — has been about building a molecule and proving it is what we say it is. None of that matters if the vial spends six hours on a hot tarmac in July. Distribution is the one stretch of the process the manufacturer does not physically control: the product is in the hands of couriers, customs, wholesalers, and pharmacies, and the only thing travelling with it is a temperature logger and a label claim that says "store at 2 to 8 degC" (2 to 8 degrees Celsius). The learning problem here is unusual for this book because it leaves the bioreactor behind and looks like a logistics problem — and yet it inherits the same discipline every prior chapter built: a deterministic, auditable core that decides, with the learned model scoring risk around it rather than in the path of release.

Two ML-shaped jobs live in the last mile, and they are as different from each other as the two halves of packaging were. The first is per-shipment: given a temperature trace that has already happened (or is happening live), how much of the product's stability budget has been spent, and has the label claim been breached? The second is per-lane and per-period: before a shipment leaves, which routes are likely to breach, and how many doses should we have made and positioned so the right product is in the right place at the right time? The first job is anchored by a fixed piece of chemistry — the Mean Kinetic Temperature — and the second is where forecasting and risk-scoring models earn their keep.

The simple version

Think of shipping ice cream across the country. You do not actually care about the average temperature of the trip — you care that it never got warm enough to melt, and a few minutes warm cost you far more than a few minutes extra-cold ever helped. A Mean Kinetic Temperature is the honest single number that captures that asymmetry: it is the temperature the trip "felt like" to the chemistry inside, weighted so the warm spells count for more. Once you can put a number on how much a trip costs the product, two more questions follow naturally: which delivery routes tend to get warm (so you pack those ones in a better cooler), and how many tubs to make and where to stock them so you are never short and never throwing melted ones away. Cold-chain ML is those three questions, in order.

What this chapter covers

The stability budget and Mean Kinetic Temperature (MKT): the Arrhenius-weighted equivalent temperature of a thermal history, why it is a fixed equation rather than a model, and how the excursion verdict rides on it.
Temperature-excursion prediction: forecasting a breach before the logger proves it, from the live trace and the shipment context.
Lane-risk scoring: a learned classifier that ranks routes, carriers, and seasons by their probability of a future breach, so scarce premium packaging and audits go where they matter.
Demand forecasting and supply-chain analytics: predicting how many doses to make and position, and why the bullwhip effect makes this its own hard problem.
The anatomy of one cold-chain shipment record, the GMP/GDP and draft Annex 22 angle that keeps the physics deterministic, and the honest limits of forecasting a chain you do not control.

The stability budget: what a shipment actually spends

Every biologic carries a shelf life earned in a formal stability study — DS-001 and DP-001 were placed on stability at the recommended 2 to 8 degC and at accelerated conditions, and the data fixed an expiry date and a label storage claim under ICH Q1A(R2), the guideline that governs stability testing of new drug substances and products — complemented for biologics by ICH Q5C, the biotechnological/biological-product stability companion [1]. That shelf life is a budget: it assumes the product stays inside its label claim. Every minute a shipment spends warmer than 8 degC draws down that budget faster than the study assumed, and — this is the subtlety — it draws down faster than the average temperature would suggest, because chemical degradation is exponential in temperature, not linear.

This is why distribution is not just "did it stay cold." A shipment that sat at 5 degC for three days and a shipment that averaged 5 degC but spiked to 15 degC for five hours have the same arithmetic mean and wildly different effects on the product. To account for that honestly you need a temperature metric that weights the warm hours the way the chemistry does — and that metric is the Mean Kinetic Temperature.

The degradation modes a warm excursion accelerates are the same CQAs (critical quality attributes) the QC chapter released the lot on: aggregation (the high-molecular-weight species size-exclusion chromatography, SEC, measures), fragmentation, charge-variant drift (the acidic and basic species cation-exchange chromatography, CEX, resolves), the subvisible particles tied to immunogenicity risk, and deamidation/oxidation of the sequence the developability chapter screened. None of these reverse when the vial cools back down — the budget only ever spends, never refills — which is why a cold dip below 2 degC does not "bank" credit against a later warm spell. That irreversibility is exactly what the Arrhenius weighting in the next section encodes: it sums the rate of an irreversible reaction over the trip, it does not average a reversible quantity.

The cold side fails by a different mechanism, and that difference matters for how the budget is accounted. Below 2 degC the risk is usually freezing, not slow chemistry: ice freeze-concentrates the protein and buffer salts, shifts pH as one phosphate salt crystallizes ahead of the other, and drives interfacial aggregation at the ice-liquid boundary. A warm-weighted metric like MKT cannot see this — a cold dip actually lowers MKT — which is why the deterministic core accounts hours-below-2 degC separately and never lets a cold excursion bank credit against the warm budget.

Mean Kinetic Temperature: the deterministic physics core

The Mean Kinetic Temperature (MKT) is the single, constant temperature that would cause the same amount of Arrhenius-driven chemical degradation as the actual, fluctuating temperature history of a shipment. It comes straight from the Arrhenius equation — reaction rate scales as k = A · exp(-Ea / (R·T)), where A is a pre-exponential constant, Ea is an activation energy, R is the gas constant, and T is absolute temperature in kelvin — and Haynes' 1971 formula inverts the time-average of that rate back into an equivalent temperature:

MKT (in kelvin) = -(Ea / R) / ln( mean over i of exp( -Ea / (R · T_i) ) )

with each T_i a temperature reading in kelvin. Read the formula from the inside out and the asymmetry is mechanical, not magical. For each reading you compute exp(-Ea / (R·T_i)), which is proportional to the instantaneous degradation rate at that temperature; you take the plain arithmetic mean of those rates over the trip; then you ask which single constant temperature would produce that mean rate, and invert the exponential to get it back as a temperature. Because exp is convex, the mean of the rates is dragged upward by the few hot readings far more than it is pulled down by the cold ones — Jensen's inequality made physical — so the equivalent temperature you recover is always at or above the arithmetic mean of the trace, and the gap grows with the size and duration of the warm excursions. That is precisely the asymmetry the budget needs, and it is why the cold dips in our example trace earn no credit.
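The asymmetry can be checked numerically on the two shipments described earlier: one flat at 5 degC, and one that averages 5 degC but spends five hours at 15 degC. What follows is a minimal standard-library sketch, not the chapter's module. The hourly traces are invented for the illustration, and Ea/R is taken as the conventional 10000 K.

```python
import math

EA_OVER_R = 10000.0   # kelvin; conventional Ea/R (83.144 kJ/mol over 8.3144e-3 kJ/(mol*K))
KELVIN = 273.15


def arithmetic_mean(values):
    return sum(values) / len(values)


def mkt_c(temps_c):
    """Haynes' MKT in degC for evenly spaced readings given in degC."""
    # Each term is proportional to the degradation rate at that reading.
    rates = [math.exp(-EA_OVER_R / (t + KELVIN)) for t in temps_c]
    # Invert the mean rate back into a single equivalent temperature.
    return -EA_OVER_R / math.log(arithmetic_mean(rates)) - KELVIN


# Hypothetical hourly traces over 72 h, both with an arithmetic mean of 5 degC.
steady = [5.0] * 72
cool = (5.0 * 72 - 15.0 * 5) / 67          # baseline that keeps the mean at 5 degC
spiked = [cool] * 67 + [15.0] * 5          # five hours at 15 degC

for name, trace in (("steady", steady), ("spiked", spiked)):
    print(f"{name}: mean={arithmetic_mean(trace):.3f} degC  MKT={mkt_c(trace):.3f} degC")
```

Both traces share a 5 degC arithmetic mean, but only the steady one has an MKT of 5 degC; the spiked trace comes out a little over half a degree higher, which is the convexity penalty the text describes.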

A second subtlety lives in the choice of Ea. The convention from the USP general chapter on MKT and ICH Q1A(R2)'s worked stability examples fixes an activation energy of Ea = 83.144 kJ/mol together with R = 8.3144 × 10⁻³ kJ/(mol·K), chosen so that Ea/R works out to exactly 10000 K; the two figures share so many digits by deliberate convention, not coincidence [1][2]. (The shipped code carries the full CODATA value R = 8.314462618 × 10⁻³ kJ/(mol·K), and the two agree to every digit MKT prints.) Pinning Ea by convention is what makes MKT reproducible across companies rather than a tuning knob: a higher Ea would make the calculation more sensitive to warm spikes (a more convex weighting) and a lower one less, so a free Ea would quietly become a parameter a vendor could fit to make a lane look safer. The reading cadence also matters: the general MKT is duration-weighted (each instantaneous rate counts in proportion to its dwell time), which collapses to the equal-weight mean only when readings are evenly spaced. A logger sampling every ten minutes, as ours does, satisfies that; an irregular log must be resampled (or dwell-weighted) before the mean is taken.
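For an irregular log, the dwell-weighted form can be sketched as follows. This is an illustration rather than part of coldchain.py: the function names and the sample log are hypothetical, and it assumes each reading holds until the next timestamp (a zero-order hold), so the final reading carries no dwell.

```python
import math

EA_OVER_R = 10000.0   # kelvin; conventional Ea/R
KELVIN = 273.15


def mkt_equal_weight_c(temps_c):
    """Equal-weight MKT (degC); only valid for evenly spaced readings."""
    rates = [math.exp(-EA_OVER_R / (t + KELVIN)) for t in temps_c]
    return -EA_OVER_R / math.log(sum(rates) / len(rates)) - KELVIN


def mkt_dwell_weighted_c(times_h, temps_c):
    """MKT (degC) with each rate weighted by the time until the next reading."""
    if len(times_h) != len(temps_c) or len(times_h) < 2:
        raise ValueError("need matching timestamps and at least two readings")
    weighted_rate, total_h = 0.0, 0.0
    for i in range(len(times_h) - 1):
        dwell = times_h[i + 1] - times_h[i]
        if dwell <= 0:
            raise ValueError("timestamps must be strictly increasing")
        weighted_rate += dwell * math.exp(-EA_OVER_R / (temps_c[i] + KELVIN))
        total_h += dwell
    return -EA_OVER_R / math.log(weighted_rate / total_h) - KELVIN


# Hypothetical log: sparse readings at 5 degC, then a burst of dense readings
# during an 18-minute warm spell at 12 degC.
times_h = [0.0, 4.0, 8.0, 8.1, 8.2, 8.3, 12.0]
temps_c = [5.0, 5.0, 12.0, 12.0, 12.0, 5.0, 5.0]

print("equal-weight MKT  :", round(mkt_equal_weight_c(temps_c), 3))
print("dwell-weighted MKT:", round(mkt_dwell_weighted_c(times_h, temps_c), 3))
```

The equal-weight figure is inflated because three of the seven readings fall inside the short warm burst; the dwell-weighted figure counts that burst as 0.3 h out of 12 h.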

The single most important property of MKT, for this book, is that it is not a machine-learning model. It is a deterministic equation with no fitted parameters: feed it a thermal history and an activation energy and it returns one number, the same number every time, derivable on paper. That makes it the physics core of the cold-chain decision, the cold-chain analogue of the deterministic GS1 rules that carried the critical decisions in the packaging chapter. The release-relevant verdict — is this shipment inside its label claim, and is its accumulated thermal stress within the qualified budget? — rides on this deterministic core, not on a learned classifier. Note that MKT is necessary but not sufficient on its own: a shipment can have an acceptable whole-trip MKT and still fail because it crossed an absolute label limit (the 2 to 8 degC band is a hard claim, not just a budget), so the verdict combines the MKT against the qualified storage MKT and the time-out-of-band excursion accounting. Both are arithmetic. The learned models in this chapter score risk; they never overrule the arithmetic that decides whether a shipment is acceptable.
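How the two arithmetic checks combine can be sketched in a few lines. This is a hypothetical illustration, not the module's code: the `Verdict` type, the `shipment_verdict` name, and the 8.0 degC MKT limit are assumptions made for the example, and a real limit would come from the product's qualified stability data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    in_label_band: bool        # never outside the hard 2 to 8 degC claim
    mkt_within_budget: bool    # whole-trip MKT at or below the qualified limit
    acceptable: bool           # both checks must pass
    reasons: tuple


def shipment_verdict(summary, mkt_limit_c=8.0):
    """Combine band accounting and MKT; mkt_limit_c is a hypothetical limit."""
    reasons = []
    if summary["hours_above_8C"] > 0:
        reasons.append(f"{summary['hours_above_8C']} h above 8 degC")
    if summary["hours_below_2C"] > 0:
        reasons.append(f"{summary['hours_below_2C']} h below 2 degC")
    in_band = not reasons
    mkt_ok = summary["mkt_c"] <= mkt_limit_c
    if not mkt_ok:
        reasons.append(f"MKT {summary['mkt_c']} degC exceeds {mkt_limit_c} degC")
    return Verdict(in_band, mkt_ok, in_band and mkt_ok, tuple(reasons))


# Field names follow the summary dict used by the chapter's excursion accounting.
spiked_trip = {"mkt_c": 5.734, "hours_above_8C": 4.17, "hours_below_2C": 0.0}
clean_trip = {"mkt_c": 5.1, "hours_above_8C": 0.0, "hours_below_2C": 0.0}

print(shipment_verdict(spiked_trip))
print(shipment_verdict(clean_trip))
```

The first trip shows why MKT alone is not sufficient: its MKT is comfortably inside the assumed limit, yet the hours above 8 degC make the verdict unacceptable.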

Evidence

MKT and excursion accounting against a label claim are (production) practice across regulated cold-chain distribution — the metric is defined in the USP general chapter on MKT (Haynes' equation, default Ea = 83.144 kJ/mol) and underpins ICH Q1A(R2) stability storage statements; the calculation itself is fixed chemistry, peer-reviewed and standardized, not a vendor model [1][2]. What is learned — excursion prediction and lane-risk scoring — sits one layer out and is (pilot) in most of the industry: cold-chain platforms (Controlant, Sensitech, Tive, and the temperature-monitoring vendors) collect the logger data at scale, but predictive lane-risk models layered on top are an applied, mostly vendor-self-reported capability rather than a settled, validated one. Treat the MKT/excursion arithmetic as production-grade and the predictive scoring as advisory.

The cold side of the budget: freeze-concentration

So far the budget has been a warm story, and that is the trap. The asymmetry the MKT lens captures is the asymmetry of warm excursions — warm is faster chemistry, and Arrhenius weighting is exactly the right tool for faster chemistry. But the cold side does not fail by slow-chemistry-run-backwards; below 0 degC it fails by a different mechanism entirely, and the MKT lens is blind to it because that lens only knows how to speed up or slow down the same reactions. A cold dip does not just earn no credit — it can be the single most damaging thing that happens to the vial, and a warm-weighted average will rate that trip better than a trip that stayed comfortably in band.

The mechanism is freeze-concentration, and it is worth understanding because it explains why "it only got a little cold" is not reassuring. As a formulation freezes, pure water is what crystallises into ice first; everything dissolved in it — the protein, the buffer salts, the surfactant, the stabilising sugar — is excluded from the growing ice and crowded into the shrinking pocket of liquid that has not yet frozen. That unfrozen phase can become dramatically more concentrated than the original formulation, and two bad things follow. The first is a pH shift: a buffer holds pH steady only while its two salt forms stay in their designed ratio, and when one form crystallises out of solution ahead of the other (the classic case is one of the two sodium phosphate salts precipitating first), the surviving liquid swings in pH — sometimes far enough to push the protein toward a region where it is unstable. The second is aggregation at the ice interface: protein molecules adsorb and partially unfold against the vast, cold ice-water surface that a freeze creates, and that interfacial stress — together with cold-denaturation, the genuine phenomenon where some proteins are less stable cold than warm — drives the formation of the same aggregates and subvisible particles the QC chapter screens for. For many mAb formulations this freeze pathway, not slow warm degradation, is the dominant cold-chain failure mode, which is exactly why the label claim has a hard floor at 2 degC rather than just an upper limit.

The data consequence is the part this chapter has to get right: a single MKT number is the wrong feature for a freeze event. MKT answers "how much faster did the warm chemistry run," and a freeze does not run that chemistry faster, so the metric that summarises a warm trip so well summarises a freeze trip away to nothing, or worse, flatters it. A model that has to tell a warm excursion apart from a freeze needs features that describe the freeze on its own terms: the excursion's minimum temperature (how far below freezing it went, since deeper freezes concentrate more), the time spent below freezing (how long the freeze-concentrated state persisted), and the freeze/thaw cycle count (because each cycle is a fresh round of interfacial stress, and repeated cycling is far worse than one long hold). These are distinct, physically meaningful inputs that collapse to a single uninformative value the moment you average them into an MKT. The deterministic core the chapter builds in coldchain.py already accounts hours-below-2 degC separately from the warm budget for exactly this reason. The mechanistic point here is that an honest cold-chain model should keep warm-excursion and freeze-excursion features distinct, with a minimum-temperature and a cycle-count feature alongside the warm-side MKT and hours-above-8 degC, rather than letting one Arrhenius-weighted number stand in for two unrelated failure mechanisms.
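The three freeze-side features can be extracted from a trace with plain counting. The sketch below is illustrative only: `freeze_features` is a hypothetical helper, and the 0.0 degC freeze point is a simplifying assumption, since a real formulation freezes somewhat lower and may supercool.

```python
def freeze_features(temps_c, minutes_per_step=10.0, freeze_point_c=0.0):
    """Freeze-side features of a trace; freeze_point_c is an assumed threshold."""
    frozen = [t < freeze_point_c for t in temps_c]
    # A cycle starts each time the trace crosses from not-frozen into frozen.
    cycles = sum(
        1 for prev, cur in zip([False] + frozen[:-1], frozen) if cur and not prev
    )
    return {
        "min_c": min(temps_c),
        "hours_below_freeze": sum(frozen) * minutes_per_step / 60.0,
        "freeze_thaw_cycles": cycles,
    }


# Hypothetical 10-minute trace with two separate dips below freezing.
trace = [5.0, 5.0, -1.0, -2.0, 5.0, 5.0, -0.5, 5.0]
print(freeze_features(trace))
```

Kept alongside the warm-side MKT and hours above 8 degC, these fields let a downstream model see a double freeze/thaw that a single averaged number would hide.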

Temperature-excursion prediction: catching the breach before the logger proves it

The simplest cold-chain ML question is reactive: a shipment arrives, you download the logger, you compute MKT and the time-above-8 degC, and you decide. That is valuable but late — the product is already at the wholesaler. The predictive version is more useful: from the live trace so far, plus the shipment's context (where it is, how many legs remain, the qualified shipper's remaining hold time, the forecast weather along the route), estimate the probability that the shipment will breach its label claim before it does, so the receiving site can intervene — expedite, reroute, or quarantine on arrival.

This is a time-series classification problem with a familiar bioprocess shape, and it shares the small-data, mostly-normal character of the serialization anomaly problem: the vast majority of shipments arrive clean, breaches are rare, and confirmed product-impacting excursions are rarer still. Two framings compete. The first is a survival / time-to-event model — a family of models that predicts when (or whether) an event happens, built to cope with records where the event has not occurred yet. The question "will this shipment breach before it arrives?" is exactly a time-to-event question with right-censoring (most shipments arrive without the event ever occurring, so their breach time is unknown rather than zero), so a Cox proportional-hazards or a discrete-time hazard model — two standard such models, where a hazard is the instantaneous chance of breaching in the next moment given the shipment has survived clean so far — fits the data's true shape and naturally handles the censoring that a plain classifier ignores. The second, and the one in wider practical use, treats it as rolling-window classification: at each logger reading you featurise the trace-so-far and ask a binary classifier for the breach probability over the remaining horizon.
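The discrete-time hazard framing can be made concrete with a person-period table: one row per shipment per leg survived, with censored shipments contributing rows but no event. The sketch below uses five invented shipments and a plain life-table estimate with no covariates; in practice the same rows would feed a logistic model of the hazard on shipment features.

```python
def person_period_rows(shipments):
    """Expand (id, n_legs, breach_leg or None) into one row per leg at risk."""
    rows = []
    for ship_id, n_legs, breach_leg in shipments:
        # A breached shipment leaves the risk set at its breach leg;
        # a clean one is right-censored at arrival.
        last_leg = breach_leg if breach_leg is not None else n_legs
        for leg in range(1, last_leg + 1):
            rows.append((ship_id, leg, int(leg == breach_leg)))
    return rows


def hazard_by_leg(rows):
    """Life-table hazard: breaches on a leg divided by shipments still at risk."""
    at_risk, events = {}, {}
    for _, leg, event in rows:
        at_risk[leg] = at_risk.get(leg, 0) + 1
        events[leg] = events.get(leg, 0) + event
    return {leg: events[leg] / at_risk[leg] for leg in sorted(at_risk)}


# Hypothetical history: None means the shipment arrived without breaching.
shipments = [("S1", 4, None), ("S2", 4, 3), ("S3", 3, None),
             ("S4", 4, None), ("S5", 4, 3)]

rows = person_period_rows(shipments)
hazard = hazard_by_leg(rows)

survival = 1.0
for leg, h in hazard.items():
    survival *= 1.0 - h
    print(f"leg {leg}: hazard={h:.2f}  P(still clean)={survival:.2f}")
```

The censored shipments still count in the denominator for every leg they completed, which is the information a plain breached-or-not classifier throws away.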

The features that matter are not exotic. The strongest is the qualified packaging's remaining thermal hold time: a qualified shipper is rated to hold 2 to 8 degC for, say, 96 hours, and how much of that budget is left after the legs already flown is the single dominant predictor — it is the physical mechanism (once the phase-change coolant is exhausted, the payload drifts toward ambient at a rate set by insulation and thermal mass, with shippers qualified against seasonal profiles). Around it sit the trace's recent slope and rolling variance (a rising slope is the early signature of a coolant nearing exhaustion), the running MKT and time-near-band so far, the number and type of remaining handoffs (tarmac dwell and customs holds are where breaches concentrate), and ambient/weather along the forecast route. Because the positive class — here a breach — is tiny, the same imbalance toolkit the rest of the book leans on applies — class weighting, threshold tuning against a recall target rather than accuracy, and reporting precision–recall (AUPRC, which stays honest when breaches are rare) rather than a flattering ROC-AUC (which can look excellent simply because almost every shipment is correctly called a non-breach), just as the release-prediction model did. One leakage trap is worth naming, because it is the easy way to build a model that looks brilliant and predicts nothing: every feature at reading t must be computed from the trace up to t only — trailing windows, the running (not whole-trip) MKT, the slope of the last hour — and the label must be the breach status over the forward horizon. A centered rolling statistic or the full-trip MKT silently feeds the model the future it is supposed to forecast, and the held-out score flatters a model that has simply been shown the answer. 
To make it concrete (illustratively — this describes our synthesised trace, not a real model run): on our own 72-hour trace the breach does not show in the logger until the tarmac leg at hour ~40, but a rolling-window classifier watching the rising slope and the dwindling shipper hold-time margin could have flagged the rising probability a leg earlier — buying exactly the intervention window the reactive download cannot. The model is advisory: a high predicted-breach probability triggers a human decision and a closer watch, exactly as the FDA's 2023 Artificial Intelligence in Drug Manufacturing discussion paper and the draft EU/PIC/S GMP Annex 22 expect of ML that touches a quality-relevant outcome [3][4].
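The leakage rule is easy to state in code: features at reading t slice the trace up to t, and the label slices strictly after it. The sketch below is a minimal illustration with hypothetical function names and an invented trace; it labels only warm breaches (above 8 degC) and takes Ea/R as the conventional 10000 K.

```python
import math

EA_OVER_R = 10000.0   # kelvin; conventional Ea/R
KELVIN = 273.15
LABEL_HIGH_C = 8.0


def running_mkt_c(temps_c):
    """MKT (degC) of the readings seen so far, assuming even spacing."""
    rates = [math.exp(-EA_OVER_R / (t + KELVIN)) for t in temps_c]
    return -EA_OVER_R / math.log(sum(rates) / len(rates)) - KELVIN


def features_at(temps_c, t, window=6):
    """Trailing features at reading t, computed from temps_c[0..t] only."""
    past = temps_c[: t + 1]
    recent = past[-window:]
    mean_recent = sum(recent) / len(recent)
    return {
        "last_c": past[-1],
        "slope_c_per_step": (recent[-1] - recent[0]) / max(len(recent) - 1, 1),
        "rolling_var": sum((x - mean_recent) ** 2 for x in recent) / len(recent),
        "running_mkt_c": running_mkt_c(past),
    }


def breach_ahead(temps_c, t, horizon=12):
    """Label: 1 if any reading in the next `horizon` steps exceeds 8 degC."""
    future = temps_c[t + 1 : t + 1 + horizon]
    return int(any(x > LABEL_HIGH_C for x in future))


# Hypothetical trace: flat at 5 degC, then a slow climb through the band edge.
trace = [5.0] * 20 + [5.5, 6.0, 6.5, 7.0, 7.5, 8.5, 9.0, 9.5]
tampered = trace[:23] + [20.0] * 5   # identical up to t=22, different future

print(features_at(trace, 22), breach_ahead(trace, 22, horizon=6))
```

Rewriting the future, as `tampered` does, must leave the features at t=22 unchanged; if it does not, some feature is reading ahead.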

Lane-risk scoring: where the learned model actually pays off

The highest-value learned layer in distribution is not per-shipment at all — it is per-lane. A lane is a route-plus-mode-plus-carrier combination (Frankfurt to São Paulo, air freight, carrier X, summer), and across a year a manufacturer ships thousands of times over a few hundred lanes. Some lanes breach far more often than others — long-haul air with multiple customs handoffs in a hot season, short-margin packaging, an unreliable ground leg at the destination. If you can score each lane by its probability of a future breach before you ship, you can do something genuinely cost-effective: send the expensive active-cooled containers and the redundant data-loggers down the risky lanes, and ship the safe lanes in cheaper passive packaging. Scarce mitigation goes where the risk is.

This is a supervised classification problem with real labels — a past shipment either breached its label claim or it did not — so unlike the serialization anomaly case it does not have to be unsupervised. The features describe the lane and the shipment plan: total distance, number of handoffs, season, the lane's historical excursion rate, and the qualified shipper's hold-time margin against the planned transit time. A gradient-boosted tree ensemble — many small decision trees (each a cascade of yes/no questions on the features), trained in sequence so that each new tree corrects the errors the ones before it made — is a good fit here for the same reasons it dominates tabular risk-scoring everywhere — it handles mixed feature scales and non-linear interactions (a long distance is only dangerous when the hold margin is also tight) without much tuning, and it yields feature importances a logistics team can read. The output is a probability, ranked, and the model is again advisory: it prioritises which lanes get an upgraded shipper or a data-logger audit, it never decides that a specific shipment is safe to release.

Three details separate a credible lane-risk model from a misleading one. First, the label must be the deterministic verdict, not a guess: each historical shipment is labelled breach-or-not by the same MKT/excursion arithmetic the physics core computes, so the learned layer is trained on the auditable truth, never on a second model's opinion. Second, validation has to respect time and lane: a random train/test split leaks, because two shipments on the same lane in the same week are near-duplicates and a deployed model never gets to peek at later seasons. The honest evaluation is a forward (time-blocked) or leave-one-lane-out split, scored on the rare positive class with AUPRC alongside ROC-AUC. Third, calibration matters more than ranking once the score drives spend. A calibrated probability means what it says: a 0.7 score should correspond to a breach really happening about 70% of the time, not merely rank above a 0.6 lane. If you upgrade every lane scoring above some breach probability, that threshold is a money decision, so the probabilities should be calibrated using isotonic or Platt scaling (two standard recalibration methods) on a held-out fold. The threshold itself is then set against the cost asymmetry: a missed breach ruins a batch of doses, a false alarm wastes one premium shipper.
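The second and third details can be sketched together with scikit-learn. Everything below is illustrative: the lane history is synthetic, the logit coefficients and the two cost figures are invented, and the threshold rule assumes an upgraded shipper fully prevents the breach.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(2026)
n = 2000

# SYNTHETIC lane history ordered by ship day (not real shipment data).
ship_day = np.sort(rng.integers(0, 730, n))
n_handoffs = rng.integers(1, 7, n)
hold_margin_h = rng.uniform(0.0, 48.0, n)
summer = ((ship_day % 365 >= 150) & (ship_day % 365 < 240)).astype(float)
logit = -2.5 + 0.5 * n_handoffs - 0.08 * hold_margin_h + 1.0 * summer
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
X = np.column_stack([n_handoffs, hold_margin_h, summer])

# Forward (time-blocked) split: fit on the earliest 60%, calibrate on the
# next 20%, test on the latest 20%. No shuffling, so no peeking ahead.
i_fit, i_cal = int(0.6 * n), int(0.8 * n)
fit, cal, test = slice(0, i_fit), slice(i_fit, i_cal), slice(i_cal, n)

clf = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                 learning_rate=0.05, random_state=0)
clf.fit(X[fit], y[fit])

# Isotonic recalibration learned on the held-out calibration fold only.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(clf.predict_proba(X[cal])[:, 1], y[cal])
p_test = iso.predict(clf.predict_proba(X[test])[:, 1])

auprc = average_precision_score(y[test], p_test)
auc = roc_auc_score(y[test], p_test)

# Hypothetical costs: upgrade when expected breach loss exceeds upgrade cost.
COST_UPGRADE, COST_MISSED_BREACH = 2_000.0, 100_000.0
threshold = COST_UPGRADE / COST_MISSED_BREACH
upgrade = p_test > threshold

print(f"AUPRC={auprc:.3f} (base rate {y[test].mean():.3f})  ROC-AUC={auc:.3f}")
print(f"threshold={threshold:.3f}  lanes flagged for upgrade: {int(upgrade.sum())}")
```

Because the threshold is a ratio of costs, it only means something if the probabilities are calibrated; a model that merely ranks well would flag the wrong number of lanes.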

The last mile, learned: a deterministic Mean Kinetic Temperature core (cyan) accounts every shipment trace against the 2-to-8 degC label claim and decides the in-claim verdict, while a learned layer (violet) scores risk around it — predicting a breach before the logger proves it and ranking lanes so scarce premium packaging goes where the risk is — with every learned output advisory and a human deciding. Original diagram by the authors, created with AI assistance.

A runnable model: coldchain.py

The example module examples/platform/ml/coldchain.py builds both layers and keeps them deliberately separated. The deterministic core computes MKT and excursion accounting on a shipment trace; the learned layer trains a gradient-boosted lane-risk classifier. Because the series' simulator does not model logistics — there is no committed shipping or weather dataset — the shipment trace is synthesised and the lane history is synthetic and clearly labelled illustrative; but the running example's identity is kept (the product is mAb-A, the lot is DP-001 from BATCH-2026-001, the label claim is 2 to 8 degC), and the MKT arithmetic itself is exact chemistry, not a model. It runs standalone with no services.

The physics core is twelve lines and reads as the equation. mean_kinetic_temperature lifts each Celsius reading into kelvin, takes the plain mean of exp(-Ea/(R·T)) over the trace, and inverts it through -(Ea/R)/ln(...) — Haynes' formula, no fitted parameters. excursion_summary then accounts the trace against the hard 2 to 8 degC band: it counts the readings above 8 degC and below 2 degC, scales the counts by the minutes_per_step cadence into hours-out-of-band, and returns the mean, max, min, MKT, and a boolean excursion flag — the exact field set the anatomy card lays out below.

# examples/platform/ml/coldchain.py (excerpt)
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

R_GAS = 8.314462618e-3          # gas constant, kJ/(mol*K)
EA_DEFAULT = 83.144            # activation energy, kJ/mol (ICH Q1A worked example)
KELVIN = 273.15
LABEL_LOW_C, LABEL_HIGH_C = 2.0, 8.0   # DP-001 label claim: store 2-8 degC


def mean_kinetic_temperature(temps_c: np.ndarray, ea: float = EA_DEFAULT) -> float:
    """Haynes' MKT: the single equivalent temperature (degC) of a thermal history.

    MKT_K = -(Ea/R) / ln( mean_i exp(-Ea/(R*T_i)) ), T_i in kelvin. Always >= the
    arithmetic mean because Arrhenius weighting penalises the warm excursions.
    """
    t_k = np.asarray(temps_c, float) + KELVIN
    weighted = np.mean(np.exp(-ea / (R_GAS * t_k)))
    mkt_k = -(ea / R_GAS) / np.log(weighted)
    return float(mkt_k - KELVIN)


def excursion_summary(temps_c, minutes_per_step=10.0, ea=EA_DEFAULT) -> dict:
    """Account a shipment trace against the 2-8 degC label claim — deterministic."""
    t = np.asarray(temps_c, float)
    step_h = minutes_per_step / 60.0
    warm, cold = t > LABEL_HIGH_C, t < LABEL_LOW_C
    return {"mean_c": round(float(t.mean()), 3), "max_c": round(float(t.max()), 3),
            "min_c": round(float(t.min()), 3),
            "mkt_c": round(mean_kinetic_temperature(t, ea), 3),
            "hours_above_8C": round(float(warm.sum() * step_h), 2),
            "hours_below_2C": round(float(cold.sum() * step_h), 2),
            "excursion": bool(warm.any() or cold.any())}


def train_lane_risk(seed: int = 2026) -> dict:
    """Gradient-boosted breach-probability model over a synthetic lane history."""
    # synth_lane_history is defined elsewhere in coldchain.py (not shown in this excerpt)
    X, y = synth_lane_history()                       # ILLUSTRATIVE: no logistics data ships
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3,
                                          random_state=seed, stratify=y)
    clf = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                     learning_rate=0.05, random_state=seed).fit(Xtr, ytr)
    auc = roc_auc_score(yte, clf.predict_proba(Xte)[:, 1])
    return {"clf": clf, "auc": round(float(auc), 3)}

The learned layer is honest about being synthetic. synth_lane_history generates each past shipment's breach label from a plausible logistic relationship — more legs, summer, a high historical rate, and a low packaging margin all raise the breach odds — plus Gaussian noise, so the gradient-boosted classifier has a real signal to recover while the data stays unmistakably illustrative. train_lane_risk does a stratified split (preserving the breach ratio across train and test) — leak-free here only because synth_lane_history draws every row independently, with no lane-grouping and no time axis to leak across; a real lane history, where two shipments on one lane in one week are near-duplicates, would instead need the time-blocked or leave-one-lane-out split the section above insists on. It fits 200 shallow trees at a 0.05 learning rate and reports the held-out ROC-AUC and the feature importances; score_lane then asks the fitted model for the breach probability of a future lane plan. The acceptance gate is an assert that the model clears ROC-AUC > 0.8, that the synthesised trace flags its warm excursion, that MKT exceeds the arithmetic mean, and that the hot/long lane scores riskier than the cool/short one — necessary evidence, not GMP sufficiency.

Running python platform/ml/coldchain.py prints the following. The MKT and excursion arithmetic on the trace is exact (deterministic given the synthesised trace); the lane-risk ROC-AUC and the two scored lanes are deterministic given the seed but rest on a synthetic, illustrative lane history, not real shipment outcomes:

cold-chain stability budget for DP-001 (label claim 2-8 degC):
  trace: 432 points over 72 h  mean=5.466 degC  max=14.999 degC  min=3.703 degC
  arithmetic mean 5.466 degC  <  MKT 5.734 degC (Arrhenius weights the warm hours harder)
  time above 8 degC = 4.17 h ; below 2 degC = 0.0 h ; excursion flagged = True

lane-risk classifier (ILLUSTRATIVE synthetic lane history):
  GradientBoosting on 840 train / 360 test lanes (base breach rate 0.347)  ->  held-out ROC-AUC = 0.807
  feature importances: {'distance_km': 0.274, 'n_handoffs': 0.134, 'summer': 0.058, 'hist_excursion_rate': 0.191, 'shipper_hold_margin_h': 0.343}
  scored lane A (600 km, 2 legs, winter, spare shipper): breach prob = 0.020  -> ship as-is
  scored lane B (8200 km, 6 legs, summer, tight shipper): breach prob = 0.971  -> upgrade shipper / add logger

ASSERT ok: MKT > mean, the warm excursion is flagged, and the hot/long lane scores riskier (illustrative).

Read this output the way a cold-chain quality lead would. The deterministic core does the load-bearing work: the trace averaged 5.466 degC and would have passed a naive average-temperature check, but it spiked to nearly 15 degC and spent 4.17 hours above 8 degC, so the excursion is flagged and the MKT of 5.734 degC sits above the arithmetic mean, the Arrhenius weighting making the warm hours cost more, exactly as the chemistry demands. That 0.268 degC gap between mean and MKT is small here only because the excursion was brief; stretch the warm spell and the gap widens, which is the entire reason MKT exists, and it is computed, not learned. Only then does the advisory layer run: the lane-risk model recovers a real signal from the synthetic history (ROC-AUC = 0.807 against a 0.347 base breach rate), and its feature importances put the qualified shipper's hold-time margin (0.343) and distance (0.274), the two physically obvious drivers, at the top, while summer shows only 0.058. But read that number carefully: the synthetic season effect is real and additive (it is one of the larger terms in the data-generating logit), and its small importance here is mostly an artefact of impurity-based importance, which under-credits a binary feature. A tree splits the data by asking yes/no questions, and along any one path it can ask the single summer? question only once, while it can keep re-asking about a continuous feature like distance or hold margin at many different cut-points. The honest readout of which driver matters how much is a permutation-importance or SHAP analysis on a held-out fold, not the impurity-based importances printed here (strictly not Gini importances, since gradient boosting fits regression trees); we show them only because they are what the fitted estimator exposes for free. The two scored lanes show the payoff: a short winter lane with spare packaging scores a 0.020 breach probability and ships as-is, while a long-haul summer lane with six handoffs and a tight shipper scores 0.971 and gets an upgraded container and a logger. 
Rules-and-physics decide; the model prioritises where to spend mitigation. One honest caveat about this toy run: the synthetic base breach rate is 0.347, far higher than a real chain's, so ROC-AUC is informative here and the module reports it alone. On a real lane history — where breaches are rare — you would add AUPRC and calibrate the probabilities (isotonic on a held-out fold) before letting the score drive packaging spend, exactly as the lane-risk section above prescribes.
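The held-out permutation-importance readout recommended above can be sketched with scikit-learn's `permutation_importance`. The data here is a fresh synthetic stand-in, not the module's `synth_lane_history`; the feature names echo the printed output, but the coefficients are invented for the illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2026)
n = 1200
names = ["distance_km", "shipper_hold_margin_h", "summer"]

# SYNTHETIC rows drawn independently, so a stratified random split is leak-free.
distance_km = rng.uniform(200.0, 9000.0, n)
hold_margin_h = rng.uniform(0.0, 48.0, n)
summer = rng.integers(0, 2, n).astype(float)          # binary feature
logit = -1.0 + 0.0003 * distance_km - 0.08 * hold_margin_h + 1.5 * summer
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
X = np.column_stack([distance_km, hold_margin_h, summer])

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3,
                                      random_state=0, stratify=y)
clf = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                 learning_rate=0.05, random_state=0).fit(Xtr, ytr)

# Shuffle one held-out column at a time and measure the drop in ROC-AUC.
perm = permutation_importance(clf, Xte, yte, scoring="roc_auc",
                              n_repeats=20, random_state=0)

impurity = dict(zip(names, clf.feature_importances_.round(3)))
permuted = dict(zip(names, perm.importances_mean.round(3)))
print("impurity-based:", impurity)
print("permutation   :", permuted)
```

Comparing the two dictionaries side by side shows how much of a binary feature's apparent weakness comes from the importance measure rather than from the data.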

Demand forecasting: making the right number of doses

Cold-chain risk is the downstream half of distribution. The upstream half — and the one with the largest financial stake — is demand forecasting: predicting how many doses of mAb-A to make and where to position them, far enough ahead that the long biomanufacturing lead time (a batch is months from seed to release) can actually meet the demand. Forecast too low and patients go without; forecast too high and product expires unused, which for a 2-to-8 degC biologic with a finite shelf life is pure write-off.

The forecasting problem has a particular structure that shapes the model choice. Pharmaceutical demand is a mix of a slow trend (a drug's adoption curve), seasonality (some therapies are seasonal; manufacturing and ordering have their own calendar rhythms), and sharp, hard-to-predict shocks (a competitor withdrawal, a guideline change, a pandemic). Classical statistical forecasters (exponential smoothing, ARIMA, and their seasonal variants such as SARIMA and ETS) remain strong baselines: the simple, well-understood methods any fancier model must out-predict to justify itself. They are often hard to beat for a single, stable product line, which is why any honest forecasting comparison reports them as the bar to clear. Machine-learning forecasters earn their place when there are many related series to learn across (every product, every region, every distribution centre). Gradient-boosted regressors on lag-and-calendar features and, increasingly, global deep sequence models (a single network trained jointly over all series, in the DeepAR/N-BEATS/temporal-fusion-transformer lineage) can borrow strength across series in a way a per-series ARIMA cannot, letting a thin or new series lean on the patterns learned from related, data-rich ones. They can also emit a probabilistic forecast: quantiles rather than a single point, that is, a range with probabilities attached (such as a 90% chance demand stays below some level) instead of one best-guess number. That is what an inventory policy actually needs to set a safety stock.

How such a forecast is scored matters as much as the model. A probabilistic forecast is evaluated with a quantile (pinball) loss, not RMSE, because the cost of being wrong is asymmetric and lives in the tails an inventory policy reads. It is backtested with a rolling (expanding-window) origin rather than one holdout, so the evaluation respects time. And it is reported scale-free, for instance as MASE against a seasonal-naive baseline, so "good" means "beats the naive forecaster," the only honest bar across products of very different volume.
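
The three scoring ideas named here (pinball loss, rolling-origin backtesting, and MASE against a seasonal-naive baseline) are each a few lines of arithmetic. The sketch below is a minimal pure-Python illustration; the function names and the default season length are assumptions made for the example, not part of the book's example suite.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss of forecasts y_pred at quantile level q."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        diff = y - f
        # Under-forecasts cost q per unit; over-forecasts cost (1 - q) per unit.
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def mase(y_train, y_test, y_pred, season=12):
    """Mean absolute scaled error: test MAE divided by in-sample seasonal-naive MAE."""
    naive_errors = [abs(y_train[i] - y_train[i - season])
                    for i in range(season, len(y_train))]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(y - f) for y, f in zip(y_test, y_pred)) / len(y_test)
    return mae / scale


def rolling_origins(n_obs, min_train, horizon):
    """Yield (train_end, test_end) index pairs for an expanding-window backtest."""
    for train_end in range(min_train, n_obs - horizon + 1):
        yield train_end, train_end + horizon


if __name__ == "__main__":
    # At q = 0.9, being 2 doses short costs nine times more than being 2 doses over.
    print(pinball_loss([10], [8], 0.9), pinball_loss([10], [12], 0.9))
    print(list(rolling_origins(10, 6, 2)))
```

A MASE below 1 means the forecaster beat the seasonal-naive baseline on that series; each rolling origin trains on observations before train_end and scores only on the horizon that follows, so no future value leaks into training.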

The hard part is not the algorithm; it is the bullwhip effect. Small fluctuations in patient-level demand amplify into large swings in factory orders as each tier of the supply chain (pharmacy, wholesaler, distributor) adds its own safety stock and reordering logic. A forecast that fits the factory order history is fitting the bullwhip, not the underlying demand, and it will overreact. The supply-chain analytics that work pull the forecast back toward true consumption signals — point-of-dispense data, prescription trends — and treat the order stream as a distorted observation of demand rather than demand itself. The forecast is also not the end product: it feeds an inventory policy (a base-stock or newsvendor calculation that weighs the cost of a stockout against the cost of expiry), and for a short-shelf-life biologic that policy must respect the remaining shelf life of stock already positioned — the same stability budget the MKT core tracks, now used forward. This is where ML supply-chain platforms concentrate their effort, and where the honest evidence is thinnest.
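
To make the newsvendor step concrete, the sketch below turns a probabilistic forecast into a stocking target using the critical fractile, then nets out positioned stock that still has enough shelf life. It is a single-period illustration under loud assumptions: demand is taken as normally distributed, the cost figures are placeholders, and the function names are hypothetical.

```python
from statistics import NormalDist


def critical_fractile(cost_stockout, cost_expiry):
    """Service level that balances the cost of one missing dose against one expired dose."""
    return cost_stockout / (cost_stockout + cost_expiry)


def newsvendor_doses(mean_demand, sd_demand, cost_stockout, cost_expiry):
    """Stocking target: the demand quantile at the critical fractile (normal demand assumed)."""
    q = critical_fractile(cost_stockout, cost_expiry)
    return NormalDist(mean_demand, sd_demand).inv_cdf(q)


def doses_to_make(target, positioned, coverage_days):
    """Net out positioned stock, counting only lots that outlive the coverage period.

    positioned is a list of (quantity, shelf_life_days_left) pairs.
    """
    usable = sum(qty for qty, days_left in positioned if days_left >= coverage_days)
    return max(0.0, target - usable)


if __name__ == "__main__":
    # Placeholder costs: a stockout is weighted nine times an expiry write-off.
    target = newsvendor_doses(1000.0, 100.0, cost_stockout=9.0, cost_expiry=1.0)
    print(round(target), doses_to_make(target, [(200, 90), (300, 10)], coverage_days=30))
```

The asymmetry does the work: when a stockout is weighted more heavily than a write-off, the target moves above the mean forecast, and stock that will expire inside the coverage period does not count toward it. A real policy would be multi-period and would read the quantile from the forecaster directly instead of assuming a normal distribution.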

Evidence

Named demand-forecasting and supply-chain ML deployments are real but their headline numbers are single-company self-reported and must be labelled as such. Sanofi's plai supply-chain platform (built with Aily Labs) is reported at roughly 80% accuracy for predicting low-inventory positions and around 65% for tracing a risk to its root cause — but the 80% figure traces to a June 2023 release and the 65% to a separate undated corporate page, and neither is independently verified (self-reported) [5]. Comparable supply-chain "control tower" programs are run by Merck KGaA (Darmstadt) with Aera Technology and Merck KGaA with Palantir (vendor/self-reported; note the deployment is Merck KGaA's, not Merck & Co.'s) [5]. A 2024 survey found roughly 65% of pharma supply-chain leaders have limited confidence in AI for disruption prediction — the honest counterweight to the vendor headlines [5]. Treat demand-forecasting ML as a real, value-generating (production) capability and every specific percentage as illustrative/self-reported, never as established fact.

Anatomy of one cold-chain shipment record

A cold-chain shipment, like every artifact in this series, is not a bare "arrived OK" — it is a structured record that ties the lot to its full thermal history, the deterministic stability verdict, the advisory risk scores, and the lineage back to the batch and forward to the dispensing site. Dissect one DP-001 shipment the way a distribution-quality reviewer would.

One cold-chain shipment, fully unpacked. The card shows the label claim and the ICH Q1A shelf life it must protect; the deterministic stability core (cyan), with the full trace, mean, MKT, hours out of band, and the in-claim verdict that is physics, not ML; the advisory learned layer (violet), with the predicted breach probability and lane-risk score, marked illustrative; the qualified packaging and its remaining hold margin; the chain-of-custody handoffs; the deterministic integrity verdicts; and the lineage tying the shipment to DP-001 and forward to the pharmacy where the patient finally receives the dose. Original diagram by the authors, created with AI assistance.

Read the card field by field, top to bottom, and the chapter is laid out as a record.

Header — shipment identity. A shipment_id, the lot (DP-001) and parent BATCH-2026-001, the lane (Frankfurt to São Paulo, air freight, carrier, season), and the planned versus actual transit times. This is the join key: the lane string is exactly what the lane-risk model is keyed on, and the lot is what carries the genealogy.
Label-claim block — the budget every later field is measured against. store 2 to 8 degC, the ICH Q1A(R2) shelf life and the resulting expiry date [1]. Everything below is scored against this claim; the claim is the contract, not a target.
Deterministic stability core (cyan, marked deterministic) — the heart of the card. A reference to the full temperature trace (every logger reading, e.g. 432 points at 10-minute cadence), the arithmetic mean_c, the max_c and min_c, the mkt_c, the hours_above_8C and hours_below_2C, and the in-claim-or-out excursion verdict — each marked physics, not ML, because this is the critical-decision logic a regulator reads and it must be reproducible on paper. These are the precise fields excursion_summary emits.
Learned layer (violet, marked advisory and illustrative). The predicted breach probability for this shipment (from the excursion-prediction model, had it run live) and the lane-risk score for this lane with its top feature contributions (hold-time margin, distance). Useful for triage, never the decider — and stamped illustrative because the underlying lane history is synthetic.
Qualified-packaging block — the field the lane-risk model leans on most. The shipper type (passive vs active-cooled), its validated rated_hold_time_h, and the remaining hold-time margin against the planned transit. This is shipper_hold_margin_h, the top-importance feature in the run output, and the physical mechanism behind it: once the coolant is spent, the payload tracks ambient.
Chain-of-custody block. The ordered handoffs (origin dock → air → customs → ground → pharmacy) with timestamps and dwell at each — the same n_handoffs the model counts, and the place tarmac and customs delays show up.
Integrity block (marked deterministic). The pass/fail verdicts that gate the shipment: within_label_claim, within_qualified_hold_time, and logger_continuous (no gaps — a silent logger dropout is itself a failure, because an unobserved leg cannot be cleared). Each is a rule, not a score.
Relationships panel (violet) — the lineage. This shipment derivedFrom DP-001, is reported-to the temperature-monitoring platform (Controlant/Sensitech/Tive class), and is verified-at the dispensing pharmacy, the final node in the running example's genealogy, where the molecule that began as a target reaches a patient. The lineage is modeled as a graph in Book 4's genealogy spine, where this same derivedFrom edge roots every node back to the working cell bank.
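
The card can be sketched as a typed record in which the deterministic verdicts are derived properties and the learned score is carried but never consulted by the gate. This is an illustrative layout, not the schema the series ships: the class names, the lane string format, the shipment identifier, and the rule that "in claim" means zero hours out of band are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StabilityCore:
    """Deterministic fields computed from the logger trace (physics, not ML)."""
    mean_c: float
    max_c: float
    min_c: float
    mkt_c: float
    hours_above_8C: float
    hours_below_2C: float


@dataclass(frozen=True)
class ShipmentRecord:
    shipment_id: str
    lot: str
    lane: str
    core: StabilityCore
    rated_hold_time_h: float
    actual_transit_h: float
    n_handoffs: int
    logger_continuous: bool
    breach_probability: float  # advisory score, never read by the gate below

    @property
    def within_label_claim(self) -> bool:
        # Assumption: in claim means zero hours outside 2 to 8 degC.
        return self.core.hours_above_8C == 0 and self.core.hours_below_2C == 0

    @property
    def within_qualified_hold_time(self) -> bool:
        return self.actual_transit_h <= self.rated_hold_time_h

    def integrity_pass(self) -> bool:
        """All three deterministic verdicts must hold; the learned score is ignored."""
        return (self.within_label_claim
                and self.within_qualified_hold_time
                and self.logger_continuous)


record = ShipmentRecord(
    shipment_id="SHIP-EXAMPLE-001",   # hypothetical identifier
    lot="DP-001",
    lane="FRA-GRU|air|summer",        # hypothetical lane string format
    core=StabilityCore(mean_c=5.466, max_c=15.0, min_c=4.0,  # max_c, min_c are placeholders
                       mkt_c=5.734, hours_above_8C=4.17, hours_below_2C=0.0),
    rated_hold_time_h=96.0, actual_transit_h=72.0, n_handoffs=5,
    logger_continuous=True, breach_probability=0.02,
)
print(record.within_label_claim, record.integrity_pass())
```

Keeping breach_probability out of integrity_pass is the design point: a low advisory score cannot rescue a shipment that spent hours above 8 degC, and a high one cannot fail a shipment the deterministic checks clear.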

From record to graph: the semantics that make the training set trustworthy

A field-by-field card is human-readable, but the lane-risk model does not learn from a card. It learns from a table, and a table is exactly where the meaning of those fields quietly leaks away. The shipment record above is, in plant reality, assembled from systems that disagree on how to name the very same fact: the temperature-monitoring platform names the trace one way, the ERP names the lot another, the warehouse-management system names the lane a third. This is the same semantic-interoperability gap (different systems recording one physical fact in irreconcilable ways) that the data book anchors on a bioreactor temperature. The disciplined fix is the one the rest of this series uses: model the record against a shared vocabulary rather than a fragile column layout, so a feature is pulled by what it means, not by where it happens to sit in a CSV.

Three concrete benefits follow, and each closes a hole that a column-name-driven pipeline would otherwise leave open.

Features keyed by meaning, not by column name. The lane-risk model's strongest feature, shipper_hold_margin_h, is only as trustworthy as the guarantee that every row's value really is the remaining qualified hold time in hours and not, say, the rated hold time pulled from the wrong column after a vendor changed an export header. When the feature is bound to an ontology term — a property IRI with a fixed rdfs:range and a QUDT unit, hour, declared once — the training pipeline reads it by that IRI, and a units mismatch or a renamed column surfaces as a validation failure instead of a silently corrupted feature. This is the same affectsQuality/datatype-property discipline Book 4 builds for release attributes: a value you read is typed and unit-bearing, an edge you walk is typed at both ends.
The training data validated by the same gate as the release data. The deterministic excursion verdict is already a closed-world check — is the monomer-style required field present, singular, and in range? — and that is exactly the shape of a SHACL (Shapes Constraint Language) node shape, the release-gate mechanism Book 4 builds for the CQA panel. Running the same shape over the training set, not just over the live shipment, is what guarantees the model is trained on complete, in-range inputs: a shipment whose trace reference is missing, whose hours_above_8C is absent, or whose lane is untyped fails the shape and is excluded from training rather than feeding the model a half-observed row. The model inherits the data-completeness guarantee for free.
Lineage as the grouping key for honest validation. The section above insisted that a credible lane-risk model is validated leave-one-lane-out, never on a random split, because two shipments on one lane in one week are near-duplicates. That grouping key is a graph fact: the derivedFrom/reported-to/verified-at edges of the relationships panel are PROV-O-style provenance (the W3C provenance vocabulary, the bp:derivedFrom spine Book 4 roots every lot on), and grouping the cross-validation folds by the lineage edge — leave-one-lane-out, or leave-one-batch-out for a release model — is the difference between a leakage-free score and a flattering one. The ontology that makes the digital thread queryable is the same structure that makes the model's evaluation honest.
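
The second and third benefits can be sketched together: filter the training rows through a closed-world completeness check, then build folds grouped by lane. The code below is a plain-Python stand-in, not SHACL and not an RDF library; passes_shape only mimics what a node shape would enforce, and the row keys and lane labels are hypothetical.

```python
def passes_shape(row):
    """Closed-world check: required fields present, typed, and in range."""
    lane = row.get("lane")
    trace_ref = row.get("trace_ref")
    hours = row.get("hours_above_8C")
    margin = row.get("shipper_hold_margin_h")
    return (
        isinstance(lane, str) and lane != ""
        and isinstance(trace_ref, str) and trace_ref != ""
        and isinstance(hours, (int, float)) and hours >= 0
        and isinstance(margin, (int, float))
    )


def leave_one_lane_out(rows):
    """Yield (held_out_lane, train_rows, test_rows), one fold per lane."""
    lanes = sorted({r["lane"] for r in rows})
    for held_out in lanes:
        train = [r for r in rows if r["lane"] != held_out]
        test = [r for r in rows if r["lane"] == held_out]
        yield held_out, train, test


raw_rows = [
    {"lane": "FRA-GRU", "trace_ref": "t1", "hours_above_8C": 4.17, "shipper_hold_margin_h": 6.0},
    {"lane": "FRA-GRU", "trace_ref": "t2", "hours_above_8C": 0.0, "shipper_hold_margin_h": 8.0},
    {"lane": "FRA-JFK", "trace_ref": "t3", "hours_above_8C": 0.0, "shipper_hold_margin_h": 30.0},
    {"lane": "FRA-JFK", "trace_ref": "t4", "shipper_hold_margin_h": 28.0},  # hours missing
]
training_rows = [r for r in raw_rows if passes_shape(r)]
folds = list(leave_one_lane_out(training_rows))
print(len(training_rows), [lane for lane, _, _ in folds])
```

The half-observed row never reaches the model, and no lane appears on both sides of a fold, so two near-duplicate shipments from one lane cannot flatter the score. In a real pipeline the lane key would come from the lineage edge in the graph, not from a string column.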

There is a deeper distinction underneath all three, and it is an ontological one. A foundational ontology like BFO (Basic Formal Ontology, the upper ontology Book 4's spine sits on) splits the world into continuants (things that persist through time, like the DP-001 lot and the qualified shipper) and occurrents (things that happen, like the shipment itself and each logger reading). Keeping that split explicit is what stops the model from confusing a measurement (one reading at one instant, an occurrent) with the thing measured (the lot, a continuant): the trace is a series of occurrents about one continuant, the MKT is a deterministic function over that series, and the lane-risk score is a prediction about a future occurrent. Get that typing right in the graph and the join keys in the anatomy card stop being fragile strings and become edges a query can walk: the same knowledge-graph digital thread the open-source stack builds, now carrying the cold-chain leg.

Data integrity on the logger trace: ALCOA+ and the standards the record rides on

The temperature trace is not just a feature source — it is GMP/GDP evidence, and a regulator reads it against the ALCOA+ data-integrity principles (Attributable, Legible, Contemporaneous, Original, Accurate — plus Complete, Consistent, Enduring, Available) that govern any record a regulator may inspect. The integrity block's logger_continuous check is precisely the Complete principle made executable: a silent logger dropout means the record is no longer complete, so an unobserved leg cannot be cleared — the deterministic gate refuses it rather than assuming it was cold. The trace's per-reading timestamps are the Contemporaneous and Original principles (each reading recorded at the moment it happened, kept as captured rather than re-keyed); the shipment-to-lot binding is Attributable. This is the same Part 11 / Annex 11 electronic-records discipline the QC chapter holds the release data to, applied to the one leg of the process the manufacturer does not physically control — which is exactly why the integrity verdicts are deterministic rules, not learned scores.
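
The Complete principle becomes executable once "no gaps" is given a number. The sketch below flags any pair of consecutive readings that are out of order or further apart than the expected cadence plus a tolerance. The 10-minute cadence follows the example trace earlier in the chapter; the one-minute tolerance and the function names are assumptions, and this is not the check as implemented in the book's module.

```python
from datetime import datetime, timedelta


def logger_gaps(timestamps, cadence=timedelta(minutes=10), tolerance=timedelta(minutes=1)):
    """Return (previous, current) pairs where readings are out of order or too far apart."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur <= prev or cur - prev > cadence + tolerance:
            gaps.append((prev, cur))
    return gaps


def logger_continuous(timestamps, **kwargs):
    """Deterministic verdict: at least two readings and no gap between any of them."""
    return len(timestamps) >= 2 and not logger_gaps(timestamps, **kwargs)


start = datetime(2026, 7, 1, 8, 0)
full = [start + timedelta(minutes=10 * i) for i in range(6)]
dropped = full[:2] + full[4:]   # two readings missing: a 30-minute hole

print(logger_continuous(full), logger_continuous(dropped))
```

The check is a rule with one tunable tolerance, not a score, so the same trace always yields the same verdict. It only tests spacing between readings; confirming that the trace covers the whole custody window, from origin dock to pharmacy, would need the handoff timestamps as well.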

The record also has to travel between those disagreeing systems without losing meaning, and that is a standards problem, not a modelling one. The shipment's manufacturing context — the parent BATCH-2026-001, the lot DP-001, the disposition — is an ISA-95 / B2MML payload (the manufacturing-operations standard and its XML serialization the data book grounds the batch record on), so the lane-risk model's join keys (lot, lane, BATCH-2026-001) are not invented column names but the same identifiers the MES and ERP already exchange. The cold-chain leg is one more event on that shared backbone, which is what lets a forecast keyed on mAb-A and a stability verdict keyed on DP-001 line up against the same master-data identity the genealogy spine carries — rather than two spreadsheets that happen to spell the lot the same way.

The unsolved part: forecasting a chain you do not control

Be honest about why the last mile resists machine learning more than any other step in this book. The first difficulty is observability. Once DP-001 leaves the loading dock it is in systems the manufacturer does not own — couriers, customs, third-party logistics, wholesalers — each with its own data, much of it arriving late, incomplete, or not at all. A model can only learn from the excursions it sees, and the most dangerous ones (a logger that failed silently, a leg with no data) are precisely the ones it cannot. This is not random missingness — it is informative missingness: data tends to go dark exactly on the chaotic legs where breaches happen, so the gaps correlate with the label, which biases any model trained on the visible subset toward optimism. Unlike a bioreactor, where every second is captured in the historian, the cold chain is a partial-observation problem, and a model trained on the visible excursions systematically under-counts the invisible ones. This is why the integrity block's logger_continuous check is deterministic and gating: an unobserved leg is treated as a failure rather than silently assumed clean.
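
Informative missingness is easy to demonstrate with a seeded simulation. The sketch below draws synthetic shipments, lets breached shipments lose their logger data more often than clean ones, and compares the true breach rate with the rate seen in the visible subset. Every probability here is a made-up illustration, not an estimate from any real chain.

```python
import random


def simulate(n=20000, base_rate=0.05, p_dark_if_breach=0.6, p_dark_if_ok=0.05, seed=0):
    """Return (true_breach_rate, breach_rate_among_observed_shipments)."""
    rng = random.Random(seed)
    breaches = observed = observed_breaches = 0
    for _ in range(n):
        breach = rng.random() < base_rate
        # Data goes dark far more often on the legs where a breach happened.
        dark = rng.random() < (p_dark_if_breach if breach else p_dark_if_ok)
        breaches += breach
        if not dark:
            observed += 1
            observed_breaches += breach
    return breaches / n, observed_breaches / observed


if __name__ == "__main__":
    true_rate, visible_rate = simulate()
    print(round(true_rate, 3), round(visible_rate, 3))
```

Under these assumed probabilities the visible subset shows well under half the true breach rate, so a model trained only on observed shipments starts from an optimistic base rate before it has learned anything about lanes.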

The second difficulty is that the events that matter are rare, consequential, and partly non-stationary. Real product-impacting excursions are rare, so the labelled positive class is tiny. This is the same small-data ceiling that constrains the rest of bioprocess ML, made worse here because the events that matter most are the rarest, and a single bad lane can dominate the entire positive class so that a model "learns" one route rather than a generalisable risk. And the chain is non-stationary in ways no historical model anticipates: a new carrier, a rerouted lane, a heatwave outside the training distribution, a pandemic that breaks every demand pattern at once. A lane-risk model is a snapshot of a chain that is always changing, and its decay can be both fast and silent. The MLOps discipline of monitoring for population shift (the same PSI/drift checks drift.py runs on a process signal) and retraining under change control is not optional here; it is the only thing standing between a confident model and a stale one. The split between physics and learning is the safety net: when the lane-risk model goes stale, it scores the wrong lanes as risky, which is wasteful but not unsafe, because the deterministic MKT core still decides every actual release.
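
A population stability index (PSI) check of the kind mentioned here can be sketched in a few lines. This is not the drift.py implementation, which is not reproduced in this chapter; it is a minimal version assuming equal-width bins taken from the reference sample and a small floor to avoid taking the log of zero.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index of `actual` against the reference sample `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0   # fall back to 1.0 if the reference is constant

    def fractions(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp so values outside the reference range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    # Hypothetical feature: hold-time margin at training time vs. after a lane reroute.
    training_margin = [float(i % 100) for i in range(1000)]
    rerouted_margin = [v + 30.0 for v in training_margin]
    print(psi(training_margin, training_margin), round(psi(training_margin, rerouted_margin), 2))
```

An identical sample scores exactly zero and a shifted one scores well above it. Which PSI value should trigger a retraining review is a change-control decision for the site, not something this sketch settles.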

The third difficulty is the demand-forecasting accountability gap. A forecast is a decision under uncertainty whose consequences — a stockout that denies a patient a dose, or a write-off of expired product — land months later and far from the model, and they are asymmetric (a stockout of a life-sustaining biologic is not the dollar-equivalent of a write-off), which a symmetric error metric like RMSE quietly ignores. The bullwhip effect means the very data the model is trained on (order history) is a distorted shadow of the demand it is trying to predict, and the cleanest signal (point-of-dispense consumption) is the hardest to get. The result is that demand-forecasting ML genuinely adds value at the margin but is structurally incapable of the accuracy its vendors' headline numbers imply — which is why this chapter labels every one of those numbers illustrative and self-reported, and why the honest framing is decision support under deep uncertainty, not prediction.

What this chapter adds to the model suite

This chapter contributes examples/platform/ml/coldchain.py to the Book 5 example suite: a standalone module with two deliberately separated layers. The deterministic stability core computes the Mean Kinetic Temperature (with the ICH Q1A Ea = 83.144 kJ/mol convention) and accounts a shipment trace against the 2-to-8 degC label claim — the MKT arithmetic is exact chemistry, asserted to exceed the arithmetic mean and to flag the synthesised warm excursion. The learned advisory layer trains a gradient-boosted lane-risk classifier on a synthetic lane history and scores the breach probability of a future shipment, asserted to be predictive (ROC-AUC > 0.8 on the held-out split) and to rank a hot, long, multi-leg lane riskier than a short, cool one. The module makes the chapter's central architectural point executable: the physics core decides, the learned layer advises. It coordinates with — and does not duplicate — the drift module (drift.py), which supplies the monitoring discipline a deployed lane-risk model would need against the non-stationarity above, and the serialization anomaly module (serialization_anomaly.py), which handles the integrity of the supply-chain event stream while this module handles its thermal and demand risk. The shipment trace and lane history are clearly labelled synthetic/illustrative because no logistics dataset ships with the series; the running example's identity (DP-001, BATCH-2026-001, the 2-to-8 degC claim) and the MKT chemistry are real.

Why it matters

Distribution is where the entire investment of the manufacturing spine is either preserved or thrown away in the open air. A biologic that passed every release test in QC is worthless if it spends an afternoon above its label claim, and the difference between a shipment that is fine and one that is ruined is often invisible to the naked eye and to the arithmetic mean — which is exactly why the Mean Kinetic Temperature, a piece of fixed chemistry rather than a clever model, is the load-bearing tool here. The learned layers add genuine value at the margins: lane-risk scoring sends scarce premium packaging where breaches actually happen, excursion prediction buys time to intervene, and demand forecasting keeps doses flowing without write-offs. But the discipline this chapter insists on — the deterministic MKT/excursion core decides the release-relevant verdict, the learned models only score risk around it — is the same architecture the regulators are converging on, and it is what lets a manufacturer deploy ML in the one stretch of the process it does not control without putting a probabilistic model in the path of a patient's dose.

In the real world

The distribution leg, everything after the product leaves the manufacturer's physical control, is governed by Good Distribution Practice (GDP), the distribution-side counterpart to manufacturing's GMP, whose temperature-control and excursion-handling expectations the deterministic MKT/excursion verdict is built to satisfy (Book 1 develops the global cold chain and GDP in depth). Cold-chain monitoring is a fully deployed (production) capability: every regulated biologic shipment carries a temperature logger, and platforms from Controlant, Sensitech, Tive, and others collect that data at scale, with MKT and excursion accounting computed against the label claim as standard, USP-defined practice [1][2]. The predictive layer on top, excursion prediction and lane-risk scoring, is closer to (pilot) than productized: the data exists and the models are credible, but validated, deployed lane-risk prediction is an applied capability vendors are adding rather than a settled standard, and the published numbers are vendor/self-reported.

Demand forecasting and supply-chain analytics are a real (production) value source with honestly soft evidence. Sanofi's plai (with Aily Labs) is the most-named example — roughly 80% accuracy on low-inventory prediction and around 65% on risk-to-root-cause, both single-company self-reported and traceable to different, partly undated sources [5]; Merck KGaA with Aera Technology and Merck KGaA with Palantir run comparable supply-chain control-tower programs (vendor/self-reported). The honest counterweight is the survey finding that around 65% of pharma supply-chain leaders have limited confidence in AI for disruption prediction [5] — and the broader frame this whole book keeps returning to: the ISPE Pharma 4.0 reality is that production ML clusters in monitoring and human-in-the-loop decision support, not autonomous control. A cold-chain risk score or a demand forecast is squarely advisory under the FDA's 2023 Artificial Intelligence in Drug Manufacturing discussion paper and the draft Annex 22, which keeps the deterministic stability verdict — not a learned model — as the thing that decides whether a dose is fit for a patient [3][4].

Key terms

Cold chain — the temperature-controlled distribution path that keeps a biologic within its label storage claim (here 2 to 8 degC) from fill-finish to the patient.
Stability budget — the degradation allowance implied by an ICH Q1A shelf life, which assumes the product stays in its label claim; warm excursions draw it down faster than the average suggests, and the spend is irreversible.
Mean Kinetic Temperature (MKT) — the single constant temperature that would cause the same Arrhenius-weighted chemical degradation as a fluctuating thermal history; a fixed equation, always at or above the arithmetic mean, not a machine-learning model.
Arrhenius equation — the physical law that reaction rate scales as k = A·exp(-Ea/(R·T)), the basis for weighting warm excursions more heavily than the arithmetic mean would; the convexity of the exponential is what creates the asymmetry.
Activation energy (Ea) — the temperature-sensitivity parameter of the Arrhenius rate; fixed by convention at 83.144 kJ/mol for MKT so the number is reproducible rather than a tuning knob.
Excursion — any period a shipment spends outside its label claim; accounted deterministically as hours above 8 degC or below 2 degC.
Freeze-concentration — the cold-side failure mechanism the MKT lens misses: as ice forms, the protein and buffer concentrate into the shrinking unfrozen phase, which can drive a pH shift (one buffer salt crystallises ahead of the other) and ice-interface/cold-denaturation aggregation; needs minimum temperature, time-below-freezing, and freeze/thaw cycle count as features, not a single MKT.
Excursion prediction — forecasting that a shipment will breach its label claim before the logger proves it, from the live trace and shipment context; framed as rolling-window classification or a time-to-event/survival model; advisory, not a release decision.
Lane — a route-plus-mode-plus-carrier combination; the unit at which risk-scoring is most cost-effective.
Lane-risk scoring — a supervised classifier (typically gradient-boosted trees) ranking lanes by future-breach probability so scarce premium packaging and audits go where breaches actually happen; validated with time-blocked or leave-one-lane-out splits and calibrated probabilities.
ROC-AUC / AUPRC — two scores for a yes/no classifier. ROC-AUC summarises how well the model ranks breaches above non-breaches; AUPRC (precision–recall) is the more honest score when breaches are rare, because ROC-AUC can look flatteringly high when the many easy non-breaches keep the false-positive rate low even though most flagged shipments are false alarms.
Qualified shipper hold time — the validated number of hours a passive or active container holds 2 to 8 degC; the remaining margin against planned transit is the strongest single breach predictor.
Demand forecasting — predicting how many doses to make and position, far enough ahead to meet the long biomanufacturing lead time without stockouts or expiry write-offs; baselined by ARIMA/ETS and extended by global deep sequence models that forecast many series jointly and probabilistically.
Bullwhip effect — the amplification of small demand fluctuations into large factory-order swings as each supply-chain tier adds safety stock; the reason fitting order history fits distortion, not demand.
Deterministic-core / advisory-layer split — the cold-chain analogue of the packaging chapter's rules-first design: MKT and excursion accounting decide the release-relevant verdict, learned models only score risk.
Semantic feature binding — pulling a model feature (e.g. shipper_hold_margin_h) by its ontology property IRI and declared unit rather than by a fragile column name, so a renamed header or a units mismatch surfaces as a validation failure instead of a silently corrupted input; the same SHACL shape that gates release data can validate the training set is complete and in range.
Lineage grouping key — the PROV-O/derivedFrom provenance edge used as the cross-validation grouping for leave-one-lane-out (or leave-one-batch-out) validation, so near-duplicate shipments on one lane cannot leak between train and test; the digital-thread graph that makes lineage queryable is what makes the evaluation honest.
ALCOA+ / Part 11 — the data-integrity principles (Attributable, Legible, Contemporaneous, Original, Accurate, plus Complete, Consistent, Enduring, Available) and the electronic-records rule the logger trace is held to as GMP/GDP evidence; the deterministic logger_continuous check is the Complete principle made executable.

Where this leads

The manufacturing spine is complete — DP-001 has reached a patient, its label claim intact, the last node of the genealogy closed. Every chapter from here on steps back from a single unit operation to look at the whole system. The next chapter, Hybrid Models and Digital Twins: The Dominant Paradigm, gathers the thread that has run quietly through the entire book — the union of mechanistic physics and learned components that has been winning at every step, from the bioreactor soft sensor to the chromatography twin to the very MKT-plus-classifier split this chapter just built — and names it as the dominant practical paradigm of ML in biomanufacturing.
