
The Learning Problem: Why Bioprocess Breaks the Data-Science Rulebook

📍 Where we are: Part I · Foundations of Learning in Bioprocess — Chapter 1, the first chapter after the preface. Books 1 through 4 built the process, its data, its open-source plumbing, and its knowledge graph. Book 5 turns the last lens on the same spine — learning — and the honest place to start is not a model but a warning: the textbook you learned data science from was written for a world that bioprocess does not live in.

A machine-learning textbook makes a quiet promise on its first page: you have a large dataset of independent examples, drawn from a stable distribution, and your job is to fit a function that generalizes from them. Almost every theorem, every train/test convention, every "just collect more data" instinct rests on that promise. A biologics manufacturing process keeps almost none of it. The examples are batches, and there are dozens of them, not millions. They are not independent — sister batches share a cell bank, a media lot, an operator. The distribution is not stable — the cells drift, the process ages, the raw materials change supplier. And the ground truth you want to learn arrives from a bench assay once or twice a day, weeks at a time, at a cost that makes every label precious. This chapter is about what happens when you take the rulebook into that room — and why the response, worked out over the rest of the book, is hybrid modeling and disciplined data work, not bigger networks.

The simple version

Imagine learning to forecast the weather, but with three catches. First, you only get a handful of complete days to learn from — not decades of records, just a few dozen. Second, the true temperature is only measured twice a day, by hand, so most of the time you are guessing in between. Third, the climate itself keeps shifting under you, so a rule that held last month quietly stops working. A data-science textbook assumes the opposite of all three: oceans of data, cheap ground truth, a fixed climate. Bioprocess is the hard version. That is not a reason to give up on learning — it is the reason the kind of learning that wins here looks different from what wins on internet-scale data.

What this chapter covers

  • The demo-versus-routine gap: why a model that dazzles in a conference talk so rarely becomes a system that runs every shift in a GMP plant — and why that gap is structural, not a maturity problem that time alone fixes.
  • The learning taxonomy, mapped to bioprocess: supervised, unsupervised, and reinforcement learning, each pinned to the real tasks it does — soft sensing, anomaly detection, advanced control, vision inspection.
  • Why living systems break the rulebook: the five assumptions textbook ML makes that a bioreactor violates — the small-data ceiling, the cold-start cadence, run-to-run variability, non-stationarity, and the asymmetric cost of a wrong call.
  • The maturity ladder: production, pilot, and research — and why naming which rung a claim sits on is half of thinking clearly about this field.
  • The evidence-tier convention: the four-rung scale this book attaches to every external number, so a marketing headline and a verified result never get read the same way.

The gap between the demo and the plant floor

The defining fact about machine learning in biomanufacturing is the distance between what gets shown and what gets run. Industry surveys make the gap quantitative: the ISPE 7th Pharma 4.0 survey finds AI/ML carrying the most pilots and the fewest scaled implementations of any digital technology it tracks, a "pilots" category that stays high and stubbornly does not graduate [1]. McKinsey's State of AI reports the same shape one level up: roughly 88% of organizations use AI somewhere, while only about 6% report enterprise-wide impact [2]. The demo is everywhere; the deployment is rare.

It is tempting to read that as immaturity — give it a few years and the pilots will scale. This book argues the opposite, and the closing verdict makes the case in full: the gap is structural. A demo succeeds under conditions a plant cannot grant. It runs on a curated dataset, on retrospective data, with the modeler in the loop, and with no consequence if it is wrong on a Tuesday. Routine GMP grants none of those: the data arrives messy and late, the model must run unattended, and a wrong call can scrap a batch worth a fortune or, worse, let a bad one through. The applications that have crossed into production — multivariate monitoring, Raman soft sensors, vision inspection of vials, review-by-exception — share a family resemblance: they infer or monitor rather than autonomously decide, and they sit inside a human-supervised loop. Nothing on the production list autonomously adjusts a critical quality attribute, because the things that make a demo easy are exactly the things GMP removes.

It is worth being precise about what a demo quietly assumes that a plant takes away, because each removed assumption maps to a later chapter. A demo assumes the data is already assembled, but in a plant it lives in silos that do not agree on a batch ID: the readiness problem the next chapter is devoted to. A demo assumes the validation is informal, but a GMP model must be locked, version-controlled, and governed by a predetermined change-control plan, the burden the MLOps and regulatory chapters carry. A demo assumes the modeler will notice when it breaks, but a deployed soft sensor must detect its own drift against a reference that arrives only twice a day. And a demo assumes being wrong is free, but here a false negative can reach a patient. Strip those four assumptions out and a model that aced the demo has four new, unglamorous jobs to do before it earns a place on the floor. That is the work the rest of Book 5 is about.

So the right posture for this whole book is neither boosterism nor dismissal. It is calibration: knowing which rung a given claim sits on, and why the rung above it is hard to reach. The rest of this chapter builds the vocabulary for that — first the taxonomy of what learning even is here, then the five reasons the rungs are far apart, then the two ladders (maturity and evidence) we grade every claim on.

The learning taxonomy, mapped to real bioprocess tasks

"AI" in a vendor deck is a fog. The first discipline is to dissolve it into the three classical families of machine learning and pin each to the bioprocess task it actually does. The taxonomy is not academic decoration — which family a problem belongs to dictates what data it needs, how it must be validated, and how close to the critical path the regulators will let it get.

Supervised learning fits a function from labeled examples: inputs paired with the known answer. It splits by what the answer looks like. Regression predicts a continuous number, and this is the home of the soft sensor: a model that turns an in-line Raman or near-infrared spectrum into a glucose, lactate, or titer reading every minute or two, so an expensive bench measurement can be inferred continuously between samples. The golden run BATCH-2026-001 carries an end-of-process SEC monomer of 98.611%; a soft sensor's job is to estimate quantities like that during the run, not only at the end. Classification predicts a discrete label, and its strongest production form is vision inspection: a convolutional model that looks at a filled vial or syringe and calls it pass or reject for particulates, cracks, and fill defects. OOS prediction is classification too: a model that flags BATCH-2026-004 as a probable OOS before the assay confirms it is answering a pass/fail question, since that batch is out of specification on host-cell protein at 128 ng/mg against a spec ceiling of 100 ng/mg.
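The regression half of that paragraph fits in a few lines. This is a minimal illustration, not the suite's soft sensor. It swaps in closed-form ridge regression for the PLS-style calibration that Raman soft sensors commonly use, and it fits synthetic spectra in which a single hypothetical band near 1100 cm⁻¹ scales with titer. The 80/40 row split is acceptable here only because these synthetic rows are independent by construction; real spectra need a batch-grouped split.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on mean-centred data.
    Returns (coef, x_mean, y_mean) so predictions can be un-centred."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    return coef, x_mean, y_mean

def predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

# Hypothetical synthetic data: 701 channels (wn_400..wn_1800), one band scales with titer.
rng = np.random.default_rng(2026)
wn = np.linspace(400, 1800, 701)
band = np.exp(-((wn - 1100) / 20) ** 2)
titer = rng.uniform(0.5, 5.0, size=120)                    # g/L
spectra = titer[:, None] * band + rng.normal(0, 0.02, size=(120, 701))

# Rows are independent only because the data is synthetic; real runs need a batch split.
model = fit_ridge(spectra[:80], titer[:80], alpha=1.0)
pred = predict(model, spectra[80:])
rmse = float(np.sqrt(np.mean((pred - titer[80:]) ** 2)))
print(f"RMSE on held-out synthetic rows: {rmse:.3f} g/L")
```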

Unsupervised learning has no labeled answer; it learns the shape of normal and reports departures from it. In bioprocess this is multivariate statistical process control (MSPC) and anomaly detection: PCA and PLS models that fingerprint the whole multivariate trajectory of a healthy "golden" batch and raise a flag when a new run drifts outside the envelope — without ever being told in advance what a fault looks like. It is the most thoroughly deployed learning method in the industry precisely because it needs no scarce labels: a library of good batches is enough to define normal.
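A minimal sketch of the PCA flavour of MSPC, assuming each good batch has already been unfolded into one row of process variables (batch alignment and unfolding are not shown). Hotelling's T² measures how far a run sits inside the model plane; SPE (the Q statistic) measures how much of it the model cannot explain at all. The control limits here are empirical 99th percentiles of the training statistics, a simplification of the F- and chi-square-based limits usual in MSPC practice, and the batch library is synthetic.

```python
import numpy as np

def fit_pca_monitor(X_good, n_comp=2):
    """Autoscale the good-batch library and keep the first n_comp principal directions."""
    mu, sd = X_good.mean(axis=0), X_good.std(axis=0, ddof=1)
    Z = (X_good - mu) / sd
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_comp].T                              # loadings: variables x components
    lam = (S[:n_comp] ** 2) / (len(Z) - 1)         # variance of each score
    return dict(mu=mu, sd=sd, P=P, lam=lam)

def t2_spe(model, X):
    Z = (X - model["mu"]) / model["sd"]
    T = Z @ model["P"]                             # scores
    t2 = np.sum(T ** 2 / model["lam"], axis=1)     # Hotelling's T^2
    resid = Z - T @ model["P"].T                   # what the PCA model cannot explain
    spe = np.sum(resid ** 2, axis=1)               # SPE / Q statistic
    return t2, spe

# Hypothetical golden-batch library: 30 batches x 12 unfolded variables,
# driven by two latent factors plus measurement noise.
rng = np.random.default_rng(7)
latent = rng.normal(size=(30, 2))
mix = rng.normal(size=(2, 12))
X_good = latent @ mix + 0.1 * rng.normal(size=(30, 12))

model = fit_pca_monitor(X_good, n_comp=2)
t2_ref, spe_ref = t2_spe(model, X_good)
t2_lim, spe_lim = np.percentile(t2_ref, 99), np.percentile(spe_ref, 99)

# A new run that breaks the normal correlation structure in one variable
x_new = latent[:1] @ mix
x_new[0, 5] += 5.0
t2_new, spe_new = t2_spe(model, x_new)
flag = bool(t2_new[0] > t2_lim or spe_new[0] > spe_lim)
print(f"T2={t2_new[0]:.1f} (limit {t2_lim:.1f}), SPE={spe_new[0]:.2f} (limit {spe_lim:.2f}), flag={flag}")
```

No fault was described to the model in advance; the run is flagged only because it departs from the correlation pattern the good batches share.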

Reinforcement learning (RL) learns a control policy by trial and feedback — and this is where the rulebook bites hardest. Advanced process control in bioprocess is dominated not by pure RL but by model-predictive control (MPC), which optimizes setpoints against a process model over a rolling horizon; RL and MPC blur together in the research literature on closed-loop control of feeds and gas. The reason pure RL is rare here is exactly the reason this chapter exists: RL is famously data-hungry, learning from millions of trials, and a bioreactor offers a few dozen runs that cost weeks each. You cannot let an RL agent ruin ten thousand batches to learn a feed policy. So control here leans on models that encode physics, and learns only the thin residual that physics cannot write down — the hybrid pattern this book keeps returning to.
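The receding-horizon idea behind MPC fits in a short sketch: predict ahead with a model, pick the best feed sequence, apply only its first move, then re-plan from the next measurement. Everything here is a hypothetical toy, including the glucose balance, the four discrete feed options, and the deliberate mismatch between model and plant uptake. A real controller would use a calibrated process model and a proper optimizer rather than exhaustive search.

```python
import itertools

FEEDS = (0.0, 0.5, 1.0, 1.5)   # hypothetical feed additions, g/L per step

def step(glucose, feed, uptake=0.8):
    """Toy glucose balance: next = current - cell uptake + feed, floored at zero."""
    return max(glucose - uptake + feed, 0.0)

def mpc_move(glucose, target=4.0, horizon=3, feed_weight=0.1):
    """Score every feed sequence over the horizon; return only its FIRST move."""
    best_cost, best_first = float("inf"), None
    for seq in itertools.product(FEEDS, repeat=horizon):
        g, cost = glucose, 0.0
        for f in seq:
            g = step(g, f)                               # model prediction
            cost += (g - target) ** 2 + feed_weight * f ** 2
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first

# Closed loop: re-plan at every step from the newly "measured" plant state.
g = 6.0
trajectory = [g]
for _ in range(10):
    f = mpc_move(g)
    g = step(g, f, uptake=0.9)    # the plant consumes faster than the model assumes
    trajectory.append(round(g, 2))
print(trajectory)
```

Because the plan is recomputed every step, the model/plant mismatch in uptake is corrected by feedback instead of accumulating, which is what lets MPC lean on an imperfect physics model.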

Two families sit in the gaps and matter precisely because of the cold-start cadence below. Semi-supervised and self-supervised learning try to exploit the flood of unlabeled spectra to make the trickle of labels go further — learning the structure of the data from the cheap signal, then fitting the expensive target with far fewer examples. They are not a fourth kind of learning so much as a coping strategy for the exact scarcity this chapter is about, and the same can be said of transfer learning (carrying a calibration from one product to a related one) and hybrid modeling (letting a mechanistic model carry the structure so the learned part has little left to fit). Every one of these is, at bottom, a way to spend fewer labels — which is why they keep winning here and why the hybrid chapter treats hybrid modeling as the field's default rather than an exotic option.
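The hybrid pattern can be sketched as a mechanistic term plus a learned correction. In this hypothetical example the backbone is titer = qp × IVCD with an assumed platform productivity, and a two-weight linear model learns only the residual, here a made-up productivity boost from a temperature shift. The learned part has two weights to estimate from eight runs instead of the entire response surface.

```python
import numpy as np

def mechanistic_titer(ivcd, qp=0.02):
    """Mechanistic backbone: titer (g/L) = specific productivity x integral of
    viable cell density. qp is an assumed platform value, not fitted here."""
    return qp * ivcd

# Hypothetical small dataset: 8 runs, each with an IVCD and a temperature shift.
rng = np.random.default_rng(11)
ivcd = rng.uniform(100, 250, size=8)            # illustrative units
temp_shift = rng.uniform(0.0, 3.0, size=8)      # degC lowered mid-run
# Simulated "truth": productivity rises with the shift, an effect the backbone omits.
titer_obs = 0.02 * ivcd * (1 + 0.05 * temp_shift) + rng.normal(0, 0.05, size=8)

# Learn ONLY the residual the mechanistic term cannot explain.
physics = mechanistic_titer(ivcd)
residual = titer_obs - physics
A = np.column_stack([np.ones_like(temp_shift), physics * temp_shift])
w, *_ = np.linalg.lstsq(A, residual, rcond=None)

def hybrid_titer(ivcd, temp_shift):
    p = mechanistic_titer(ivcd)
    return p + w[0] + w[1] * p * temp_shift

print("learned correction weights:", np.round(w, 3))
```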

The taxonomy alone already explains the production list's shape. The deployed applications are supervised regression (soft sensors), supervised classification (vision), and unsupervised monitoring (MSPC) — families that tolerate small, labeled-or-unlabeled data and sit safely beside a human. The family that would autonomously decide — reinforcement learning in a critical loop — is the one the data and the regulators both hold back. The running example threads through all of them: the same golden run BATCH-2026-001 supplies a regression target (its titer trajectory), a monitoring envelope (its healthy multivariate fingerprint), and a control reference (its feed schedule), while its OOS sibling BATCH-2026-004 supplies the classification problem (predict the host-cell-protein failure before the assay returns). One process, one genealogy, every learning family — which is exactly how the example suite is built.

Hero diagram mapping the three machine-learning families onto bioprocess tasks across the manufacturing spine. A central column lists supervised, unsupervised, and reinforcement learning. Supervised branches into regression, labeled soft sensing of titer and glucose from a Raman spectrum, drawn as a sparkline feeding a predicted number, and classification, labeled vision inspection of a filled vial as pass or reject and OOS prediction of host-cell protein. Unsupervised points at multivariate statistical process control and anomaly detection, drawn as a golden-batch envelope with one trajectory drifting outside it. Reinforcement learning points at advanced process control and model-predictive control of feeds, drawn faded with a caption noting it is data-hungry and mostly research and pilot in bioprocess. A footer band labels each task with its production maturity: soft sensing, vision, and MSPC marked production in green, advanced control marked pilot in violet. The three learning families, pinned to what they actually do in a plant: supervised regression is the soft sensor, supervised classification is vision inspection and OOS prediction, unsupervised learning is golden-batch monitoring and anomaly detection, and reinforcement learning is advanced control — the one family the cold-start data regime and GMP both keep at arm's length from the critical path. Original diagram by the authors, created with AI assistance.

Why a random split lies, in eight lines

Before the deeper argument, one concrete demonstration that the rulebook's defaults fail here — because it is the single mistake that most often turns a real result into a fiction. The textbook reflex is to shuffle your rows and split them 70/30 into train and test. In bioprocess that reflex leaks: two Raman spectra taken an hour apart in the same batch are near-duplicates, so scattering them across the train/test line lets the model see the answer. The shared loader examples/platform/ml/dataio.py ships both splits side by side — a deliberately leaky random_split, kept only to expose the inflated number, and the honest batch_split that holds out whole batches.

```python
# examples/platform/ml/dataio.py: the leaky split and the honest one, side by side
import numpy as np
import pandas as pd


def random_split(df, frac_train=0.7, seed=2026):
    """A deliberately leaky ROW split, kept ONLY to demonstrate the inflated metric.
    Near-duplicate within-batch neighbours land on both sides of the line."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    cut = int(len(df) * frac_train)
    return df.iloc[idx[:cut]], df.iloc[idx[cut:]]


def batch_split(df, batch_col, test_batches):
    """Hold out WHOLE batches: the only split that estimates performance on a
    genuinely unseen run. The test batches were never seen during fit."""
    is_test = df[batch_col].isin(set(test_batches))
    return df[~is_test], df[is_test]
```

Fitting the same soft sensor under each split shows the comfortable lie next to the honest number:

```text
loaded raman_spectra.parquet: 336 spectra x 701 wavenumbers, 6 batches

[random ROW split]     train 235 / test 101   R2 = 0.992   <- LEAKED, do not trust
[batch GROUPED split]  test batches {BATCH-2026-004, BATCH-2026-006}
                       train 224 / test 112   R2 = 0.949   <- honest, held-out batches

held-out batches were never seen during fit; the gap between 0.992 and 0.949
is the size of the lie a row split tells. In real Raman it is far larger.
```

Both numbers are high because the simulated spectra carry the titer signal cleanly, but the row-wise number is high for the wrong reason — it is measuring memorization of within-batch neighbours, not skill on a new run. The honest number, with BATCH-2026-004 (the OOS sibling) and BATCH-2026-006 held out as truly unseen, is the only one you could defend to a reviewer. The next chapter makes batch-grouped splitting the default the whole example suite is built on; here it is the first proof that bioprocess punishes a textbook habit.
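A single held-out pair of batches gives one estimate. With only six batches, a common extension is leave-one-batch-out cross-validation, in which each batch takes a turn as the test set. The generator below is a sketch of that idea on a hypothetical toy frame; it is not claimed to be part of dataio.py.

```python
import pandas as pd

def leave_one_batch_out(df, batch_col):
    """Yield (held_out_batch, train, test): every batch is the test set exactly once."""
    for batch in sorted(df[batch_col].unique()):
        is_test = df[batch_col] == batch
        yield batch, df[~is_test], df[is_test]

# Hypothetical toy frame: 3 batches x 4 spectra each
df = pd.DataFrame({
    "batch_id": [f"BATCH-2026-00{i}" for i in (1, 2, 3) for _ in range(4)],
    "titer_g_L": range(12),
})
folds = list(leave_one_batch_out(df, "batch_id"))
for batch, train, test in folds:
    print(batch, "train", len(train), "/ test", len(test))
```

Averaging the per-fold scores gives a less noisy estimate than one held-out pair, at the cost of refitting once per batch.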

The small-data ceiling, and four more broken assumptions

The row-split trap is a symptom. The disease is that a living process under GMP violates the load-bearing assumptions of textbook ML — five of them, each turning a "just do the standard thing" instinct into a failure mode.

1 — The small-data ceiling: you learn from dozens of runs, not millions. A batch costs weeks of occupancy and a fortune in media, cells, and labor. A campaign yields a handful of runs; a year yields dozens. This is the binding constraint of the field, and it inverts the textbook's central instinct. Where internet-scale ML answers every problem with "more data and a bigger model," bioprocess cannot — the data grows by ones, slowly, at enormous cost. Pure data-hungry models starve or overfit in this regime. The methods that win — hybrid models with a mechanistic backbone, transfer learning, Bayesian priors — are all, at bottom, ways to need fewer examples. This is why the hybrid-modeling chapter is the load-bearing one for the whole book: physics does the work that data cannot do on a few dozen runs.
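A Bayesian prior is the simplest of those ways to need fewer examples. This sketch uses the textbook conjugate update for a normal mean with known noise: posterior precision is prior precision plus n over the noise variance, and the posterior mean is the precision-weighted average of prior and data. The platform prior, the three new-product runs, and the run-to-run noise are all hypothetical numbers.

```python
import numpy as np

def normal_posterior(prior_mean, prior_sd, data, noise_sd):
    """Conjugate update for a normal mean with known observation noise.
    Posterior precision = prior precision + n / noise variance."""
    data = np.asarray(data, dtype=float)
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = len(data) / noise_sd ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data.mean()) / post_prec
    return post_mean, np.sqrt(1.0 / post_prec)

# Hypothetical: platform history suggests qp ~ 20 +/- 4 pg/cell/day;
# three runs of a new product read 26, 23, 25 with run-to-run sd of 3.
mean, sd = normal_posterior(20.0, 4.0, [26.0, 23.0, 25.0], 3.0)
print(f"posterior qp = {mean:.2f} +/- {sd:.2f} pg/cell/day")
```

With three runs the estimate lands between the platform prior (20) and the raw sample mean (about 24.7), and its uncertainty is tighter than either the prior or the three runs alone would give.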

2 — The cold-start cadence: ground truth arrives once or twice a day. The asymmetry is brutal and exact. The historian records online tags and the Raman probe every few seconds — thousands of cheap, fast points per batch. But the reference measurement, the actual ground truth for titer, metabolites, and viability, comes from a bench assay sampled roughly twice a day — about 28 times across a 14-day batch — and the release CQAs exactly once, at the end. The features are a flood; the labels are a trickle. The bioprocess ML literature calls this the cold-start problem, and it reshapes everything: the scarce resource is labels, not data, so a million Raman points from one batch is still one batch's worth of information about how the process behaves run-to-run. Confusing rows with information is the same error as the row split, in different clothes. The next chapter names this cadence the constraint that "no model can outrun."
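The flood-and-trickle shape becomes visible the moment a training table is built. A minimal sketch, assuming hypothetical column names (batch_id, t_h, feature, titer_g_L) and a 0.5 h tolerance: pandas.merge_asof pairs each offline reference with the most recent spectrum-derived row in the same batch, and every other row stays unlabeled.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: a spectrum-derived feature every 0.5 h, a reference titer every 12 h.
hours = np.arange(0, 48, 0.5)
online = pd.DataFrame({
    "batch_id": np.repeat(["B1", "B2"], len(hours)),
    "t_h": np.tile(hours, 2),
    "feature": np.arange(2 * len(hours), dtype=float),
})
offline = pd.DataFrame({
    "batch_id": ["B1"] * 4 + ["B2"] * 4,
    "t_h": [0.2, 12.1, 24.3, 36.0] * 2,
    "titer_g_L": [0.1, 0.6, 1.4, 2.3, 0.1, 0.5, 1.2, 2.0],
})

# merge_asof requires both frames sorted on the time key; `by` keeps batches apart.
labeled = pd.merge_asof(
    offline.sort_values("t_h"), online.sort_values("t_h"),
    on="t_h", by="batch_id", direction="backward", tolerance=0.5,
)
print(f"{len(online)} feature rows -> {len(labeled)} labeled rows")
```

The 192 feature rows collapse to 8 labeled rows; the rest can help only through the semi-supervised and hybrid routes described earlier in this chapter.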

3 — Run-to-run variability: the examples are not independent or identically distributed. Textbook ML assumes examples drawn i.i.d. from one distribution. Batches are neither independent nor identical. Sister runs share a cell bank, a media lot, an operator, a vessel, so they are correlated, not independent. And biological variability means two runs of the same recipe land in measurably different places: published bioprocess ML reviews report that run-to-run variability "severely compromises transferability," so a model calibrated on one campaign can degrade on the next even with nothing obviously changed. This is why the honest split holds out whole batches, and why "it worked on our six batches" is a far weaker claim than the same sentence about six thousand independent samples would be.
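Kish's design effect puts a number on "correlated, not independent." For equal-sized clusters of m observations with intra-cluster correlation ρ, the effective sample size is n / (1 + (m − 1)ρ). The sketch applies it to the demo's 336 spectra in 6 batches (56 per batch); the correlation values are assumptions chosen to span the range.

```python
def effective_sample_size(n_total, per_batch, icc):
    """Kish design effect for equal-size clusters: n_eff = n / (1 + (m - 1) * icc)."""
    return n_total / (1 + (per_batch - 1) * icc)

# 336 spectra in 6 batches -> 56 per batch; the icc values are assumptions.
for icc in (0.0, 0.5, 0.9, 1.0):
    print(f"icc={icc:.1f}: n_eff = {effective_sample_size(336, 56, icc):.1f}")
```

At ρ = 1 the 336 rows carry exactly six observations' worth of information, one per batch: the rows-versus-information error, stated as arithmetic.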

4 — Non-stationarity: the process moves under the model. The textbook's fixed distribution does not exist here. Cells drift over passages; chromatography resin ages over cycles; raw-material lots change supplier; the process itself is tuned. A soft sensor calibrated this quarter can decay next quarter — model decay is fast and is the rule, not the exception. Worse, because ground truth is the cold-start trickle, drift is detected late: a sensor that began drifting at breakfast is not provably wrong until the evening reference comes back, so the drift flag is by construction a lagging indicator. A model here is never "done"; it is a thing you must distrust on a schedule, which is why the MLOps chapter treats monitoring and a predetermined relearning plan as part of the model, not an afterthought.
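Why the drift flag lags can be shown with a one-sided CUSUM on soft-sensor errors that updates only when a reference arrives. The sketch assumes references every 12 h across a 14-day batch (28 of them, matching the cadence above), a hypothetical drift that starts at hour 100 and grows at 0.02 g/L per hour, and illustrative CUSUM parameters k and h.

```python
import numpy as np

def cusum_alarm(errors, k=0.05, h=0.3):
    """One-sided upper CUSUM: S = max(0, S + e - k); alarm at the first S > h.
    Returns the index of the first alarm, or None if it never fires."""
    s = 0.0
    for i, e in enumerate(errors):
        s = max(0.0, s + e - k)
        if s > h:
            return i
    return None

# Hypothetical: twice-daily references over 14 days; the sensor starts
# over-reading by 0.02 g/L per hour from hour 100 onward.
ref_hours = np.arange(12, 14 * 24 + 1, 12)        # 28 reference times
drift = np.where(ref_hours > 100, 0.02 * (ref_hours - 100), 0.0)
idx = cusum_alarm(drift)
alarm_hour = int(ref_hours[idx])
print(f"drift began at hour 100; first alarm at hour {alarm_hour}")
```

Here the alarm fires at hour 120, twenty hours after onset: the first post-onset reference at hour 108 shows an error too small to cross the threshold on its own, and nothing in between can be checked.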

5 — The cost of a wrong call is asymmetric and large. In most ML settings a misprediction costs a click or a recommendation. Under GMP it can scrap a batch worth a fortune, or — the failure that actually matters — let a bad batch through to a patient. That asymmetry changes the math of acceptability: a model is not judged on average accuracy but on its behavior in the tail, and a confusion matrix cannot capture the cost of being confidently wrong about a medicine. It is also why the regulators fence learning models out of the critical path: a model that keeps learning is a moving target that traditional one-time validation was never built for, so the industry has converged on lock-then-relearn — freeze the model at validation, govern every update by a predetermined change-control plan. The wrong-call cost is the reason the demo-to-plant gap is a cliff, not a ramp.
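The asymmetry has a standard decision-theory form. If wrongly rejecting a good batch costs C_fp and wrongly releasing a bad one costs C_fn, rejecting has lower expected cost whenever P(bad) > C_fp / (C_fp + C_fn). The 1:100 cost ratio below is hypothetical, and in practice such a model advises the human disposition decision rather than making it.

```python
def reject_threshold(cost_false_reject, cost_false_release):
    """Reject when P(bad) exceeds this value: it is the point where the expected
    cost of releasing, p * C_release, equals that of rejecting, (1 - p) * C_reject."""
    return cost_false_reject / (cost_false_reject + cost_false_release)

def decide(p_bad, cost_false_reject, cost_false_release):
    threshold = reject_threshold(cost_false_reject, cost_false_release)
    return "reject" if p_bad > threshold else "release"

# Hypothetical relative costs: scrapping a good batch = 1 unit,
# releasing a bad batch toward a patient = 100 units.
t = reject_threshold(1.0, 100.0)
print(f"reject above P(bad) = {t:.4f}")
print(decide(0.05, 1.0, 100.0))
```

A 5% probability of a bad batch, which an accuracy-minded model would call "probably fine," is far above the threshold once the costs are this lopsided.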

These five are not a list of independent gripes; they interlock. Small data (1) is why pure learning stalls and hybrid wins. The cold-start cadence (2) is why labels, not features, are the scarce resource and why drift (4) is caught late. Run-to-run variability (3) is why the held-out unit must be the whole batch. And the wrong-call cost (5) is why the validation-versus-learning tension is unresolvable by engineering alone. Together they are one explanation for the gap the surveys keep measuring between what ML can demonstrate and what it is allowed — and able — to do in routine GMP.

Anatomy of the soft-sensor learning problem

To make the abstractions concrete, unpack the single most representative learning problem in upstream bioprocess: the titer soft sensor, a supervised regression that infers a continuous quantity from a spectrum. Laying it out field by field shows every assumption above in one frame — what the model gets, what it must produce, and exactly where the rulebook bends.

Anatomy identity card unpacking the titer soft-sensor learning problem. An indigo header reads learning problem, supervised regression, titer soft sensor, source BATCH-2026-001. A feature block lists the input: a 701-channel Raman spectrum wn_400 to wn_1800 shown as a sparkline, plus the aligned online state temperature, pH, and dissolved oxygen, all cheap and available every few minutes, marked thousands of rows per batch. A green target block holds the offline reference titer_g_L from the bench, marked expensive and sampled only twice per day, 28 times per batch, with a note that this is the cold-start scarcity. A cyan grouping block highlights the batch_id key that decides train versus test and warns that a row split leaks within-batch neighbours. A rose constraints block lists the five broken assumptions: small data of six batches not millions, cold-start cadence, run-to-run variability so batches are not i.i.d., non-stationarity and fast model decay, and the asymmetric cost of a wrong call under GMP. A violet maturity-and-evidence footer marks the soft sensor as production maturity for glucose, lactate, and titer, and notes viable cell density has no clean Raman signal. The titer soft sensor, fully unpacked: cheap fast features (a Raman spectrum plus online state) on one side, the scarce twice-a-day reference label on the other, the batch-id key that alone decides an honest split, and the five broken assumptions — small data, cold start, non-i.i.d. batches, non-stationarity, and asymmetric cost — that make this an ordinary-looking regression with an extraordinary set of caveats. Original diagram by the authors, created with AI assistance.

Read the card top to bottom and the whole chapter is laid out as fields. The features are the flood — a 701-channel spectrum and the online state, available continuously and nearly free. The target is the trickle — an offline titer that exists only because someone pulled a sample and ran a bench assay, one of about twenty-eight in the entire batch. The group key, batch_id, is the quiet field that decides everything: get the split wrong and the reported skill is fiction. And the constraints block carries the five broken assumptions as standing caveats, not footnotes. One subtlety the card names honestly: glucose, lactate, and titer have usable spectral signatures and their soft sensors are genuinely production, but viable cell density has no clean Raman signal, so VCD soft sensing remains the persistent weak spot — a reminder that "soft sensor" is not one solved thing but a family with very different maturity per analyte.
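Because batch_id alone decides whether a split is honest, a cheap guard is to check that no batch contributes rows to both sides. The helper name and toy data are hypothetical: this is a sketch of the check, not a function the suite is known to expose.

```python
import pandas as pd

def assert_no_batch_leak(train, test, batch_col="batch_id"):
    """Raise if any batch contributes rows to both train and test."""
    shared = set(train[batch_col]) & set(test[batch_col])
    if shared:
        raise ValueError(f"batch leakage across split: {sorted(shared)}")

# Hypothetical rows from three batches
df = pd.DataFrame({"batch_id": ["B1", "B1", "B2", "B2", "B3", "B3"],
                   "titer_g_L": [1.0, 1.2, 0.9, 1.1, 1.3, 1.4]})

honest_train, honest_test = df[df.batch_id != "B3"], df[df.batch_id == "B3"]
assert_no_batch_leak(honest_train, honest_test)        # passes silently

leaky_train, leaky_test = df.iloc[[0, 2, 4, 5]], df.iloc[[1, 3]]
try:
    assert_no_batch_leak(leaky_train, leaky_test)
except ValueError as err:
    print(err)
```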

The unsolved part: whether the data ceiling ever lifts

The honest open question is not whether any single one of the five tensions can be eased — several are being chipped at — but whether the small-data ceiling itself can ever be escaped. The candidate escape routes are real but unproven. Foundation and bioprocess time-series models promise to amortize learning across many processes so a new product starts from a strong prior instead of a cold start; today they are aspiration, not product, and it is genuinely uncertain whether enough comparable, shareable bioprocess data will ever exist to train them. Federated learning offers a way to pool data across companies without sharing it, but it has not crossed from discovery into manufacturing, where the data is more guarded and more heterogeneous still.
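For readers new to federated learning, the core aggregation step of the widely used FedAvg scheme is short: each site fits locally and shares only its parameters, and a coordinator averages them weighted by how much data each site used. The three sites, their coefficients, and their batch counts are hypothetical, and a real deployment repeats this over many rounds.

```python
import numpy as np

def fedavg(site_params, site_counts):
    """One FedAvg aggregation: sample-count-weighted mean of site parameters.
    Raw data never leaves a site; only parameter vectors are shared."""
    params = np.asarray(site_params, dtype=float)
    weights = np.asarray(site_counts, dtype=float)
    return (weights[:, None] * params).sum(axis=0) / weights.sum()

# Hypothetical: three companies, each with a locally fitted 3-coefficient model
site_params = [[1.0, 0.5, -0.2], [1.2, 0.4, -0.1], [0.8, 0.6, -0.3]]
site_counts = [12, 30, 6]      # batches each site trained on
global_params = fedavg(site_params, site_counts)
print(np.round(global_params, 3))
```

Averaging parameters only makes sense if every site fits the same model form on comparable features, which is exactly what heterogeneous manufacturing data makes hard.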

And there is a deeper limit worth stating now, because it shadows the entire book: even if the data ceiling lifted, the regulatory ceiling might not. The draft EU GMP Annex 22 would permit only static, deterministic models in critical applications and explicitly exclude dynamic, continuously-learning, probabilistic, and generative AI. The binding constraint on autonomous bioprocessing may turn out to be not what a model can learn but what we are willing to let an unsupervised model decide about a human medicine. That is not a problem more data solves; it is a question about trust and accountability, and it is, rightly, unresolved. This chapter only names it; the closing verdict settles it where it can.
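What "static" means operationally can be sketched as lock-then-relearn in miniature: fingerprint the validated parameters, and refuse to predict if they change without going back through change control. This is a hypothetical illustration of the idea only, not a description of what Annex 22 or any GMP system requires.

```python
import hashlib
import json

def fingerprint(params):
    """Deterministic SHA-256 of model parameters (sorted keys, fixed rounding)."""
    canonical = json.dumps(
        {k: [round(v, 12) for v in vals] for k, vals in sorted(params.items())},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class LockedModel:
    """A frozen linear model that predicts only if its parameters match the validated hash."""

    def __init__(self, params, validated_hash):
        self.params, self.validated_hash = params, validated_hash

    def predict(self, x):
        if fingerprint(self.params) != self.validated_hash:
            raise RuntimeError("parameters differ from the validated version; use change control")
        w, b = self.params["coef"], self.params["intercept"][0]
        return sum(wi * xi for wi, xi in zip(w, x)) + b

params = {"coef": [0.4, -0.1], "intercept": [1.5]}
model = LockedModel(params, fingerprint(params))   # hash recorded at validation time
print(model.predict([2.0, 3.0]))

model.params["coef"][0] = 0.41                     # an unreviewed "online update"
try:
    model.predict([2.0, 3.0])
except RuntimeError as err:
    print("blocked:", err)
```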

The two ladders: maturity and evidence

Because the field's central problem is the gap between what is shown and what is real, this book grades every claim on two independent ladders, and conflating them is the most common error in reading this literature.

Maturity answers "how far has it gotten?" — a three-rung ladder:

  • (production) — running in a GMP or commercial plant, touching real material and real decisions. The short, solid list: MSPC monitoring, Raman soft sensing of glucose and titer, vision inspection of vials, mechanistic chromatography modeling, review-by-exception execution.
  • (pilot) — demonstrated at or near manufacturing scale, often peer-reviewed, but not standing in routine GMP use. Hybrid digital twins, model-predictive control of capture, Bayesian-optimization process development.
  • (research) — academic or early-stage, not yet at scale.

This book tags applications with that rung inline — (production), (pilot), (research) — so a reader always knows how far a technique has actually traveled.

Evidence tier answers a different question — "how good is the evidence?" — a four-rung ladder this book attaches to every external number:

  • peer-reviewed-independent — published and verified by someone other than the builder. This is the fact floor: only at or above it may a number be stated as established fact.
  • peer-reviewed-self-authored — published, but by the team that built it.
  • vendor-self-reported — a company's own disclosed figure, unverified.
  • press-release-only — a single headline, no method.

The two ladders are independent. Automated visual inspection is production maturity but only vendor-self-reported tier (Amgen's "roughly 95% of syringes and vials auto-released" is a real deployment but a self-reported number). A peer-reviewed hybrid-modeling result can be only pilot maturity yet reach peer-reviewed-self-authored tier. You need both rungs to know what to do with a claim. The discipline this book follows without exception: never quote a number without its tier in the same sentence, and treat any efficiency headline below the fact floor as illustrative, not fact. The closing chapter makes this runnable — a structured ledger of named deployments where, of the field's sixteen most-cited results, zero clear the independent fact floor. That is not cynicism; it is the quantified shape of the self-reporting problem, and the single habit that most separates a careful reader from a credulous one.
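The two-ladder discipline is easy to encode so that a number cannot be printed without its tier. A minimal sketch: the class names are hypothetical, the visual-inspection entry paraphrases the Amgen example above, and the hybrid-model entry is a placeholder with no number.

```python
from dataclasses import dataclass
from enum import IntEnum

class Maturity(IntEnum):
    RESEARCH = 1
    PILOT = 2
    PRODUCTION = 3

class Evidence(IntEnum):
    PRESS_RELEASE_ONLY = 1
    VENDOR_SELF_REPORTED = 2
    PEER_REVIEWED_SELF_AUTHORED = 3
    PEER_REVIEWED_INDEPENDENT = 4

FACT_FLOOR = Evidence.PEER_REVIEWED_INDEPENDENT

@dataclass(frozen=True)
class Claim:
    name: str
    number: str
    maturity: Maturity
    evidence: Evidence

    def quote(self) -> str:
        """The only way to print the number: always alongside both rungs."""
        status = "fact" if self.evidence >= FACT_FLOOR else "illustrative"
        tier = self.evidence.name.lower().replace("_", "-")
        return f"{self.name}: {self.number} ({self.maturity.name.lower()}, {tier}; {status})"

ledger = [
    Claim("Automated visual inspection", "roughly 95% auto-released",
          Maturity.PRODUCTION, Evidence.VENDOR_SELF_REPORTED),
    Claim("Hybrid-model result (placeholder)", "<number>",
          Maturity.PILOT, Evidence.PEER_REVIEWED_SELF_AUTHORED),
]
for claim in ledger:
    print(claim.quote())
above_floor = sum(c.evidence >= FACT_FLOOR for c in ledger)
print(f"{above_floor} of {len(ledger)} claims clear the fact floor")
```

The two entries show the ladders moving independently: the higher-maturity claim carries the weaker evidence.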

What this chapter adds to the model suite

Each chapter of Book 5 contributes runnable code to examples/platform/ml/, and this opening chapter lays the cornerstone the rest stands on:

  • examples/platform/ml/dataio.py — the shared data layer and, crucially, the leakage-aware split helpers introduced above. It loads the trilogy's committed datasets (raman_spectra.parquet, fedbatch_state.parquet, offline_assays.csv, hplc_results.csv) keyed on batch identity, and exposes three splits with their honesty made explicit in the names: batch_split (hold out whole batches — the honest default), temporal_split (extrapolate forward in a single batch's timeline), and random_split (kept only to demonstrate the inflated metric). The contrast random_split versus batch_split is this chapter's lesson compiled into an API: the next chapter formalizes it, and every later model imports dataio so that the leak-free split is the path of least resistance rather than a discipline anyone has to remember.
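The temporal_split named above is not shown in this chapter, so its real signature is unknown here. The function below is a hypothetical stand-in that captures the described behavior on one batch's timeline: train on earlier rows, test on later ones, so the model must extrapolate forward rather than interpolate.

```python
import pandas as pd

def temporal_split_sketch(df, time_col, cutoff):
    """Hypothetical stand-in for dataio.temporal_split: rows before `cutoff`
    train, rows at or after it test, so evaluation is forward in time."""
    ordered = df.sort_values(time_col)
    is_test = ordered[time_col] >= cutoff
    return ordered[~is_test], ordered[is_test]

# Hypothetical single-batch timeline
df = pd.DataFrame({"t_h": [0, 24, 48, 72, 96, 120, 144],
                   "titer_g_L": [0.0, 0.3, 0.7, 1.2, 1.8, 2.4, 2.9]})
train, test = temporal_split_sketch(df, "t_h", cutoff=96)
print(len(train), len(test), train["t_h"].max(), test["t_h"].min())
```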

Why it matters

Everything in this book depends on getting the framing of this chapter right, because the most common way a bioprocess ML project dies is not a bad algorithm — it is a good algorithm applied as if the textbook's promises held. A team that splits its rows at random reports a fantasy R², files it, and watches the model collapse on the seventh batch. A team that reaches for an autonomous twin before its monitoring layer is solid builds on sand. A team that quotes a vendor's titer headline as fact loses its own credibility when the number cannot be reproduced. The antidote is the calibration this chapter installs: know which learning family a task belongs to, know which of the five assumptions it breaks, know which maturity rung it sits on, and know which evidence tier a number carries. None of that is a model architecture. All of it is the judgment that decides whether anything you build is real.

In the real world

The demo-to-plant gap is the most consistently measured finding in the field. The ISPE 7th Pharma 4.0 survey puts AI/ML at the most pilots and fewest scaled deployments of any digital technology, with the production deployments clustering exactly in monitoring, predictive maintenance, vision inspection, and human-in-the-loop documentation — never autonomous control of a CQA [1]. McKinsey's State of AI finds the same shape across industries: near-universal adoption, a sliver of enterprise-wide impact [2]. And the bioprocess ML reviews single out the small-data / cold-start regime and data leakage from improper validation as the two technical reasons reported successes so often fail to reproduce or transfer [3] — which is precisely the random-split lie this chapter opened with, named by the literature as a field-wide failure mode rather than a beginner's slip. The regulatory scaffolding converges on the same reading: the FDA's 2023 discussion paper on AI in drug manufacturing and the draft EU Annex 22 both keep learning models out of the critical path until they can be validated like the regulated objects they are [4]. The honest one-sentence summary the rest of the book unpacks: ML in biomanufacturing is production-grade for seeing and inferring, pilot-grade for optimizing, and deliberately fenced out of autonomously deciding — and the fence is there on purpose.

Key terms

  • Demo-to-plant gap — the structural distance between a model that performs in a curated demonstration and one that runs unattended every shift under GMP; measured by surveys as AI/ML's most-pilots, fewest-scaled profile.
  • Supervised learning — fitting a function from labeled examples; regression for continuous targets (the soft sensor), classification for discrete labels (vision inspection, OOS prediction).
  • Unsupervised learning — learning the shape of "normal" without labels; in bioprocess, MSPC golden-batch monitoring and anomaly detection.
  • Reinforcement learning / MPC — learning or optimizing a control policy by feedback; data-hungry, so mostly research/pilot in bioprocess, where physics-based MPC dominates the critical path.
  • Soft sensor — a regression model that infers an expensive offline quantity (titer, glucose, lactate) from a cheap in-line signal (a Raman spectrum) between reference samples.
  • Small-data ceiling — the binding constraint of bioprocess ML: dozens of costly runs to learn from, not millions, which is why hybrid modeling and priors beat black boxes.
  • Cold start — the once-or-twice-a-day cadence of offline reference measurements that limits how fast a model can learn and how late drift is detected.
  • Run-to-run variability — the biological non-independence and non-identity of batches that breaks the i.i.d. assumption and forces whole-batch held-out splits.
  • Non-stationarity / model decay — the process moving under the model (cell drift, resin aging, lot changes), so a model must be distrusted on a schedule, not validated once.
  • Data leakage (random-split trap) — reporting an inflated metric because near-duplicate within-batch neighbours fall on both sides of a row-wise split; the field's most common validation error.
  • Maturity ladder — production / pilot / research: how far a deployment has actually gotten.
  • Evidence tier — press-release-only / vendor-self-reported / peer-reviewed-self-authored / peer-reviewed-independent; the last is the fact floor at or above which a number may be stated as fact.

Where this leads

We have the frame: what learning is here, why the living process breaks the rulebook in five specific ways, and the two ladders we will grade every claim on. The binding constraint, the chapter has argued, is not the algorithm but the data — its scarcity, its cadence, its readiness. The next chapter, Data, the Fuel, goes one level down into exactly that: how to turn a real, messy bioprocess data estate — historian streams, sparse offline assays, hybrid paper-and-digital records — into the leak-free fuel a model can actually burn, building the dataio.py foundation this chapter previewed into the data layer the whole suite stands on. Fix the data first; the engine comes after.