
Process Development: Bayesian Optimization Beats the Factorial Grid

📍 Where we are: Part II · Discovery & Development, Learned — Chapter 7. The last chapter ranked the clones and handed us a winner — a CHO line that makes mAb-A. Now that line needs a process: a medium, a feed, a temperature, a pH. Classical development walks a factorial grid of experiments to find good settings. This chapter is about the learning method that walks far fewer of them, and walks them on purpose.

A clone is not a process. The cell line from WCB-CHO-001 can in principle make a great deal of antibody, but only inside a narrow envelope of conditions it has never been told. What feed schedule? What glucose setpoint? What temperature, and should it shift mid-run? Process development (PD) is the search for that envelope, and historically the search has been a design of experiments (DoE) — a structured grid of bioreactor runs, often a full or fractional factorial, that maps a response surface across a handful of factors. The grid is principled and auditable, and it is also expensive: a single Ambr or bench run costs weeks of a scientist's time and a parallel reactor slot, and the grid's size explodes with every factor you add.

This chapter argues that the single strongest, most defensible machine-learning application in PD is not a black-box predictor of titer but a search strategy: Bayesian optimization (BO) built on a Gaussian process (GP). BO does not try to map the whole response surface. It builds a probabilistic belief about where the optimum is, and at each step it spends the next precious run where that belief says it will learn the most. Against the same goal, BO routinely reaches a better optimum in a fraction of the runs a factorial grid needs — and unlike a grid, it gets smarter as it goes.

The simple version

Imagine hunting for the deepest point in a dark, hilly lake with only a few sonar pings to spend. A factorial DoE drops the pings on a fixed grid — evenly spaced, decided in advance, blind to what it finds. Bayesian optimization drops one ping, looks at the depth, builds a guess of where the lake floor probably dips, and aims the next ping at the most promising spot — balancing "go where it already looks deep" against "go somewhere we know nothing about." After a dozen pings it has usually found a deeper hole than the grid found with fifty, because every ping after the first was chosen on purpose. The Gaussian process is the running guess of the lake floor; the acquisition function is the rule for where to ping next.

What this chapter covers

We frame PD as sequential black-box optimization under a tight experimental budget, then build it up piece by piece: the search space of feed and process parameters; the Gaussian process as a surrogate model that returns a mean and an uncertainty everywhere; the acquisition function (Expected Improvement and friends) that turns that uncertainty into the next experiment; the multi-objective case where titer fights product quality and the answer is a Pareto front, not a point; the ML-assisted design space and how BO connects to Quality by Design (QbD) and ICH Q8; the high-throughput Ambr automation that makes the loop physically runnable; and an honest accounting of where autonomous self-driving labs actually stand. The runnable artifact, examples/platform/ml/bayesopt_doe.py, optimizes a feed policy against the fed-batch simulator and watches BO overtake a Latin-hypercube DoE within roughly a dozen simulated runs.

The task: sequential optimization under a brutal budget

Most of this book's learning tasks are prediction — given features, estimate a number. PD's central task is different. It is optimization: find the input setting x* that maximizes an objective f(x) — say, day-14 titer — where f is an expensive, noisy, black-box function you can only learn by running an experiment. You cannot differentiate f; you cannot evaluate it a million times; each evaluation is a two-week bioreactor run. This is precisely the regime BO was invented for, and it maps onto bioprocess PD almost without distortion [1].

Frame it concretely against our running example. The clone is fixed (WCB-CHO-001 → mAb-A). The decision variables are a small vector of process settings — for our simulator, a feed policy: how aggressively to bolus glucose and glutamine on feed days, plus a temperature-shift choice. The objective is the run's day-14 titer (with quality constraints we will add shortly). The budget is the number of reactor runs we are willing to spend before locking a process — in real PD often ten to forty, in our simulated demonstration a couple of dozen. The question BO answers is not "what is the response surface" but "where do I run next so that, after my budget is gone, the best run I have seen is as good as possible."

Why does a factorial DoE struggle here? Three reasons. First, the curse of dimensionality: a full factorial at three levels across five factors is 3^5 = 243 runs, which no PD team can afford, so they fall back to fractional designs that confound interactions. Second, the grid is non-adaptive — every point is chosen before the first result arrives, so a run that lands in an obviously hopeless corner is wasted. Third, a DoE typically fits a low-order polynomial response surface that cannot represent the sharp ridges and plateaus real bioprocess optima sit on. BO replaces the polynomial with a Gaussian process that bends to the data, and replaces the fixed grid with a feedback loop.
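The arithmetic behind the first reason is easy to check. The short sketch below enumerates full-factorial designs at three levels per factor; the helper name `full_factorial` is ours, introduced only for illustration.

```python
from itertools import product

def full_factorial(levels: int, factors: int) -> list:
    """Enumerate every level combination of a full factorial design."""
    return list(product(range(levels), repeat=factors))

# Run count at three levels per factor, for two to six factors.
for k in range(2, 7):
    print(f"{k} factors at 3 levels: {len(full_factorial(3, k))} runs")
```

Each added factor triples the run count, which is why teams retreat to fractional designs well before five factors.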

Evidence

The claim that BO reaches a competitive optimum in materially fewer experiments than classical DoE is supported across recent peer-reviewed bioprocess work — thermodynamics-aware BO validated in Ambr15 micro-bioreactors for media design [2], iterative BO for cell-culture media development [3], and a 2025 tutorial review of BO in bioprocess engineering [1] (all research, peer-reviewed-independent). Vendor "fewer experiments" headlines (e.g. 40–80%) are vendor-self-reported and are flagged as such later in this chapter; do not conflate the two tiers.

The Gaussian process: a belief about the surface, with honest error bars

The heart of BO is the surrogate model — a cheap stand-in for the expensive function. The standard choice is a Gaussian process, and the reason is its single most useful property: a GP returns, at every point in the search space, not just a predicted mean but a calibrated uncertainty. Where you have run experiments, the uncertainty collapses toward the measurement noise; far from any data, it balloons. That uncertainty field is what lets BO decide where exploring is worth it.

Formally, a GP places a distribution over functions: any finite set of points has a joint Gaussian distribution governed by a mean function (usually taken as zero or a constant after centering) and a covariance or kernel function k(x, x') that encodes how similar two settings' outcomes should be. The workhorse kernel is the Matérn (or squared-exponential) kernel, which says nearby settings give similar titers and lets you tune smoothness and a per-dimension length scale. Conditioned on the runs observed so far, the posterior mean and variance at a new candidate x have closed forms:

  • the posterior mean μ(x) is the GP's best guess of the titer at x;
  • the posterior variance σ²(x) is how unsure it is there.
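The closed forms mentioned above fit in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the chapter's artifact: it uses a squared-exponential kernel with a fixed, hand-picked length scale (a real campaign would fit it), a constant mean obtained by centering, and three invented one-dimensional runs.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between rows of A (n, d) and B (m, d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X, y, X_new, length_scale=1.0, signal_var=1.0, noise_var=1e-2):
    """Posterior mean and (latent) variance of a constant-mean GP at X_new."""
    y_mean = y.mean()                                   # constant mean via centering
    K = sq_exp_kernel(X, X, length_scale, signal_var) + noise_var * np.eye(len(X))
    K_s = sq_exp_kernel(X, X_new, length_scale, signal_var)
    L = np.linalg.cholesky(K)                           # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    mu = y_mean + K_s.T @ alpha                         # posterior mean
    v = np.linalg.solve(L, K_s)
    var = signal_var - (v ** 2).sum(axis=0)             # posterior variance
    return mu, np.maximum(var, 0.0)

# Invented toy data: titer (g/L) at three values of a single feed multiplier.
X = np.array([[0.8], [1.0], [1.3]])
y = np.array([4.9, 5.6, 5.2])
X_new = np.array([[1.0], [3.0]])                        # one observed point, one far away
mu, var = gp_posterior(X, y, X_new, length_scale=0.3)
print("mean:", mu.round(3), "variance:", var.round(3))
```

At the observed setting the variance is close to zero; at the distant setting it returns to the prior variance and the mean falls back to the average of the data, which is the "balloons far from data" behavior described earlier.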

The length scales are not guessed; they are fit by maximizing the marginal likelihood of the observed runs, which is why a GP with only a dozen data points can still be well-calibrated — it has very few parameters to learn, exactly the discipline the small-data chapter argued for. A GP is, in effect, the hybrid-modeling instinct of Book 2 applied to the search rather than the prediction: it builds in a smoothness prior so the data has less to do.

The acquisition function: turning uncertainty into the next run

A surrogate alone does not pick experiments. The acquisition function does. It scores every candidate setting by how useful running it would be, then BO runs the maximizer of that score. Every acquisition function is some balance of two impulses: exploitation (run where the mean is already high) and exploration (run where the variance is high, because you might be missing a better region). Tilt all the way to exploitation and BO climbs the nearest hill and stops; tilt all the way to exploration and it scatters like a grid. The art is the balance.

The most common acquisition function is Expected Improvement (EI). Let f* be the best titer observed so far. EI scores a candidate by the expectation, under the GP posterior, of how much it would improve on f*:

EI(x) = E[ max(f(x) − f*, 0) ]

Because the posterior at x is Gaussian with mean μ(x) and standard deviation σ(x), EI has a closed form built from the normal PDF and CDF. The closed form matches the intuition: a point scores high either because its mean is well above f* (promising on average) or because its uncertainty is large enough that the upside tail pokes above f* (a long shot worth taking). A point that is confidently mediocre scores near zero, so BO almost never spends a run there. Two other common choices are Upper Confidence Bound (UCB), μ(x) + κ·σ(x), which makes the exploration weight an explicit knob κ, and Probability of Improvement, which is greedier and less used. EI is the usual default because in its basic form it needs no tuning knob and behaves well out of the box [1].
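To make the three cases concrete, here is a small sketch of the closed form, EI = (μ − f*)·Φ(z) + σ·φ(z) with z = (μ − f*)/σ, evaluated on three invented candidates. It uses only the standard library and omits the small margin `xi` that the chapter's script adds; the candidate values and the choice κ = 2 for UCB are assumptions for illustration.

```python
import math

def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu: float, sigma: float, f_best: float) -> float:
    """Closed-form EI for maximization under a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - f_best, 0.0)
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm_cdf(z) + sigma * norm_pdf(z)

def upper_confidence_bound(mu: float, sigma: float, kappa: float = 2.0) -> float:
    return mu + kappa * sigma

f_best = 5.8  # incumbent best titer, g/L (invented)
candidates = {
    "promising on average": (6.0, 0.10),   # (posterior mean, posterior std)
    "long shot":            (5.5, 0.60),
    "confidently mediocre": (5.5, 0.05),
}
scores = {name: expected_improvement(m, s, f_best) for name, (m, s) in candidates.items()}
for name, (m, s) in candidates.items():
    print(f"{name:22s} EI={scores[name]:.4f}  UCB={upper_confidence_bound(m, s):.2f}")
```

The long shot and the mediocre point share the same mean; only the uncertainty separates them, and EI rewards it.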

The loop, then, is five lines of logic: (1) fit the GP to all runs so far; (2) maximize EI over the search space to pick the next setting; (3) run that experiment — in our demo, simulate the fed-batch; (4) append the result; (5) repeat until the budget is gone. The expensive step is (3); everything else is milliseconds. That asymmetry — cheap to decide, costly to evaluate — is the whole reason BO exists.

Figure (description): Hero diagram of the Bayesian optimization loop for process development: on the left a small set of past fed-batch runs plotted as titer versus a feed-aggressiveness axis; a cyan Gaussian-process surrogate curve drawn through them with a shaded uncertainty band that is narrow at observed points and wide between them; below it a green Expected-Improvement curve peaking in an unexplored region where the upside tail is large; an arrow from that EI peak selecting the next setting; that setting feeding an indigo fed-batch reactor run box (the simulator) that returns a new titer point; and a return arrow folding the new point back into the surrogate, the loop labelled fit, select, run, update; a side panel contrasts a fixed factorial grid of evenly spaced points against the adaptive BO points clustering near the optimum.

Caption: Bayesian optimization as a feedback loop: a Gaussian process turns a few expensive runs into a belief surface with honest error bars, Expected Improvement spends the next run where that belief says the payoff is largest, and the new result sharpens the surface, so the points cluster near the optimum instead of tiling a grid. Original diagram by the authors, created with AI assistance.

Building it on the simulator: BO versus a Latin-hypercube DoE

The runnable artifact makes the argument concrete by optimizing a feed policy against the 14-day fed-batch CHO simulator the whole series shares. The simulator integrates Monod-limited growth, a lactate-inhibited death phase, and bolus feeds on days 3, 5, 7, 9, 11, 13; its day-14 titer is the objective. We expose a four-parameter search space — a glucose-bolus multiplier, a glutamine-bolus multiplier, a glucose Monod-region setpoint proxy, and a day-7 temperature-shift flag — and ask BO to maximize titer under a constraint that lactate must stay in band (so the model cannot "win" by starving the culture into a corner the real process would reject).

The experiment is a head-to-head: a space-filling Latin-hypercube DoE (the strong modern baseline, far better than a coarse factorial) versus GP-BO with Expected Improvement, both given the same budget. We run them on the same simulator with fixed seeds so the comparison is reproducible, and we track the best titer found so far after each run.

# examples/platform/ml/bayesopt_doe.py: GP-BO over the fed-batch feed policy.
import numpy as np
from scipy.stats import norm, qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, ConstantKernel, WhiteKernel
from bioproc_sim.fed_batch import simulate_with_policy  # thin wrapper exposing the feed knobs

# 4-D search space: [glc_bolus_mult, gln_bolus_mult, glc_setpoint, temp_shift_flag]
BOUNDS = np.array([[0.6, 1.8], [0.6, 1.8], [3.0, 8.0], [0.0, 1.0]])
RNG = np.random.default_rng(2026)

def objective(x):
    """Run the fed-batch sim under feed policy x; return day-14 titer, penalized if lactate OOB."""
    res = simulate_with_policy(x, batch_id="PD-DOE")
    titer = res.state["titer_g_L"].iloc[-1]
    lac_max = res.state["lactate_g_L"].max()
    penalty = max(0.0, lac_max - 3.5) * 1.5  # keep lactate in band (soft constraint)
    return titer - penalty

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """Closed-form EI for maximization; xi is a small margin that nudges exploration."""
    sigma = np.maximum(sigma, 1e-9)  # guard against division by zero
    z = (mu - f_best - xi) / sigma
    return (mu - f_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def bayesopt(n_init=5, n_iter=15):
    # 1. seed with a small space-filling sample
    X = qmc.scale(qmc.LatinHypercube(d=4, seed=2026).random(n_init), BOUNDS[:, 0], BOUNDS[:, 1])
    y = np.array([objective(x) for x in X])
    kernel = ConstantKernel(1.0) * Matern(length_scale=[1.0] * 4, nu=2.5) + WhiteKernel(0.05)
    for k in range(n_iter):
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, n_restarts_optimizer=4)
        gp.fit(X, y)  # 2. fit surrogate to all runs so far
        # 3. maximize EI over a fresh random candidate set
        cand = qmc.scale(qmc.LatinHypercube(d=4, seed=k).random(2048), BOUNDS[:, 0], BOUNDS[:, 1])
        mu, sd = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(expected_improvement(mu, sd, y.max()))]
        # 4. run the chosen experiment and append the result
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    return X, y

def latin_hypercube_doe(n=20):
    X = qmc.scale(qmc.LatinHypercube(d=4, seed=7).random(n), BOUNDS[:, 0], BOUNDS[:, 1])
    return X, np.array([objective(x) for x in X])

if __name__ == "__main__":
    Xb, yb = bayesopt(n_init=5, n_iter=15)  # 20 runs total
    Xd, yd = latin_hypercube_doe(n=20)      # 20 runs total
    print(f"BO best titer after 20 runs: {yb.max():.3f} g/L (best found at run {yb.argmax()+1})")
    print(f"DoE best titer after 20 runs: {yd.max():.3f} g/L")
    hits = np.flatnonzero(yb >= yd.max())   # runs at which BO met or beat the DoE's best
    if hits.size:
        print(f"BO reached DoE's best ({yd.max():.3f}) after {hits[0] + 1} runs")
    else:
        print(f"BO did not reach DoE's best ({yd.max():.3f}) within the budget")

A representative run prints the pattern the chapter is about (numbers illustrative, from a simulator run):

BO best titer after 20 runs: 6.142 g/L (best found at run 13)
DoE best titer after 20 runs: 5.873 g/L
BO reached DoE's best (5.873) after 9 runs

The headline is not the absolute titer — that is the simulator's, not a real plant's — but the shape: BO matched the 20-run DoE's best with roughly half the budget, then kept climbing while the DoE had nothing left to spend. After the initial space-filling seed, every BO run was placed on purpose, and the placements migrated toward the feed-aggressive, lactate-respecting corner the surrogate learned to trust. That is the entire economic argument for BO in PD, reproduced in code you can run.
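The comparison rests on a best-so-far trace, which is simple enough to sketch with the standard library. The helper names and the titer values below are invented for illustration and are not part of the chapter's script.

```python
from itertools import accumulate

def best_so_far(y):
    """Running maximum: the best objective value seen after each run."""
    return list(accumulate(y, max))

def runs_to_reach(y, target):
    """1-based index of the first run whose best-so-far meets target, else None."""
    for run, best in enumerate(best_so_far(y), start=1):
        if best >= target:
            return run
    return None

# Invented titers (g/L) from five sequential runs.
titers = [5.1, 4.9, 5.6, 5.4, 6.0]
print(best_so_far(titers))          # running best after each run
print(runs_to_reach(titers, 5.5))   # first run at which 5.5 g/L is met
print(runs_to_reach(titers, 7.0))   # target never reached
```

Returning `None` when the target is never met keeps a failed campaign from being reported as a success at run 1.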

Multi-objective BO: when titer fights quality

Real PD never optimizes titer alone. Push feed and temperature for maximum titer and you can quietly degrade glycosylation, raise aggregate (HMW), or shift charge variants — the very critical quality attributes the release panel will judge. The honest objective is therefore a vector: maximize titer and keep monomer purity high and hold a glycoform in spec. These objectives conflict, so there is no single best point — there is a Pareto front, the set of settings where you cannot improve one objective without sacrificing another.

Multi-objective Bayesian optimization (MOBO) searches for that front directly. The clean construction fits a separate GP per objective and replaces single-objective EI with a multi-objective acquisition function — most commonly Expected Hypervolume Improvement (EHVI), which scores a candidate by how much it would grow the volume of objective-space dominated by the current front. The output is not a recommendation but a menu: a Pareto set of process settings, each a different defensible trade-off between yield and quality, from which the team picks according to the target product profile.
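A small sketch may help fix the two ideas in this section: the Pareto front and the hypervolume it dominates. The code below handles two maximized objectives only, uses invented (titer, monomer %) pairs and an arbitrary reference point, and computes a deterministic hypervolume gain for one candidate. EHVI proper is the expectation of that gain under the GP posteriors, which this sketch does not attempt.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Points not dominated by any other point (all objectives maximized)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

def hypervolume_2d(front, ref):
    """Area dominated by a 2-D maximization front, measured from reference point ref."""
    area, prev_y = 0.0, ref[1]
    for x, y in sorted(front, key=lambda p: p[0], reverse=True):
        if y > prev_y:
            area += (x - ref[0]) * (y - prev_y)   # strip added by this front point
            prev_y = y
    return area

# Invented runs: (titer g/L, monomer purity %).
runs = [(6.1, 95.0), (5.8, 97.5), (5.2, 98.4), (5.5, 96.0), (4.9, 98.0)]
ref = (4.0, 94.0)                                  # worst acceptable corner (assumed)
front = pareto_front(runs)
hv_now = hypervolume_2d(front, ref)

candidate = (6.0, 97.0)                            # a hypothetical new result
hv_gain = hypervolume_2d(pareto_front(runs + [candidate]), ref) - hv_now
print("front:", front)
print(f"hypervolume: {hv_now:.2f}, gain from candidate: {hv_gain:.2f}")
```

Two of the five runs drop out as dominated, and the candidate earns its score by filling the gap between the high-titer and high-purity ends of the front.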

Evidence

A 2025 ETH Zurich + Novo Nordisk study used multi-objective BO (per-objective GPs plus an NSGA-II front search, on a modified ProcessOptimizer) to develop biologics formulations, finding optimal formulations in 33 experiments and improving a diffusion-interaction parameter from 9.1 to 48.6 mL/g — research, peer-reviewed-independent [4]. For the culture side, a 2025 study explicitly targets ML-guided CHO bioprocess and media optimization for improved titer and glycosylation — the titer-versus-quality trade-off this section describes — research, peer-reviewed-independent [5]. MOBO for titer-versus-glycosylation is demonstrated, not yet routine GMP practice.

This is also where BO stops being "just an optimizer" and starts being a design-space tool — which is what makes it palatable to the quality organization.

From a point to a design space: BO meets QbD and ICH Q8

A regulator does not want your single best setting; under Quality by Design (QbD), codified in ICH Q8(R2), they want a design space — the multidimensional region of input ranges and process parameters that has been demonstrated to provide assurance of quality, inside which you may move without filing a change [6]. Classically a design space is carved out of a DoE's response surface. The learning version replaces that surface with a probabilistic model and reports the design space as a region where the probability of meeting all CQAs exceeds a threshold — a Bayesian probabilistic design space [7][8].

The GP that BO builds is such a model. Its posterior gives, at every setting, a predicted mean and a calibrated uncertainty for each CQA — exactly the ingredients to compute "probability this setting passes." So the same machinery that found the optimum can be reused to describe the safe region around it, with uncertainty made explicit rather than hidden inside a polynomial's residuals. That dual use is why BO sits so well with QbD: it is an optimizer for the development scientist and a design-space generator for the regulatory filing, from one fitted model. The connection runs straight into Book 4's ontology, where bp:FeedRate bp:affectsQuality bp:MonomerPct-CQA is exactly the type-level edge a probabilistic design space quantifies — BO supplies the evidence and the ranges that edge needs.
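The "probability this setting passes" calculation can be sketched directly from a posterior mean and standard deviation per CQA. The sketch below is an illustration under loud assumptions: the CQA names, spec limits, and posterior values are invented, and the joint probability is taken as a product, which assumes the CQA predictions are independent (real CQAs are often correlated).

```python
import math

def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_in_spec(mu, sigma, lower=-math.inf, upper=math.inf):
    """P(lower <= CQA <= upper) under a Gaussian prediction N(mu, sigma^2)."""
    return norm_cdf((upper - mu) / sigma) - norm_cdf((lower - mu) / sigma)

def prob_all_cqas_pass(predictions, specs):
    """Joint pass probability, ASSUMING independent CQA predictions."""
    p = 1.0
    for name, (mu, sigma) in predictions.items():
        lower, upper = specs[name]
        p *= prob_in_spec(mu, sigma, lower, upper)
    return p

# Hypothetical specs: monomer at least 97 %, HMW aggregate at most 2 %.
specs = {"monomer_pct": (97.0, math.inf), "hmw_pct": (-math.inf, 2.0)}

# Same predicted means, different uncertainty: near the data vs. far from it.
near_data = {"monomer_pct": (98.0, 0.3), "hmw_pct": (1.2, 0.2)}
far_from_data = {"monomer_pct": (98.0, 1.5), "hmw_pct": (1.2, 1.0)}

p_near = prob_all_cqas_pass(near_data, specs)
p_far = prob_all_cqas_pass(far_from_data, specs)
threshold = 0.95
print(f"near data: {p_near:.3f} -> inside design space: {p_near >= threshold}")
print(f"far away:  {p_far:.3f} -> inside design space: {p_far >= threshold}")
```

The two settings have identical predicted means; only the wider uncertainty pushes the second one out of the region, which is the behavior a probabilistic design space is meant to expose.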

A caution belongs here. A GP's uncertainty is only as honest as its kernel and its data. With a dozen runs in four dimensions, the posterior far from the data is a prior assumption wearing the costume of a measurement, and a probabilistic design space drawn from it can look more authoritative than it is. The validation chapter and the FDA's model-credibility framework both insist the design space's claims be checked against held-out confirmation runs before they govern anything — the model proposes the region; confirmation runs ratify it.

Anatomy of one Bayesian-optimization iteration

The series signature is to take one record apart. For BO the record is not a batch or a prediction — it is one iteration of the loop, the unit that turns the entire run history into a single chosen experiment. Unpack it and the chapter's logic is laid out as fields.

Figure (description): Anatomy identity card unpacking one Bayesian-optimization iteration: an indigo header naming iteration 8 of the feed-policy campaign for mAb-A; an inputs block listing the run history so far as paired settings and titers; a cyan surrogate block showing the fitted Gaussian process with its Matern kernel, fitted length scales per dimension, and the noise term; a green acquisition block holding the Expected-Improvement function, the incumbent best titer f-star, and the EI-maximizing candidate setting with its predicted mean and uncertainty; an amber decision block naming the next experiment to run (the four feed-policy values) and the constraint check that lactate stays in band; a violet provenance block recording the random seed, the acquisition type EI, the budget consumed, and a link to the simulator run that will return the new titer; a footer noting the iteration appends one row to the history and the loop repeats.

Caption: One BO iteration, fully unpacked: the run history in, a fitted Gaussian process with its kernel and length scales, the Expected-Improvement score over candidates, the chosen next experiment with its predicted mean and uncertainty and its constraint check, and the provenance (seed, acquisition type, budget) that makes the decision auditable and reproducible. Original diagram by the authors, created with AI assistance.

Read the card top to bottom. The inputs are the entire history — every (setting, titer) pair run so far — because BO is stateful in a way a one-shot predictor is not: the next decision depends on all prior results, not the latest one. The surrogate block is the fitted GP: its kernel family (Matérn ν=2.5), the fitted length scales per dimension (a short length scale on a dimension means titer is sensitive there — the model has learned which knobs matter), and the noise term that keeps it from interpolating sensor jitter. The acquisition block holds the incumbent best f*, the EI surface over thousands of candidate settings, and the argmax — the single setting EI says is most worth running, reported with its predicted mean and uncertainty so a scientist can see why it was chosen (high mean? high variance? both?). The decision block is the experiment to actually run, plus the constraint check that the candidate keeps lactate in band, so BO cannot recommend a setting the process would reject. The provenance block — seed, acquisition type, budget consumed, link to the run that will return the result — is what makes the iteration auditable and exactly reproducible, the discipline the MLOps chapter demands of any model that touches development decisions. The record is the loop made inspectable: nothing about the next experiment is arbitrary, and every field can be replayed.
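One way to make the card concrete is as an immutable record that can be serialized and replayed. The sketch below is hypothetical: the class name, field names, run-link format, and every value are invented for illustration and are not taken from the chapter's artifact.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class BOIteration:
    """One iteration of the loop, with the card's five blocks as fields."""
    iteration: int
    # inputs: the full run history
    history_x: tuple          # settings run so far, one tuple per run
    history_y: tuple          # matching titers, g/L
    # surrogate
    kernel: str
    length_scales: tuple      # one per search dimension
    noise_level: float
    # acquisition
    acquisition: str
    incumbent_best: float
    candidate_mean: float
    candidate_std: float
    # decision
    next_setting: tuple
    lactate_in_band: bool
    # provenance
    seed: int
    budget_used: int
    run_link: str

record = BOIteration(
    iteration=4,
    history_x=((1.0, 1.0, 5.0, 0.0), (1.4, 0.9, 6.0, 1.0), (0.8, 1.6, 4.0, 1.0)),
    history_y=(5.1, 5.6, 4.8),
    kernel="Matern(nu=2.5)",
    length_scales=(0.4, 1.1, 2.5, 0.7),
    noise_level=0.05,
    acquisition="EI",
    incumbent_best=5.6,
    candidate_mean=5.7,
    candidate_std=0.3,
    next_setting=(1.5, 1.0, 6.5, 1.0),
    lactate_in_band=True,
    seed=2026,
    budget_used=3,
    run_link="sim://PD-DOE/run-0004",   # hypothetical identifier format
)
payload = json.dumps(asdict(record), indent=2)   # tuples serialize as JSON lists
print(payload)
```

Freezing the dataclass reflects the audit intent: a logged iteration is a fact about what was decided and should not be edited after the run returns.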

High-throughput automation: the Ambr makes the loop physically runnable

BO's loop is only as fast as its slowest step — running the experiment. A loop that waits two weeks per single bench run, one at a time, is mathematically elegant and operationally hopeless. What makes BO practical in PD is high-throughput micro-bioreactor automation, above all the Ambr (advanced micro-bioreactor) systems that run 24 or 48 parallel 10–250 mL cultures with automated liquid handling, sampling, and feeding. Now a BO "iteration" can propose a batch of settings — 24 at once — run them in parallel, and fold all 24 results back into the surrogate. Batch BO acquisition functions (q-EI, q-EHVI) are designed for exactly this: pick q diverse, jointly informative settings rather than one, so the parallel slots are not wasted on near-duplicate runs.
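Joint batch acquisitions such as q-EI are involved to implement, but the idea of filling parallel slots with diverse settings can be sketched with a simpler, different heuristic often called the Kriging believer: pick the EI maximizer, pretend its result equals the posterior mean, refit, and repeat. The code below is that heuristic, not q-EI. It assumes a squared-exponential kernel with a fixed length scale, a two-dimensional unit search space, and an invented toy objective.

```python
import math
import numpy as np

def kernel(A, B, ls=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def posterior(X, y, C, noise=1e-3):
    """GP posterior mean and std at candidates C (unit prior variance, centered y)."""
    K = kernel(X, X) + noise * np.eye(len(X))
    Ks = kernel(X, C)
    mu = y.mean() + Ks.T @ np.linalg.solve(K, y - y.mean())
    var = 1.0 - (Ks * np.linalg.solve(K, Ks)).sum(0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def ei(mu, sd, best):
    z = (mu - best) / sd
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sd * pdf

def propose_batch(X, y, candidates, q):
    """Kriging-believer batch: q settings chosen one at a time with pseudo-results."""
    X_aug, y_aug, chosen = X.copy(), y.copy(), []
    f_best = y.max()                       # incumbent from real runs only
    for _ in range(q):
        mu, sd = posterior(X_aug, y_aug, candidates)
        score = ei(mu, sd, f_best)
        if chosen:
            score[chosen] = -np.inf        # never propose the same candidate twice
        j = int(np.argmax(score))
        chosen.append(j)
        # "Believe" the surrogate: add the predicted mean as a pseudo-observation,
        # which collapses the uncertainty near this pick for the next selection.
        X_aug = np.vstack([X_aug, candidates[j]])
        y_aug = np.append(y_aug, mu[j])
    return candidates[chosen]

rng = np.random.default_rng(0)
X = rng.uniform(size=(5, 2))               # five settings already run (invented)
y = -((X - 0.6) ** 2).sum(axis=1)          # invented toy objective
candidates = rng.uniform(size=(256, 2))
batch = propose_batch(X, y, candidates, q=4)
print(batch.round(3))
```

The pseudo-observation is what spreads the batch: once a pick is "believed", its neighborhood no longer looks uncertain, so the next pick moves elsewhere.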

This is the configuration the strongest real demonstrations use. Thermodynamics-aware BO for media design was validated in Ambr15 micro-bioreactors [2]; a recent self-driving perfusion-development study (DataHow, Sartorius, and Merck/Ares Trading) ran Bayesian optimal experimental design with a cognitive digital twin across 24 parallel Ambr250 mini-bioreactors over a 27-day cultivation, transferring learning between cell lines [9]. The pairing is the point: BO supplies the brains (where to run next), the Ambr supplies the hands (running 24 at once), and the historian and contextualization layer from Book 2 supply the memory (so each run's result lands as a clean, attributable row the surrogate can trust). Retrofitting in-line Raman onto Ambr — Sartorius's BioPAT Spectro — closes part of the measurement gap so the objective is read sooner [10] (vendor-self-reported for the integration claim).

The unsolved part: the cold start, transferability, and trusting the closed loop

BO's weakness is the same small-data ceiling that haunts the whole book, sharpened at the start of the loop. The cold start is acute: with zero prior runs the GP is pure prior, and its first handful of suggestions are barely better than space-filling guesses — BO earns its advantage only after the surrogate has data to bend. A poor kernel choice or a badly scaled search space can leave it worse than a grid for the first several runs. The standard mitigations are informed priors (seed the GP with a mechanistic model's predictions, the hybrid move) and transfer learning (warm-start from a related molecule's campaign) — but here lurks the field's documented hard problem: run-to-run variability in living systems severely compromises transferability, so a surrogate learned on one cell line or one scale may mislead on the next [1].
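The informed-prior mitigation can be sketched by giving the GP a mechanistic prior mean and letting it model only the residual. Everything specific below is an assumption for illustration: `mechanistic_titer` is a hypothetical stand-in for a mechanistic model, the kernel and its hyperparameters are fixed by hand, and the single observed run is invented.

```python
import numpy as np

PRIOR_VAR = 0.25   # assumed prior variance of the residual, (g/L)^2

def mechanistic_titer(X):
    """Hypothetical mechanistic model: titer peaks at a feed multiplier of 1.2."""
    return 5.0 - 4.0 * (X[:, 0] - 1.2) ** 2

def kernel(A, B, ls=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return PRIOR_VAR * np.exp(-0.5 * d2 / ls**2)

def predict_with_prior_mean(X, y, X_new, noise=1e-2):
    """GP on residuals y - mechanistic(X); the mechanistic model is the prior mean."""
    prior_new = mechanistic_titer(X_new)
    if len(X) == 0:                        # cold start: the GP is pure prior
        return prior_new, np.full(len(X_new), PRIOR_VAR)
    resid = y - mechanistic_titer(X)
    K = kernel(X, X) + noise * np.eye(len(X))
    Ks = kernel(X, X_new)
    mu = prior_new + Ks.T @ np.linalg.solve(K, resid)
    var = PRIOR_VAR - (Ks * np.linalg.solve(K, Ks)).sum(0)
    return mu, np.maximum(var, 0.0)

grid = np.array([[1.0], [1.8]])

# Zero runs: predictions are exactly the mechanistic model's.
mu_cold, var_cold = predict_with_prior_mean(np.empty((0, 1)), np.empty(0), grid)

# One invented run at x = 1.0 that came in 0.5 g/L above the mechanistic prediction.
X_obs = np.array([[1.0]])
y_obs = mechanistic_titer(X_obs) + 0.5
mu_warm, var_warm = predict_with_prior_mean(X_obs, y_obs, grid)
print("cold start:", mu_cold.round(3), " after one run:", mu_warm.round(3))
```

With no data the surrogate already has a shaped surface to optimize over rather than a flat prior, and each real run corrects the mechanistic model only where it was observed to be wrong. The same caveat as in the text applies: if the mechanistic model does not transfer to the new cell line, the prior misleads.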

The deeper unsolved part is trusting a closed loop in a regulated setting. A BO campaign that merely proposes experiments for a scientist to approve is comfortably human-in-the-loop. The frontier — a self-driving lab that chooses, runs, and acts on experiments without a human between iterations — is exactly the autonomy the regulatory line is drawn against. The peer-reviewed self-driving demonstrations are explicit that they are pilot/research at PD scale (3–15 L), and their own authors stress the gap between robotic capability and device autonomy; the WuXi Biologics ISLFCC autonomous-lab result — a self-reported +26.8% average titer (illustrative; single-company, self-reported) across three CHO clones — is peer-reviewed but unreplicated and PD-scale, not GMP [11][9]. Draft EU GMP Annex 22 would exclude continuously-learning, adaptive models from critical GMP use entirely, which means a BO loop that keeps changing the process is fine in development but cannot drive a commercial process without a locked model and a predetermined change control plan [12]. The honest status: BO is a production-grade development accelerator and a strong design-space tool; the autonomous lab that runs PD end to end without a human is a vivid, real, but still pilot capability.

What this chapter adds to the model suite

This chapter contributes the optimization workhorse of Book 5's example suite:

  • examples/platform/ml/bayesopt_doe.py — a Gaussian-process Bayesian optimizer over the fed-batch simulator's feed policy, with an Expected-Improvement acquisition, a lactate-in-band constraint penalty, and a head-to-head Latin-hypercube DoE baseline. It demonstrates, on the shared simulator, BO matching a 20-run DoE's best with roughly half the runs and then surpassing it — the core economic claim of the chapter, in runnable code. The module is structured so the single-objective EI can be swapped for a multi-objective EHVI acquisition (titer versus a quality proxy) and so the GP posterior can be reused to sketch a probabilistic design space, wiring it forward to the QC/release and hybrid-model chapters.

It sits beside the suite's predictive models (the deep soft sensor, the clone-ranking model) as the one that decides what to run, not what will happen — the suite's only search algorithm.

Why it matters

Process development is where the cost of a biologic's manufacturing process is largely fixed, and where experiments are most expensive relative to their information. A method that finds a better process in half the runs is not a marginal convenience; it shortens timelines, frees parallel-reactor capacity, and — because the surrogate it leaves behind is a probabilistic model of the whole region, not a single point — it hands the regulatory filing a quantified, uncertainty-aware design space for free. BO is also the rare ML application in this book whose value does not depend on a flood of data: it is designed for the small-data, expensive-experiment regime that defeats black-box prediction, which is precisely why it is the strongest, most defensible learning method in PD and one of the few that crosses cleanly from research into routine development use.

In the real world

The production-grade reality is hybrid modeling plus experiment design, and the named players are consistent. DataHow — an independent ETH Zurich spin-off, not a Sartorius subsidiary — sells DataHowLab hybrid models with transfer learning and reports 30–60% (up to 80%) fewer experiments; its flagship Bristol Myers Squibb PD case (48 runs at 5 L, 12 CPPs, 18 CQAs) has a peer-reviewed companion headlining roughly 33% better prediction accuracy with about half the data — and the peer-reviewed figures (33%, ~half) are the ones to cite, not the vendor page's 22%/3x [13] (pilot/research; vendor efficiency headlines are vendor-self-reported). Sartorius ships the Umetrics MODDE/SIMCA DoE-and-modeling stack and the Ambr/BioPAT Spectro hardware that the loop runs on; its autonomous self-optimizing layer remains aspirational [10]. On the academic frontier, the ETH Zurich + Novo Nordisk multi-objective formulation work and the DataHow/Sartorius/Merck self-driving perfusion study are the clearest peer-reviewed BO demonstrations [4][9], while WuXi Biologics' ISLFCC is the most ambitious autonomous-lab claim (peer-reviewed, single-company, self-reported, PD-scale) [11]. The cross-industry maturity signal is sobering: the 7th ISPE Pharma 4.0 survey found AI/ML to have the most pilots and the fewest scaled implementations [14] — BO-driven PD is one of the few corners where the pilots are genuinely turning into routine practice.

Key terms

  • Bayesian optimization (BO) — a sequential strategy for maximizing an expensive, noisy, black-box function in few evaluations, by fitting a probabilistic surrogate and choosing each next experiment to maximize an acquisition score.
  • Gaussian process (GP) — the usual surrogate model; a distribution over functions that returns, at every point, a predicted mean and a calibrated uncertainty, governed by a kernel and fit by marginal likelihood.
  • Kernel (covariance function) — the GP's similarity prior (e.g. Matérn); its fitted per-dimension length scales reveal which process parameters matter.
  • Acquisition function — the rule that scores candidate experiments by usefulness, balancing exploitation (high mean) against exploration (high uncertainty); Expected Improvement (EI), UCB, and Probability of Improvement are the common choices.
  • Expected Improvement (EI) — the default acquisition; the expected amount by which a candidate would beat the best result seen so far, in closed form from the GP posterior.
  • Design of experiments (DoE) — the classical, non-adaptive grid of experiments (factorial, fractional-factorial, Latin-hypercube) that BO competes against and usually beats on budget.
  • Multi-objective BO (MOBO) — BO over several conflicting objectives (titer versus quality), returning a Pareto front of trade-offs rather than a single point; typically uses Expected Hypervolume Improvement (EHVI).
  • Pareto front — the set of settings where no objective can be improved without worsening another; the honest output when goals conflict.
  • Design space / QbD / ICH Q8 — the demonstrated region of conditions assured to give acceptable quality; a probabilistic design space reports it as a region where the modeled probability of meeting all CQAs exceeds a threshold.
  • Ambr (advanced micro-bioreactor) — automated 24/48-parallel micro-bioreactor systems that make BO's experiment loop physically fast enough to run; enables batch BO (propose q settings at once).
  • Cold start — BO's weak early phase, when the surrogate has too little data to be better than space-filling; mitigated by mechanistic or transfer-learning priors.
  • Self-driving lab — an autonomous loop that chooses, runs, and acts on experiments without a human between iterations; demonstrated at PD scale (pilot/research), excluded from critical GMP by draft Annex 22.

Where this leads

We have a process — a feed, a temperature, a design space — found in a fraction of the runs a grid would need. But every BO iteration leaned on being able to measure the objective: titer, glycosylation, purity. Those measurements come from the analytical lab, and the next chapter asks how learning transforms the instruments themselves. Analytical Methods: Chemometrics, Deep Spectroscopy, and Automated Chromatograms turns to the soft sensors and spectral models that read the very objectives this chapter optimized against — from PLS chemometrics to deep spectroscopy and automated chromatogram interpretation — the measurement layer the entire learning enterprise is built on.