Formulation and Fill-Finish: Computer Vision and the Lyophilizer

📍 Where we are: Part V · Fill-Finish & Release, Learned — Chapter 17. UF/DF handed over DS-001, the bulk drug substance at its target concentration in the formulation buffer. This chapter turns that bulk into the thing a patient actually receives — a filled, inspected, sometimes freeze-dried vial — and meets the single most production-deployed application of machine learning in all of biomanufacturing: deep-learning vision for automated visual inspection.

The drug substance is finished as a molecule. It is not yet a product. Formulation and fill-finish is the operation that takes the bulk DS-001 lot, dilutes or adjusts it to the final formulation, and dispenses it — millilitre by millilitre — into thousands of glass vials or prefilled syringes, which become the drug-product lot DP-001 in our running genealogy. Then, for many biologics, the product is freeze-dried (lyophilized) into a stable cake; every container is checked for fill accuracy, particulates, and closure integrity; and only the units that pass are released. It is the most physically discrete step in the whole process — the continuous broth of upstream has become a countable population of individual objects, each of which can be accepted or rejected on its own.

That discreteness is exactly why this is where machine learning has actually crossed into routine GMP production. Upstream, ML soft-senses a value that a human still acts on; here, in automated visual inspection (AVI), a convolutional network looks at a picture of one vial and makes an accept/reject decision that, after validation, can stand on its own. Amgen has publicly reported releasing roughly 95% of its syringes and vials through automated inspection (vendor/self-reported) [1] — a number that, with all its caveats, is the strongest single data point for production ML in the industry. This chapter builds the two honest, deployable pieces around it: a fill-weight control and reject model on our real fill line, and the framing of the vision problem itself, with a small CNN sketch.

The simple version

Think of the final station on any factory line that makes things people swallow or inject: a person in a hairnet holding each unit up to a bright light, turning it, looking for a fleck, a crack, a low fill, a crooked cap, then putting it in the "good" bin or the "scrap" bin. People are slow, they get tired, they disagree, and they reject far too many good units just to be safe. Automated visual inspection replaces that person with a camera and a trained eye made of software: the camera takes pictures of every vial, and a neural network (the same family of model that recognizes faces in photos) decides accept or reject in a fraction of a second, consistently, all day. The hard part is not the camera. It is teaching the network every way a vial can be defective, and then proving to a regulator that it never misses the dangerous ones.

What this chapter covers

Why fill-finish is the natural home for production ML — discrete units, a binary accept/reject decision, and a defect that is visible.
Fill-weight and dose-accuracy control — deriving control and action limits from the line's own statistics, the in-process checkweigher, and why a fixed limit beats a learned threshold on a 478-to-2 class imbalance.
Deep-learning automated visual inspection (AVI) — the task framing, why it needs a CNN and not a tabular model, the defect library, false-reject reduction, and the production deployments (Amgen, Stevanato, Brevetti CEA, Syntegon).
Lyophilization cycle optimization — soft-sensing product temperature and sublimation endpoint, and design-space modeling of the freeze-drying cycle.
Container-closure integrity (CCIT) and aseptic/isolator environmental monitoring as emerging ML targets.
A runnable module, examples/platform/ml/fill_control.py, and a CNN sketch in vision_avi.py, with the anatomy of one AVI accept/reject record.
The honest open problem: the validation-versus-learning paradox — how do you keep a model that could keep learning frozen enough to validate, and validated enough to trust?

Why fill-finish is where production ML actually lives

Every prior chapter in this book wrestled with the same handicap: living systems, sparse offline reference, run-to-run variability, and a model that decays fast — the small-data ceiling that keeps pure ML stuck in advisory roles. Fill-finish breaks that pattern in three ways that matter.

First, the unit of decision is discrete and countable. A bioreactor produces one continuous trajectory; a fill line produces individual vials (480 of them in our dataset's BATCH-2026-001), each an independent object that can be inspected and judged on its own. That turns a hard regression-over-time problem into a clean per-item classification problem, the kind machine learning is genuinely good at.

Second, the decision is binary and consequential in a bounded way. Accept or reject this one vial. A wrong accept is a defect reaching a patient; a wrong reject is a scrapped unit. Both are costly, but neither is the open-ended "did I mis-control a CQA across a 14-day batch" problem that makes upstream ML so fraught. The error is local to one object.

Third — and this is the deepest reason — the defect is often visible, which means you can collect labelled training data in a way you almost never can upstream. A particulate, a crack, a low meniscus, a missing stopper: these are things a camera can see and a trained human can label. The bottleneck shifts from "we have no ground truth" to "we have to build a comprehensive defect library and validate against it" — a hard problem, but a tractable one, and one the industry has now solved well enough to deploy in commercial GMP. That is why, when you ask where ML has truly reached production in biomanufacturing, the honest answer clusters around monitoring, predictive maintenance, and — above all — vision inspection, not autonomous control of CQAs [2].

Fill-weight and dose accuracy: the case for not using ML

Before the glamorous vision problem, the unglamorous one that runs on every line: did each vial get the right amount of drug? An aseptic filler dispenses a nominal volume — 1.0 mL in our dataset — and an in-process-control (IPC) checkweigher weighs vials (continuously or by sampling) to confirm the delivered dose stays within a dose-accuracy specification, typically about ±5% of target for a liquid fill. This is a control problem, and it is the cleanest possible illustration of a lesson this whole book keeps insisting on: reach for the simplest tool the physics allows, and only escalate when it genuinely helps.

The right tool here is classical statistical process control, not machine learning. From the line's own within-run variation you derive individuals (I) control limits — using the moving-range estimate of short-term sigma, the same σ = MR̄ / d₂ math from the open-source SPC chapter — and a capability index against the dose spec. Crucially, control limits come from the process, not the spec: an action limit that fires when the line drifts is what catches a developing fill problem before any vial goes out of spec. On our real fill line the within-run moving range gives a centre of 1.0107 g and individuals limits of roughly 0.951 to 1.071 g, while the delivered-volume capability against the ±5% spec is Cpk ≈ 0.84 — a marginal capability that explains, honestly, why this line does reject the occasional vial: two of the 480, both for low fill.
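
Written out, the limits and the capability index quoted above take the following form. This is a sketch of the standard individuals-chart arithmetic; the symbols $x_i$ (individual fill weights), $\bar{v}$ (mean delivered volume), and $s_v$ (sample standard deviation of delivered volume) are notation introduced here, and the $C_{pk}$ form assumes the sample-standard-deviation calculation that fill_control.py uses.

```latex
\begin{aligned}
\overline{MR} &= \frac{1}{n-1}\sum_{i=2}^{n}\lvert x_i - x_{i-1}\rvert,
\qquad \hat{\sigma} = \frac{\overline{MR}}{d_2}, \quad d_2 = 1.128 \\
\mathrm{UCL},\ \mathrm{LCL} &= \bar{x} \pm 3\hat{\sigma} \\
C_{pk} &= \min\!\left(\frac{\mathrm{USL}-\bar{v}}{3\,s_v},\ \frac{\bar{v}-\mathrm{LSL}}{3\,s_v}\right),
\qquad \mathrm{LSL}=0.95\ \mathrm{mL},\ \mathrm{USL}=1.05\ \mathrm{mL}
\end{aligned}
```

Reading the reported numbers backwards is a useful sanity check: the limit width (1.0708 − 0.9507)/6 implies a short-term sigma of about 0.020 g, and with a mean volume of 1.0008 mL the upper term is the smaller of the two, so Cpk ≈ 0.84 implies $s_v \approx 0.0492 / (3 \times 0.841) \approx 0.0195$ mL. Both figures are back-calculated from rounded outputs and should be read as approximate.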

Now the tempting move: why not learn the reject boundary with a classifier? You can — and the result is a quiet object lesson. With 478 good vials and 2 rejects, the class imbalance is brutal (about 0.4% positives). A class-balanced logistic regression on (fill_volume, fill_weight) does catch both true rejects, but only by drawing its boundary so conservatively that it also flags dozens of perfectly good vials as suspect. The learned threshold recovers the fixed limit; it does not beat it — and a learned, data-derived boundary is far harder to validate under GMP than a fixed numeric limit a reviewer can read in one line. This is the false-reject problem in miniature, and it foreshadows exactly the trade-off that makes deep-learning AVI valuable: the win from ML in fill-finish is not catching defects a rule misses, it is catching them without throwing away good product.

The fill-finish station, learned: a marginally-capable fill controlled by statistics not ML, an optional lyophilization cycle with a soft-sensed endpoint, and the production-grade deep-learning vision inspection that accepts or rejects each individual vial — the one place in biomanufacturing where a model's decision routinely stands on its own. Original diagram by the authors, created with AI assistance.

# examples/platform/ml/fill_control.py — fill-weight control + a reject model on
# the REAL fill line (480 vials of DP-001, BATCH-2026-001). The lesson is that the
# fixed checkweigh limit beats the learned threshold; the classifier only recovers it.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

SPEC_LOW_ML, SPEC_HIGH_ML = 0.95, 1.05   # +/- 5% dose-accuracy spec on a 1.0 mL fill
D2 = 1.128                               # moving-range constant for n=2

df = pd.read_csv("examples/datasets/fill_events.csv")
w = df["fill_weight_g"].to_numpy()
sigma = np.abs(np.diff(w)).mean() / D2                 # short-term sigma from within-run MR
center = w.mean()
ucl, lcl = center + 3 * sigma, center - 3 * sigma      # individuals control limits

v = df["fill_volume_mL"].to_numpy()                    # capability vs the dose spec
cpk = min((SPEC_HIGH_ML - v.mean()) / (3 * v.std(ddof=1)),
          (v.mean() - SPEC_LOW_ML) / (3 * v.std(ddof=1)))

# The tempting (and inferior) ML move: learn the reject boundary on 478 good : 2 reject
X, y = df[["fill_volume_mL", "fill_weight_g"]].to_numpy(), df["reject"].astype(int).to_numpy()
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
tn, fp, fn, tp = confusion_matrix(y, clf.predict(X), labels=[0, 1]).ravel()

print(f"Fill line {df.batch_id.iloc[0]}: {len(y)} vials, {y.sum()} rejected (low_fill)")
print(f"Checkweigh I-limits (within-run MR): center={center:.4f} g  "
      f"LCL={lcl:.4f}  UCL={ucl:.4f}")
print(f"Dose-accuracy capability vs +/- 5% spec: mean={v.mean():.4f} mL  Cpk={cpk:.3f}")
print(f"Learned reject classifier: tp={tp} fp={fp} fn={fn} tn={tn}  "
      f"(recovers the limit, but over-rejects)")

Fill line BATCH-2026-001: 480 vials, 2 rejected (low_fill)
Checkweigh I-limits (within-run MR): center=1.0107 g  LCL=0.9507  UCL=1.0708
Dose-accuracy capability vs +/- 5% spec: mean=1.0008 mL  Cpk=0.841
Learned reject classifier: tp=2 fp=41 fn=0 tn=437  (recovers the limit, but over-rejects)

The confusion matrix is the whole argument on one line: the learned threshold catches both true low-fill vials (fn=0) but at the cost of 41 false rejects (fp=41) — over twenty good vials scrapped for every bad one caught. A fixed checkweigh limit catches the same two with zero over-rejection and a validation file a reviewer can read in a sentence. ML did not lose because it was badly tuned; it lost because the problem did not need it. Remembering which fill-finish problems need learning and which do not is half of doing this well.
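
The rates behind that sentence can be worked out directly from the four counts. The snippet below is a small arithmetic sketch using the confusion-matrix values printed above; the `fixed_limit_reject` helper and the example volumes are hypothetical illustrations of what a fixed numeric limit looks like, not code or rows from the dataset.

```python
# Counts copied from the confusion matrix printed above (learned classifier).
tp, fp, fn, tn = 2, 41, 0, 437

recall = tp / (tp + fn)                 # share of true low-fill vials caught
false_reject_rate = fp / (fp + tn)      # share of good vials wrongly flagged
precision = tp / (tp + fp)              # share of flagged vials that were truly bad
good_flagged_per_catch = fp / tp        # good vials flagged per true reject

print(f"recall={recall:.3f}  false-reject rate={false_reject_rate:.1%}  "
      f"precision={precision:.1%}  good flagged per catch={good_flagged_per_catch:.1f}")


def fixed_limit_reject(volume_ml, low=0.95, high=1.05):
    """Reject a vial only when its delivered volume falls outside the dose spec."""
    return volume_ml < low or volume_ml > high


# Hypothetical volumes for illustration only (not rows from fill_events.csv).
example_volumes = [1.000, 0.940, 1.020, 1.060]
fixed_decisions = [fixed_limit_reject(v) for v in example_volumes]
print(fixed_decisions)
```

The fixed rule is two comparisons against numbers a reviewer can check against the specification, which is the validation advantage the paragraph above describes.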

Automated visual inspection: the production case for deep learning

Visual inspection is the opposite story. Every parenteral product must be inspected for visible particulates and container/closure defects — it is a compendial requirement, historically done by trained humans holding vials against black-and-white backgrounds. Human inspection has two chronic failures: it is inconsistent (inspectors disagree, and the same inspector disagrees with themselves across a shift), and it is over-conservative — to be safe, humans and the older rule-based machines that replaced them reject large fractions of good product. Industry experts have cited rule-based AVI false-rejection rates as high as roughly 20% of good vials (conference-reported) [3]. On a multi-million-vial campaign, a 20% false-reject rate is a staggering, invisible waste of perfectly good medicine.

This is where deep learning genuinely earns its place. The task is image classification: a camera images each vial — often from multiple angles, under controlled lighting, sometimes spinning the vial to set particulates moving so they can be distinguished from fixed glass features — and a convolutional neural network (CNN) classifies the resulting image as accept, or as one of several defect types (particulate, crack, low fill level, stopper/closure defect, cosmetic). A CNN is the right tool precisely because this is not a tabular problem: the signal is a spatial pattern of pixels, and convolutional layers are designed to learn local features (an edge, a speck, a meniscus) and combine them hierarchically into a defect/no-defect judgement. You cannot hand a logistic regression a flattened image and expect it to find a 40-micron particle; you need a model whose architecture matches the structure of the data.
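
The point about architecture matching the data can be made concrete with a toy experiment. The sketch below uses plain NumPy rather than a trained network, and the hand-written 3×3 `speck_filter` is a hypothetical stand-in for a learned filter: it slides one kernel over two synthetic images that contain the same bright speck at different positions, then compares that with what a flattened-pixel model would see.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(image, kernel):
    """Apply one kernel at every position (valid cross-correlation, as a conv layer does)."""
    windows = sliding_window_view(image, kernel.shape)     # (H-2, W-2, 3, 3)
    return (windows * kernel).sum(axis=(-2, -1))


# Hypothetical hand-written detector for "bright speck on a dark background".
speck_filter = np.array([[-1, -1, -1],
                         [-1,  8, -1],
                         [-1, -1, -1]], dtype=float)


def image_with_speck(row, col, size=32):
    img = np.zeros((size, size))
    img[row, col] = 1.0
    return img


img_a, img_b = image_with_speck(5, 7), image_with_speck(20, 25)
resp_a, resp_b = correlate_valid(img_a, speck_filter), correlate_valid(img_b, speck_filter)

# Peak location is the top-left corner of the window centred on the speck.
peak_a = tuple(int(i) for i in np.unravel_index(resp_a.argmax(), resp_a.shape))
peak_b = tuple(int(i) for i in np.unravel_index(resp_b.argmax(), resp_b.shape))
print("peak response:", resp_a.max(), resp_b.max(), "at", peak_a, peak_b)

# A flattened-pixel model sees the same speck in two unrelated input coordinates.
flat_a, flat_b = int(np.flatnonzero(img_a)[0]), int(np.flatnonzero(img_b)[0])
print("flattened index of the speck:", flat_a, flat_b)
```

The filter gives the same peak response wherever the speck sits, only the location of the peak moves. A model on flattened pixels would have to learn a separate weight for every possible position.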

Why a CNN, structurally

A convolutional network applies the same small learned filters across every position of the image, so a feature detector that learns "small bright speck against dark glass" works wherever the speck appears. Stacked convolution-and-pooling blocks build from local edges to larger structures while shrinking the spatial resolution, and a final classification head maps the learned features to the defect classes. The sketch in our example suite makes the shape contract concrete with three convolutional blocks, global pooling, and a six-class head:

# examples/platform/ml/vision_avi.py — an HONEST CNN sketch for AVI (no real vial
# images ship in the dataset). It fixes the architecture and shape contract; the
# real accept/reject numbers (Amgen ~95% auto-release) are vendor/self-reported.
import torch, torch.nn as nn

IMG_CH, IMG_HW = 1, 128                                  # grayscale ROI cropped to the meniscus
CLASSES = ["accept", "particulate", "crack", "fill_level", "stopper", "cosmetic"]

class VialInspectorCNN(nn.Module):
    def __init__(self, n_classes=len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(IMG_CH, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),                                          # 128 -> 64
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),                                          # 64  -> 32
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                                  # -> 64 x 1 x 1
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Dropout(0.3), nn.Linear(64, n_classes))

    def forward(self, x):                                            # x: (batch, 1, 128, 128)
        return self.head(self.features(x))

model = VialInspectorCNN().eval()                # LOCKED model: eval() only, no learning in prod
for p in model.parameters():
    p.requires_grad_(False)                      # weights frozen at validation (draft Annex 22)
vials = torch.randn(8, IMG_CH, IMG_HW, IMG_HW)  # stand-in for camera ROIs
with torch.no_grad():
    prob = torch.softmax(model(vials), dim=1)
    decisions = prob.argmax(dim=1)              # class 0 = accept; anything else = a reject reason
# Weights are untrained here, so the printed class ids are placeholders, not inspection results.
n_params = sum(p.numel() for p in model.parameters())
print("AVI vial-inspection CNN (sketch): locked at validation, eval() only")
print(f"  classes        : {CLASSES}")
print(f"  parameters     : {n_params:,}")
print(f"  input  (b,c,h,w): {tuple(vials.shape)}")
print(f"  output (b,classes): {tuple(prob.shape)}")
print(f"  decisions (argmax class id): {decisions.tolist()}")

AVI vial-inspection CNN (sketch): locked at validation, eval() only
  classes        : ['accept', 'particulate', 'crack', 'fill_level', 'stopper', 'cosmetic']
  parameters     : 23,910
  input  (b,c,h,w): (8, 1, 128, 128)
  output (b,classes): (8, 6)
  decisions (argmax class id): [0, 0, 0, 0, 0, 0, 0, 0]

Our 24-thousand-parameter sketch is a toy; production AVI nets are far deeper and trained on proprietary defect libraries running to tens or hundreds of thousands of labelled images. But the structure is the real structure, and two design choices in the sketch are load-bearing in production. First, the model runs in eval() mode with weights frozen: a locked model that does not learn in production, the posture regulators now expect for critical GMP applications. Second, the network outputs a probability over classes rather than a bare accept/reject. The sketch simply takes the argmax, but a production system can compare that confidence against a review threshold and route low-confidence vials to human review rather than guessing, which is the human-in-the-loop pattern that makes AVI deployable.
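
The routing idea is easy to sketch on top of the class probabilities. The example below is a minimal illustration under stated assumptions: the `route` function, the 0.90 `REVIEW_THRESHOLD`, and the three probability rows are all hypothetical, and a real threshold would come out of the validation study rather than being picked by hand.

```python
import numpy as np

CLASSES = ["accept", "particulate", "crack", "fill_level", "stopper", "cosmetic"]
REVIEW_THRESHOLD = 0.90   # hypothetical value, not taken from any validated system


def route(prob_row, threshold=REVIEW_THRESHOLD):
    """Return (route, predicted_class) for one vial's class probabilities."""
    top = int(np.argmax(prob_row))
    if prob_row[top] < threshold:
        return "human_review", CLASSES[top]          # model is unsure: a person decides
    return ("accept" if top == 0 else "reject"), CLASSES[top]


# Hypothetical softmax outputs for three vials (each row sums to 1).
probs = np.array([
    [0.97, 0.01, 0.005, 0.005, 0.005, 0.005],   # confident accept
    [0.02, 0.95, 0.01, 0.01, 0.005, 0.005],     # confident particulate reject
    [0.55, 0.40, 0.02, 0.01, 0.01, 0.01],       # ambiguous between accept and particulate
])
routes = [route(p) for p in probs]
for r in routes:
    print(r)
```

Whether a confident reject goes straight to scrap or also passes a human check is a policy choice this sketch does not settle; it only shows that the decision needs the full probability vector, not just the argmax.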

The real bottleneck: the defect library and the validation

The neural network is the easy part. The hard, regulated, expensive part of an AVI program is the defect library: a curated, labelled set of images covering every defect type at every severity, including rare and borderline cases, against which the model is trained and — far more importantly — validated. To prove an AVI system is at least as good as the manual method it replaces, you must demonstrate its detection performance on a representative population of known defects and known-good units, the way any compendial inspection method is qualified. Building and maintaining that library, with seeded defects and human-adjudicated ground truth, is where the years of work go. Amgen has been explicit that its move to roughly 95%-auto-released vials and syringes took years of effort and direct conversations with the FDA (vendor/self-reported) [1], and the first fully validated retrofit was a syringe line, not a vial line — a reminder that "the strongest production ML case" is still a multi-year, heavily-scrutinized program, not a plug-in.
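
One reason the library has to be large is purely statistical: a detection rate observed on a small challenge set supports only a modest claim. The sketch below is an illustration, not a qualification protocol. It computes the lower limit of a 95% Wilson score interval for per-class detection rates, and the class counts are hypothetical.

```python
import math


def wilson_lower_bound(successes, trials, z=1.96):
    """Lower limit of the two-sided 95% Wilson score interval for a proportion."""
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = p + z**2 / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (centre - margin) / denom


# Hypothetical challenge-set results: class -> (defects detected, defects presented).
challenge_results = {"particulate": (198, 200), "crack": (50, 50), "stopper": (29, 30)}

for defect, (detected, presented) in challenge_results.items():
    observed = detected / presented
    lower = wilson_lower_bound(detected, presented)
    print(f"{defect:12s} observed={observed:.3f}  95% lower bound={lower:.3f}")
```

Even a perfect 50-out-of-50 result only supports a detection rate of roughly 93% at this confidence level, which is why rare defect classes drive the size of the library.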

The commercial landscape is real and named. Stevanato Group's Vision AI platform markets deep-learning inspection with claimed detection accuracy up to 99.9% and roughly an order-of-magnitude reduction in false rejects (vendor materials) [4]; Brevetti CEA absorbed the deep-learning specialist Brevetti AI (formerly Criterion AI) to build the same capability for regulated inspection [5]; Syntegon's AIM (AI-enhanced inspection) reports materially more particle detection with materially fewer false rejects (vendor-reported); and Cognex, Antares Vision, and Körber play in the same space. The headline numbers are vendor- or conference-reported, not peer-reviewed, and should be read as such — but the direction is consistent and corroborated across vendors and the one named pharma deployment: deep-learning AVI detects more real defects while rejecting far less good product than the rule-based machines it replaces.

Anatomy of one AVI accept/reject record

A deployed AVI station does not emit a bare "reject." Like every artifact in this series, the value is in what travels alongside the decision — the provenance that lets a reviewer, a regulator, or an investigation reconstruct why this vial went to the scrap bin and trust that the model that sent it there was the validated one.

One inspection is a whole record: the imaged region of a single serialized vial, the locked model and the validation library it was qualified against, the per-class confidence and the routing threshold that sends borderline vials to a human, and the links back through DP-001 and DS-001 to the release decision — provenance is what lets an automated reject stand under GMP. Original diagram by the authors, created with AI assistance.

Read the card top to bottom and the chapter's whole argument is laid out as fields. The input rows are the cheap, fast signal: a timestamp, the camera station and view, a reference to the cropped region-of-interest image, and the lighting recipe — because an AVI model is only valid under the imaging conditions it was trained on. The green core is the decision: a predicted_class (accept, or a named reject reason), the confidence distribution over all six classes, and an accept_confidence compared to a review_threshold that routes uncertain vials to a human rather than letting the model guess. The model-identity block pins the model_id, model_version, and the hash of the validation_library the model was qualified against — so the record proves which locked model made this call. The reconciliation rows tie the vision decision to other evidence on the same serial: the human_verdict for routed vials, and the in-process checkweigh_g weight. The violet relationships panel records lineage — validated_by the qualification study, trained_on the defect library, part_of the DP-001 lot deriving from DS-001, and feeds the release decision.
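
As a data structure, the card described above might look like the following. This is a sketch, not the schema of any deployed system: field names that appear in the text (predicted_class, accept_confidence, review_threshold, model_id, model_version, validation_library, human_verdict, checkweigh_g, and the four relationship keys) are kept, while the remaining names, all example values, and the `needs_human_review` rule are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class AviRecord:
    # Input rows: the imaging context (these field names are hypothetical)
    serial: str
    timestamp: str
    station: str
    view: str
    roi_image_ref: str
    lighting_recipe: str
    # Decision core
    predicted_class: str
    class_confidence: Dict[str, float]
    accept_confidence: float
    review_threshold: float
    # Model identity
    model_id: str
    model_version: str
    validation_library: str              # hash of the library the model was qualified against
    # Reconciliation with other evidence on the same serial
    human_verdict: Optional[str] = None
    checkweigh_g: Optional[float] = None
    # Lineage
    relationships: Dict[str, str] = field(default_factory=dict)

    @property
    def needs_human_review(self) -> bool:
        """Assumed rule: an 'accept' call below the threshold is routed to a person."""
        return self.predicted_class == "accept" and self.accept_confidence < self.review_threshold


record = AviRecord(
    serial="VIAL-000123", timestamp="2026-01-15T10:32:07Z",
    station="AVI-01", view="side-spin", roi_image_ref="roi/VIAL-000123.png",
    lighting_recipe="backlight-A",
    predicted_class="accept",
    class_confidence={"accept": 0.62, "particulate": 0.31, "crack": 0.02,
                      "fill_level": 0.02, "stopper": 0.02, "cosmetic": 0.01},
    accept_confidence=0.62, review_threshold=0.90,
    model_id="vial-inspector-cnn", model_version="1.0.0",
    validation_library="sha256:<hash recorded at qualification>",
    checkweigh_g=1.011,
    relationships={"validated_by": "qualification-study", "trained_on": "defect-library",
                   "part_of": "DP-001", "feeds": "release-decision"},
)
payload = json.dumps(asdict(record), indent=2)
print(record.needs_human_review)
print(payload)
```

Making the record immutable (`frozen=True`) mirrors the intent that an inspection result is written once and then only referenced.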

Lyophilization: soft-sensing the freeze that cannot be rushed

Many biologics are not stable as liquids, so after filling they are lyophilized: frozen, then dried under deep vacuum so the ice sublimes directly to vapour, leaving a dry, stable cake that is reconstituted before use. Lyophilization is slow (often days), energy-hungry, and unforgiving: dry too aggressively and the product rises above its critical formulation temperature, so the cake collapses or the product is damaged; dry too gently and you waste days of cycle time and freeze-dryer capacity. Cycle development and control is therefore a genuine optimization problem with real money and real product risk on both sides, exactly the kind of problem where modeling pays.

The learning targets here are soft-sensing problems built on mechanistic models. The two quantities you most want to know during primary drying are the product temperature at the sublimation front and the moment primary drying ends (when all the free ice is gone), and neither is easy to measure in every vial without intruding on the sterile boundary. Classical approaches estimate them from chamber pressure, shelf temperature, and the partial pressure of water vapour, and the field has well-developed first-principles models of heat and mass transfer through the drying cake. The learning lens adds two things on top of that physics. First, a soft sensor fuses the indirect signals into a continuous estimate of product temperature and a predicted primary-drying endpoint, so the cycle stops at the right moment instead of after a conservative fixed time. Second, design-space modeling of the cycle (the same Bayesian-optimization and surrogate-modeling logic used in process development) explores the shelf-temperature / chamber-pressure operating space to find a cycle that is fast yet keeps product temperature safely below its collapse point, replacing a costly factorial grid of full multi-day cycles. Because the physics is genuinely well understood, the strong play is hybrid: a mechanistic heat-and-mass-transfer backbone with a learned component for the parts that resist clean equations (vial-to-vial heterogeneity, edge effects, cake morphology), the dominant paradigm this book returns to in the digital-twins chapter.
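
To make "stop at the right moment instead of a fixed time" concrete, here is a deliberately simple endpoint rule. It is a threshold heuristic, not a validated soft sensor and not a heat-and-mass-transfer model: the `vapour_signal` is a hypothetical water-vapour-sensitive reading, the synthetic trace is a smooth logistic fall, and the `fraction` and `hold` settings are assumptions chosen for illustration. A learned soft sensor would replace this single-signal rule with a model that fuses several indirect measurements.

```python
import numpy as np


def primary_drying_endpoint(t_h, vapour_signal, plateau_samples=10, fraction=0.2, hold=5):
    """Return the first time the signal stays below fraction*plateau for `hold` samples."""
    s = np.asarray(vapour_signal, dtype=float)
    plateau = s[:plateau_samples].mean()        # level while ice is still subliming
    run = 0
    for i, below in enumerate(s < fraction * plateau):
        run = run + 1 if below else 0           # count consecutive samples under the limit
        if run >= hold:
            return float(t_h[i - hold + 1])     # time the sustained drop began
    return None                                 # endpoint not reached in this trace


# Synthetic trace: vapour signal near 1.0 during sublimation, falling around hour 30.
t_h = np.arange(0, 40, 0.5)
vapour_signal = 1.0 / (1.0 + np.exp(t_h - 30.0))

endpoint_h = primary_drying_endpoint(t_h, vapour_signal)
fixed_time_h = 40.0                             # hypothetical conservative recipe time
print(f"detected endpoint: {endpoint_h} h, fixed recipe: {fixed_time_h} h")
```

The `hold` requirement is the one design choice worth noting: it keeps a single noisy sample from ending primary drying early, at the cost of confirming the endpoint a few samples late.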

Container-closure integrity and the cleanroom

Two more fill-finish targets round out the picture, both earlier on the maturity curve. Container-closure integrity testing (CCIT) asks whether the seal between the vial and its stopper actually holds — a leak means a compromised sterile barrier. Non-destructive CCIT methods (headspace gas analysis, high-voltage leak detection, vacuum decay) produce signals that a classifier can learn to map to pass/fail, and ML is being layered onto these inline gauges to flag marginal closures the way the checkweigher flags marginal fills. The same residual-against-expectation logic from the rest of this book applies: a closure whose headspace signature drifts from the population is suspect.
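
The residual-against-population idea can be sketched with a robust outlier score. The example below is an assumption-laden illustration: it uses the median and median absolute deviation (the modified z-score with the conventional 0.6745 scaling and 3.5 cutoff) rather than any vendor's algorithm, and the headspace readings are hypothetical.

```python
import numpy as np


def flag_marginal_closures(readings, z_limit=3.5):
    """Flag units whose reading sits far from the lot median, using a MAD-based score."""
    x = np.asarray(readings, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))          # robust spread, unaffected by a few leakers
    if mad == 0:
        return np.zeros(x.shape, dtype=bool)     # no spread: nothing can be scored
    robust_z = 0.6745 * (x - median) / mad
    return np.abs(robust_z) > z_limit


# Hypothetical headspace-oxygen readings (percent) for eight vials from one lot.
headspace_o2 = [1.0, 1.1, 0.9, 1.0, 1.2, 0.95, 1.05, 4.8]
flags = flag_marginal_closures(headspace_o2)
print(flags.tolist())
```

Median and MAD are used instead of mean and standard deviation so that the suspect units themselves do not inflate the spread and hide.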

Aseptic and isolator environmental monitoring is the other frontier. The fill happens inside a Grade-A isolator, continuously monitored for non-viable particles and viable contamination — exactly the em_samples.csv data the QC chapter leans on, with its seeded Grade-A excursion. The learning opportunity is contamination-risk prediction: treating the particle counts, pressure differentials, and recovery patterns as a multivariate stream and predicting an excursion before it breaches a limit, rather than reacting after. This is monitoring, not control — squarely in the human-in-the-loop, advisory category regulators endorse — and it is one of the production-leaning clusters (monitoring and anomaly detection) where ML in pharma genuinely lives today, as opposed to the autonomous-control cluster where it mostly does not [2].
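
A very small version of "warn before the limit is breached" is an alert limit applied to a smoothed particle count. The sketch below is a simple EWMA heuristic rather than a contamination-risk model, and the counts, the smoothing weight, and both limits are hypothetical values, not figures from em_samples.csv or from any regulatory limit.

```python
def ewma(values, lam=0.5):
    """Exponentially weighted moving average, seeded with the first observation."""
    smoothed, level = [], values[0]
    for v in values:
        level = lam * v + (1 - lam) * level
        smoothed.append(level)
    return smoothed


def first_index_above(values, limit):
    """Index of the first value strictly above the limit, or None."""
    for i, v in enumerate(values):
        if v > limit:
            return i
    return None


# Hypothetical non-viable particle counts per sample interval, drifting upward.
counts = [10, 12, 11, 15, 30, 55, 80, 95, 120]
ALERT_LIMIT, ACTION_LIMIT = 40, 100            # hypothetical advisory and action limits

warn_at = first_index_above(ewma(counts), ALERT_LIMIT)
breach_at = first_index_above(counts, ACTION_LIMIT)
print(f"advisory warning at sample {warn_at}, action-limit breach at sample {breach_at}")
```

The output is an advisory flag for a person to investigate, which keeps the sketch in the monitoring category rather than control.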

The unsolved part: the validation-versus-learning paradox

The honest open problem here is not a missing algorithm. It is a contradiction at the heart of regulated AI, and AVI is where it bites hardest because AVI is where a model's decision actually stands on its own.

The contradiction is this. The thing that makes machine learning valuable is that it can keep learning — see new defect types, adapt to a new vial geometry, improve as more labelled images accumulate. The thing that makes a regulated process trustworthy is that it does not change without control: a validated system must behave the way it did when it was validated, every time, or the validation is meaningless. A model that keeps learning is, by definition, a moving target that traditional one-time validation was never built for. Amgen has framed its own AVI journey explicitly around this tension — the validation-versus-learning paradox — and the resolution it and the regulators have converged on is to lock the model at validation: freeze the weights, validate the frozen artifact like any other piece of GMP software, and only ever update it through a formal, pre-planned change-control process [1][6].
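
In software terms, "validate the frozen artifact" implies that the inspection system can prove the weights it loaded are the ones that were validated. One common way to do that is a cryptographic fingerprint checked at load time. The sketch below assumes that approach; the function names and the byte string standing in for serialized weights are hypothetical, and it does not describe Amgen's or any vendor's implementation.

```python
import hashlib


def fingerprint(weight_bytes: bytes) -> str:
    """SHA-256 of the serialized weights, recorded in the validation package."""
    return hashlib.sha256(weight_bytes).hexdigest()


def load_locked_model(weight_bytes: bytes, validated_sha256: str) -> bytes:
    """Refuse to run any artifact whose fingerprint differs from the validated one."""
    actual = fingerprint(weight_bytes)
    if actual != validated_sha256:
        raise RuntimeError(f"model fingerprint {actual[:12]} does not match the validated artifact")
    return weight_bytes


validated_weights = b"stand-in for the serialized, validated weights"
VALIDATED_SHA256 = fingerprint(validated_weights)        # recorded once, at validation

load_locked_model(validated_weights, VALIDATED_SHA256)   # identical artifact: loads

try:
    load_locked_model(validated_weights + b"\x00", VALIDATED_SHA256)
    tamper_detected = False
except RuntimeError:
    tamper_detected = True                               # any change, however small, is refused
print("changed artifact refused:", tamper_detected)
```

Under this scheme a model update is a new fingerprint, which is exactly the event that formal change control is meant to gate.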

The regulatory frame around this is now sharp. The FDA's risk-based credibility framework — its seven-step approach for establishing the credibility of an AI model for a given context of use, explicitly covering the manufacturing phase where AI output affects product quality — scales the evidence you must bring to the consequence of the decision [7]. The draft EU GMP Annex 22, the first manufacturing-specific AI rule, goes further for critical applications: it permits only static, deterministic models and excludes dynamic, continuously-learning, probabilistic, and generative AI from critical GMP use, demanding a model locked at validation governed by a predetermined change-control plan [8][9]. And the consequences of ignoring the human-review boundary are no longer hypothetical: in April 2026 the FDA issued its first AI-citing cGMP warning letter, to a firm that used AI agents to generate specifications and production records without quality-unit review [10].

So the paradox is managed, not solved. Locking the model buys validatability at the cost of the very adaptability that motivated ML; every improvement now costs a re-validation. The genuinely open questions are the ones the predetermined-change-control-plan concept only gestures at: how do you pre-specify a model's allowed evolution tightly enough to validate yet loosely enough to be useful; how do you detect that a locked AVI model has silently decayed (a new glass supplier, a new lighting bulb, a new particulate type it was never trained on) before it misses a real defect; and how much of the accept/reject judgement can ever responsibly leave the human entirely. The 95%-auto-release number is real and impressive — and the remaining 5% routed to humans is not a rounding error. It is the standing admission that even the strongest production ML case in biomanufacturing keeps a person in the loop on purpose.
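
The silent-decay question at least has a monitoring angle: a locked model cannot change, but the distribution of its outputs can, and that shift is measurable without touching the weights. The sketch below is one possible approach, assumed for illustration. It compares the accept-confidence distribution recorded at validation with a recent production window using the population stability index; the synthetic beta-distributed confidences and the 0.25 rule-of-thumb threshold are hypothetical, and a drift flag is a prompt for investigation, not evidence of a missed defect.

```python
import numpy as np


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI between two sets of confidences in [0, 1]; larger means a bigger shift."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)
    ref_frac = np.clip(ref_frac, eps, None)      # avoid log(0) in empty bins
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(0)
at_validation = rng.beta(20, 1, 5000)   # synthetic: confidences clustered near 1.0
stable_week = rng.beta(20, 1, 5000)     # synthetic: same imaging conditions
drifted_week = rng.beta(8, 2, 5000)     # synthetic: e.g. after a lighting or glass change

psi_stable = population_stability_index(at_validation, stable_week)
psi_drifted = population_stability_index(at_validation, drifted_week)
DRIFT_THRESHOLD = 0.25                  # common rule of thumb, hypothetical for this use
print(f"stable PSI={psi_stable:.3f}  drifted PSI={psi_drifted:.3f}  "
      f"investigate={psi_drifted > DRIFT_THRESHOLD}")
```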

What this chapter adds to the model suite

This chapter contributes two modules to examples/platform/ml/, and they are deliberately a matched pair that makes opposite points:

fill_control.py — fill-weight control and a reject model on the real fill_events.csv line. It derives individuals control limits and a dose-accuracy Cpk from the line's own statistics, then trains a class-balanced logistic classifier on the 478-to-2 imbalance to show that the learned threshold only recovers the fixed checkweigh limit while adding false rejects and validation burden. The honest lesson encoded as an assertion: a fixed limit catches every low-fill vial with no over-rejection; ML does not improve detection here.
vision_avi.py — a CNN sketch for automated visual inspection. With no vial images in the dataset, it fixes the architecture and the shape contract (a six-class defect head over a 128×128 grayscale ROI), runs one forward pass with frozen weights in eval() mode to embody the locked-model discipline, and labels every real-world performance number (Amgen ~95% auto-release) as vendor/self-reported.

Together they encode this chapter's two-sided thesis: ML in fill-finish is sometimes the wrong tool (fill weight) and sometimes the strongest tool in the whole book (vision) — and telling the two apart is the skill.

Why it matters

Fill-finish is where machine learning stops being a promise and becomes a routine. The discreteness of the unit, the binary decision, and the visibility of the defect together make this the one operation where a learned model's judgement can — after a multi-year, heavily-validated program — stand on its own in commercial GMP. Get the AVI program right and you detect more real defects while scrapping far less good medicine, which is both a safety win and an enormous economic one. Get the fill-weight problem right by not over-engineering it, and you keep your validation burden where it belongs. And carry the validation-versus-learning discipline through both — locked models, human-in-the-loop routing, predetermined change control — and you have a template for how ML earns trust everywhere else in the plant. Fill-finish is the proof of concept the rest of biomanufacturing's ML ambitions are measured against.

In the real world

The deepest production deployment is Amgen's: roughly 95% of syringes and vials released through automated visual inspection, the result of a multi-year effort built around the validation-versus-learning paradox and direct FDA engagement (vendor/self-reported) [1]. On the equipment side, deep-learning AVI is a real, competitive product market (Stevanato Vision AI, Brevetti CEA via its Brevetti AI acquisition, Syntegon AIM, Cognex, Antares Vision, and Körber all sell it), with vendor-reported gains of an order-of-magnitude fewer false rejects and substantially higher particle detection than the rule-based machines they replace [4][5]. Lyophilization cycle modeling and soft-sensing of the drying endpoint are mature in cycle development and increasingly used in control; CCIT-ML and cleanroom contamination-risk prediction are earlier, in the monitoring cluster where pharma ML is genuinely advancing. What unifies all of it is the regulatory posture now hardening around AI in manufacturing: the FDA credibility framework, draft Annex 22's exclusion of adaptive and generative AI from critical use, and the April 2026 FDA warning letter to Purolea as the enforcement anchor. Together these say, unambiguously, that the model may be smart but the medicine's safety still rests on a locked artifact and a human who can be held accountable [7][8][10].

Key terms

Fill-finish — the operation that dispenses bulk drug substance into final containers (vials, syringes), optionally lyophilizes them, inspects them, and releases the passing units as the drug-product lot.
Automated visual inspection (AVI) — camera-plus-software inspection of every container for particulates and defects; the deep-learning version is the strongest production ML case in biomanufacturing.
Convolutional neural network (CNN) — the image-classification model family used for AVI, whose convolutional layers learn local-to-global spatial features (a speck, an edge, a meniscus) that a tabular model cannot.
Defect library — the curated, labelled image set of every defect type and severity, used to train and — critically — validate an AVI model; the real bottleneck of an AVI program.
False reject (over-rejection) — scrapping a good unit; the chronic failure of human and rule-based inspection that deep-learning AVI most reduces.
In-process-control (IPC) checkweigher — the gauge that weighs filled vials to confirm dose accuracy against a typically ±5% spec; a control problem best solved with statistics, not ML.
Capability index (Cpk) — how well a process fits inside its spec; our fill line sits near Cpk 0.84, marginal, which is why it rejects the occasional vial.
Lyophilization (freeze-drying) — freezing then vacuum-subliming a product into a stable cake; the learning targets are product-temperature soft sensing, drying-endpoint prediction, and cycle design-space optimization.
Container-closure integrity testing (CCIT) — non-destructive verification that the vial-stopper seal holds; an emerging ML/anomaly-detection target.
Locked model — a model whose weights are frozen at validation and changed only through a predetermined change-control plan; the regulatory resolution of the validation-versus-learning paradox.
Validation-versus-learning paradox — the contradiction that ML's value is in adapting while GMP's trust requires not changing; managed by locking the model, never fully solved.
Predetermined change-control plan (PCCP) — a pre-specified, pre-validated envelope for how a model may be updated, so improvements do not require ad hoc re-validation each time.

Where this leads

The vials are filled, dried, inspected, and the good ones counted — but no unit ships until the lot is released. The next chapter, QC and Release: MSPC, Real-Time Release, and Predicting the OOS, turns to the gate every batch must pass: the release assays of the hplc_results.csv panel, multivariate statistical process monitoring against the golden batch, real-time release testing that replaces an end-test with a model, and the hardest learning problem of all — predicting an out-of-specification result before it happens, using the one OOS sibling, BATCH-2026-004, as the example.
