Machine Learning, Soft Sensors, and Hybrid Models

📍 Where we are: Part V, Chapter 17 — having turned managed data into control with classical statistics, we now reach the frontier: machine learning that predicts what we cannot easily measure and fuses with the physics we already know.

The last chapter, From Data to Knowledge: SPC, Multivariate Analysis, and Continued Process Verification, showed how disciplined statistics turn a flood of process data into decisions — charting one variable at a time with Statistical Process Control (SPC), watching many at once with Multivariate Data Analysis (MVDA), and monitoring every batch forever under Continued Process Verification (CPV). Those tools are powerful, but they mostly describe and flag. This chapter is about tools that predict and learn: machine learning, soft sensors, and hybrid models.

Here is the tension that makes the topic interesting. Some of the things you most want to know during a batch — how much product you have made, how many cells are alive, how much sugar is left to feed them — are slow, expensive, or impossible to measure in real time. Machine learning offers a tempting shortcut: learn the answer from data you can measure cheaply. The promise is real, but so are the limits, and biomanufacturing has limits all its own.

The simple version

A good doctor does not draw blood every minute to know how you are doing. They watch cheap, fast signals — your pulse, your color, your breathing — and infer the expensive number underneath. A soft sensor does the same for a bioreactor: it watches the cheap signals it has and predicts the costly measurement it lacks. Machine learning is how it learns that inference from past batches. And a hybrid model is the doctor who also knows physiology — combining learned pattern-matching with real biological rules, so the guess holds up even in a situation never seen before.

What this chapter covers

Soft sensors: predicting hard-to-measure quantities from cheap online signals
Machine learning in bioprocessing, plainly — and why "small data" changes everything
Hybrid (gray-box) models that fuse mechanistic knowledge with data
The anatomy of one soft-sensor prediction record, and the lifecycle it lives inside
Validating artificial intelligence under GxP rules
The unsolved challenge of detecting drift when reference data is sparse
An honest reckoning of hype versus reality

Soft sensors: measuring the unmeasurable

Why soft sensors exist: the measurement gap

A soft sensor (also called a virtual sensor or inferential sensor) is not a physical probe. It is a piece of software that estimates a quantity you cannot easily measure directly, using other signals you can [6]. The idea comes from the wider process industries — oil refining, chemicals — where Kadlec and colleagues laid out the now-standard recipe: gather historical data, clean it, train a model to map cheap inputs to the expensive target, then run that model live to produce a continuous prediction [6].

In a bioreactor the targets are tantalizing. Titer, the concentration of product (say, antibody) in the broth, is the number the whole business cares about, yet it usually comes from a slow lab assay hours later. Viable cell density (VCD), the count of living cells doing the work, and the glucose concentration that feeds them are similarly awkward to track minute by minute. The measurement gap is born inside the production bioreactor: the physical run generates a live value that operators want to act on now, and any setpoint (the target value a control loop is told to hold) that depends on it can only be adjusted once the value is known, but the analytical lab that confirms the true value runs on its own slow clock. A soft sensor predicts these quantities from signals that are available continuously, such as oxygen uptake (living cells consume oxygen, so it tracks how much biomass is working), stirring power (the broth thickens as cells multiply, so the motor must work harder), or a spectroscopic reading. It closes the gap between the moment a value matters and the moment the lab can confirm it.

This connects directly to Raman spectroscopy, the optical fingerprinting technique we met as a PAT (Process Analytical Technology) tool back in Chapter 4. A Raman probe acquires near-continuously, with individual scans averaged over roughly a minute or two for an acceptable signal-to-noise ratio, so a fresh spectrum lands about once a minute. But a raw spectrum is not a glucose number; it must be turned into one by a chemometric model (statistics applied to chemical spectra). That model is a soft sensor: cheap, fast spectra in, the expensive concentration out [6].

Soft sensors are not only an upstream story. The measurement gap is just as real downstream, where a soft sensor often watches a far simpler signal: a column's UV A280 absorbance trace (ultraviolet light absorbed at 280 nm, which proteins absorb in proportion to concentration) or in-line conductivity. In Protein A capture — the first purification step, where the antibody is pulled out of the harvest onto an affinity resin — the operator must decide in real time when the eluting product peak begins and ends, because pooling too early or too late drags impurities into the pool or throws away product. A model that reads the live UV and conductivity traces and predicts the pool's purity, or the optimal pool start/stop, is a downstream soft sensor doing exactly what the Raman titer model does upstream. The same logic governs the UF/DF concentration-and-buffer-exchange step that makes the final drug substance: an in-line conductivity probe is the live proxy that tells the operator how many diavolumes of buffer exchange remain, and an in-line UV or refractive-index reading is the concentration proxy confirmed only later by a slow lab assay — the identical cheap-signal-now, expensive-confirmation-later asymmetry, just on a purification skid instead of a bioreactor.
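The pooling decision described above can be sketched in a few lines. This is a minimal illustration, assuming a plain threshold rule on the A280 trace (start pooling when absorbance rises to a cut-off, stop when it falls back below it). The function name, the cut-off, and the trace values are hypothetical, and a real downstream soft sensor would predict pool purity from UV and conductivity together rather than apply a fixed cut-off.

```python
from typing import Optional, Sequence, Tuple


def pool_window(a280: Sequence[float], threshold: float) -> Optional[Tuple[int, int]]:
    """Return (start, stop) sample indices of the first elution peak.

    start is the first sample at or above the threshold; stop is the last
    consecutive sample at or above it. Returns None if the trace never
    reaches the threshold.
    """
    start = None
    for i, value in enumerate(a280):
        if start is None:
            if value >= threshold:
                start = i            # peak front crosses the cut-off
        elif value < threshold:
            return start, i - 1      # peak tail dropped below the cut-off
    if start is None:
        return None
    return start, len(a280) - 1      # trace ended while still above the cut-off


# Hypothetical A280 trace in absorbance units, one reading per sample.
trace = [0.02, 0.03, 0.10, 0.45, 1.20, 1.80, 1.10, 0.40, 0.08, 0.03]
window = pool_window(trace, threshold=0.40)
print(window)  # (3, 7)
```

Moving the cut-off up trades yield for purity and moving it down does the reverse, which is exactly the trade-off a learned pooling model tries to optimize instead of fixing in advance.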

Why bioprocesses break the data-science rulebook

Bioprocesses make soft sensors unusually hard to build. Brunner and colleagues catalogue why: batches vary in length, a run passes through distinct phases — an early growth phase where the cells multiply, then a production phase where they slow down and make product — so the same cheap signal can mean different things at different times, and the few probes that exist can drift or fail mid-run [7]. A soft sensor that quietly trusts a faulty input can be worse than no sensor at all, so fault tolerance — knowing when not to believe yourself — is part of the job, not an afterthought [7]. This is why the prediction record a real soft sensor emits is never a bare number; it carries an input-quality status so that a faulty probe forces the estimate to be flagged rather than trusted, exactly as we will see when we dissect that record below.
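One way to picture "knowing when not to believe yourself" is a gate that inspects the recent input window before the estimate is released. The sketch below is an assumption-laden illustration, not the method of any cited paper: the three status values reuse the vocabulary that appears later in this chapter's validation shape, while the specific rules (range check, flat-line check) and all names are hypothetical.

```python
from typing import Sequence


def input_status(window: Sequence[float], low: float, high: float) -> str:
    """Classify a recent window of probe readings as OK, DEGRADED, or FAULT."""
    if not window or any(v != v for v in window):        # empty window or NaN reading
        return "FAULT"
    if any(v < low or v > high for v in window):          # outside the physical range
        return "FAULT"
    if len(window) > 1 and max(window) == min(window):    # flat-lined (stuck) probe
        return "DEGRADED"
    return "OK"


def gated_estimate(prediction: float, status: str) -> dict:
    """Attach the status so a consumer never sees a bare number."""
    return {"value": prediction, "input_status": status, "trusted": status == "OK"}


# Hypothetical stuck probe: four identical readings in a row.
readings = [41.2, 41.2, 41.2, 41.2]
result = gated_estimate(3.8, input_status(readings, low=0.0, high=100.0))
print(result)
```

The estimate is still emitted, but it travels with a flag that tells every downstream consumer not to act on it blindly.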

Machine learning, plainly

The supervised-learning recipe: Raman spectra to titer

Machine learning (ML) is software that improves its predictions by finding patterns in examples, rather than following rules a human wrote out by hand. Two broad kinds matter here. In supervised learning, you train on examples that come with the right answer attached — past batches where you logged both the cheap signals and the measured titer — so the model learns the mapping between them. A titer soft sensor is supervised learning. In unsupervised learning, there are no answers to copy; the algorithm groups or simplifies the data on its own — for instance, clustering batches into "behaved normally" and "drifted oddly," a cousin of the multivariate monitoring from the last chapter [5][9]. PLS, introduced below, is the workhorse of spectroscopic soft sensing, but it is one method among several: bioprocess soft sensors are also built with Gaussian process regression (prized in this small-data field because it natively reports a confidence interval, the kind we dissect later in this chapter), random forests and gradient boosting, support-vector regression, and small neural networks [5][9]. A beginner does not need to recognize each of these names to follow this chapter — they are simply alternative engines for the same supervised mapping; the companion ML/AI book compares these model families and how to choose among them in Models and Validation.

The supervised recipe for a Raman-to-titer model is concrete. A raw spectrum of more than a thousand intensity channels is first preprocessed (baseline-corrected and scatter-normalized), then compressed by partial least squares (PLS) into a handful of latent components. These are a few summary numbers that PLS builds by blending the thousand raw channels, keeping only the combinations that co-vary with concentration (that is, move up and down together with it) and discarding the rest, because a thousand channels and only dozens of batches would otherwise overfit. The model fits a frozen vector of coefficients on paired history (fixed numbers, locked in once training ends, one weight per latent component), and at deployment those coefficients turn each new spectrum into a single predicted titer. The companion implementation in Open-Source Bioprocess Data Systems shows this exact pipeline as runnable code: a PLS soft_sensor.py that maps Raman to titer (the served full-repo path additionally logs the fitted model to MLflow, the open-source experiment- and model-tracking tool, for governance; see the open-source analytics chapter). The data-point we dissect in this chapter is precisely what that code persists.
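To make the recipe tangible, here is a deliberately tiny, dependency-free sketch. It is not the companion soft_sensor.py; it assumes standard normal variate (SNV) scaling as the scatter normalization, skips baseline correction, and fits a PLS1 model with a single latent component, whereas a real model would use several components and a tested library. All function names and the five-channel toy spectra are hypothetical. The single latent weight is folded back into one coefficient per channel, which gives the same prediction.

```python
import math
from typing import List, Sequence, Tuple


def snv(spectrum: Sequence[float]) -> List[float]:
    """Standard normal variate: centre and scale one spectrum by its own statistics."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / std for v in spectrum]


def fit_pls1_one_component(X: List[List[float]], y: List[float]) -> Tuple[List[float], float, List[float]]:
    """Fit a single-latent-component PLS1 model.

    Returns (x_mean, y_mean, coefficients) such that
    y_hat = y_mean + (x - x_mean) . coefficients.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the direction in channel space that co-varies most with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: each spectrum compressed to one latent number.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress y on the scores, then fold the result back to per-channel coefficients.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return x_mean, y_mean, [q * v for v in w]


def predict(x: Sequence[float], x_mean: Sequence[float], y_mean: float,
            coefficients: Sequence[float]) -> float:
    """Apply the frozen coefficients to one new, already preprocessed spectrum."""
    return y_mean + sum((xi - m) * c for xi, m, c in zip(x, x_mean, coefficients))


# Hypothetical training history: four spectra with their reference titers (g/L).
raw_spectra = [[5.0, 1.0, 1.0 + c, 1.0 + 2.0 * c, 1.0 + c] for c in (1.0, 2.0, 3.0, 4.0)]
titers = [1.0, 2.0, 3.0, 4.0]
X = [snv(s) for s in raw_spectra]
x_mean, y_mean, coefficients = fit_pls1_one_component(X, titers)

# Deployment: the same preprocessing, then the frozen coefficients.
new_spectrum = snv([5.0, 1.0, 3.5, 6.0, 3.5])
print(predict(new_spectrum, x_mean, y_mean, coefficients))
```

Everything after the fit is frozen: x_mean, y_mean, and coefficients are the numbers a model registry would pin to a version, and prediction is nothing more than a centred dot product.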

ML has been applied across the whole bioprocess workflow — selecting cell lines, optimizing media, predicting scale-up, and monitoring and controlling production [5]. The reviews that survey this work are enthusiastic. They are also unusually frank about a problem most ML textbooks never face.

Caution

Most celebrated machine learning lives in a world of big data — millions of photos, billions of words. Biomanufacturing lives in a world of small data. A single batch can cost weeks and a fortune, so a process team may have only dozens of complete runs to learn from, not millions [5][9]. A model that is data-hungry will simply starve, or worse, overfit — memorize the handful of batches it saw and fail on the next one.

How does anyone know a model has not overfit? The standard test is cross-validation: hold part of the data out of training, then check the model on the part it never saw. In bioprocess the catch is that samples within a single run are correlated, so a naive random split leaks information and reports an optimistically low error. The discipline is to hold out whole batches — leave-one-batch-out or grouped cross-validation — and report the prediction error on those held-out runs as RMSEP (root-mean-square error of prediction — the typical size of the prediction miss, in the same units as the target, g/L of titer here, so smaller is better) and the cross-validated Q² (the cross-validated coefficient of determination — how much of the real run-to-run variation the model explains on batches it never trained on, on a scale up to 1.0, so closer to 1 is better). Those two numbers are the questions to ask a vendor or auditor: how was the model cross-validated, and did you hold out whole batches?
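The batch-wise discipline is easy to state and easy to get wrong, so a sketch may help. The example below is a minimal illustration under stated assumptions: a one-variable straight line stands in for the real PLS model, Q² is computed as one minus the held-out squared error divided by the total variation about the overall mean (one common convention among several), and the batch names and numbers are made up.

```python
import math
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (cheap signal x, reference titer y in g/L)


def fit_line(samples: List[Sample]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples)
    sxy = sum((x - mx) * (y - my) for x, y in samples)
    slope = sxy / sxx
    return slope, my - slope * mx


def leave_one_batch_out(batches: Dict[str, List[Sample]]) -> Tuple[float, float]:
    """Return (RMSEP, Q2) with every batch held out whole exactly once."""
    all_y = [y for samples in batches.values() for _, y in samples]
    y_mean = sum(all_y) / len(all_y)
    press = 0.0  # sum of squared prediction errors on held-out batches
    for held_out in batches:
        # Train only on samples from the other batches: no within-batch leakage.
        train = [s for name, samples in batches.items() if name != held_out for s in samples]
        slope, intercept = fit_line(train)
        for x, y in batches[held_out]:
            press += (y - (slope * x + intercept)) ** 2
    ss_total = sum((y - y_mean) ** 2 for y in all_y)
    rmsep = math.sqrt(press / len(all_y))
    q2 = 1.0 - press / ss_total
    return rmsep, q2


# Hypothetical paired history from three batches.
batches = {
    "BATCH-A": [(0.10, 0.5), (0.30, 1.6), (0.50, 2.4)],
    "BATCH-B": [(0.12, 0.7), (0.33, 1.5), (0.52, 2.7)],
    "BATCH-C": [(0.09, 0.4), (0.29, 1.4), (0.48, 2.5)],
}
rmsep, q2 = leave_one_batch_out(batches)
print(f"RMSEP = {rmsep:.3f} g/L, Q2 = {q2:.3f}")
```

The important line is the one that builds the training set: a sample never helps to predict another sample from its own run.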

This small-data, high-cost-experiment reality is the binding constraint of bioprocess ML, and it is exactly what motivates the next idea [9].

Hybrid models: physics plus data

Why physics acts as a guardrail

If you have very little data, the smartest move is to stop asking the data to learn everything from scratch. We already know a great deal about how bioreactors behave — mass balances, reaction kinetics, the basic arithmetic of cells consuming sugar and producing protein. That knowledge is a mechanistic (or first-principles) model: equations derived from physics and chemistry, not from data.

A hybrid model — also called a gray-box or semi-parametric model — combines the two. It keeps the trustworthy mechanistic backbone and uses a machine-learning component only for the parts we cannot write down cleanly, such as how cell growth rate depends on a messy combination of conditions [1]. The name "gray-box" sits deliberately between a transparent white box (pure equations) and an opaque black box (pure ML). Von Stosch and colleagues' survey gives the field its taxonomy: structurally, the data-driven and mechanistic parts can sit in series or in parallel, but in every case the physics constrains what the data is allowed to conclude [1].

A hybrid (gray-box) model: a mechanistic backbone supplies known physics while a machine-learning component covers what is hard to write down, and the physics keeps the learned part honest. Original diagram by the authors, created with AI assistance.

Why does this fit data-scarce bioprocesses so well? Because the mechanistic part contributes knowledge the data never had to supply, the ML part has far less to learn, so it can succeed on a handful of batches where a pure black box would fail [1][3]. The evidence is concrete: for mammalian-cell processes making therapeutic proteins, Narayanan and colleagues showed that hybrid models predicted process behavior more accurately than either a purely mechanistic or a purely data-driven model on its own [8][1][3]. Hybrid modeling is not a compromise between two methods; in the small-data regime of bioprocesses, it often outperforms both on their own.
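A serial hybrid structure can be sketched compactly. The example below is an illustrative toy, not the model from any cited study: it assumes a simple batch culture with no feed, two mass balances (biomass and glucose) as the mechanistic backbone, and a stand-in function where a trained regressor for the specific growth rate would sit. The saturating curve, its constants, the yield, and every name are hypothetical.

```python
from typing import Callable, List, Tuple


def learned_growth_rate(glucose_g_l: float) -> float:
    """Stand-in for the data-driven part: specific growth rate (1/h).

    A real hybrid model would put a trained regressor here; this saturating
    curve and its constants are purely illustrative.
    """
    return 0.04 * glucose_g_l / (0.5 + glucose_g_l)


def simulate_hybrid(
    x0: float, s0: float, yield_xs: float, hours: float, dt: float,
    growth_rate: Callable[[float], float] = learned_growth_rate,
) -> List[Tuple[float, float, float]]:
    """Integrate the batch mass balances with explicit Euler steps.

    dX/dt = mu * X            (biomass grows at the learned rate)
    dS/dt = -mu * X / Y_xs    (glucose is consumed in fixed proportion)
    """
    t, x, s = 0.0, x0, s0
    trajectory = [(t, x, s)]
    for _ in range(int(round(hours / dt))):
        mu = growth_rate(s)          # black-box part: what we cannot write down
        growth = mu * x * dt         # white-box part: the mass balance
        x += growth
        s -= growth / yield_xs
        t += dt
        trajectory.append((t, x, s))
    return trajectory


trajectory = simulate_hybrid(x0=0.5, s0=10.0, yield_xs=0.5, hours=120.0, dt=0.1)
t_end, x_end, s_end = trajectory[-1]
print(round(x_end, 2), round(s_end, 2))
```

Whatever the learned rate function says, the balances force biomass plus yield times glucose to stay constant, so the model can never predict more cells than the sugar could have built. That conserved quantity is the guardrail in miniature.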

A soft sensor that survives the unseen: mechanistic physics keeps the learned model honest. The diagram shows a hybrid soft sensor in which a mechanistic physics backbone and a learned component are both fed by cheap online signals, with the physics constraining the learned part to keep the prediction reliable on unseen conditions. Original diagram by the authors, created with AI assistance.

It is also a practical enabler of the regulatory frameworks this book keeps returning to. An expert panel convened by von Stosch and colleagues argued that hybrid models are well suited to Quality by Design (QbD) and PAT, because a model that respects physics generalizes more safely across the design space — the proven region of operating conditions that reliably yields acceptable product — than one that has only ever seen a few points inside it [2]. This is the same QbD logic that ICH Q8(R2) (Pharmaceutical Development) and ICH Q9(R1) (Quality Risk Management) formalize: define a design space, and manage risk by staying inside it. There is a second regulatory hook worth naming, because it is the one the previous chapter already raised: a Raman chemometric model that reports a quality number is itself an analytical procedure, so it falls under ICH Q2(R2) (Validation of Analytical Procedures) and ICH Q14 (Analytical Procedure Development), both 2023, for its validation and lifecycle management [15][16]. A drifting model can masquerade as a drifting process, which is exactly why the model, not just the bioreactor, has to be validated and monitored. The book Hybrid Modeling in Process Industries collects the theory and cross-industry case studies behind these claims [3].

The prediction record and its lifecycle

Anatomy of a soft-sensor prediction

So far we have talked about soft sensors as ideas. But the data-point a deployed soft sensor produces is a concrete, structured record — and like every artifact in this book, its value comes from what travels alongside the number, not the number alone. When a Raman titer model fires, it does not just emit 3.8; it emits a prediction record that ties the estimate to the spectrum it came from, the model that produced it, the uncertainty around it, and the slow reference value that will eventually grade it.

One prediction is a whole record: the spectrum that fed it, the latent components and frozen coefficients that transformed it, the estimate with its confidence interval, and the delayed reference assay that will eventually reveal its residual. Original diagram by the authors, created with AI assistance.

Read the card top to bottom and the chapter's whole argument is laid out as fields. The input rows are the cheap, fast signal: a timestamp, the raw input_spectrum of more than a thousand intensities, the preprocessed (baseline- and scatter-corrected) version, the latent_components that PLS extracts, and the frozen coefficients that do the mapping. The green core is the prediction proper — titer_predicted_g_L paired with a confidence_interval, a model_version pinned to an exact MLflow run, and a drift_flag. The reconciliation rows hold what makes a soft sensor honest: the reference_value from the offline HPLC (high-performance liquid chromatography) assay — the slow, accurate lab method — that arrives hours later, the residual between prediction and reference, and an input-quality status. The violet relationships panel records where the data-point came from and where it goes — trained_on paired history, reconciled_with the reference assay record, feeds the CPV chart and the feed-rate controller, and retrains_when accumulated residual crosses a threshold.
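The card translates almost directly into a typed record. The sketch below is a hypothetical rendering rather than the companion's actual schema: field names follow the ones used in the text where the text gives them, the sign convention for the residual (prediction minus reference) is an assumption, and the reference value in the usage lines is invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PredictionRecord:
    # Input rows: the cheap, fast signal and how it was transformed.
    timestamp: str
    input_spectrum: List[float]
    preprocessed: List[float]
    latent_components: List[float]
    coefficients: List[float]
    # Prediction core.
    titer_predicted_g_L: float
    confidence_interval: Tuple[float, float]
    model_version: str
    drift_flag: bool = False
    # Reconciliation rows: reference and residual stay empty until the slow assay reports.
    input_status: str = "OK"
    reference_value: Optional[float] = None
    residual: Optional[float] = None

    def reconcile(self, reference_g_L: float) -> None:
        """Attach the offline reference and compute the residual."""
        self.reference_value = reference_g_L
        # Sign convention (an assumption): prediction minus reference.
        self.residual = self.titer_predicted_g_L - reference_g_L

    def reference_inside_interval(self) -> Optional[bool]:
        """True if the reference fell inside the stated interval; None if unreconciled."""
        if self.reference_value is None:
            return None
        low, high = self.confidence_interval
        return low <= self.reference_value <= high


record = PredictionRecord(
    timestamp="2026-03-21T09:00",
    input_spectrum=[], preprocessed=[], latent_components=[], coefficients=[],
    titer_predicted_g_L=3.8, confidence_interval=(3.5, 4.1),
    model_version="raman_titer_pls-v3.2",
)
record.reconcile(3.6)  # hypothetical HPLC result arriving hours later
print(record.residual, record.reference_inside_interval())
```

The optional fields are the point: for most of its life a prediction has no reference and no residual, and the record has to say so explicitly rather than pretend otherwise.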

That structure is not academic. It is exactly the row a real system persists: the open-source companion stores this prediction as a tracked artifact whose schema mirrors these fields, so the same model_version and confidence_interval you see here are columns you can query downstream (see the open-source analytics chapter). It is also contextualization, the discipline of Chapter 2 made literal. A prediction is useful only because it carries its own provenance: it is the direct descendant of that chapter's six-field identity card, just as the open-source contextualization layer attaches identity to every raw measurement.

The record as a semantic statement, not just a database row

Those fields are provenance, and provenance has a standard shape. Write the same record as RDF (the Resource Description Framework — the web standard for stating facts as subject-predicate-object triples, where each fact links a thing to another thing or to a value) and it stops being a private schema and becomes a statement any system can read. The relationships panel maps almost one-for-one onto PROV-O (the W3C provenance ontology — a shared vocabulary for "what was derived from what, by which activity, using which agent"): the prediction is a prov:Entity that prov:wasDerivedFrom the spectrum, prov:wasGeneratedBy the inference run, and prov:wasAttributedTo the locked model version. The book's ontology companion models exactly this kind of genealogy spine, where a single transitive derivedFrom edge roots every artifact in its origin — here the spectrum, the training history, and the model run:

# one prediction as RDF — the relationships panel made into triples a machine can walk.
@prefix bp:   <https://example.org/bioproc#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

bp:PRED-CCP001-20260321T0900 a bp:SoftSensorPrediction ;
    bp:titerPredictedGperL    3.8 ;
    bp:confidenceLow          3.5 ; bp:confidenceHigh 4.1 ;   # the 95% interval
    bp:inputStatus            "OK" ;                          # input-quality flag
    bp:driftFlag              false ;
    prov:wasDerivedFrom       bp:SPECTRUM-CCP001-20260321T0900 ;
    prov:wasGeneratedBy       bp:INFERENCE-RUN-44218 ;
    prov:wasAttributedTo      bp:raman_titer_pls-v3.2 ;       # the locked MLflow run
    bp:reconciledWith         bp:HPLC-DS001-20260321 .        # the slow reference, later

This is the same continuant-versus-occurrent split the ontology book leans on: the prediction and the model are continuants (things that persist and bear values), while the inference run and the reference assay are occurrents (activities that happen and are over) — keeping them as separate nodes is what lets you ask "which predictions did this model version make?" without conflating the model with the act of running it. The payoff is that the chapter's whole argument — which model produced this number, from which spectrum, graded against which reference? — becomes a one-line competency question (a plain-English question the data must be able to answer, used as a pass/fail acceptance test) in SPARQL (the standard query language for RDF, what SQL is for tables):

# CQ: for every soft-sensor prediction, name its model version, its source spectrum,
#     and the reference assay that will grade it. Provenance is a query, not a guess.
PREFIX bp:   <https://example.org/bioproc#>
PREFIX prov: <http://www.w3.org/ns/prov#>
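SELECT ?prediction ?model ?spectrum ?reference WHERE {
  ?prediction a bp:SoftSensorPrediction ;
      prov:wasAttributedTo ?model ;
      prov:wasDerivedFrom  ?spectrum .
  # OPTIONAL keeps predictions whose slow reference has not been linked yet.
  OPTIONAL { ?prediction bp:reconciledWith ?reference . }
}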

The fault-tolerance rule from earlier in the chapter — a faulty probe must force the estimate to be flagged rather than trusted — is itself a constraint a graph can enforce closed-world with a SHACL shape (the Shapes Constraint Language — the standard that validates that graph data has the required structure, treating a missing or wrong required field as a failure now, the way the ontology book's release gate treats a missing sterility test as a failed lot). The shape below requires that every prediction carry an inputStatus drawn from a controlled set — so a prediction with no quality flag, or one whose status was free-typed off the allowed vocabulary, fails validation instead of slipping downstream where a consumer would read it as trustworthy:

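# shapes.ttl: every prediction must carry an input-quality status from the controlled
# vocabulary, so a missing or free-typed flag fails validation (closed-world).
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix bp:  <https://example.org/bioproc#> .

bp:PredictionShape a sh:NodeShape ;
    sh:targetClass bp:SoftSensorPrediction ;
    sh:property [ sh:path bp:inputStatus ;
        sh:minCount 1 ; sh:in ( "OK" "DEGRADED" "FAULT" ) ;
        sh:message "Every prediction must carry an input-quality status." ] ;
    sh:property [ sh:path bp:titerPredictedGperL ;
        # a bare Turtle literal such as 3.8 is an xsd:decimal, not an xsd:float
        sh:datatype xsd:decimal ; sh:minCount 1 ; sh:maxCount 1 ] .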

Modeled this way, the same fields that make the record queryable in a relational store also make it interoperable across systems and reasoners — which is precisely what the FAIR principles ask of a model's training history, and what makes a prediction's provenance auditable rather than buried in a vendor's proprietary blob.

Where the prediction lands: standards at the seam

The prediction record does not float free; it has to flow into the same standards-bound plant systems every other process value uses, or it cannot be charted next to the setpoints it tracks. The Raman analyzer publishes its live reading over OPC UA (Open Platform Communications Unified Architecture — the vendor-neutral industrial protocol that carries a tag's value together with its quality flag, timestamp, and engineering unit), so the inputStatus field above is not an invention of the soft sensor — it is the OPC UA status code the probe already emits, carried forward into the prediction. When the prediction is bound to a batch and a piece of equipment, that binding follows ISA-95 / B2MML (the manufacturing-operations data model and its XML serialization — the standard that says a measurement belongs to a specific material lot, made on a specific equipment unit, during a specific operation), the same model the plant information systems chapter and the ISA-95 architecture chapter build on. So a glucose prediction is not a loose number; it is an ISA-95-contextualized value tied to BATCH-2026-001 on bioreactor BR101, with an OPC UA quality flag, that the historian can align against the physical feed setpoint — exactly the standards backbone that lets a model's output be trusted and traced like any qualified instrument reading.
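The hand-off from OPC UA quality to the record's status field can be sketched as follows. An OPC UA StatusCode is a 32-bit value whose top two bits carry the severity (Good, Uncertain, or Bad); the mapping from those severities onto OK, DEGRADED, and FAULT is this sketch's assumption, as are the function names and the flat dictionary standing in for a proper ISA-95 / B2MML binding.

```python
def opcua_severity(status_code: int) -> str:
    """Read the severity from the top two bits of a 32-bit OPC UA StatusCode."""
    severity = (status_code >> 30) & 0b11
    return {0b00: "Good", 0b01: "Uncertain", 0b10: "Bad"}.get(severity, "Reserved")


# Hypothetical mapping from OPC UA severity to the record's inputStatus vocabulary.
SEVERITY_TO_INPUT_STATUS = {
    "Good": "OK",
    "Uncertain": "DEGRADED",
    "Bad": "FAULT",
    "Reserved": "FAULT",   # treat anything unrecognized as untrustworthy
}


def contextualize(value: float, unit: str, status_code: int,
                  batch_id: str, equipment_id: str) -> dict:
    """Bundle a reading with its quality flag and batch/equipment context."""
    return {
        "value": value,
        "unit": unit,
        "input_status": SEVERITY_TO_INPUT_STATUS[opcua_severity(status_code)],
        "batch_id": batch_id,
        "equipment_id": equipment_id,
    }


reading = contextualize(4.2, "g/L", 0x40000000, "BATCH-2026-001", "BR101")
print(reading)
```

The design choice worth noting is the default for unrecognized codes: when the quality of an input is unknown, the safe reading is FAULT, not OK.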

Model drift, continuous retraining, and the lifecycle

A soft-sensor prediction is not a one-shot event; it lives inside a loop. The model is trained once on paired history, then predicts continuously — but the only way to know whether those predictions are still right is to compare them against the slow reference assay and watch the residual. When that residual drifts, the loop must close.

The forward path is fast and the return path is slow, and that asymmetry is the whole difficulty. Predictions stream out about once a minute; the ground truth that could correct them trickles back hours later. Holding the residual record, the drift threshold, and the retrain trigger under formal change control is where this loop meets the governance rules of Chapter 12 — a retrained model is a new validated object, not a silent in-place edit. The dedicated treatment of drift detection, the locked-model and PCCP retraining lifecycle, and the validation paradox of a model that keeps learning lives in the companion ML/AI book, in MLOps and Lifecycle.
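A retrain trigger based on accumulated residual can be sketched with a two-sided tabular CUSUM, a classical control-chart technique in the same family as the SPC tools of the previous chapter. This is one possible choice, not the companion book's prescribed method; the slack and threshold values, the residual numbers, and the function name are all hypothetical, and the residual is taken as prediction minus reference.

```python
from typing import Iterable, Optional


def cusum_retrain_trigger(residuals: Iterable[float], slack: float,
                          threshold: float) -> Optional[int]:
    """Two-sided tabular CUSUM on the residual stream.

    slack     : residual magnitude tolerated as noise (same units as the titer)
    threshold : accumulated excess that raises the retrain trigger
    Returns the index of the first reconciled sample that trips the trigger,
    or None if the accumulated residual never crosses the threshold.
    """
    high = low = 0.0
    for i, r in enumerate(residuals):
        high = max(0.0, high + r - slack)   # accumulates persistent over-prediction
        low = max(0.0, low - r - slack)     # accumulates persistent under-prediction
        if high > threshold or low > threshold:
            return i
    return None


# Hypothetical residuals in g/L: noise only, then noise followed by a developing bias.
noise = [0.05, -0.04, 0.02, -0.06, 0.03]
drifting = noise + [0.30, 0.35, 0.28, 0.40]

print(cusum_retrain_trigger(noise, slack=0.1, threshold=0.5))      # None
print(cusum_retrain_trigger(drifting, slack=0.1, threshold=0.5))   # 7
```

Each entry in the stream is one reconciled reference sample, not one prediction, so a trigger at index 7 means eight slow lab results had to arrive before the drift could be declared. The slack and threshold are themselves parameters that belong under change control.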

Validating AI under GxP

What makes AI harder to govern than ordinary software

A model that helps you understand a process is one thing. A model that decides something about a medicine — that releases a batch, sets a feed rate, or stands in for a lab test — is a regulated object, and that changes everything. GxP is the umbrella term for the "Good Practice" rules (Good Manufacturing, Laboratory, and Clinical Practice, among others) that govern medicine making. Under GxP, you cannot simply deploy a clever model; you must validate it and keep it trustworthy for its whole life.

Three difficulties make AI harder to govern than ordinary software. The first is model drift: the real process slowly changes — a new raw-material lot, an aging probe, a seasonal shift — until the world no longer matches the data the model learned from, and its predictions quietly decay. The second is explainability: a black-box model can be accurate yet unable to say why, which is uncomfortable when a regulator asks you to justify a decision about a human medicine. The third is the validation question itself: a model that keeps learning after deployment is a moving target, and traditional one-time validation was never designed for something that changes.

The U.S. FDA has begun to map this territory. Its 2023 CDER discussion paper, Artificial Intelligence in Drug Manufacturing, first laid out the open questions for AI under cGMP (current Good Manufacturing Practice, the FDA's binding manufacturing rules, deliberately written to evolve with the state of the art): how to manage the data a model is trained on, how to validate and re-validate models, and how to apply risk-based expectations so that a model touching a critical decision faces more scrutiny than one that does not [4]. The frontier then moved. In January 2025 the FDA issued draft guidance, Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products, that turns those questions into a concrete framework [10]. Its central idea answers the practitioner's question "how much scrutiny does this model need?": you first state the model's context of use (COU), meaning exactly what it is used for and what it influences, and then run a seven-step risk-based credibility assessment in which model risk is determined jointly by model influence (how much the model's output drives the decision) and decision consequence (how bad a wrong decision would be). A soft sensor that merely advises an operator sits low on that scale; one that releases a batch sits high and must earn far more evidence. Crucially for this chapter, the framework includes a lifecycle-maintenance step: the credibility of a deployed model has to be maintained as conditions change, which is the regulatory codification of exactly the drift-monitoring loop this chapter describes. So while the framework is a draft and not yet a final rule, it is no longer true that there is nothing to point at: a draft credibility framework now exists [10].

This is where the Computer Software Assurance (CSA) thinking from Chapter 11 earns its keep. CSA's core move — spend your validation effort in proportion to risk, and lean on continuous evidence rather than one heroic up-front test — is exactly the posture a learning model demands. An AI soft sensor needs ongoing assurance: monitoring its predictions for drift, defining when it must be retrained, and documenting that whole lifecycle, much as CPV monitors a process forever [4]. Industry now has dedicated artifacts for this, not just generic CSV: the second edition of GAMP 5 (2022) added Appendix D11 on AI/ML, and ISPE's GAMP Guide: Artificial Intelligence (2025) is a purpose-built validation playbook for exactly the learning models this chapter describes [13][14]. "Validate your software" has, in other words, grown a chapter titled "validate your model."

The deployment specifics are where this becomes a manufacturing problem, not just a data-science one. A validated soft sensor does not run on a laptop; it is installed into the plant's control fabric, so the model — like any computerized system — earns its own IQ/OQ/PQ (Installation, Operational, and Performance Qualification — proof that the model was installed right, runs right against test spectra, and performs right against held-out batches). The model's prediction is wired into the MES (Manufacturing Execution System) and historian so that its output is captured against the batch record under the same audit trail as a physical probe — a glucose prediction that adjusts a feed rate is a control action the batch record must show. And there is a uniquely biomanufacturing twist on scale-up and tech transfer: a chemometric model is scale-dependent. A Raman calibration built in a 10 L development bioreactor sees a different optical path length, probe geometry, and background than a 2000 L production vessel, so transferring the soft sensor to commercial scale is not a copy-paste — it requires calibration transfer (statistical methods that map spectra from the source instrument onto the target's response) and a formal re-validation at the receiving scale, treated under change control exactly like transferring any analytical method. A model that was accurate at pilot scale and was never re-qualified at manufacturing scale is, in regulatory terms, an unvalidated model running on product — the soft-sensor analogue of skipping process validation after a site transfer.
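The simplest member of the calibration-transfer family is slope and bias correction, sketched below. Note the substitution: it corrects the model's predictions at the receiving scale rather than mapping the spectra themselves, so it is a cruder relative of the spectral-mapping methods described above. It assumes a small set of paired samples at the receiving scale (source-model prediction and lab reference), and the numbers and names are hypothetical. It does not replace the formal re-validation the text calls for.

```python
from typing import List, Tuple


def fit_slope_bias(predicted: List[float], reference: List[float]) -> Tuple[float, float]:
    """Least-squares slope and bias so that reference is approximated by slope * predicted + bias."""
    n = len(predicted)
    mp = sum(predicted) / n
    mr = sum(reference) / n
    sxy = sum((p - mp) * (r - mr) for p, r in zip(predicted, reference))
    sxx = sum((p - mp) ** 2 for p in predicted)
    slope = sxy / sxx
    return slope, mr - slope * mp


def corrected(prediction: float, slope: float, bias: float) -> float:
    """Apply the fitted correction to a new prediction from the source-scale model."""
    return slope * prediction + bias


# Hypothetical transfer set at the receiving scale: what the development-scale
# model predicted versus what the reference assay measured (g/L).
source_model_predictions = [1.0, 2.0, 3.0]
reference_assay_values = [1.3, 2.4, 3.5]

slope, bias = fit_slope_bias(source_model_predictions, reference_assay_values)
print(round(slope, 3), round(bias, 3))           # 1.1 0.2
print(round(corrected(2.5, slope, bias), 3))     # 2.95
```

A slope far from one or a bias far from zero is itself evidence: it quantifies how differently the receiving vessel looks to the model, and belongs in the change-control record for the transfer.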

The same expectations apply on both sides of the Atlantic, and Europe has moved fastest of all. In the United States, a model that touches a regulated record falls under 21 CFR Part 11 (electronic records and signatures); in the European Union, EU GMP Annex 11 (Computerised Systems) governs the validation of computational systems, including the models embedded in them. In July 2025 the European Commission released a much-expanded draft revision of Annex 11 (growing from roughly five pages to around nineteen, with new coverage of audit-trail review, cloud and SaaS providers, and AI/ML) and — for the first time — a brand-new draft Annex 22, Artificial Intelligence, the first GMP annex written specifically for AI/ML in regulated manufacturing, governing exactly the soft sensors this chapter is about, with both finals expected in 2026 [12][11]. Both Annexes demand that a system be validated for its intended use, kept under change control, and audit-trailed throughout its life — which, for a model that keeps learning, means the validation never truly ends. The discipline of treating each prediction as part of a monitored statistical population is the same one that drives Continued Process Verification: a soft sensor under GxP is simply a process that must be charted forever.

The unsolved challenge: model drift in sparse-reference regimes

It would be dishonest to end this chapter on a note of solved confidence. The hardest open problem in the soft-sensor data flow is not building the model — it is knowing, in real time, whether the model is still right.

Recall the asymmetry from the lifecycle figure. A Raman soft sensor predicts about once a minute, but the offline HPLC assay that could confirm it returns only every few hours, and sometimes only once or twice per batch. This is the sparse-reference regime: predictions are dense, ground truth is sparse, and the residual that would expose a developing bias can only be computed when the lab finally reports. Between reference points, a soft sensor that has begun to drift — because of a new raw-material lot, an aging probe, or a subtle shift in cell behavior — looks exactly like one that is working perfectly. The drift_flag on the prediction record is, by construction, a lagging indicator: it can only turn true once enough slow reference data has accumulated to prove the fast predictions wrong.

This makes real-time degradation and bias genuinely hard to detect without ground truth, and the field knows it. The FDA's 2023 CDER discussion paper Artificial Intelligence in Drug Manufacturing names exactly this difficulty (how to monitor and re-validate models whose performance can decay silently after deployment) as an open question for AI under cGMP [4]. The 2025 draft guidance answers it only with a process, a lifecycle-maintenance plan that mandates ongoing credibility monitoring, not with a way to see the drift before the slow reference data arrives [10]. Brunner and colleagues' critical review reaches the same place from the engineering side: fault tolerance for bioprocess soft sensors, including the detection of slow degradation, remains an unsolved design problem rather than a checkbox [7]. Approaches exist (uncertainty estimates like the confidence interval on the prediction card, physics-based hybrid guardrails that flag implausible outputs, and statistical drift tests on the residual stream), but none of them substitutes for the truth a reference assay provides. Until ground truth is cheaper or faster, the honest answer is that a deployed soft sensor must be distrusted on a schedule: monitored, periodically reconciled, and retrained under change control, with the standing assumption that it is wrong until the slow data proves otherwise.
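To make the lagging-indicator problem concrete, here is a minimal sketch of one of the statistical drift tests mentioned above: a two-sided tabular CUSUM run on the residual stream. Everything in it is an illustrative assumption rather than the chapter's or any vendor's implementation: the function names `pair_residuals` and `cusum_drift`, the simulated 0.3 g/L bias that starts at minute 600, the four-hour reference cadence, and the `slack` and `threshold` settings are all hypothetical.

```python
from bisect import bisect_left


def pair_residuals(pred_times, pred_values, ref_times, ref_values):
    """Match each sparse reference to the nearest-in-time prediction.

    Returns (time, residual) pairs with residual = prediction - reference.
    pred_times must be sorted ascending.
    """
    pairs = []
    for t_ref, y_ref in zip(ref_times, ref_values):
        i = bisect_left(pred_times, t_ref)
        # Candidates: the prediction just before and just after the reference.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pred_times)]
        j = min(candidates, key=lambda k: abs(pred_times[k] - t_ref))
        pairs.append((t_ref, pred_values[j] - y_ref))
    return pairs


def cusum_drift(residuals, slack, threshold):
    """Two-sided tabular CUSUM; returns the index of the first alarm or None."""
    high = low = 0.0
    for idx, r in enumerate(residuals):
        high = max(0.0, high + r - slack)  # accumulates positive bias
        low = max(0.0, low - r - slack)    # accumulates negative bias
        if high > threshold or low > threshold:
            return idx
    return None


# Hypothetical 24 h run: one prediction per minute, true glucose fixed at 4.0 g/L.
# The soft sensor silently starts reading 0.3 g/L high at minute 600.
DRIFT_START = 600
pred_times = list(range(0, 1440))
pred_values = [4.0 + (0.3 if t >= DRIFT_START else 0.0) for t in pred_times]

# Offline reference assay returns only every four hours.
ref_times = [240, 480, 720, 960, 1200]
ref_values = [4.0] * len(ref_times)

pairs = pair_residuals(pred_times, pred_values, ref_times, ref_values)
alarm = cusum_drift([r for _, r in pairs], slack=0.1, threshold=0.3)
alarm_time = pairs[alarm][0] if alarm is not None else None
lag_minutes = alarm_time - DRIFT_START if alarm_time is not None else None
print(f"drift began at {DRIFT_START} min, flagged at {alarm_time} min "
      f"({lag_minutes} min later)")
```

With these settings the flag fires on the fourth reference sample, six hours after the simulated drift begins, even though 360 biased predictions were issued in between. A tighter threshold shortens the lag at the cost of more false alarms, but no tuning lets the test fire before a reference value arrives.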

Why it matters

Every promise in this chapter rests on one foundation: data. A soft sensor is only as good as the batches it learned from. A hybrid model still needs clean, contextualized measurements to fit its data-driven part. And an AI under GxP cannot be validated unless its training data is itself trustworthy — attributable, complete, and well-described, the same ALCOA+ properties (Chapter 9) that make any batch record trustworthy [4]. It is also why the FAIR principles (Findable, Accessible, Interoperable, Reusable — Chapter 14) matter here in the most literal way: a model can only learn from history it can find, access, combine, and reuse, so the same findable, well-described, reusable data the rest of the book has fought for is precisely the raw material a soft sensor is trained on. The reviews that survey ML in bioprocessing keep landing on the same conclusion: the binding constraint is rarely the algorithm. It is the quantity and quality of the data available to feed it [5][9]. This is the throughline of the entire book. The reason to manage data well is not only to satisfy an auditor; it is to make the most advanced tools in the field actually work.

In the real world

The most successful industrial deployments so far have been pragmatic, not flashy. Raman-based soft sensors that hold glucose at a setpoint, and hybrid models used to design experiments and shrink the number of costly runs, are exactly the targeted, physics-anchored applications the bioprocess ML literature recommends over grand black-box ambitions [5][8].

This is not theoretical. A typical glucose soft sensor runs on a process Raman analyzer (for example a Kaiser Optical Systems RamanRxn probe, now part of Endress+Hauser) that collects a spectrum every minute or two. A chemometric model turns each spectrum into a live reading the controller can act on instantly: it might report glucose = 4.1 g/L in real time, while the confirming offline assay returns 3.6 g/L only four hours later. That few-hours head start is the entire value: by the time the lab result arrives, the feed has already been corrected. The spectra and the predictions they produce do not vanish: they land in the same historian and contextualization fabric every other process tag uses, so a model's live output can be charted next to the physical setpoints it was meant to track (the open-source companion bridges exactly this kind of model output into a PI-style historian). Hybrid models follow the same data-efficiency logic in process development, where every experiment is expensive: trading a few costly runs for a model that predicts the rest shrinks the design-of-experiments burden and pays for itself.

The models themselves are ordinary software artifacts: vendor platforms such as Sartorius's SIMCA build and deploy chemometric models through their own runtime (SIMCA-Q / SIMCA-online). More broadly, trained ML models are increasingly serialized in interchange formats like PMML (Predictive Model Markup Language) or ONNX (Open Neural Network Exchange) so that a model fit on one system can be deployed, version-controlled, and audited on another.

Validation frameworks are catching up fast: between the FDA's 2025 draft credibility guidance [10], the EU's draft Annex 22 for AI [11], and dedicated industry playbooks such as GAMP 5's Appendix D11 and the ISPE GAMP Guide: Artificial Intelligence [13][14], the established computerized-system validation thinking we met in Chapter 11 has genuinely extended toward models, so that "validate your software" now reads "validate your model" too. As the broader field moves toward continuous and intensified processing, the dense, continuous data streams it produces make real-time soft sensing both possible and valuable. The frontier is not a robot that replaces the scientist; it is a model that earns its trust the same way every batch record does: through disciplined, defensible data.

Key terms

Soft sensor (virtual / inferential sensor) — software that estimates a hard-to-measure quantity from cheaper signals that are available in real time.
Titer — the concentration of product (such as antibody) in the bioreactor broth.
Viable cell density (VCD) — the number of living, working cells in the culture.
Setpoint — the target value a control loop is told to hold for a process variable (such as glucose).
Chemometrics — statistical methods that turn chemical spectra, such as Raman, into concentration numbers.
Machine learning (ML) — software that improves its predictions by finding patterns in examples rather than following hand-written rules.
Supervised learning — ML trained on examples that already carry the correct answer.
Unsupervised learning — ML that finds structure in data with no answers provided.
Overfitting — when a model memorizes its training examples and fails on new ones.
Cross-validation — the standard test for overfitting: hold part of the data out of training and score the model on what it never saw. In bioprocess, hold out whole batches (leave-one-batch-out / grouped CV), because samples within a run are correlated; report the held-out error as RMSEP (root-mean-square error of prediction — the typical prediction miss, in the target's own units like g/L, so smaller is better) and the cross-validated Q² (how much of the run-to-run variation the model explains on held-out batches, up to 1.0, so closer to 1 is better).
Small data — the regime, typical in biomanufacturing, where experiments are so costly that only a few are available to learn from.
Mechanistic (first-principles) model — a model built from physics and chemistry equations rather than from data.
Hybrid (gray-box / semi-parametric) model — a model that combines a mechanistic backbone with a data-driven component.
Model drift — the gradual decay of a model's accuracy as the real process changes away from its training data. For example, a glucose soft sensor trained on 2023 batches might begin reading 10–15% high after a new raw-material supplier is introduced — a clear signal that it needs retraining.
Explainability — the degree to which a model can justify why it made a prediction.
GxP — the family of "Good Practice" regulations governing medicine development and manufacture.
cGMP — current Good Manufacturing Practice; the FDA's binding manufacturing rules, written so that "current" expectations evolve with the state of the art (which is why a learning model's validation never quite settles).
Quality by Design (QbD) — designing quality into a process from the start by defining a design space and controlling within it (introduced in Chapter 1); a model that respects physics generalizes across that space more safely.
Design space — the proven region of operating conditions that reliably yields acceptable product; staying inside it is how QbD manages risk.
Computer Software Assurance (CSA) — the risk-based validation posture (introduced in Chapter 11) that spends effort in proportion to risk and leans on continuous evidence; the natural fit for a model that keeps learning.
Change control — the formal procedure by which any change to a validated system — including retraining a model — is reviewed, approved, and documented, so the change is itself auditable; a retrained model is a new validated object, not a silent in-place edit.
Partial least squares (PLS) — the chemometric regression that compresses a many-channel spectrum into a few latent components and maps them to a concentration.
Latent components — the few summary numbers a model like PLS builds by combining the thousands of raw spectral channels, keeping only the combinations that move up and down together with the target ("co-vary") and discarding the rest.
Residual — the difference between a soft sensor's prediction and the later reference value; the running error that exposes drift.
Sparse-reference regime — the situation where predictions are frequent but the ground-truth reference assays that could grade them are rare, making real-time drift hard to detect.
RDF / triple — the Resource Description Framework, the web standard for stating facts as subject-predicate-object triples, so a prediction record becomes a machine-readable statement rather than a private database schema.
PROV-O — the W3C provenance ontology; its wasDerivedFrom / wasGeneratedBy / wasAttributedTo vocabulary maps onto a prediction's "from which spectrum, by which run, by which model" lineage.
SHACL — the Shapes Constraint Language, which validates that graph data has the required structure closed-world, so a prediction with a degraded input cannot silently pass downstream as trustworthy.
SPARQL / competency question — the standard RDF query language, and the plain-English question (used as a pass/fail test) it answers — here, "for every prediction, name its model version, source spectrum, and grading reference."
OPC UA — the vendor-neutral industrial protocol that carries a tag's value together with its quality flag, timestamp, and engineering unit; the source of the prediction record's input-quality status.
ISA-95 / B2MML — the manufacturing-operations data model (and its XML serialization) that ties a measurement to a specific material lot, equipment unit, and operation, so a model's output is a contextualized value, not a loose number.
Calibration transfer — the statistical methods that map spectra from one instrument or scale onto another's response, required when a soft sensor built at development scale is transferred to a production vessel and re-validated under change control.
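The cross-validation entry above can be sketched in a few lines. This is a minimal illustration of leave-one-batch-out scoring, not a production chemometrics workflow: it swaps the many-channel PLS model for a one-feature straight-line fit to stay self-contained, the three batches and their values are invented, and Q² is computed against the overall mean of the target (one common convention, assumed here). The helper names `fit_line` and `leave_one_batch_out` are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_batch_out(samples):
    """samples: list of (batch_id, x, y). Returns (rmsep, q2).

    Each batch is held out whole, the model is fit on the remaining batches,
    and squared errors on the held-out batch are pooled into PRESS.
    """
    batches = sorted({b for b, _, _ in samples})
    press = 0.0
    for held_out in batches:
        train = [(x, y) for b, x, y in samples if b != held_out]
        test = [(x, y) for b, x, y in samples if b == held_out]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        press += sum((y - (slope * x + intercept)) ** 2 for x, y in test)
    ys = [y for _, _, y in samples]
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    rmsep = sqrt(press / len(ys))   # typical miss, in the target's units
    q2 = 1.0 - press / ss_total     # share of variation explained on held-out data
    return rmsep, q2


# Invented data: x is a single spectral feature, y is glucose in g/L.
# Each batch follows y = 2x plus its own small batch-level offset.
offsets = {"A": 0.0, "B": 0.1, "C": -0.1}
samples = [(b, float(x), 2.0 * x + off)
           for b, off in offsets.items() for x in (1, 2, 3, 4)]

rmsep, q2 = leave_one_batch_out(samples)
print(f"RMSEP = {rmsep:.3f} g/L, Q2 = {q2:.4f}")
```

On this toy data the held-out error comes out at roughly 0.12 g/L with Q² near 0.997. The point of holding out whole batches is visible in the construction: the batch-level offsets are exactly what a random per-sample split would leak into training, making the model look better than it will on a genuinely new run.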

Where this leads

Soft sensors, hybrid models, and validated AI do not live in isolation — they only deliver their value when they plug into a factory that runs and integrates in real time. The next chapter, Real-Time Integration and Pharma 4.0: The Smart, Continuous Factory, pulls every thread of this book together: continuous and intensified processing, Real-Time Release Testing, the Pharma 4.0 vision, and the live data-integration efforts — above all the NIST real-time lab-data proof of concept (the IOF Biopharma / BMIC work) — that demand everything we have built, all at once. We close, honestly, on data spaces and the still-distant dream of the autonomous bioprocess.
