
UF/DF and Drug Substance: Soft-Sensing Concentration and Excipients

📍 Where we are: Part IV · Downstream, Learned — Chapter 16. Polishing chromatography handed downstream a pure pool of antibody, monomeric and charge-correct, but dilute and sitting in the wrong buffer. This chapter learns the final transformation: squeezing that pool to its drug-substance concentration and exchanging its buffer — and producing DS-001, the lot where every release CQA is at last measured.

The molecule is now pure. What it is not yet is finished. After polishing, the antibody is a clean but dilute stream — a few grams per litre, dissolved in the elution buffer of the last column, which is nothing like the buffer a patient's dose needs. The job of the last downstream unit operation, ultrafiltration/diafiltration (UF/DF), is two things at once: push water through a membrane to concentrate the protein up to its drug-substance target (often tens of grams per litre for a modern high-concentration mAb), and wash out the old process buffer, replacing it with the formulation buffer the molecule will live in (diafiltration). When the membrane finally stops, what comes off it is the drug substance — the bulk material the rest of the supply chain treats as the product — and in our running example that is the lot DS-001, deriving from PApool-001 and the whole genealogy behind it.

UF/DF is where the learning lens meets a deceptively simple-looking step, and it is also where the release CQAs we have been chasing since the bioreactor finally get measured. Two numbers govern the operation — how concentrated the protein is, and how much old buffer is left — and neither of them has a fast, cheap, GMP-grade inline assay that is easy to trust. So the chapter is, once more, a soft-sensing chapter: predict the concentration and the excipient state from the cheap signals the skid already carries, predict when diafiltration is done, and flag the excursions that ruin an otherwise-finished batch.

The simple version

Think of reducing a sauce and then changing its seasoning. You boil off water until it is thick enough (that is concentration), then you keep adding fresh stock and boiling it down again, over and over, until the old salty broth is almost entirely replaced by the new one (that is diafiltration). The hard part is knowing two things without stopping to taste: how thick the sauce is right now, and how much of the old broth is still in there. A UF/DF soft sensor is the cook who can read both off the look and feel of the pot — the colour, the way it coats a spoon — so they know exactly when it is done and when something has gone wrong, without ladling out a sample and waiting.

What this chapter covers

  • Framing UF/DF as two coupled soft-sensing problems — concentration (a moving target as water leaves) and excipient/buffer state (a moving target as buffer is exchanged), and why the offline assays for both are too slow to control on.
  • Inline concentration soft sensing — variable-pathlength UV280, refractive index, and Raman, and why a linear Beer-Lambert model is the right tool here, not a deep net.
  • The diafiltration endpoint — turning the textbook exponential wash-out into a learned endpoint predictor, so you stop at the right diavolume instead of a fixed number.
  • Excursion detection — gel-layer/concentration-polarization faults, flux decay, and how a residual against the physics flags a UF/DF run going wrong.
  • The DS-001 node — where the release CQAs (monomer, HCP, the rest of the panel) are finally measured, and how the soft sensors connect to them.
  • A runnable module, examples/platform/ml/ufdf_endpoint.py, and the anatomy of one UF/DF endpoint record.
  • The honest open problem: high concentration breaks the linear assumptions, and the reference is the slowest of all.

UF/DF is two soft-sensing problems wearing one skid

It helps to be precise about the physics before the learning. A tangential-flow filtration (TFF) skid pumps the protein pool across a membrane whose pores pass water and small solutes (buffer salts, sugars) but retain the large antibody. In the ultrafiltration phase you simply remove permeate, so the retained volume shrinks and the protein concentration climbs in proportion to the volume-reduction factor (VRF): concentrate from 8 L to 1.6 L and you have multiplied concentration five-fold. In the diafiltration phase you hold volume constant — adding fresh formulation buffer at exactly the rate permeate leaves — so concentration stays put while the old buffer ions are progressively flushed out and replaced. For an ideal, well-mixed system the residual fraction of the original buffer decays exponentially in diavolumes (DV) — the number of retentate-volumes of fresh buffer exchanged:

C_residual / C_initial = exp(−DV)

so roughly three diavolumes clears about 95% of the old buffer, five clears about 99%, and seven about 99.9%. That clean exponential is the backbone every UF/DF model leans on — and it is exactly the kind of trustworthy physics that, in the small-data regime of bioprocess, you should never throw away in favour of a black box.
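The arithmetic in this section can be checked in a few lines. The sketch below assumes the ideal case described above (full protein retention, a well-mixed retentate, and buffer species that pass the membrane freely); the helper names are illustrative and are not taken from the book's module.

```python
import math


def concentration_after_uf(c0_g_l: float, v_start_l: float, v_end_l: float) -> float:
    """Protein concentration after ultrafiltration, assuming full retention."""
    vrf = v_start_l / v_end_l  # volume-reduction factor
    return c0_g_l * vrf


def residual_fraction(diavolumes: float) -> float:
    """Ideal well-mixed wash-out: C_residual / C_initial = exp(-DV)."""
    return math.exp(-diavolumes)


def diavolumes_for_spec(spec_fraction: float) -> float:
    """Invert the wash-out curve: diavolumes needed to reach a residual fraction."""
    return -math.log(spec_fraction)


if __name__ == "__main__":
    # 8 L -> 1.6 L is a five-fold volume reduction (starting value is hypothetical)
    print(concentration_after_uf(10.0, 8.0, 1.6))
    for dv in (3, 5, 7):
        print(dv, f"cleared {1.0 - residual_fraction(dv):.2%}")
    print(f"DV for a 5% residual spec: {diavolumes_for_spec(0.05):.2f}")
```

Inverting the curve is the useful direction in practice: a 5% residual specification corresponds to just under three ideal diavolumes, which is why a fixed seven-diavolume recipe carries so much margin.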

The two governing quantities therefore move on different schedules. Protein concentration rises during UF and is then held during DF; the excipient state is unchanged during UF and decays during DF. Both matter for release: the drug substance has a target concentration with a tolerance, and a buffer/excipient specification (residual old buffer washed out, formulation excipients dialled in). And both are awkward to measure live. Concentration is classically measured by pulling a sample and running an offline A280 or a slow protein assay; the excipient/buffer state is measured by osmolality, ion chromatography, or pH/conductivity benchtop checks. Each takes long enough that, by the time the lab reports, the membrane has moved on. That gap — a value that matters now against a confirmation that arrives later — is the same measurement gap that created the titer soft sensor upstream, transplanted to the very last downstream step.

Inline concentration: variable-pathlength UV, refractive index, and why linear wins

The single most important inline measurement in UF/DF is protein concentration, and the production-grade way to get it is variable-pathlength ultraviolet (VPE/VPX) spectroscopy. The physics is Beer-Lambert: absorbance at 280 nm is proportional to protein concentration times the optical pathlength, A = ε · c · ℓ. With a typical mAb extinction coefficient near 1.4 L/(g·cm), a standard 1 cm cuvette reads comfortably only up to roughly 1 g/L; a high-concentration drug substance at tens of g/L would give an absorbance far beyond the detector's linear range. The variable-pathlength trick is to shorten the optical path, down to fractions of a millimetre, so the absorbance stays in the linear, readable range no matter how concentrated the retentate gets. Repligen's FlowVPX/FlowVPE is the commercial embodiment of this, now widely deployed inline on TFF skids to read concentration continuously without dilution (Repligen extended its process-analytics line further with the 2025 acquisition of 908 Devices' bioprocessing analytics portfolio) [1] (production, vendor-self-reported).
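One way to see why a short, adjustable path helps is to read absorbance at several pathlengths and take the slope, since A = (ε · c) · ℓ makes the slope equal to ε · c. The sketch below is an illustration of that idea under stated assumptions, not the vendor's algorithm: the function name, the pathlengths, and the baseline offset are hypothetical, and ε = 1.42 L/(g·cm) is the value used in this chapter's module excerpt.

```python
import numpy as np

EPS_L_PER_G_CM = 1.42  # mAb A280 extinction coefficient from the chapter's module


def concentration_from_slope(pathlengths_cm, absorbances_au, eps=EPS_L_PER_G_CM):
    """Fit A = (eps * c) * l + offset and return c in g/L.

    Using the slope rather than a single reading cancels any constant
    baseline offset in the absorbance.
    """
    slope, _offset = np.polyfit(pathlengths_cm, absorbances_au, deg=1)
    return slope / eps


# Hypothetical readings for a 50 g/L retentate at 50 to 200 micron pathlengths.
paths_cm = np.array([0.005, 0.010, 0.015, 0.020])
absorb_au = EPS_L_PER_G_CM * 50.0 * paths_cm + 0.01  # 0.01 AU constant baseline

c_est = concentration_from_slope(paths_cm, absorb_au)
print(f"estimated concentration: {c_est:.2f} g/L")
```

At the longest path here the absorbance is about 1.4 AU; the same sample in a 1 cm cell would nominally read 71 AU, which no detector can resolve.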

Two more cheap inline signals carry the same information from a different angle. Refractive index rises almost linearly with dissolved protein and is a robust, drift-resistant secondary; Raman spectroscopy carries protein concentration and excipient identity in one spectrum, which is why it has become the research workhorse for UF/DF monitoring of multiple attributes at once. The learning task is to fuse these into a single calibrated concentration. And here is the crucial methodological point this chapter wants to make loudly: for concentration, the right model is a small linear one, not a deep network. Beer-Lambert is genuinely linear; refractive index is genuinely linear; the physics already tells you the functional form. A PLS or ordinary-least-squares calibration on a handful of grounding samples will match or beat a neural network here, will extrapolate far more safely, and — decisively for GMP — is trivially explainable to a reviewer. Reaching for a deep net on a problem the physics has already linearized is the cardinal small-data sin this whole book warns against. (Deep learning earns its place where the relationship is genuinely nonlinear and the data is rich — high-concentration viscosity, charge-variant pooling — not on a Beer-Lambert line.)

The excipient/buffer state is the harder of the two, because conductivity and refractive index respond to all the ions, not just the old buffer you are trying to remove. A diafiltration soft sensor for the residual old-buffer fraction is therefore a model that disentangles the formulation buffer being added from the process buffer being removed — which is exactly where Raman's chemical specificity (it can tell two buffer species apart) beats a bulk property like conductivity, and where a multivariate model earns its keep.
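The disentangling step can be sketched as a classical least-squares unmixing: treat the measured spectrum as a weighted sum of two pure-component reference spectra and solve for the weights. Everything in the sketch is synthetic and hypothetical (the band positions, widths, noise level, and function names are placeholders, not real buffer chemistry), and a deployed model would more likely be a PLS calibration on real spectra.

```python
import numpy as np


def unmix_two_buffers(spectrum, old_ref, new_ref):
    """Least-squares fit: spectrum ~ a * old_ref + b * new_ref; returns (a, b)."""
    basis = np.column_stack([old_ref, new_ref])
    coeffs, *_ = np.linalg.lstsq(basis, spectrum, rcond=None)
    return float(coeffs[0]), float(coeffs[1])


def gaussian_band(axis, center, width):
    """Synthetic single-band reference spectrum."""
    return np.exp(-0.5 * ((axis - center) / width) ** 2)


shift = np.linspace(800.0, 1200.0, 201)      # Raman shift axis in cm^-1 (synthetic)
old_ref = gaussian_band(shift, 930.0, 8.0)   # hypothetical process-buffer band
new_ref = gaussian_band(shift, 1050.0, 8.0)  # hypothetical formulation-buffer band

# Simulate a late-diafiltration spectrum: 8% old buffer left, plus sensor noise.
rng = np.random.default_rng(0)
measured = 0.08 * old_ref + 0.92 * new_ref + rng.normal(0.0, 0.002, shift.size)

old_frac, new_frac = unmix_two_buffers(measured, old_ref, new_ref)
print(f"old buffer: {old_frac:.3f}, new buffer: {new_frac:.3f}")
```

A single conductivity reading cannot make this separation, because both weights contribute to one bulk number; the spectrum supplies two (or more) independent equations.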

The diafiltration endpoint: stop at the right diavolume, not a fixed number

A naive UF/DF recipe diafilters a fixed number of diavolumes — "always run 7 DV" — chosen with a generous safety margin so the wash-out is guaranteed even on a bad day. That works, but it is wasteful: every extra diavolume is fresh buffer, processing time, and another window for the membrane to foul or the protein to aggregate at the gel layer. A learning plant instead predicts the endpoint: it watches the residual old-buffer signal decay and stops as soon as the model says the residual is reliably under spec, with a confidence margin.

The endpoint problem is a soft sensor plus a threshold crossing. The features are the inline conductivity and Raman trajectory over diavolumes; the modelled quantity is the residual old-buffer fraction; the endpoint is the first diavolume at which the predicted residual — not a single noisy probe reading — falls under the wash-out specification. Because the underlying decay is exponential, even a simple model fit to the early part of the curve extrapolates the crossing point well, so the endpoint can be called a fraction of a diavolume in advance. The pay-off is concrete: stopping at a model-predicted 3.0 diavolumes instead of a recipe-mandated 7 is more than a factor-of-two reduction in diafiltration buffer and time for that batch, with the residual still demonstrably under spec.
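The extrapolation described above can be sketched as a log-linear fit. The example assumes the soft sensor already supplies a residual old-buffer fraction at each diavolume; the function name, the 1% multiplicative noise, and the choice to fit only the first 1.5 diavolumes are illustrative assumptions. Fitting the decay rate instead of fixing it at 1 lets the sketch absorb a non-ideal wash-out.

```python
import numpy as np

WASHOUT_SPEC = 0.05  # residual old-buffer fraction, as in the chapter's module


def predict_endpoint_dv(dv, residual, spec=WASHOUT_SPEC):
    """Fit ln(residual) = intercept - k * DV and solve for the spec crossing."""
    slope, intercept = np.polyfit(dv, np.log(residual), deg=1)
    k = -slope  # effective wash-out rate; 1.0 in the ideal well-mixed case
    if k <= 0:
        raise ValueError("residual is not decaying; cannot extrapolate an endpoint")
    return float((intercept - np.log(spec)) / k)


# Synthetic early trajectory: only the first 1.5 diavolumes have been observed.
rng = np.random.default_rng(1)
dv_seen = np.linspace(0.0, 1.5, 30)
residual_seen = np.exp(-dv_seen) * (1.0 + rng.normal(0.0, 0.01, dv_seen.size))

predicted_dv = predict_endpoint_dv(dv_seen, residual_seen)
print(f"predicted endpoint: {predicted_dv:.2f} diavolumes")
```

The guard on a non-positive rate matters operationally: a residual that is flat or rising means the exchange is not working, and that case belongs to the excursion monitor, not to the endpoint predictor.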

This is also where UF/DF connects to the AI-enhanced continued-process-verification (CPV) work the field is publishing: treating each UF/DF run's trajectory (flux, transmembrane pressure, concentration, conductivity) as a multivariate object and monitoring it batch-to-batch the way MSPC monitors a process, so a run that is drifting away from the golden-batch envelope is flagged early [2] (pilot). The endpoint model and the CPV monitor are two faces of the same trajectory model: one calls the stop, the other says whether the trajectory getting there looked normal.

Excursion detection: when the membrane fights back

UF/DF looks gentle but has a signature failure mode the soft sensors must guard against: concentration polarization and the gel layer. As water is pulled through the membrane, protein piles up against its surface faster than it can diffuse back into the bulk, forming a concentrated, viscous boundary layer. Past a critical flux this gel layer chokes the membrane — flux collapses, transmembrane pressure climbs, and in the worst case protein at the wall denatures or aggregates, quietly seeding the high-molecular-weight species that the SEC release assay will later catch. The excursion is invisible to a single concentration reading; it shows up as a relationship going wrong — flux falling faster than the concentration ramp says it should, or pressure rising without a corresponding gain in concentration.

That makes excursion detection a natural fit for the same residual-against-physics idea used elsewhere in this book. You have a mechanistic expectation — flux versus transmembrane pressure, concentration versus VRF — and you watch the residual between what the physics predicts and what the skid is actually doing. A growing residual is a fault signature: gel-layer formation, a fouling membrane, a failing pump, or a temperature excursion changing viscosity. This is structurally identical to the chromatography trajectory monitoring of the previous chapter — a multivariate trajectory, a learned or physics-based normal envelope, and an alarm on departure — applied to a TFF skid instead of a column. It is also squarely in the human-in-the-loop, advisory category the regulators endorse: the model flags the excursion and a human decides whether to intervene, hold, or investigate.
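A minimal sketch of the residual-against-physics check follows. For the mechanistic expectation it uses the textbook gel-polarization estimate of limiting flux, J = k · ln(c_gel / c_bulk), which is an assumption swapped in here for illustration; the mass-transfer coefficient, gel concentration, tolerance, and run length are hypothetical values that a real skid would have to be characterized for.

```python
import numpy as np


def expected_flux(c_bulk, k_mass_transfer=30.0, c_gel=250.0):
    """Gel-polarization estimate of limiting flux: J = k * ln(c_gel / c_bulk).

    k_mass_transfer (LMH) and c_gel (g/L) are hypothetical placeholder values.
    """
    return k_mass_transfer * np.log(c_gel / np.asarray(c_bulk, dtype=float))


def flag_excursion(c_bulk, flux_measured, tol_frac=0.15, run_length=3):
    """Return the index where flux first sits more than tol_frac below the
    expected value for run_length consecutive readings, or None otherwise."""
    expected = expected_flux(c_bulk)
    shortfall = (expected - np.asarray(flux_measured, dtype=float)) / expected
    streak = 0
    for i, is_low in enumerate(shortfall > tol_frac):
        streak = streak + 1 if is_low else 0
        if streak >= run_length:
            return i - run_length + 1  # start of the flagged run
    return None


conc = np.linspace(22.58, 50.0, 20)  # UF ramp from the eluate to the DS target, g/L
normal_flux = expected_flux(conc)
faulty_flux = normal_flux.copy()
faulty_flux[12:] *= 0.7              # simulated 30% flux collapse

print("normal run:", flag_excursion(conc, normal_flux))
print("faulty run:", flag_excursion(conc, faulty_flux))
```

Requiring several consecutive low readings keeps a single noisy flux sample from raising an alarm, and the flag is advisory: it marks where a human should look, not what the skid should do.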

[Figure: Hero diagram of the UF/DF learning problem, drawn as two coupled soft sensors feeding an endpoint and an excursion monitor. Left: a tangential-flow-filtration skid with a membrane, a feed pump, and inline probes for variable-pathlength UV280, refractive index, Raman, and conductivity. Center top: the ultrafiltration concentration ramp, where protein concentration climbs with the volume-reduction factor from the real 22.58 g/L Protein A eluate toward a 50 g/L drug-substance target, read by a linear UV plus refractive-index soft sensor. Center bottom: the diafiltration wash-out, where the residual old-buffer fraction decays exponentially in diavolumes, read by a conductivity-plus-Raman excipient soft sensor whose predicted crossing of the wash-out spec defines the endpoint at about three diavolumes. An excursion monitor watches flux versus transmembrane pressure for the gel-layer signature. All three feed the drug-substance node DS-001, which derives from PApool-001 and carries the released CQAs monomer 98.611% and HCP 28.203 ng/mg. A banner notes that the offline concentration and excipient references are the slowest in the train.]

Figure caption: The last downstream step, learned: a linear concentration soft sensor reads the ultrafiltration ramp, an excipient soft sensor reads the diafiltration wash-out and predicts the endpoint, and an excursion monitor watches the flux-pressure relationship for the gel layer. Together they produce DS-001, the drug-substance lot that finally carries the release CQAs of the running example. Original diagram by the authors, created with AI assistance.

A runnable model: ufdf_endpoint.py

The example module examples/platform/ml/ufdf_endpoint.py builds both soft sensors and the endpoint predictor, grounded in the running example's real numbers. It starts the concentration ramp from the actual Protein A eluate titer of BATCH-2026-001 — 22.58 g/L, read straight from protein_a_summary.csv — and ends at the released drug-substance CQAs read from hplc_results.csv (monomer 98.611%, HCP 28.203 ng/mg). The UF/DF step itself is not in the simulator, so the concentration ramp and the diafiltration wash-out curve are surrogate physics — Beer-Lambert UV, a membrane mass balance, and the textbook exponential decay — clearly labeled illustrative. The soft-sensor method is real; only the trajectory it is fit on is synthetic.

# examples/platform/ml/ufdf_endpoint.py (excerpt)
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

DS_TARGET_G_L = 50.0   # high-concentration mAb DS target (illustrative)
EXCIPIENT_SPEC = 0.05  # residual old-buffer fraction to clear by diafiltration


def simulate_ufdf(c0: float, n: int = 400) -> pd.DataFrame:
    """Surrogate UF concentration ramp + DF wash-out (illustrative physics)."""
    # concentration (UF) phase: c = c0 * VRF as volume is reduced
    vrf = np.linspace(1.0, DS_TARGET_G_L / c0, n // 2)
    conc = c0 * vrf
    # diafiltration (DF) phase: residual old buffer decays as exp(-DV)
    dv = np.linspace(0.0, 8.0, n - n // 2)
    excip_df = np.exp(-dv)
    df = pd.DataFrame({...})  # phase, diavolume, protein_g_L, excipient_frac
    # `noise` below is the per-signal sensor-noise term; its definition is elided here
    eps = 1.42  # mAb A280 extinction (L/g/cm)
    df["uv280_AU_per_cm"] = eps * df.protein_g_L + noise  # variable-pathlength UV
    df["refractive_index"] = 1.3330 + 1.8e-4 * df.protein_g_L + noise
    df["conductivity_mS_cm"] = 2.5 + 13.0 * df.excipient_frac + noise
    return df


def fit_concentration_softsensor(df):
    """Linear is right here: Beer-Lambert UV + refractive index -> g/L."""
    X = df[["uv280_AU_per_cm", "refractive_index"]].to_numpy()
    reg = LinearRegression().fit(X, df["protein_g_L"].to_numpy())
    ...


def predict_df_endpoint(df):
    """Endpoint = first diavolume where the PREDICTED excipient fraction <= spec."""
    dfp = df[df.phase == "DF"]
    # excipient soft sensor: conductivity -> residual old-buffer fraction
    reg = LinearRegression().fit(dfp[["conductivity_mS_cm"]], dfp["excipient_frac"])
    under = dfp[reg.predict(...) <= EXCIPIENT_SPEC]
    # rows are ordered by diavolume, so the first row under spec is the endpoint
    return float(under.diavolume.iloc[0])

Running python platform/ml/ufdf_endpoint.py prints the block below. The starting eluate (22.58 g/L) and the DS-001 release CQAs are real committed dataset values; the soft-sensor R² and the endpoint diavolume are computed on the illustrative surrogate trajectory:

UF/DF starts from the real Protein A eluate: 22.58 g/L -> DS target 50.0 g/L (illustrative)
concentration soft sensor (UV280 + RI -> g/L): R2=0.9997 RMSE=0.1465 g/L, final 49.89 g/L # illustrative
diafiltration endpoint: excipient<= 0.05 reached at 3.02 diavolumes (excipient soft sensor R2=0.9998) # illustrative
DS-001 release CQAs (real): monomer 98.611% HCP 28.203 ng/mg all_pass=True
ASSERT ok: inline UV+RI recover protein concentration (illustrative).

Read the output as the chapter's argument made executable. The concentration soft sensor recovers the protein concentration almost perfectly (R² 0.9997) because the relationship is linear — which is the point: a linear model on a Beer-Lambert problem is not a weakness, it is the correct, explainable, extrapolation-safe choice. The diafiltration endpoint lands at 3.02 diavolumes — far short of a recipe-mandated 7 — because the predicted residual old-buffer fraction has already crossed the 5% wash-out spec, saving more than half the buffer and time the conservative recipe would have spent. And the DS-001 line closes the genealogy: the drug substance carries the real release CQAs, monomer 98.611% and HCP 28.203 ng/mg, all of which pass.

Anatomy of one UF/DF endpoint record

A UF/DF batch does not end with a bare "stopped at 3 DV." Like every artifact in this series, the endpoint is a structured record that ties the stop decision to the trajectory that justified it, the soft-sensor predictions that called it, the specifications it was checked against, and the drug-substance lot it produces. Dissect it the way a manufacturing-science reviewer would before disposition.

[Figure: Anatomy identity card of one UF/DF endpoint record for the run that produces DS-001. Header (indigo): the model ufdf_endpoint v1 and the lot DS-001 it produces, derivedFrom PApool-001. Input block: the inline trajectory features variable-pathlength UV280, refractive index, Raman, conductivity, flux, and transmembrane pressure, each tagged inline. Core block (green): the final protein concentration 49.89 g/L against a 50 g/L target, the predicted diafiltration endpoint 3.02 diavolumes, and the residual old-buffer fraction under the 0.05 wash-out spec, all marked illustrative. Constraints row: the concentration tolerance and the excipient wash-out specification that the endpoint must satisfy. Excursion row: a flux-versus-transmembrane-pressure residual flag for the gel-layer signature, normal on this run. CQA-handoff row: the release results DS-001 carries, monomer 98.611% and HCP 28.203 ng/mg, both real and both pass. Relationships panel (violet): the record is derivedFrom PApool-001, produces DS-001, feeds formulation and fill-finish, is reconciledWith the offline A280 and osmolality references, and retrains when the residual against those references drifts. A note states that the concentration and excipient references are the slowest in the train.]

Figure caption: One UF/DF endpoint, fully unpacked: the inline trajectory that fed it, the concentration and excipient soft-sensor predictions that called the stop, the specifications and excursion check it satisfied, the real release CQAs the resulting DS-001 lot carries, and the lineage tying it to PApool-001 and forward to fill-finish, with the honest note that the references that grade it are the slowest in the whole train. Original diagram by the authors, created with AI assistance.

Read top to bottom and the chapter is laid out as fields. The input block is the inline trajectory: the variable-pathlength UV280, refractive index, Raman, and conductivity that feed the soft sensors, plus the flux and transmembrane pressure that feed the excursion monitor — every one of them an inline signal, which is the whole reason a soft sensor is possible here. The green core is the decision: the final concentration against target, the predicted endpoint diavolume, and the residual excipient fraction under spec. The constraints row writes down what the endpoint must satisfy — the concentration tolerance and the wash-out specification — so a reviewer can see why the model stopped where it did. The excursion row carries the flux-pressure residual flag, the gel-layer early warning. The CQA-handoff row is what makes this the release-defining node: the real monomer 98.611% and HCP 28.203 ng/mg results that DS-001 carries forward. The violet relationships panel records lineage: this record derivedFrom PApool-001, produces DS-001, feeds formulation and fill-finish, reconciledWith the offline A280 and osmolality references, and retrains_when those references drift. The record is the DS-001 node modeled in Book 4's ontology — here carrying not just lineage but the soft-sensor predictions, the endpoint, and the CQAs that make it the drug substance.
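The fields walked through above can be sketched as a small typed record. The class and field names below are hypothetical (the chapter does not publish a schema); the values are the ones quoted in this chapter, with the soft-sensor outputs illustrative and the release CQAs real.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class UfdfEndpointRecord:
    """Hypothetical shape for one UF/DF endpoint record."""
    model: str
    produces: str
    derived_from: str
    inline_inputs: tuple
    final_conc_g_l: float      # illustrative soft-sensor output
    target_conc_g_l: float
    endpoint_dv: float         # illustrative predicted endpoint
    washout_spec: float
    excursion_flag: str
    release_cqas: dict = field(default_factory=dict)  # real dataset values
    feeds: tuple = ()
    reconciled_with: tuple = ()


record = UfdfEndpointRecord(
    model="ufdf_endpoint v1",
    produces="DS-001",
    derived_from="PApool-001",
    inline_inputs=("uv280", "refractive_index", "raman",
                   "conductivity", "flux", "tmp"),
    final_conc_g_l=49.89,
    target_conc_g_l=50.0,
    endpoint_dv=3.02,
    washout_spec=0.05,
    excursion_flag="normal",
    release_cqas={"monomer_pct": 98.611, "hcp_ng_per_mg": 28.203},
    feeds=("formulation", "fill-finish"),
    reconciled_with=("offline A280", "osmolality"),
)

print(asdict(record)["release_cqas"])
```

Freezing the dataclass mirrors the intent of the record: once the stop decision is written down, it is evidence for disposition and should not be mutated in place.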

The unsolved part: high concentration breaks the line, and the reference is the slowest of all

Be honest about why UF/DF soft sensing is harder than the clean R² above suggests. The first difficulty is that the linear assumptions degrade exactly where the drug substance lives. Beer-Lambert is linear at moderate concentration, but at the tens-of-grams-per-litre of a high-concentration mAb the optical, refractive, and especially the viscosity behaviour all go nonlinear — the gel layer thickens, diffusion slows, and the relationship between the inline signal and the true concentration bends. The variable-pathlength trick keeps absorbance readable, but the underlying chemistry near saturation is no longer the tidy line the calibration was fit on. This is the regime where, ironically, a richer model (or a hybrid that bolts a learned correction onto the Beer-Lambert backbone, exactly as the hybrid titer model does) starts to earn its keep — but it is also the regime where data is scarcest, because high-concentration runs are expensive and few. The honest position is that concentration soft sensing is solved in the linear middle and open at the high-concentration edge that modern formulations push toward.
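The hybrid idea can be sketched on synthetic data: keep the linear Beer-Lambert backbone, then fit a small correction to whatever the backbone gets wrong. The curvature of the synthetic signal, the 30 g/L linear limit, and the quadratic form of the correction are all assumptions made for illustration, and the correction is fit and scored on the same points, which a real calibration would not be allowed to do.

```python
import numpy as np

rng = np.random.default_rng(2)
conc = np.linspace(5.0, 150.0, 60)  # true concentration in g/L (synthetic)

# Synthetic inline signal: linear at low concentration, bending at the high end.
signal = 1.42 * conc * (1.0 - 0.0012 * conc) + rng.normal(0.0, 0.05, conc.size)

# Backbone: a line through the origin, fit only where linearity holds (< 30 g/L).
low = conc < 30.0
slope = np.sum(signal[low] * conc[low]) / np.sum(conc[low] ** 2)
backbone = signal / slope  # naive linear concentration estimate

# Learned correction: quadratic in the backbone estimate, fit to its residual.
corr_coeffs = np.polyfit(backbone, conc - backbone, deg=2)
hybrid = backbone + np.polyval(corr_coeffs, backbone)

rmse_backbone = float(np.sqrt(np.mean((conc - backbone) ** 2)))
rmse_hybrid = float(np.sqrt(np.mean((conc - hybrid) ** 2)))
print(f"backbone RMSE: {rmse_backbone:.2f} g/L, hybrid RMSE: {rmse_hybrid:.2f} g/L")
```

The correction only has three parameters, which is the point: at the high-concentration edge there are few grounding samples, so any learned term bolted onto the physics has to be small enough to fit on them.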

The second difficulty is the slowest reference in the entire train. The titer soft sensor upstream is graded by an HPLC assay hours later; the harvest decision by a downstream HCP result days later. The UF/DF concentration and excipient soft sensors are graded by the release panel — the very CQAs that define the drug substance — which arrive after the full battery of SEC, CEX, HCP, DNA, endotoxin, and bioburden assays. By the time the residual that would expose a drifting UF/DF soft sensor is computable, the drug substance is already made and possibly already moving toward fill. This is the sparse-reference, slow-feedback regime at its absolute extreme: the prediction that matters most (am I at the right concentration and is the buffer washed out?) is graded by the slowest, most expensive ground truth in the process. The practical consequence is that UF/DF soft sensors are run with conservative margins and frequent re-grounding against at-line samples, not turned loose to autonomously call the endpoint on a critical lot — which is precisely the locked-model, human-confirms posture the regulators require.

The third difficulty is transfer and the excursion-label problem. A concentration and excipient calibration is bound to the specific membrane, the specific buffers, the specific protein, and the specific skid geometry it was built on; change the membrane lot or scale the skid and the model is, for regulatory purposes, a new procedure until re-qualified — the same transfer ceiling that haunts every spectroscopic model in this book. And excursions are rare by design (a good process almost never gel-layers), so the excursion detector is trained on an extremely imbalanced dataset with very few positive examples, which is why physics-residual anomaly detection — flagging departure from expected behaviour — is more robust here than a supervised fault classifier that has barely seen a fault.

What this chapter adds to the model suite

This chapter contributes examples/platform/ml/ufdf_endpoint.py to the Book 5 example suite: a standalone module that starts from the real Protein A eluate concentration of BATCH-2026-001, simulates the surrogate UF concentration ramp and DF wash-out, fits a linear concentration soft sensor (variable-pathlength UV280 plus refractive index), fits an excipient soft sensor (conductivity, illustrative), predicts the diafiltration endpoint as the first diavolume where the predicted residual crosses the wash-out spec, and reads out the real DS-001 release CQAs that the drug substance finally carries. It coordinates with — and deliberately does not duplicate — the upstream soft-sensor modules (soft_sensor_pls.py, soft_sensor_deep.py, titer/VCD from Raman) and the harvest module (harvest_endpoint.py): those predict what is in the tank and when to empty it; this predicts when the buffer exchange is done and what concentration the finished drug substance reached. The deliberate methodological choice — a linear model for a Beer-Lambert problem rather than a deep net — is itself the lesson the module is meant to teach.

Why it matters

UF/DF is the last chance to get the drug substance right, and it is the step where everything the previous chapters worked to achieve is finally cashed out into a number. A concentration soft sensor that holds the target without over- or under-shooting means a drug substance that meets its dose specification on the first try; a diafiltration endpoint model that stops at the right diavolume instead of a padded fixed count saves buffer, time, and a window for the membrane to foul, on every batch, forever; an excursion monitor that catches a gel layer before it denatures protein prevents the high-molecular-weight aggregates that would otherwise fail the SEC release assay — and the whole batch with it. None of these models autonomously moves a CQA; they make the most release-critical downstream step observable, defensible, and efficient, while a human and the offline panel keep final authority. Get UF/DF right and DS-001 is the clean, on-spec, on-concentration drug substance the running example needs; get it wrong and you can lose a fully purified, nearly finished batch in the last unit operation, after every expensive step that came before it has already succeeded.

In the real world

The strongest production-grade anchor here is inline variable-pathlength UV for protein concentration: Repligen's FlowVPX/FlowVPE is deployed on TFF skids across the industry to read concentration continuously, in-line, without dilution, as part of a broader downstream PAT and automation stack (a stack Repligen extended through its March 2025, roughly $70M acquisition of 908 Devices' bioprocessing portfolio) [1] (production, vendor-self-reported). This is the real, deployed backbone the concentration soft sensor of this chapter sits on, though it is important to be precise: the inline UV instrument is measurement; the model that turns it (plus refractive index and Raman) into a calibrated, multi-attribute, endpoint-calling soft sensor is the part still maturing.

The machine-learning extension of UF/DF is, today, mostly pilot and research, not productized GMP control. The clearest peer-reviewed signals are an Extended-Kalman-Filter-plus-Raman approach to monitoring ultra- and diafiltration in real time [3] (pilot, peer-reviewed-independent) and an AI-enhanced continued-process-verification method specifically for UF/DF that treats the run trajectory as a multivariate object to monitor batch-to-batch [2] (pilot). Inline refractive index for concentration and conductivity for buffer state are universal production practice; the learned fusion of those signals into a multi-attribute soft sensor with a predicted endpoint is the applied research-to-pilot frontier. The broader picture is the one this whole book keeps landing on, and that the ISPE Pharma 4.0 surveys confirm: ML in biomanufacturing clusters in monitoring and human-in-the-loop decision support, not autonomous control of CQAs, and a UF/DF endpoint or concentration soft sensor is squarely in the advisory, human-confirms category that the FDA's 2023 Artificial Intelligence in Drug Manufacturing discussion paper and the draft EU GMP Annex 22 both expect — a locked, validated model supporting a human decision, with a predetermined change-control plan, never silently moving a quality attribute on its own [4][5]. The honest summary: inline concentration measurement is real and deployed; learned multi-attribute soft sensing and endpoint prediction for UF/DF are credible, physics-anchored, peer-reviewed pilots; and none of it autonomously decides when a critical drug-substance batch is finished.

Key terms

  • Ultrafiltration/diafiltration (UF/DF) — the last downstream unit operation: concentrate the purified pool to its drug-substance target (ultrafiltration) and exchange its buffer into the formulation matrix (diafiltration), producing the drug substance.
  • Tangential-flow filtration (TFF) — the membrane geometry UF/DF runs on: feed flows across the membrane surface while permeate passes through, retaining the large antibody.
  • Volume-reduction factor (VRF) — the ratio of starting to retained volume; protein concentration rises in proportion to it during ultrafiltration.
  • Diavolume (DV) — one retentate-volume of fresh buffer exchanged during diafiltration; the residual old buffer decays roughly as exp(−DV).
  • Diafiltration endpoint — the diavolume at which the predicted residual old-buffer fraction falls under the wash-out specification; learned rather than fixed, to avoid over-diafiltering.
  • Variable-pathlength UV (VPE/VPX) — inline ultraviolet concentration measurement that shortens the optical path so high-concentration retentate stays in the linear Beer-Lambert range without dilution.
  • Concentration polarization / gel layer — protein piling up against the membrane faster than it diffuses back, choking flux and risking aggregation; the principal UF/DF excursion signature.
  • Excipient soft sensor — a model mapping inline conductivity and Raman to the residual old-buffer fraction, so the diafiltration endpoint can be called without an offline assay.
  • Drug substance (DS-001) — the bulk purified, concentrated, formulated antibody material; the lot where the running example's release CQAs (monomer 98.611%, HCP 28.203 ng/mg) are finally measured.
  • Beer-Lambert linearity — the genuinely linear absorbance-concentration relationship that makes a small linear model the correct, explainable choice for concentration soft sensing, rather than a deep net.

Where this leads

The drug substance is made: pure, at target concentration, in the right buffer — DS-001, deriving from PApool-001 and carrying the release CQAs. What remains is to turn that bulk material into doses a patient receives. The next chapter, Formulation and Fill-Finish: Computer Vision and the Lyophilizer, enters Part V, where the strongest production ML case in all of QC — deep-learning automated visual inspection of vials and syringes — meets the soft-sensing and control of the lyophilizer, as the drug substance becomes the drug product DP-001.