Harvest and Clarification: Predicting the Endpoint
📍 Where we are: Part III · Upstream, Learned — Chapter 12. The production bioreactor ran for two weeks and made the antibody; now the broth is full of cells we must get rid of. This chapter learns the moment to stop the run and the consequences of how it ended.
The previous chapter modeled the inside of the tank: soft sensors reading titer and viable cell density from Raman, closed-loop glucose control, a digital twin forecasting the trajectory. Everything pointed at one decision the models barely touched — when is this batch done? Harvest is the hinge between upstream and downstream, and it is one of the most underrated machine-learning targets in the whole process: a batch held a day too long keeps making antibody but also keeps dying, spilling intracellular junk — host-cell protein (HCP), DNA, lipases — that the entire downstream train then has to claw back out. Harvest a day early and you leave titer on the table. The endpoint is an optimization, not a calendar.
Then comes clarification: separating the antibody-bearing fluid from the cells and debris, usually a centrifuge followed by a train of depth filters, so the next step (Protein A capture) sees clean liquid instead of a slurry. How well clarification goes — how big a filter you need, whether it clogs mid-run — is almost entirely decided upstream, by the cell density and viability the harvest decision produced. This chapter learns three coupled things: the harvest endpoint, the turbidity of what you are about to clarify, and the filter you will need to clarify it. They are the same problem viewed from three angles, and the running example's clarified pool, CLAR-001, is what they collectively produce.
Think of straining a pot of stock. If you let it cook too long, the bones start to break down and cloud the broth — straining gets slow and the filter clogs. Stop too early and you lose flavor. A good cook learns the moment to pull it, and learns to size the strainer to how cloudy the batch looks. A harvest model is that learned instinct: it watches the cheap signals — cell count, how many cells are still alive, the spectrum — and predicts both the right moment to stop and how hard the straining will be, so the kitchen has the right-sized strainer ready before the pot comes off the heat.
What this chapter covers
- Framing the harvest decision as a learning problem — what the target really is (it is not "day 14"), and why upstream signals at the endpoint are the features.
- Turbidity soft sensing — predicting harvest-feed cloudiness (NTU) from VCD, viability, and spectra, the proxy for "how hard will this be to clarify."
- Filter sizing and Vmax prediction — turning a small-scale Vmax/throughput test into a model that sizes the depth-filter area for the full batch, and predicting clog risk.
- The upstream→clarification link — regressing clarification performance and downstream HCP burden on the VCD and viability the harvest produced; the CLAR-001 node.
- A runnable model module, examples/platform/ml/harvest_endpoint.py, and the anatomy of one harvest-decision record.
- The honest open problem: the endpoint label is censored and the clarification feedback is slow.
The harvest decision is an optimization, not a date
A naive plant harvests on a fixed day — "we always pull on day 14." A learning plant treats the endpoint as the argmax of an objective that trades the amount of product against its quality cost. Concretely, let day t of the run carry a viable cell density VCD(t), a viability via(t), and a titer titer(t). Harvesting later raises titer(t) (cells keep secreting) but lowers via(t), and falling viability is the single best leading indicator of the debris and HCP that clarification and capture must remove. The harvest objective is, loosely:
J(t) = recoverable_titer(t) − λ · downstream_burden(t)
where downstream_burden(t) rises as viability falls and lysed-cell content climbs, and λ encodes how expensive that burden is to remove. The endpoint is t* = argmax_t J(t) subject to hard constraints — a viability floor (many platforms will not harvest below roughly 70% viability), a turbidity ceiling the clarification train can handle, and the bioburden and scheduling realities of a GMP suite. Machine learning enters because almost every term in J(t) is a quantity we can only measure slowly or partially: titer(t) lags hours behind in the lab, via(t) comes from a twice-a-day offline count, and downstream_burden(t) is not measured at harvest at all — it only reveals itself days later in the capture pool's HCP result. So the harvest model is really a stack of soft sensors feeding a constrained decision.
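As a minimal sketch of that constrained decision, the snippet below picks the feasible sample with the highest J from a short trajectory, applying both a viability floor and a turbidity ceiling. The trajectory values, the turbidity ceiling, the weight LAMBDA, and the dead-fraction burden proxy are all made up for illustration; only the 70% floor comes from the text.

```python
import math

# Hypothetical trajectory, one tuple per offline sample (made-up values):
# (day, titer g/L, viability %, predicted feed turbidity NTU)
trajectory = [
    (11, 3.1, 92.0, 420.0),
    (12, 3.8, 86.0, 510.0),
    (13, 4.4, 78.0, 640.0),
    (14, 5.0, 71.0, 780.0),
    (15, 5.5, 64.0, 950.0),
]

VIABILITY_FLOOR = 70.0     # % viability, the floor quoted in the text
TURBIDITY_CEILING = 900.0  # NTU, hypothetical limit of the clarification train
LAMBDA = 0.02              # hypothetical cost weight, g/L per burden unit


def objective(titer, viability):
    """J = recoverable titer minus a weighted dead-fraction burden proxy."""
    burden = 100.0 - viability  # crude stand-in for downstream_burden(t)
    return titer - LAMBDA * burden


def pick_endpoint(samples):
    """Return (day, J) for the best feasible sample; day is None if none is feasible."""
    best_day, best_j = None, -math.inf
    for day, titer, viability, turbidity in samples:
        feasible = viability >= VIABILITY_FLOOR and turbidity <= TURBIDITY_CEILING
        if not feasible:
            continue  # hard constraints veto the sample regardless of its titer
        j = objective(titer, viability)
        if j > best_j:
            best_day, best_j = day, j
    return best_day, best_j


day, j = pick_endpoint(trajectory)
print(f"recommended harvest day: {day}, J = {j:.2f}")
```

With these numbers day 15 has the highest titer but fails both constraints, so the sketch recommends day 14. In practice each column would come from a soft sensor rather than a measured value, which is the sense in which the decision sits on top of a stack of predictions.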
In our running example, BATCH-2026-001's offline panel tells the story the model has to read. Over the last three days the viable cell density plateaus and then turns down while viability slides — by the final recorded sample (2026-01-18 18:00) the culture is at 19.66e6 cells/mL at 68.0% viability with a titer of 5.877 g/L [1]. The titer is still climbing; the viability is now below the 70% floor most platforms respect. That tension — more product, worse feed — is the harvest decision, and it is exactly what a model has to weigh.
Turbidity soft sensing: predicting how hard clarification will be
The cleanest single proxy for "how hard will this batch be to clarify" is the turbidity of the harvest feed, measured in nephelometric turbidity units (NTU). High turbidity means lots of cells and debris in suspension, which means a centrifuge working harder and depth filters that clog faster. Turbidity at harvest is driven by total cell mass and, critically, by the dead and lysing fraction — which is why it correlates so strongly with falling viability. If you can predict harvest-feed turbidity a day ahead from the signals you already have, you can pre-stage the right clarification setup instead of discovering the problem when the filter pressure spikes.
The soft-sensor framing is identical to the titer soft sensor but with a different target. The features are the cheap, available signals at and near the endpoint: VCD, viability, the integral of viable cell density over the run (the IVCD, a measure of total productive cell-hours), lactate and ammonia (metabolic stress markers that track lysis), and — where an in-line probe exists — the Raman or capacitance spectrum that already feeds the upstream soft sensors. The target is the offline turbidity of the harvest pool. Because the relationship is monotone but nonlinear (turbidity climbs sharply once viability drops past a knee), a gradient-boosted tree or a small hybrid model usually beats plain linear regression here, while still being defensible on the few dozen batches a process has.
It is worth being precise about what "turbidity" buys you that titer does not. Titer answers how much product; turbidity answers how dirty the feed. The harvest decision needs both, because the optimum is where the marginal product gained by waiting is no longer worth the marginal dirtiness incurred — and dirtiness is what the entire downstream train pays for. A turbidity soft sensor turns that abstract trade-off into a number the day before you have to act.
Filter sizing and Vmax: turning a 50 mL test into a 2000 L decision
Once you have decided to harvest, you must clarify, and the practical question is brutally concrete: how much filter area do I need so the depth filters do not clog before the batch is through? Buy too little area and the filter blinds mid-run, pressure climbs, and you lose product or stop the line; buy too much and you have wasted single-use consumables that cost real money at scale. This is the classic filter sizing problem, and it has a well-established small-scale ritual that machine learning extends rather than replaces.
The ritual is the Vmax test (and its cousin, the constant-flow Pmax test). You run a small disc of the actual depth-filter media, a test at the scale of tens of milliliters, against a sample of the real harvest feed at constant pressure and record the cumulative volume filtered versus time. Under the classical gradual pore-blocking model, the data linearize: plotting t/V against t gives a straight line whose slope is 1/Vmax, where Vmax is the maximum volume the membrane can ever process per unit area before it fully blinds. You then size the full-scale filter so the batch volume stays comfortably under the area-scaled Vmax with a safety factor. It is elegant, cheap, and standard.
Where does learning come in? Two places. First, the Vmax test itself is a small extrapolation that the gradual-pore-blocking model can get wrong when fouling is not gradual — real harvest feeds often show a mix of pore constriction and cake buildup, and a model fit on the full t/V-versus-t curve (or a short physics-informed model that blends the blocking laws) extrapolates the clog point more reliably than the single-slope reading. Second, and more valuable, you can skip ahead of the bench test entirely: a model that predicts Vmax directly from the upstream state — the VCD, viability, and turbidity at harvest — lets you size the filter before the harvest feed even exists, the day the harvest model says you are about to pull the batch. The Vmax test stays as the confirmatory measurement; the model is what lets you order the right filter area in advance.
The math is worth stating because it clarifies what the model is actually predicting. The gradual-pore-blocking relationship is:
t / V(t) = (1 / Qi) + (t / Vmax)
where V(t) is cumulative filtrate volume, Qi is the initial flux, and Vmax is the asymptotic capacity. A linear fit of t/V against t recovers 1/Vmax as the slope. The learning task replaces the slope-reading with a regression Vmax ≈ f(VCD, viability, turbidity, …) trained across past batches and their bench Vmax results — so the bench test grounds the labels and the model generalizes them to the next batch's upstream conditions.
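The slope-reading and the sizing arithmetic can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the module's method: the bench data are synthetic, generated from the gradual-pore-blocking equation itself with hypothetical values of Vmax and Qi, and the function names are invented for this example.

```python
def fit_vmax(times_min, volumes_L_per_m2):
    """Least-squares line through (t, t/V); returns (Vmax, Qi).

    Gradual pore blocking: t/V = 1/Qi + t/Vmax, so the slope is 1/Vmax
    and the intercept is 1/Qi.
    """
    xs = list(times_min)
    ys = [t / v for t, v in zip(times_min, volumes_L_per_m2)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 1.0 / slope, 1.0 / intercept


def required_area_m2(batch_volume_L, vmax_L_per_m2, safety=1.5):
    """Filter area that keeps the batch under Vmax with a safety factor."""
    return safety * batch_volume_L / vmax_L_per_m2


# Synthetic bench run (hypothetical values); t = 0 is excluded because V(0) = 0.
TRUE_VMAX = 300.0  # L/m2
TRUE_QI = 20.0     # L/m2/min
times = [2.0, 4.0, 6.0, 8.0, 10.0, 15.0, 20.0]
volumes = [t / (1.0 / TRUE_QI + t / TRUE_VMAX) for t in times]

vmax, qi = fit_vmax(times, volumes)
area = required_area_m2(2000.0, vmax)
print(f"Vmax = {vmax:.1f} L/m2, Qi = {qi:.1f} L/m2/min, area = {area:.1f} m2")
```

Because the data here follow the model exactly, the fit recovers the values that generated them. Real harvest feeds rarely do, and the curvature of t/V against t is itself a useful diagnostic that fouling is not purely gradual.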
The upstream→clarification link: where clarification performance is really decided
The deepest idea in this chapter is that clarification performance is overwhelmingly a function of upstream conditions, not of the clarification equipment. The cell density and especially the viability at harvest set the debris load, the turbidity, the Vmax, and — the part that bites days later — the HCP burden the capture step inherits. This is why a chapter on the first downstream step belongs in the Upstream, Learned part: the lever is upstream, even though the consequence is downstream.
Concretely, you can regress each clarification and downstream-burden outcome on the harvest-state features:
- Clarified turbidity / centrate quality — how clean the centrifuge centrate is, as a function of feed VCD and viability.
- Depth-filter Vmax / capacity — area you will need, as above.
- Step yield / recovery — product lost to the filter cake and centrifuge underflow.
- Inherited HCP — the host-cell protein the capture pool starts with, which a low-viability harvest inflates because lysed cells dump their contents into the feed.
That last link is the one that closes the loop to the QC and release chapter. In our campaign the released-batch HCP values are mostly well inside the 100 ng/mg limit — BATCH-2026-001 finishes at 28.203 ng/mg — but the out-of-spec sibling BATCH-2026-004 fails HCP at 128.0 ng/mg [1]. HCP is a downstream-and-release attribute, but its origin is frequently an upstream one: an over-extended harvest of a low-viability culture floods the feed with host-cell protein that no single purification step fully recovers. A model that links harvest viability to inherited HCP is, in effect, an early-warning system for exactly the failure mode BATCH-2026-004 represents — which is why this is not an academic correlation but a deviation-prevention tool. (We are careful here: the dataset records BATCH-2026-004's HCP failure as the real OOS; we do not invent a causal harvest value for it, only the modeling principle that low-viability harvests raise inherited HCP risk.)
The honest framing is risk stratification, not control. You are not going to autonomously move a CQA with this model; you are going to flag, before clarification, that this particular batch's harvest profile looks like the batches that later struggled at HCP — and put a human and a tightened in-process check on it. That is exactly the human-in-the-loop posture the regulatory consensus endorses for ML touching quality decisions.
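The advisory posture can be made concrete with a small sketch. The rule below is a placeholder standing in for a fitted model: it maps harvest viability to a risk band and a recommended human action, and it never changes anything itself. The 5-point watch margin, the band names, and the action strings are hypothetical; only the 70% floor comes from this chapter.

```python
VIABILITY_FLOOR = 70.0  # % viability, the platform floor quoted in this chapter
WATCH_MARGIN = 5.0      # hypothetical: how close to the floor counts as "elevated"


def hcp_risk_band(harvest_viability_pct: float) -> dict:
    """Map harvest viability to an advisory inherited-HCP risk band.

    Placeholder rule, not a fitted model: a real version would be trained on
    past batches' harvest state and their later capture-pool HCP results.
    """
    if harvest_viability_pct < VIABILITY_FLOOR:
        band, action = "high", "human review and tightened in-process HCP check"
    elif harvest_viability_pct < VIABILITY_FLOOR + WATCH_MARGIN:
        band, action = "elevated", "tightened in-process HCP check"
    else:
        band, action = "routine", "standard in-process checks"
    # The function only recommends; it never changes a setpoint or a disposition.
    return {"band": band, "action": action, "advisory_only": True}


for viability in (76.8, 72.0, 68.0):
    print(viability, hcp_risk_band(viability))
```

The design point is the return value: a band and an action for a person to take, with the advisory flag written into the output so that downstream systems cannot mistake the result for a control signal.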
The first downstream step, learned: a harvest-endpoint optimization, a turbidity soft sensor, and a Vmax filter-sizing model are three views of one problem — predicting, from the upstream state at harvest, both when to stop and how hard the clarification will be — and together they produce the clarified pool CLAR-001 and an early HCP-risk flag for the batches that look like the OOS sibling.
Original diagram by the authors, created with AI assistance.
A runnable model: harvest_endpoint.py
The example module examples/platform/ml/harvest_endpoint.py builds the harvest-state feature table from the simulator's offline assays and trains two coupled models over the campaign: a turbidity / clog-load soft sensor and a Vmax filter-sizing regressor, plus the harvest-objective argmax that turns the trajectory into an endpoint. The listing below reads only the committed offline_assays.csv, which supplies the per-batch VCD/viability/titer trajectory, so it runs standalone with no services. The release-stage HCP in hplc_results.csv anchors the inherited-burden link discussed above but is not loaded by the code shown here.
# examples/platform/ml/harvest_endpoint.py
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

DATA = Path(__file__).resolve().parents[3] / "examples" / "datasets"
VIABILITY_FLOOR = 70.0  # platform harvest constraint, % viability
LAMBDA = 0.08           # quality-cost weight in the harvest objective (illustrative)

# NumPy 2.0 added np.trapezoid and deprecated np.trapz; use whichever is available
_trapezoid = np.trapezoid if hasattr(np, "trapezoid") else np.trapz


def harvest_features() -> pd.DataFrame:
    """One row per batch at its last recorded culture sample = the harvest state."""
    off = pd.read_csv(DATA / "offline_assays.csv", parse_dates=["sample_time"])
    last = off.sort_values("sample_time").groupby("batch_id").tail(1).set_index("batch_id")
    # IVCD: trapezoidal integral of viable cell density over the run (productive cell-hours)
    # sample_time as int64 nanoseconds / 3.6e12 -> hours
    ivcd = (off.sort_values("sample_time")
               .groupby("batch_id")
               .apply(lambda g: _trapezoid(g.VCD_e6_per_mL,
                                           g.sample_time.astype("int64") / 3.6e12)))
    feat = last[["VCD_e6_per_mL", "viability_pct", "lactate_g_L",
                 "ammonia_mM", "titer_g_L"]].copy()
    feat["IVCD_e6_h_per_mL"] = ivcd  # aligned on the batch_id index
    # surrogate harvest-feed turbidity (NTU): rises with cell mass and the dead fraction
    feat["turbidity_NTU"] = (feat.VCD_e6_per_mL * (100 - feat.viability_pct) * 0.9
                             + feat.VCD_e6_per_mL * 6.0)
    return feat


def harvest_objective(batch_id: str = "BATCH-2026-001") -> dict:
    """argmax over the trajectory of J(t) = titer(t) - LAMBDA * burden(t), with a viability floor."""
    off = pd.read_csv(DATA / "offline_assays.csv", parse_dates=["sample_time"])
    g = off[off.batch_id == batch_id].sort_values("sample_time").reset_index(drop=True)
    burden = g.VCD_e6_per_mL * (100 - g.viability_pct)  # lysed-cell load proxy
    # burden is rescaled to titer units so LAMBDA is a dimensionless weight
    J = g.titer_g_L - LAMBDA * (burden / burden.max() * g.titer_g_L.max())
    feasible = g.viability_pct >= VIABILITY_FLOOR
    J_feasible = J.where(feasible, other=-np.inf)  # infeasible samples can never win
    t_star = int(J_feasible.idxmax())
    return {"batch_id": batch_id,
            "endpoint_sample": g.sample_id[t_star],
            "endpoint_time": str(g.sample_time[t_star]),
            "endpoint_titer_g_L": round(float(g.titer_g_L[t_star]), 3),
            "endpoint_viability_pct": round(float(g.viability_pct[t_star]), 1),
            "last_titer_g_L": round(float(g.titer_g_L.iloc[-1]), 3),
            "last_viability_pct": round(float(g.viability_pct.iloc[-1]), 1)}


def fit_turbidity_softsensor(feat: pd.DataFrame) -> dict:
    X = feat[["VCD_e6_per_mL", "viability_pct", "lactate_g_L",
              "ammonia_mM", "IVCD_e6_h_per_mL"]].to_numpy()
    y = feat["turbidity_NTU"].to_numpy()
    # leave-one-batch-out: the only honest cross-validation in a 6-batch campaign
    preds = np.empty_like(y)
    for i in range(len(y)):
        m = (np.arange(len(y)) != i)  # boolean mask that holds out batch i
        gb = GradientBoostingRegressor(n_estimators=120, max_depth=2,
                                       learning_rate=0.08, random_state=2026)
        gb.fit(X[m], y[m])
        preds[i] = gb.predict(X[i:i + 1])[0]
    return {"n_batches": len(y), "loo_r2": round(float(r2_score(y, preds)), 3),
            "turbidity_range_NTU": [round(float(y.min()), 1), round(float(y.max()), 1)]}


def size_filter(feat: pd.DataFrame, batch_volume_L: float = 2000.0,
                safety: float = 1.5) -> pd.DataFrame:
    """Vmax (L/m2) regressed on harvest state -> required depth-filter area for the batch."""
    # bench Vmax labels fall as the dead fraction rises (illustrative kinetic surrogate)
    feat = feat.copy()
    feat["Vmax_L_per_m2"] = 320.0 - 1.6 * (feat.VCD_e6_per_mL * (100 - feat.viability_pct)) ** 0.5
    X = feat[["VCD_e6_per_mL", "viability_pct", "turbidity_NTU"]].to_numpy()
    reg = LinearRegression().fit(X, feat["Vmax_L_per_m2"].to_numpy())
    feat["Vmax_pred"] = reg.predict(X)
    feat["area_m2"] = safety * batch_volume_L / feat["Vmax_pred"]
    return feat[["VCD_e6_per_mL", "viability_pct", "Vmax_pred", "area_m2"]].round(2)


if __name__ == "__main__":
    feat = harvest_features()
    print("harvest objective (golden batch):", harvest_objective("BATCH-2026-001"))
    print("turbidity soft sensor:", fit_turbidity_softsensor(feat))
    print("filter sizing (2000 L, 1.5x safety):")
    print(size_filter(feat).to_string())
Running python examples/platform/ml/harvest_endpoint.py from the repository root prints a block like the following (the soft-sensor R² and the Vmax/area figures are illustrative: they exercise surrogate turbidity and Vmax relationships built on top of the real VCD/viability/titer trajectory and are not yet a verbatim committed run output):
harvest objective (golden batch): {'batch_id': 'BATCH-2026-001', 'endpoint_sample':
'BATCH-2026-001-OFF-026', 'endpoint_time': '2026-01-17 18:00:00+00:00',
'endpoint_titer_g_L': 4.221, 'endpoint_viability_pct': 76.8,
'last_titer_g_L': 5.877, 'last_viability_pct': 68.0}
turbidity soft sensor: {'n_batches': 6, 'loo_r2': 0.92, 'turbidity_range_NTU': [...]} # illustrative
filter sizing (2000 L, 1.5x safety):
VCD_e6_per_mL viability_pct Vmax_pred area_m2
BATCH-2026-001 19.66 68.0 254.7 11.78 # illustrative
BATCH-2026-004 ... ... ... ... # lower Vmax -> larger area
Read the harvest-objective line carefully, because it makes the chapter's central point concrete. The last recorded sample sits at 68.0% viability, below the 70% floor, so the constrained argmax cannot choose it; it backs off to the best-scoring feasible sample, at 76.8% viability and 4.221 g/L. A plant that "always pulls last" would harvest a richer but dirtier feed at 5.877 g/L; the constrained model gives up about 1.66 g/L of titer (roughly 28%) for a feed the downstream train can actually handle. That difference, multiplied across a campaign, is the value of learning the endpoint instead of reading it off a calendar.
Anatomy of one harvest-decision record
A harvest decision, like every artifact in this series, is not a bare timestamp — it is a structured record that ties the recommended endpoint to the state that justified it, the predictions that fed it, the constraints it respected, and the downstream consequence it forecasts. Dissect it the way a manufacturing-science reviewer would before signing off on a harvest.
One harvest decision, fully unpacked: the upstream state that fed it, the constrained endpoint it chose (backing off the sub-70-percent last sample), the turbidity and Vmax it predicted, the viability floor that vetoed the richer feed, the inherited-HCP risk it forecasts, and the lineage tying it to BATCH-2026-001 and the clarified pool CLAR-001 it produces — with the honest note that the true best endpoint can never be observed.
Original diagram by the authors, created with AI assistance.
Read top to bottom and the chapter is laid out as fields. The input block is the harvest state: VCD, viability, titer, IVCD, lactate, and ammonia, each tagged by how it arrived (the twice-a-day offline count versus the in-line spectrum). The green core is the recommendation — the endpoint time, the feasible titer it implies, and the predicted turbidity (NTU), Vmax (L/m²), and filter area (m²) that pre-stage clarification. The constraints row is what makes the record auditable: the 70% viability floor that vetoed the last sample is written down, not implied, so a reviewer can see why the model did not chase the higher titer. The forecast row carries the inherited-HCP risk band that links forward to the release HCP result and the OOS-sibling pattern. The violet relationships panel records lineage: this decision derivedFrom BATCH-2026-001, produces CLAR-001, feeds Protein A capture, reconciledWith the bench Vmax test and the offline turbidity that grade it, and retrains_when the residual against those references drifts. The record is the CLAR-001 node modeled in Book 4's ontology — here carrying not just lineage but the predictions and constraints that produced it.
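One way to picture the record as data is a small dataclass whose fields mirror the blocks just described. This is a sketch, not the Book 4 ontology schema: the class name and snake_case field names are assumptions made for this example, the values are the ones quoted in this chapter for BATCH-2026-001, and predictions the chapter does not state for the chosen endpoint are left as None rather than filled in.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class HarvestDecision:
    # input block: harvest state at the recommended endpoint
    batch_id: str
    endpoint_time: str
    titer_g_L: float
    viability_pct: float
    # recommendation: predictions that pre-stage clarification
    predicted_turbidity_NTU: Optional[float] = None
    predicted_vmax_L_per_m2: Optional[float] = None
    filter_area_m2: Optional[float] = None
    # constraints row: written down so a reviewer can see what was vetoed
    viability_floor_pct: float = 70.0
    vetoed_samples: List[str] = field(default_factory=list)
    # forecast row
    inherited_hcp_risk: str = "unassessed"
    # relationships panel (lineage)
    derived_from: str = ""
    produces: str = ""
    feeds: str = ""
    reconciled_with: List[str] = field(default_factory=list)
    retrains_when: str = ""
    advisory_only: bool = True

    def respects_floor(self) -> bool:
        """True when the chosen endpoint sits at or above the viability floor."""
        return self.viability_pct >= self.viability_floor_pct


record = HarvestDecision(
    batch_id="BATCH-2026-001",
    endpoint_time="2026-01-17 18:00:00+00:00",
    titer_g_L=4.221,
    viability_pct=76.8,
    vetoed_samples=["2026-01-18 18:00 (68.0% viability, below floor)"],
    derived_from="BATCH-2026-001",
    produces="CLAR-001",
    feeds="Protein A capture",
    reconciled_with=["bench Vmax test", "offline turbidity"],
    retrains_when="residual against references drifts",
)

print(record.respects_floor())
print(asdict(record)["produces"])
```

Keeping the vetoed sample and the floor inside the record, rather than in a log somewhere else, is what lets a reviewer audit the decision from the record alone.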
The unsolved part: the censored label and the slow feedback
Be honest about why harvest ML is harder than it looks. The first difficulty is that the true best endpoint is never observed. For any real batch you harvest at exactly one time and see exactly one outcome; you never get to see what would have happened had you waited a day or pulled a day early. The label t* is censored — counterfactual, not measured — so you cannot simply do supervised regression onto "the right day." Teams work around this with mechanistic forward models (simulate the trajectory under each candidate endpoint and optimize over the simulation), with surrogate labels (regress onto the downstream outcomes you can measure, like inherited HCP and step yield, and let the objective imply the endpoint), or with the kind of constrained argmax shown above. None of these is the same as observing ground truth, and all of them inherit the small-data ceiling: a process has a few dozen batches, each harvested once, so the model is learning the endpoint from a handful of single-point observations.
The second difficulty is feedback latency. The harvest decision's most important consequence — the inherited HCP, the realized step yield, the filter behavior at scale — is not known at harvest. It arrives days later, after capture, after the assays. So the residual that would tell you the harvest model is drifting is one of the slowest in the entire process, slower even than the titer soft sensor's hours-late reference. Between the harvest and the verdict, a harvest model that has begun to mislead looks exactly like one that is working. This is the sparse-reference, slow-feedback regime at its most extreme, and it is why harvest models in practice are advisory: they recommend, a human and the platform constraints decide, and the model is re-graded only when the downstream truth finally lands.
The third, quieter difficulty is transfer. A Vmax model and a turbidity soft sensor are bound to the specific cell line, media, depth-filter media, and centrifuge they were trained on, exactly as a Raman calibration is bound to its probe. Change the filter grade or scale from a 50 mL disc to a full lenticular stack and the model is, for regulatory purposes, a new procedure until re-qualified. The bench Vmax test never goes away precisely because it is the confirmatory measurement that re-grounds the model whenever the hardware moves.
What this chapter adds to the model suite
This chapter contributes examples/platform/ml/harvest_endpoint.py to the Book 5 example suite: a standalone module that builds the per-batch harvest-state feature table from offline_assays.csv, computes the constrained harvest-objective argmax over a batch's trajectory, fits a leave-one-batch-out turbidity / clog-load soft sensor, and regresses a depth-filter Vmax onto the harvest state to size the full-scale filter. It coordinates with, and deliberately does not duplicate, the upstream soft-sensor module (titer/VCD from Raman) and the SPC/MVDA reference scripts: those predict what is in the tank; this predicts when to empty it and how hard that will be. The surrogate turbidity and Vmax relationships are clearly labeled illustrative; the VCD, viability, and titer figures the module consumes are the real committed dataset values, as are the HCP figures quoted in the text.
Why it matters
Harvest is the single decision where upstream and downstream meet, and it is decided with the least information of any step — at the moment you must act, the most important consequences are still days away. Get it right and the entire downstream train sees a clean, predictable feed: the centrifuge runs smoothly, the depth filters do not blind, capture starts from low HCP, and the release assays pass with margin. Get it wrong — hold a dying culture too long — and you spend the rest of the process, and a chunk of your yield, clawing back the host-cell protein and debris a single over-extended harvest dumped into the feed. Learning the endpoint, the turbidity, and the filter size turns the most consequential blind decision in manufacturing into a defensible, advisory, evidence-backed one. It will not autonomously move a CQA; it will keep you from harvesting your way into the next OOS.
In the real world
The strongest production-grade anchor for learning at harvest is Amgen's deployment at Juncos, Puerto Rico, where SIMCA OPLS models predict harvest titer and other in-process attributes inside commercial GMP drug-substance manufacturing. Amgen engineers report that the harvest-titer model eliminated roughly six hours of harvest idle time and around ten hours of idle time between chromatography columns by letting operators act on a model prediction rather than wait for the lab — a concrete, deployed (production) case of an MVDA model at the upstream-downstream boundary [2]. The honest caveat travels with the headline: this is a first-party, self-reported account from Amgen engineers together with a Sartorius vendor case study (vendor-self-reported / self-authored evidence tier), not an independently audited result, and the specific hour-savings cannot be externally confirmed.
The turbidity-and-Vmax half of the chapter is, today, more pilot and engineering practice than productized ML. Depth-filter sizing via the Vmax/Pmax bench tests is universal production practice, and the gradual-pore-blocking math behind it is textbook filtration theory; the machine-learning extension — predicting Vmax directly from upstream VCD/viability/turbidity so the filter is sized before harvest — is an applied (pilot/research) idea that process-development groups use internally more than vendors sell. The broader regulatory and consensus picture frames where this sits: the ISPE Pharma 4.0 reality is that ML in biomanufacturing clusters in monitoring and human-in-the-loop decision support, not autonomous control of CQAs, and a harvest-endpoint or HCP-risk model is squarely in the advisory, human-decides category that the FDA's 2023 Artificial Intelligence in Drug Manufacturing discussion paper and the EU's draft Annex 22 both expect — a locked, validated model supporting a human decision, never silently moving a quality attribute on its own [3][4]. The honest summary: harvest-titer prediction is real and deployed at at least one major manufacturer; turbidity and Vmax learning are credible, physics-anchored applications still mostly inside process-development groups; and none of it autonomously decides when to harvest.
Key terms
- Harvest endpoint — the chosen time to stop the culture and begin clarification; an optimization trading recoverable titer against the downstream burden a dying culture creates, subject to a viability floor.
- Harvest objective J(t) — the (illustrative) function being maximized: recoverable titer minus a weighted downstream-burden penalty, evaluated only where the viability constraint holds.
- Viability floor — the hard constraint (commonly around 70%) below which a platform will not harvest; the constraint that vetoes the richest-but-dirtiest endpoint.
- Turbidity (NTU) — nephelometric turbidity of the harvest feed; the single best proxy for how hard clarification will be, driven by total cell mass and the dead/lysing fraction.
- Clarification — the first downstream step, separating antibody-bearing fluid from cells and debris (centrifuge plus depth filters), producing the clarified pool (CLAR-001).
- Depth filter — graded-porosity filter media that traps cells and debris; sized so it does not blind before the batch is through.
- Vmax — the maximum volume a filter membrane can process per unit area before fully blinding; read from a small-scale t/V-versus-t linearization or predicted from upstream state.
- Vmax / Pmax test — the small-scale bench ritual that measures filter capacity on a sample of real harvest feed; the confirmatory measurement that re-grounds any Vmax model.
- IVCD — the integral of viable cell density over the run; total productive cell-hours, a feature for both titer and clarification models.
- Inherited HCP — the host-cell protein the capture step starts with; inflated by a low-viability harvest, linking the upstream endpoint to a downstream release attribute.
- Censored label — the harvest endpoint truth that can never be observed because each batch is harvested exactly once; why harvest ML cannot be plain supervised regression.
Where this leads
Clarification handed downstream a clean, defined pool — CLAR-001, deriving from BATCH-2026-001 and feeding the first purification column. The next chapter, Capture Chromatography: Hybrid Models and Real-Time Pooling, enters the Protein A step, where the feed's HCP burden — set by the harvest decision we just made — meets hybrid mechanistic-plus-ML chromatography models and the real-time pooling decisions that turn a UV trace into a defined capture pool.