Polishing Chromatography: Trajectory Models and Resin Lifetime

📍 Where we are: Part IV · Downstream, Learned — Chapter 15. The last chapter cleared viruses by a validated log-reduction margin and handed us a pool that is safe but not yet pure enough to be drug substance. Now the product meets its final columns. Polishing is where the separation gets hard — where the impurities left to remove are versions of the antibody itself, differing by a single charge or a stuck-together dimer — and where the pooling decision stops trading product against junk and starts trading product against product.

The capture pool PApool-001 was concentrated and far cleaner than the harvest, and viral safety proved a clearance margin on top. But it still carries impurities that capture cannot touch: charge variants of mAb-A (acidic and basic species that differ from the main form by deamidation, C-terminal lysine, or sialylation), aggregates (high-molecular-weight, HMW, species — two or more antibodies stuck together), and fragments (low-molecular-weight, LMW). These are not foreign contaminants; they are the product, slightly wrong. Removing them is the job of one or two polishing columns — typically cation exchange (CEX), anion exchange (AEX), or a mixed-mode resin — run in bind-and-elute or flow-through mode. The polishing pool feeds UF/DF and becomes the drug substance DS-001.

Here the pooling decision becomes its sharpest. On a CEX gradient the acidic variants elute first, the main (target) species in the middle, and the basic variants last — so where you cut the peak directly sets the charge-variant composition of the product, which is a release CQA. Cut too wide and you keep main species but drag in acidic and basic tails; cut too narrow and you protect purity but throw away yield. This is the same locked-guard-band pattern as capture, but the attribute on the line is now CEX_main_pct — and in our running example that number must land between 60 and 80 percent, with the golden batch at 70.686.

The simple version

Imagine sorting a bag of nearly identical coins where a few are very slightly lighter (acidic) or heavier (basic) than the rest, and a few have fused together into doubles (aggregates). You pour them down a tilted chute and they arrive in order (light, normal, heavy) with the doubles lagging behind. Your job is to hold a bucket under the chute and catch only the stream of normal coins: start the bucket a moment too early and you scoop up light ones, stop too late and you catch heavy ones and doubles. A polishing column is that chute; the "pooling decision" is when to start and stop the bucket. And because the chute itself wears out, its slope flattening after thousands of uses until the coins stop separating cleanly, you also need to know when to replace the chute. This chapter learns both: when to catch, and when the chute is too worn to trust.

What this chapter covers

We start where the maturity actually is. The production-grade computational model of a polishing column is mechanistic, not machine-learned — the same general-rate-model lineage as capture, now applied to mixed-mode and ion-exchange polishing (Cytiva GoSilico, the open-source CADET) — and we attribute it as such. Then we place learning where it earns its keep on this step: trajectory / charge-variant pooling, where a learned-but-locked rule on the in-line gradient sets the cut points that govern CEX_main_pct and clip the HMW tail; separating HMW and LMW, where the SEC attributes SEC_HMW_pct and SEC_LMW_pct bound how aggressive the cut must be; and resin lifetime / column-integrity prediction, the slow problem of an asset wearing out, where ML and classical moment analysis decide the cycle-count question — when to repack. The runnable artifact, examples/platform/ml/resin_lifetime.py, models a locked charge-variant pooling window against the running example's real CEX and SEC release values and projects a governed repack cycle from a resin-health trend.

The polishing step, as a set of learnable decisions

A bind-and-elute CEX polishing cycle has the same skeleton as capture — equilibrate, load, wash, elute, strip, clean — but the elution is the whole story. Instead of one sharp product peak releasing on a pH drop, polishing typically runs a shallow salt or pH gradient that pulls the charge variants off the resin in charge order, smearing them across many column volumes. The acidic species, holding less positive charge, let go first; the main species in the middle; the basic species, clinging hardest to the negatively charged CEX resin, last. The chromatogram is therefore not a spike to catch but a trajectory to slice, and three decisions sit on that trajectory:

Where to cut the front (the lead cut). Start collecting too early and the acidic shoulder is in the pool, pushing CEX_acidic_pct up and CEX_main_pct down. Start later and the pool is purer in main species but you have discarded recoverable product.
Where to cut the back (the tail cut). Stop too late and the basic shoulder — and any aggregate riding the tail — enters the pool. Stop earlier and you protect CEX_basic_pct and SEC_HMW_pct at the cost of yield.
When the column is too old to separate cleanly. Every cut above assumes the resin still resolves the variants. As the bed ages over its validated cycle life, plate count falls, peaks broaden and tail, and the charge variants stop separating — until the same cut points that gave 70 percent main species start giving 64. The cycle-count decision — when to repack or replace — is a prediction problem in its own right.

Each is a place where a model reads a cheap in-line signal and produces a decision or a number, and each has a bounded, auditable consequence: you can grade the pool a rule chose against the release assay of what it actually collected. That auditability is why downstream ML — here as in capture — has moved into plants as monitoring and locked rules, never as autonomous control of the cut.
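
To make the lead and tail cuts concrete, here is a minimal sketch that treats each charge variant as a Gaussian band on the gradient and computes what a pair of cuts collects. Everything numeric in it is an illustrative assumption (the feed shares, the band centres and widths in column volumes, the two example windows); it is not the book's module and not a calibrated elution model.

```python
import numpy as np

# Illustrative assumptions, not values from the running example: each variant
# elutes as a Gaussian band; centres and widths are in column volumes (CV).
CV = np.linspace(0.0, 20.0, 2001)
BANDS = {                       # name: (feed share %, centre CV, sigma CV)
    "acidic": (22.0, 8.0, 1.2),
    "main": (68.0, 10.0, 1.2),
    "basic": (10.0, 12.0, 1.2),
}

def band(share, centre, sigma):
    """Concentration trace of one variant; the area under it equals `share`."""
    z = (CV - centre) / sigma
    return share * np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

def pool(lead_cv, tail_cv):
    """Collect between the lead and tail cuts; return (composition %, step yield)."""
    keep = (CV >= lead_cv) & (CV <= tail_cv)
    dcv = CV[1] - CV[0]
    collected = {name: float(band(*p)[keep].sum() * dcv) for name, p in BANDS.items()}
    total = sum(collected.values())
    fed = sum(p[0] for p in BANDS.values())
    composition = {name: 100.0 * mass / total for name, mass in collected.items()}
    return composition, total / fed

wide, wide_yield = pool(5.0, 15.0)      # generous cuts: high yield, tails included
narrow, narrow_yield = pool(9.0, 11.0)  # tight cuts: purer in main, lower yield
for label, comp, y in [("wide", wide, wide_yield), ("narrow", narrow, narrow_yield)]:
    print(f"{label:6s} main={comp['main']:.1f}% acidic={comp['acidic']:.1f}% "
          f"basic={comp['basic']:.1f}% yield={y:.2f}")
```

The two windows show the trade the text describes: tightening the cuts raises the main-species share of the pool and lowers the step yield. A real column adds band overlap that shifts with load, gradient slope, and resin age, which is what the mechanistic model is for.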

The mature tool is mechanistic — and it now reaches polishing

As with capture, the strongest computational model of a polishing column is a set of partial differential equations, not a fitted network. The general rate model plus a steric mass action (or, for mixed-mode, a multi-component) isotherm describes how each charge variant competes for binding sites as the gradient develops, and a calibrated model predicts the elution order, the peak overlap, and therefore the pool composition any pair of cut points would yield — across conditions it was never run at, with the extrapolation that comes from encoding real physics [1]. This is production technology in CMC process development: Cytiva's GoSilico (ChromX/DSPX) and the open-source CADET are the exemplars, and a peer-reviewed industrial case modeled a mixed-mode polishing step for an antibody end to end — simulation, optimization, and design-space definition — entirely mechanistically [2]. It must be attributed correctly: these are mechanistic models, not machine learning. Calling a calibrated general-rate-model twin "AI polishing" is the same category error this book refused for capture.
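
For orientation, the kinetic form of the steric mass action isotherm as it is commonly written in the chromatography-modeling literature is sketched below. The symbols are the conventional ones and are not defined elsewhere in this chapter: c_i and q_i are the mobile-phase and bound concentrations of variant i, c_s is the salt concentration, ν_i the characteristic charge, σ_i the steric shielding factor, Λ the ionic capacity of the resin, and k_a, k_d the adsorption and desorption rate constants.

```latex
\frac{\partial q_i}{\partial t}
  = k_{a,i}\, c_i\, \bar{q}_0^{\,\nu_i} \;-\; k_{d,i}\, q_i\, c_s^{\,\nu_i},
\qquad
\bar{q}_0 = \Lambda - \sum_{j} \left(\nu_j + \sigma_j\right) q_j
```

The competition the paragraph describes lives in the shared term q̄_0: every bound variant consumes available counter-ion sites, so each variant's binding rate depends on how much of all the others is already bound, and the salt term raised to ν_i is what makes a rising gradient release the variants in charge order.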

Evidence

Mechanistic chromatography modeling of polishing steps (general rate model + steric mass action / multi-component isotherms) is mechanistic, not ML, and is the most mature deployed computational tool here — Cytiva GoSilico (production, CMC process development) and open-source CADET, with a peer-reviewed mixed-mode antibody-polishing case [1][2] (peer-reviewed-independent and vendor documentation). Any vendor headline (yield uplift, lots saved) is vendor-self-reported and must carry that label; the modeling capability is established, the specific savings are not independently audited. The learned layer this chapter builds sits beside the mechanistic twin — it reads the trace, sets a locked cut, and watches the resin age — it does not replace the physics.

Trajectory modeling and charge-variant pooling

The pooling decision on a polishing gradient is, at its core, the same locked-guard-band rule as capture — collect while a monitored signal stays inside a band, divert otherwise — but the band now sits on a charge-ordered trajectory and trades two product-quality attributes against yield. Three things make it harder than the capture pool.

First, the signal that matters is not always the raw UV. UV280 tells you protein is eluting but not which charge variant. The richest in-line signal for charge-variant resolution is Raman (or, increasingly, multi-angle light scattering for aggregate content), and the most ambitious research in this space used a convolutional neural network on Raman spectra to make charge-variant pooling decisions on a CEX polishing step, reporting R² between 0.94 and 0.99 for the predicted charge-variant fractions [3]. That is a genuine deep-learning result on exactly this problem — and it is research, on one separation, not a routine deployment. (Note the contrast with the capture-step Raman work that predicted many quality attributes during Protein A: that one used K-nearest-neighbours, not deep learning, and must not be cited as a deep-learning case [4].)

Second, the rule is learned at design time, then locked. A model trained on historical polishing cycles — each with its CEX charge-variant and SEC aggregate release assays — can learn the lead and tail cut points that best trade CEX_main_pct against recovery while keeping SEC_HMW_pct under its ceiling. But that cut is then frozen and runs as a fixed rule. Letting a model re-draw the cut online, batch to batch, to chase a quality target is precisely the adaptive control of a CQA that EU draft Annex 22 draws its sharpest line against — it requires locked models and a predetermined change-control plan, not live self-modification.

Third, two specs are on the line at once. The cut must satisfy the charge-variant spec (CEX_main_pct in [60, 80]) and the size spec (SEC_HMW_pct under 3 percent, SEC_LMW_pct under 2 percent). Aggregates often ride the trailing edge of the peak, so the tail cut does double duty: it sets CEX_basic_pct and clips HMW. A pooling model that optimizes charge variants while ignoring the aggregate tail can pass CEX and fail SEC.
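
The "learned at design time, then locked" point can be made concrete with an immutable rule object. This is a minimal sketch under stated assumptions: the class name, the version strings, and the change-control identifiers are hypothetical, and the 0.18 and 0.88 cut fractions are the window values the running example uses.

```python
from dataclasses import dataclass, FrozenInstanceError, replace

@dataclass(frozen=True)
class LockedCutRule:
    """Hypothetical container for a design-time cut, frozen for production use."""
    version: str
    lead_cut: float          # fraction of the peak where collection starts
    tail_cut: float          # fraction of the peak where collection stops
    change_control_id: str   # governed record that approved this version

rule_v1 = LockedCutRule("1.0", lead_cut=0.18, tail_cut=0.88, change_control_id="CC-0001")

# Online code cannot re-draw the cut: assignment on a frozen dataclass raises.
try:
    rule_v1.tail_cut = 0.90
    mutated = True
except FrozenInstanceError:
    mutated = False

# A retrained cut is a NEW object with a new version and its own approval record.
rule_v2 = replace(rule_v1, version="1.1", tail_cut=0.86, change_control_id="CC-0002")
print(mutated, rule_v1.tail_cut, rule_v2.tail_cut)
```

Immutability in code is only a small part of a locked model, but it captures the shape of the requirement: the running rule cannot change itself, and any new cut arrives as a separately versioned, separately approved artifact.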

Grounding it in the running example's real numbers

The polishing pool's job is to take the feed charge-variant composition and enrich it toward the main species. In our running example the released values for the golden batch BATCH-2026-001 are real, read straight from examples/datasets/hplc_results.csv: CEX_main_pct = 70.686, CEX_acidic_pct = 21.551, CEX_basic_pct = 10.452, with SEC_HMW_pct = 1.287 and SEC_LMW_pct = 0.439. (As reported, the three CEX values sum to 102.689 rather than 100, so the module normalizes by their total when it computes pooled fractions.) Across the six campaign batches, CEX main runs 66.699 to 70.686 and HMW runs 1.086 to 1.719, comfortably inside both specs. The OOS sibling BATCH-2026-004 did fail, but on a different attribute (host-cell protein at 128.0 ng/mg against a 100.0 ceiling), not on charge variants or aggregates. That is a useful caution the module makes explicit: a polishing pool that is perfectly in spec on the attributes this step controls tells you nothing about an impurity an earlier step let through.
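
The caution about BATCH-2026-004 can be checked in a few lines. The sketch below uses only the release values and limits quoted in this chapter; the attribute key HCP_ng_per_mg is a hypothetical name for host-cell protein, the inclusive handling of spec boundaries is an assumption, and values the text does not quote (for example that batch's SEC_LMW_pct) are left out rather than guessed.

```python
# Release specs named in the text; each entry is (lower bound, upper bound).
SPECS = {
    "CEX_main_pct": (60.0, 80.0),
    "SEC_HMW_pct": (None, 3.0),
    "SEC_LMW_pct": (None, 2.0),
    "HCP_ng_per_mg": (None, 100.0),   # hypothetical key for host-cell protein
}
POLISHING_ATTRS = ("CEX_main_pct", "SEC_HMW_pct", "SEC_LMW_pct")

def failures(results, attrs=None):
    """Return the measured attributes that fall outside their spec."""
    failed = []
    for name in (attrs or results):
        if name not in results:
            continue                  # not quoted in the text: skip, do not guess
        low, high = SPECS[name]
        value = results[name]
        if (low is not None and value < low) or (high is not None and value > high):
            failed.append(name)
    return failed

golden = {"CEX_main_pct": 70.686, "SEC_HMW_pct": 1.287, "SEC_LMW_pct": 0.439}
oos = {"CEX_main_pct": 67.879, "SEC_HMW_pct": 1.280, "HCP_ng_per_mg": 128.0}

print("golden, all attributes:     ", failures(golden))
print("OOS, polishing attributes:  ", failures(oos, POLISHING_ATTRS))
print("OOS, all attributes:        ", failures(oos))
```

Scoped to the attributes polishing controls, the OOS batch looks clean; only the full release panel shows the host-cell protein failure.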

The module's pooling model treats the elution as a charge-ordered gradient (acidic → main → basic) and a locked centre window — keep fractions 0.18 to 0.88 of the peak — that enriches the main species by partly discarding the acidic lead and basic tail. The keep-fractions are the locked rule's effect on each variant; the batch's acidic/main/basic split is the real measured release value, so the pooled purity the window implies is grounded, not invented.

Separating HMW and LMW: the size dimension

Charge is one axis; size is the other, and the two are measured by different assays. Size-exclusion chromatography (SEC) is the release assay that reports SEC_HMW_pct (aggregates) and SEC_LMW_pct (fragments), and the polishing step is where aggregate content is most actively controlled — capture concentrates everything, including aggregates, and polishing is the chance to leave them behind. Where the charge variants spread along the gradient, aggregates tend to partition to the peak edges: HMW species, being larger and often stickier, frequently trail the main peak, so the tail cut that protects CEX_basic_pct also clips HMW. This coupling is why a one-dimensional "maximize main species" objective is wrong — the cut has to satisfy a vector of attributes.

A learned pooling model that respects this is a small multi-objective problem: choose lead and tail cuts to maximize step yield subject to 60 ≤ CEX_main_pct ≤ 80, SEC_HMW_pct under 3, and SEC_LMW_pct under 2. In the small-data regime of a real campaign (a handful to a few dozen cycles with full release assays) this is not a job for a data-hungry network; it is a constrained optimization over a few cut parameters, ideally informed by the mechanistic twin's prediction of where each species elutes. The honest division of labour mirrors the rest of downstream: physics predicts the elution profile, a learned/locked rule places the cuts, and the release assays grade the result.
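
As a sketch of how small this optimization is, the grid search below chooses lead and tail cuts that maximize yield subject to the three specs. The elution bands (feed masses, centres, and widths, with aggregates trailing the main peak) are illustrative assumptions standing in for a mechanistic model's prediction, and treating CEX and SEC percentages as simple mass ratios is a simplification.

```python
import math
import numpy as np

# name: (feed mass, centre CV, sigma CV); all illustrative assumptions
BANDS = {
    "acidic": (28.0, 8.0, 1.2),
    "main": (55.0, 10.0, 1.2),
    "basic": (12.0, 12.0, 1.2),
    "hmw": (5.0, 13.5, 1.0),    # aggregates trail the main peak
    "lmw": (1.0, 7.0, 1.0),
}
FEED_TOTAL = sum(m for m, _, _ in BANDS.values())

def _cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def evaluate(lead, tail):
    """Mass of each species collected between the cuts, and the attributes it implies."""
    mass = {n: m * (_cdf((tail - c) / s) - _cdf((lead - c) / s))
            for n, (m, c, s) in BANDS.items()}
    total = sum(mass.values())
    charge_total = mass["acidic"] + mass["main"] + mass["basic"]
    return {
        "lead": lead, "tail": tail,
        "yield": total / FEED_TOTAL,
        "CEX_main_pct": 100.0 * mass["main"] / charge_total,
        "SEC_HMW_pct": 100.0 * mass["hmw"] / total,
        "SEC_LMW_pct": 100.0 * mass["lmw"] / total,
    }

def feasible(r):
    """All three specs at once: charge window, aggregate ceiling, fragment ceiling."""
    return (60.0 <= r["CEX_main_pct"] <= 80.0
            and r["SEC_HMW_pct"] < 3.0
            and r["SEC_LMW_pct"] < 2.0)

candidates = [evaluate(float(a), float(b))
              for a in np.arange(5.0, 10.01, 0.25)
              for b in np.arange(10.5, 16.01, 0.25)]
best = max((r for r in candidates if feasible(r)), key=lambda r: r["yield"])
wide = evaluate(5.0, 16.0)      # collect essentially everything

print("collect-all feasible:", feasible(wide))
print({k: round(v, 2) for k, v in best.items()})
```

In this toy feed the collect-everything window fails both the charge and the aggregate spec, so the search has to give up yield to find a feasible pair of cuts. A few hundred candidate evaluations are enough, which is the sense in which this is not a job for a data-hungry network.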

Resin lifetime and the cycle-count decision

The slow, expensive, genuinely hard problem in polishing is time — the resin is not the same on cycle 200 as on cycle 1. A polishing resin is validated for a finite cycle life, and across that life it degrades in measurable ways: the plate count (column efficiency, N) falls, peak asymmetry (As, tailing) rises, back-pressure creeps up, and — the consequence that matters — charge-variant resolution degrades until the locked cut points that gave 70 percent main species start giving less, drifting the pool toward its spec edge. A pooling rule that is perfectly tuned today will slowly mistime itself as the bed ages. The decision this forces is the cycle-count decision: when to repack or replace the column, balancing the cost of an expensive resin against the risk of a drifting CQA.

Two families of method address this, and the honest reading is that they cooperate rather than compete.

Classical, production-grade: moment analysis and transition analysis. The most mature deployed approach is not ML at all. At commercial CDMO scale, Samsung Biologics uses moment analysis and direct transition analysis — computing plate height (HETP), transition width, and asymmetry from a pulse or step injection — for near-real-time column-integrity monitoring of large-scale GMP columns [5]. These are deterministic algorithms on the chromatographic profile, and they are production. They answer "is this column still packed and performing within its qualified envelope?" directly from physics, and they are the baseline any ML approach must beat or augment, not replace.
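
The moment-analysis quantities named above follow from textbook definitions, sketched here for a pulse injection. This is a minimal illustration on synthetic peaks, not any vendor's or manufacturer's implementation: the bi-Gaussian test peaks, the 20 cm bed height, and the choice of 10 percent peak height for the asymmetry factor are all assumptions.

```python
import numpy as np

def moments(t, c, column_length_cm):
    """Plate count N and HETP from the first two moments of a pulse response."""
    dt = t[1] - t[0]                              # uniform grid assumed
    m0 = c.sum() * dt                             # zeroth moment: peak area
    mu1 = (t * c).sum() * dt / m0                 # first moment: mean retention
    var = ((t - mu1) ** 2 * c).sum() * dt / m0    # second central moment
    n_plates = mu1**2 / var
    return n_plates, column_length_cm / n_plates

def _crossing(t, c, level, rising):
    """Time at which the trace crosses `level`, by linear interpolation."""
    above = np.where(c >= level)[0]
    i = above[0] if rising else above[-1]         # first / last sample at or above level
    j = i - 1 if rising else i + 1                # neighbouring sample below level
    return t[j] + (level - c[j]) * (t[i] - t[j]) / (c[i] - c[j])

def asymmetry(t, c, frac=0.10):
    """As = b / a: back and front half-widths at `frac` of peak height."""
    t_peak = t[np.argmax(c)]
    level = frac * c.max()
    a = t_peak - _crossing(t, c, level, rising=True)
    b = _crossing(t, c, level, rising=False) - t_peak
    return b / a

def pulse(t, centre, sigma_front, sigma_back):
    """Bi-Gaussian test peak: different widths either side of the apex."""
    sigma = np.where(t < centre, sigma_front, sigma_back)
    return np.exp(-0.5 * ((t - centre) / sigma) ** 2)

t = np.linspace(0.0, 20.0, 20001)
fresh = pulse(t, 10.0, 0.20, 0.20)                # symmetric: N = (10 / 0.2)^2 = 2500
aged = pulse(t, 10.0, 0.25, 0.45)                 # broadened and tailing

LENGTH_CM = 20.0                                  # hypothetical bed height
n_fresh, hetp_fresh = moments(t, fresh, LENGTH_CM)
n_aged, hetp_aged = moments(t, aged, LENGTH_CM)
as_fresh, as_aged = asymmetry(t, fresh), asymmetry(t, aged)
print(f"fresh: N={n_fresh:.0f} HETP={hetp_fresh:.4f} cm As={as_fresh:.2f}")
print(f"aged:  N={n_aged:.0f} HETP={hetp_aged:.4f} cm As={as_aged:.2f}")
```

The aged peak shows the signature the text describes: plate count falls, plate height rises, and the asymmetry factor moves away from 1. Nothing here is fitted or learned, which is why this family of methods is straightforward to qualify.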

Data-driven monitoring and projection. On top of (or alongside) moment analysis, ML tracks features of the chromatographic profile across cycles to detect aging before it shows up in yield or quality. A pilot study used on-line PAT with PCA and batch-level modeling to detect Protein A resin aging some 20 to 25 cycles before observable yield decline, with a proposed strategy to extend resin life — a modeled benefit, not a validated GMP outcome [6]. Model-based strategies for resin lifetime optimization and supervision, and feature-mining of chromatographic profiles to monitor column performance, fill out the literature [7][8]. The pattern is consistent: physics (moment analysis) measures current health; a fitted trend projects when health will cross an action limit; the repack itself happens under governed change control, never as a model silently extending its own validated cycle count.
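
The data-driven side can be sketched in the same spirit: fit a PCA model on early-cycle elution profiles and watch the squared prediction error (the part of each new profile the reference model cannot explain) as the column ages. This is an illustration of the general technique, not a reproduction of the cited pilot study; the synthetic profiles, the assumed broadening rate, the single retained component, and the empirical 99th-percentile limit are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 200)

def profile(cycle):
    """Synthetic elution peak whose width grows slowly with cycle (assumed aging)."""
    sigma = 1.0 + 0.004 * cycle
    clean = np.exp(-0.5 * ((t - 10.0) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return clean + rng.normal(0.0, 0.0005, t.size)

cycles = np.arange(1, 121)
X = np.vstack([profile(c) for c in cycles])

# Fit the reference model on the early cycles only.
n_ref, n_pc = 30, 1
mean = X[:n_ref].mean(axis=0)
_, _, vt = np.linalg.svd(X[:n_ref] - mean, full_matrices=False)
loadings = vt[:n_pc].T                           # shape (n_points, n_pc)

# Squared prediction error: what the reference model cannot reconstruct.
centred = X - mean
residual = centred - centred @ loadings @ loadings.T
spe = (residual**2).sum(axis=1)

limit = np.percentile(spe[:n_ref], 99)           # empirical limit from reference cycles
beyond = cycles[(spe > limit) & (cycles > n_ref)]
print("first cycle beyond the reference limit:", int(beyond[0]) if beyond.size else None)
print("mean SPE, reference vs last 10 cycles:", spe[:n_ref].mean(), spe[-10:].mean())
```

The monitor reacts to a change in peak shape, not to a drop in yield, which is how this class of method can flag aging before it is visible in the batch record. What it cannot do on its own is say which quality attribute will drift or when; that link still has to be validated.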

Polishing, learned: a charge-ordered elution trajectory with overlapping acidic, main, and basic variants and an aggregate tail; a learned-but-locked guard band picks the lead and tail cuts that govern CEX main, basic, and HMW at once; and a resin-health trend — anchored by production-grade moment analysis — projects the governed repack cycle. The machine-learning layer sits beside the mechanistic mixed-mode twin, not instead of it. Original diagram by the authors, created with AI assistance.

Building it: locked pooling plus a resin-lifetime projection

The runnable module frames two tasks on the running example's real release data. First, charge-variant pooling: model the elution as a charge-ordered gradient and apply a locked centre window, reporting the pooled CEX_main_pct the window implies against the batch's real feed composition. Second, resin lifetime: fit a resin-health index against cycle number and project the cycle at which it crosses an action limit — the governed repack recommendation — validated as an extrapolation (fit the first 60 percent of cycles, test the tail) rather than an interpolation, because a lifetime projection that has only ever been graded on cycles it was fit on is worthless.

# examples/platform/ml/resin_lifetime.py — locked charge-variant pooling + resin lifetime.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

CEX_MAIN_LOW, CEX_MAIN_HIGH = 60.0, 80.0   # real CEX_main release spec, %
HEALTH_ACTION_LIMIT = 0.80                  # resin-health action limit (illustrative)

def pooling_window(batch_id="BATCH-2026-001"):
    """Locked guard band on a charge-ordered gradient (acidic -> main -> basic).
    The cut FRACTIONS are the locked rule; the batch's acidic/main/basic split is
    the REAL measured release value, so the pooled purity is grounded."""
    df = pd.read_csv("examples/datasets/hplc_results.csv")
    res = df[df.batch_id == batch_id].set_index("test")["value"]
    acidic, main, basic = res.CEX_acidic_pct, res.CEX_main_pct, res.CEX_basic_pct
    keep_acidic, keep_main, keep_basic = 0.62, 0.97, 0.55   # locked window's effect per variant
    m_ac, m_mn, m_bs = acidic * keep_acidic, main * keep_main, basic * keep_basic
    tot = m_ac + m_mn + m_bs
    pooled_main = 100.0 * m_mn / tot
    return {"feed_main": round(main, 2), "pooled_main": round(pooled_main, 2),
            "yield_frac": round(tot / (acidic + main + basic), 3),
            "in_spec": bool(CEX_MAIN_LOW <= pooled_main <= CEX_MAIN_HIGH)}

def resin_lifetime(action_limit=HEALTH_ACTION_LIMIT):
    """Fit health_index ~ cycle and project the action-limit crossing, validated
    by leave-future-out: fit first 60% of cycles, test the tail (extrapolation)."""
    hist = _synthetic_cycle_history()            # plate count falls, asymmetry rises
    x, y = hist[["cycle"]].to_numpy(float), hist["health_index"].to_numpy(float)
    cut = int(0.6 * len(hist))
    reg = LinearRegression().fit(x[:cut], y[:cut])
    r2_tail = r2_score(y[cut:], reg.predict(x[cut:]))
    slope, intercept = float(reg.coef_[0]), float(reg.intercept_)
    cross = (action_limit - intercept) / slope   # cycle where health hits the limit
    return {"slope_per_cycle": round(slope, 6), "extrapolation_r2": round(r2_tail, 3),
            "projected_repack_cycle": int(round(cross)), "health_now": round(y[-1], 3)}

Running it on the running example's real release values gives the verified output:

Real CEX charge-variant + HMW release values (hplc_results.csv):
test            CEX_main_pct  CEX_acidic_pct  CEX_basic_pct  SEC_HMW_pct
batch_id
BATCH-2026-001        70.686          21.551         10.452        1.287
BATCH-2026-004        67.879          21.447          8.289        1.280

Charge-variant pooling [BATCH-2026-001]: feed CEX_main 70.69% -> pooled 78.2% (acidic 15.24%, basic 6.56%), yield 0.854, in_spec=True
Charge-variant pooling [BATCH-2026-004]: feed CEX_main 67.88% -> pooled 78.67% (acidic 15.89%, basic 5.45%), yield 0.857, in_spec=True

Resin lifetime: slope -0.001317/cycle, extrapolation R2=0.814 (fit first 60% of cycles), health now 0.76 -> projected repack at cycle 156 (action limit 0.8)
ASSERT ok: locked charge-variant window keeps pooled CEX_main inside the 60-80% spec.

Read it as a process engineer would. The locked centre window takes the golden batch's feed of 70.69 percent main species and enriches it to a pooled 78.2 percent — comfortably inside the 60-to-80 spec — at the cost of about 15 percent of the mass (step yield 0.854), with the acidic fraction cut from 21.55 to 15.24 and the basic from 10.45 to 6.56. The same locked rule applied to BATCH-2026-004 enriches its lower feed of 67.88 to a pooled 78.67 — a quiet but important point: the locked window is robust enough that the polishing step itself is in spec for the OOS batch, because that batch failed on host-cell protein, an attribute polishing-as-modeled-here does not control. The resin-lifetime model projects a repack at cycle 156 with a held-out extrapolation R² of 0.814 — honest about being an extrapolation, which is the only kind of projection a lifetime model can be. The 156-cycle projection and the 0.80 health action limit are illustrative; the shape (slow plate-count decay plus rising asymmetry, validated leave-future-out) is the real phenomenology.
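
The listing calls _synthetic_cycle_history() without showing it. The sketch below is a hypothetical stand-in, not the module's actual helper: it generates a plate count that decays and an asymmetry that rises with cycle number, collapses them into a 0-to-1 health index with equal weights, and repeats the leave-future-out fit using NumPy only. The decay rates, noise levels, and score mappings are assumptions, so its numbers will not reproduce the 156-cycle projection or the 0.814 extrapolation R² reported above.

```python
import numpy as np

def synthetic_cycle_history(n_cycles=200, seed=0):
    """Hypothetical stand-in: plate count decays, asymmetry rises, both with noise."""
    rng = np.random.default_rng(seed)
    cycle = np.arange(1, n_cycles + 1, dtype=float)
    plates = 2500.0 * (1.0 - 0.0015 * cycle) + rng.normal(0.0, 25.0, n_cycles)
    asym = 1.0 + 0.002 * cycle + rng.normal(0.0, 0.02, n_cycles)
    # Map each metric to a 0-1 score (1 = as new), then weight them 50/50.
    plate_score = np.clip(plates / 2500.0, 0.0, 1.0)
    asym_score = np.clip(2.0 - asym, 0.0, 1.0)    # As 1.0 scores 1, As 2.0 scores 0
    health = 0.5 * plate_score + 0.5 * asym_score
    return cycle, health

def project_repack(cycle, health, action_limit=0.80, fit_frac=0.6):
    """Fit the early cycles, grade the fit on the later ones, project the crossing."""
    cut = int(fit_frac * len(cycle))
    slope, intercept = np.polyfit(cycle[:cut], health[:cut], 1)
    if slope >= 0:
        raise ValueError("health is not declining; there is no crossing to project")
    pred_tail = slope * cycle[cut:] + intercept
    ss_res = ((health[cut:] - pred_tail) ** 2).sum()
    ss_tot = ((health[cut:] - health[cut:].mean()) ** 2).sum()
    return {
        "slope_per_cycle": float(slope),
        "extrapolation_r2": float(1.0 - ss_res / ss_tot),
        "projected_repack_cycle": float((action_limit - intercept) / slope),
    }

cycle, health = synthetic_cycle_history()
result = project_repack(cycle, health)
print({k: round(v, 4) for k, v in result.items()})
```

The guard on the slope covers a case the excerpt leaves implicit: a flat or improving trend has no meaningful crossing, and dividing by a near-zero slope would otherwise return a meaningless cycle number. Wrapping the two arrays in a DataFrame with cycle and health_index columns would give the shape the excerpt's resin_lifetime() reads.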

Anatomy of one pooling-and-lifetime record

A polishing step produces two records that must be read together: the pool it collected this cycle, and the resin health of the column that collected it. A pool that is in spec on a healthy column means one thing; the same pool on a column three cycles from its action limit means something else entirely. The record below ties them.

One polishing record, fully unpacked: the live gradient signals, the learned-but-locked lead and tail cuts, the multi-objective constraints (CEX main, HMW, LMW) the cut must satisfy at once, the pooled charge-variant and size results it produced, and — the field that makes it honest — the resin-health index and projected repack cycle of the column that produced them, with lineage from capture pool to drug substance. Original diagram by the authors, created with AI assistance.

Read top to bottom, the record is the chapter in miniature. The live signals (UV280, conductivity, which is the salt gradient itself, pH, and optionally a Raman charge-variant readout) are all that is available the instant the cut is made. The locked rule is the lead cut (0.18) and tail cut (0.88), learned at design time on historical cycles and now frozen. The constraints are the three specs the cut must satisfy simultaneously: a pool that is great on CEX main but high on HMW is a failed cut. The pooled result is what the window collected, 78.2 percent main from a 70.69 feed, and the size result is the HMW and LMW the tail cut protected. The resin-health block is the field that makes the record honest: the same in-spec pool reads differently when the column's health index (0.76) has already slipped below its action limit (0.80) and the projection put the repack at cycle 156. The whole record is auditable after the fact against the SEC and CEX release assays, which is exactly why this locked-rule-plus-monitoring pattern passes GMP review where an autonomous cut-redrawing controller would not.
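
One way to hold such a record in code is a small immutable structure that keeps the pool result and the resin state side by side. The field names and the disposition wording are hypothetical; the values are the ones quoted in this chapter, with the golden batch's SEC release values standing in for the size result.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PolishingRecord:
    """Hypothetical record tying one cycle's pool to the column that produced it."""
    batch_id: str
    source_pool: str             # lineage in: capture pool
    product: str                 # lineage out: drug substance
    lead_cut: float              # locked rule
    tail_cut: float
    feed_main_pct: float
    pooled_main_pct: float
    sec_hmw_pct: float
    sec_lmw_pct: float
    health_index: float
    health_action_limit: float
    projected_repack_cycle: int

    def pool_in_spec(self):
        """Charge and size specs checked together, as the cut must satisfy both."""
        return (60.0 <= self.pooled_main_pct <= 80.0
                and self.sec_hmw_pct < 3.0 and self.sec_lmw_pct < 2.0)

    def health_margin(self):
        """Distance above the action limit; zero or negative means at or past it."""
        return self.health_index - self.health_action_limit

    def disposition(self):
        if not self.pool_in_spec():
            return "pool out of spec"
        if self.health_margin() <= 0:
            return "pool in spec, resin at or past action limit: governed repack review"
        return "pool in spec, resin healthy"

record = PolishingRecord(
    batch_id="BATCH-2026-001", source_pool="PApool-001", product="DS-001",
    lead_cut=0.18, tail_cut=0.88, feed_main_pct=70.69, pooled_main_pct=78.2,
    sec_hmw_pct=1.287, sec_lmw_pct=0.439,
    health_index=0.76, health_action_limit=0.80, projected_repack_cycle=156,
)
print(record.disposition())
```

Keeping both blocks in one record means a reviewer cannot read the in-spec pool without also seeing the state of the column behind it.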

The unsolved part: resolution decay, small data, and the locked-model paradox

The hard, unsolved problem here is the same shape as capture's, sharpened. The column ages, its resolution decays, and a frozen cut slowly drifts the pool toward its spec edge — but the very fix a data scientist wants, let the cut adapt to the aging resin, is the thing Annex 22 most explicitly forbids for a CQA-affecting function. The field is left in an awkward middle ground: detect the decay well (moment analysis and aging monitors are good at this), but respond through governed retraining and validated repack, not through a model quietly rewriting its own cut. Knowing when a model has decayed enough to warrant a controlled retrain — and proving the new cut is at least as safe — is the open MLOps problem of downstream, and it is genuinely unsolved at scale.

Two further difficulties are specific to polishing. First, small data bites hardest exactly here. The labels that train a charge-variant pooling model are full SEC + CEX release assays, which arrive once or twice per batch from a slow lab — the cold-start, sparse-reference reality the whole book keeps meeting. A campaign of a few dozen cycles, each with one charge-variant readout, is not enough to fit a data-hungry model that generalizes to a new resin lot, a new scale, or a feed shifted by an upstream change. This is the canonical case for a hybrid approach — the mechanistic twin predicts the elution profile, the data only places the cut — and the honest verdict is that pure-ML charge-variant pooling remains research, not routine [3].

Second, the resin-health index itself is a modeling choice, not a measurement. Our module collapses plate count and asymmetry into a single 0-to-1 number with a 50/50 weighting; a real action limit and weighting must be justified against the attribute that actually drifts (here, charge-variant resolution), validated, and tied to the moment-analysis envelope the production monitors already use. A health index that is not anchored to a real quality consequence is a dashboard number, not a decision criterion — and the gap between "the index crossed 0.80" and "the pool will drift out of spec at cycle 160" is the same validation work that separates a soft sensor from a release decision everywhere in this book.

What this chapter adds to the model suite

This chapter contributes examples/platform/ml/resin_lifetime.py to Book 5's example suite — the polishing pooling-and-lifetime model, anchored to the running example's real CEX and SEC release values. It provides:

load_polishing_results() — pivots the real charge-variant (CEX_main/acidic/basic_pct) and aggregate (SEC_HMW_pct) release values from examples/datasets/hplc_results.csv into one row per batch across all six campaign batches, including the OOS BATCH-2026-004.
pooling_window() — the locked charge-variant pooling rule: a centre window on a charge-ordered gradient that enriches CEX_main_pct toward spec while clipping the acidic lead and basic tail, reporting the pooled composition, step yield, and an in-spec flag against the real 60-to-80 percent CEX main spec. The pooled-purity assertion guards the claim so it cannot silently rot.
resin_lifetime() — the cycle-count model: fits a resin-health index against cycle number and projects the governed repack cycle, validated leave-future-out (fit the first 60 percent of cycles, test the tail) so the projection is honestly an extrapolation, with the slope, extrapolation R², and projected repack cycle reported.

It deliberately complements rather than duplicates the capture module: where chromatography.py classifies phases and times a single sharp peak with a recovery soft sensor, resin_lifetime.py works the polishing problem — a charge-ordered trajectory, a multi-objective cut, and the slow asset-aging decision the capture chapter named as unsolved.

Why it matters

Polishing is the last place to fix the product before it becomes drug substance, and its pooling decision is one of the few in manufacturing where a cut directly sets a release CQA — CEX_main_pct — live, on a moving signal. Getting the learning layer right means three concrete things: a pooling rule that is learned and validated but locked, placing the lead and tail cuts on evidence rather than a fixed historical window while satisfying the charge-variant and size specs at once; a clear deference to the mechanistic mixed-mode twin for the elution physics, so neither method pretends to be the other; and a resin-lifetime model that turns the expensive, risky repack decision from a fixed calendar rule into a monitored, projected, governed one — anchored by production-grade moment analysis, not floating on a black-box health score. Get the boundary right and polishing becomes a model citizen of downstream ML: real, deployed-adjacent, and honest about its small-data ceiling and its locked-model constraint. Blur it — let a model redraw its own cut to chase a target, or trust a health index nobody validated against a quality consequence — and you have stepped over the line the regulators have drawn brightest, on the very step that decides whether the drug substance is in spec.

In the real world

Mechanistic modeling of polishing is production technology in CMC: Cytiva GoSilico and open-source CADET model mixed-mode and ion-exchange polishing for real molecules, with a peer-reviewed mixed-mode antibody-polishing case in the literature [1][2] — mechanistic, not ML, with vendor savings figures vendor-self-reported. On the column-integrity side, the deployed reality is algorithmic, not deep learning: Samsung Biologics runs moment analysis and direct transition analysis (HETP, transition width, asymmetry) for near-real-time integrity monitoring of production-scale columns (production) [5]. On the learning side, the deployed-adjacent reality is monitoring and prediction rather than autonomous control: on-line PAT plus PCA has flagged Protein A resin aging 20 to 25 cycles ahead of yield loss (pilot) [6], model-based resin-lifetime supervision and chromatographic-profile feature mining are active research [7][8], and CNN-guided Raman for charge-variant pooling is a striking but still research result (R² 0.94 to 0.99) [3]. The pattern matches the ISPE Pharma 4.0 picture the whole book reports: downstream ML clusters in monitoring, prediction, and human-in-the-loop review — not in autonomous control of a critical quality attribute. The open-source analytics chapter shows the same shape of model running in code (its SPC example trends this very CEX_main_pct attribute), and Book 1's purification chapters and Book 4's downstream ontology describe the same physical step through their own lenses.

Key terms

Polishing chromatography — the final purification column(s) (CEX, AEX, or mixed-mode) that remove product-related impurities capture cannot: charge variants, aggregates, and fragments; yields the pool that feeds UF/DF and becomes drug substance.
Charge variants — versions of the antibody differing by charge (acidic, main, basic), measured by CEX as CEX_acidic/main/basic_pct; on a CEX gradient they elute in charge order, so the pooling cut sets their composition.
Trajectory / charge-variant pooling — choosing the lead and tail cuts on a charge-ordered elution to set the pool's charge-variant composition; the cut is learned and validated at design time but locked in production.
HMW / LMW (aggregates / fragments) — high- and low-molecular-weight species measured by SEC as SEC_HMW_pct and SEC_LMW_pct; aggregates often ride the peak tail, so the tail cut clips them, coupling the size and charge specs.
Multi-objective cut — the polishing cut must satisfy the CEX main, HMW, and LMW specs simultaneously, so a one-dimensional "maximize main species" objective is wrong.
Mechanistic polishing model — a first-principles column simulator (general rate model + steric mass action / multi-component isotherm; GoSilico, CADET); the mature production tool here, and not machine learning.
Resin lifetime / cycle-count decision — when to repack or replace an aging column; the resin's plate count falls and asymmetry rises over its validated cycle life until charge-variant resolution decays.
Moment analysis / transition analysis — deterministic computation of plate height (HETP), transition width, and asymmetry from a pulse or step; the production-grade, non-ML column-integrity monitor that ML aging models augment, not replace.
Resin-health index — a modeling choice that collapses plate count and asymmetry into a single number against an action limit; only a decision criterion once validated against the quality attribute that actually drifts.

Where this leads

The polishing pool is now in spec on charge variants and aggregates — the product is as pure as chromatography can make it, but it is still dilute and in the wrong buffer to be a drug. The next chapter, UF/DF and Drug Substance: Soft-Sensing Concentration and Excipients, takes up the final downstream step — concentrating the protein and exchanging it into its formulation buffer — and the soft-sensing problem of knowing the protein concentration and excipient levels in real time, as the pool DS-001 finally earns the name drug substance.
