Modeling Polishing: Multi-Step Purification and Quality Attributes
📍 Where we are: Part IV · Downstream, modeled — Chapter 15. The product is captured and made virally safe. This chapter models the finishing purification — and uses it to confront how the graph represents a quality that no single step creates.
After capture and viral safety, the antibody is mostly pure but not yet at specification. Polishing (usually one or more ion-exchange chromatography steps) removes the last impurities and, importantly, separates the desired antibody from closely related product variants: aggregates (high-molecular-weight species), charge variants, and fragments [1][3]. Each step lays the familiar derivedFrom edge and carries the familiar in-process quality results. But polishing surfaces a question the model has dodged until now: when the final monomer purity is 98.611 %, which step made it so? The honest answer, "all of them, cumulatively," is harder to model than it sounds.
Polishing a rough gem is several passes with finer and finer grit; the final shine is not the work of any one pass but of the whole sequence. If someone asks "which pass made it shiny?", the honest answer is "the sequence did." A quality attribute like purity is the same: each polishing step nudges it, and the final value emerges from the chain. This chapter models that — how the graph records a quality that no single step owns, and how it keeps the chain of custody for a number that was shaped, not created, at any one point.
What this chapter covers
We model polishing steps as unit operations that incrementally improve quality attributes, model product variants as the entities being separated out, and dissect one polishing step. The chapter's weight is on the attribution problem: how the graph represents a CQA shaped cumulatively across a chain, why no single node "owns" the final value, and how that connects the in-process results along the chain to the design space knowledge that predicted them.
Each step nudges a quality attribute, and variants are real entities
A polishing step transforms its input material into a purer output, and the interesting facts are the quality attributes it moves. In the running example the cation-exchange step bp:POL-001 takes the viral-filtered pool bp:VFpool-001 to the polished pool bp:POLpool-001; the in-process high-molecular-weight (HMW) aggregate fraction drops from 4.1 %, measured after capture, to 1.4 % after polishing, and the charge-variant main peak is recorded at 70.9 % [3]. Each intermediate carries its in-process result through bp:hasInProcessResult (an evidenced quality, exactly as in the analytical chapter), and the design-space link affectsQuality ties the process parameters to the attributes they move at the type level. The variants being removed are modeled as entities in their own right: bp:AGG-1 (a bp:Aggregate) and bp:CHV-1 (a bp:ChargeVariant) are both derivedFrom bp:VFpool-001 and routedTo bp:WASTE-pol, so the graph tracks what was removed and where it went. This is the same keep-one, discard-another pattern the clarification chapter introduced, now applied to molecular variants rather than cells.
The aggregate attribute that polishing exists to suppress is its own typed quality in the vocabulary — bp:AggregateContent, a sibling of bp:MonomerContent under the bp:Quality class — and the polishing step, its output pool, the separated variants, and the in-process trajectory results are all real individuals in instances.ttl:
@prefix bp: <https://example.org/bioproc#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
bp:POLpool-001 a bp:PolishingIntermediate ;
    rdfs:label "polished pool" ;
    bp:derivedFrom bp:VFpool-001 ;
    bp:participatesIn bp:POL-001 ;
    bp:hasInProcessResult bp:IPR-pol-hmw , bp:IPR-pol-cex .
bp:POL-001 a bp:IonExchangeChromatography ;
    rdfs:label "cation-exchange polishing step" ;
    bp:hasInput bp:VFpool-001 ;
    bp:hasOutput bp:POLpool-001 ;
    bp:removesVariant bp:AGG-1 , bp:CHV-1 .
bp:AGG-1 a bp:Aggregate ;
    rdfs:label "removed HMW aggregate" ;
    bp:derivedFrom bp:VFpool-001 ;
    bp:routedTo bp:WASTE-pol .
bp:CHV-1 a bp:ChargeVariant ;
    rdfs:label "removed acidic charge variant" ;
    bp:derivedFrom bp:VFpool-001 ;
    bp:routedTo bp:WASTE-pol .
# In-process trajectory results along the chain (correlation, not causation).
bp:IPR-cap-hmw a bp:InProcessResult ; bp:isAbout bp:PApool-001 ; bp:hmwPct "4.1"^^xsd:float .
bp:IPR-pol-hmw a bp:InProcessResult ; bp:isAbout bp:POLpool-001 ; bp:hmwPct "1.4"^^xsd:float .
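Once the facts are in a graph, the keep-one, discard-another pattern becomes a simple query. The following is a minimal sketch that assumes plain Python tuples stand in for the triples above: there is no RDF library, the bp: prefix is dropped, and the helper names objects, removed_variants, and split_is_consistent are all hypothetical. It answers three questions: which variants bp:POL-001 removed, where each one went, and whether the kept and discarded material share the step's input.

```python
# A hypothetical plain-Python stand-in for the instances.ttl excerpt above:
# each fact is a (subject, predicate, object) tuple with the bp: prefix dropped,
# and "type" standing in for rdf:type (Turtle's "a").
TRIPLES = [
    ("POLpool-001", "derivedFrom", "VFpool-001"),
    ("POL-001", "hasInput", "VFpool-001"),
    ("POL-001", "hasOutput", "POLpool-001"),
    ("POL-001", "removesVariant", "AGG-1"),
    ("POL-001", "removesVariant", "CHV-1"),
    ("AGG-1", "type", "Aggregate"),
    ("AGG-1", "derivedFrom", "VFpool-001"),
    ("AGG-1", "routedTo", "WASTE-pol"),
    ("CHV-1", "type", "ChargeVariant"),
    ("CHV-1", "derivedFrom", "VFpool-001"),
    ("CHV-1", "routedTo", "WASTE-pol"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Every object o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def removed_variants(step, triples=TRIPLES):
    """(variant, type, source material, destination) for each variant the step removes."""
    report = []
    for variant in objects(step, "removesVariant", triples):
        (vtype,) = objects(variant, "type", triples)
        (source,) = objects(variant, "derivedFrom", triples)
        (dest,) = objects(variant, "routedTo", triples)
        report.append((variant, vtype, source, dest))
    return report


def split_is_consistent(step, triples=TRIPLES):
    """Keep/discard check: the kept output and every removed variant derive from the step's input."""
    (inp,) = objects(step, "hasInput", triples)
    (out,) = objects(step, "hasOutput", triples)
    kept_ok = inp in objects(out, "derivedFrom", triples)
    removed_ok = all(src == inp for _, _, src, _ in removed_variants(step, triples))
    return kept_ok and removed_ok


if __name__ == "__main__":
    for row in removed_variants("POL-001"):
        print(row)
    print(split_is_consistent("POL-001"))
```

The single-element unpacking, such as `(dest,) = ...`, is deliberate. If a variant has no routedTo edge, or has two, the code raises ValueError rather than silently producing an incomplete waste record. Against the real instances.ttl, the same question would be a SPARQL query over bp:removesVariant and bp:routedTo.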
The classes those individuals instantiate are grounded up in align.ttl — the step to the IOF biopharma module, the attribute and its sibling to the BFO quality category, with the variant entities and relations honestly local:
# align.ttl — the polishing classes grounded UP (excerpt).
bp:PolishingStep rdfs:subClassOf iof:PolishingProcess . # IOF biopharma 'polishing process' (Released, Release_202602)
bp:AggregateContent rdfs:subClassOf bp:Quality . # and bp:Quality rdfs:subClassOf obo:BFO_0000019 — BFO 'quality'
bp:MonomerContent rdfs:subClassOf bp:Quality . # its sibling attribute, on the same BFO grounding
# bp:IonExchangeChromatography inherits bp:PolishingStep (no distinct IOF leaf for cation-exchange polishing);
# bp:Aggregate / bp:ChargeVariant are bp:ProductVariant leaves, and bp:removesVariant / bp:routedTo are local
# QbD relations — all ILLUSTRATIVE, since no settled 1:1 OBO/IOF term exists for product variants or these edges.
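Grounding up pays off through transitivity: a query for IOF polishing processes or BFO qualities reaches the local classes without restating them. The sketch below assumes a plain dictionary stands in for the excerpt. The IonExchangeChromatography-to-PolishingStep edge and the ProductVariant leaves are taken from the excerpt's comments rather than its asserted triples. ancestors is a hypothetical helper; over the real files, a SPARQL 1.1 property path such as rdfs:subClassOf* or an RDFS reasoner would perform the same walk.

```python
# Hypothetical stand-in for the align.ttl excerpt: class -> direct rdfs:subClassOf parents.
# The IonExchangeChromatography and ProductVariant edges come from the excerpt's comments.
SUBCLASS_OF = {
    "bp:IonExchangeChromatography": ["bp:PolishingStep"],
    "bp:PolishingStep": ["iof:PolishingProcess"],
    "bp:AggregateContent": ["bp:Quality"],
    "bp:MonomerContent": ["bp:Quality"],
    "bp:Quality": ["obo:BFO_0000019"],  # BFO 'quality'
    "bp:Aggregate": ["bp:ProductVariant"],  # local leaf, no upper grounding asserted
    "bp:ChargeVariant": ["bp:ProductVariant"],
}


def ancestors(cls, graph=SUBCLASS_OF):
    """All superclasses reachable from cls, i.e. the transitive closure of rdfs:subClassOf."""
    seen = []
    stack = list(graph.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # guards against revisiting shared or cyclic parents
            seen.append(parent)
            stack.extend(graph.get(parent, []))
    return seen


if __name__ == "__main__":
    print(ancestors("bp:IonExchangeChromatography"))  # ['bp:PolishingStep', 'iof:PolishingProcess']
    print(ancestors("bp:AggregateContent"))           # ['bp:Quality', 'obo:BFO_0000019']
    print(ancestors("bp:Aggregate"))                  # ['bp:ProductVariant']
```

The variant classes stop at bp:ProductVariant. This is the honest locality the excerpt declares: nothing is asserted above that class, so nothing is inferred.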
The endpoint of that trajectory is recorded on the lot as a plain scalar: after the chain has done its work, the golden DS-001 carries 98.611 % monomer and 1.287 % HMW aggregate. Running queries/trajectory.rq, which collects every bp:InProcessResult carrying a bp:hmwPct, returns the two-point chain [('PApool-001', 4.1), ('POLpool-001', 1.4)], the loadable, ordered history of one quality attribute. The release value on DS-001 does not appear in it, because that value is a scalar on the lot rather than a bp:InProcessResult.
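The text does not reproduce queries/trajectory.rq. What follows is therefore only a plain-Python sketch of the same idea, not the query itself. It assumes a dictionary of results keyed by the identifiers above, plus a hypothetical lineage in which VFpool-001 derives directly from PApool-001. The functions depth and trajectory are illustrative names. Ordering by derivedFrom hops, rather than by identifier, keeps the history in process order even when the labels would not sort that way.

```python
# Hypothetical stand-in for the in-process results in instances.ttl:
# result id -> the material it isAbout plus any measured values.
RESULTS = {
    "IPR-cap-hmw": {"isAbout": "PApool-001", "hmwPct": 4.1},
    "IPR-pol-hmw": {"isAbout": "POLpool-001", "hmwPct": 1.4},
    "IPR-pol-cex": {"isAbout": "POLpool-001"},  # its CEX value uses a property not shown; no hmwPct
}

# Assumed lineage (material -> what it derivedFrom); the VFpool-001 -> PApool-001 edge
# collapses whatever viral-safety intermediates sit between them in the full graph.
DERIVED_FROM = {
    "POLpool-001": "VFpool-001",
    "VFpool-001": "PApool-001",
}


def depth(material, lineage=DERIVED_FROM):
    """Number of derivedFrom hops from material back to the start of the modeled chain."""
    hops = 0
    while material in lineage:  # assumes the lineage is acyclic, as a genealogy should be
        material = lineage[material]
        hops += 1
    return hops


def trajectory(attribute="hmwPct", results=RESULTS, lineage=DERIVED_FROM):
    """(material, value) for every result carrying the attribute, ordered upstream to downstream."""
    points = [(r["isAbout"], r[attribute]) for r in results.values() if attribute in r]
    return sorted(points, key=lambda point: depth(point[0], lineage))


if __name__ == "__main__":
    print(trajectory())  # [('PApool-001', 4.1), ('POLpool-001', 1.4)]
```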
The attribution problem: a quality the chain owns, not a step
Here is the modeling puzzle. The monomer purity at release is a single number, 98.611 %, and it is tempting to point at one polishing step as its source. But purity was raised at capture, protected through viral safety, and refined across each polishing step; it is a cumulative, almost emergent property of the whole purification chain. The graph records each step's in-process result, so the trajectory of an attribute is visible: for HMW aggregate, the modeled chain runs from 4.1 % after capture to 1.4 % after polishing to 1.287 % at release. But the final value is not caused by any one node; it is the endpoint of a chain of incremental improvements. The model therefore resists the urge to assert a single "purity-determining step". Instead it represents the trajectory as a sequence of materials, each carrying the attribute's value at that point. An analyst can then see where the gain happened, and a design-space model can be checked step by step against what each operation was predicted to do.
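Reading where the gain happened is simple arithmetic over the trajectory. The sketch below uses the three HMW values given in the text; the labels and function names are hypothetical. The first interval spans viral safety as well as polishing, because no in-process HMW value is reported between them.

```python
# HMW aggregate (%) at the three points the chapter reports for the golden lineage.
HMW_TRAJECTORY = [("after capture", 4.1), ("after polishing", 1.4), ("at release", 1.287)]


def step_changes(points):
    """(from, to, change in percentage points) between consecutive points, rounded to 3 dp."""
    return [
        (label_a, label_b, round(b - a, 3))
        for (label_a, a), (label_b, b) in zip(points, points[1:])
    ]


def largest_drop(points):
    """The interval with the most negative change: where the gain is seen, not what caused it."""
    return min(step_changes(points), key=lambda change: change[2])


if __name__ == "__main__":
    for change in step_changes(HMW_TRAJECTORY):
        print(change)
    print(largest_drop(HMW_TRAJECTORY))
```

The output locates the change, about 2.7 percentage points before polishing finishes and about 0.113 afterwards. It says nothing about why the change happened.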
This reframes derivedFrom itself. Across the chain it is not just lineage but a chain of custody for quality: each edge carries the material and the evolving state of its attributes, so the graph can answer not only "where did this come from?" but "how did its purity get here, step by step?" That second question is what turns a genealogy into a process understanding record, and it is the bridge from this chapter's in-process trajectory to the release decision that reads the endpoint.
One quality attribute as a trajectory, not a single point: HMW aggregate falls across capture, polishing, and release, staying under the 2.0 percent ceiling the out-of-spec sibling DP-004 breaches at 2.41 percent — the drop driven by the variants polishing routes to waste.
Original diagram by the authors, created with AI assistance.
The ceiling that polishing must drive the aggregate below is not narrative; it is a machine-checkable constraint in the release gate. The gate reads exactly the bp:hmwPct scalar the chain deposits on the lot and rejects any lot whose value is missing or above the upper limit. That limit is the single criterion the out-of-spec sibling DP-004 trips, at 2.41 % against a ceiling of 2.0 %, while its monomer stays in spec:
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix bp: <https://example.org/bioproc#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
bp:ReleaseShape a sh:NodeShape ;
    sh:targetClass bp:DrugSubstance , bp:DrugProduct ;
    sh:property [
        sh:path bp:hmwPct ;
        sh:name "SEC %HMW aggregate" ;
        sh:minCount 1 ; sh:maxCount 1 ;
        sh:datatype xsd:float ;
        sh:maxInclusive 2.0 ;
        sh:message "HMW aggregate is missing or above the 2.0 % release limit." ] .
Run against the whole graph, this gate conforms for DS-001, DP-001, and DP-002 and fails on a single criterion for the sibling lineage. This is the realistic out-of-spec mode, in which polishing did not clear enough aggregate. Every other panel value on those lots (monomer 98.687 %, CEX, HCP, protein) is in spec, so the failure is isolated to one path on two focus nodes, DS-004 and DP-004:
[4] SHACL whole-graph conforms: False # DS-004/DP-004 fail hmwPct MaxInclusive (2.41 > 2.0)
violating focus nodes: ['DP-004', 'DS-004'] failing paths: ['hmwPct']
DS-001-only graph conforms: True
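The shape's behavior can be mirrored in a few lines of plain Python, which makes the inclusive boundary and the missing-value rule explicit. This is a sketch under stated assumptions, not the book's validator. A real run would use a SHACL engine (pySHACL is one Python option), and target selection through bp:DrugSubstance and bp:DrugProduct is assumed to have already happened. Only the lots whose HMW values appear in this chapter are included; DP-001 and DP-002 are left out because their values are not given here. check_hmw and validate are hypothetical names.

```python
# A hypothetical plain-Python mirror of bp:ReleaseShape's one property constraint.
# It is not a SHACL engine; it only makes each constraint's rule explicit.
HMW_MAX_INCLUSIVE = 2.0  # sh:maxInclusive 2.0


def check_hmw(lot, values):
    """Violation messages for one focus node's bp:hmwPct values; [] means it conforms."""
    problems = []
    if len(values) < 1:
        problems.append(f"{lot}: hmwPct missing (sh:minCount 1)")
    if len(values) > 1:
        problems.append(f"{lot}: {len(values)} hmwPct values (sh:maxCount 1)")
    for value in values:
        if not isinstance(value, float):
            problems.append(f"{lot}: {value!r} is not a float (sh:datatype xsd:float)")
        elif value > HMW_MAX_INCLUSIVE:  # inclusive: exactly 2.0 still passes
            problems.append(f"{lot}: {value} > {HMW_MAX_INCLUSIVE} (sh:maxInclusive)")
    return problems


def validate(lots):
    """(conforms, sorted violating focus nodes) over a dict of lot -> list of hmwPct values."""
    failing = sorted(lot for lot, values in lots.items() if check_hmw(lot, values))
    return not failing, failing


if __name__ == "__main__":
    lots = {"DS-001": [1.287], "DS-004": [2.41], "DP-004": [2.41]}
    print(validate(lots))                 # (False, ['DP-004', 'DS-004'])
    print(validate({"DS-001": [1.287]}))  # (True, [])
```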
A polishing step in context: it nudges the quality attributes and routes the separated variants AGG-1 and CHV-1 to waste, while the trajectory strip shows HMW aggregate falling across the whole chain — a chain of custody for quality, not a single determining step.
Original diagram by the authors, created with AI assistance.
The unsolved part: attributing cause along a chain
The attribution problem is genuinely unsolved in the strong sense, and it is worth being precise about why. The graph can faithfully record the trajectory of a quality attribute — its value after each step — but a trajectory is correlation, not causation. Saying "purity rose most at the polishing step" does not establish that the polishing step caused the final quality, because the steps are not independent: what a polishing step can achieve depends on what capture left it, and the same final purity might have been reached by a different division of labor across the chain. The model can show where the value changed; it cannot, from lineage alone, apportion credit or blame among interacting steps. That apportionment is a causal-inference question that needs the design-space models and process knowledge the graph only indexes, not the genealogy it stores — the same boundary the process-development chapter drew between the graph and the response surface.
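A small worked example shows why lineage alone cannot settle credit. Take the chain's own three HMW values and split the total drop two defensible ways, and identical data give two different answers. Both conventions below are hypothetical bookkeeping chosen for illustration. They are not the book's method and not a causal model.

```python
import math

# The modeled HMW trajectory (%): after capture, after polishing, at release.
CAPTURE, POLISHED, RELEASED = 4.1, 1.4, 1.287


def additive_shares(start, mid, end):
    """Each interval's share of the total drop, counting drops in percentage points."""
    total = start - end
    return (start - mid) / total, (mid - end) / total


def log_shares(start, mid, end):
    """Each interval's share of the total drop, counting drops as log fold-reductions."""
    total = math.log(start / end)
    return math.log(start / mid) / total, math.log(mid / end) / total


if __name__ == "__main__":
    for name, shares in [("additive", additive_shares(CAPTURE, POLISHED, RELEASED)),
                         ("log-fold", log_shares(CAPTURE, POLISHED, RELEASED))]:
        print(name, [round(s, 2) for s in shares])
```

The capture-to-polishing interval receives about 96 % of the credit when drops are counted in percentage points, but about 93 % when they are counted as log fold-reductions. Neither number is wrong. Each is an assumption laid over a fixed trajectory, which is why any apportionment has to come from design-space models rather than from the genealogy.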
A second, practical gap is that in-process and release results live at different authority levels. An in-process purity measured after the first polishing step is a process-control measurement; the release purity on the drug substance is the legally binding result. A graph that records them as the same kind of fact blurs a distinction that matters, the same validated-versus-measured and in-process-versus-release care the viral-safety chapter demanded. So the honest standard is this: the graph is excellent at recording the history of a quality attribute and silent on causation among the steps that shaped it. Modeling the trajectory is real and valuable, while claiming that a single step "owns" the result is exactly the kind of confident over-reach the whole book warns against.
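The same distinction can be kept in code by giving the two kinds of result different types, so that nothing downstream can silently treat one as the other. This is a minimal sketch: the class names and release_decision are hypothetical and not the book's vocabulary. In the graph, the equivalent is the separate bp:hasInProcessResult edge versus the bp:hmwPct scalar on the lot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InProcessResult:
    """Process-control measurement on an intermediate (the bp:hasInProcessResult kind)."""
    material: str
    hmw_pct: float


@dataclass(frozen=True)
class ReleaseResult:
    """Binding result recorded on a lot (the scalar the release gate reads)."""
    lot: str
    hmw_pct: float


def release_decision(result, limit=2.0):
    """Pass/fail on a release result only; an in-process value cannot stand in for it."""
    if not isinstance(result, ReleaseResult):
        raise TypeError(f"release needs a ReleaseResult, got {type(result).__name__}")
    return result.hmw_pct <= limit


if __name__ == "__main__":
    print(release_decision(ReleaseResult("DS-001", 1.287)))  # True
    print(release_decision(ReleaseResult("DS-004", 2.41)))   # False
    try:
        release_decision(InProcessResult("POLpool-001", 1.4))
    except TypeError as err:
        print(err)  # release needs a ReleaseResult, got InProcessResult
```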
Why it matters
Polishing is where the product reaches specification, and it is where the model learns to represent a quality as a history rather than a property of one node. Record the attribute trajectory across the chain and the graph supports both investigation ("where did purity drop in this batch versus the golden one?") and verification ("did each step move the attribute as the design space predicted?"). Collapse the chain to a single final number, or pretend one step caused it, and the model loses the step-by-step understanding that makes a deviation diagnosable. The chain-of-custody-for-quality framing established here is what lets the digital-thread chapter reconstruct not just lineage but the evolution of every critical attribute along it.
In the real world
Multi-step polishing by ion-exchange chromatography, removing aggregates and charge variants to bring a monoclonal antibody to specification, is standard commercial practice, and the analytical methods that track these variants step by step are well established [1][2][3]. Plants already measure in-process purity along the train; the modeling advance is to represent those measurements as a connected trajectory of one attribute across linked materials, rather than as isolated test results filed per step — so the history of a CQA is queryable. The open-source downstream chapter captures these step signals, and the trajectory model here is what lets that captured data answer how a quality attribute came to be, not merely what it finally was.
Key terms
- Polishing: the finishing purification (typically ion-exchange) that removes residual impurities and separates the antibody from product variants to reach specification.
- Product variants: closely related molecular species (aggregates/HMW, charge variants, fragments) separated out and modeled as distinct entities derivedFrom the product.
- Attribute trajectory: the sequence of a quality attribute's values across linked materials along the chain, making visible where a quality changed.
- Chain of custody for quality: the reading of derivedFrom across the purification chain as carrying not just lineage but the evolving state of each material's quality attributes.
- Attribution problem: the unsolved question of apportioning a cumulative quality among interacting steps; the graph records the trajectory (correlation) but cannot, from lineage alone, establish causation.
- In-process result: a quality measured on an intermediate and carried by bp:hasInProcessResult (the 4.1 % and 1.4 % HMW points); a process-control measurement, distinct in authority from the binding release result on the lot.
- Release gate (SHACL constraint): the machine-checkable shape (bp:ReleaseShape, with sh:maxInclusive 2.0 over bp:hmwPct) that holds back any lot whose CQA is out of spec; the single criterion the sibling DP-004 trips at 2.41 %.
Where this leads
Polishing brings the product to specification; the trajectory of every CQA is now recorded across the chain. The next chapter, Modeling the Drug Substance: The Lot That Anchors Release, models the final downstream step that yields DS-001 — the lot where all the upstream lineage converges, where the release CQAs properly belong (correcting the simplification we have carried since the bioreactor), and from which every drug-product lot will derive.