Modeling Capture Chromatography and the Pooling Problem

📍 Where we are: Part IV · Downstream, modeled — Chapter 13. The clarified harvest is ready to purify. This chapter models the first and most important purification step — and immediately cashes the individuation warning the harvest chapter left us with.

Clarification handed us a clarified harvest still full of host-cell proteins, DNA, and other impurities. Protein A capture chromatography is the step that changes everything: a column packed with resin binds the antibody specifically, lets everything else wash away, then releases the purified product in a concentrated pool (PApool-001 in our running campaign). It typically removes the bulk of impurities in one operation. It is also the canonical home of the pooling problem: a single capture step is run as many cycles, each loading and eluting, all combined into one pool, so for the first time a material is derivedFrom not one parent but several. The harvest chapter promised this case was coming; here we model it.

The simple version

Imagine pressing olive oil in batches all morning and pouring every press into one big drum. The drum's oil came from all the morning's presses, not one — so if you ever need to trace a problem in the drum, "where did this come from?" has several answers, and you'd better have recorded each press. Capture chromatography works in cycles like those presses, pooled into one container. This chapter models that many-into-one lineage honestly, and also faces a subtler trace: the press itself — the resin — is reused across many products, so it can carry a whisper of the last one into the next.

What this chapter covers

We model capture as a unit operation with the now-standard one-process-transforms-material shape, and solve the pooling problem — one pool deriving from many cycles — that breaks the simple parent-child edge. We model the resin as persistent equipment with a binding disposition, dissect the PApool-001 node, and confront the two genealogies that capture introduces: the pool's backward fork, and the resin's reuse across batches that creates a real carryover lineage.

Pooling: one material, many parents

A capture step is run as a sequence of cycles because a column can only hold so much per load. Each cycle binds antibody from a portion of the harvest and elutes it; the eluates are combined into the capture pool [1]. Modeled naively as a single unit operation, the pool is simply derivedFrom the batch, which is true but lossy. Modeled honestly, the pool is derivedFrom each pooled cycle's eluate, and each eluate derives from the load it processed, so the lineage forks backward: one PApool-001, several parent eluates. This is exactly the many-parents case the harvest chapter flagged, and the graph handles it without strain because derivedFrom was never restricted to one parent: a node can have as many incoming-lineage edges as reality demands. In the loadable dataset this is not a pattern we describe but individuals we instantiate: three cycle eluates (bp:ELU-001a, bp:ELU-001b, bp:ELU-001c), each derivedFrom the clarified harvest bp:CLAR-001, and PApool-001, which records exactly which of them it pooled through bp:includedFraction, a typed sub-property of derivedFrom. So when an investigation asks "which load contributed to this pool?", the answer is in the graph, not in a chromatography logbook. The pool's headline edge to the clarified harvest is also asserted directly, so the coarse chain stays reachable while the cycle detail is preserved underneath it:

# instances.ttl — the capture pool, with its cycle-level provenance preserved.
bp:ELU-001a a bp:CycleEluate ; rdfs:label "capture cycle 1 eluate" ; bp:derivedFrom bp:CLAR-001 .
bp:ELU-001b a bp:CycleEluate ; rdfs:label "capture cycle 2 eluate" ; bp:derivedFrom bp:CLAR-001 .
bp:ELU-001c a bp:CycleEluate ; rdfs:label "capture cycle 3 eluate (excluded by pooling)" ; bp:derivedFrom bp:CLAR-001 .

bp:PApool-001 a bp:CapturePool ;
    bp:derivedFrom bp:CLAR-001 ;                       # transitive headline lineage (entailed via the eluates too)
    bp:fromBatch bp:BATCH-2026-001 ;
    bp:includedFraction bp:ELU-001a , bp:ELU-001b ;    # which cycles pooled — answerable, not collapsed
    bp:participatesIn bp:CAP-001 .
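The backward fork can be made concrete with a small traversal. What follows is a minimal Python sketch, not the book's tooling: the edges are hand-copied from the instances.ttl excerpt into plain tuples, and the function names `parents` and `ancestors` are hypothetical. It treats bp:includedFraction as a lineage edge because the chapter declares it a sub-property of derivedFrom.

```python
# Lineage edges hand-copied from the instances.ttl excerpt: (child, predicate, parent).
EDGES = [
    ("bp:ELU-001a", "bp:derivedFrom", "bp:CLAR-001"),
    ("bp:ELU-001b", "bp:derivedFrom", "bp:CLAR-001"),
    ("bp:ELU-001c", "bp:derivedFrom", "bp:CLAR-001"),
    ("bp:PApool-001", "bp:derivedFrom", "bp:CLAR-001"),
    ("bp:PApool-001", "bp:includedFraction", "bp:ELU-001a"),
    ("bp:PApool-001", "bp:includedFraction", "bp:ELU-001b"),
]

# bp:includedFraction is a sub-property of bp:derivedFrom,
# so both predicates count as lineage edges in this walk.
LINEAGE_PREDICATES = {"bp:derivedFrom", "bp:includedFraction"}


def parents(node, edges=EDGES):
    """Direct lineage parents of `node`, sorted for stable output."""
    return sorted({o for s, p, o in edges if s == node and p in LINEAGE_PREDICATES})


def ancestors(node, edges=EDGES):
    """Every node reachable by walking lineage edges backward from `node`."""
    seen, stack = set(), [node]
    while stack:
        for parent in parents(stack.pop(), edges):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)


pool_parents = parents("bp:PApool-001")
pool_ancestors = ancestors("bp:PApool-001")
print("direct parents:", pool_parents)
print("all ancestors:", pool_ancestors)
```

The pool has three direct parents (the harvest and the two pooled eluates), while bp:ELU-001c never appears in its ancestry even though it derives from the same harvest. The coarse, batch-level edge alone cannot express that distinction.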

Real-time pooling decisions make this sharper: plants increasingly use on-line measurement to decide, cycle by cycle, which eluate fractions are pure enough to pool [3]. That decision is itself a modeled event — bp:POOL-DEC-001, a bp:PoolingDecision that takes all three eluates as bp:hasInput and outputs PApool-001, recording that cycle 3 was excluded while cycles 1 and 2 were included. Capturing it means the pool's composition is not just recorded but explained. The process-development affectsQuality knowledge meets the plant here: the criterion that decides pooling is the control strategy in action.
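To see how a recorded decision explains a pool rather than merely listing its contents, here is a minimal Python sketch. The identifiers come from the chapter; the `PoolingDecision` container, the `INCLUDED` lookup, and the `explain` function are hypothetical stand-ins for whatever the graph tooling provides, and the purity criterion itself is deliberately left out because the chapter does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolingDecision:
    """A pooling decision as an event: eluates in, one pool out."""
    iri: str
    inputs: tuple  # values of bp:hasInput
    output: str    # the pool the decision produced


# Values taken from the chapter's prose; the container itself is hypothetical.
DECISION = PoolingDecision(
    iri="bp:POOL-DEC-001",
    inputs=("bp:ELU-001a", "bp:ELU-001b", "bp:ELU-001c"),
    output="bp:PApool-001",
)

# bp:includedFraction values asserted on the pool in instances.ttl.
INCLUDED = {"bp:PApool-001": ("bp:ELU-001a", "bp:ELU-001b")}


def explain(decision, included):
    """Label each input eluate as included in or excluded from the decision's pool."""
    pooled = set(included.get(decision.output, ()))
    stray = pooled - set(decision.inputs)
    if stray:
        # A pooled fraction the decision never saw would be unexplained.
        raise ValueError(f"pooled fractions missing from decision inputs: {sorted(stray)}")
    return {e: ("included" if e in pooled else "excluded") for e in decision.inputs}


verdicts = explain(DECISION, INCLUDED)
for eluate, verdict in verdicts.items():
    print(eluate, verdict)
```

The consistency check is the design point: if a pool ever lists a fraction that no decision took as input, the composition is recorded but not explained, and the sketch refuses to report it silently.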

The resin persists, and persistence means carryover

Capture introduces an entity the earlier steps did not stress: a consumable that is reused. The Protein A resin is expensive and used for many cycles across many batches before disposal. In the dataset it is a real individual — resin lot bp:RESIN-PrA-07, a bp:ResinLot — carrying its bp:cycleCount (38) and bp:cycleLifetimeLimit (200), with the capture step bp:performedOn it. In BFO terms it is a material entity — equipment — bearing a disposition to bind antibody: bp:RESIN-PrA-07 bp:hasDisposition bp:PrA-bind, a bp:BindingDisposition that bp:isRealizedIn the capture step (the same realizable-entity idea as a molecule's developability). Crucially the resin persists across batches, and that persistence creates a genealogy the product chain alone misses: because the lot is a tracked entity with its own usage history rather than an anonymous supply, a contamination or a degradation on it could link batches that share no product lineage. Modeling the resin this way is what makes "which batches shared this resin?" a query rather than an archaeology project — the carryover question that is invisible when consumables are anonymous. This is the same impact-analysis power the knowledge-graph chapter showed for cell banks, now applied to a shared column.
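The carryover question can be sketched as a lookup over resin usage. This is an illustrative Python sketch under stated assumptions: the resin lot, its cycle count, and its lifetime limit come from the dataset described above, but the second usage row (bp:CAP-002 on bp:BATCH-2026-002) is hypothetical and invented here only so the query has a second batch to find. The function names are likewise hypothetical.

```python
# Facts about the resin lot, from the chapter's dataset description.
RESIN = {"iri": "bp:RESIN-PrA-07", "cycleCount": 38, "cycleLifetimeLimit": 200}

# (capture step, resin lot it was performed on, batch whose pool it produced).
# Only the first row reflects the chapter's dataset. The second row is a
# HYPOTHETICAL later batch, added so the shared-resin query returns a link.
USAGE = [
    ("bp:CAP-001", "bp:RESIN-PrA-07", "bp:BATCH-2026-001"),
    ("bp:CAP-002", "bp:RESIN-PrA-07", "bp:BATCH-2026-002"),  # hypothetical
]


def batches_sharing(resin_iri, usage=USAGE):
    """Batches whose capture step was performed on the given resin lot."""
    return sorted({batch for _step, resin, batch in usage if resin == resin_iri})


def cycles_remaining(resin):
    """Cycles left before the lot reaches its lifetime limit (never negative)."""
    return max(resin["cycleLifetimeLimit"] - resin["cycleCount"], 0)


shared = batches_sharing(RESIN["iri"])
remaining = cycles_remaining(RESIN)
print("batches sharing", RESIN["iri"], "->", shared)
print("cycles remaining:", remaining)
```

With an anonymous resin there would be no `resin` column to filter on, which is the whole point: the link between the two batches exists only because the lot is a tracked individual.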

This is also where the alignment up to standard ontologies earns its keep, and where the book stays honest about its seams. The general class bp:ChromatographyColumn aligns to a real, verified external term, iof:ChromatographyColumn, the IOF biopharma module's chromatography column (a Released class in IOF Release_202602), so a downstream tool that imports IOF can line up our column with theirs. The resin reaches a real anchor too: bp:ChromatographyResin is a subclass of iof:ChromatographyMedium. But the specific leaves of this chapter stay deliberately local: bp:CaptureColumn and bp:ResinLot are marked ILLUSTRATIVE in align.ttl, because no settled one-to-one IOF or AFO term for a Protein A capture column or a single resin lot exists in this stack to anchor them to. The resin's binding disposition, by contrast, does reach a verified anchor: its parent bp:Disposition is a subclass of obo:BFO_0000016 (BFO 2020 disposition, verified via OLS4), and the bearer relation bp:hasDisposition is a sub-property of obo:RO_0000053 (RO bearer of). So the upper spine is real and dereferenceable; the domain-specific equipment terms remain honest local placeholders until the standards catch up.

Those alignment edges, exactly as align.ttl asserts them, anchor the step, the column, the resin, and the binding disposition to external terms while leaving the specific leaves honestly local:

# align.ttl — the capture step and its equipment grounded UP (excerpt).
bp:CaptureChromatography rdfs:subClassOf iof:CaptureStep .          # IOF biopharma 'capture step' (Released, Release_202602)
bp:ChromatographyColumn  rdfs:subClassOf iof:ChromatographyColumn . # IOF biopharma 'chromatography column' (Released)
bp:ChromatographyResin   rdfs:subClassOf iof:ChromatographyMedium . # IOF biopharma 'chromatography medium' (the stationary-phase resin)
bp:Disposition           rdfs:subClassOf obo:BFO_0000016 .          # BFO 2020 'disposition' (the resin's binding disposition; verified via OLS4)
bp:hasDisposition        rdfs:subPropertyOf obo:RO_0000053 .        # RO 'bearer of' (the resin bears the disposition)
bp:includedFraction      rdfs:subPropertyOf bp:derivedFrom .        # a typed sub-property of derivedFrom (itself RO 'derives from')
# bp:CaptureColumn and bp:ResinLot stay ILLUSTRATIVE leaves — no settled 1:1 IOF term for a Protein A
# capture column or a single resin lot — so they inherit the anchors above rather than being faked.
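How a local leaf inherits an external anchor can be illustrated by walking rdfs:subClassOf upward. The Python sketch below is hedged accordingly: the first four edges are copied from the align.ttl excerpt and the bp:BindingDisposition edge follows the chapter's description of bp:Disposition as its parent, but the two leaf edges (bp:CaptureColumn and bp:ResinLot to their general classes) are assumptions, since the excerpt does not show them. The single-parent map and the function name `external_anchor` are simplifications for illustration.

```python
# rdfs:subClassOf edges as a child -> parent map (single parent, for simplicity).
SUBCLASS_OF = {
    # Copied from the align.ttl excerpt.
    "bp:CaptureChromatography": "iof:CaptureStep",
    "bp:ChromatographyColumn": "iof:ChromatographyColumn",
    "bp:ChromatographyResin": "iof:ChromatographyMedium",
    "bp:Disposition": "obo:BFO_0000016",
    # The chapter describes bp:Disposition as the binding disposition's parent.
    "bp:BindingDisposition": "bp:Disposition",
    # ASSUMED local edges: the excerpt says the leaves inherit the anchors
    # above but does not show the triples that connect them.
    "bp:CaptureColumn": "bp:ChromatographyColumn",
    "bp:ResinLot": "bp:ChromatographyResin",
}


def external_anchor(cls, subclass_of=SUBCLASS_OF):
    """Follow subClassOf upward until a class outside the bp: namespace is reached.

    Returns None when the chain ends (or loops) without leaving bp:.
    """
    seen = set()
    while cls.startswith("bp:"):
        if cls in seen or cls not in subclass_of:
            return None
        seen.add(cls)
        cls = subclass_of[cls]
    return cls


for leaf in ("bp:CaptureColumn", "bp:ResinLot", "bp:BindingDisposition"):
    print(leaf, "->", external_anchor(leaf))
```

A leaf that returns None here has no route to a standard term at all, which is a quick way to separate honestly local classes that still inherit an anchor from ones that are simply unaligned.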

The capture pool, unpacked: a material that derives from its pooled cycle eluates (the backward fork), produced on a tracked resin lot reused across batches (the carryover lineage), carrying its in-process impurity result — pooling and consumable provenance modeled, not collapsed. Original diagram by the authors, created with AI assistance.

Capture's two genealogies side by side: cycle eluates pool backward into one PApool-001 with cycle 3 excluded by the pooling decision, and a reused resin lot links batches no product lineage connects — the carryover the product chain alone misses. Original diagram by the authors, created with AI assistance.

The unsolved part: pooling provenance and consumable lineage at scale

The pooling and carryover models are correct in principle and genuinely hard in practice, and the difficulty is one of fidelity versus effort. Our dataset shows the fine model: three named cycle eluates, an explicit pooling decision, and a tracked resin lot with its cycle count and lifetime limit. But recording every cycle, every fraction, every pooling decision, and every consumable's full cross-batch usage history at production scale means a large volume of fine-grained data, and plants routinely model coarsely (one pool, one parent batch, an anonymous resin) because the finer model costs more to capture and maintain. The trouble is that the coarse model is silently lossy in exactly the cases that matter most: when an investigation needs to know which load or which fraction carried an impurity, or which prior batch shared the resin, the answer was discarded at modeling time. And even our model marks the edge it does not yet draw: the resin carries the cycle count that makes carryover answerable, but the second batch that would close the cross-batch loop is described here rather than instantiated; the usage history exists as a count, not yet as a set of explicit per-batch usage edges. There is a real, unresolved tension between the genealogy's completeness and the cost of capturing it, and "we have full traceability" is often true at the batch level and false at the cycle and consumable level.

Consumable lineage compounds it. A resin is not static — it degrades, is cleaned between cycles, and is eventually retired — so its "identity" over a campaign has the same living-thing softness a cell line has: is the resin at cycle 200 the same entity as at cycle 1? And cleaning validation (proving carryover is below a threshold) is a validated claim about the cleaning process, not a per-batch measurement — the same validated-versus-measured gap the viral-safety chapter is about to make central. So the honest standard is that capture's two genealogies — pooling and carryover — are modelable and important, but they are precisely where the comfortable batch-level traceability thins out, and a graph that stops at the batch boundary can present a confident lineage that cannot actually answer a cycle-level or consumable-level question.

Why it matters

Capture is where downstream traceability is won or lost in detail. Model the pool as deriving from its real cycles and the resin as a tracked, reused entity, and the graph can answer the forensic questions investigations actually ask — which load, which fraction, which shared consumable — by traversal. Collapse them to a single batch-level edge and an anonymous supply, and the genealogy looks complete while quietly being unable to answer its hardest questions. The pooling problem the harvest chapter raised is not an edge case; it is the normal shape of downstream, and capture is where the model either rises to it or papers over it.

In the real world

Protein A capture run in pooled cycles, on reused resin with cleaning validation and increasingly with on-line pooling decisions, is standard commercial antibody purification [1][2][3]. Plants track resin lots and cycle counts already — for cleaning and lifetime limits — so the entities the graph needs largely exist; what is uneven is linking them as genealogy rather than filing them as logs, so that "which batches shared this resin?" is a query rather than an archaeology project. The open-source downstream chapter captures these skid signals, and the pooling and consumable lineage modeled here is exactly the structure that turns those captured signals into an answerable forensic record.

Key terms

Capture chromatography — the Protein A step that binds the antibody specifically and elutes a purified, concentrated pool; the first major purification.
Pooling problem — a capture pool derives from many cycle eluates, forking the derivedFrom lineage backward; handled by recording cycles as real materials rather than collapsing them.
Pooling decision — the cycle-by-cycle, often on-line, choice of which fractions are pure enough to combine; a modeled event that explains the pool's composition.
Resin (consumable) — the chromatography medium, a persistent material entity bearing a binding disposition, reused across batches.
Carryover lineage — the genealogy created by a shared consumable: batches that used the same resin are linked even with no product lineage, enabling cross-batch impact analysis.
Validated versus measured — cleaning (carryover control) is a validated claim about a process, not a per-batch measurement; a recurring downstream gap.

Where this leads

The product is captured and largely purified, pooled from its cycles. The next chapter, Modeling Viral Safety: Inactivation and Filtration as Risk-Reducing Steps, models the steps whose entire purpose is to remove and inactivate a contaminant you hope was never there — forcing the model to represent absence, risk reduction, and the difference between a validated clearance claim and a measured result, the validated-versus-measured gap capture just previewed.
