Modeling Harvest and Clarification: A Material Transformation

📍 Where we are: Part III · Upstream, modeled — Chapter 12. The antibody is made, suspended in a tank of cells. This chapter models separating it out — the first downstream step — and uses it to face a question every derivedFrom edge quietly assumes an answer to.

The production bioreactor leaves us with a problem of abundance: the antibody we want is dissolved in a broth full of the cells that made it, their debris, and a soup of other proteins. Harvest and clarification separate the product-bearing liquid from the solids — by centrifugation and depth filtration — yielding a clarified harvest ready for purification and a stream of spent cells to discard [1]. Modeling it is simple on its surface — one material in, a cleaner material out — but it forces the book to confront, finally, the question hiding under every genealogy edge so far: when does one material stop being itself and become a new thing the graph should name?

The simple version

Straining stock is a transformation: the pot of bones and water goes in, and two things come out — the clear broth you keep and the solids you throw away. The broth is genuinely a new thing, not the same pot relabeled. Harvest is biopharma's straining: the cell-culture broth becomes clarified harvest plus waste. The interesting question this raises is one cooks never worry about but a database must: at exactly which moment did "the pot of stock" become "the strained broth" — and is that a new entry in your records, or the same one? This chapter answers that for the graph.

What this chapter covers

We model clarification as a unit operation — an occurrent that consumes one material and produces two — laying the derivedFrom edge from clarified harvest to batch and crossing the upstream-to-downstream boundary. We dissect the transformation, and then spend the chapter's weight on the genuinely hard, genuinely unsolved question it surfaces: the individuation of materials along a continuous process flow, where the graph's crisp nodes meet a reality of pooling, splitting, and gradual change.

A unit operation consumes one material and produces two

Clarification is the template for every downstream step to come, so it is worth modeling cleanly. It is a unit operation — an occurrent — whose input is the batch material (the cell-culture broth) and whose outputs are two new materials: the clarified harvest (the product-bearing liquid we keep) and the spent biomass (the cells and debris we discard). The clarified harvest derivedFrom the batch, extending the lineage chain one more hop toward the drug substance; the waste stream is a real output worth naming when contamination tracing or mass balance matters, so it gets its own node too. This two-output shape — keep one, discard another — recurs at viral filtration, polishing, and every place the process removes something, so the pattern established here is reused throughout Part IV. The running example loads this as real individuals: a clarification occurrent bp:HARV-001, its kept output bp:CLAR-001, and its discarded output bp:BIOMASS-001, both deriving from BATCH-2026-001 (from instances.ttl):

# Harvest clarification as the downstream template: one material in, two out.
bp:HARV-001 a bp:Clarification ;                 # an occurrent (a unit operation)...
    bp:hasInput  bp:BATCH-2026-001 ;             # ...consuming the cell-culture broth
    bp:hasOutput bp:CLAR-001 , bp:BIOMASS-001 .  # ...producing the kept harvest plus the discarded waste

bp:CLAR-001 a bp:ClarifiedHarvest ;              # the kept, cell-free harvest
    bp:derivedFrom bp:BATCH-2026-001 ;           # one more derivedFrom hop downstream
    bp:participatesIn bp:HARV-001 ;
    bp:turbidityNtu "3.2"^^xsd:float .           # its first in-process quality
bp:BIOMASS-001 a bp:SpentBiomass ;               # the discarded cells and debris
    bp:derivedFrom bp:BATCH-2026-001 .
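Because this shape recurs at every later step, it can help to check it mechanically. The following is a minimal sketch, not part of the companion code: it holds the triples above as plain Python tuples (an assumption made so the example runs without an RDF library) and uses a hypothetical helper, template_violations, to confirm that a unit operation has one input, two outputs, and a derivedFrom edge from each output back to the input.

```python
# Triples mirror the Turtle excerpt above; plain tuples stand in for an RDF store.
TRIPLES = {
    ("bp:HARV-001", "rdf:type", "bp:Clarification"),
    ("bp:HARV-001", "bp:hasInput", "bp:BATCH-2026-001"),
    ("bp:HARV-001", "bp:hasOutput", "bp:CLAR-001"),
    ("bp:HARV-001", "bp:hasOutput", "bp:BIOMASS-001"),
    ("bp:CLAR-001", "bp:derivedFrom", "bp:BATCH-2026-001"),
    ("bp:BIOMASS-001", "bp:derivedFrom", "bp:BATCH-2026-001"),
}


def objects(triples, subject, predicate):
    """Return the sorted objects of every (subject, predicate, ?) triple."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def template_violations(triples, operation):
    """List the ways an operation departs from the one-in, two-out template."""
    problems = []
    inputs = objects(triples, operation, "bp:hasInput")
    outputs = objects(triples, operation, "bp:hasOutput")
    if len(inputs) != 1:
        problems.append(f"{operation}: expected 1 input, found {len(inputs)}")
    if len(outputs) != 2:
        problems.append(f"{operation}: expected 2 outputs, found {len(outputs)}")
    # Every output must carry a derivedFrom edge back to every input.
    for out in outputs:
        parents = objects(triples, out, "bp:derivedFrom")
        for inp in inputs:
            if inp not in parents:
                problems.append(f"{out}: missing derivedFrom {inp}")
    return problems


print(template_violations(TRIPLES, "bp:HARV-001"))  # []
```

In a real deployment the same rule would more naturally live in a SHACL shape; the Python form is only meant to make the template's three conditions explicit.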

The downstream-template classes are grounded up in align.ttl, with the combined clarification step honestly local:

# align.ttl — the downstream template grounded UP (excerpt).
bp:UnitOperation rdfs:subClassOf iof:ManufacturingProcess .   # IOF Core 'manufacturing process'
bp:Material      rdfs:subClassOf obo:BFO_0000040 .            # BFO 2020 'material entity' (ClarifiedHarvest / SpentBiomass inherit it)
bp:Quality       rdfs:subClassOf obo:BFO_0000019 .            # BFO 2020 'quality' (Turbidity inherits it)
bp:derivedFrom   rdfs:subPropertyOf obo:RO_0001000 .          # RO 'derives from'
# bp:Clarification stays ILLUSTRATIVE: the book models it as centrifugation + depth filtration together,
# and IOF has no single 1:1 leaf for that combined step, so it inherits iof:ManufacturingProcess.
# bp:ClarifiedHarvest / bp:SpentBiomass / bp:Turbidity have no verified single external leaf — kept local.

This step also crosses a conceptual line: it is where upstream (growing cells) hands off to downstream (purifying protein). In the model that boundary is not a special construct — it is just another derivedFrom edge between two materials — which is itself a useful insight: the famous upstream/downstream divide is a human chapter heading, while the graph sees one continuous chain of material transformations. A clarification CQA worth carrying is turbidity — how cloudy the clarified harvest is, often required to be below a few NTU — modeled here as a bp:turbidityNtu of 3.2 inhering in bp:CLAR-001, the first of many in-process checks the downstream chapters attach to the material they evidence.

The downstream template: one material in, a process, two materials out — the kept clarified harvest derivedFrom the batch and the discarded biomass — with turbidity as the output's first quality check and the upstream/downstream line revealed as just another edge. Original diagram by the authors, created with AI assistance.

The downstream template as a flow: one material in, two out — the kept clarified harvest crosses into downstream while the spent biomass goes to waste — and the famous upstream/downstream divide is revealed as just another derivedFrom edge. Original diagram by the authors, created with AI assistance.

The unsolved part: where does one material end and the next begin?

Every derivedFrom edge in this book — and there have been many — quietly assumes you can say where one material stops and the next starts. Clarification is where that assumption gets uncomfortable, and the discomfort is real, not pedantic. The product molecules in the clarified harvest are the same molecules that were in the broth; nothing was created, only separated. So is the clarified harvest genuinely a new material entity deserving its own node and IRI, or is it the same material in a new container with some things removed? The graph forces a choice the chemistry does not make for you.

The pragmatic answer the field takes — and that this book takes — is to mint a new material node at each unit operation boundary, because that is the granularity at which materials get sampled, tested, pooled, and released. The clarified harvest is a new node because it is the thing that gets a turbidity result and a hold time and a forward handoff. But this is a modeling convention, not a metaphysical fact, and it has soft edges. Pooling breaks it one way: several capture cycles combine into one pool, so one child material derivedFrom several parents — the lineage forks backward. Splitting breaks it the other way: one drug-substance lot is filled into many drug-product lots, so one parent has many children. And continuous processing dissolves the boundary entirely: in a connected, continuously-flowing process there are no discrete batches to individuate at all, only a flow, and "where is the node?" has no clean answer.
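Pooling and splitting are easy to state and easy to get wrong in code that quietly assumes one parent per material. The sketch below is illustrative only: apart from bp:CLAR-001 and bp:PApool-001, every identifier is hypothetical, and the steps between the capture pool and the drug substance are collapsed into a single edge. It indexes derivedFrom in both directions and walks ancestors as a set, so a backward fork is handled rather than silently truncated.

```python
from collections import defaultdict

# (child, parent) derivedFrom edges. Cycle eluates, DS lot and DP lots are
# hypothetical identifiers; intermediate purification steps are omitted.
EDGES = [
    ("bp:CYCLE-ELUATE-1", "bp:CLAR-001"),
    ("bp:CYCLE-ELUATE-2", "bp:CLAR-001"),
    ("bp:CYCLE-ELUATE-3", "bp:CLAR-001"),
    ("bp:PApool-001", "bp:CYCLE-ELUATE-1"),   # pooling: one child,
    ("bp:PApool-001", "bp:CYCLE-ELUATE-2"),   # several parents
    ("bp:PApool-001", "bp:CYCLE-ELUATE-3"),
    ("bp:DS-LOT-1", "bp:PApool-001"),
    ("bp:DP-LOT-A", "bp:DS-LOT-1"),           # splitting: one parent,
    ("bp:DP-LOT-B", "bp:DS-LOT-1"),           # several children
]


def build_index(edges):
    """Index derivedFrom both ways so forks are visible in either direction."""
    parents, children = defaultdict(set), defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
        children[parent].add(child)
    return parents, children


def ancestors(node, parents):
    """Collect every material reachable by following derivedFrom backward."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


parents, children = build_index(EDGES)
print(len(parents["bp:PApool-001"]))      # 3 parents: the lineage forks backward
print(sorted(children["bp:DS-LOT-1"]))    # two drug-product lots from one parent
print("bp:CLAR-001" in ancestors("bp:DP-LOT-A", parents))  # True
```

Continuous processing is deliberately absent from the sketch: with no discrete materials to name, there is nothing to put in the edge list, which is the problem the paragraph above describes.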

There is a second, sharper limit: the graph does not enforce mass balance. It will happily record a clarified harvest derivedFrom a batch with no check that the amounts reconcile, no guarantee that what came out plus the waste equals what went in. derivedFrom is a lineage relation, not a conservation law — it says "this came from that," not "this much came from that much." A graph can therefore present a perfectly traversable genealogy that is quantitatively impossible, and catching that requires either modeling quantities explicitly (yields, volumes, masses as typed values) and validating them with SHACL, or accepting that the graph indexes lineage while the mass balance lives in process records. The honest standard, then, is that material individuation is a deliberate convention pinned to unit operations, that pooling and splitting and continuity stress it in known ways the model must handle explicitly, and that lineage is not the same guarantee as conservation. This is the genuinely hard part of modeling a flowing physical process as a discrete graph, and no amount of ontology makes it disappear.
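To make the gap between lineage and conservation concrete, here is a small sketch of the kind of reconciliation the graph does not perform on its own. The masses and the 2% tolerance are hypothetical numbers chosen for illustration, and the check ignores flush buffer and hold-up losses that a real clarification step would have to account for.

```python
def mass_balance_ok(input_kg, output_kgs, tolerance=0.02):
    """Return True when the outputs sum to the input within a relative tolerance."""
    if input_kg <= 0:
        raise ValueError("input mass must be positive")
    total_out = sum(output_kgs)
    return abs(total_out - input_kg) / input_kg <= tolerance


# Hypothetical masses: broth in, then [clarified harvest, spent biomass] out.
plausible = mass_balance_ok(2000.0, [1840.0, 150.0])
# The same lineage edges, but more mass out than went in.
impossible = mass_balance_ok(2000.0, [2500.0, 150.0])

print(plausible, impossible)  # True False
```

Both cases would carry identical derivedFrom edges; only the explicit quantities distinguish the plausible record from the impossible one.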

Why it matters

Clarification establishes the pattern — and the unresolved tension — that all of downstream inherits. Every subsequent step is a unit operation transforming one material into another, laying a derivedFrom edge, and every one faces the same individuation question; getting the convention right and applying it consistently is what makes the downstream lineage traversable and trustworthy. Treat materials inconsistently — a node here, a relabel there, no rule for pooling — and the genealogy that the digital-thread chapter walks becomes ambiguous exactly where investigations need it most. The simplest downstream step is the right place to settle how the model individuates matter, because everything after it just repeats the pattern at higher stakes.

From the wire to the graph

The chapter's turbidity scalar, a bare bp:turbidityNtu of 3.2, is honest but thin: it carries no unit and no provenance, so nothing in the graph says where the number came from. The companion's instances.ttl upgrades bp:CLAR-001 to show both interfaces through which a real clarified harvest is measured. The cloudiness still appears as the convenience scalar, but bp:CLAR-001 now also carries a qudt:QuantityValue (bp:CLAR-001-turbidity-qv, with qudt:numericValue 3.2 and qudt:ucumCode "[NTU]"; NTU has no clean QUDT unit IRI, so the code is illustrative) and a bp:hasTrace to bp:Trace-CLAR-turbidity, the index entry for an in-line turbidity transmitter on tag HARV.Turb.PV. That transmitter is read over OPC UA, which is production-grade plant infrastructure, so this is not a pilot interface; it is how the number actually arrives off the wire.

The at-line titer arrives the other way. A bp:Sample derivedFrom the harvest is the input to an assay whose result points, via bp:hasChromatogram, at bp:ADF-Titer-CLAR, an Allotrope bp:AnalyticalDataFile. This is the analytical-methods index-vs-payload pattern: the scalar and the pointer live in the graph, while the chromatogram payload stays inside the ADF container. Two patterns are therefore reused here at a boundary unit operation rather than only deep in the train: the unit-qualified QuantityValue plus hasTrace index seen on the bioreactor's instrument readings, and the ADF-backed result seen in QC. That is the point: each pattern is reusable per unit operation.

bp:CLAR-001 a bp:ClarifiedHarvest ; rdfs:label "clarified harvest of BATCH-2026-001" ;
    bp:derivedFrom bp:BATCH-2026-001 ;
    bp:participatesIn bp:HARV-001 ;
    bp:turbidityNtu "3.2"^^xsd:float ;
    bp:hasQuantityValue bp:CLAR-001-turbidity-qv ;   # the same value, unit-qualified
    bp:hasTrace bp:Trace-CLAR-turbidity .            # acquired off an in-line OPC UA turbidity transmitter
bp:CLAR-001-turbidity-qv a qudt:QuantityValue ; rdfs:label "3.2 NTU (clarified-harvest turbidity)" ;
    qudt:numericValue "3.2"^^xsd:float ; qudt:ucumCode "[NTU]" .   # NTU has no clean QUDT/UCUM unit IRI; the UCUM-style code is illustrative
bp:Trace-CLAR-turbidity a bp:Quality ; rdfs:label "clarified-harvest turbidity tag" ;
    bp:tagName "HARV.Turb.PV" ;
    rdfs:seeAlso <https://example.org/historian/tag/HARV.Turb.PV> .

# An at-line titer on the clarified harvest, arriving as an Allotrope-container result.
bp:SMP-CLAR-001 a bp:Sample ; rdfs:label "clarified-harvest titer sample" ; bp:derivedFrom bp:CLAR-001 .
bp:Titer-Assay-CLAR a bp:Assay ; rdfs:label "Protein A HPLC titer on clarified harvest" ;
    bp:hasInput bp:SMP-CLAR-001 ; bp:hasResult bp:Titer-Result-CLAR .
bp:Titer-Result-CLAR a bp:Result ; rdfs:label "clarified-harvest titer result" ;
    bp:isAbout bp:SMP-CLAR-001 ; bp:hasChromatogram bp:ADF-Titer-CLAR .
bp:ADF-Titer-CLAR a bp:AnalyticalDataFile ; rdfs:label "titer chromatogram (Allotrope ADF)" ;
    rdfs:seeAlso <https://example.org/adf/Titer-CLAR-001.adf> .
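The index-vs-payload pattern can be read as a short walk: from the material to its samples, from each sample to the results about it, and from each result to the file that holds the payload. The sketch below copies the relevant triples from the excerpt above into plain Python tuples (an assumption, so the example needs no RDF library); payload_locations is a hypothetical helper name, not something from the companion code.

```python
# Triples copied from the excerpt above; plain tuples stand in for an RDF store.
TRIPLES = [
    ("bp:SMP-CLAR-001", "rdf:type", "bp:Sample"),
    ("bp:SMP-CLAR-001", "bp:derivedFrom", "bp:CLAR-001"),
    ("bp:Titer-Assay-CLAR", "bp:hasInput", "bp:SMP-CLAR-001"),
    ("bp:Titer-Assay-CLAR", "bp:hasResult", "bp:Titer-Result-CLAR"),
    ("bp:Titer-Result-CLAR", "bp:isAbout", "bp:SMP-CLAR-001"),
    ("bp:Titer-Result-CLAR", "bp:hasChromatogram", "bp:ADF-Titer-CLAR"),
    ("bp:ADF-Titer-CLAR", "rdfs:seeAlso", "https://example.org/adf/Titer-CLAR-001.adf"),
]


def subjects(predicate, obj):
    """Every subject s with a triple (s, predicate, obj)."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def objects(subject, predicate):
    """Every object o with a triple (subject, predicate, o)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def payload_locations(material):
    """Follow material -> sample -> result -> data file; return file locations."""
    locations = []
    for sample in subjects("bp:derivedFrom", material):
        # Only samples count here; downstream materials also derive from it.
        if "bp:Sample" not in objects(sample, "rdf:type"):
            continue
        for result in subjects("bp:isAbout", sample):
            for data_file in objects(result, "bp:hasChromatogram"):
                locations.extend(objects(data_file, "rdfs:seeAlso"))
    return locations


print(payload_locations("bp:CLAR-001"))
# ['https://example.org/adf/Titer-CLAR-001.adf']
```

The walk returns a location, never the chromatogram itself, which is the division of labor the pattern is meant to preserve.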

The examples/platform/ontology/opcua_to_rdf.py adapter demonstrates exactly this wire-to-graph crossing for OPC UA: it reads a variable node off the controller's address space, recovers the unit from the node's EUInformation, and lands a unit-carrying observation on the same bp:hasTrace index — the general mechanism behind a transmitter tag like HARV.Turb.PV, and the subject of the shop floor and the digital twin.
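As a rough picture of that crossing, and not the adapter's actual code, the sketch below takes a reading that has already been fetched and turns it into the triples used for bp:CLAR-001 above. The Reading class, the reading_to_triples function, the sample values, and the choice to wrap the unit's display name in brackets as a UCUM-style code are all assumptions made for illustration; only the four field names of EUInformation follow the OPC UA structure, and no OPC UA client library is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EUInformation:
    """Stand-in for the OPC UA EUInformation structure (four fields)."""
    namespace_uri: str
    unit_id: int
    display_name: str
    description: str


@dataclass(frozen=True)
class Reading:
    """Hypothetical container for one value already read from a tag."""
    tag: str
    value: float
    eu: EUInformation


def reading_to_triples(material, trace, reading):
    """Emit the trace index plus a unit-carrying QuantityValue for one reading."""
    qv = f"{material}-turbidity-qv"
    return [
        (material, "bp:hasTrace", trace),
        (trace, "bp:tagName", reading.tag),
        (material, "bp:hasQuantityValue", qv),
        (qv, "rdf:type", "qudt:QuantityValue"),
        (qv, "qudt:numericValue", reading.value),
        # Illustrative mapping only: display name wrapped as a UCUM-style code.
        (qv, "qudt:ucumCode", f"[{reading.eu.display_name}]"),
    ]


# Hypothetical values for the engineering-unit fields.
ntu = EUInformation("https://example.org/units", -1, "NTU",
                    "nephelometric turbidity unit")
reading = Reading(tag="HARV.Turb.PV", value=3.2, eu=ntu)

for triple in reading_to_triples("bp:CLAR-001", "bp:Trace-CLAR-turbidity", reading):
    print(triple)
```

The design point is that the unit travels with the value from the moment it leaves the controller, so the graph never has to guess what 3.2 means.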

In the real world

Centrifugation and depth-filtration clarification, with turbidity targets and yield tracking, are standard practice in commercial antibody recovery, and the recovery-and-purification literature treats the train of unit operations as the same discrete, sampled steps around which this chapter mints its nodes [1][2]. The ISA-88 batch model already individuates the process into operations the same way [3], which is why minting a material node per unit operation aligns with how plants already think. The frontier the field is actively wrestling with is this chapter's unsolved part: as processes go continuous, the comfortable batch-and-unit-operation individuation that makes genealogy clean starts to dissolve. Modeling lineage for a flowing process, with no discrete batches to hang nodes on, is an open problem that the open-source downstream chapter and the broader industry are still working through.

Key terms

Unit operation — a discrete processing step, modeled as an occurrent consuming one material and producing one or more new ones; the granularity at which downstream materials are individuated.
Clarification — separating product-bearing liquid from cells and debris (centrifugation, depth filtration), producing clarified harvest plus spent biomass.
Individuation — the modeling decision of where one material entity ends and the next begins; pinned by convention to unit-operation boundaries, stressed by pooling, splitting, and continuous flow.
Pooling / splitting — lineage that forks backward (many parents → one child) or forward (one parent → many children), which the simple parent-child derivedFrom must handle explicitly.
Mass balance (not enforced) — the fact that derivedFrom records lineage, not conservation; a traversable genealogy can be quantitatively impossible unless quantities are modeled and validated.
Upstream/downstream boundary — a human chapter heading, not a graph construct; to the model it is simply another derivedFrom edge in one continuous chain.

Where this leads

The product is separated and clarified, and we have settled how the model names matter as it flows. Part IV follows the antibody through purification. The next chapter, Modeling Capture Chromatography and the Pooling Problem, models the Protein A capture step that yields PApool-001 — and immediately puts the individuation question to work, because a capture pool is the canonical case of many cycles pooling into one material, the backward-forking lineage this chapter warned was coming.
