
Implementation: Building the Instance Graph

📍 Where we are: Part V · Implementation — the phase where the classes, axioms, and alignments of Parts III–IV become individuals in a loadable file. The methodology is still SAMOD's test-first loop, but the test data is a real mAb campaign: one frozen vial of engineered CHO cells, expanded, harvested, captured, polished, and filled into vials. We instantiate that campaign, then ask the lineage and release questions a manufacturer actually asks — which bank made this lot, is it within passage, what shares a failed lot's fate — against the graph we built.

A vocabulary that no batch ever populates is a hypothesis, not an ontology. Implementation is the act of asserting the individuals of one antibody campaign (the working cell bank bp:WCB-CHO-001, the clarified harvest bp:CLAR-001, the released drug substance bp:DS-001), so the questions a quality unit lives by stop being promises and become queries with answers. This chapter walks the single file that holds the whole worked example, instances.ttl, end to end: from the transfection that creates the engineered line expressing the lead antibody, down the master-and-working cell-bank tiers, through the seed train where passage accumulates, across the harvest boundary where one tank of broth becomes two materials, through capture's many-cycles-into-one pool on a resin reused across batches, to the drug substance where eleven ancestors converge and release is decided. Every Turtle block below is a verbatim excerpt of that file (elisions are marked), and the query blocks come from the companion's queries/ directory. Nothing is invented; the headline numbers (passage 8 against a validated limit of 40, monomer 98.611, eleven ancestors of DS-001, an out-of-spec HMW aggregate that trips the release gate) are the ones the validation run checks.

The simple version

A class diagram is a set of forms: blank boxes labelled "cell bank," "batch," "drug substance." But a cell bank is not a blank box — it is a vial of living, frozen cells that every later batch descends from, the way every loaf traces back to a bakery's sourdough starter. Implementation is filling the forms in for one specific antibody: writing WCB-CHO-001 on the starter vial, BATCH-2026-001 on the production run it grew into, DS-001 on the released lot, and drawing the arrows that say this lot came from that bank. Once the forms are filled, you can finally ask a real manufacturing question — "if this lot fails, which others came from the same bank?" — instead of admiring empty boxes. This chapter fills in every form for one mAb campaign and shows the file that holds them.

Start from the questions a manufacturer asks

The instance graph exists to answer the lineage and impact questions that decide whether a medicine ships. Four competency questions drive almost everything below, and each is a real disposition question, not a structural one. CQ-01 (given any downstream material, which materials does it derive from, to any depth?) is the question an investigator asks when a released lot is suspect: trace it back through every purification step to the cell bank, answered by the derivedFrom chain we lay edge by edge. CQ-02 (given a working cell bank, which materials across the whole campaign descend from it?) is the question that scopes a recall: if the bank is in doubt, everything that descends from it is too, answered by the inverse traversal from bp:WCB-CHO-001. CQ-03 (what is the originating bioreactor batch of a drug-substance lot, and what is its release monomer value?) links the released lot back to the run that made it, answered by walking from bp:DS-001 to bp:BATCH-2026-001 and reading 98.611. CQ-04 (when a lot fails, which others share its fate?) is the one that turns recall scoping from a guess into a query. The supporting gates (CQ-17 cell-bank characterization, CQ-18 passage limit) also bind to individuals built here. Every instance we assert earns its place by serving one of these; nothing is modeled to look thorough.

Implement the making: the transfection that creates the engineered line

The genealogy of a biologic begins not with a thing but with an event, and keeping the event distinct from the thing is what later lets the graph answer "how was this made?" rather than only "what is it?". The cell-line slice declared CellLine ⊑ ∃createdBy.Transfection, encoding a manufacturing fact: an engineered CHO line does not simply exist, it was produced by introducing the antibody gene into a host. Implementation makes that existential restriction concrete with a real, named occurrent, rather than leaving it satisfied only by some unnamed, unknown transfection. The transfection consumes the expression vector and produces the engineered line, which expresses the discovery lead bp:mAb-A:

# instances.ttl — the line was MADE (an occurrent), satisfying CellLine ⊑ ∃createdBy.Transfection.
bp:CHO-host a bp:HostOrganism ; rdfs:label "Chinese hamster (Cricetulus griseus)" .
bp:CONSTRUCT-mAb-A a bp:GeneticConstruct ; rdfs:label "mAb-A expression vector" .

bp:CELLLINE-001 a bp:CellLine ; rdfs:label "engineered CHO line for mAb-A" ;
    bp:hasHostOrganism bp:CHO-host ;
    bp:createdBy bp:TF-001 ;
    bp:expresses bp:mAb-A .
bp:TF-001 a bp:Transfection ; rdfs:label "transfection of mAb-A construct into CHO" ;
    bp:hasInput bp:CONSTRUCT-mAb-A ; bp:hasOutput bp:CELLLINE-001 .
bp:CLONE-7 a bp:Clone ; rdfs:label "selected single-cell clone #7" .

The host is bp:CHO-host, typed bp:HostOrganism and aligned up to the NCBI Taxonomy IRI for Cricetulus griseus, never the string "CHO" [1]. This is the reuse discipline made concrete at the instance level, and it matters for a manufacturing reason: CHO is the workhorse host of commercial antibody production, its genome is sequenced and public, and a downstream tool that reads the taxon IRI can line our host up with anyone else's CHO line. The biology is a borrowed public identifier; only the program-specific clone and line are local mints. Because bp:createdBy is declared functional, a cell line has exactly one creating transfection: if a double-entered record gives bp:CELLLINE-001 a second createdBy value, the reasoner infers owl:sameAs between the two transfection IRIs and treats them as one event. That is built-in reconciliation of duplicated provenance, which a flat spreadsheet of cell-bank records would silently admit twice.
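The deduplication rests on the OWL 2 RL functional-property rule (prp-fp): if x p y1 and x p y2 with p functional, infer y1 owl:sameAs y2. Below is a minimal pure-Python sketch of that single rule, not of any reasoner's API, assuming a hypothetical second record IRI bp:TF-001-dup for the double-entered transfection (it is not in instances.ttl):

```python
from itertools import combinations

# Triples as (subject, predicate, object); prefixed names are plain strings here.
# bp:TF-001-dup is a HYPOTHETICAL double-entered record, not part of instances.ttl.
triples = {
    ("bp:CELLLINE-001", "bp:createdBy", "bp:TF-001"),
    ("bp:CELLLINE-001", "bp:createdBy", "bp:TF-001-dup"),
}
functional = {"bp:createdBy"}  # the chapter states bp:createdBy is functional


def apply_prp_fp(triples, functional):
    """OWL 2 RL rule prp-fp: x p y1 and x p y2 with p functional => y1 owl:sameAs y2."""
    values = {}
    for s, p, o in triples:
        if p in functional:
            values.setdefault((s, p), set()).add(o)
    inferred = set()
    for objs in values.values():
        for y1, y2 in combinations(sorted(objs), 2):
            inferred.add((y1, "owl:sameAs", y2))
            inferred.add((y2, "owl:sameAs", y1))  # sameAs is symmetric (rule eq-sym)
    return inferred


same_as = apply_prp_fp(triples, functional)
for triple in sorted(same_as):
    print(triple)
```

The rule only asserts identity between the two IRIs. It does not reconcile conflicting field values carried by the two records; that remains a data-quality task.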

Implement the lineage: RCB → MCB → WCB-CHO-001, the anchor root

Then the banking: a derivedFrom chain ending in WCB-CHO-001, the most important node in the graph, because every later batch traces back to it and an error in its identity propagates to every descendant. A manufacturing program builds three tiers rather than one for regulatory and safety reasons: a research bank (RCB) is expanded into a master bank (MCB) that is exhaustively characterized and frozen once, and working banks (WCB) are prepared from master vials as needed, so the precious master is never exhausted. The working bank carries its passage count and the four characterization results the cell-bank gate requires [2]:

# instances.ttl — the cell-bank tiers: RCB -> MCB -> WCB (the root of the genealogy).
bp:RCB-CHO-001 a bp:ResearchCellBank ; rdfs:label "RCB-CHO-001" ;
    bp:expresses bp:mAb-A ; bp:passageNumber 2 .
bp:MCB-CHO-001 a bp:MasterCellBank ; rdfs:label "MCB-CHO-001" ;
    bp:derivedFrom bp:RCB-CHO-001 ;
    bp:hasClone bp:CLONE-7 ;
    bp:passageNumber 5 .
bp:WCB-CHO-001 a bp:WorkingCellBank ; rdfs:label "WCB-CHO-001" ;
    bp:derivedFrom bp:MCB-CHO-001 ;
    bp:hasHostOrganism bp:CHO-host ;
    bp:expresses bp:mAb-A ;
    bp:passageNumber 8 ;
    bp:hasCharacterization bp:CR-identity , bp:CR-sterility , bp:CR-viral , bp:CR-genetic .
bp:PassageLimit-mAb-A a bp:ValidatedPassageLimit ; rdfs:label "validated passage limit, mAb-A line" ;
    bp:validatedPassageLimit 40 .

This is the implementation answer to CQ-18: the bank stands at passage 8, the validated limit is 40, and "within limit" is a join the graph evaluates (8 ≤ 40). The passage number is not bureaucratic bookkeeping. It bounds how long the living culture may be grown before productivity drift and product-quality shift become a real risk, so the comparison is a genuine GMP gate. The four bp:CharacterizationResult individuals each record a test that a misidentified or contaminated bank would fail: CR-identity (is this really the mAb-A line?), CR-sterility (no microbial contamination, mycoplasma included), CR-viral (no adventitious agents), CR-genetic (the construct is stable). Each is bp:isAbout bp:WCB-CHO-001 with bp:verdict "PASS", satisfying CQ-17 and the qualified-cardinality restriction declared earlier in the book. The passage count established here is the clock the seed train advances.
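The two gates reduce to a comparison and a completeness check. Here is a minimal sketch in plain Python, with dictionaries standing in for the graph and verdicts taken from the prose. The required count of four mirrors the four results named above and is an assumption about the exact restriction; the chapter's real gate is SHACL, not this code. Note the closed-world reading: a result with no recorded verdict fails, which is why the gate cannot be left to open-world OWL.

```python
# Stand-ins for the graph facts above; verdicts follow the prose (each result is PASS).
wcb = {
    "iri": "bp:WCB-CHO-001",
    "passageNumber": 8,
    "hasCharacterization": ["bp:CR-identity", "bp:CR-sterility", "bp:CR-viral", "bp:CR-genetic"],
}
validated_limit = 40  # bp:PassageLimit-mAb-A bp:validatedPassageLimit 40
verdicts = {
    "bp:CR-identity": "PASS",
    "bp:CR-sterility": "PASS",
    "bp:CR-viral": "PASS",
    "bp:CR-genetic": "PASS",
}
REQUIRED_RESULTS = 4  # assumption: the gate requires four characterization results


def within_passage_limit(bank, limit):
    """CQ-18: is the bank's passage number at or below the validated limit?"""
    return bank["passageNumber"] <= limit


def characterization_complete(bank, verdicts, required):
    """CQ-17, closed-world: enough results, and every one has an explicit PASS verdict."""
    results = bank["hasCharacterization"]
    return len(results) >= required and all(verdicts.get(r) == "PASS" for r in results)


print(within_passage_limit(wcb, validated_limit))
print(characterization_complete(wcb, verdicts, REQUIRED_RESULTS))
```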

Hero diagram of the cell-bank genealogy as the implemented root of the whole thread: at the top a transfection process introduces an antibody construct into CHO host cells (labelled with the NCBI Taxonomy IRI for Cricetulus griseus) to create an engineered cell line and a selected clone; a derivedFrom chain then links the working cell bank WCB-CHO-001, the master cell bank MCB-CHO-001, and the research cell bank RCB-CHO-001 in a row, the working bank deriving from the master and the master from the research; WCB-CHO-001 is marked the anchor root carrying passage 8 against a validated limit of 40, and from it the lineage continues along a bottom row through SEED-001, the bioreactor batch BATCH-2026-001, the drug substance DS-001, and the drug product DP-001, each derivedFrom the previous; the host taxon and clone are tagged as borrowed public IRIs. The implemented root: a transfection creates the engineered line, a clone is banked through the research, master, and working tiers as a derivedFrom lineage, and WCB-CHO-001 becomes the anchor every downstream individual transitively traces back to. Original diagram by the authors, created with AI assistance.

Implement the seed train: passage accumulates as the cells expand

A WCB vial holds only a few million cells; a production bioreactor needs many orders of magnitude more. The seed train bridges the gap by thawing the vial and growing the cells through ever-larger vessels (a shake flask, then a seed bioreactor). Each scale-up is modeled as a real expansion occurrent that consumes one cell material and produces a larger one, with the passage clock advancing at every stage (shake flask at 12, seed bioreactor at 16):

# instances.ttl — the seed train: real expansion stages, accumulating passage.
bp:SEEDFLASK-001 a bp:ShakeFlaskCulture ; rdfs:label "shake-flask seed culture" ;
    bp:derivedFrom bp:WCB-CHO-001 ;
    bp:participatesIn bp:EXP-001 ;
    bp:passageNumber 12 .
bp:SEED-001 a bp:SeedBioreactorCulture ; rdfs:label "SEED-001 (seed bioreactor culture)" ;
    bp:derivedFrom bp:SEEDFLASK-001 ;
    bp:participatesIn bp:EXP-002 ;
    bp:hasHostOrganism bp:CHO-host ;
    bp:passageNumber 16 .

bp:BATCH-2026-001 is then derivedFrom bp:SEED-001, so the production batch traces transitively back to WCB-CHO-001: the spine CQ-01 walks, no matter how many expansion stages sit between. The passage climbs 8 → 12 → 16, all under the validated 40, so the GMP question "was this batch inoculated from cells within the limit?" is now a query rather than a reconstruction from lab notebooks.

But the tidy two-node chain hides a real modeling judgment: a seed train is not actually a sequence of discrete events. It is continuous growth, with cells dividing roughly once a day and occasionally split between vessels. Where to put the node boundaries (each vessel transfer, each day, each flask split) is a choice the biology does not make for you. Model too coarsely and you cannot trace a contamination to a specific transfer; model too finely and the graph drowns in nodes nobody queries. We mint a node per expansion stage because that is the granularity at which cells are sampled and passage is counted, but the nodes are deliberate simplifications of a living continuum, not facts the biology hands over. The batch also carries bp:monomerPct "98.611", the value the coarse loader parked here and the digital thread later reads when answering CQ-03; as we will see, that result's faithful home is the released lot, not the batch.
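The passage question can be sketched as a walk back along derivedFrom, reading bp:passageNumber wherever a node carries one. This sketch assumes a single parent per node, which holds for this chain but not for the pooling step later in the chapter; the edges and numbers are the ones shown in the excerpts above.

```python
# derivedFrom edges (child -> single parent) and passage numbers from the excerpts.
derived_from = {
    "bp:BATCH-2026-001": "bp:SEED-001",
    "bp:SEED-001": "bp:SEEDFLASK-001",
    "bp:SEEDFLASK-001": "bp:WCB-CHO-001",
    "bp:WCB-CHO-001": "bp:MCB-CHO-001",
    "bp:MCB-CHO-001": "bp:RCB-CHO-001",
}
passage = {
    "bp:RCB-CHO-001": 2, "bp:MCB-CHO-001": 5, "bp:WCB-CHO-001": 8,
    "bp:SEEDFLASK-001": 12, "bp:SEED-001": 16,
}
LIMIT = 40  # bp:validatedPassageLimit


def lineage(node, parent_of):
    """Follow single-parent derivedFrom edges from node back to the root."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


def passage_history(node, parent_of, passage):
    """Passage numbers along the lineage, oldest first, skipping nodes without one."""
    return [passage[n] for n in reversed(lineage(node, parent_of)) if n in passage]


history = passage_history("bp:BATCH-2026-001", derived_from, passage)
print(history)
print(all(a <= b for a, b in zip(history, history[1:])))  # passage never decreases
print(max(history) <= LIMIT)                              # inoculated within the limit
```

The batch itself carries no passage number in the excerpts, so the history stops at the seed bioreactor; the check is on the cells that inoculated it.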

Implement material individuation: harvest is one in, two out

Harvest and clarification are the downstream template, and they force the question hiding under every derivedFrom edge: when does one material stop being itself and become a new thing the graph should name? The product molecules in the clarified harvest are the same molecules that were in the broth — nothing was created, only separated from the cells and debris. So is the clarified harvest a new material entity deserving its own IRI, or the same material relabeled? The chemistry does not decide; the model does. The convention this book takes is to mint a new material node at each unit operation boundary, because that is the granularity at which materials get sampled, tested, pooled, and released — and clarification produces two such materials: the clarified harvest we keep and the spent biomass we discard [3]:

# instances.ttl — harvest clarification: one material in, two out (the downstream template).
bp:CLAR-001 a bp:ClarifiedHarvest ; rdfs:label "clarified harvest of BATCH-2026-001" ;
    bp:derivedFrom bp:BATCH-2026-001 ;
    bp:participatesIn bp:HARV-001 ;
    bp:turbidityNtu "3.2"^^xsd:float .
bp:BIOMASS-001 a bp:SpentBiomass ; rdfs:label "spent biomass (cells + debris)" ;
    bp:derivedFrom bp:BATCH-2026-001 .
bp:HARV-001 a bp:Clarification ; rdfs:label "harvest clarification" ;
    bp:hasInput bp:BATCH-2026-001 ;
    bp:hasOutput bp:CLAR-001 , bp:BIOMASS-001 .

The clarified harvest earns its own node because it is the thing that gets a turbidity result (3.2 NTU, its first in-process quality) and a forward handoff; the spent biomass earns one too, so contamination tracing and mass balance have a discarded stream to point at. The famous upstream-to-downstream boundary that organizes the textbooks is, to the graph, just another derivedFrom edge in one continuous chain. One thing the edge does not enforce is mass balance: derivedFrom records lineage, not conservation — it says "this came from that," not "this much came from that much." The graph will happily present a perfectly traversable genealogy in which the kept harvest plus the waste do not reconcile with what went in, because catching that requires modeling quantities explicitly and validating them. The honest standard is that the graph indexes lineage while mass balance lives in the process records. The bare 3.2 is honest but thin until it is unit-qualified, which is the subject of the next chapter.
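To make the gap concrete, here is a sketch of the check that derivedFrom cannot perform, because it needs quantities the excerpt does not carry. The masses below are HYPOTHETICAL illustration values, and the mass_balance_gap function and its relative tolerance are assumptions, not part of the companion code.

```python
# Lineage edges (child, parent) from the harvest excerpt: both outputs derive from the batch.
derived_from = [
    ("bp:CLAR-001", "bp:BATCH-2026-001"),
    ("bp:BIOMASS-001", "bp:BATCH-2026-001"),
]
# HYPOTHETICAL masses in kg; instances.ttl carries no such quantities for these nodes.
mass_kg = {"bp:BATCH-2026-001": 2000.0, "bp:CLAR-001": 1850.0, "bp:BIOMASS-001": 120.0}


def mass_balance_gap(parent, edges, mass, rel_tol):
    """Input mass minus summed output masses, and whether the gap is within rel_tol."""
    children = [child for child, p in edges if p == parent]
    gap = mass[parent] - sum(mass[c] for c in children)
    return gap, abs(gap) <= rel_tol * mass[parent]


# The lineage is perfectly traversable, yet 30 kg is unaccounted for.
print(mass_balance_gap("bp:BATCH-2026-001", derived_from, mass_kg, rel_tol=0.01))
```

A real check would also model expected losses such as hold-up and sampling; the point is only that the balance lives outside the lineage edge.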

Implement the pooling fork: PApool-001, one pool from many cycles

Capture chromatography is the first place the simple parent-child edge breaks down. Protein A capture binds the antibody specifically and washes everything else away, removing the bulk of impurities in one operation; but a column holds only so much per load, so the step is run as several cycles, each eluting a fraction, and the accepted eluates are combined into one pool. For the first time a material derives from several parents at once, and the lineage forks backward [4]. The implementation refuses to collapse that fork: the three cycle eluates are real individuals, and PApool-001 records exactly which of them it pooled through bp:includedFraction, a typed sub-property of derivedFrom:

# instances.ttl — the capture pool, with its cycle-level provenance preserved.
bp:ELU-001a a bp:CycleEluate ; rdfs:label "capture cycle 1 eluate" ; bp:derivedFrom bp:CLAR-001 .
bp:ELU-001b a bp:CycleEluate ; rdfs:label "capture cycle 2 eluate" ; bp:derivedFrom bp:CLAR-001 .
bp:ELU-001c a bp:CycleEluate ; rdfs:label "capture cycle 3 eluate (excluded by pooling)" ; bp:derivedFrom bp:CLAR-001 .

bp:PApool-001 a bp:CapturePool ; rdfs:label "PApool-001 (Protein A capture pool)" ;
    bp:derivedFrom bp:CLAR-001 ;
    bp:fromBatch bp:BATCH-2026-001 ;
    bp:includedFraction bp:ELU-001a , bp:ELU-001b ;
    bp:participatesIn bp:CAP-001 ;
    bp:hasInProcessResult bp:IPR-PApool-hcp .

The headline edge PApool-001 derivedFrom CLAR-001 is asserted directly so the coarse chain stays reachable for CQ-01 even on the unreasoned graph, while the cycle detail underneath answers the forensic "which load contributed?" question an investigation actually asks. Cycle 3 (bp:ELU-001c) was excluded by the real-time pooling decision and is kept as an individual routed through bp:POOL-DEC-001, so the pool's composition is explained, not merely recorded. Capture also introduces a consumable the earlier steps did not stress, a resin that is reused across batches:
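Declaring bp:includedFraction a sub-property of derivedFrom means every pooled eluate also becomes an ordinary derivedFrom parent once the graph is reasoned. Here is a pure-Python sketch of that single OWL 2 RL rule (prp-spo1) over the edges in the excerpt; it handles one level of sub-property only, which is all this example needs.

```python
# Edges (subject, predicate, object) from the capture excerpt.
edges = {
    ("bp:ELU-001a", "bp:derivedFrom", "bp:CLAR-001"),
    ("bp:ELU-001b", "bp:derivedFrom", "bp:CLAR-001"),
    ("bp:ELU-001c", "bp:derivedFrom", "bp:CLAR-001"),
    ("bp:PApool-001", "bp:derivedFrom", "bp:CLAR-001"),
    ("bp:PApool-001", "bp:includedFraction", "bp:ELU-001a"),
    ("bp:PApool-001", "bp:includedFraction", "bp:ELU-001b"),
}
sub_property_of = {"bp:includedFraction": "bp:derivedFrom"}


def apply_prp_spo1(edges, sub_property_of):
    """OWL 2 RL prp-spo1: p1 subPropertyOf p2 and x p1 y => x p2 y (one level deep)."""
    return edges | {(s, sub_property_of[p], o) for s, p, o in edges if p in sub_property_of}


closed = apply_prp_spo1(edges, sub_property_of)
pool_parents = sorted(o for s, p, o in closed if s == "bp:PApool-001" and p == "bp:derivedFrom")
print(pool_parents)
```

The excluded cycle 3 stays a child of CLAR-001 but never becomes a parent of the pool, which is exactly the distinction the "which load contributed?" question depends on.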

# instances.ttl — the resin as a persisting consumable with a usage history — carryover is answerable.
bp:CAP-001 a bp:CaptureChromatography ; rdfs:label "Protein A capture step" ;
    bp:hasInput bp:CLAR-001 ; bp:hasOutput bp:PApool-001 ;
    bp:performedOn bp:RESIN-PrA-07 .
bp:RESIN-PrA-07 a bp:ResinLot ; rdfs:label "Protein A resin lot PrA-07" ;
    bp:hasDisposition bp:PrA-bind ;
    bp:cycleCount 38 ; bp:cycleLifetimeLimit 200 .
bp:PrA-bind a bp:BindingDisposition ; rdfs:label "Protein A antibody-binding disposition" ;
    bp:isRealizedIn bp:CAP-001 .

The Protein A resin is expensive and used for many cycles across many batches before disposal, so it is a tracked material entity: bp:RESIN-PrA-07 carries its bp:cycleCount (38) against a bp:cycleLifetimeLimit (200) and bears a disposition to bind antibody, realized in the capture step. Because the lot persists across batches, it creates a genealogy the product chain alone misses: degradation or carryover on the resin could link batches that share no product lineage, which is what makes "which batches shared this resin?" a query rather than an archaeology project.

Two honest caveats about this node are worth naming. First, alignment: bp:ChromatographyColumn and bp:ChromatographyResin align up to verified IOF terms (iof:ChromatographyColumn, iof:ChromatographyMedium), but bp:CaptureColumn and bp:ResinLot are marked ILLUSTRATIVE local placeholders in align.ttl, because no settled one-to-one external leaf for a Protein A capture column or a single resin lot exists yet. The discipline is to anchor what genuinely anchors and flag the rest, not to fake a borrow. Second, validated versus measured: the cleaning that controls carryover between cycles is a validated claim about the cleaning process, not a per-batch measurement, and a resin nearing its 200-cycle limit has the same identity softness a cell line has, since it degrades, is cleaned, and is eventually retired. The graph carries the cycle count that makes carryover answerable, but how much of the full cross-batch usage history to record is a cost-versus-fidelity trade every plant negotiates.
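The carryover question joins on the resin rather than on derivedFrom. A minimal sketch follows; since the excerpt shows only CAP-001, the other capture steps, the batch they processed, and the second resin lot are HYPOTHETICAL, added so the join has something to find.

```python
# Capture step -> resin lot it ran on (bp:performedOn).
# CAP-001 and RESIN-PrA-07 come from instances.ttl; CAP-004, CAP-007,
# BATCH-2026-007 and RESIN-PrA-09 are HYPOTHETICAL illustration values.
performed_on = {
    "bp:CAP-001": "bp:RESIN-PrA-07",
    "bp:CAP-004": "bp:RESIN-PrA-07",
    "bp:CAP-007": "bp:RESIN-PrA-09",
}
# Capture step -> batch it processed (a shortcut for bp:hasOutput followed by bp:fromBatch).
batch_of_step = {
    "bp:CAP-001": "bp:BATCH-2026-001",
    "bp:CAP-004": "bp:BATCH-2026-004",
    "bp:CAP-007": "bp:BATCH-2026-007",
}
resin_cycles = {"bp:RESIN-PrA-07": (38, 200)}  # (bp:cycleCount, bp:cycleLifetimeLimit)


def batches_sharing_resin(resin):
    """Batches processed on this resin lot, whether or not they share product lineage."""
    return sorted(batch_of_step[step] for step, r in performed_on.items() if r == resin)


def cycles_remaining(resin):
    """Cycles left before the resin lot reaches its lifetime limit."""
    count, limit = resin_cycles[resin]
    return limit - count


print(batches_sharing_resin("bp:RESIN-PrA-07"))
print(cycles_remaining("bp:RESIN-PrA-07"))
```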

Identity card dissecting the PApool-001 capture-pool node as implemented: a type row (Protein A capture pool, a material entity); a pooling block showing includedFraction forking backward to the cycle-eluate parents ELU-001a and ELU-001b, each deriving from the clarified harvest, with a pooling-decision note marking that cycle 3 was excluded; a process row typed as a chromatography unit operation on the capture step; a resin row pointing at the reused resin lot RESIN-PrA-07 with its binding disposition, cycle count, and lifetime limit; an in-process host-cell-protein result on the pool as an evidenced quality; and a forward note that this pool feeds viral safety and polishing. The implemented capture pool: a material that derives from its pooled cycle eluates (the backward fork) with cycle 3 excluded by the pooling decision, produced on a tracked resin lot reused across batches — pooling provenance preserved as individuals, not collapsed to a single edge. Original diagram by the authors, created with AI assistance.

Implement the convergence: DS-001 and its eleven ancestors

The drug substance is where every upstream lineage individual converges to a single material, and where the binding release specification attaches [5]. UF/DF concentrates the polished product and exchanges it into its formulation buffer, yielding DS-001. Through transitive derivedFrom, bp:DS-001 traces back through the polishing intermediate, the two viral pools, the capture pool, the clarified harvest, the bioreactor batch, and the two seed-train cultures to WCB-CHO-001, then on up the cell-bank tiers to RCB-CHO-001: eleven ancestors the lineage walk returns. This is the lot a release decision is made on, so the full CQA panel sits here, on the lot it actually describes:

# instances.ttl — the convergence node carrying the full release panel.
bp:DS-001 a bp:DrugSubstance ; rdfs:label "DS-001" ;
    bp:derivedFrom bp:POLpool-001 ;
    bp:participatesIn bp:UFDF-001 ;
    bp:hasSpecification bp:Spec-DS-mAb-A ;
    bp:releaseStatus "PASS" ;
    bp:monomerPct "98.611"^^xsd:float ;
    bp:hmwPct "1.287"^^xsd:float ;
    bp:cexMainPct "70.686"^^xsd:float ;
    bp:hcpPpm "12.0"^^xsd:float ;
    # ... residual DNA, endotoxin (more panel scalars) elided ...
    bp:proteinConcMgPerMl "50.2"^^xsd:float ;
    # ... QUDT value nodes (monomerValue, cexValue) elided ...
    bp:approvedBy bp:SIG-DS-001 ;
    bp:hasCertificate bp:CofA-DS-001 .

Each scalar on this panel is a real antibody quality attribute with a manufacturing reason to exist. monomerPct (98.611) is the fraction of intact, correctly folded monomer; hmwPct (1.287) is the high-molecular-weight aggregate the polishing step removes, controlled because aggregates can provoke an immune response in patients; cexMainPct (70.686) is the main charge-variant peak, a proxy for post-translational homogeneity; hcpPpm (12.0) is residual host-cell protein. Placing monomerPct here, not on the bioreactor batch where the coarse loader once parked it, is the deliberate attribution correction the drug-substance chapter made in the open: the bioreactor broth was never assayed for final monomer purity, so monomer purity is a release attribute of the lot, measured on a sample of DS-001. The lesson is general and worth keeping: where a result attaches is a modeling decision with consequences, and a graph can be perfectly traversable and still subtly mis-locate the fact it is supposed to certify.

From this one node the lineage forks forward: DP-001 and DP-002 both derive from DS-001 (the fill fork, mirror of capture's backward pooling fork), and the out-of-spec sibling lineage WCB-CHO-001 → SEED-004 → BATCH-2026-004 → PApool-004 → DS-004 → DP-004 shares the same root, which is precisely why a recall-impact query can reach both fates from the one bank.

The recipe as portable process knowledge

Not every implemented individual is a material. The package that carried the whole campaign onto the plant floor, the master recipe, is knowledge in transferable form, and it is modeled as a generically dependent continuant: a plan that can be copied from one site to another without being consumed. Its structure comes from ISA-88 (IEC 61512), the batch-control standard, which models a recipe's procedure as a hierarchy of procedures, unit procedures, operations, and phases, and separates the recipe from the equipment that executes it:

# instances.ttl — the recipe as ISA-88 information, realized by a run, transferred between sites.
bp:Recipe-mAb-A a bp:MasterBatchRecord ; rdfs:label "master recipe, mAb-A" ;
    bp:hasRecipeElement bp:RP-production .
bp:RP-production a bp:RecipePhase ; rdfs:label "production-phase recipe element" ;
    bp:prescribesParameter bp:FeedRate , bp:Temperature ;
    bp:requiresEquipment bp:REQ-2000L .
bp:REQ-2000L a bp:EquipmentRequirement ; rdfs:label "2000 L single-use production bioreactor requirement" .
bp:BR-204 a bp:ProductionBioreactor ; rdfs:label "BR-204 (receiving-site vessel)" ;
    bp:locatedAt bp:SITE-B ; bp:qualifiesFor bp:REQ-2000L .
bp:TT-001 a bp:TechTransfer ; rdfs:label "tech transfer of mAb-A to site B" ;
    bp:transferredFrom bp:SITE-A ; bp:transferredTo bp:SITE-B ; bp:isAbout bp:Recipe-mAb-A .

The move that makes a recipe portable is modeling what each step needs separately from what any site has. RP-production does not require "vessel BR-101"; it requires bp:REQ-2000L — a production bioreactor of a given class and scale — an equipment requirement that a real vessel fills by playing a role. The cell-culture run bp:CCP-001 bp:realizes bp:Recipe-mAb-A at the originating site, while bp:BR-204 at site B declares bp:qualifiesFor bp:REQ-2000L, so the same recipe runs on a different vessel without rewriting and the graph can check, mechanically, whether a candidate vessel qualifies. This information-artifact-versus-occurrent split is what lets one recipe be realized by many runs and transferred between sites. The B2MML XML the originating MES actually exchanges is walked element-by-element into exactly these triples by the companion's loader — the same recipe, serialized as the wire format on one side and as IOF-aligned RDF on the other. What the model cannot guarantee is the process: a culture at 2,000 L mixes, oxygenates, and shears differently than one at 2 L, and a new site's water and raw-material lots behave subtly differently, so a CQA can shift even with every modeled parameter held identical. The graph flags the transfer and links the engineering and confirmation runs that probe it; it cannot predict the shift. A portable model is not a portable process.
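The mechanical check described in this section (does a candidate vessel qualify?) reduces to matching a phase's equipment requirement against the qualifiesFor claims of the vessels at a site. Here is a sketch under that reading, with a HYPOTHETICAL second vessel bp:BR-050 and requirement bp:REQ-200L added for contrast. It checks declared qualification only and, as the paragraph warns, says nothing about whether the process will behave the same.

```python
# From the recipe excerpt: the phase names an equipment requirement, never a vessel.
requires_equipment = {"bp:RP-production": "bp:REQ-2000L"}
qualifies_for = {
    "bp:BR-204": {"bp:REQ-2000L"},
    "bp:BR-050": {"bp:REQ-200L"},  # HYPOTHETICAL vessel and requirement, for contrast
}
located_at = {"bp:BR-204": "bp:SITE-B", "bp:BR-050": "bp:SITE-B"}


def candidate_vessels(phase, site):
    """Vessels at `site` that declare qualifiesFor the requirement `phase` needs."""
    req = requires_equipment[phase]
    return sorted(
        v for v, reqs in qualifies_for.items() if req in reqs and located_at.get(v) == site
    )


print(candidate_vessels("bp:RP-production", "bp:SITE-B"))
print(candidate_vessels("bp:RP-production", "bp:SITE-A"))
```

Binding the phase to a requirement rather than a vessel IRI is what keeps the recipe unchanged across sites: only the site's vessel facts differ.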

Evaluation: does the loaded graph answer its questions?

A model is validated by loading it, reasoning over it, and running the competency questions [6]. validate.py parses bioproc.ttl, align.ttl, and instances.ttl (2120 triples as authored) and applies the OWL-RL closure, after which the graph holds 7137 triples, inferred edges included. Because bp:derivedFrom is transitive, the long-range lineage is entailed, not hand-asserted: the backward walk from bp:DS-001 returns eleven ancestors (answering CQ-01 and, since those ancestors include the originating bp:BATCH-2026-001 while the lot carries its release monomer value of 98.611, CQ-03), and the inverse traversal answers CQ-02:

# queries/CQ-02.rq — Cell-bank impact: every material that descends from the working cell bank.
PREFIX bp: <https://example.org/bioproc#>
SELECT ?descendant WHERE {
  ?descendant (bp:derivedFrom)+ bp:WCB-CHO-001 .
} ORDER BY ?descendant
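On the reasoned graph the + path is almost redundant, because the OWL-RL closure has already materialized the transitive edges it would walk. Here is a pure-Python sketch of that transitivity rule (prp-trp, x p y and y p z give x p z) run to a fixpoint over only the derivedFrom edges shown in this chapter's excerpts. The polishing and viral-safety pools are not shown, so the sketch starts from PApool-001 and finds seven ancestors rather than DS-001's eleven.

```python
# Asserted derivedFrom edges (child, parent) from this chapter's excerpts only.
asserted = {
    ("bp:MCB-CHO-001", "bp:RCB-CHO-001"),
    ("bp:WCB-CHO-001", "bp:MCB-CHO-001"),
    ("bp:SEEDFLASK-001", "bp:WCB-CHO-001"),
    ("bp:SEED-001", "bp:SEEDFLASK-001"),
    ("bp:BATCH-2026-001", "bp:SEED-001"),
    ("bp:CLAR-001", "bp:BATCH-2026-001"),
    ("bp:BIOMASS-001", "bp:BATCH-2026-001"),
    ("bp:PApool-001", "bp:CLAR-001"),
}


def transitive_closure(pairs):
    """Apply OWL 2 RL prp-trp (x p y, y p z => x p z) until nothing new is inferred."""
    closed = set(pairs)
    while True:
        new = {(x, z) for x, y in closed for y2, z in closed if y == y2} - closed
        if not new:
            return closed
        closed |= new


closed = transitive_closure(asserted)
ancestors = sorted(o for s, o in closed if s == "bp:PApool-001")    # CQ-01 direction
descendants = sorted(s for s, o in closed if o == "bp:WCB-CHO-001")  # CQ-02 direction
print(len(ancestors), ancestors)
print(descendants)
```

After closure, CQ-02 is a single-hop lookup on the materialized edges; the property path and the reasoner compute the same reachability, one at query time and one at load time.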

Run on the reasoned graph, CQ-02 returns the entire campaign: both the golden DP-001/DP-002 lineage and the out-of-spec DP-004 sibling, every node tracing to this one bank, exactly the cell-bank-level reach a contamination concern needs. The impact query CQ-04 sharpens it: when DP-004 fails, walking up its lineage to the shared WCB-CHO-001 and back down returns ['DP-001', 'DP-002'], the shared-fate set a recall must scope, computed by query instead of by quarantining the whole plant:

# queries/CQ-04.rq — Impact analysis: when DP-004 fails, which drug products share its lineage?
PREFIX bp: <https://example.org/bioproc#>
SELECT DISTINCT ?affected WHERE {
  bp:DP-004 (bp:derivedFrom)+ ?shared .   # an ancestor of the failed lot
  ?affected (bp:derivedFrom)+ ?shared .   # anything else derived from it
  ?affected a bp:DrugProduct .
  FILTER(?affected != bp:DP-004)
} ORDER BY ?affected
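CQ-04's up-then-down pattern means "any other drug product whose ancestor set intersects the failed lot's". Here is a sketch of that reading in plain Python over a simplified edge set: the DS-001 → PApool-001 edge stands in for the polishing and viral-safety pools this chapter elides, and the DP-004 chain follows the sibling lineage given earlier.

```python
# Simplified derivedFrom edges (child -> parents). DS-001 -> PApool-001 collapses
# the polishing and viral-safety pools that this chapter's excerpts elide.
parents = {
    "bp:DP-001": ["bp:DS-001"], "bp:DP-002": ["bp:DS-001"],
    "bp:DS-001": ["bp:PApool-001"], "bp:PApool-001": ["bp:CLAR-001"],
    "bp:CLAR-001": ["bp:BATCH-2026-001"], "bp:BATCH-2026-001": ["bp:SEED-001"],
    "bp:SEED-001": ["bp:SEEDFLASK-001"], "bp:SEEDFLASK-001": ["bp:WCB-CHO-001"],
    "bp:DP-004": ["bp:DS-004"], "bp:DS-004": ["bp:PApool-004"],
    "bp:PApool-004": ["bp:BATCH-2026-004"], "bp:BATCH-2026-004": ["bp:SEED-004"],
    "bp:SEED-004": ["bp:WCB-CHO-001"],
    "bp:WCB-CHO-001": ["bp:MCB-CHO-001"], "bp:MCB-CHO-001": ["bp:RCB-CHO-001"],
}
drug_products = {"bp:DP-001", "bp:DP-002", "bp:DP-004"}


def ancestors(node):
    """Everything reachable by one or more derivedFrom steps, like (bp:derivedFrom)+."""
    seen, stack = set(), list(parents.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, []))
    return seen


def shared_fate(failed):
    """CQ-04: drug products, other than `failed`, that share any ancestor with it."""
    up = ancestors(failed)
    return sorted(dp for dp in drug_products if dp != failed and ancestors(dp) & up)


print(shared_fate("bp:DP-004"))
```

Because the shared ancestors include the master and research banks, the query as written would also reach products made from any other working bank of the same master. A deployment might restrict ?shared to a chosen tier; that is a design decision the chapter does not specify.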

The closed-world SHACL gate then confirms that WCB-CHO-001 carries its four characterizations and its passage count (CQ-17), and the release gate flags only hmwPct on DS-004/DP-004 (2.41 % against a limit of 2.0 %). That aggregate is the actual failure mode here, since monomer is in spec at 98.687. The gate isolates the one real violation: every other panel value on those lots, and every value on every other lot, passes.
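The release-gate comparison can be sketched in plain Python using only the values quoted in this chapter. The sketch treats the 2.0 % HMW limit as an inclusive upper bound, which is an assumption about the specification's wording; in SHACL the same constraint would typically be written with sh:maxInclusive on the property. No other specification limits are reproduced here, so monomer is carried but not checked.

```python
# Release-panel values quoted in this chapter for the two drug-substance lots.
panel = {
    "bp:DS-001": {"hmwPct": 1.287, "monomerPct": 98.611},
    "bp:DS-004": {"hmwPct": 2.41, "monomerPct": 98.687},
}
# Only the HMW limit (2.0 %) is stated; other specification limits are not reproduced.
upper_limits = {"hmwPct": 2.0}


def release_violations(panel, upper_limits):
    """Return (lot, attribute, value, limit) for each value above its upper limit."""
    out = []
    for lot, values in sorted(panel.items()):
        for attr, limit in upper_limits.items():
            if attr in values and values[attr] > limit:
                out.append((lot, attr, values[attr], limit))
    return out


print(release_violations(panel, upper_limits))
```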

The unsolved part: a loadable graph is a snapshot, not a living record

The instance graph is honest, traversable, and reasoned, and it is frozen. It captures one campaign at one moment: passage 8, monomer 98.611, eleven ancestors. Two deeper truths the chapter surfaced sit underneath that freeze.

The first is that the things the graph names are alive and continuous. The cell bank is a population of cells that mutate and drift over generations, so "the working bank" is not genetically identical to the master, and no owl:sameAs answers whether the culture at passage 60 is the same entity as at passage 5; identity here is a useful fiction bounded by characterization, not the crisp sameness an IRI implies. Worse, and historically real, is misidentification: cell lines have been confused and cross-contaminated across the life sciences for decades, and for a manufacturing root node it is the worst possible error, because it is asserted with full confidence, propagates through every derivedFrom edge, and no downstream integrity check catches it. Every downstream fact is correctly derived from a wrongly identified root.

The second truth is that the individuation conventions the graph commits to are not metaphysical facts. Minting bp:CLAR-001 at a unit-operation boundary, treating PApool-001 as one pool from discrete cycles, and anchoring release on a discrete bp:DS-001 lot all assume a batch world. As processing goes continuous, those discrete individuals lose their natural boundaries, and "the release lot" becomes a time-bounded convention imposed on a continuum rather than a natural object, one the ontology can record but not derive.

So the loadable graph proves the model can answer its questions on real data; it does not prove that the data is what it claims at the root, nor that the discrete nodes survive a process with no batches. Both are governance and maintenance problems, not implementation ones.

Why it matters

Individuals are where an ontology stops being a diagram and starts being a manufacturing record. The 2120 authored triples, closing to 7137 under the reasoner, are the difference between claiming the model can trace a lot to its cell bank and showing it return eleven ancestors of DS-001, then scope a recall to DP-001 and DP-002 when DP-004 fails, in two queries rather than two weeks of spreadsheet archaeology. Every competency question the release gate and the digital thread depend on is answered by traversing edges asserted here. Build the instance graph faithfully (the transfection as an occurrent, the cell-bank tiers as the root, lineage on the one transitive spine, pooling forks preserved on a tracked resin, release attributes on the lots they describe), and the questions that gate a medicine resolve by query. Skip it, or hang the purity result on the wrong node, and the release-critical part of the graph is an untested hypothesis no reasoner ever ran.

In the real world

Every commercial mammalian-cell program already maintains the RCB/MCB/WCB lineage, the seed-train passage history, the per-step pooled materials and tracked resin lots, and the released drug-substance lot with its certificate — the cell-bank hierarchy and its characterization are a regulatory expectation, not optional good practice [1][3][4]. What is uneven is that those records live scattered across an MES, a LIMS, an ELN, and chromatography logbooks, so "we have traceability" is a claim about filing. The implementation modeled here — a single loadable file where the cell bank, the pool, the resin, and the drug substance are nodes joined by typed derivedFrom edges — is exactly the artifact that turns that claim into a property a query can verify, and turns a recall from a campaign-wide quarantine into a scoped set of two sibling lots.

Key terms

  • Instance graph: the asserted individuals of one mAb campaign (bp:WCB-CHO-001, bp:CLAR-001, bp:DS-001 …) populating the vocabulary's classes; instances.ttl, the loadable running example.
  • Material individuation: the convention of minting one new material node per unit operation; pinned to boundaries like clarification, stressed by pooling, splitting, and continuous flow, and not a fact the biology hands you.
  • Mass balance (not enforced): derivedFrom records lineage, not conservation; a traversable genealogy can be quantitatively impossible unless quantities are modeled.
  • Backward pooling fork / carryover lineage: PApool-001 derives from several cycle eluates via bp:includedFraction, on a resin (RESIN-PrA-07) reused across batches that links lots no product lineage connects.
  • Convergence node: bp:DS-001, where the entire upstream lineage meets one material with eleven transitive ancestors and the binding release specification; the corrected home of the monomerPct release result.
  • Transitive entailment: the long-range derivedFrom edges the OWL-RL closure infers (2120 → 7137 triples), so lineage to any depth needs no hand assertion.
  • Portable information artifact: the master recipe, an ISA-88-structured generically dependent continuant realized by a run and transferred between sites by binding to equipment requirements, not vessels.

Where this leads

The vocabulary is now a populated campaign graph, but several of its scalars — the harvest's bare 3.2 NTU, the run's bare setpoints — carry no unit and no provenance, which is honest but thin. The next chapter, From the Wire to the Graph, shows how those numbers actually arrive: off an OPC UA transmitter, through a historian, into a unit-qualified qudt:QuantityValue and an Allotrope-backed result — closing the gap between a clean instance graph and the messy plant signals that feed it.