Conceptualization: Relations, derivedFrom, and the Genealogy Spine

📍 Where we are: Part III · Conceptualization — the lifecycle phase where the classes from the last chapter are wired together with relations. Methodology: NeOn's conceptualization activity, run on SAMOD's test-first loop — each relation earns its place by answering a competency question about a real CHO monoclonal-antibody campaign.

A taxonomy of classes is a set of labeled boxes. What makes it a graph — something you can walk, query, and reason over — is the relations between the boxes. This chapter conceptualizes the load-bearing ones, and it spends most of its weight on a single relation, bp:derivedFrom, because that one edge is the spine the entire digital thread of a monoclonal-antibody lot hangs from: every vial a patient receives traces, hop by hop, back to one frozen vial of working cell bank. The other three relations it builds — affectsQuality, occursIn, contains — each answer a different manufacturing question, and one of them, contains, exists precisely to not be confused with lineage when a recall has to be scoped.

The simple version

A family tree has two kinds of line in it. One says "descended from" — your grandparent is your ancestor, forever, no matter what. The other says "lives in the same house right now" — which changes when someone moves out. Confuse the two and you get nonsense ("my roommate is my ancestor"). A bioprocess graph has the same two lines: derivedFrom is the permanent "made from" ancestry that roots every vial of antibody in one frozen cell bank, and contains is the mutable "packed inside right now" that changes every time a shipping case is opened. This chapter draws both, keeps them apart on purpose, and adds the Quality-by-Design edge that says which process knob affects which quality of the drug.

Start from the questions

Four competency questions from the specification shape this chapter, and they split cleanly across the four relations. CQ-06 ("which process parameters affect monomer purity?") is answered directly by affectsQuality — and monomer purity is the headline release attribute of an antibody, the fraction that is intact, properly folded monomer rather than aggregate. CQ-21 ("which vessel did a cell-culture run occur in?") is answered by occursIn, the edge that lets us keep the batch material and its bioreactor as separate things. CQ-13 ("what serialized vials are contained, transitively, within a given pallet?") is the contains walk, and CQ-14 ("is containment kept distinct from genealogy?") is the guard that proves contains and derivedFrom never collapse into each other — the difference between scoping a recall by lineage and accidentally recalling a shipping carton. Holding those four in view, every relation below is judged by whether it makes its manufacturing question answerable.

The fork that decides everything: object vs datatype property

OWL draws one distinction worth fixing before any relation is authored: an object property links a thing to another thing, while a datatype property links a thing to a literal value [1]. bp:derivedFrom relates a lot to a parent lot — an edge you can walk back to the cell bank. bp:monomerPct relates a drug-substance lot to the number 98.611 — its SEC %monomer release reading, a value you can read. That is the difference between a relationship and a measurement, and it decides which of the two you author. The genealogy spine is object properties all the way down; the release numbers an antibody lot must hit — monomer purity, high-molecular-weight aggregate, charge-variant distribution, protein concentration — are datatype properties hung on the lot. Here is the seam, both kinds declared verbatim from the vocabulary:

# bioproc.ttl — object property (an edge to walk) vs datatype property (a value to read).
bp:derivedFrom a owl:ObjectProperty , owl:TransitiveProperty ;
rdfs:label "derived from" ;
rdfs:domain bp:Material ; rdfs:range bp:Material ;
skos:definition "Relates a material to the parent material it originated from; transitive, so lineage is inferable to any depth." .

bp:monomerPct a owl:DatatypeProperty ; rdfs:label "SEC %monomer" .
bp:passageNumber a owl:DatatypeProperty ; rdfs:label "passage number" ; rdfs:range xsd:integer .

The rdfs:domain and rdfs:range on derivedFrom are not documentation; they are rules. Because both ends are pinned to bp:Material, asserting X derivedFrom Y types both X and Y as materials, and if Y has been declared something disjoint from bp:Material (a person who signed the batch record, a piece of equipment, a process), a reasoner flags the graph as inconsistent. A loose edge becomes a typed one. This matters because lineage edges get loaded from many source systems (the ELN, the MES, the LIMS), and a careless load that points a lot's parent at the operator who ran it, or at the bioreactor it grew in, is exactly the kind of error that quietly poisons a genealogy walk; domain and range turn that error into a flagged contradiction instead of a silent lie.
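The typing rule can be sketched in a few lines of plain Python. This is only an illustration over hand-written tuples, not the book's validate.py and not a real reasoner; it assumes the Material/Equipment disjointness the chapter describes, and it types BR-101 directly as bp:Equipment to avoid modeling the subclass hierarchy.

```python
# Minimal sketch of rdfs:domain / rdfs:range typing plus a disjointness check.
# Plain tuples stand in for triples; no RDF library is used.

DOMAIN = {"bp:derivedFrom": "bp:Material"}
RANGE = {"bp:derivedFrom": "bp:Material"}
# Assumption for illustration: Material and Equipment are declared disjoint.
DISJOINT = {frozenset({"bp:Material", "bp:Equipment"})}


def infer_types(triples):
    """Return {node: set of classes}, adding the types domain and range entail."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    for s, p, o in triples:
        if p in DOMAIN:
            types.setdefault(s, set()).add(DOMAIN[p])
        if p in RANGE:
            types.setdefault(o, set()).add(RANGE[p])
    return types


def clashes(types):
    """List (node, class_a, class_b) where a node carries two disjoint classes."""
    found = []
    for node, classes in sorted(types.items()):
        for pair in DISJOINT:
            if pair <= classes:
                a, b = sorted(pair)
                found.append((node, a, b))
    return found


good = [("bp:SEED-001", "bp:derivedFrom", "bp:SEEDFLASK-001")]
bad = [
    ("bp:BR-101", "rdf:type", "bp:Equipment"),
    ("bp:BATCH-2026-001", "bp:derivedFrom", "bp:BR-101"),  # careless load
]

good_types = infer_types(good)
bad_clashes = clashes(infer_types(bad))
print(good_types)
print(bad_clashes)
```

The good edge quietly types both ends as materials; the careless one makes BR-101 both Equipment and Material, which is the contradiction a reasoner would report.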

The one transitive spine: derivedFrom

Transitivity is the single most valuable axiom in this book, and for lineage it lives on one property [1]. Declaring bp:derivedFrom a owl:TransitiveProperty tells a reasoner that if SEED-001 derivedFrom SEEDFLASK-001 and SEEDFLASK-001 derivedFrom WCB-CHO-001, then SEED-001 derivedFrom WCB-CHO-001, and, hop after hop, that the drug substance DS-001 derives all the way up to the research cell bank without anyone asserting a single long-range link. Only the immediate parent edges are stated. The running example's lineage is eleven materials deep, and the seed train is where the first concrete edges of a specific campaign are laid down: a vial of WCB-CHO-001 is thawed, expanded through a shake flask, then a seed bioreactor, each scale-up a process that consumes one material and produces the next.

# instances.ttl — the seed train lays the first derivedFrom edges of the campaign, passage climbing.
bp:SEEDFLASK-001 a bp:ShakeFlaskCulture ; rdfs:label "shake-flask seed culture" ;
bp:derivedFrom bp:WCB-CHO-001 ; # rooted in the cell bank
bp:participatesIn bp:EXP-001 ;
bp:passageNumber 12 .
bp:SEED-001 a bp:SeedBioreactorCulture ; rdfs:label "SEED-001 (seed bioreactor culture)" ;
bp:derivedFrom bp:SEEDFLASK-001 ; # ...one hop back to the shake flask
bp:participatesIn bp:EXP-002 ;
bp:passageNumber 16 . # the count carried forward, checkable at release
bp:BATCH-2026-001 a bp:Batch ; rdfs:label "BATCH-2026-001" ;
bp:derivedFrom bp:SEED-001 . # ...and so, transitively, back to WCB-CHO-001
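What the reasoner does with those three asserted edges can be approximated by a simple upward walk. The sketch below is plain Python over a dictionary, assuming one parent per material for brevity (real lineage may pool several parents); it is not how an OWL-RL engine is implemented.

```python
# Sketch: recover long-range lineage from immediate parent edges only.
# Assumes a single parent per material, which holds for the seed train above.

PARENT = {
    "bp:SEEDFLASK-001": "bp:WCB-CHO-001",
    "bp:SEED-001": "bp:SEEDFLASK-001",
    "bp:BATCH-2026-001": "bp:SEED-001",
}


def ancestors(material, parent=PARENT):
    """Walk asserted parent edges upward; return ancestors nearest-first."""
    chain = []
    seen = {material}
    current = parent.get(material)
    while current is not None and current not in seen:  # guard against cycles
        chain.append(current)
        seen.add(current)
        current = parent.get(current)
    return chain


batch_lineage = ancestors("bp:BATCH-2026-001")
print(batch_lineage)
```

No edge from the batch to the cell bank was ever stated; it falls out of the walk, which is the point of declaring the property transitive.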

Why the root needs genealogy: the cell-bank tiers

The node every one of these edges climbs toward is not arbitrary. Mammalian-cell antibody manufacturing builds its cell substrate in tiers — a research cell bank (RCB), a master cell bank (MCB) drawn from it, and a working cell bank (WCB) drawn from the master — and each routine campaign is inoculated from a WCB vial so the rare, irreplaceable master is spent slowly. These are not optional housekeeping; the tier structure is a regulatory expectation for cell substrates, because the whole identity, sterility, and viral safety of every future lot rests on a characterized, frozen origin. The model makes the tiers a small taxonomy under bp:CellBank, and — crucially — declares them mutually disjoint, so a single bank can never be typed as two tiers at once:

# bioproc.ttl — the cell-bank tiers, and the axiom that keeps a bank in exactly one tier.
bp:CellBank a owl:Class ; rdfs:subClassOf bp:Material ; rdfs:label "Cell bank" .
bp:ResearchCellBank a owl:Class ; rdfs:subClassOf bp:CellBank ; rdfs:label "Research cell bank (RCB)" .
bp:MasterCellBank a owl:Class ; rdfs:subClassOf bp:CellBank ; rdfs:label "Master cell bank (MCB)" .
bp:WorkingCellBank a owl:Class ; rdfs:subClassOf bp:CellBank ; rdfs:label "Working cell bank (WCB)" .
[] a owl:AllDisjointClasses ; owl:members ( bp:ResearchCellBank bp:MasterCellBank bp:WorkingCellBank ) .

In the dataset those tiers are themselves a derivedFrom chain — WCB-CHO-001 derivedFrom MCB-CHO-001 derivedFrom RCB-CHO-001 — so the spine does not start at the working bank; it climbs through master and research to the very first frozen ampoule. The working bank also reaches sideways off the spine to the facts that make it usable: it expresses bp:mAb-A (the antibody it produces), and through the master bank's bp:hasClone bp:CLONE-7 it points back to the single-cell clone the line was selected from. The CHO host that does the expressing is not a string — bp:WCB-CHO-001 bp:hasHostOrganism bp:CHO-host, the same bp:CHO-host instance the engineered cell line carries — so "which organism makes this product?" resolves to one shared, NCBI-Taxonomy-aligned individual rather than the word "CHO" retyped in a dozen places. The bank's existence at the top of the chain is itself the output of an occurrent: the line was createdBy bp:TF-001, a transfection that takes the genetic construct as input and yields the cell line as output. Modeling that step as a process (not a footnote) is what lets "where did this line come from?" be a one-hop answer rather than a paragraph in a development report.

The passage count rides the same chain

The passage count (how many times the culture has been split and regrown) rides the genealogy. The working cell bank carries passageNumber 8, the shake-flask culture 12, the seed-bioreactor culture 16; each expansion adds generations [1]. Passage count is not bookkeeping: it bounds how long living cells may be grown before productivity drifts and product quality shifts, so a validated passage limit (here bp:PassageLimit-mAb-A bp:validatedPassageLimit 40) is a governance constraint a batch must respect. Because the count is a datatype property on each material, and lineage is the transitive object property between them, the graph answers the GMP question "was this batch inoculated from cells within the validated passage limit?" as a join rather than a manual reconstruction from lab notebooks: a fact established at the root meeting a count accumulated along the train. derivedFrom also has two sub-properties, bp:fromBatch and bp:includedFraction, each declared rdfs:subPropertyOf it, so a more specific downstream edge (a material from a batch, a clarified fraction included in a Protein A capture pool) still entails the generic lineage edge a reasoner walks.
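The join between lineage and passage count can be sketched as follows. This is a plain-Python illustration using the passage numbers and the limit of 40 quoted above; the function names are invented for the example and do not come from the book's code.

```python
# Sketch: join the lineage walk with passageNumber values and a validated limit.

PARENT = {
    "bp:SEEDFLASK-001": "bp:WCB-CHO-001",
    "bp:SEED-001": "bp:SEEDFLASK-001",
    "bp:BATCH-2026-001": "bp:SEED-001",
}
PASSAGE = {"bp:WCB-CHO-001": 8, "bp:SEEDFLASK-001": 12, "bp:SEED-001": 16}
VALIDATED_PASSAGE_LIMIT = 40


def lineage(material):
    """Ancestors of a material, nearest-first, following single parent edges."""
    chain = []
    current = PARENT.get(material)
    while current is not None and current not in chain:
        chain.append(current)
        current = PARENT.get(current)
    return chain


def passage_check(batch, limit=VALIDATED_PASSAGE_LIMIT):
    """Return (ok, offenders); offenders are ancestors whose count exceeds the limit."""
    offenders = [
        (m, PASSAGE[m]) for m in lineage(batch) if m in PASSAGE and PASSAGE[m] > limit
    ]
    return (not offenders, offenders)


result = passage_check("bp:BATCH-2026-001")
print(result)
```

Lowering the limit to a hypothetical 15 makes the same call report SEED-001 at passage 16 as the offender, which shows the check is driven by the data rather than hard-coded.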

[Figure: a genealogy chain reading left to right. The working cell bank WCB-CHO-001 at passage 8 expands via the process EXP-001 in shake flask SF-01 into the shake-flask culture SEEDFLASK-001 at passage 12, which expands via EXP-002 in seed bioreactor SBR-01 into the seed-bioreactor culture SEED-001 at passage 16, which inoculates the production batch BATCH-2026-001; a dashed transitive derivedFrom arc runs from the batch back to the cell bank, and a note records the validated passage limit of 40 that passages 8, 12, and 16 are all within.]

The transitive spine in action: each expansion is a process turning one cell material into the next, only the immediate derivedFrom edges are asserted, and the dashed arc is what a reasoner infers: the batch traces back to the working cell bank through hops no one stated. Original diagram by the authors, created with AI assistance.

Why the bank must be characterized — and why a reasoner alone will not enforce it

The reason genealogy matters at the root is that a misidentified cell bank is the worst possible error in this entire industry. Cross-contaminated and misidentified cell lines have quietly corrupted decades of life-science work, and for a manufacturing root node the failure is uniquely dangerous: a wrong identity at the cell bank propagates transitively, with full confidence, to every batch, every drug-substance lot, and every vial that derives from it — and no downstream integrity check can catch it, because every downstream record is internally consistent. So a working cell bank is required to carry characterization evidence — identity (is it the cell line we think it is?), sterility/mycoplasma (is it free of microbial contamination?), adventitious-agent / viral safety, and genetic stability — each a bp:CharacterizationResult. The model encodes that requirement as a qualified cardinality restriction on the WCB class:

# bioproc.ttl — a working cell bank must carry at least one characterization result.
bp:WorkingCellBank rdfs:subClassOf [ a owl:Restriction ;
owl:onProperty bp:hasCharacterization ;
owl:minQualifiedCardinality "1"^^xsd:nonNegativeInteger ; owl:onClass bp:CharacterizationResult ] .

In the running example WCB-CHO-001 carries all four: bp:hasCharacterization bp:CR-identity , bp:CR-sterility , bp:CR-viral , bp:CR-genetic, each a result that isAbout the bank with a verdict "PASS". But OWL is open-world — a missing characterization reads as unknown, not absent — so the restriction states the necessary condition without catching a gap. The runnable catch is a closed-world SHACL gate that demands the bank actually carry the evidence (CQ-17), and exactly one passage count, before it conforms:

# shapes.ttl — the cell-bank gate: a WCB must carry characterization and exactly one passage count.
bp:CellBankShape a sh:NodeShape ;
sh:targetClass bp:WorkingCellBank ;
sh:property [
sh:path bp:hasCharacterization ;
sh:minCount 1 ;
sh:message "A working cell bank must carry at least one characterization result." ] ;
sh:property [
sh:path bp:passageNumber ;
sh:minCount 1 ; sh:maxCount 1 ; sh:datatype xsd:integer ;
sh:message "A working cell bank must record exactly one passage count." ] .
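The closed-world behavior of that gate can be imitated in plain Python: count what is actually present and complain when it is missing. This is a sketch of the idea only, not a SHACL engine; bp:WCB-GAP is a hypothetical bank invented here to show a failing case.

```python
# Sketch: a closed-world check mirroring the two property constraints in the shape.

MSG_CHAR = "A working cell bank must carry at least one characterization result."
MSG_PASSAGE = "A working cell bank must record exactly one passage count."

TRIPLES = [
    ("bp:WCB-CHO-001", "bp:hasCharacterization", "bp:CR-identity"),
    ("bp:WCB-CHO-001", "bp:hasCharacterization", "bp:CR-sterility"),
    ("bp:WCB-CHO-001", "bp:hasCharacterization", "bp:CR-viral"),
    ("bp:WCB-CHO-001", "bp:hasCharacterization", "bp:CR-genetic"),
    ("bp:WCB-CHO-001", "bp:passageNumber", 8),
    # Hypothetical bank: no characterization and two conflicting passage counts.
    ("bp:WCB-GAP", "bp:passageNumber", 8),
    ("bp:WCB-GAP", "bp:passageNumber", 9),
]


def values(node, prop, triples):
    """All objects asserted for (node, prop)."""
    return [o for s, p, o in triples if s == node and p == prop]


def check_working_cell_bank(node, triples=TRIPLES):
    """Return violation messages; an empty list means the bank conforms."""
    messages = []
    if len(values(node, "bp:hasCharacterization", triples)) < 1:
        messages.append(MSG_CHAR)  # absent evidence is a failure, not an unknown
    counts = values(node, "bp:passageNumber", triples)
    if len(counts) != 1 or type(counts[0]) is not int:
        messages.append(MSG_PASSAGE)
    return messages


print(check_working_cell_bank("bp:WCB-CHO-001"))
print(check_working_cell_bank("bp:WCB-GAP"))
```

The contrast with the OWL restriction is the first branch: an open-world reasoner would leave the missing characterization as unknown, while this check treats it as a violation.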

bp:hasHostOrganism, bp:hasClone, and bp:createdBy are declared owl:FunctionalProperty for a related reason: a bank descends from one clone, has one host, and was created by one transfection. The axiom itself says at most one value, so when two source systems each assert the host, a reasoner infers that the two values denote the same individual rather than keeping duplicate provenance. This is where the instances chapter leaves its mark on the relations: the graph can pin a stable IRI to a bank, but it cannot make a living culture stop mutating. A culture at passage 60 is not, biologically, the same population it was at passage 5, and owl:sameAs cannot adjudicate whether it is still "the same line." The IRI is stable; the cells drift. That tension is the honest limit the passage limit and the characterization gate exist to manage, and the reason the genealogy spine is worth getting exactly right.

affectsQuality: the QbD knowledge as one edge

The preface promised that Quality by Design is secretly an ontology, and this is where that becomes literal. The link a development team works hardest to establish — this critical process parameter affects that critical quality attribute — is one object property waiting to be declared [2]. For an antibody, "feed rate affects monomer purity" is not a slogan; it is the kind of finding that decides whether a lot aggregates in the bioreactor or comes out as intact monomer. Declared as bp:affectsQuality from a bp:ProcessParameter to a bp:QualityAttribute, the fact stops living in a development report and becomes a queryable edge:

# bioproc.ttl — the QbD link, declared with its domain and range.
bp:affectsQuality a owl:ObjectProperty ;
rdfs:label "affects quality" ;
rdfs:domain bp:ProcessParameter ; rdfs:range bp:QualityAttribute ;
skos:definition "The Quality-by-Design link: a critical process parameter affects a critical quality attribute." .

But an edge alone is a claim without backing. What makes process knowledge trustworthy is the evidence hung on the parameter — the criticality assessment, the normal operating range (NOR), the wider proven acceptable range (PAR), and the design-of-experiments study that established the link [2]. In the dataset both feed rate and culture temperature carry real affectsQuality edges to the monomer CQA, and feed rate carries its full evidence trail:

# instances.ttl — affectsQuality with its NOR/PAR evidence and the study that proved it.
bp:FeedRate a bp:ProcessParameter ; rdfs:label "feed rate (CPP)" ;
bp:affectsQuality bp:MonomerPct-CQA ;
bp:hasCriticality bp:CRIT-FeedRate ;
bp:hasNormalOperatingRange bp:NOR-FeedRate ; # NOR 0.35-0.45
bp:hasProvenAcceptableRange bp:PAR-FeedRate ; # PAR 0.30-0.50 (wider)
bp:establishedBy bp:DOE-07 .
bp:Temperature a bp:ProcessParameter ; rdfs:label "culture temperature (CPP)" ;
bp:affectsQuality bp:MonomerPct-CQA .
bp:NOR-FeedRate a bp:NormalOperatingRange ; rdfs:label "feed-rate NOR 0.35-0.45" ;
bp:norLow 0.35 ; bp:norHigh 0.45 .
bp:PAR-FeedRate a bp:ProvenAcceptableRange ; rdfs:label "feed-rate PAR 0.30-0.50" ;
bp:parLow 0.30 ; bp:parHigh 0.50 .

The NOR is the tighter routine window the parameter is held to; the PAR is the wider region proven to still yield acceptable antibody. Modeling both as typed ranges — not as a sentence in a report — is what turns "feed rate matters" into "feed rate affects monomer purity per study DOE-07, controlled to its NOR 0.35–0.45 inside the PAR 0.30–0.50," a fact an investigator deviation-hunting on a low-monomer batch, or an auditor, can stand on. The response surface that says exactly how much monomer a given feed rate buys is not flattened into triples — that would balloon the graph into millions of meaningless rows and still lose the surface's shape; it is referenced by IRI through bp:DESIGNSPACE-mAb-A bp:referencesModel, keeping the graph the navigable index of the design space rather than its warehouse. The graph says that feed rate and temperature jointly affect monomer; the fitted model says how much.
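Once the ranges are typed values, classifying an observed setpoint against them is mechanical. The sketch below uses the NOR and PAR bounds from the listing above; the source gives no unit for feed rate, so none is assumed, and the three category labels are wording chosen for this example.

```python
# Sketch: classify a feed-rate reading against its NOR and PAR.
# Bounds are taken from the dataset listing; labels are illustrative.

NOR = (0.35, 0.45)  # normal operating range
PAR = (0.30, 0.50)  # proven acceptable range (wider)


def classify(value, nor=NOR, par=PAR):
    """Return where a reading falls relative to the two nested ranges."""
    if nor[0] <= value <= nor[1]:
        return "within NOR"
    if par[0] <= value <= par[1]:
        return "outside NOR, within PAR"
    return "outside PAR"


readings = [0.40, 0.32, 0.55]
labels = [classify(r) for r in readings]
print(labels)
```

A reading in the middle band is an excursion from routine control that is still covered by the study evidence, which is the distinction an investigator on a low-monomer batch would want to see first.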

occursIn: the run to its vessel

The next relation exists to prevent a category error the taxonomy chapter warned about — the "the batch is the bioreactor" conflation a naive multi-source load produces when the MES has one row for "the run in BR-101" and someone types it as both the material and the vessel. The batch is a bp:Material; the production bioreactor is bp:Equipment; the two are declared disjoint. So the link between them cannot be a second rdf:type on one node. It is an explicit object property from the process to the equipment:

# bioproc.ttl — occursIn ties a run to the persisting vessel it happened in, not by re-typing the batch.
bp:occursIn a owl:ObjectProperty ; rdfs:label "occurs in" ;
rdfs:domain bp:Process ; rdfs:range bp:Equipment ;
skos:definition "Relates a process to the persisting equipment in which it occurs (the run to its vessel) — in place of typing the batch material as equipment." .

The seed-train expansions already use it — bp:EXP-001 ... bp:occursIn bp:SF-01 (a shake flask), bp:EXP-002 ... bp:occursIn bp:SBR-01 (a seed bioreactor) — and the production culture run does too: bp:CCP-001 ... bp:occursIn bp:BR-101. This is what makes CQ-21 a one-hop query instead of an ambiguous second type on the batch node: ask which bp:CellCultureProcess occurred in which vessel, and the run-to-vessel fact is right there on its own edge, the vessel resolving to BR-101, a ProductionBioreactor. The distinction earns its keep when a vessel is implicated in a contamination: the bioreactor persists across campaigns while each batch material is consumed, so "which other runs occurred in this same vessel?" is only answerable if the vessel is its own individual, not a label fused onto one batch. A refinement, bp:performedOn (a sub-property of occursIn), pins a unit operation to the specific chromatography column or resin lot it ran on, so "which batches shared this resin?" — a real carryover-investigation question, since a Protein A resin is reused for many cycles — is answerable too.
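Because the vessel is its own individual, "which runs occurred here?" is just an inverse lookup on the occursIn edges. The sketch below uses the three edges named above plus bp:CCP-002, a hypothetical later run added only to show a vessel persisting across campaigns.

```python
# Sketch: inverse lookup over occursIn edges (run -> vessel).

OCCURS_IN = [
    ("bp:EXP-001", "bp:SF-01"),
    ("bp:EXP-002", "bp:SBR-01"),
    ("bp:CCP-001", "bp:BR-101"),
    ("bp:CCP-002", "bp:BR-101"),  # hypothetical second campaign in the same vessel
]


def vessel_of(run):
    """The single vessel a run occurred in, or None if no edge is recorded."""
    for r, vessel in OCCURS_IN:
        if r == run:
            return vessel
    return None


def runs_in(vessel):
    """Every run that occurred in the given vessel, sorted for stable output."""
    return sorted(run for run, v in OCCURS_IN if v == vessel)


print(vessel_of("bp:CCP-001"))
print(runs_in("bp:BR-101"))
```

If the batch node had simply been typed as the bioreactor, the second function would have nothing to group on.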

contains: the hierarchy that is NOT lineage

The fourth relation is the one most easily confused with the first, and the model spends a deliberate effort keeping them apart. Once the antibody is filled into vials, the lot stops being a bulk material measured by concentration and becomes a population of counted, individually identified units — the bulk-to-discrete shift the instances chapter makes concrete. Aggregation — vials packed into a carton, cartons into a case, cases onto a pallet — is a parent-child containment structure mandated by track-and-trace regulation [3]. A carton does not derive from its vials; it contains them. So bp:contains is its own transitive object property, and the comment on its declaration is explicit that it is not a sub-property of derivedFrom:

# bioproc.ttl — contains is transitive like derivedFrom, but deliberately NOT a sub-property of it.
bp:contains a owl:ObjectProperty , owl:TransitiveProperty ; rdfs:label "contains" ;
rdfs:domain bp:Package ; rdfs:range bp:Material ;
skos:definition "The packing containment hierarchy (carton contains vials, case contains cartons, pallet contains cases). DELIBERATELY NOT a sub-property of derivedFrom: containment is mutable and is not lineage." .

The instances are a clean three-level chain over the one serialized vial in the dataset — a vial that itself derivedFrom its drug-product lot, so its release quality is inherited through the lot rather than restated on every vial:

# instances.ttl — vial -> carton -> case -> pallet (containment, NOT genealogy).
bp:CARTON-001 a bp:Carton ; rdfs:label "carton C-001" ; bp:contains bp:VIAL-DP-001-000042 .
bp:CASE-001 a bp:Case ; rdfs:label "case CS-001" ; bp:contains bp:CARTON-001 .
bp:PALLET-001 a bp:Pallet ; rdfs:label "pallet P-001" ; bp:contains bp:CASE-001 .

Both contains and derivedFrom are transitive, which is exactly why they must stay separate: a bp:contains+ walk answers "what is this packed inside, right now?" (mutable — it changes when a case is opened and repacked at a distributor), while a bp:derivedFrom+ walk answers "what was this made from?" (permanent — the vial's path back to the cell bank never changes). Collapse them and a recall query would treat a shipping carton as an ancestor of its contents — the family-tree-versus-roommate confusion, with regulatory consequences: a recall scoped by a muddled relation either misses contaminated units or quarantines the loading dock. (The serialized vial's own GS1 item key versus its ontology IRI — two legitimate identity systems for one object — is the lot-versus-item individuation tension the serialization model takes up, and the reason contains is designed now to keep that future cleanly off the lineage spine.)
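The two walks, and the fact that repacking changes one but not the other, can be sketched side by side. This is plain Python over two separate edge maps; it assumes the vial derives from a lot named bp:DP-001 (inferred from the vial's identifier, not stated in the listing), and bp:CASE-002 is a hypothetical case introduced for the repack.

```python
# Sketch: containment and lineage kept as two separate edge maps.

CONTAINS = {
    "bp:CARTON-001": {"bp:VIAL-DP-001-000042"},
    "bp:CASE-001": {"bp:CARTON-001"},
    "bp:PALLET-001": {"bp:CASE-001"},
}
# Assumption: the vial's parent lot is DP-001.
DERIVED_FROM = {"bp:VIAL-DP-001-000042": {"bp:DP-001"}}


def containers_of(item, contains):
    """Every package holding the item directly or transitively."""
    found, frontier = set(), {item}
    while frontier:
        holders = {box for box, contents in contains.items() if contents & frontier}
        frontier = holders - found
        found |= holders
    return found


def ancestors_of(item, derived_from):
    """Every material the item derives from, directly or transitively."""
    found, frontier = set(), {item}
    while frontier:
        parents = set()
        for node in frontier:
            parents |= derived_from.get(node, set())
        frontier = parents - found
        found |= parents
    return found


vial = "bp:VIAL-DP-001-000042"
before = containers_of(vial, CONTAINS)

# A distributor repacks the carton into a different (hypothetical) case.
repacked = {
    "bp:CARTON-001": {vial},
    "bp:CASE-001": set(),
    "bp:CASE-002": {"bp:CARTON-001"},
    "bp:PALLET-001": {"bp:CASE-001"},
}
after = containers_of(vial, repacked)
lineage = ancestors_of(vial, DERIVED_FROM)

print(before, after, lineage)
print(before & lineage)  # empty: no container is also an ancestor
```

The containment answer changes with the repack while the lineage answer does not, and the intersection of the two stays empty.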

Evaluation: the relations answer their questions

validate.py parses bioproc.ttl + align.ttl + instances.ttl, applies the OWL-RL closure, and runs the competency questions — 2120 triples close to 7137 after reasoning, and the transitive spine earns its long-range edges with no hand assertion. CQ-06 reads the affectsQuality edges directly:

# queries/CQ-06.rq — CQ-06: which process parameters affect monomer purity?
PREFIX bp: <https://example.org/bioproc#>
SELECT ?parameter ?attribute WHERE {
?parameter bp:affectsQuality ?attribute .
?attribute a bp:QualityAttribute .
}

It returns the two CPPs the dataset asserts — feed rate and culture temperature, both affecting MonomerPct-CQA — turning the most prized knowledge in development into a one-line query. CQ-13 walks the containment hierarchy with bp:contains+, returning the carton, case, and pallet the vial is packed inside, and CQ-14 proves the two transitive hierarchies never touch, as an ASK that is correctly False:

# queries/CQ-14.rq — CQ-14: containment is NOT genealogy. Correct when this ASK is FALSE.
PREFIX bp: <https://example.org/bioproc#>
ASK {
?container bp:contains+ bp:VIAL-DP-001-000042 .
bp:VIAL-DP-001-000042 bp:derivedFrom+ ?container .
}

The model is correct precisely because the answer is False: nothing a vial is packed inside is also something the vial was made from. CQ-21 confirms the run-to-vessel occursIn edge resolves to BR-101 for the production run, CQ-17 confirms WCB-CHO-001 carries all four characterizations so it conforms to the cell-bank gate, and CQ-18 confirms passage 8 is within the validated limit of 40. And the payoff of the spine is CQ-04, the impact walk: when the out-of-spec lot DP-004 fails — its high-molecular-weight aggregate at 2.41 %, above the 2.0 % release limit, even though its monomer is in spec — the query walks up its lineage to the shared ancestor and back down to find every sibling that shares its fate:

# queries/CQ-04.rq — when DP-004 fails, which drug products share its lineage?
PREFIX bp: <https://example.org/bioproc#>
SELECT DISTINCT ?affected WHERE {
bp:DP-004 (bp:derivedFrom)+ ?shared . # an ancestor of the failed lot
?affected (bp:derivedFrom)+ ?shared . # anything else derived from it
?affected a bp:DrugProduct .
FILTER(?affected != bp:DP-004)
} ORDER BY ?affected

DP-004 and the golden lots DP-001/DP-002 all trace back to WCB-CHO-001, so the walk surfaces DP-001 and DP-002 as sharing the failed lot's cell-bank ancestry: a recall scoped by query instead of by quarantining the whole campaign. The same forward fork that makes DS-001 fillsInto DP-001 , DP-002 (one drug substance splitting into sibling drug-product lots) is what makes shared-fate analysis a traversal rather than a spreadsheet archaeology dig. The competency questions run here (CQ-04, CQ-06, CQ-13, CQ-14, CQ-17, CQ-18, CQ-21) are all green, and the headline lineage walk from DS-001 returns its 11 ancestors entirely by transitivity.
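The up-and-back-down shape of CQ-04 can be sketched without SPARQL. The edge map below is heavily abbreviated and partly assumed: the intermediate bp:DS-004 is hypothetical, each drug substance is linked straight to the cell bank instead of through the full chain, and the drug products are assumed to carry derivedFrom edges to their drug substance.

```python
# Sketch: find drug products that share any ancestor with a failed lot.
# Lineage is collapsed and partly hypothetical; see the comments on each edge.

DERIVED_FROM = {
    "bp:DP-001": {"bp:DS-001"},
    "bp:DP-002": {"bp:DS-001"},
    "bp:DP-004": {"bp:DS-004"},        # DS-004 is a hypothetical intermediate
    "bp:DS-001": {"bp:WCB-CHO-001"},   # intermediate hops omitted
    "bp:DS-004": {"bp:WCB-CHO-001"},   # intermediate hops omitted
}
DRUG_PRODUCTS = {"bp:DP-001", "bp:DP-002", "bp:DP-004"}


def ancestors(node, edges=DERIVED_FROM):
    """Every material the node derives from, directly or transitively."""
    found, frontier = set(), {node}
    while frontier:
        parents = set()
        for item in frontier:
            parents |= edges.get(item, set())
        frontier = parents - found
        found |= parents
    return found


def shares_lineage_with(failed):
    """Other drug products with at least one ancestor in common with the failed lot."""
    shared = ancestors(failed)
    return sorted(
        dp for dp in DRUG_PRODUCTS if dp != failed and ancestors(dp) & shared
    )


affected = shares_lineage_with("bp:DP-004")
print(affected)
```

Like the query, this treats any common ancestor as grounds for inclusion, so the closer to the root the shared node sits, the wider the result.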

The unsolved part: a relation is consistent, not necessarily correct

A reasoner can prove these relations are consistent — domain and range type their ends, transitivity closes the lineage, the disjointness guards keep contains from masquerading as derivedFrom and the cell-bank tiers from collapsing into one. It cannot prove they are correct. A derivedFrom edge with its direction reversed reasons perfectly and lies about ancestry — and at the cell-bank root, that lie propagates to every downstream lot with full confidence. The model also records lineage, not conservation: derivedFrom says a capture pool came from a batch, but it does not enforce mass balance, so a traversable genealogy can be quantitatively impossible (more antibody out than went in) unless the quantities are modeled too. Where a continuous, living seed-train expansion gets cut into discrete material nodes is a modeling judgment, not a fact the biology hands over — model too coarsely and you cannot trace a contamination to a transfer; too finely and the graph drowns in nodes nobody queries. And affectsQuality asserts that a parameter matters but cannot represent the form of the effect (that lives in the referenced response surface), so parameters that matter only in combination — feed rate and temperature jointly — are easy to model one edge at a time, quietly implying a process is simpler than it is [2]. The relations make the knowledge legible and queryable, a real advance over a buried report; legible is not the same as complete or correct.
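One modest cross-check for a reversed lineage edge, offered here as a heuristic and not as something the model or the reasoner does, is to compare passage counts along each edge: since each expansion adds generations, a child with a lower count than its parent is suspect. The sketch assumes both ends of the edge carry a passageNumber.

```python
# Sketch: flag derivedFrom edges whose passage counts run the wrong way.
# A heuristic only; it cannot prove an edge correct, just surface likely reversals.

PASSAGE = {"bp:WCB-CHO-001": 8, "bp:SEEDFLASK-001": 12, "bp:SEED-001": 16}


def suspicious_edges(edges, passage=PASSAGE):
    """Return (child, parent) pairs where the child's count is below the parent's."""
    return [
        (child, parent)
        for child, parent in edges
        if child in passage and parent in passage and passage[child] < passage[parent]
    ]


asserted = [
    ("bp:SEEDFLASK-001", "bp:WCB-CHO-001"),
    ("bp:SEED-001", "bp:SEEDFLASK-001"),
]
# The same data with the first edge loaded backwards.
reversed_load = [
    ("bp:WCB-CHO-001", "bp:SEEDFLASK-001"),
    ("bp:SEED-001", "bp:SEEDFLASK-001"),
]

print(suspicious_edges(asserted))
print(suspicious_edges(reversed_load))
```

Edges between materials with no recorded passage count pass silently, so this narrows the problem without closing it.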

Why it matters

The quality of a graph is decided by its relations more than its classes. Transitivity on the single derivedFrom spine is what makes lineage walkable to any depth from one frozen vial of cell bank to the patient's vial — and what makes an out-of-spec lot's siblings findable by query when a recall is on the line. The object/datatype fork is what keeps an edge you walk (back to the cell bank) distinct from a value you read (the lot's monomer purity). occursIn is what lets the batch and its bioreactor be separate things, so a vessel implicated in contamination can be cross-checked across campaigns. And keeping contains deliberately off the lineage spine is what keeps "what is this antibody made from?" and "what carton is this vial packed in, now?" two answerable questions instead of one muddled one. Get these four right and every lineage, impact, QbD, and recall query the digital thread runs is trustworthy by construction.

In the real world

None of these relations is this book's coinage. derivedFrom aligns up to the Relation Ontology's derives from, the standard way biomedical and manufacturing graphs say one thing comes from another; the cell-bank tiers and their characterization are not optional good practice but regulatory expectations for cell substrates, which is why every commercial mammalian-cell process maintains an RCB/MCB/WCB lineage with documented passage limits — CHO being the dominant host for commercial antibodies precisely because its behavior is so well characterized. affectsQuality, CPPs, CQAs, NOR and PAR are the vocabulary of the ICH quality guidelines that regulators and industry already share, which is why modeling them as a graph is natural rather than forced [2]; and aggregation under GS1, distinct from manufacturing lineage, is mandated by track-and-trace regulation and is live daily operation for every commercial drug product [3]. The relationships a development, manufacturing, and supply-chain team establish already are graphs in their reports and barcodes; the move this chapter argues for is to write them down as edges a machine can traverse — and to keep the two transitive hierarchies, lineage and containment, from ever being confused.

Key terms

  • Object property / datatype property — a relation to another thing (an edge to walk, like derivedFrom back to the cell bank) versus a relation to a literal value (a measurement to read, like monomerPct or passageNumber).
  • derivedFrom — the one transitive object property that is the genealogy spine; only immediate parent edges are asserted, and a reasoner infers lineage to any depth, rooting every campaign material in the working cell bank and, through it, the master and research banks.
  • Cell-bank tiers (RCB/MCB/WCB) — the disjoint, regulatorily expected hierarchy at the root of the spine; the working bank is characterized for identity, sterility, viral safety, and genetic stability, because a misidentified root propagates transitively to every lot.
  • affectsQuality — the core QbD object property from a process parameter to a quality attribute (feed rate → monomer purity), carrying its NOR, PAR, criticality, and DoE study as attached evidence.
  • NOR / PAR — the tighter normal operating range and the wider proven acceptable range bounding a parameter, modeled as typed ranges rather than prose.
  • occursIn — the object property tying a process to the persisting equipment it ran in (the culture run to BR-101), used instead of re-typing the batch material as a vessel (the material/equipment guard).
  • contains — the transitive packing hierarchy (carton → case → pallet), deliberately not a sub-property of derivedFrom because containment is mutable and is not lineage.

Where this leads

The relations are wired and the spine is transitive. But a relation that is merely declared can still be misused — a functional property like hasHostOrganism left unstated, an existential restriction not enforced, a disjointness forgotten so the "batch is the run" error slips through, or a cell-bank characterization gate that an open-world reasoner will not catch. The next chapter, Formalization: Axioms, Restrictions, and What a Reasoner Can and Cannot Catch, adds the OWL 2 axioms that give these classes and relations teeth — functional properties, existential and cardinality restrictions, disjointness — and faces honestly the line between what an OWL-RL closure enforces and what only a full DL reasoner or a SHACL gate will catch.