An Honest Verdict: What Ontologies Solve, and What They Leave to People

📍 Where we are: Part IX · The Verdict — the last chapter. We have modeled the bioprocess end to end and surveyed how the real industry uses it. This chapter steps all the way back and tells the truth about what we built — its real power, and its real limits.

This book made a single argument across all its chapters: that a stored fact becomes knowledge only when its meaning is modeled, and that modeling the bioprocess — the whole drug-making process, from growing cells through purifying and filling the product — as an ontology turns a heap of records into a navigable, queryable, trustworthy whole. (An ontology, in this book, is a machine-readable file that spells out the types of things in a domain — batches, materials, equipment, quality results — and the relationships and rules that connect them, so that software can reason over their meaning rather than just store text.) The argument is true. It is also incomplete, and an honest book owes you the incompleteness as clearly as the promise. Every chapter ended with an "unsolved part" on purpose; this chapter gathers them, because the pattern across them is the real lesson — sharper than any single technique — about what ontologies are for and what they cannot do.

The simple version

A great map transforms a journey: you can plan routes, see what connects to what, and find your way under pressure. But the map is not the territory — it cannot tell you the bridge washed out this morning, and a beautifully drawn map of the wrong valley will lead you confidently off a cliff. An ontology is a great map of a process. This chapter is honest about both halves: how much the map helps, and why you still need people who know the territory, check the map against it, and remember it is a map.

What this chapter covers

We separate, plainly, what ontologies genuinely solve from what they leave to people; show that every limit shares one shape (the model guarantees structure, humans supply substance); give a sober answer to when modeling is worth it; and close the book by pointing at the branch it was always leading toward: the learning lens of the next book. The learning lens is this series' other half: where this book models what data means, the next book uses data to predict.

What ontologies genuinely solve

These are real, earned wins, not marketing. Take them seriously, because they justify the whole effort.

Interoperability of structure. Anchoring every term to the BFO spine and the IOF mid-level means a class built by one team is structurally compatible with one built by another, with no bespoke adapter. BFO, the Basic Formal Ontology, is a tiny standardized vocabulary of top-level categories (object, process, quality) that every domain ontology hangs from, the "spine" of the skeleton; the IOF, the Industrial Ontology Foundry, is a shared middle layer of manufacturing terms built on BFO. (A class is a type, like "Bioreactor"; the private-dialect trap is each team inventing its own incompatible terms, and a shared upper ontology closes it by construction: interoperability is built in from the start, not bolted on later.)

Queryable lineage and impact. Faithful derivedFrom edges and a transitive property turn "where did this come from?" and "what shares its fate?" into one-line queries instead of weeks of archaeology: the digital thread (a single connected trail linking every record about a product across systems and stages), genuinely realized. (An edge is a stored relationship between two things in the graph, here "this material was derived from that one"; transitive means the relationship chains automatically: if A derives from B and B from C, then A derives from C.)

Enforceable completeness. A SHACL release gate checks, mechanically and tirelessly, that every required result is present, single, typed, in range, and signed, a guard against incompleteness no human checklist applies as reliably. (SHACL, the Shapes Constraint Language, is a standard for writing data-validation rules, and a release gate is the automated check a batch's records must pass before the drug can be released.) And the check is itself executable: the 23 competency questions (the concrete questions the ontology was built to answer, agreed up front) are the acceptance suite, the executable ORSD (Ontology Requirements Specification Document, the spec of what the model must do) where requirements and tests are the same artifact. So "is the model still complete?" is not a review meeting but a command that exits zero or non-zero (an exit code of zero means it passed, non-zero that it failed, the way an automated check reports pass/fail): the difference between a quality system that documents its checks and one that runs them.

Shared, self-describing meaning. Global IRIs and QUDT units mean a value never travels bare (it always carries its unit) and a name never collides (two teams' identifiers can never be confused). An IRI (Internationalized Resource Identifier) is a web-style globally unique name, and QUDT is a standard vocabulary of Quantities, Units, Dimensions and Types. Together they are the foundation that makes data FAIR (Findable, Accessible, Interoperable, Reusable: the standard for data others can actually use); see the FAIR chapter.

These are not small. A plant that achieves them has turned its data from something it has into something it can use.
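The five properties the release gate checks (present, single, typed, in range, signed) can be sketched in plain Python. This is a minimal closed-world illustration, not SHACL itself and not the book's validator: the record layout, the `REQUIRED` panel, its numeric limits, and the function name `check_release` are all assumptions invented for the example.

```python
# Hypothetical release panel: attribute -> (expected type, low, high).
# Attribute names and limits are illustrative assumptions, not the book's shapes.
REQUIRED = {
    "monomerPct": (float, 95.0, 100.0),
    "hmwPct": (float, 0.0, 2.0),
}

def check_release(results):
    """Return a list of violations; an empty list means the record conforms.

    `results` is a list of dicts with keys: attribute, value, signed_by.
    Anything not in the list is treated as absent (closed world).
    """
    violations = []
    for attr, (typ, low, high) in REQUIRED.items():
        rows = [r for r in results if r["attribute"] == attr]
        if not rows:
            violations.append(f"{attr}: missing")             # present
            continue
        if len(rows) > 1:
            violations.append(f"{attr}: {len(rows)} values")  # single
            continue
        row = rows[0]
        if not isinstance(row["value"], typ):
            violations.append(f"{attr}: wrong type")          # typed
        elif not low <= row["value"] <= high:
            violations.append(f"{attr}: out of range")        # in range
        if not row.get("signed_by"):
            violations.append(f"{attr}: unsigned")            # signed
    return violations

good = [
    {"attribute": "monomerPct", "value": 98.6, "signed_by": "qa-1"},
    {"attribute": "hmwPct", "value": 1.1, "signed_by": "qa-1"},
]
bad = [
    {"attribute": "monomerPct", "value": 98.6, "signed_by": "qa-1"},
    {"attribute": "hmwPct", "value": 3.4, "signed_by": None},
]

# Zero means pass, non-zero means fail, the way a command-line gate reports.
exit_code = 1 if check_release(bad) else 0
print(check_release(bad), exit_code)
```

Note what the sketch cannot do: a `hmwPct` of 1.1 that was typed in wrongly passes every one of these checks, which is the gap between complete and true that the next section takes up.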

What ontologies leave to people

And here is the other column, gathered from every chapter's unsolved part — the things the model cannot do, however elegant.

Correctness. A reasoner (the automated logic engine that derives new facts from the ontology's rules) proves an ontology consistent, that is, free of self-contradiction, never correct (axioms); a SHACL gate proves a record complete, never true (release); an LRV (a Log Reduction Value, the metric for how much a step removes or inactivates virus) is a validated claim, not a measurement (viral safety); RDF (the Resource Description Framework, the graph data format these facts are stored in) "on the wire", i.e. as it travels between systems, is compliant, not FAIR-in-fact (FAIR). A confidently mislabeled-but-plausible value passes every machine check.

Identity reconciliation. Deciding that four systems' names denote the same real thing remains largely manual, and a wrong owl:sameAs propagates false facts silently. (OWL, the Web Ontology Language, is the logic layer on top of RDF; owl:sameAs is its strongest identity statement: it declares that two names refer to one and the same individual, so a reasoner fuses everything known about both.) This is the unsolved half of identity, sharpest at the cell-bank root and the GS1 bridge. The running graph models its own restraint here: where it cross-links a serialized vial to its GS1 key it asserts skos:exactMatch, not an owl:sameAs that would fuse two individuals and let a reasoner merge all their properties. The skos:exactMatch property comes from SKOS (the Simple Knowledge Organization System); it is a documented, retractable mapping that says "these two records are about the same thing" without merging them, where each record is an information artifact: a record or identifier about a thing, not the physical thing itself. owl:sameAs is a hammer; most real identity links want the softer SKOS mapping, and choosing wrongly is exactly how a graph starts telling confident lies.
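The difference between the two identity links can be made concrete with a toy fact store. The sketch below is an assumption-laden illustration rather than real OWL or SKOS semantics: facts are plain tuples, the identifiers (`bp:Vial-0001`, `gs1:SGTIN-123`) and predicates are hypothetical, and `properties_of` only mimics how a reasoner would fuse individuals joined by owl:sameAs.

```python
# Facts as (subject, predicate, object) tuples. All identifiers are hypothetical.
facts = {
    ("bp:Vial-0001", "bp:fillVolumeMl", "1.0"),
    ("gs1:SGTIN-123", "gs1:shippedTo", "site:B"),
}

def properties_of(node, facts, same_as=frozenset()):
    """Collect (predicate, object) pairs for `node`, fusing sameAs aliases."""
    aliases = {node}
    changed = True
    while changed:  # sameAs is symmetric and transitive: grow to a fixpoint
        changed = False
        for a, b in same_as:
            if a in aliases and b not in aliases:
                aliases.add(b)
                changed = True
            if b in aliases and a not in aliases:
                aliases.add(a)
                changed = True
    return {(p, o) for s, p, o in facts if s in aliases}

link = ("bp:Vial-0001", "gs1:SGTIN-123")

# Treated as owl:sameAs: the two individuals fuse and their properties merge.
fused = properties_of("bp:Vial-0001", facts, same_as={link})

# Treated as skos:exactMatch: kept as a retractable mapping, nothing merges.
mappings = {link}
kept_apart = properties_of("bp:Vial-0001", facts)

print(sorted(fused))
print(sorted(kept_apart))
```

If the link is wrong, deleting it from `mappings` undoes everything; under the fused reading, every query that touched the vial has already inherited the other record's facts.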

The OBO–IOF seam. The biomedical ontologies (the OBO family — Open Biological and Biomedical Ontologies, the community that coordinates them) that describe the target — the molecule in the body the drug is designed to act on, which biomedical vocabularies describe — and the manufacturing ontologies that describe making it are both BFO-grounded but not seamlessly joined — they grew up as two separate vocabulary worlds, one built by biomedical researchers to describe molecules and disease, the other by manufacturing engineers to describe equipment and process, so their terms meet only where someone deliberately links them; the crosswalk (a hand-authored mapping between the two vocabularies) is yours to author, and our own align.ttl shows both its reach and its honest stopping point. It bridges where a verified leaf exists — written as triples (subject–predicate–object statements), bp:Equipment rdfs:subClassOf iof:PieceOfEquipment reads "our Equipment class is a kind of the IOF's PieceOfEquipment" (the bp: prefix marks our own bioprocess terms, iof: the Industrial Ontology Foundry's, and rdfs:subClassOf means is-a-kind-of); likewise bp:Quality rdfs:subClassOf BFO quality — and refuses to over-claim where one does not: bp:MechanismOfAction is flagged a lexical bridge (the two terms merely share a name) rather than a category entailment (a logically guaranteed is-a-kind-of relationship the reasoner may act on). Crucially the file carries zero owl:equivalentClass (the statement that two classes are exactly the same) and zero owl:sameAs: every external link is an rdfs:subClassOf subsumption (an is-a-kind-of link) or a skos:exactMatch, because asserting equivalence where you mean is-a-kind-of is how one wrong inference quietly propagates a false fact — the very failure this column is about.

Continuous-processing individuation. Individuation is deciding what counts as one discrete thing. The comfortable batch-and-unit-operation boundaries that make lineage clean dissolve when product flows continuously, with no settled ontology for where one time-bounded lot ends and the next begins. (A batch is one defined production run, a unit operation is one processing step such as a chromatography column or a filtration, and a lot is the discrete quantity of material those produce.)

Cross-organizational federation. The thread is ironclad inside the factory and a fragile federation beyond it, dependent on parties you cannot mandate.

Governance discipline. The model stays true only through stewardship, a social commitment no technology supplies, and most ontology projects fail on governance, not on logic.

The verdict in one view: the left column is real and earned, the right column is real and remaining — and every item on the right shares one shape, that the model guarantees structure while people must supply substance. Original diagram by the authors, created with AI assistance.

The pattern under every limit: structure versus substance

Lay the two columns side by side and the unsolved parts stop looking like a scattered list of caveats and resolve into a single shape. In every case, the ontology guarantees structure; a human must supply substance. The model guarantees that a process is an occurrent and a purity is a quality — but not that you classified this harvest correctly. It guarantees a record is complete — but not that the number in it is true. It guarantees two terms are structurally compatible — but not that two teams chose the same one. It guarantees a name is globally unique — but not that you matched it to the right real thing. This is not a flaw to be patched in a future version; it is the nature of a formal model. A model is a structure for holding substance, and it can no more supply the substance than a filing system can write the files. Seeing this clearly is what separates using ontologies well from over-trusting them: you lean on the structure exactly as far as it reaches, and you keep human judgment, data integrity, and governance doing the work only they can do.

When modeling is worth it — and when it is not

The honest cost-benefit follows from the pattern. Modeling as an ontology pays when the questions you need are cross-system, recursive, or lineage-shaped — what did this derive from, what shares its fate, which parameter drove this attribute — because those are exactly what a graph answers and a pile of records cannot. It pays when interoperability across systems, sites, or organizations is the goal, because shared upper ontologies are the only thing that delivers it without an adapter per pair. It pays when a regulated lifecycle demands traceability that survives decades and audits. The sharpest illustration is an out-of-specification event (a result outside its allowed limits): when DS-004 — a lot of drug substance, the purified active ingredient (here lot number 004) — fails its HMW-aggregate limit (HMW = high-molecular-weight aggregates, clumped-together antibody molecules that must stay below a set limit), the question an investigator must answer within days — under the deviation and CAPA (Corrective and Preventive Action) discipline of 21 CFR 211.192 (the US drug-manufacturing regulation governing investigations), and to scope any field action (a recall or market withdrawal) — is what else shares this lot's fate? In the graph that is one transitive derivedFrom walk: forward to every drug-product lot (the finished, filled dosage form) filled from the same drug substance and backward to the shared cell bank (the frozen, qualified stock of cells every batch is grown from), surfacing every sibling lot (another lot made from the same parent material) grown from it (here DP-001 and DP-002, two drug-product lots, via the shared working cell bank — the day-to-day vial stock drawn from the master bank). That is not a hopeful sketch but the executable impact question CQ-04 (competency question 04), which the validator below reports green (affected = ['DP-001', 'DP-002']) — the difference, again, between a traceability claim documented and one that runs. 
The same walk underwrites comparability after a process change and recall scoping after a complaint; a pile of records cannot answer it without weeks of archaeology, and that gap, multiplied across a twenty-year product lifecycle of audits and changes, is precisely where modeling repays its cost. It does not pay to model everything to the finest grain "because you can": over-axiomatization — piling on too many logical rules (axioms) — makes reasoning intractable (so slow the engine cannot finish in reasonable time), modeling every vial and every second of a run drowns the graph, and a model nobody governs rots into a liability. The discipline is to model at the granularity your real questions require, anchor to shared standards, govern what you build, and stop. An ontology is a powerful tool for a specific job, not a virtue to be maximized. And the cost-benefit is honest in both directions: the tools are largely free, but the real cost is not licenses — it is sustained, skilled human attention, paid in the ontologist who authors meaning correctly, the steward who governs the model for a decade, and the analyst who reconciles identity by hand.
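The "what else shares this lot's fate?" walk described above can be sketched as two graph traversals. This is a plain-Python illustration under stated assumptions, not the book's SPARQL or its dataset: the lot names and the `DERIVED_FROM` edges are invented, and `shares_fate` is a hypothetical helper that walks backward to every ancestor and then forward from each one.

```python
from collections import defaultdict

# Hypothetical derivedFrom edges: child -> parents. Lot names are invented.
DERIVED_FROM = {
    "DS-A": ["HARVEST-A"],
    "HARVEST-A": ["WCB-X"],
    "DS-B": ["HARVEST-B"],
    "HARVEST-B": ["WCB-X"],
    "DP-A1": ["DS-A"],
    "DP-B1": ["DS-B"],
    "DP-B2": ["DS-B"],
}

def ancestors(lot, edges):
    """Everything `lot` transitively derives from (backward walk)."""
    seen, stack = set(), list(edges.get(lot, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, []))
    return seen

def descendants(lot, edges):
    """Everything transitively derived from `lot` (forward walk)."""
    children = defaultdict(list)
    for child, parents in edges.items():
        for parent in parents:
            children[parent].append(child)
    # Walking the inverted edges with the same routine gives descendants.
    return ancestors(lot, children)

def shares_fate(lot, edges, prefix="DP-"):
    """Drug-product lots downstream of `lot` or of any of its ancestors."""
    affected = set(descendants(lot, edges))
    for anc in ancestors(lot, edges):
        affected |= descendants(anc, edges)
    affected.discard(lot)
    return sorted(n for n in affected if n.startswith(prefix))

print(shares_fate("DS-A", DERIVED_FROM))
```

In this toy lineage the failing substance lot reaches its own drug product directly and the sibling drug products only through the shared cell bank, which is why the backward leg of the walk matters as much as the forward one.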

The unsolved part: the deepest one is human

If the book has one final unsolved part, it is this: the binding constraint on modeling the bioprocess as knowledge is not the technology — RDF (the graph data format), OWL (the logic language), SHACL (the validation rules), BFO (the upper ontology), IOF (the manufacturing mid-level), QUDT (the units vocabulary), and LinkML (a Linked data Modeling Language for authoring schemas) are mature, standardized, and largely free — but the human discipline to author meaning correctly, reconcile identity honestly, govern the model faithfully, and resist both the temptation to over-model and the temptation to over-trust. Every technical limit in the right-hand column ultimately routes back to a person: someone classifies, someone matches, someone stewards, someone decides what is true. The field's genuine frontier is not a better triplestore; it is the organizational practice — the controlled vocabularies actually authored on the floor, the governance actually staffed, the cross-organizational trust actually built — that turns standards-compliance into FAIRness in fact. That is sobering and it is also clarifying: it means the path forward is known, and it is mostly a matter of doing the unglamorous work, well, for a long time.

Why it matters

A book that only sold the promise would mislead you into the exact over-trust that makes a graph a confident liar. The verdict matters because how you hold the tool determines whether it helps: lean on the structure it genuinely guarantees, and an ontology transforms a process's data into queryable, interoperable, enforceable knowledge; mistake structure for substance, and you will trust a complete-but-false record, a consistent-but-wrong model, a compliant-but-hollow graph. The whole book has been teaching one habit — model the structure rigorously, and keep human judgment, integrity, and governance supplying the substance — and this chapter names it outright so it outlasts the techniques.

In the real world

Across the Part VIII survey, the way industry actually uses the outputs of ontology work sorts into six patterns on a sharp maturity gradient — production at the top, the GMP (Good Manufacturing Practice, the regulated factory-floor regime — where every change must be validated and audited, so new technology lands here years after it is proven in R&D) execution floor still empty at the bottom. (A knowledge graph, which several rows turn on, is simply a database shaped as this kind of ontology-typed graph of connected facts; the company and product names below are real-world examples, not things you need to memorize.)

How the output is used	Representative real example	What is consumed and done	Maturity
Analytical-lab data semantics	AFO and the Allotrope Simple Model (AFO = Allotrope Foundation Ontology; its lighter companion is the Allotrope Simple Model, abbreviated ASM); QUDT units inside the file	Instruments emit results that carry their own meaning — typed equipment, material, process, and result, units as IRIs — and vendors pipe ASM into knowledge graphs as AI-ready data	Production
Regulatory identification and master data	IDMP and SPOR, UNII, SPL (standards and code systems for naming medicines to regulators — IDMP is the ISO Identification of Medicinal Products family); J&J's IDMP-O product master on Accurids (a product master is the single authoritative record of each product; Accurids is the vendor platform)	The governed record of what a product is, carried as machine-readable identifiers into agency submissions — the substrate behind a `bp:DS-001`	Production
R&D and FAIR knowledge graphs	Roche EDIS, Boehringer, Novo Nordisk OBDM, Novartis data42, AstraZeneca BIKG	Find and reuse datasets, federate omics (the large-scale "-omics" datasets — genomics, proteomics, and the like), IT, documents, and trials, run an inferencing graph (one a reasoner derives new facts over) across research data, and drive ML target identification	Production — R&D only
Lineage, impact, and cross-lifecycle queries	This book's loadable dataset and its validator; on real platforms the same `derivedFrom` path runs R&D-side as Foundry object links or Neo4j Cypher	"Where did this derive from", "what shares its fate", and "which parameter drove this attribute" as one-line graph queries, with a SHACL gate that refuses out-of-spec lots	R&D-side; proven in tested code
Grounding AI on the graph (GraphRAG)	Merck Synaptix, Bayer patient maps, Syngenta NOCTIS; Pistoia CMC Process Ontology Phase 3	Grounding an AI on the graph (GraphRAG = graph-based Retrieval-Augmented Generation) means it answers from facts retrieved along typed edges and cites them, rather than inventing plausible-sounding text — discovery-side, not release decisions	Mostly piloted
GMP manufacturing-floor semantics	PAS-X and the PI Asset Framework, with BioPhorum's OD-probe PoC the kind of pilot under way (PAS-X is a manufacturing-execution system, the PI Asset Framework an industrial data historian, BioPhorum an industry consortium, an OD probe an optical-density cell-growth sensor, and a PoC a proof-of-concept pilot)	The execution floor still runs on closed structured models and statistics; formal ontology here is pilots, not production	Not yet — piloted

Read the gradient honestly about this book, too. The fourth row — lineage, impact, and cross-lifecycle queries — is the one our own bp: graph fills, and its maturity tag is deliberate: R&D-side, proven in tested code, not a GMP-floor deployment. The running example is a teaching dataset that runs on a laptop and passes its own 23 acceptance tests; that is real evidence the engine works, and it is not evidence that a regulated plant has put a reasoner in its release path.

The standards are real and converging: BFO is an ISO/IEC standard (ratified by the international standards bodies), the OBO Foundry — the Open Biological and Biomedical Ontologies Foundry, the long-running community that coordinates the biomedical ontologies (the "OBO side" the manufacturing "IOF side" must be stitched to) — has governed interoperable biomedical ontologies for nearly two decades, the IOF and its biopharma council bring the same discipline to manufacturing, and FAIR is a measurable target with published metrics [1][2][3]. The industry survey of Part VIII puts both the convergence and its unevenness in sharp relief: the genuinely production-grade semantics today live in analytical-lab data (AFO and the Allotrope Simple Model) and in mandated regulatory identification (IDMP and SPOR, UNII, SPL), while the manufacturing-process ontologies remain pilots and the GMP floor still runs on structured and statistical models — and the loudest new driver, grounding AI on the graph, only raises the stakes on the very discipline this chapter is about. That real-world split is this verdict's thesis, already happening: the structure is standardized and arriving; the substance — correct, governed, FAIR-in-fact data — is still the unfinished, human work. The open-source book proves the engine runs on a laptop. 
The running example used throughout this book is itself a loadable dataset, and its validator reports both halves of the verdict in one breath. The transitive derivedFrom property path walks the full eleven-ancestor material-derivation lineage from the drug substance back through every pool (collected purified fractions), harvest (the cell-broth output of the bioreactor), and seed culture (the smaller cultures grown to expand the cells before the production reactor) to the research cell bank (the earliest qualified cell stock), each material the output of a separate unit-operation process. The OWL-RL reasoner additionally materializes the long-range endpoints, every query answers, and the SHACL gate honestly refuses to conform because an out-of-spec sibling lot really is out of spec:

[1] parsed 2120 triples (bioproc + align + instances)
[2] reasoned: 2120 -> 7137 triples after OWL-RL closure
[3] competency questions (ORSD v1.0.0 acceptance tests):

      CQ     GROUP           RESULT DETAIL
      -----  --------------  -----  ----------------------------------------
      CQ-01  lineage         PASS   11 row(s)
      CQ-03  lineage         PASS   row {'batch': 'BATCH-2026-001', 'monomer': 98.611} present
      CQ-04  impact          PASS   affected = ['DP-001', 'DP-002']
      CQ-08  release         PASS   DS-001 release panel complete and in spec
      CQ-11  release         PASS   OOS ['DP-004', 'DS-004'] on path ['hmwPct']
      CQ-12  viral           PASS   sum(lrv) = 8.7 over 2 step(s)
      CQ-21  structural      PASS   row {'run': 'CCP-001', 'vessel': 'BR-101', 'vesselType': 'ProductionBioreactor'} present
      CQ-22  structural      PASS   transitive lineage + equipment-is-material inferred
      CQ-23  structural      PASS   Batch-as-process and Batch-as-bioreactor both caught
      ...  (23 competency questions in all)

      23/23 competency questions PASS

ALL CHECKS PASSED

The growth from 2120 to 7137 triples (each triple is one subject–predicate–object fact, the atomic statement of this whole field) is the OWL 2 RL closure: the reasoner running its rules until no new facts can be derived, then storing all of them. That closure does the structural work the chapter credits it with (CQ-22: the transitive derivedFrom walk reaches all 11 ancestors back to the research cell bank, and equipment is inferred to be a BFO material entity), and the choice of profile is a verdict in miniature. RL (the Rule Language profile of OWL, the same RL as the OWL-RL reasoner named above) is a tractable profile that materializes entailments by forward-chaining rules (applying each rule to the known facts to derive new ones, repeatedly, until nothing new appears) in polynomial time (the work grows manageably with the size of the data, so the run always finishes), which is why this runs on a laptop. But it reasons under the open-world assumption, the logical default that anything not stated is merely unknown, not false, and so it will never conclude that a required result is missing: absence is not falsehood to it. Put plainly, if a required release result were simply never recorded, the open-world reasoner shrugs and says "maybe it exists, I just have not been told," so it can never flag the gap.
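Forward chaining to a fixpoint is easy to see on the one rule that matters most here, transitivity. The sketch below is a toy, not an OWL 2 RL reasoner: it applies a single rule to hand-written triples, and the node names (`DS`, `POOL`, `HARVEST`, `SEED`) and the function `transitive_closure` are assumptions made for illustration.

```python
def transitive_closure(triples, prop):
    """Forward-chain the transitivity rule for `prop` until a fixpoint."""
    closed = set(triples)
    while True:
        pairs = [(s, o) for s, p, o in closed if p == prop]
        # Rule: (a prop b) and (b prop d) entail (a prop d).
        new = {
            (a, prop, d)
            for a, b in pairs
            for c, d in pairs
            if b == c and (a, prop, d) not in closed
        }
        if not new:          # nothing new can be derived: fixpoint reached
            return closed
        closed |= new

asserted = {
    ("DS", "derivedFrom", "POOL"),
    ("POOL", "derivedFrom", "HARVEST"),
    ("HARVEST", "derivedFrom", "SEED"),
}
closed = transitive_closure(asserted, "derivedFrom")

print(len(asserted), "->", len(closed))

# Open world: a triple that is not in the closure is unknown, not false.
# Nothing in this loop can ever report that a required fact is missing.
unknown = ("DS", "hasResult", "hmwPct") not in closed
```

The closure only ever adds facts. That monotonic growth is what makes it cheap and safe, and it is also why a missing result is invisible to it.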

That is exactly why CQ-11 is a SHACL check, not a reasoner one: the gate closes the world (treats anything not present as genuinely absent — "not in the file" means "not done," the assumption a completeness check needs) and isolates the out-of-spec finding to one path (hmwPct) on the two OOS lots, DP-004 and DS-004, because every other panel value on those lots is in spec — and CQ-23 shows both disjointness guards — rules stating two categories can share no member — catching their planted conflations: a deliberately seeded error that treats a Batch (a quantity of material) as if it were the process that made it, and another treating a Batch as if it were the bioreactor it was grown in. The guard fires because a thing cannot be both at once. Consistency is an open-world property a reasoner proves; completeness is a closed-world property only a shape can enforce; truth is neither — it is the substance a person must supply.
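A disjointness guard of the kind CQ-23 exercises can be sketched as a check that no individual carries two types declared to share no member. This is a simplified stand-in for what a reasoner does, under stated assumptions: the type names are reduced to bare strings, the individuals (`ex:batch-ok` and the two seeded errors) are hypothetical, and `disjointness_violations` is an invented helper.

```python
# Pairs of classes declared to share no member (simplified, hypothetical names).
DISJOINT_PAIRS = [("Batch", "Process"), ("Batch", "Bioreactor")]

def disjointness_violations(types_by_individual, disjoint_pairs=DISJOINT_PAIRS):
    """Return (individual, class_a, class_b) for every broken disjointness."""
    violations = []
    for individual, types in sorted(types_by_individual.items()):
        for a, b in disjoint_pairs:
            if a in types and b in types:
                violations.append((individual, a, b))
    return violations

# One clean individual and two deliberately seeded conflations.
typed = {
    "ex:batch-ok": {"Batch"},
    "ex:batch-as-process": {"Batch", "Process"},
    "ex:batch-as-vessel": {"Batch", "Bioreactor"},
}

for violation in disjointness_violations(typed):
    print(violation)
```

The guard catches a contradiction between two stated types. It has no way to notice an individual that carries one wrong type consistently, which is the correctness limit again.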

The viral and QbD (Quality by Design) rows make the same point from the bioprocess side. CQ-12's 8.7-log total (a log reduction is a power-of-ten removal; 8.7 logs means virus cut by roughly a factor of 10^8.7) is a defensible sum because its two barriers clear virus by orthogonal mechanisms, physically independent methods that cannot both fail the same way: low-pH inactivation, which destroys enveloped virus (those wrapped in a fatty membrane), and size-based nanofiltration, which physically sieves virus out, including small non-enveloped virus the low pH cannot touch. These are exactly the complementary, mechanistically distinct steps ICH Q5A (the viral-safety guideline that asks for clearance by independent modes so a sum may be claimed) calls for. Because the mechanisms are independent, the fraction of virus that survives both steps is the product of the fractions that survive each, so the reduction factors multiply and their logs add; the sum is a consequence of independence, not a coincidence of arithmetic.

And the affectsQuality edges that CQ-06 and CQ-07 surface are the ICH Q8 design-space relationship made queryable (ICH Q8 is the regulatory guideline on pharmaceutical development, and a design space is the proven-safe range of process settings). Feed rate and temperature are the critical process parameters (CPPs, the settings that must be controlled) whose ranges were shown, in development, to drive the monomer-content critical quality attribute (a CQA, a product property that must stay in spec, of the kind ICH Q6B governs for a biologic's release specification), so "which parameter drove this attribute?" becomes a one-line query rather than a buried entry in a development report. The OOS path the SHACL gate isolates (hmwPct on DP-004 and DS-004) is, in those same terms, an aggregate CQA breaching its ICH Q6B acceptance limit, the regulated specification a release decision turns on.
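The arithmetic behind a summed log reduction is short enough to check directly. In the sketch below only the 8.7 total comes from the validator output; the split into 4.5 and 4.2 logs per step is a hypothetical assumption chosen so the two values add up to it, and the step names are labels for the example.

```python
import math

# Hypothetical per-step log reduction values; only their 8.7 total is from the text.
steps = {"low_pH_inactivation": 4.5, "nanofiltration": 4.2}

total_lrv = sum(steps.values())

# Independent steps: the fraction surviving both is the product of the
# fractions surviving each, so reduction factors multiply and logs add.
surviving_fraction = math.prod(10 ** -lrv for lrv in steps.values())
reduction_factor = 1 / surviving_fraction

print(round(total_lrv, 1), f"{reduction_factor:.2e}")
```

The identity only holds when the steps are independent. If two steps relied on the same mechanism, a virus resistant to one would likely resist the other, and adding their logs would overstate the clearance.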

That is the verdict made concrete: the structure runs, and it is honest about its limits. What every honest practitioner reports is exactly this chapter's split: the technology is ready, and the wins are real where the discipline is real — which is why the determining factor in whether a plant's ontology is an asset in five years is not its choice of triplestore but its commitment to the human practices the right-hand column demands.

The next lens needs this one: the graph as ground truth for learning

There is one more reason the right-hand column matters, and it is the bridge to the companion volume. The loudest 2026 reason to build a bioprocess ontology is to give a model something true to stand on — and that reframes, rather than relaxes, every limit above. A large language model (a system that produces fluent text by predicting plausible continuations — expert at form, indifferent to whether a specific claim is true) grounded by GraphRAG (graph-based retrieval-augmented generation: the model answers only from facts retrieved along the graph's typed edges and cites them, rather than inventing plausible text) inherits the graph's honesty and its limits. The same derivedFrom walk that answers CQ-04 for an investigator can ground a model's answer to "what was DS-004 derived from?" — but the answer is only as true as the identity reconciliation and the classification behind it, the very substance the right-hand column says a human must supply. A graph that confidently mislabels a sibling lot will ground a model that confidently repeats the mislabel, fluently and at scale, to everyone who asks. This is the chapter's thesis sharpened: AI does not let you skip correct classification, honest identity, and governed change — it punishes skipping them more severely than a spreadsheet ever did. The full argument is the frontier chapter; here it is enough to see that the next book's prerequisite is this book's right-hand column done well.

That dependence is concrete enough to change how a learning model must be built and validated, and the companion ML volume leans on three properties only a reasoned graph supplies. First, the graph fixes the unit of learning. Because lineage is explicit, a model that learns over these instances can be split the way it must be — not by random row but by batch, with a grouped / leave-one-batch-out cross-validation (holding out every row that shares a derivedFrom ancestor) so the score it reports is one it would earn on an unseen campaign, not one inflated by sibling samples of the same lot leaking across the split. The derivedFrom edges are the grouping key; a flat export leaves that grouping to a hopeful convention. Second, the reasoned graph is the trustworthy label. A SHACL-conformant subgraph — every required result present, single, typed, in range, and signed — is the only honest training set a model can have, because the gate that refuses a non-conformant release equally refuses a non-conformant retrieval: it certifies the subgraph is complete and well-typed rather than a partial load the model will cheerfully complete from training memory. Third comes a validation paradox worth naming. A fluent model is checked against held-out data, but the held-out data is only as honest as the graph it came from — and a model that quietly contradicts a reasoned graph (one whose owl:TransitiveProperty closure and SHACL shapes have already been machine-checked) is, in that contradiction, more likely wrong than the graph is, because the graph's answer was derived and certified while the model's was merely generated. The reasoned, shape-validated lineage is exactly the leak-free split and trustworthy label that bioprocess data — scarce, confounded, drift-prone — otherwise cannot supply.
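The grouped split described above can be sketched without any ML library. This is a minimal illustration under stated assumptions, not the companion volume's method: the lineage in `PARENT`, the sample rows, and the helper names are invented, and the grouping key is simply the top-most ancestor of a toy lineage. In a real graph, where every lot may trace back to one cell bank, the key would more plausibly be the batch-level ancestor.

```python
from collections import defaultdict

# Hypothetical lineage: lot -> parent lot (None at the top of this toy graph).
PARENT = {
    "DP-A1": "DS-A", "DP-A2": "DS-A", "DS-A": None,
    "DP-B1": "DS-B", "DS-B": None,
}
# Hypothetical training rows: (sample id, lot the sample was taken from).
ROWS = [("s1", "DP-A1"), ("s2", "DP-A2"), ("s3", "DS-A"),
        ("s4", "DP-B1"), ("s5", "DS-B")]

def root_of(lot, parent):
    """Follow derivedFrom-style parent links up to the grouping ancestor."""
    while parent[lot] is not None:
        lot = parent[lot]
    return lot

def leave_one_group_out(rows, parent):
    """Yield (held_out_group, train_samples, test_samples) per lineage group."""
    groups = defaultdict(list)
    for sample, lot in rows:
        groups[root_of(lot, parent)].append(sample)
    for held_out in sorted(groups):
        test = groups[held_out]
        train = [s for g in sorted(groups) if g != held_out for s in groups[g]]
        yield held_out, train, test

folds = list(leave_one_group_out(ROWS, PARENT))
for held_out, train, test in folds:
    print(held_out, train, test)
```

A random row split over the same data could put `s1` in training and `s2` in test even though both descend from the same substance lot; grouping by lineage makes that leak impossible by construction.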

Two boundaries close the loop. A graph feature handed to a model still carries an in-or-out-of-envelope question — is this lot's lineage the kind the model was validated on, or a configuration it has never seen? — and the typed graph answers it by construction: a retrieval that returns no conforming subgraph is the graph analogue of an applicability-domain flag (the out-of-distribution check that makes a soft sensor refuse rather than guess on unfamiliar ground), a refusal to answer in place of a confident error. And the substrate moves: as the governed-change machinery versions the ontology and the plant adds campaigns, a grounded model's behavior drifts with its ground truth, so the ontology's version belongs in the model's provenance and the retrieval layer needs the same monitored, change-controlled MLOps lifecycle a hybrid model or digital twin demands. Read that way, this book's whole right-hand column is the next book's risk register: every limit a human must close here is a failure mode a learning system would otherwise inherit, amplify, and narrate with perfect confidence.

Key terms

Structure versus substance — the unifying pattern: an ontology guarantees the structure of meaning (categories, compatibility, completeness, uniqueness) while humans must supply the substance (correct classification, true values, right matches, faithful stewardship).
The earned wins — interoperable structure, queryable lineage and impact, enforceable completeness, and shared self-describing meaning; the real value modeling delivers.
The remaining limits — correctness, identity reconciliation, the OBO–IOF seam, continuous-processing individuation, cross-organizational federation, and governance discipline; what the model leaves to people.
When to model — when questions are cross-system, recursive, or lineage-shaped, interoperability is the goal, or a regulated lifecycle demands durable traceability; not to maximize for its own sake.
The deepest unsolved part — the binding constraint is human discipline (authoring, reconciling, governing, not over-trusting), not the mature technology.
GraphRAG grounding — a fluent model answers only from facts retrieved along the graph's typed edges and cites them, so it inherits the graph's honesty and its limits; the right-hand column is the next book's risk register.
Grouped / leave-one-batch-out cross-validation — splitting a learning model's data by batch (holding out every row that shares a derivedFrom ancestor) so its reported score is one it would earn on an unseen campaign, not one inflated by sibling-lot leakage; the lineage edges are the grouping key.
The validation paradox — a model checked against held-out data is, when it contradicts a reasoned, shape-validated graph, the more likely wrong of the two, because the graph's answer was derived and certified while the model's was generated.

Where this leads

This completes Ontologies for Biopharmaceutical Manufacturing — the bioprocess walked end to end through the lens of meaning, with both its power and its limits laid bare. But meaning is only one of the two lenses this series branches into. The companion book, Machine Learning & AI for Biomanufacturing, walks the same process again through the lens of learning — soft sensors (software that infers a hard-to-measure quantity from easier signals), hybrid models (combining mechanistic equations with data-driven learning), and the validated AI first met in the data book's machine-learning chapter — and it needs exactly what this book built: a model that learns is only as trustworthy as the structured, FAIR, well-governed knowledge it learns from. Ontology gives the bioprocess a memory it can reason over; machine learning gives it a way to predict. The two lenses are the natural pair, and the next book takes up the second.
