An Honest Verdict: What Ontologies Solve, and What They Leave to People
📍 Where we are: Part VIII · The Verdict — Chapter 31, the last. We have modeled the bioprocess end to end and surveyed how the real industry uses it. This chapter steps all the way back and tells the truth about what we built — its real power, and its real limits.
This book made a single argument across thirty chapters: that a stored fact becomes knowledge only when its meaning is modeled, and that modeling the bioprocess as an ontology turns a heap of records into a navigable, queryable, trustworthy whole. The argument is true. It is also incomplete, and an honest book owes you the incompleteness as clearly as the promise. Every chapter ended with an "unsolved part" on purpose; this chapter gathers them, because the pattern across them is the real lesson — sharper than any single technique — about what ontologies are for and what they cannot do.
A great map transforms a journey: you can plan routes, see what connects to what, and find your way under pressure. But the map is not the territory — it cannot tell you the bridge washed out this morning, and a beautifully drawn map of the wrong valley will lead you confidently off a cliff. An ontology is a great map of a process. This chapter is honest about both halves: how much the map helps, and why you still need people who know the territory, check the map against it, and remember it is a map.
What this chapter covers
We separate, plainly, what ontologies genuinely solve from what they leave to people, and show that every limit shares one shape: the model guarantees structure, humans supply substance. We then give a sober answer to when modeling is worth it, and close the book by pointing at the branch it was always leading toward: the learning lens of the next book.
What ontologies genuinely solve
These are real, earned wins, not marketing. Take them seriously, because they justify the whole effort.
- Interoperability of structure. Anchoring every term to the BFO spine and the IOF mid-level means a class built by one team is structurally compatible with one built by another, with no bespoke adapter: the private-dialect trap, closed by construction.
- Queryable lineage and impact. Faithful derivedFrom edges and a transitive property turn "where did this come from?" and "what shares its fate?" into one-line queries instead of weeks of archaeology: the digital thread, genuinely realized.
- Enforceable completeness. A SHACL release gate checks, mechanically and tirelessly, that every required result is present, single, typed, in range, and signed, a guard against incompleteness no human checklist applies as reliably.
- Shared, self-describing meaning. Global IRIs and QUDT units mean a value never travels bare and a name never collides, the foundation that makes data FAIR.

These are not small. A plant that achieves them has turned its data from something it has into something it can use.
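The lineage win above can be sketched in a few lines of plain Python. This is a minimal stand-in for what a reasoner does with a transitive derivedFrom property, not the book's validator. The identifiers are taken from the running example, but the order of the chain and the `ancestors` helper are assumptions made for illustration.

```python
# Direct derivedFrom edges, child -> parent. The identifiers come from the
# running example; the ORDER of the chain is assumed for illustration.
CHAIN = [
    "DS-001", "VFpool-001", "POLpool-001", "VIpool-001", "PApool-001",
    "CLAR-001", "BATCH-2026-001", "SEED-001", "SEEDFLASK-001",
    "WCB-CHO-001", "MCB-CHO-001", "RCB-CHO-001",
]
DERIVED_FROM = {child: [parent] for child, parent in zip(CHAIN, CHAIN[1:])}


def ancestors(node, edges):
    """Return everything `node` transitively derives from (hypothetical helper)."""
    seen = set()
    stack = list(edges.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, []))
    return seen


lineage = ancestors("DS-001", DERIVED_FROM)
print(len(lineage), "ancestors")
print("reaches research cell bank:", "RCB-CHO-001" in lineage)
```

Only the direct edges are stored; the walk recovers every indirect ancestor, which is the work the transitive property hands to the reasoner.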
What ontologies leave to people
And here is the other column, gathered from every chapter's unsolved part — the things the model cannot do, however elegant.
- Correctness. A reasoner proves an ontology consistent, never correct (axioms); a SHACL gate proves a record complete, never true (release); an LRV is a validated claim, not a measurement (viral safety); RDF on the wire is compliant, not FAIR-in-fact (FAIR). A confidently mislabeled-but-plausible value passes every machine check.
- Identity reconciliation. Deciding that four systems' names denote the same real thing remains largely manual, and a wrong owl:sameAs propagates false facts silently. This is the unsolved half of identity, sharpest at the cell-bank root and the GS1 bridge.
- The OBO–IOF seam. The biomedical ontologies that describe the target and the manufacturing ontologies that describe making it are both BFO-grounded but not seamlessly joined; the crosswalk is yours to author.
- Continuous-processing individuation. The comfortable batch-and-unit-operation boundaries that make lineage clean dissolve when product flows continuously, with no settled ontology for time-bounded lots.
- Cross-organizational federation. The thread is ironclad inside the factory and a fragile federation beyond it, dependent on parties you cannot mandate.
- Governance discipline. The model stays true only through stewardship, a social commitment no technology supplies, and most ontology projects fail on governance, not on logic.
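The identity-reconciliation risk described above can be made concrete with a small sketch. It uses a plain union-find as a stand-in for how owl:sameAs merges individuals under reasoning. The system-local names, the facts, and the helper functions are all hypothetical, invented for illustration.

```python
# Stand-in for owl:sameAs semantics: names in one equivalence class share
# every fact asserted about any of them. All names and facts are hypothetical.
facts = {
    "lims:WCB-7": {("passedSterility", True)},
    "mes:CellBank-A": {("vialCount", 200)},
    "erp:MAT-0042": {("passedSterility", False)},  # a different real bank
}

parent = {name: name for name in facts}


def find(name):
    """Follow parent links to the representative of name's class."""
    while parent[name] != name:
        name = parent[name]
    return name


def same_as(a, b):
    """Assert a sameAs b by merging their equivalence classes."""
    parent[find(a)] = find(b)


def facts_about(name):
    """Collect the facts of every name merged with `name`."""
    root = find(name)
    merged = set()
    for other, other_facts in facts.items():
        if find(other) == root:
            merged |= other_facts
    return merged


same_as("lims:WCB-7", "mes:CellBank-A")    # a correct match
same_as("mes:CellBank-A", "erp:MAT-0042")  # a WRONG match

# No check fails, yet the bank now both passed and failed sterility.
print(sorted(facts_about("lims:WCB-7"), key=str))
```

Nothing in the merge step can tell the correct match from the wrong one; both are structurally valid assertions, which is why the decision stays with a person.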
The verdict in one view: the left column is real and earned, the right column is real and remaining — and every item on the right shares one shape, that the model guarantees structure while people must supply substance.
Original diagram by the authors, created with AI assistance.
The pattern under every limit: structure versus substance
Lay the two columns side by side and the unsolved parts stop looking like a scattered list of caveats and resolve into a single shape. In every case, the ontology guarantees structure; a human must supply substance. The model guarantees that a process is an occurrent and a purity is a quality — but not that you classified this harvest correctly. It guarantees a record is complete — but not that the number in it is true. It guarantees two terms are structurally compatible — but not that two teams chose the same one. It guarantees a name is globally unique — but not that you matched it to the right real thing. This is not a flaw to be patched in a future version; it is the nature of a formal model. A model is a structure for holding substance, and it can no more supply the substance than a filing system can write the files. Seeing this clearly is what separates using ontologies well from over-trusting them: you lean on the structure exactly as far as it reaches, and you keep human judgment, data integrity, and governance doing the work only they can do.
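The gap between complete and true can be shown with a toy release gate. This is a plain-Python stand-in, not SHACL and not the book's validator. The `gate` function and the record layout are hypothetical; the `hmwPct` limit of 2.0 and the 2.41 value come from the validator output in this chapter, and the mis-keyed record is invented to illustrate the point.

```python
# Toy release gate: checks presence, type, and range. It cannot check truth.
HMW_MAX = 2.0  # hmwPct upper limit, as reported in the validator output


def gate(record):
    """Return a list of violations; an empty list means the record conforms."""
    violations = []
    value = record.get("hmwPct")
    if value is None:
        violations.append("hmwPct missing")
    elif not isinstance(value, float):
        violations.append("hmwPct not a decimal")
    elif not 0.0 <= value <= HMW_MAX:
        violations.append("hmwPct out of range")
    if not record.get("signedBy"):
        violations.append("signature missing")
    return violations


honest = {"lot": "DS-004", "hmwPct": 2.41, "signedBy": "qa-reviewer"}
# Hypothetical: the same lot with the value mis-keyed as 1.41.
mis_keyed = {"lot": "DS-004", "hmwPct": 1.41, "signedBy": "qa-reviewer"}

print(gate(honest))     # the gate refuses the out-of-spec value
print(gate(mis_keyed))  # complete and in range, so it passes, yet it is false
```

The second record satisfies every structural rule the gate knows. Only data integrity upstream of the gate can catch it.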
When modeling is worth it — and when it is not
The honest cost-benefit follows from the pattern. Modeling as an ontology pays when the questions you need are cross-system, recursive, or lineage-shaped — what did this derive from, what shares its fate, which parameter drove this attribute — because those are exactly what a graph answers and a pile of records cannot. It pays when interoperability across systems, sites, or organizations is the goal, because shared upper ontologies are the only thing that delivers it without an adapter per pair. It pays when a regulated lifecycle demands traceability that survives decades and audits. It does not pay to model everything to the finest grain "because you can": over-axiomatization makes reasoning intractable, modeling every vial and every second of a run drowns the graph, and a model nobody governs rots into a liability. The discipline is to model at the granularity your real questions require, anchor to shared standards, govern what you build, and stop. An ontology is a powerful tool for a specific job, not a virtue to be maximized.
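The "what shares its fate" question is the kind of recursive query that justifies a graph. Here is a minimal sketch in plain Python, assuming a simplified derivedFrom graph. The identifiers DP-001, DP-002, DP-004, DS-001, DS-004, and WCB-CHO-001 appear in the running example; DS-002, the collapsed direct edges, and the helper names are assumptions for illustration.

```python
# Simplified derivedFrom edges, child -> parents. Intermediate unit
# operations are collapsed, and DS-002 is a hypothetical identifier.
DERIVED_FROM = {
    "DP-001": ["DS-001"],
    "DP-002": ["DS-002"],
    "DP-004": ["DS-004"],
    "DS-001": ["WCB-CHO-001"],
    "DS-002": ["WCB-CHO-001"],
    "DS-004": ["WCB-CHO-001"],
}


def ancestors(node):
    """Everything `node` transitively derives from."""
    found = set()
    for parent in DERIVED_FROM.get(node, []):
        found.add(parent)
        found |= ancestors(parent)
    return found


def shares_fate_with(lot, root, prefix="DP-"):
    """Other lots with the given prefix that also derive from `root`."""
    return sorted(
        other for other in DERIVED_FROM
        if other != lot and other.startswith(prefix)
        and root in ancestors(other)
    )


print(shares_fate_with("DP-004", "WCB-CHO-001"))
```

The query does not need to know how many steps separate a drug product from its cell bank, which is what makes the question awkward to answer over flat records.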
The unsolved part: the deepest one is human
If the book has one final unsolved part, it is this: the binding constraint on modeling the bioprocess as knowledge is not the technology — RDF, OWL, SHACL, BFO, IOF, QUDT, and LinkML are mature, standardized, and largely free — but the human discipline to author meaning correctly, reconcile identity honestly, govern the model faithfully, and resist both the temptation to over-model and the temptation to over-trust. Every technical limit in the right-hand column ultimately routes back to a person: someone classifies, someone matches, someone stewards, someone decides what is true. The field's genuine frontier is not a better triplestore; it is the organizational practice — the controlled vocabularies actually authored on the floor, the governance actually staffed, the cross-organizational trust actually built — that turns standards-compliance into FAIRness in fact. That is sobering and it is also clarifying: it means the path forward is known, and it is mostly a matter of doing the unglamorous work, well, for a long time.
Why it matters
A book that only sold the promise would mislead you into the exact over-trust that makes a graph a confident liar. The verdict matters because how you hold the tool determines whether it helps: lean on the structure it genuinely guarantees, and an ontology transforms a process's data into queryable, interoperable, enforceable knowledge; mistake structure for substance, and you will trust a complete-but-false record, a consistent-but-wrong model, a compliant-but-hollow graph. The whole book has been teaching one habit — model the structure rigorously, and keep human judgment, integrity, and governance supplying the substance — and this chapter names it outright so it outlasts the techniques.
In the real world
Across the Part VII survey, the way industry actually uses the outputs of ontology work sorts into six patterns on a sharp maturity gradient — production at the top, the GMP execution floor still empty at the bottom:
| How the output is used | Representative real example | What is consumed and done | Maturity |
|---|---|---|---|
| Analytical-lab data semantics | AFO and the Allotrope Simple Model; QUDT units inside the file | Instruments emit results that carry their own meaning — typed equipment, material, process, and result, units as IRIs — and vendors pipe ASM into knowledge graphs as AI-ready data | Production |
| Regulatory identification and master data | IDMP and SPOR, UNII, SPL; J&J's IDMP-O product master on Accurids | The governed record of what a product is, carried as machine-readable identifiers into agency submissions — the substrate behind a bp:DS-001 | Production |
| R&D and FAIR knowledge graphs | Roche EDIS, Boehringer, Novo Nordisk OBDM, Novartis data42, AstraZeneca BIKG | Find and reuse datasets, federate omics, IT, documents, and trials, run an inferencing graph over research data, and drive ML target identification | Production — R&D only |
| Lineage, impact, and cross-lifecycle queries | This book's loadable dataset and its validator; on real platforms the same derivedFrom path runs R&D-side as Foundry object links or Neo4j Cypher | "Where did this derive from", "what shares its fate", and "which parameter drove this attribute" as one-line graph queries, with a SHACL gate that refuses out-of-spec lots | R&D-side; proven in tested code |
| Grounding AI on the graph (GraphRAG) | Merck Synaptix, Bayer patient maps, Syngenta NOCTIS; Pistoia CMC Process Ontology Phase 3 | A model answers along typed edges and cites them rather than inventing — discovery-side, not release decisions | Mostly piloted |
| GMP manufacturing-floor semantics | Sanofi Modulus and the Pistoia Methods Hub | The execution floor still runs on closed structured models (PAS-X, PI Asset Framework) and statistics; formal ontology here is pilots, not production | Not yet — piloted |
The standards are real and converging: BFO is an ISO/IEC standard (ISO/IEC 21838-2), the OBO Foundry has governed interoperable biomedical ontologies for nearly two decades, the IOF and its biopharma council bring the same discipline to manufacturing, and FAIR is a measurable target with published metrics [1][2][3]. The industry survey of Part VII puts both the convergence and its unevenness in sharp relief. The genuinely production-grade semantics today live in analytical-lab data (AFO and the Allotrope Simple Model) and in mandated regulatory identification (IDMP and SPOR, UNII, SPL), while the manufacturing-process ontologies remain pilots and the GMP floor still runs on structured and statistical models. The loudest new driver, grounding AI on the graph, only raises the stakes on the very discipline this chapter is about. That real-world split is this verdict's thesis, already happening: the structure is standardized and arriving; the substance (correct, governed, FAIR-in-fact data) is still the unfinished, human work.
The open-source book shows that the engine runs on a laptop. The running example used throughout this book is itself a loadable dataset, and its validator reports both halves of the verdict in one breath: the reasoner closes the transitive lineage across the full eleven-step unit-operation chain from the drug substance back to the research cell bank, every query answers, and the SHACL gate honestly refuses to conform because an out-of-spec sibling lot really is out of spec:
[1] parsed 2100 triples (bioproc + align + instances)
[2] reasoned: 2100 -> 7089 triples after OWL-RL closure
transitive derivedFrom inferred DS-001 -> WCB-CHO-001: True
transitive derivedFrom inferred DS-001 -> RCB-CHO-001: True
equipment BR-101 inferred BFO material entity: True
[3] lineage walk from DS-001: 11 ancestors
BATCH-2026-001 CLAR-001 MCB-CHO-001 PApool-001 POLpool-001 RCB-CHO-001
SEED-001 SEEDFLASK-001 VFpool-001 VIpool-001 WCB-CHO-001
lineage+CQA (originating batch): {'BATCH-2026-001': 98.611}
impact of DP-004 (shared cell bank): ['DP-001', 'DP-002']
affectsQuality edges: [('FeedRate', 'MonomerPct-CQA'), ('Temperature', 'MonomerPct-CQA')]
run -> vessel (occursIn): [('CCP-001', 'BR-101', 'ProductionBioreactor')]
HMW trajectory along the chain: [('PApool-001', 4.1), ('POLpool-001', 1.4)]
orthogonal viral-clearance LRVs: [('VF-001', 4.2), ('VI-001', 4.5)] # total 8.7
[4] SHACL whole-graph conforms: False # DS-004/DP-004 fail hmwPct MaxInclusive (2.41 > 2.0)
violating focus nodes: ['DP-004', 'DS-004'] failing paths: ['hmwPct']
DS-001-only graph conforms: True
planted Batch-is-a-Process caught (conforms False): True
planted Batch-also-Bioreactor caught (conforms False): True
ALL CHECKS PASSED
The growth from 2100 to 7089 triples is the OWL-RL closure doing the structural work the chapter credits it with; the conforms: False is the gate doing exactly its job — and the violation is isolated to one path (hmwPct) on the two out-of-spec lots, because every other panel value on those lots is in spec. The batch appears once — typed only Batch, the vessel kept separate — and both disjointness guards catch their planted conflations. That is the verdict made concrete: the structure runs, and it is honest about its limits. What every honest practitioner reports is exactly this chapter's split: the technology is ready, and the wins are real where the discipline is real — which is why the determining factor in whether a plant's ontology is an asset in five years is not its choice of triplestore but its commitment to the human practices the right-hand column demands.
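The planted conflations in the output can be pictured with a small disjointness check. This is a plain-Python stand-in for what a disjointness guard lets a reasoner or validator detect, not the book's own code. The class names follow the output above in simplified form, while the `DISJOINT` table and the `conflations` helper are hypothetical.

```python
# Pairs of classes declared mutually exclusive (hypothetical table).
DISJOINT = [("Batch", "Process"), ("Batch", "Bioreactor")]


def conflations(types_by_node):
    """Return (node, class_a, class_b) for every node typed with both
    members of a disjoint pair."""
    found = []
    for node, types in sorted(types_by_node.items()):
        for a, b in DISJOINT:
            if a in types and b in types:
                found.append((node, a, b))
    return found


clean = {"BATCH-2026-001": {"Batch"}, "BR-101": {"Bioreactor"}}
planted = {"BATCH-2026-001": {"Batch", "Process"}, "BR-101": {"Bioreactor"}}

print(conflations(clean))
print(conflations(planted))
```

The guard catches a node that is typed two incompatible ways. It has nothing to say about a node typed one plausible but wrong way, which is the structure-versus-substance split again.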
Key terms
- Structure versus substance — the unifying pattern: an ontology guarantees the structure of meaning (categories, compatibility, completeness, uniqueness) while humans must supply the substance (correct classification, true values, right matches, faithful stewardship).
- The earned wins — interoperable structure, queryable lineage and impact, enforceable completeness, and shared self-describing meaning; the real value modeling delivers.
- The remaining limits — correctness, identity reconciliation, the OBO–IOF seam, continuous-processing individuation, cross-organizational federation, and governance discipline; what the model leaves to people.
- When to model — when questions are cross-system, recursive, or lineage-shaped, interoperability is the goal, or a regulated lifecycle demands durable traceability; not to maximize for its own sake.
- The deepest unsolved part — the binding constraint is human discipline (authoring, reconciling, governing, not over-trusting), not the mature technology.
Where this leads
This completes Ontologies for Biopharmaceutical Manufacturing — the bioprocess walked end to end through the lens of meaning, with both its power and its limits laid bare. But meaning is only one of the two lenses this series branches into. The companion book, Machine Learning and AI in Biopharmaceutical Manufacturing, walks the same process again through the lens of learning — soft sensors, hybrid models, and the validated AI first met in the data book's machine-learning chapter — and it needs exactly what this book built: a model that learns is only as trustworthy as the structured, FAIR, well-governed knowledge it learns from. Ontology gives the bioprocess a memory it can reason over; machine learning gives it a way to predict. The two lenses are the natural pair, and the next book takes up the second.