The Frontier: Ontologies as the Ground Truth for AI

📍 Where we are: Part VIII · Ontologies in Industry Today. The previous chapters surveyed who builds these models and why. This one looks at the newest reason anyone cares — and at the gap between the demo and the plant.

The newest reason the industry cares about ontologies has, at first glance, nothing to do with the patient identity and classification work of the last twenty-nine chapters — and on second glance, everything to do with it. That reason is artificial intelligence.

A large language model is a magnificent guesser. It predicts the next plausible word, and it does this so well that the output reads as understanding. But fluency is not knowledge, and the model does not know what is true about your plant.

Ask it whether DP-004 (a drug-product lot, meaning a defined batch of finished product) was released, that is, approved by Quality for distribution. The true answer is sitting in the graph, and it is no. The lot failed its release decision because one result was out of specification (a measured result outside its acceptance limit): high-molecular-weight aggregate (clumped antibody, an immunogenicity-risk attribute) measured 2.41 percent against a 2.0 percent limit. Its monomer purity (the intact-antibody fraction) passed, at 98.687 percent against a 95.0 percent floor, but a lot must meet every limit, so one failed attribute rejects it regardless of the rest. Ask the model alone, though, and it will compose a confident, grammatical, entirely invented answer with the same ease it composes the correct one. It has no way to tell the two apart, and neither, at a glance, do you.
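The release rule in that paragraph is small enough to state as code. What follows is a minimal sketch, not the book's validator: the Result class, the release_decision function, and the choice to treat a value exactly on its limit as passing are all assumptions made for illustration. Only the two attributes and their numbers come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One measured release attribute and its acceptance limit (hypothetical shape)."""
    attribute: str
    value: float
    limit: float
    kind: str  # "max": must not exceed the limit; "min": must not fall below it

    def passes(self) -> bool:
        # Assumption: a value sitting exactly on the limit counts as in specification.
        if self.kind == "max":
            return self.value <= self.limit
        return self.value >= self.limit


def release_decision(results):
    """A lot is released only if every attribute passes; return (released, failed names)."""
    failed = [r.attribute for r in results if not r.passes()]
    return (not failed, failed)


# The two DP-004 results quoted in the text.
dp004 = [
    Result("HMW aggregate (%)", 2.41, 2.0, "max"),
    Result("monomer purity (%)", 98.687, 95.0, "min"),
]
released, failed = release_decision(dp004)
```

The decision is a conjunction: the passing purity result never offsets the failing aggregate result, which is why released comes back false with a single name in failed.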

The model supplies fluency. Something else has to supply the truth.

That gap — between a system that always sounds right and a record of what is actually right — is the whole subject of this chapter. It is also, not coincidentally, the gap every prior chapter was quietly closing.

The something else that supplies the truth, the industry decided between 2024 and 2026, is the knowledge graph, and behind the graph, the ontology. This final survey chapter walks through that convergence, separates the production reality from the press release, and arrives at the book's own conclusion in a new key: AI does not relax the discipline these chapters preached. It raises the stakes on it.

The simple version

Imagine a brilliant improviser who can speak fluently on any subject but has never set foot in your factory. Left alone, he answers every question smoothly and is wrong half the time. Now hand him a binder of verified, cross-referenced facts about exactly this factory — really a web of things (the nodes) joined by named links (the edges), which is what a knowledge graph is — and require him to answer only from the binder. The fluency suddenly becomes useful, because it is anchored to truth. The improviser is the AI. The binder is the knowledge graph. The ontology is what keeps the binder honest, consistent, and connected.

What this chapter covers

We frame why generative AI needs ontologies at all — the structure-versus-substance argument the book has made all along, now restated for the age of the language model.

We define retrieval-augmented generation and its graph-native form, GraphRAG, the technique at the center of the 2024-2026 wave.

We then survey the real evidence: the Pistoia Alliance's push to make data "AI-ready," a vendor-led scientific-AI stack that names ontologies among its pillars, GraphRAG ecosystems at several large companies, the cleanest published knowledge graph in the field, and the lighter-weight analytical-data and modular-automation trends feeding all of it.

And we name the unsolved part plainly: the binding constraint on this whole frontier is not the model but the data underneath it.

Throughout, every adoption claim carries a maturity tag (production / piloted / proposed / academic), and every vendor number is marked as what it is. The field is loud right now, and a survey is only useful if it is quieter than its subject.

GraphRAG answers by traversing the verified bp: graph and citing the typed edges it walked, so the model narrates a true lineage rather than inventing one. Original diagram by the authors, created with AI assistance.

Why a fluent model still needs a true graph

A language model learns the shape of language from an enormous corpus. It does not learn the facts of a particular manufacturing campaign, and it has no internal mechanism for knowing when it is wrong.

The industry's answer is retrieval-augmented generation — a pattern in which, before the model writes its answer, a retrieval step pulls relevant verified facts from a trusted store and places them in front of the model. The model is then instructed to answer from those facts rather than from its training memory. The retrieved facts, not the model's recollection, become the source of truth for that one answer.

Plain RAG over a document store already helps: a model that quotes a retrieved batch record is harder to fool than one improvising from memory. But a document store retrieves passages, and a passage is only as connected as whoever wrote it made it. The lineage you need may be spread across four reports that never mention each other.

GraphRAG is the version where the trusted store is a knowledge graph (a web of facts in which each thing is a node and each typed link between two things — like bp:derivedFrom — is an edge). Instead of retrieving loose passages of text, the system retrieves connected facts — this lot bp:derivedFrom that cell bank (the frozen, qualified vial of cells a campaign starts from), this CPP (critical process parameter — a setting you control, such as feed rate) bp:affectsQuality that CQA (critical quality attribute — a measured release property, such as monomer purity) — and the model answers along the edges, following each named link from node to node. Those are the literal edge names in the companion graph: a reader can grep bp:affectsQuality and find the Quality-by-Design link asserted, not a bare word the prose made up.

The difference is not cosmetic. Loose text retrieval returns whatever sounds relevant; graph retrieval returns whatever is actually linked, with the link itself carrying meaning the model can read. A question about what a lot derived from is answered by traversing the lineage, not by hoping a paragraph mentioning both happens to exist.
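To make the contrast concrete, here is a minimal sketch of the retrieval half of GraphRAG over an in-memory list of triples. It is an illustration under stated assumptions, not the companion graph's tooling: retrieve_edges and build_grounded_prompt are hypothetical helper names, the node names bp:FeedRate and bp:MonomerPurity are invented stand-ins for a CPP and a CQA, and no model is called. The sketch stops at the grounded prompt a model would receive.

```python
# A toy edge list. The two edge names come from the text; bp:FeedRate and
# bp:MonomerPurity are hypothetical node names used only for illustration.
TRIPLES = [
    ("bp:DP-004", "bp:derivedFrom", "bp:DS-004"),
    ("bp:DS-004", "bp:derivedFrom", "bp:PApool-004"),
    ("bp:FeedRate", "bp:affectsQuality", "bp:MonomerPurity"),
]


def retrieve_edges(subject, triples):
    """Return the typed edges that actually leave `subject` (linked, not merely similar)."""
    return [t for t in triples if t[0] == subject]


def build_grounded_prompt(question, facts):
    """Put the retrieved edges in front of the model and restrict it to them."""
    fact_lines = "\n".join(f"{s} {p} {o} ." for s, p, o in facts)
    return (
        "Answer only from the facts below and cite each edge you use.\n"
        f"Facts:\n{fact_lines}\n"
        f"Question: {question}"
    )


facts = retrieve_edges("bp:DP-004", TRIPLES)
prompt = build_grounded_prompt("What was DP-004 derived from?", facts)
```

The design point is that the retrieval step filters on the subject and the edge, never on how similar a passage sounds, so an unrelated fact such as the feed-rate edge cannot drift into the context.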

And it is worth seeing that this is not a metaphor: the retrieval step is a query the book already wrote. The verified store answers "what was DP-004 derived from?" with one SPARQL property path over the typed edges — SPARQL being the standard query language for graphs of this kind. You do not need to read it to follow this: the one line that matters is the path bp:DP-004 (bp:derivedFrom)+ ?ancestor, which says walk every derivedFrom link out of DP-004.

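PREFIX bp: <https://example.org/bioproc#>
# Walk one or more bp:derivedFrom hops out of DP-004 and report each ancestor's type.
SELECT ?ancestor ?type WHERE {
  bp:DP-004 (bp:derivedFrom)+ ?ancestor .   # every upstream node, at any depth
  ?ancestor a ?type .                       # the class each ancestor belongs to
} ORDER BY ?ancestor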

The (bp:derivedFrom)+ is the whole trick. The + is a SPARQL property-path operator meaning one or more hops, so the single path walks the lineage to any depth: DP-004 → DS-004 → PApool-004 → BATCH-2026-004 here, but the identical query would walk a twenty-step chain. Declaring bp:derivedFrom an owl:TransitiveProperty (if A derives from B and B from C, then A derives from C) says the same thing at the logic layer, so a reasoner infers the very closure the path operator walks at query time. The model never computes the lineage; it receives those bound rows and narrates them. That is what "the graph does the knowing" means, stated as code rather than slogan.
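For readers who think more easily in a general-purpose language, the one-or-more-hops walk can be sketched in a few lines of Python. This is an illustration of what the path computes, not how a SPARQL engine implements it. The DERIVED_FROM dictionary and the ancestors function are hypothetical names; the four nodes are the chain named in the text.

```python
# The lineage chain from the text, stored as child -> list of direct parents.
DERIVED_FROM = {
    "bp:DP-004": ["bp:DS-004"],
    "bp:DS-004": ["bp:PApool-004"],
    "bp:PApool-004": ["bp:BATCH-2026-004"],
}


def ancestors(node, edges):
    """Collect every node reachable by one or more derivedFrom hops from `node`."""
    seen = []
    stack = list(edges.get(node, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against revisiting a node shared by two paths
        seen.append(current)
        stack.extend(edges.get(current, []))
    return seen


# Sorted to mirror the query's ORDER BY ?ancestor.
lineage = sorted(ancestors("bp:DP-004", DERIVED_FROM))
```

The function has no depth parameter, which is the point: a chain of three hops and a chain of twenty go through the same loop.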

This is the same digital-thread lineage walk the book built by hand earlier — byte-for-byte the (bp:derivedFrom)+ idiom of the lineage competency question, here pointed at bp:DP-004 instead of the drug substance — now placed at the service of a model that can phrase the answer for a human. The graph does the knowing; the model does the talking.

This is not a use the book stumbled into late. The very first chapter's requirements brief listed "grounding for retrieval-augmented AI" among the ontology's intended end-uses, alongside lineage traceback and release verification — so GraphRAG was a specified purpose of this model from before a single class was drawn, not a fashion bolted on at the end. And because that brief was made executable, the retrieval query above is not new code: it is the lineage competency question already in cq-catalog.json, run as a passing test by validate.py. A graph that grounds an AI answer is the same graph the harness already certifies — the proof harness that keeps every competency question green is, read in this light, the regression suite for the retrieval layer too.

This is precisely the structure-versus-substance distinction the book drew when it first separated what a thing is from what we say about it, now load-bearing for AI. The graph supplies substance: verified, identified, linked statements. The model supplies structure: fluent natural language over them.

The two roles are easy to confuse, and confusing them is the characteristic mistake of the moment. A model that sounds authoritative is supplying structure, not substance; it will narrate a false lineage as smoothly as a true one. Only the graph can make the lineage true, and only the ontology can make the graph mean what it claims to mean.

And the ontology is what makes the retrieved facts both trustworthy and connected — it is the reason the retrieval step can follow bp:derivedFrom from DP-004 back to its bioreactor batch and know that the path means something. Without the ontology underneath, GraphRAG retrieves a pile of strings. With it, GraphRAG retrieves knowledge.

A second consequence follows. A graph grounded in a shared upper ontology can answer questions its authors never anticipated, because the relationships are typed and composable rather than hard-coded into a report. That is exactly the property an AI agent needs when it is asked something off-script — and exactly the property a flat export of proprietary tables does not have.

The distinction matters for the survey below. Several of the cited efforts publish formal OWL/RDF ontologies; others organize data into structured proprietary schemas and call the result an ontology.

The two are not the same. A formal ontology commits to a logic, reuses public upper-level terms, and can be reasoned over; a proprietary schema is a private agreement about column names, useful but not interoperable by construction. A vendor's "AI-native schema" is not a BFO-grounded OWL model just because the marketing word is shared. The chapter keeps them apart, and where the public evidence does not let us tell which is meant, it says so.

The 2024-2026 evidence

The momentum is real, and it is mostly piloted. Regulated use under GMP (Good Manufacturing Practice, the regulated quality system every licensed medicine is made under) is still nascent, a distinction worth holding onto as we go through the cases. The reason that gate stays shut is not shyness; it is that a graph used in a quality decision becomes a regulated computerized system, and the bar is concrete. Any system that produces or holds GMP records must be validated under the risk-based GAMP 5 / CSA discipline. Any electronic record or signature it carries must meet 21 CFR Part 11 and EU GMP Annex 11: audit trail, version-pinned change control, and signer attribution. Both requirements are held to the ALCOA+ integrity expectations (data that is attributable, legible, contemporaneous, original, and accurate, plus complete, consistent, enduring, and available). A pilot retrieval graph clears none of that yet; that is the gap between a demo and a release-grade system, and the survey below sits squarely on the demo side of it.

Three things are easy to lose in the noise, so it is worth fixing them before the survey. A consortium announcement is a commitment, not a shipped system. A vendor benchmark is a claim the vendor chose to publish, not an independent measurement. And the marketing word "ontology" is applied to everything from a BFO-grounded OWL model to a tidy table of column names. Each case below is read with those three filters on. Several of these names recur in Enterprise Knowledge Graphs at Big Pharma and The Platforms; here they are read only through one lens — what, if anything, they give an AI to stand on.

Making data "AI-ready." The Pistoia Alliance announced Phase 3 of its CMC Process Ontology in October 2025, explicitly to make life-sciences data "AI-ready and interoperable from the lab through to manufacturing." The work builds on the ISA-88 and ISA-95 automation standards and is backed by Eli Lilly, Amgen, Merck & Co., AstraZeneca, GSK, and Johnson & Johnson.

The substantive integration work — spanning its CMC Process, Equipment, Material, Analytical, and IDMP ontologies — is scoped for 2026 and is therefore still forward-looking [1]. The IDMP substance identifier is, in our running model, what a real plant would put behind bp:DS-001. The announcement names ISA-88 and ISA-95 as its base; it does not claim to import BFO or IOF Core, so neither do we. (piloted)

The vendor stack. In November 2024, TetraScience and NVIDIA placed "sophisticated scientific ontologies" at the center of a four-pillar scientific-AI stack — compute, models, use-case expertise, and ontologies — integrating CUDA-X and BioNeMo. The companies said an accompanying Lead Clone Assistant could cut lead-clone selection time by up to 80 percent.

In October 2025 the same vendor launched SAIL (Scientific AI Lighthouse) with Takeda as founding partner, aiming to reduce CMC cycle times by organizing data into "AI-native schemas, taxonomies, and ontologies" [2].

The 80 percent and the cycle-time improvement are vendor press-release figures — self-reported, unaudited; read them as claims, not measurements. Note too that "schemas, taxonomies, and ontologies" spans a wide rigor range, and the public material does not establish a formal OWL/RDF ontology under the marketing word. (piloted)

GraphRAG in the wild. Neo4j's GraphTalk Pharma and Life Sciences 2025 recap surfaced several graph ecosystems aimed at agentic generative AI: Merck Group's Synaptix, connecting pre-clinical, clinical, and regulatory knowledge; Bayer's patient maps; and Syngenta's NOCTIS, an open-source reaction-to-knowledge-graph toolkit [3].

The recap does not call any of these "production," and GraphRAG for regulated GMP or quality contexts is not found in public evidence we surveyed. These are research and discovery deployments, not release-decision systems — close to the patient's problem, not yet inside the plant's. The enterprise-graph survey places these same systems on its R&D side of the GMP boundary; this chapter only asks what they ground a model on. (piloted)

The rigorous exemplar. The cleanest published account of doing this properly is Novo Nordisk's Ontology-Based Data Management (OBDM), described in the Journal of Biomedical Semantics in 2025: a knowledge graph in production use that reuses public ontologies — AFO, OBI, ChEBI, and BFO — rather than inventing a private vocabulary [4].

It is the field's best peer-described evidence that an ontology-grounded graph can be run in earnest rather than demonstrated once.

It is also the only case here that is both formally ontology-based and reported in the open literature, which is why it carries more evidentiary weight than the louder announcements around it. The reuse is the tell: a team that adopts AFO, OBI, ChEBI, and BFO instead of minting its own vocabulary is doing the interoperability work this book has argued for, not just the storage work. (production)

The data feeding it. Two infrastructure trends quietly make the rest possible.

In analytical data, the 2024-2025 shift away from the heavyweight Allotrope Data Format toward the lightweight Allotrope Simple Model (ASM) — with vendors building ASM-to-knowledge-graph workflows — is the dominant trend producing AI-ready analytical results [5]. A simpler on-the-wire format that more instruments actually emit is worth more to a graph than a richer one that few systems implement. (production)

In automation, the first Module Type Package V2.0 plugfest ran in 2025 with multiple providers, and the formal MTP 2.0 specification was published around early 2026. Together they form the modular substrate a future AI-orchestrated plant would ride on [6]. A pharma-specific "MTP+" extension is sometimes mentioned but is not found in public evidence as a named, released deliverable. (piloted)

Case	What it is	Maturity
Pistoia CMC Process Ontology Phase 3	"AI-ready" CMC vocabulary on ISA-88 / ISA-95	(piloted)
TetraScience + NVIDIA / SAIL	ontologies named as a pillar of a scientific-AI stack	(piloted)
Merck Group Synaptix / Bayer patient maps / Syngenta NOCTIS	GraphRAG for discovery and regulatory knowledge	(piloted)
Novo Nordisk OBDM	published, reuse-based ontology graph in production use	(production)
Allotrope Simple Model	lightweight analytical data piped to the graph	(production)
MTP V2.0	modular automation substrate	(piloted)

Read the table as a gradient, not a verdict. The two production rows are infrastructure — a data format and a single published graph — while the headline AI applications above them sit at pilot.

And the regulated release decision, the place this whole book has aimed, appears in none of them. The frontier is genuinely moving; it has not yet arrived at the gate. That is not a criticism of the work — pilots are how serious things start — but it is the honest shape of 2026, and a survey that rounded the pilots up to production would be doing exactly the thing this chapter warns against.

The unsolved part: the model is not the bottleneck — the data is

Read across the digital-twin and manufacturing-AI literature and the same obstacle keeps surfacing, and it is not the algorithms. The barrier named again and again is semantic and FAIR (Findable, Accessible, Interoperable, Reusable) data standardization: ontology-grounded, interoperable, well-governed data is the constraint, not the modeling [7].

This "the modeling is solved, the data is not" thesis leans heavily on a single review and is asserted broadly across the field; read it as a strongly held industry opinion, not as settled fact. One could argue the modeling is far from finished — continuous processing and cross-organizational federation, the book's own open seams, are evidence enough.

But on the narrower point the thesis is hard to dispute: no amount of model cleverness rescues an input that is ungoverned, unidentified, and semantically mute.

Concretely, the difference between mute and meaningful is the difference between a historian tag emitting a bare float — BR101.Feed.PV = 0.40, a plant data-recorder reading with no units or context (PV is the process value, the live measured number) — and the same reading landing in the graph as a unit-bearing, identified quantity: a feed rate of 0.40 vessel-volumes per day (UCUM /d, a standard machine-readable unit code), realized on the production phase of BATCH-2026-001 within its normal operating range, against a culture-temperature setpoint of 36.5 degC (NOR 36.0-37.0, its normal operating range). The number is identical; the retrievability is not. A model asked whether a feed deviation could explain an aggregate excursion can reason over the second form and cannot even parse the first. The AI-readiness work the survey names is precisely this lifting of bare process values into typed, unit-carrying, lineage-anchored statements.
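The lifting step described above can be sketched as a lookup that attaches unit, batch, and phase to a bare reading. This is a minimal illustration under stated assumptions: the Quantity class, the TAG_CONTEXT dictionary, the lift function, and the setpoint tag name BR101.Temp.SP are hypothetical. The feed-rate range is left unset because the text gives no numbers for it, while the temperature range is the 36.0 to 37.0 quoted above. "Cel" is the UCUM code for degrees Celsius.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quantity:
    """A reading that carries its own meaning (hypothetical shape)."""
    parameter: str
    value: float
    unit: str    # UCUM code
    batch: str
    phase: str
    nor_low: Optional[float] = None
    nor_high: Optional[float] = None

    def within_nor(self) -> Optional[bool]:
        # Returns None when no normal operating range is known, rather than guessing.
        if self.nor_low is None or self.nor_high is None:
            return None
        return self.nor_low <= self.value <= self.nor_high


# Hypothetical tag dictionary: the context a historian tag does not carry itself.
TAG_CONTEXT = {
    "BR101.Feed.PV": {"parameter": "feed rate", "unit": "/d",
                      "batch": "bp:BATCH-2026-001", "phase": "production"},
    "BR101.Temp.SP": {"parameter": "culture temperature setpoint", "unit": "Cel",
                      "batch": "bp:BATCH-2026-001", "phase": "production",
                      "nor_low": 36.0, "nor_high": 37.0},
}


def lift(tag, raw_value, context=TAG_CONTEXT):
    """Turn a bare float into an identified, unit-bearing quantity, or refuse."""
    if tag not in context:
        raise KeyError(f"no governed context for tag {tag!r}")
    return Quantity(value=raw_value, **context[tag])


feed = lift("BR101.Feed.PV", 0.40)
temp = lift("BR101.Temp.SP", 36.5)
```

An unknown tag raises instead of passing the float through, which keeps a semantically mute number from reaching the graph by default.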

But the implication of that narrower point is exactly the book's own. AI does not let you skip the work of correct classification, honest identity, and governed change. It punishes skipping that work more severely than ever.

A spreadsheet with a wrong cell sits there inertly until someone reads it. A model that learns from a hollow or mislabeled graph does not fail quietly. It invents an answer with confidence, fluently, and at scale — and a confident, fluent, wrong answer is the most expensive kind, because it is the one no one stops to check.

This is also where the book's existing machinery earns its second life. SHACL is a constraint language that checks whether a graph conforms to a set of required shapes (rules about what every node must carry) before the graph is trusted. The SHACL release gate (shapes.ttl) was built to refuse a non-conformant release; in an AI setting the same shapes refuse a non-conformant retrieval. Before a subgraph is handed to a model, conformance is what certifies that it is complete and well-typed (every lot has its bp:derivedFrom parent, every CQA its value, every signature its signer) rather than a partial load the model will cheerfully complete from training memory. A graph that fails its shapes is exactly the hollow or mislabeled graph above; SHACL is how you catch the gap before the model papers over it, fluently, in front of a customer.
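The gate logic can be sketched without a SHACL engine. The block below is a plain-Python stand-in for the three checks named above, not SHACL itself and not the contents of shapes.ttl. The type names bp:Lot, bp:CQAResult, and bp:Signature, the property names bp:hasValue and bp:signedBy, and the node identifiers are hypothetical; only bp:derivedFrom is taken from the text.

```python
# Required properties per node type: a hand-written stand-in for SHACL shapes.
REQUIRED = {
    "bp:Lot": ["bp:derivedFrom"],
    "bp:CQAResult": ["bp:hasValue"],
    "bp:Signature": ["bp:signedBy"],
}


def violations(nodes):
    """List (node, missing property) pairs for every unmet requirement."""
    found = []
    for node_id, node in nodes.items():
        for prop in REQUIRED.get(node.get("type"), []):
            if node.get(prop) is None:
                found.append((node_id, prop))
    return found


def conforms(nodes):
    """A subgraph may be handed to a model only if nothing is missing."""
    return not violations(nodes)


complete = {
    "bp:DP-004": {"type": "bp:Lot", "bp:derivedFrom": "bp:DS-004"},
    "bp:DP-004-HMW": {"type": "bp:CQAResult", "bp:hasValue": 2.41},
    "bp:DP-004-sig": {"type": "bp:Signature", "bp:signedBy": "bp:QA-reviewer"},
}

# The same subgraph after a partial load dropped the lot's parent edge.
partial = dict(complete)
partial["bp:DP-004"] = {"type": "bp:Lot"}
```

Calling violations(partial) names the exact node and property that are missing, so the refusal is itself a traceable record rather than a silent failure.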

That SHACL-screened subgraph is also the only honest training set a model that learns over these instances can have, and it changes how such a model must be validated. The companion ML volume's model-validation chapter insists that a model is trustworthy not because its accuracy is high but because it was validated, locked, and documented against a pre-stated acceptance criterion. A graph supplies two things that discipline needs.

First, it fixes the unit of learning. Because lineage is explicit, a model's data can be split the way it must be: not by random rows but by batch, with a grouped / leave-one-batch-out cross-validation that holds out every row sharing a bp:derivedFrom ancestor. The score the model reports is then one it would actually earn on an unseen campaign, rather than one inflated by sibling samples of the same lot leaking across the split. The graph makes that grouping mechanical, because the derivedFrom edges are the grouping key, where a flat table leaves it to a hopeful convention.

Second, it supplies a validation paradox worth naming: a fluent model is checked against held-out data, but the held-out data is itself only as honest as the graph it came from. A model that quietly contradicts a reasoned graph (one whose owl:TransitiveProperty closure and SHACL shapes have already been machine-checked) is, in that contradiction, more likely wrong than the graph is, because the graph's answer was derived and certified while the model's was generated.

The graph is the ground truth the ML chapters keep reaching for: an ontology-reasoned, shape-validated lineage is exactly the trustworthy label and leak-free split that bioprocess data, scarce and confounded, otherwise cannot supply.
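The batch-wise split described above can be sketched with the lineage edges as the grouping key. This is an illustration, not the ML volume's procedure. It assumes each node has at most one bp:derivedFrom parent and that the lineage has no cycles. The root_ancestor and leave_one_batch_out functions are hypothetical names, and the shortened 001 chain, including the node bp:DP-001, is invented for the example.

```python
# child -> single direct parent. The 004 chain is from the text; the 001 chain
# is a shortened, hypothetical lineage used only to give the split a second batch.
PARENT = {
    "bp:DP-004": "bp:DS-004",
    "bp:DS-004": "bp:PApool-004",
    "bp:PApool-004": "bp:BATCH-2026-004",
    "bp:DP-001": "bp:DS-001",
    "bp:DS-001": "bp:BATCH-2026-001",
}


def root_ancestor(node, parent):
    """Follow derivedFrom edges up to the batch a sample ultimately came from."""
    while node in parent:
        node = parent[node]
    return node


def leave_one_batch_out(samples, parent):
    """Yield (held-out batch, train indices, test indices), one fold per batch."""
    groups = {}
    for index, sample in enumerate(samples):
        groups.setdefault(root_ancestor(sample, parent), []).append(index)
    for batch in sorted(groups):
        test = groups[batch]
        train = [i for other, idx in groups.items() if other != batch for i in idx]
        yield batch, train, test


SAMPLES = ["bp:DP-004", "bp:DS-004", "bp:DP-001", "bp:DS-001"]
folds = list(leave_one_batch_out(SAMPLES, PARENT))
```

Because siblings resolve to the same root, they always land on the same side of a fold, which is the leak a random row split cannot rule out.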

And the ontology does for the retrieval boundary what an applicability-domain gate does for a soft sensor. A graph feature handed to a model still carries an in-or-out-of-envelope question — is this lot's lineage the kind the model was validated on, or a configuration it has never seen? — and the typed graph answers it by construction: a query that returns no conforming subgraph is the retrieval-time analogue of an out-of-distribution flag, a refusal to answer rather than a confident guess on unfamiliar ground. The same boundary governs upkeep. A graph that grounds a model is not built once; as the governed-change machinery versions the ontology and the plant adds campaigns, the grounded model's behavior drifts with its substrate, so the retrieval layer needs the same monitored, change-controlled lifecycle the ML book's MLOps discipline demands of any deployed model — the ontology's version is part of the model's provenance, not a detail beneath it.
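The two ideas in that paragraph, abstaining on an empty retrieval and recording the ontology version as provenance, can be sketched together. This is one possible shape, not a prescribed design. The GroundedAnswer record, the answer_or_abstain function, the version strings, and the extra rule of abstaining when the ontology version is not one the model was validated against are all assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GroundedAnswer:
    """What gets logged for every question, answered or not (hypothetical shape)."""
    question: str
    facts: Tuple[tuple, ...]
    ontology_version: str
    abstained: bool


def answer_or_abstain(question, facts, ontology_version, validated_versions):
    """Pass facts to the model only when retrieval found some on validated ground."""
    in_domain = bool(facts) and ontology_version in validated_versions
    return GroundedAnswer(
        question=question,
        facts=tuple(facts) if in_domain else (),
        ontology_version=ontology_version,
        abstained=not in_domain,
    )


VALIDATED = {"1.4.0"}  # hypothetical version identifiers
edge = ("bp:DP-004", "bp:derivedFrom", "bp:DS-004")

grounded = answer_or_abstain("What was DP-004 derived from?", [edge], "1.4.0", VALIDATED)
empty = answer_or_abstain("What was DP-999 derived from?", [], "1.4.0", VALIDATED)
drifted = answer_or_abstain("What was DP-004 derived from?", [edge], "2.0.0", VALIDATED)
```

Every record carries the ontology version whether or not an answer was given, so a later change to the ontology can be traced to the answers it may have affected.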

The model emits an answer for every query, but the reasoned, SHACL-conformant graph is the ground truth it is checked against: matching predictions are accepted, while a prediction that contradicts the graph, or a query that returns no conforming subgraph, makes the system abstain rather than guess. Original diagram by the authors, created with AI assistance.

So the headline reverses the usual hype. The frontier is not a cleverer model. The models are already astonishing. The frontier is a graph good enough to deserve one — and that is a question of identity, classification, and governance, not of parameters.

Why it matters

The loudest reason in 2026 to build a bioprocess ontology is no longer compliance or integration — it is to give artificial intelligence something true to stand on.

That reframes every earlier chapter without contradicting one of them. The identity discipline, the SHACL gates, the governed change control: nothing about them changes. What changes is the consequence of getting them wrong, which used to be a confused report and is now a confident machine repeating the confusion to everyone who asks.

Correct classification, honest identity, governed change, FAIR-in-fact: these were never optional hygiene, and now they are the literal grounding substrate of the next generation of manufacturing intelligence.

It also explains why the survey above is so uneven. The press releases announce models and stacks; the production reality is a data format and one published graph. The order is telling. Substance is harder than structure, and it is the part the industry is still building. A plant that does the unglamorous identity-and-governance work first will have something for a model to stand on; a plant that buys the model first will have a very fluent liar.

A model is only ever as trustworthy as the structured, well-governed knowledge it is anchored to — and that knowledge is what this entire book has been about building.

Key terms

Generative AI / large language model — a system that produces fluent natural language by predicting plausible continuations; expert at form, indifferent to whether a specific claim is true.
Hallucination — a model's confident production of a false but plausible-sounding statement, the failure mode that grounding is meant to prevent.
Retrieval-augmented generation (RAG) — a pattern that retrieves verified facts from a trusted store and requires the model to answer from them rather than from training memory.
GraphRAG — RAG whose trusted store is a knowledge graph, so retrieval follows typed relationships (such as bp:derivedFrom) and returns connected facts rather than loose text.
Grounding / ground truth — the verified, curated facts an AI system is anchored to; in this chapter, the knowledge graph and the ontology beneath it.
AI-ready data — data structured, identified, and governed well enough to serve as reliable input to AI; the explicit goal of the Pistoia Alliance's Phase 3 work.
Allotrope Simple Model (ASM) — a lightweight successor to the Allotrope Data Format for analytical results, increasingly piped into knowledge graphs.
OBDM (Ontology-Based Data Management) — Novo Nordisk's published, reuse-based knowledge graph in production use, the field's clearest peer-described exemplar.
Knowledge graph / node / edge — a store of facts as typed links — node–edge–node — that retrieval can walk, e.g. lot —derivedFrom→ batch.
RDF / OWL — RDF is the graph data model (subject–predicate–object triples); OWL is the logic layer on top of it that lets you declare properties like owl:TransitiveProperty and reason over them.
SPARQL — the standard query language for RDF graphs; (bp:derivedFrom)+ is one SPARQL property path.
SHACL — a constraint language that checks a graph conforms to required shapes (shapes.ttl) before it is trusted; the same gate that refuses a non-conformant release also refuses a non-conformant retrieval.
Grouped / leave-one-batch-out cross-validation — splitting a model's data by batch rather than by row, holding out every row that shares a bp:derivedFrom ancestor, so the reported score is not inflated by sibling samples of the same lot leaking across the split; the graph's lineage edges supply the grouping key.
Validation paradox — a fluent model is checked against held-out data, but a reasoned, shape-validated graph is more trustworthy than the model that contradicts it, because the graph's answer was derived and certified while the model's was generated.
Applicability domain (retrieval-time) — the in-or-out-of-envelope check on whether a queried lineage is the kind a grounded model was validated on; a query that returns no conforming subgraph is the graph analogue of an out-of-distribution flag.
BFO / IOF Core — public upper ontologies (shared top-level term sets) that a model reuses instead of inventing its own; reuse is what makes graphs interoperable.
FAIR — data that is Findable, Accessible, Interoperable, and Reusable.
GMP (Good Manufacturing Practice) — the regulated quality system every licensed medicine is made and released under.
CPP / CQA — critical process parameter (a setting you control, such as feed rate) / critical quality attribute (a measured release property, such as monomer purity).

Where this leads

That is the frontier — vivid, fast-moving, and honest only if we are honest about how much of it is still a slide rather than a shipped system.

The book has one task left: to add up what ontologies genuinely deliver against what they quietly leave undone. The next and final chapter, An Honest Verdict: What Ontologies Solve, and What They Leave to People, folds this survey's gradient (one production graph, a clutch of pilots, an empty GMP gate) into the book-wide ledger and settles that account. It then points beyond this book to its companion volume, Machine Learning & AI for Biomanufacturing, for which everything here has been the prerequisite, because a model that learns is only ever as trustworthy as the structured, FAIR, well-governed knowledge it learns from.
