The Platforms: How Vendors Sell Semantics

📍 Where we are: Part VII · Ontologies in Industry Today — Chapter 26. The standards and vocabularies are real, and we have walked them end to end. Now we ask the practical question a plant actually faces: when you buy software, what do you get?

The previous chapters established that the vocabularies exist: BFO at the top, IOF Core in the middle, domain terms for substances and processes below. But a standard is not a product. A plant does not download an ontology and run on it; it buys platforms, and those platforms arrive with the word "semantics" printed on the box. Every vendor with a slide deck has discovered that "knowledge graph" sells, and so the phrase now adorns products that mean wildly different things by it.

The question this chapter resolves is whether what is inside the box is a formal, governed ontology you can import and reason over, or a proprietary data model with a graph drawn on the lid. That distinction is not pedantry.

Two products can both say knowledge graph on the same slide and mean entirely different things — one an OWL/RDF graph you can export, align, and reason over; the other a closed object model whose "graph" is a query convenience. Telling them apart, before you sign, is the difference between a model you own and a model you will reverse-engineer later, after the integration budget is already spent. The survey that follows is organized around that single discrimination, applied honestly to every product it names.

The simple version

Imagine furniture shopping. Some stores sell you lumber and standardized joinery: take it home, combine it with wood from any other store, and build whatever you want. Others sell you a beautiful cabinet that opens only with their key and fits only their other cabinets. Both stores say "modular." Only one of them lets you walk out with parts that work anywhere. Buying bioprocess software is the same — some vendors hand you an open, portable model; others hand you a gorgeous closed box. The cabinet may be exactly what you need. But you should know which one you are buying.

What this chapter covers

We survey the commercial landscape across four layers and apply one test throughout: is this a formal OWL/RDF ontology — a model with explicit, machine-checkable semantics you can export and align — or a proprietary structured data model wearing the knowledge graph label as branding?

We look at lab-informatics platforms, where the semantics are genuinely formal; at manufacturing-execution and historian systems, where they are rich but closed; at the "digital twin of the organization" framing and the graph databases beneath enterprise semantic layers, where the category splits down the middle; and finally at the unglamorous bridge nobody advertises — the mapping machinery that turns relational and historian data into triples.

Throughout, we tag each claim by maturity (production/piloted/proposed/academic) and flag vendor marketing as exactly that. We are careful, in particular, not to conflate a formal OWL/RDF ontology with a structured or proprietary data model, because that conflation is precisely what the marketing invites and the whole chapter exists to undo.

Four side-by-side tier columns showing where one batch's artifacts live. Green: Lab informatics (formal OWL/RDF ontology; TetraScience, Benchling, Revvity, Sapio, Scitara; bp). Amber: MES / batch (proprietary structured model; Körber PAS-X, Siemens Opcenter; bp). Cyan: Historian (object / event-frame model; AVEVA PI System and PI Asset Framework, formerly OSIsoft; time-series of bp). Violet: Enterprise graph (mixed RDF and object models, mostly R&D; Palantir Foundry, Stardog, Ontotext, Neo4j; bp). The amber MES and cyan Historian columns drop indigo arrows into a bottom band reading "R2RML / RML mapping, the bridge nobody advertises". Color indicates what is inside the box: green is a formal ontology, amber and cyan are proprietary models, and violet is a mixed tier; only the closed MES and historian tiers need the R2RML/RML mapping bridge to reach the graph. Original diagram by the authors, created with AI assistance.

Lab informatics: where the semantics are real

The scientific-data and laboratory-informatics layer is where genuinely semantic tooling has its firmest footing — because it sits closest to R&D, where heterogeneous instrument data and external vocabularies have always been the problem.

A chromatography system, a plate reader, and a mass spectrometer each speak their own dialect; the only way to ask one question across all three is to map them onto a shared model. That pressure — felt daily, by scientists who cannot wait for a standards committee — has produced real ontology integration in this tier, not just the label.

TetraScience runs the Tetra Scientific Data Cloud, which transforms raw instrument output into an open Intermediate Data Schema (IDS), maps that schema to the Allotrope Simple Model (ASM), and integrates SciBite's CENtree ontology manager to supply controlled terminology. The architecture is exactly the layered one the standards chapters described: a normalization schema underneath, an interchange model above it, and a governed vocabulary alongside.

One reported FAIRification effort, running about two and a half years, aimed to convert data from more than 6,000 instruments into ASM, then index and serve it through a knowledge graph (production) [1]. The customer is anonymized in the vendor factsheet as a "top 25 pharma"; a specific identity has circulated only via a conference program, so treat any named adopter here as inferred, not asserted.

Benchling approaches semantics at the point of capture rather than after the fact. Its Registry supports ontology-backed data capture, and the SciBite Ontology Entity Registry app aligns captured records to enterprise ontologies held in CENtree as scientists enter them — the BioAssay Ontology is the worked example. Capturing meaning at the keyboard, rather than reconciling it in a later cleanup, is the more durable design.

Benchling also maintains the open-source allotropy Python library that converts instrument output to ASM (production) [2]. The frequently quoted "more than half of the top 50 biopharma" figure is vendor marketing and should be read as such — a self-reported reach claim, not an audited count of governed-ontology deployments.

The remaining lab platforms cluster similarly. Revvity Signals One ships the ability to plug in public or custom ontologies plus semantic search (production) [3]. Sapio integrates Elsevier and SciBite content into an AI co-scientist, though its "living knowledge graph" phrasing is marketing rather than a claim about formal governance. Scitara DLX is a vendor-neutral laboratory-data integration platform compatible with ASM-JSON, AnIML, and SiLA.

Across all of them, named pharma end-customers are generally not disclosed — which is itself worth noting, because it means most adoption evidence at this layer is self-reported, and self-reported reach is not the same as governed-ontology depth.

What unites the credible end of this layer is a dependence on a small set of genuinely open anchors: ASM as the instrument-data interchange model, CENtree as the terminology manager, and the published allotropy library as the conversion path. Where a vendor leans on those, the claim of "semantic" tooling has something verifiable behind it; where the only evidence is the phrase on the slide, it does not.

The practical test for a buyer is whether the controlled terms and the schema mapping can be exported and reused outside the platform — because that, not the marketing, is what makes the data portable when the next tool arrives. By that test the lab tier is the most reassuring of the four, which is fitting, since it is also the one under the most genuine pressure to interoperate.

Manufacturing and historians: structured models, not ontologies

Drop from the lab to the plant floor and the picture changes sharply. The dominant execution and historian platforms encode rich semantics — but as proprietary structured data models, not OWL/RDF ontologies.

This is the single most important distinction in the chapter, because it is the one most often blurred, and the blur is where a procurement decision goes wrong. A model can be detailed, version-controlled, standards-aligned, and entirely correct, and still be closed — and "closed but correct" is the default state of the manufacturing tier, not an exception to be explained away.

Körber Werum PAS-X and Siemens Opcenter Execution Pharma carry batch meaning through Master Batch Record (MBR) recipe models, reusable building blocks, and version-controlled equipment management, integrating through the ISA-95 and ISA-88 standards (production) [4]. These are powerful and deeply deployed; the recipe in a PAS-X MBR knows the difference between a parameter and a result as surely as any OWL class would. But that knowledge lives inside a closed structured model, not in a formal ontology you can import and align with your own.

The distinction has a sharp edge in practice. A common secondhand framing attributes a formal "Equipment State Diagram" ontology artifact to PAS-X; that artifact was not found in surveyed vendor sources, which instead document version-controlled equipment status and lifecycle management.

The lesson is not that the framing is malicious but that it is the kind of upgrade — from "structured equipment model" to "ontology" — that happens easily on a slide and badly on a contract. The meaning is real; the formal, exportable ontology is simply not the form it takes.

AVEVA PI System (formerly OSIsoft) supplies the de facto contextualization layer over plant time-series through its PI Asset Framework (AF): equipment and process hierarchies, reusable templates, Asset Analytics, and Event Frames — time-bounded events that capture batches, shifts, or downtime windows (production) [5]. AF is an object/template plus event-frame model, widely deployed in regulated pharma, and decidedly not a formal semantic-web ontology.

When our running batch BATCH-2026-001 emits a time-series trace, PI AF is very likely where that trace acquires its equipment and process context — but that context lives in AF's model, not as triples in a graph you could query alongside bp:DS-001. The contextualization is genuine and valuable; it simply does not arrive in a form the rest of the semantic stack can read without translation.
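To make "time-bounded context" concrete, here is a minimal sketch in plain Python of what an event-frame-style record does to a raw trace. It is not the PI AF SDK and does not reproduce AF's actual object model; the class name `EventFrame`, the helper `samples_in_frame`, the equipment name `BR-101`, and every reading are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EventFrame:
    """A time-bounded event tied to a piece of equipment (illustrative only)."""
    name: str
    equipment: str
    start: datetime
    end: datetime


def samples_in_frame(samples, frame):
    """Keep the (timestamp, value) samples that fall inside the frame."""
    return [(ts, value) for ts, value in samples if frame.start <= ts <= frame.end]


# Hypothetical temperature readings; timestamps and values are invented.
trace = [
    (datetime(2026, 1, 5, 7, 59), 36.8),
    (datetime(2026, 1, 5, 8, 0), 37.0),
    (datetime(2026, 1, 5, 12, 0), 37.1),
    (datetime(2026, 1, 5, 20, 1), 25.0),
]

# The frame is what turns "some readings" into "readings for this batch".
batch_frame = EventFrame(
    name="BATCH-2026-001",
    equipment="BR-101",  # hypothetical bioreactor name
    start=datetime(2026, 1, 5, 8, 0),
    end=datetime(2026, 1, 5, 20, 0),
)

in_batch = samples_in_frame(trace, batch_frame)
print(len(in_batch))  # 2
```

The point of the sketch is the shape of the context, not the API: the batch identity and equipment live in an object beside the trace, so nothing here is yet a triple that a graph query could join against.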

The reason this layer holds its meaning in proprietary form is not vendor stubbornness but the nature of the floor it serves. A validated MES or historian changes slowly and under tight change control, because each change touches a system of record in a regulated process. An OWL ontology, by contrast, is built to be edited, aligned, and re-reasoned. The two cultures pull in opposite directions, and the manufacturing vendors have, rationally, chosen stability over openness.

The consequence for a semantic program is concrete: the richest, most trustworthy operational meaning in the building — what equipment ran which step of which batch, and when — sits in exactly the systems least willing to emit it as triples. Bridging that gap is not a modeling problem; it is a mapping problem, and we return to it below.

| Layer | Representative products | What it actually is | Maturity |
| --- | --- | --- | --- |
| Lab informatics | TetraScience, Benchling, Revvity | Formal ontology integration (ASM, CENtree) | production |
| MES / batch | Körber PAS-X, Siemens Opcenter | Proprietary structured recipe/equipment model | production |
| Historian | AVEVA PI Asset Framework | Object/template plus event-frame model | production |
| Enterprise graph | Palantir, Stardog, Ontotext, Neo4j | Mixed: true RDF graphs and proprietary object models | production (mostly R&D) |

The "digital twin of the organization" and the graph databases

One tier up sits the enterprise graph, and here the OWL/RDF-versus-proprietary line runs straight through the middle of the category. The same word, "ontology," names both an exportable RDF artifact and a closed object model — and the marketing rarely volunteers which one you are looking at.

Palantir Foundry calls its core abstraction an "Ontology," binding datasets and models to real-world objects (plants, equipment, products, orders) through objects, properties, and links, plus actions and functions. Palantir frames it as "a digital twin of the organization," and over roughly 2023 to 2025 it became the backbone for the platform's AI agents (production) [6].

It is, on the evidence, an object model with graph semantics rather than a published OWL/RDF artifact; the term "Ontology" is doing branding work alongside its technical work, and the two senses should not be merged. No named pharma GxP-manufacturing Ontology customer was found in the public evidence; documented life-sciences adopters are adjacent to manufacturing rather than on the GMP floor. The popular "Factory then Line then Machine then Part" hierarchy comes from third-party analyses, not Palantir's own documentation, so it should not be cited as the product's own model.

The graph databases are where formal semantics return in force, and almost entirely in R&D.

Stardog names Boehringer Ingelheim as its flagship: a semantic layer over roughly 90% of the company's R&D data, delivered through virtualization with no ETL; the 90% figure is vendor-reported (production for R&D) [7]. Ontotext GraphDB underpins AstraZeneca's LinkedLifeData usage and Roche's terminology stack, and the vendor also claims an AI-powered target-discovery solution at a "leading top 10 pharma"; that last deployment rests on the vendor's claim alone (production for AstraZeneca and Roche) [8]. Neo4j, a property-graph database rather than an RDF triplestore, backs AstraZeneca's biological knowledge graph and Novartis/NIBR graphs (production in R&D) [9].

Two distinctions matter inside this tier. The first is RDF versus property graph: Stardog and Ontotext are RDF/OWL stores whose contents are, in principle, exportable and alignable against the standards earlier chapters built on, whereas Neo4j's property-graph model is expressive and fast but not natively RDF, so its "knowledge graph" is a different artifact that needs a mapping to become interoperable in the standards sense.
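The first distinction can be sketched in a few lines. The example below is a plain-Python illustration, not Neo4j's or any vendor's export tooling: the dictionaries standing in for nodes and relationships, the `http://example.org/bp/` namespace, the `derivedFrom` relationship, and the `recordedBy` property are all assumptions made for the example. It shows why a property graph needs a deliberate mapping: nodes and relationships translate to triples readily, but a property attached to a relationship has no single-triple equivalent and needs an extra modeling decision such as reification or RDF-star.

```python
BP = "http://example.org/bp/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A property graph keeps properties on nodes AND on relationships.
nodes = {
    "n1": {"label": "Batch", "props": {"id": "BATCH-2026-001"}},
    "n2": {"label": "CellBank", "props": {"id": "WCB-CHO-001"}},
}
relationships = [
    {"start": "n1", "type": "derivedFrom", "end": "n2",
     "props": {"recordedBy": "mes"}},  # invented relationship property
]


def property_graph_to_triples(nodes, relationships):
    """Map nodes and relationships to (s, p, o) triples.

    Relationship properties are reported as dropped rather than mapped,
    because a plain triple has nowhere to hang them.
    """
    iri = {key: BP + node["props"]["id"] for key, node in nodes.items()}
    triples, dropped = [], []
    for key, node in nodes.items():
        triples.append((iri[key], RDF_TYPE, BP + node["label"]))
    for rel in relationships:
        triples.append((iri[rel["start"]], BP + rel["type"], iri[rel["end"]]))
        for name in rel["props"]:
            dropped.append((rel["type"], name))
    return triples, dropped


triples, dropped = property_graph_to_triples(nodes, relationships)
print(len(triples), dropped)
```

What ends up in `dropped` is the part of the mapping that has to be designed rather than generated, which is where the interoperability cost of a property-graph "knowledge graph" actually sits.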

The second distinction is where the deployments actually live. The pattern is unmistakable: the genuinely semantic deployments cluster in discovery and research, where the questions are exploratory and the data already heterogeneous — not on the GMP execution floor, where the systems are validated, the schemas are frozen, and change is expensive. A buyer evaluating one of these platforms for manufacturing is therefore extrapolating from R&D evidence, and should say so out loud rather than let an R&D case study stand in for a GMP one.

One batch, four boxes: where our running example actually lives

It helps to trace the book's own example through the landscape, because it shows the gap concretely rather than in the abstract.

Picture the artifacts the earlier chapters modeled — the working cell bank WCB-CHO-001, the production run BATCH-2026-001, the drug substance DS-001 — and ask, for each, which of these four boxes holds the real data in a working plant. The answer is rarely "one box," and that is the whole difficulty in miniature.

The cell-bank and assay records originate in the lab layer, where they may genuinely be ontology-aligned at capture — a Benchling registry entry tied to a CENtree term, an instrument result normalized to ASM.

Here the semantics are formal, and bp:WCB-CHO-001 could plausibly carry an honest type from the moment it is recorded. This is the one box where the book's idealized model and the commercial reality come closest to matching.

The production step lives in the MES, where BATCH-2026-001's recipe, parameters, and equipment usage are encoded in a PAS-X or Opcenter MBR — rich, correct, and closed.

The time-series behind that step lives in the historian, contextualized by PI AF into equipment and event-frame structure. Both are real meaning, captured by mature systems doing their jobs well; neither is RDF, and so neither joins the lineage graph without a mapping.

The release and lineage view — the thing this book most wants, the graph in which bp:DS-001 derives transitively from bp:WCB-CHO-001 — exists, today, mostly in the R&D-flavored enterprise-graph tier, if it exists in graph form at all.
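What "derives transitively" asks of the graph can be shown with a small sketch. It assumes the direct derivation edges have already been extracted from wherever they live; the dictionary `DERIVED_FROM` and the function `ancestors` are illustrative names, not part of any product or of the book's ontology.

```python
from collections import deque

# Direct derivation edges: each item maps to what it was made from.
DERIVED_FROM = {
    "bp:DS-001": ["bp:BATCH-2026-001"],
    "bp:BATCH-2026-001": ["bp:WCB-CHO-001"],
    "bp:WCB-CHO-001": [],
}


def ancestors(item, edges):
    """Return every upstream item reachable through derivation edges."""
    seen = set()
    queue = deque(edges.get(item, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # guard against cycles and repeated paths
        seen.add(current)
        queue.extend(edges.get(current, []))
    return seen


lineage = ancestors("bp:DS-001", DERIVED_FROM)
print(sorted(lineage))  # ['bp:BATCH-2026-001', 'bp:WCB-CHO-001']
```

The traversal itself is trivial. The hard part in a working plant is filling in the edge dictionary, because each edge sits in a different box.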

The conclusion writes itself. The data is all present; it is just scattered across boxes that do not speak the same formal language, each fluent in its own dialect and deaf to the others. The only thing standing between them is a translation layer — which is precisely the seam the next section is about.

The bridge nobody advertises: getting structured data into RDF

How does a semantic layer sit over data that lives in relational databases and historians without anyone re-keying it? The answer is mapping, and it is the most under-discussed seam in the whole stack — under-discussed precisely because it is unglamorous and because admitting it exists means admitting the graph is not the system of record.

R2RML, the W3C RDB-to-RDF Mapping Language, declares in machine-readable rules how relational tables become RDF triples; its more general RML extension applies the same style of rule to non-relational sources such as CSV, JSON, and XML [10]. Those triples can be materialized into a store, or queried virtually so the data never moves. Because the mapping is declared separately from the platform that executes it, the rules that turn a LIMS table into a graph one day can be re-pointed at a new LIMS the next, with only the table and column references updated, which is exactly why the mapping is the durable asset and the platform around it often is not.
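The idea of a declarative mapping can be sketched without any RDF library. The example below is written in the spirit of an R2RML triples map but is not R2RML syntax and does not use an R2RML processor: the table `lims_sample`, its columns, the lot value, the `http://example.org/bp/` namespace, and the predicate names are all hypothetical. For brevity it treats every column value as a plain literal, whereas a real mapping would distinguish IRIs from literals.

```python
import sqlite3

BP = "http://example.org/bp/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The mapping is data, not code: a subject template plus
# (predicate, column) pairs for one table.
MAPPING = {
    "table": "lims_sample",
    "subject_template": BP + "{sample_id}",
    "rdf_type": BP + "DrugSubstance",
    "predicate_columns": [
        (BP + "lotNumber", "lot_no"),
        (BP + "derivedFromBatch", "batch_id"),
    ],
}


def rows_to_triples(conn, mapping):
    """Apply the mapping to every row and return (s, p, o) tuples."""
    conn.row_factory = sqlite3.Row
    triples = []
    for row in conn.execute(f"SELECT * FROM {mapping['table']}"):
        subject = mapping["subject_template"].format(**dict(row))
        triples.append((subject, RDF_TYPE, mapping["rdf_type"]))
        for predicate, column in mapping["predicate_columns"]:
            if row[column] is not None:  # a NULL column yields no triple
                triples.append((subject, predicate, str(row[column])))
    return triples


# An in-memory stand-in for a LIMS table; the row is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lims_sample (sample_id TEXT, lot_no TEXT, batch_id TEXT)"
)
conn.execute(
    "INSERT INTO lims_sample VALUES ('DS-001', 'LOT-42', 'BATCH-2026-001')"
)

triples = rows_to_triples(conn, MAPPING)
for triple in triples:
    print(triple)
```

Pointing the same rules at a different source means editing `MAPPING`, not `rows_to_triples`, which is the property that makes a mapping file worth asking for as a deliverable.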

This is the actual machinery behind every "virtualization" and "no-ETL" claim: a structured-data shop floor — PI, MES, LIMS — feeds a semantic layer through a mapping file, not a migration. The IDMP substance identifier a real plant puts behind our bp:DS-001, the lot number in its MES, the time-series tag in its historian — all of them reach the graph through exactly this seam, or they do not reach it at all.

The mapping file is therefore the true measure of how open a "knowledge graph" really is. If the vendor will hand you the R2RML or RML and let you point it at your own store, you own the bridge and can rebuild the graph anywhere. If the mapping is internal and undocumented, the graph is theirs and you are renting access to your own data through their pipe. This is the most decision-relevant question in the whole survey, and it is almost never on the slide, because the answer that protects the buyer is the one that least flatters the vendor.

The unsolved part: a graph on a slide is not a governed ontology

The phrase "knowledge graph" on a vendor slide is not evidence of a formal, governed ontology. Much of what ships is a proprietary object model with a graph veneer, and the loudest adoption numbers ("more than half of the top 50," "leading top 10 pharma," "90% of R&D data") are self-reported and unaudited.

None of these claims could be verified against independent evidence in the surveyed public material, and several named pharma adopters appear only in vendor case studies or conference programs rather than in independently published deployments. That does not make them false; it makes them unconfirmed, and the honest posture toward an unconfirmed vendor figure is to repeat it with its source attached and lean on it for nothing.

The genuinely semantic layer is real in R&D informatics and in catalog and terminology tooling. In formal OWL/RDF form, however, it is absent, on the available public evidence, from the GMP execution floor, which is exactly where this book's manufacturing batch lives.

That gap is not a failure of any one vendor; it is the unsolved state of the market. A plant should plan around it rather than assume a slide has closed it — which in practice means budgeting for the mapping layer, demanding the mappings as deliverables, and treating any unverified "knowledge graph" on the manufacturing floor as a goal to be engineered rather than a feature already bought.

Why it matters

A plant choosing tools has to read past the branding and ask one concrete question: does this product hand me a governed, importable ontology, or a closed model I will map out again later? The answer decides whether the digital thread that links WCB-CHO-001 through BATCH-2026-001 to DS-001 can be reasoned over as one graph, or whether it stays trapped in a dozen proprietary schemas, each correct in isolation and mute across the boundary.

The R2RML/RML mapping layer is precisely what sets the cost of that bridge — cheap if the data was modeled with export in mind, brutal if it was not. A vendor who treats the mapping as a deliverable is selling you a model you can leave with; a vendor who treats it as a trade secret is selling you a model you can only stay inside.

Buying for semantics, then, means buying for the exit, not just the demo. The question is never how good the graph looks on stage, but whether you can carry your part of it out the door — and that is a question the next chapter shows is hard even for a single company that owns every system involved.

Key terms

  • OWL/RDF ontology — a model with explicit, machine-checkable semantics expressed in open standards, which can be exported, aligned with other ontologies, and reasoned over.
  • Proprietary structured data model — a vendor's internal schema (recipes, asset templates, object types) that carries real meaning but is closed and not directly importable as a formal ontology.
  • Intermediate Data Schema (IDS) — TetraScience's open schema for normalized instrument data, mapped onward to the Allotrope Simple Model.
  • PI Asset Framework (AF) — AVEVA's object/template plus event-frame layer that contextualizes plant time-series; widely deployed in regulated pharma but not a semantic-web ontology.
  • Event Frame — a PI AF construct representing a time-bounded event such as a batch, shift, or downtime window.
  • Master Batch Record (MBR) — the recipe model in MES platforms such as PAS-X that encodes batch-procedure semantics as a proprietary structured model.
  • Digital twin of the organization — Palantir's framing for an object model that binds data and logic to real-world entities; powerful, but an object model rather than a published OWL/RDF artifact.
  • R2RML / RML — the W3C RDB-to-RDF Mapping Language and its extension, which declare how relational tables become RDF triples, materialized or queried virtually.

Where this leads

The vendors sell the parts; the next chapter asks what happens when a single large pharmaceutical company tries to assemble them into one coherent fabric. Enterprise Knowledge Graphs at Big Pharma follows the semantic layer out of the demo and into the messy reality of an enterprise with hundreds of systems, decades of legacy data, and a governance problem no mapping file alone can solve.