The Platforms: How Vendors Sell Semantics

📍 Where we are: Part VIII · Ontologies in Industry Today. The standards and vocabularies are real, and we have walked them end to end. Now we ask the practical question a plant actually faces: when you buy software, what do you get?

The previous chapters established that the vocabularies exist: BFO at the top, IOF Core in the middle, domain terms for substances and processes below. But a standard is not a product. A plant does not download an ontology and run on it; it buys platforms, and those platforms arrive with the word "semantics" printed on the box. Vendors have discovered that "knowledge graph" sells, and so the phrase now adorns products that mean wildly different things by it.

The question this chapter resolves is whether what is inside the box is a formal, governed ontology you can import and reason over, or a proprietary data model with a graph drawn on the lid. That distinction is not pedantry.

Two products can both say "knowledge graph" on the same slide and mean entirely different things. One is an OWL/RDF graph you can export, align, and reason over; the other is a closed object model whose "graph" is a query convenience. Telling them apart before you sign is the difference between a model you own and a model you will reverse-engineer later, after the integration budget is already spent. The survey that follows is organized around that single discrimination, applied honestly to every product it names.

The simple version

Imagine furniture shopping. Some stores sell you lumber and standardized joinery: take it home, combine it with wood from any other store, and build whatever you want. Others sell you a beautiful cabinet that opens only with their key and fits only their other cabinets. Both stores say "modular." Only one of them lets you walk out with parts that work anywhere. Buying bioprocess software is the same — some vendors hand you an open, portable model; others hand you a gorgeous closed box. The cabinet may be exactly what you need. But you should know which one you are buying.

What this chapter covers

We survey the commercial landscape across four layers and apply one test throughout. Is this a formal OWL/RDF ontology, a model with explicit, machine-checkable semantics you can export and align? Or is it a proprietary structured data model wearing the "knowledge graph" label as branding?

Some terms first:
- An ontology, in this book, is a shared, formal vocabulary of the things in a domain and how they relate.
- OWL (Web Ontology Language) and RDF (Resource Description Framework) are the open W3C standards an ontology is written in. RDF records facts as simple subject–predicate–object statements called triples.
- A graph here is not a chart but that web of linked statements.
- Machine-checkable semantics means a computer can derive new facts from the model and flag ones that violate its rules, rather than just store rows.
- Aligning two ontologies means declaring where a term in one means the same thing as a term in another, so the two can be queried together.

(The full set is gathered under Key terms at the end.)

We look at lab-informatics platforms, where the semantics are genuinely formal; at manufacturing-execution and historian systems (a historian here being the time-series database that records every plant sensor reading, not a person), where they are rich but closed; at the "digital twin of the organization" framing and the graph databases beneath enterprise semantic layers, where the category splits down the middle; and finally at the unglamorous bridge nobody advertises — the mapping machinery that turns relational and historian data into triples.

Throughout, we tag each claim by maturity:
- production: shipping and used in live deployments;
- piloted: trialled in a limited setting;
- proposed: announced or designed but not yet deployed;
- academic: demonstrated in research rather than industry.

We flag vendor marketing as exactly that. We are careful, in particular, not to conflate a formal OWL/RDF ontology with a structured or proprietary data model. That conflation is precisely what the marketing invites, and the whole chapter exists to undo it.

Color = what is inside the box: green a formal ontology; amber and cyan proprietary models; violet a mixed tier, with only the closed MES and historian tiers needing the R2RML/RML mapping bridge to reach the graph. Original diagram by the authors, created with AI assistance.

Lab informatics: where the semantics are real

The scientific-data and laboratory-informatics layer is where genuinely semantic tooling has its firmest footing. It sits closest to R&D (research and development), where heterogeneous instrument data (readings from many different instruments, each in its own format) and external vocabularies have always been the problem.

A chromatography system, a plate reader, and a mass spectrometer each speak their own dialect; the only way to ask one question across all three is to map them onto a shared model. That pressure — felt daily, by scientists who cannot wait for a standards committee — has produced real ontology integration in this tier, not just the label.

TetraScience runs the Tetra Scientific Data Cloud, which transforms raw instrument output into an open Intermediate Data Schema (IDS) — a common, vendor-neutral structure for that output — maps that schema to the Allotrope Simple Model (ASM, an industry-standard format for analytical lab data), and integrates SciBite's CENtree ontology manager to supply controlled terminology (an approved, governed list of terms so everyone names the same thing the same way). SciBite is a life-sciences semantics vendor; CENtree is its tool for storing and serving such vocabularies — an ontology manager. The architecture is exactly the layered one the standards chapters described: a normalization schema underneath, an interchange model above it, and a governed vocabulary alongside.

One reported FAIRification effort aimed to convert data from more than 6,000 instruments into ASM over roughly two and a half years, then index and serve it through a knowledge graph (production) [1]. FAIR is the goal of making data Findable, Accessible, Interoperable, and Reusable, so FAIRification is the work of converting messy data into that well-described, reusable form. The vendor factsheet anonymizes the customer as a "top 25 pharma" (industry shorthand for one of the twenty-five largest pharmaceutical companies by revenue). A specific identity has circulated only via a conference program, so treat any named adopter here as inferred, not asserted.

Benchling approaches semantics at the point of capture rather than after the fact. Its Registry supports ontology-backed data capture, and the SciBite Ontology Entity Registry app aligns captured records to enterprise ontologies held in CENtree as scientists enter them — the BioAssay Ontology is the worked example. Capturing meaning at the keyboard, rather than reconciling it in a later cleanup, is the more durable design.
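Capture-time alignment can be pictured with a small sketch. It is hypothetical throughout: it does not use Benchling's or SciBite's APIs, and the terms, IRIs, and record fields are illustrative. The design choice it shows is that an unrecognized label is refused at entry, so no free text ever needs a later cleanup.

```python
# Hypothetical sketch of capture-time term validation; not Benchling's or
# SciBite's API. The vocabulary, IRIs, and record fields are illustrative.
from dataclasses import dataclass

# A governed vocabulary: preferred label -> IRI, plus synonyms that resolve to it.
CONTROLLED_TERMS = {
    "cell viability assay": "http://example.org/assay#CellViability",
    "binding assay": "http://example.org/assay#Binding",
}
SYNONYMS = {
    "viability": "cell viability assay",
    "ligand binding": "binding assay",
}

@dataclass
class AssayRecord:
    sample_id: str
    assay_label: str
    assay_iri: str  # resolved at entry, never left blank

def capture(sample_id: str, typed_label: str) -> AssayRecord:
    """Resolve the typed label to a governed term, or refuse the entry."""
    key = typed_label.strip().lower()
    key = SYNONYMS.get(key, key)  # map a known synonym to its preferred label
    if key not in CONTROLLED_TERMS:
        # Refusing at the keyboard is the point: no free text reaches the record.
        raise ValueError(f"'{typed_label}' is not a controlled term")
    return AssayRecord(sample_id, key, CONTROLLED_TERMS[key])

rec = capture("S-001", "  Viability ")
print(rec.assay_iri)  # http://example.org/assay#CellViability
```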

Benchling also maintains the open-source allotropy Python library that converts instrument output to ASM (production) [2]. The frequently quoted "more than half of the top 50 biopharma" figure is vendor marketing and should be read as such — a self-reported reach claim, not an audited count of governed-ontology deployments.

The remaining lab platforms cluster similarly.

Revvity Signals One ships the ability to plug in public or custom ontologies, plus semantic search (production) [3]. Semantic search uses the ontology to match related concepts by meaning, not only the exact words typed.

Sapio integrates Elsevier and SciBite content into an AI co-scientist, though its "living knowledge graph" phrasing is marketing rather than a claim about formal governance.

Scitara DLX is a vendor-neutral laboratory-data integration platform compatible with three open lab-data standards: ASM-JSON (the JSON encoding of ASM), AnIML (the Analytical Information Markup Language), and SiLA (an instrument-connectivity standard). Compatibility here means an instrument's output can leave the platform in a form other tools can read.

Across all of them, named pharma end-customers are generally not disclosed — which is itself worth noting, because it means most adoption evidence at this layer is self-reported, and self-reported reach is not the same as governed-ontology depth.

What unites the credible end of this layer is a dependence on a small set of genuinely open anchors: ASM as the instrument-data interchange model, CENtree as the terminology manager, and the published allotropy library as the conversion path. Where a vendor leans on those, the claim of "semantic" tooling has something verifiable behind it; where the only evidence is the phrase on the slide, it does not.

The practical test for a buyer is whether the controlled terms and the schema mapping can be exported and reused outside the platform — because that, not the marketing, is what makes the data portable when the next tool arrives. By that test the lab tier is the most reassuring of the four, which is fitting, since it is also the one under the most genuine pressure to interoperate.

Manufacturing and historians: structured models, not ontologies

Drop from the lab to the plant floor and the picture changes sharply. The dominant execution and historian platforms encode rich semantics, but as proprietary structured data models, not OWL/RDF ontologies. An execution platform, or MES (Manufacturing Execution System), is the software that drives and records the steps of a production batch on the floor. The historian, as above, records every sensor reading over time: the temperature, pressure, and flow traces of each run.

This is the single most important distinction in the chapter, because it is the one most often blurred, and the blur is where a procurement decision goes wrong. A model can be detailed, version-controlled, standards-aligned, and entirely correct, and still be closed — and "closed but correct" is the default state of the manufacturing tier, not an exception to be explained away.

Körber Werum PAS-X and Siemens Opcenter Execution Pharma carry batch meaning (a batch being one production run of the product) (production) [4]. They do it through three things: Master Batch Record (MBR) recipe models, reusable building blocks, and version-controlled equipment management. The MBR is the master template specifying every step, material, and parameter of the run.

Both products integrate through two ISA standards:
- ISA-95 governs how plant-floor systems exchange data with the business systems above them.
- ISA-88 governs how a batch recipe and its procedure are structured.

These are powerful and deeply deployed products. The recipe in a PAS-X MBR knows the difference between a parameter and a result as surely as any OWL class would. But that knowledge lives inside a closed structured model, not in a formal ontology you can import and align with your own.

The distinction has a sharp edge in practice. A common secondhand framing attributes a formal "Equipment State Diagram" ontology artifact to PAS-X; that artifact was not found in surveyed vendor sources, which instead document version-controlled equipment status and lifecycle management.

The lesson is not that the framing is malicious but that it is the kind of upgrade — from "structured equipment model" to "ontology" — that happens easily on a slide and badly on a contract. The meaning is real; the formal, exportable ontology is simply not the form it takes.

AVEVA PI System (formerly OSIsoft) supplies the de facto contextualization layer over plant time-series through its PI Asset Framework (AF): equipment and process hierarchies, reusable templates, Asset Analytics, and Event Frames — time-bounded events that capture batches, shifts, or downtime windows (production) [5]. AF is an object/template plus event-frame model, widely deployed in regulated pharma, and decidedly not a formal semantic-web ontology.

When our running batch BATCH-2026-001 emits a time-series trace, PI AF is very likely where that trace acquires its equipment and process context — but that context lives in AF's model, not as triples in a graph you could query alongside bp:DS-001. The contextualization is genuine and valuable; it simply does not arrive in a form the rest of the semantic stack can read without translation.

Concretely, consider the tag BR101.Temp.PV. A historian tag is the named channel for one sensor's stream; this one is the process value (.PV) of bioreactor BR101's temperature.

On its own the tag is bare numbers. It gains meaning only when an AF element template (a reusable equipment blueprint in PI AF) binds it as a property of the asset that the companion graph calls bp:BR-101. An Event Frame then stamps the stream with the run's phases: the growth phase, then the production phase. In the companion graph this book ships, these are bp:CCP-001-growth and bp:CCP-001-production, two cell-culture-process steps.

Only with that context can the historian answer "what was the culture temperature during the production phase of BATCH-2026-001?" The MES Master Batch Record carries the same ISA-88 procedural model (procedure, then unit procedure, then operation, then phase). That shared model is why the historian and MES tiers describe the run in compatible terms, even though neither emits it as RDF.
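What the Event Frame contributes can be sketched in plain Python. The readings and window boundaries below are hypothetical, and the dictionaries merely stand in for PI AF objects; this is not the AF SDK. The point is that the same points can answer a phase-level question only once a named time window is bound to them.

```python
# Plain-Python sketch of event-frame contextualization. Timestamps, readings,
# and windows are hypothetical; this is not the PI AF SDK.
from datetime import datetime
from statistics import mean

def ts(s: str) -> datetime:
    return datetime.fromisoformat(s)

# The historian's view: points for tag BR101.Temp.PV, with no batch or phase context.
points = [
    (ts("2026-03-01T08:00:00"), 37.0),
    (ts("2026-03-01T20:00:00"), 37.0),
    (ts("2026-03-02T08:00:10"), 36.51),
    (ts("2026-03-02T20:00:00"), 36.49),
]

# The event-frame view: named, time-bounded windows for BATCH-2026-001's phases.
event_frames = {
    "CCP-001-growth": (ts("2026-03-01T00:00:00"), ts("2026-03-02T00:00:00")),
    "CCP-001-production": (ts("2026-03-02T00:00:00"), ts("2026-03-10T00:00:00")),
}

def values_in(frame: str) -> list:
    """Return the tag's values whose timestamps fall in the half-open window."""
    start, end = event_frames[frame]
    return [v for t, v in points if start <= t < end]

# "What was the culture temperature during the production phase?"
print(round(mean(values_in("CCP-001-production")), 2))  # 36.5
```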

The reason this layer holds its meaning in proprietary form is not vendor stubbornness but the nature of the floor it serves. A validated MES or historian changes slowly and under tight change control, because each change touches a system of record in a regulated process. An OWL ontology, by contrast, is built to be edited, aligned, and re-reasoned. The two cultures pull in opposite directions, and the manufacturing vendors have, rationally, chosen stability over openness.

The consequence for a semantic program is concrete: the richest, most trustworthy operational meaning in the building — what equipment ran which step of which batch, and when — sits in exactly the systems least willing to emit it as triples. Bridging that gap is not a modeling problem; it is a mapping problem, and we return to it below.

Layer	Representative products	What it actually is	Maturity
Lab informatics	TetraScience, Benchling, Revvity	Formal ontology integration (ASM, CENtree)	production
MES / batch	Körber PAS-X, Siemens Opcenter	Proprietary structured recipe/equipment model	production
Historian	AVEVA PI Asset Framework	Object/template plus event-frame model	production
Enterprise graph	Palantir, Stardog, Ontotext, Neo4j	Mixed: true RDF graphs and proprietary object models	production (mostly R&D)

The "digital twin of the organization" and the graph databases

One tier up sits the enterprise graph, and here the OWL/RDF-versus-proprietary line runs straight through the middle of the category. The same word, "ontology," names both an exportable RDF artifact and a closed object model — and the marketing rarely volunteers which one you are looking at.

Palantir Foundry calls its core abstraction an "Ontology," binding datasets and models to real-world objects — plants, equipment, products, orders — through objects, properties, links, plus actions and functions. Palantir frames it as "a digital twin of the organization," and over roughly 2023 to 2025 it became the backbone for the platform's AI agents (production) [6].

It is, on the evidence, an object model with graph semantics rather than a published OWL/RDF artifact. The term "Ontology" is doing branding work alongside its technical one, and the two senses should not be merged.

The public evidence shows no named pharma Ontology customer in GxP manufacturing. (GxP is the umbrella for the "Good Practice" regulations that govern drug making; GMP, Good Manufacturing Practice, is its production-floor member.) Documented life-sciences adopters are adjacent to manufacturing rather than on the GMP floor.

The popular "Factory then Line then Machine then Part" hierarchy appears in third-party analyses, not in Palantir's own Foundry Ontology documentation [6]. It should not be cited as the product's own model.

The graph databases are where formal semantics return in force, and almost entirely in R&D.

Stardog names Boehringer Ingelheim as its flagship (production for R&D) [7]. The deployment is a semantic layer over roughly 90% of the company's R&D data, a vendor-reported figure. It is delivered without ETL (Extract, Transform, Load: the usual work of copying data out of source systems, reshaping it, and loading it into a new store). Instead it uses virtualization, which queries the source data where it already lives.

Ontotext GraphDB underpins AstraZeneca's LinkedLifeData usage and Roche's terminology stack (production for AstraZeneca and Roche) [8]. Ontotext also reports, in its own marketing, an AI-powered target-discovery solution at a "leading top 10 pharma."

Neo4j, a property-graph database rather than an RDF triplestore, backs AstraZeneca's biological knowledge graph and Novartis/NIBR graphs (production in R&D) [9].

Two distinctions matter inside this tier. The first is RDF versus property graph — two different ways to store a graph: RDF holds it as standard triples that any RDF tool can read, while a property graph stores nodes and edges that each carry their own attributes, in a vendor's own model (the formal definitions of both follow a few paragraphs down). Stardog and Ontotext are RDF/OWL stores whose contents are, in principle, exportable and alignable against the standards earlier chapters built on, whereas Neo4j's property-graph model is expressive and fast but not natively RDF, so its "knowledge graph" is a different artifact that needs a mapping to become interoperable in the standards sense. A property graph reaches RDF through a tool such as Neo4j's neosemantics (n10s) plugin or an export to RDF-star (an RDF extension that lets a triple carry attributes of its own, the way property-graph edges do), and the cost of maintaining that bridge is the same recurring tax the historian and MES tiers pay — the property-graph store does not escape the mapping seam, it just meets it one layer up.

The second distinction is where the deployments actually live. The pattern is unmistakable: the genuinely semantic deployments cluster in discovery and research, where the questions are exploratory and the data already heterogeneous — not on the GMP execution floor, where the systems are validated, the schemas are frozen, and change is expensive. A buyer evaluating one of these platforms for manufacturing is therefore extrapolating from R&D evidence, and should say so out loud rather than let an R&D case study stand in for a GMP one.

What the AI agents stand on: a buyer's test for the "AI-native" claim

The newest selling point across this tier is the one that should be read most skeptically. Palantir frames its Ontology as the backbone for AI agents; Sapio packages Elsevier and SciBite content into an "AI co-scientist"; Ontotext advertises an AI-powered target-discovery solution. The honest question is not whether a graph helps a model — it does — but which artifact the model is grounded on, because the OWL/RDF-versus-proprietary line decides whether that grounding is verifiable. A model that retrieves connected facts from a graph before it answers is doing GraphRAG (graph-native retrieval-augmented generation — pulling verified, linked facts from a trusted store and requiring the model to answer from them rather than from training memory, the technique the companion AI chapter (ontologies as the ground truth for AI) builds out in full); when the store is a closed object model, the retrieval is real but the grounding it claims cannot be exported, audited, or aligned, so the buyer cannot independently check that the cited lineage is true.

Two methodological facts decide whether such grounding is trustworthy, and both favor the exportable OWL/RDF artifact over the closed object model.

First, a graph that has been reasoned and shape-validated is a stronger ground truth than a fluent model. Take a subgraph whose owl:TransitiveProperty lineage closure and SHACL release shapes have already been machine-checked. Where it and a model disagree, the subgraph is more likely right. The model-validation chapter of the companion ML volume turns this validation paradox into a test: a model is trustworthy only when it was validated against a pre-stated, honest acceptance criterion, and a reasoned graph is exactly that.

Second, the lineage edges supply the missing split. A model learning over these instances must be scored by grouped, leave-one-batch-out cross-validation. Each fold holds out every row that shares a bp:derivedFrom ancestor, so a sibling sample of BATCH-2026-001 cannot leak from the training fold into the test fold and inflate the score. The bp:derivedFrom edges are the grouping key. In a triplestore, extracting them is mechanical; in a flat vendor export, it is a hopeful convention.

The same boundary doubles as an applicability-domain check at retrieval time. A query that returns no SHACL-conforming subgraph is the graph analogue of an out-of-distribution flag: a refusal to answer rather than a confident guess about a lineage the model was never validated on.

None of this is reachable through a property graph or object model whose contents you cannot export. That is why the buyer's test for an "AI-native" platform is the same export-and-mapping test as the rest of the survey, not a separate one. The platform with the loudest AI story is not the one to trust; the one whose grounding graph you can carry out the door and check is.
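The grouping step described above can be sketched in plain Python. The lineage map, sample names, and second batch are hypothetical. In a triplestore, the same child-to-parent map would come from a query over bp:derivedFrom. Each sample is assigned the batch it traces back to, and every fold holds out a whole batch.

```python
# Plain-Python sketch of grouped, leave-one-batch-out splits. The derivedFrom
# map, sample IDs, and BATCH-2026-002 are hypothetical.
derived_from = {  # child -> parent, one bp:derivedFrom edge each
    "SAMPLE-A1": "BATCH-2026-001",
    "SAMPLE-A2": "BATCH-2026-001",
    "ALIQUOT-A2a": "SAMPLE-A2",
    "SAMPLE-B1": "BATCH-2026-002",
    "SAMPLE-B2": "BATCH-2026-002",
}
batches = {"BATCH-2026-001", "BATCH-2026-002"}

def batch_of(node: str) -> str:
    """Follow derivedFrom edges upward until a batch is reached (the grouping key)."""
    seen = set()
    while node not in batches:
        if node in seen:
            raise ValueError(f"cycle in derivedFrom at {node}")
        seen.add(node)
        node = derived_from[node]  # a KeyError here means broken lineage
    return node

def leave_one_batch_out(samples):
    """Yield (held_out_batch, train, test); no batch contributes to both folds."""
    groups = {s: batch_of(s) for s in samples}
    for held_out in sorted(set(groups.values())):
        test = [s for s in samples if groups[s] == held_out]
        train = [s for s in samples if groups[s] != held_out]
        yield held_out, train, test

samples = ["SAMPLE-A1", "ALIQUOT-A2a", "SAMPLE-B1", "SAMPLE-B2"]
for batch, train, test in leave_one_batch_out(samples):
    print(batch, "test:", test, "train:", train)
```

The aliquot reaches its batch only through an intermediate sample, which is exactly the leak a flat "batch_id" column can miss. If a library splitter is preferred, the labels batch_of produces are what scikit-learn's LeaveOneGroupOut expects in its groups argument.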

It is worth naming the categories plainly. A reader arriving from the lifecycle parts has met these tools as verbs ("we reason over the graph," "we validate it") without meeting the products that perform them.

A triplestore is a database whose unit of storage is the RDF triple and whose query language is SPARQL (Stardog, Ontotext GraphDB above; Apache Jena and RDF4J in open source).
A property-graph database stores nodes-and-edges-with-attributes instead and queries them in Cypher or Gremlin (Neo4j), which is why it needs the bridge above to reach RDF.
A reasoner is the engine that derives the entailed facts a graph only implies — the ones that must be true given what is stated, such as transitive lineage (if A came from B and B from C, then A traces to C) and equipment-is-material typing (concluding an item's category from the rules it satisfies).
A SHACL engine (SHACL is the W3C Shapes Constraint Language) is the one that checks a graph against shape constraints — declared rules a graph must satisfy, such as a release-completeness gate (every released lot must carry its required records) or a disjointness gate (no item is wrongly typed as two mutually exclusive things at once).
An ontology editor such as Protégé is where the vocabulary itself is authored and its OWL axioms are sanity-checked before any data is loaded.
Persistent-identifier (PID) services — DOIs, w3id.org redirects — are the registries that keep a published term's IRI (its Internationalized Resource Identifier, the web address that names it) resolvable — still leading somewhere when looked up — for the long haul, the FAIR concern publication returns to.

Set against that vendor landscape, what this book's own running example actually needs is deliberately modest:
- an in-process triplestore (rdflib);
- an OWL-RL reasoner (owlrl) for the transitive-lineage and equipment-typing inferences, OWL-RL being a lightweight, rule-based reasoning profile of OWL;
- a SHACL engine (pyShACL) for the release and disjointness gates.

All are free, all run offline, and no platform is required. That is the point of the offline-validatable non-functional requirement in the specification. The campaign graph that carries WCB-CHO-001 through to DS-001 fits inside the small open stack. The commercial tiers above are what you reach for only when the same model has to scale across an enterprise, which is the subject of the next chapter.

One batch, four boxes: where our running example actually lives

It helps to trace the book's own example through the landscape, because it shows the gap concretely rather than in the abstract.

Picture the artifacts the earlier chapters modeled — the working cell bank WCB-CHO-001 (a frozen, qualified stock of the CHO, or Chinese-hamster-ovary, cells that make the antibody), the production run BATCH-2026-001, the drug substance DS-001 — and ask, for each, which of these four boxes holds the real data in a working plant. (The recurring bp: prefix on identifiers such as bp:WCB-CHO-001 is just this book's namespace tag, marking a name as belonging to our example ontology.) The answer is rarely "one box," and that is the whole difficulty in miniature.

The cell-bank and assay records originate in the lab layer, where they may genuinely be ontology-aligned at capture — a Benchling registry entry tied to a CENtree term, an instrument result normalized to ASM.

Here the semantics are formal, and bp:WCB-CHO-001 could plausibly carry an honest type from the moment it is recorded. This is the one box where the book's idealized model and the commercial reality come closest to matching.

The production step lives in the MES, where BATCH-2026-001's recipe, parameters, and equipment usage are encoded in a PAS-X or Opcenter MBR — rich, correct, and closed.

The time-series behind that step lives in the historian, contextualized by PI AF into equipment and event-frame structure. Both are real meaning, captured by mature systems doing their jobs well; neither is RDF, and so neither joins the lineage graph without a mapping.

Two kinds of fact define a run, and they live in different boxes.

The CPPs (Critical Process Parameters: the dials a plant must hold within range for the run to be valid) live as recipe parameters in the MES and as tags in the historian. For this run they are the production-phase culture temperature, held near 36.5 degC, and the relative feed rate, around 0.40 vessel-volumes per day.

The release CQAs (Critical Quality Attributes: the measured properties a finished lot must meet to be released) live in the LIMS (Laboratory Information Management System, the QC lab's results database). For this lot they are SEC (size-exclusion chromatography) monomer at 98.6 percent and CEX (cation-exchange chromatography) main charge variant near 70.7 percent.

The single most valuable query a continued-process-verification (CPV) program can ask is the one that joins them: did this temperature profile produce that monomer result? Today those two facts sit in two boxes that share no formal language. So the join is hand-stitched in a spreadsheet rather than answered by a reachability query over one graph. Once the mapping exists, it becomes exactly the kind of cross-box question the competency-questions-as-queries chapter poses.

The catch is that the spreadsheet is the regulatory weak point. A hand-stitched join carries no native audit trail, so it satisfies ALCOA+ only by manual discipline. ALCOA+ is the data-integrity expectation that records be attributable, legible, contemporaneous, original, and accurate, plus complete, consistent, enduring, and available. A mapped graph, by contrast, inherits the contemporaneous, attributable record its sources already keep.
Suppose that join graph were ever to drive a GMP decision rather than a CPV summary. It would then itself become a regulated computerized system. It would be validated under the risk-based shift from computer system validation (CSV) to computer software assurance (CSA). It would also be bound by 21 CFR Part 11 and EU GMP Annex 11 for its audit trail, version-pinned change control, and signer attribution. That is one more reason the production-floor systems hold their meaning in validated proprietary form rather than emitting it as a freely re-reasoned graph.

The release and lineage view — the thing this book most wants, the graph in which bp:DS-001 derives transitively from bp:WCB-CHO-001 — exists, today, mostly in the R&D-flavored enterprise-graph tier, if it exists in graph form at all.

The conclusion writes itself. The data is all present; it is just scattered across boxes that do not speak the same formal language, each fluent in its own dialect and deaf to the others. The only thing standing between them is a translation layer — which is precisely the seam the next section is about.

The bridge nobody advertises: getting structured data into RDF

How does a semantic layer sit over data that lives in relational databases and historians without anyone re-keying it? The answer is mapping, and it is the most under-discussed seam in the whole stack — under-discussed precisely because it is unglamorous and because admitting it exists means admitting the graph is not the system of record.

R2RML — the 2012 W3C RDB-to-RDF Mapping Language — declares in machine-readable rules how relational tables become RDF triples. RML (originated at IDLab/Ghent, now under W3C community standardization) generalizes R2RML beyond SQL to any source — CSV, JSON, XML — via rml:logicalSource and rml:referenceFormulation [10]. That generalization is not academic: the companion bridge is RML, not R2RML, precisely because a PI Web API feed arrives as CSV/JSON rows, not database tables. Those triples can be materialized into a store, or queried virtually so the data never moves — and the engines that do the virtual rewrite at query time (Ontop translating SPARQL into SQL against the source, the virtualization modes in Stardog and Ontotext) are the actual machinery behind every "no-ETL" claim earlier in this chapter. The same mapping that turns a LIMS table into a graph one day can be re-pointed at a new LIMS the next, which is exactly why it is the durable asset and the platform around it often is not.
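The core idea of R2RML, that the mapping is data and the engine is generic, can be shown with a toy. This is not an R2RML or RML processor. The table, columns, templates, and IRIs are hypothetical, and Python's str.format stands in for R2RML's rr:template. Pointing at a different LIMS would mean editing MAPPING, not run_mapping.

```python
# Toy illustration of the R2RML idea: a declarative mapping plus a generic
# engine. Not an R2RML/RML processor; table, columns, and IRIs are hypothetical.
import sqlite3

lims = sqlite3.connect(":memory:")
lims.execute("CREATE TABLE results (lot_id TEXT, attribute TEXT, value REAL)")
lims.executemany("INSERT INTO results VALUES (?, ?, ?)",
                 [("DS-001", "SEC_MONOMER", 98.6), ("DS-001", "CEX_MAIN", 70.7)])

MAPPING = {
    "query": "SELECT lot_id, attribute, value FROM results",
    "subject_template": "http://example.org/bp#result-{lot_id}-{attribute}",
    "predicate_object": [  # (predicate IRI, object template, term kind)
        ("http://example.org/bp#aboutLot", "http://example.org/bp#{lot_id}", "iri"),
        ("http://example.org/bp#attribute", "{attribute}", "literal"),
        ("http://example.org/bp#value", "{value}", "literal"),
    ],
}

def run_mapping(conn, mapping):
    """Turn every row the mapping's query returns into (s, p, o) triples."""
    cur = conn.execute(mapping["query"])
    columns = [d[0] for d in cur.description]
    triples = []
    for values in cur:
        row = dict(zip(columns, values))
        subject = mapping["subject_template"].format(**row)
        for predicate, template, kind in mapping["predicate_object"]:
            obj = template.format(**row)
            # Literals are quoted, N-Triples style; IRIs are left bare.
            triples.append((subject, predicate, obj if kind == "iri" else f'"{obj}"'))
    return triples

for t in run_mapping(lims, MAPPING):
    print(t)
```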

The shape of it is always the same: a structured-data shop floor — PI, MES, LIMS — feeds a semantic layer through a mapping file, not a migration. The IDMP substance identifier a real plant puts behind our bp:DS-001 (modeled in regulatory semantics), the lot number in its MES, the time-series tag in its historian — all of them reach the graph through exactly this seam, or they do not reach it at all.

This book does not leave that mapping abstract. The companion suite ships exactly such a bridge and runs it: historian-map.rml.ttl is an RML rr:TriplesMap (a TriplesMap being one rule that says how each source row turns into triples) that turns one historian row — ts, tag, value, unit, quality, batch_id — into a W3C SOSA observation. SOSA (Sensor, Observation, Sample, and Actuator) is a small standard W3C vocabulary for recording that a sensor measured a property of something at a time, so each historian reading becomes a self-describing fact. historian_to_rdf.py executes that mapping in process on the same row shape a PI Web API connector produces; in the Turtle below, each indented line is one triple stating a property of the observation:

hist:obs/BR101.Temp.PV/2026-03-02T08:00:10Z
    a sosa:Observation ;
    sosa:observedProperty hist:tag/BR101.Temp.PV ;
    sosa:hasSimpleResult  "36.51"^^xsd:float ;
    sosa:resultTime       "2026-03-02T08:00:10Z"^^xsd:dateTime ;
    qudt:ucumCode         "Cel" ;          # the unit travels with the value, never bare
    bp:fromBatch          bp:BATCH-2026-001 .

bp:BATCH-2026-001 a bp:Batch ; bp:hasTrace hist:tag/BR101.Temp.PV .

In the excerpt, "36.51"^^xsd:float is a typed literal — the value 36.51 with a ^^xsd: tag telling a machine to read it as a floating-point number rather than text, and ^^xsd:dateTime does the same for the timestamp; declaring the type is what makes the stored value machine-checkable data rather than a bare string.

The modeling decision in that last line is the whole answer to the obvious objection that you cannot put millions of points in a graph. You do not. The graph holds one bp:hasTrace index edge per batch-and-tag pair, pointing at the historian by IRI; the dense stream stays in PI. The mapping makes the graph an index over the historian, not a copy of it. That is what a governed bridge looks like, and it fits in thirty lines of Turtle. (bp:hasTrace and bp:fromBatch are first introduced in the "instances and the graph" chapter.)

The path runs one layer deeper than the historian. OPC UA (Open Platform Communications Unified Architecture) is the standard industrial protocol by which plant equipment exposes its live data; Data Access is its service for reading a current value. opcua_to_rdf.py maps an OPC UA Data Access read into the same sosa:Observation shape. That read has three parts: the NodeId ns=2;s=BR101.Temp.PV (the address of one data point: namespace 2, string identifier BR101.Temp.PV), its DataValue (the value plus quality and timestamp), and its EUInformation (the OPC UA structure carrying the engineering unit).

The NodeId's string identifier is the historian tag, and the EUInformation unit resolves to the same UCUM code, Cel, that the historian's unit column carries. UCUM (Unified Code for Units of Measure) and QUDT are the standard vocabularies that pin units down, so that Cel unambiguously means degrees Celsius. Because tag and unit agree, both routes mint the identical observation IRI and the identical unit for the same reading, which is the UCUM/QUDT discipline set out in the chapter on identifiers and units. The result is one continuous, mapped path from the wire to the graph rather than four disconnected models: the runnable counterpart of the bridge built in full in the chapter on getting from the wire to the graph.

The mapping file is therefore the true measure of how open a "knowledge graph" really is. If the vendor will hand you the R2RML or RML and let you point it at your own store, you own the bridge and can rebuild the graph anywhere. If the mapping is internal and undocumented, the graph is theirs and you are renting access to your own data through their pipe. This is the most decision-relevant question in the whole survey, and it is almost never on the slide, because the answer that protects the buyer is the one that least flatters the vendor.

The four questions to ask before you sign

Ask	A model you can own	A model you will rent
Export. Can I export the controlled terms and the schema as OWL/RDF?	Yes — published OWL/SKOS (SKOS is the W3C standard for simple term lists and taxonomies, a taxonomy being terms arranged broader-to-narrower, like a classification tree), downloadable	"It's in our platform"
Mapping. Will you hand me the R2RML/RML and let me point it at my store?	Yes — the mapping is a deliverable	The mapping is internal / a trade secret
Artifact. Is the "graph" RDF/OWL, or a property graph / object model?	RDF triplestore, alignable upstream	Property graph or closed object model; needs its own bridge
Evidence. Is the named adopter a GMP-manufacturing case, or an R&D one?	A GMP deployment, independently published	An R&D case study standing in for manufacturing

The more of a product's answers land in the right-hand column of that table, the larger the mapping bill the next chapter describes.
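What passing the Export row amounts to can be seen in miniature: a controlled vocabulary written as SKOS concepts with labels and broader links, in a file any SKOS-aware tool can load. The sketch below is hypothetical; the ex: namespace and the three equipment terms are illustrative and not drawn from any vendor product, and labels are assumed to contain no quote characters.

```python
from typing import Dict, Optional

# Hypothetical sketch: serialize a small broader-to-narrower term list as SKOS in Turtle.
# The ex: namespace and the terms are illustrative; labels are assumed to contain no quotes.
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex:   <http://example.org/terms/> .\n"
)


def to_skos(broader: Dict[str, Optional[str]], labels: Dict[str, str]) -> str:
    """broader maps each term to its broader term (None for a top concept)."""
    blocks = []
    for term, parent in broader.items():
        stmt = f'ex:{term} a skos:Concept ;\n    skos:prefLabel "{labels[term]}"@en'
        if parent is not None:
            stmt += f" ;\n    skos:broader ex:{parent}"
        blocks.append(stmt + " .")
    return PREFIXES + "\n" + "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    broader = {"Equipment": None, "Bioreactor": "Equipment",
               "StirredTankBioreactor": "Bioreactor"}
    labels = {"Equipment": "equipment", "Bioreactor": "bioreactor",
              "StirredTankBioreactor": "stirred-tank bioreactor"}
    print(to_skos(broader, labels))
```

Nothing here is proprietary: the output is open-standard text, which is exactly what "It's in our platform" cannot give you.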

The unsolved part: a graph on a slide is not a governed ontology

The word knowledge graph on a vendor slide is not evidence of a formal, governed ontology. Much of what ships is a proprietary object model with a graph veneer, and the loudest adoption numbers — "more than half of the top 50," "leading top 10 pharma," "90% of R&D data" — are self-reported and unaudited.

None of these claims could be verified against independent evidence in the surveyed public material, and several named pharma adopters appear only in vendor case studies or conference programs rather than in independently published deployments. That does not make them false; it makes them unconfirmed, and the honest posture toward an unconfirmed vendor figure is to repeat it with its source attached and lean on it for nothing.

The genuinely semantic layer is real in R&D informatics and in catalog and terminology tooling. As formal OWL/RDF, it is, on the available public evidence, absent from the GMP execution floor — exactly where this book's manufacturing batch lives.

That gap is not a failure of any one vendor; it is the unsolved state of the market. A plant should plan around it rather than assume a slide has closed it — which in practice means budgeting for the mapping layer, demanding the mappings as deliverables, and treating any unverified "knowledge graph" on the manufacturing floor as a goal to be engineered rather than a feature already bought.

Why it matters

A plant choosing tools has to read past the branding and ask one concrete question: does this product hand me a governed, importable ontology, or a closed model I will map out again later? The answer decides whether the digital thread that links WCB-CHO-001 through BATCH-2026-001 to DS-001 can be reasoned over as one graph, or whether it stays trapped in a dozen proprietary schemas, each correct in isolation and mute across the boundary.

The R2RML/RML mapping layer is precisely what sets the cost of that bridge — cheap if the data was modeled with export in mind, brutal if it was not. A vendor who treats the mapping as a deliverable is selling you a model you can leave with; a vendor who treats it as a trade secret is selling you a model you can only stay inside.

Buying for semantics, then, means buying for the exit, not just the demo. The question is never how good the graph looks on stage, but whether you can carry your part of it out the door — and that is a question the next chapter shows is hard even for a single company that owns every system involved.

Key terms

OWL/RDF ontology — a model with explicit, machine-checkable semantics expressed in open standards (OWL, the Web Ontology Language; RDF, the Resource Description Framework, which records facts as subject–predicate–object triples), which can be exported, aligned with other ontologies, and reasoned over.
Proprietary structured data model — a vendor's internal schema (recipes, asset templates, object types) that carries real meaning but is closed and not directly importable as a formal ontology.
Intermediate Data Schema (IDS) — TetraScience's open schema for normalized instrument data, mapped onward to the Allotrope Simple Model.
PI Asset Framework (AF) — AVEVA's object/template plus event-frame layer that contextualizes plant time-series; widely deployed in regulated pharma but not a semantic-web ontology.
Event Frame — a PI AF construct representing a time-bounded event such as a batch, shift, or downtime window; in a batch context its hierarchy mirrors the ISA-88 procedural model (batch then unit procedure then operation then phase), the same structure a PAS-X Master Batch Record encodes, which is why the historian and MES tiers describe the run in compatible terms even though neither emits it as RDF.
Master Batch Record (MBR) — the recipe model in MES platforms such as PAS-X that encodes batch-procedure semantics as a proprietary structured model.
Digital twin of the organization — Palantir's framing for an object model that binds data and logic to real-world entities; powerful, but an object model rather than a published OWL/RDF artifact.
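R2RML / RML — R2RML (RDB-to-RDF Mapping Language) is the W3C Recommendation that declares how relational tables become RDF triples; RML extends the same model to sources such as CSV, JSON, and XML and is not itself a W3C Recommendation. In either case the triples can be materialized into a store or produced virtually at query time.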
GraphRAG — graph-native retrieval-augmented generation: an AI model answers by retrieving connected, verified facts from a knowledge graph rather than relying on its training memory, so the grounding is only as exportable and auditable as the graph beneath it. A closed object model can serve facts to a model but does not let a buyer check what those answers are grounded on.
Validation paradox / grouped cross-validation — a reasoned, SHACL-shape-validated graph is a stronger ground truth than a fluent model that contradicts it; and its bp:derivedFrom lineage edges supply the grouping key for leave-one-batch-out cross-validation, holding out every row sharing an ancestor so a sibling sample cannot leak across the split and inflate the score.
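The grouping rule in the last entry can be sketched directly. This is a minimal, hypothetical illustration: derived_from maps each entity to the one it was derived from (standing in for bp:derivedFrom edges), and the sample names and BATCH-2026-002 are invented for the example. One design choice matters: the walk stops at the first batch rather than at the lineage root, because the root here is the cell bank WCB-CHO-001, and grouping by it would put every sample in a single group and leave nothing to hold out.

```python
# Hypothetical sketch: bp:derivedFrom lineage as the grouping key for leave-one-batch-out
# cross-validation. derived_from maps each entity to the entity it was derived from.


def batch_of(node, derived_from, batches):
    """Follow derivedFrom edges upward until a batch is reached; that batch is the group."""
    seen = set()
    while node not in batches:
        if node in seen or node not in derived_from:
            raise ValueError(f"no batch ancestor for {node!r}")
        seen.add(node)
        node = derived_from[node]
    return node


def leave_one_batch_out(samples, derived_from, batches):
    """Yield (held-out batch, train samples, test samples), one split per batch."""
    group = {s: batch_of(s, derived_from, batches) for s in samples}
    for held_out in sorted(set(group.values())):
        test = [s for s in samples if group[s] == held_out]
        train = [s for s in samples if group[s] != held_out]
        yield held_out, train, test


if __name__ == "__main__":
    derived_from = {
        "S1": "BATCH-2026-001", "S1-A": "S1", "S1-B": "S1",  # S1-A and S1-B are siblings
        "S2": "BATCH-2026-002",
        "BATCH-2026-001": "WCB-CHO-001", "BATCH-2026-002": "WCB-CHO-001",
    }
    batches = {"BATCH-2026-001", "BATCH-2026-002"}
    for held_out, train, test in leave_one_batch_out(["S1-A", "S1-B", "S2"], derived_from, batches):
        print(held_out, "train:", train, "test:", test)
```

The siblings S1-A and S1-B always land on the same side of every split. In practice the same keys could be passed as the groups argument to scikit-learn's LeaveOneGroupOut rather than hand-rolling the splits.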

Where this leads

The vendors sell the parts; the next chapter asks what happens when a single large pharmaceutical company tries to assemble them into one coherent fabric. Enterprise Knowledge Graphs at Big Pharma follows the semantic layer out of the demo and into the messy reality of an enterprise with hundreds of systems, decades of legacy data, and a governance problem no mapping file alone can solve.
