
The Shop Floor and the Digital Twin: Where Ontologies Are Still Arriving

📍 Where we are: Part VII · Ontologies in Industry Today — Chapter 29. The previous chapters found the laboratory and the regulatory boundary semantically mature. Now we walk onto the production floor, where the model this book built meets the place it has reached last.

The lab has Allotrope and SiLA; the regulatory boundary has IDMP and structured submissions. Both are places where formal semantics already do real work. The GMP production floor — the bioreactors, the chromatography skids, the manufacturing execution system stitching them together — is different. It is the most instrumented part of the plant and the least ontologized. The data is enormous and the stakes are highest, yet the working "semantics" of the floor today are not ontologies at all.

This chapter is honest about that frontier. Two kinds of model genuinely run in production there, and neither is a formal ontology: structured information models that move shop-floor data between systems, and statistical models that decide whether a batch looks like the good ones. The formal-ontology layer — the kind this book has spent twenty-eight chapters building — is arriving, but it arrives as proofs of concept, consortium pilots, and academic releases. Telling those two worlds apart is the whole point of the chapter.

The simple version

Picture a busy kitchen that runs flawlessly. Orders move on printed tickets in a fixed format every station understands, and the head chef judges each dish against a photo of the perfect plate. It works — but nobody has written down what a "dish" actually is, how it relates to its ingredients, or why this sauce counts as the same recipe as last week's. The tickets and the reference photo are not a cookbook you can reason over. The floor of a biologics plant is that kitchen: it has tickets and reference photos in abundance, and it is only now starting to write the cookbook.

What this chapter covers

We separate three things that are easy to blur. First, structured information models — the ISA-95/IEC 62264 hierarchy and its B2MML XML serialization, the manufacturing execution system (MES) master batch record, and the asset-framework contextualization layer — which run in production but are not formal OWL ontologies. Second, statistical models — the multivariate Continued Process Verification (CPV) approach built on a "golden batch," which is mathematics, not semantics. Third, the genuinely ontological work: the interoperability pilots (Asset Administration Shell, SiLA 2, OPC UA LADS, SOSA/SSN), the bioprocess ontologies themselves (BioPhorum, MCBO, the IOF Biopharma reference ontologies), and the digital twins that some hope will eventually stand on them. Throughout, every adoption claim carries a maturity tag, because on this floor the gap between shipped and proposed is the most important fact there is.

[Figure: A three-column maturity ledger of shop-floor technologies. The green Production column lists ISA-95 / B2MML, MES master batch record, CPV / golden batch, and SOSA / SSN, with the verdict that it moves and measures data, not meaning. The amber Piloted column lists Asset Administration Shell, SiLA 2 / AnIML, OPC UA LADS, BioPhorum ontology, and Digital twins, reaching for the floor but not yet shipped. The violet Academic / Proposed column lists MCBO, the NIST digital-twin framework, and IOF Biopharma ontologies, none with a named production home. A rose band below states the floor's meaning still lives in proprietary and statistical models while the formal layer is the missing one.]

The GMP shop floor sorted by maturity: structured and statistical models run in production today, while formal bioprocess ontologies and ontology-grounded digital twins are still pilots and academic work. Original diagram by the authors, created with AI assistance.

What actually runs: structured and statistical models

Start with what is unambiguously in production — and notice that none of it is an ontology in this book's sense.

The connective tissue of the floor is ISA-95 (the ANSI/ISA-95 standard for enterprise-control integration, published internationally as IEC 62264) and its XML serialization, B2MML (Business To Manufacturing Markup Language). Together they define the vocabulary and message shapes by which an ERP system (Level 4) and an MES (Level 3) exchange information: what to make, in what quantity, against which order. The major MES platforms a biologics plant might run (Körber PAS-X, Siemens Opcenter, Rockwell FactoryTalk PharmaSuite, AVEVA, Dassault Apriso, SAP Digital Manufacturing) all build on this layer. The documented Genentech case integrated SAP IDocs with B2MML, and the pattern became common enough that SAP later shipped B2MML content packages [1]. This is a semantic information model in production, but it is a structured XML schema, not a formal OWL ontology you can run a reasoner over. (production)
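To make "structured, not formal" concrete, here is a minimal sketch of reading an ERP-to-MES message. It assumes a heavily simplified message in the spirit of B2MML: the element names, the order ID, and the quantity are illustrative inventions, not the actual B2MML schema, which is namespaced and far richer.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified message loosely modeled on a B2MML-style
# production request. Element names and values are assumptions made
# for illustration; they are not taken from the real schema.
MESSAGE = """
<ProductionRequest>
  <ID>ORDER-0001</ID>
  <MaterialID>bp:DS-001</MaterialID>
  <Quantity unit="L">5000</Quantity>
</ProductionRequest>
"""

def parse_request(xml_text: str) -> dict:
    """Extract what to make, how much, and against which order."""
    root = ET.fromstring(xml_text)
    qty = root.find("Quantity")
    return {
        "order": root.findtext("ID"),
        "material": root.findtext("MaterialID"),
        "quantity": float(qty.text),
        "unit": qty.get("unit"),
    }

request = parse_request(MESSAGE)
print(request)
```

The parser knows the shape of the message and nothing more. What a quantity of drug substance is, or how it relates to a batch, lives in the code and the people reading the fields, not in the message itself.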

Alongside it sits the MES master batch record (MBR) — the electronic recipe and the as-executed batch record it generates. This is where our running example's bp:BATCH-2026-001 actually lives on a real floor: not as an RDF node with a derivedFrom edge to bp:SEED-001, but as a row in a proprietary MES database, the genealogy encoded in the vendor's own model. The same is true all the way out to the released bp:DS-001 and bp:DP-001 lots: the relationships are real; the formalism is not portable.
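For contrast, a minimal sketch of what the portable form of that genealogy could look like: plain subject-predicate-object triples and a transitive walk over derivedFrom. Only the BATCH-to-SEED edge is stated in the text; the DS and DP edges, and the assumption that each lot has a single parent, are added here purely for illustration.

```python
# Genealogy as subject-predicate-object triples. The first edge comes
# from the running example; the other two are assumed for this sketch.
TRIPLES = [
    ("bp:BATCH-2026-001", "derivedFrom", "bp:SEED-001"),
    ("bp:DS-001", "derivedFrom", "bp:BATCH-2026-001"),  # assumed edge
    ("bp:DP-001", "derivedFrom", "bp:DS-001"),          # assumed edge
]

def ancestors(node: str, triples=TRIPLES) -> list[str]:
    """Follow derivedFrom edges transitively, nearest ancestor first.

    Assumes each node has at most one derivedFrom parent and that the
    edges contain no cycle.
    """
    parents = {s: o for s, p, o in triples if p == "derivedFrom"}
    chain = []
    while node in parents:
        node = parents[node]
        chain.append(node)
    return chain

print(ancestors("bp:DP-001"))
# → ['bp:DS-001', 'bp:BATCH-2026-001', 'bp:SEED-001']
```

Any system that understands the triple shape can run the same walk. In a proprietary MES, the equivalent question is answered by vendor-specific tables and queries that do not travel with the data.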

The second production-grade model is statistical. Continued Process Verification — the FDA's Stage 3 of process validation, the ongoing assurance that a process stays in its validated state — is implemented today as multivariate statistics over a golden batch (a reference profile distilled from historically good runs). The deployed mathematics is principal component analysis and partial least squares (PCA/PLS), with Hotelling's T-squared and Q-residual statistics flagging when a live batch drifts off the manifold of normal operation. It is packaged in commercial tooling — MilliporeSigma's Bio4C ProcessPad, Sartorius/Umetrics SIMCA — and an ISPE case applied it to a 5,000-L cell-culture batch [11]. It answers "is this batch like the good ones?" with real rigor. But it is a statistical data model, not an ontology: it computes distances; it does not represent meaning. (production)
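The two monitoring statistics can be sketched in a few lines of NumPy. This is a minimal illustration, not how any of the commercial tools above implement it: the "golden" data is synthetic, the one-component model is an arbitrary choice, and no control limits are derived.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for historical good batches: 50 batches x 4
# process variables driven by one shared latent factor plus noise.
# All numbers are invented for illustration.
latent = rng.normal(size=(50, 1))
true_pattern = np.array([[1.0, 0.8, -0.6, 0.5]])
golden = latent @ true_pattern + 0.1 * rng.normal(size=(50, 4))

# Fit PCA on autoscaled data via SVD.
mean, std = golden.mean(axis=0), golden.std(axis=0, ddof=1)
Z = (golden - mean) / std
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
k = 1                                  # retained components (assumed)
P = Vt[:k].T                           # loadings, shape (4, k)
score_var = S[:k] ** 2 / (len(Z) - 1)  # variance of each score

def t2_and_q(x):
    """Return Hotelling's T-squared and the Q residual for one batch."""
    z = (np.asarray(x) - mean) / std
    t = z @ P                          # scores inside the model plane
    residual = z - t @ P.T             # part the model cannot explain
    return float(np.sum(t ** 2 / score_var)), float(residual @ residual)

typical = golden[0]
# Break the correlation structure: the last two variables move against
# the pattern the golden batches share.
drifted = mean + std * np.array([2.0, 2.0, 2.0, -2.0])

print("typical (T2, Q):", t2_and_q(typical))
print("drifted (T2, Q):", t2_and_q(drifted))
```

T-squared measures how far a batch sits from the center within the model plane; Q measures how far it sits off that plane. The drifted batch keeps each variable within two standard deviations yet breaks the learned correlations, so it shows up in Q. Both numbers are distances, which is exactly the sense in which this model computes without representing meaning.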

Layer | What it is | Formal ontology? | Maturity
ISA-95 / B2MML | XML information model for ERP↔MES exchange | No (structured XML) | (production)
MES master batch record | Proprietary electronic recipe + genealogy | No (vendor model) | (production)
CPV / golden batch | Multivariate statistics (PCA/PLS) | No (statistical) | (production)
AAS, SiLA 2, OPC UA LADS | Interoperability / twin metamodels | Partly / mappable | (piloted)
SOSA/SSN | Sensor telemetry ontology | Yes (W3C/OGC) | (production standard)
MCBO, NIST DT, IOF Biopharma | Formal bioprocess ontologies | Yes (BFO/IOF) | (academic / proposed)

The interoperability pilots: metamodels reaching for the floor

One layer above the production floor, a set of standards is being piloted that would make floor data portable enough for ontologies to grab onto.

The Asset Administration Shell (AAS) is the Industrie 4.0 digital-twin metamodel — a standardized container describing an asset's properties and capabilities, mappable to OPC UA. ISPE's Pharma 4.0 Plug & Produce subcommittee published a proof of concept integrating qualified lab devices with the AAS as a digital twin (2023) — concept-paper stage, not a production deployment [2]. (piloted)

SiLA 2 (Standardization in Lab Automation, version 2) is an open, royalty-free standard for lab-device connectivity, paired with AnIML for analytical data. It is in early, piloted deployment: an open-source SiLA 2 connector for Tecan's FluentControl was released in 2024 with support from Tecan, UniteLabs, and Roche, and Roche's "AC/DC" concept uses SiLA 2 drivers in its Basel R&D setting [3]. A further claim that this connector passed Site Acceptance Testing and expanded across Roche was not found in public evidence, so we leave it aside. (piloted)

OPC UA LADS (OPC 30500, the Laboratory and Analytical Device Standard) — the first OPC UA companion specification for lab and analytical devices, released around December 2023 — was demonstrated end-to-end at a 2025 hackathon alongside the Allotrope Foundation Ontology and ASM, uniting the communication standard (live data on the wire) with the data standard (documented, structured results). The published artifacts are explicitly proof-of-concept simulators, not shipped products [4]. (piloted)

Finally, SOSA/SSN — the W3C/OGC Semantic Sensor Network ontology — is the standard, vendor-neutral way to model sensor, PAT, and bioreactor telemetry semantically [5]. It is the formal-ontology counterpart to the proprietary contextualization an asset framework such as AVEVA PI Asset Framework provides — and the natural home for the probe readings this book's production-bioreactor chapter modeled. SOSA/SSN is a mature, production-grade standard; uptake on the pharma floor specifically remains light. (production as a standard; light pharma-floor uptake)
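As a rough illustration of what a single probe reading looks like in SOSA terms, the sketch below builds the triples for one observation using plain Python tuples. The SOSA class and property names are from the W3C vocabulary; the bp: namespace URL, the identifiers, the pH value, the timestamp, and the choice of the batch as feature of interest are all assumptions made for this example.

```python
from datetime import datetime, timezone

SOSA = "http://www.w3.org/ns/sosa/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BP = "https://example.org/bp/"   # hypothetical namespace for this sketch

def observation(obs_id, sensor, prop, feature, value, when):
    """Describe one probe reading as (subject, predicate, object) triples."""
    s = BP + obs_id
    return [
        (s, RDF_TYPE, SOSA + "Observation"),
        (s, SOSA + "madeBySensor", BP + sensor),
        (s, SOSA + "observedProperty", BP + prop),
        (s, SOSA + "hasFeatureOfInterest", BP + feature),
        (s, SOSA + "hasSimpleResult", value),
        (s, SOSA + "resultTime", when.isoformat()),
    ]

# Invented reading: a pH probe observing the running example's batch.
triples = observation(
    "obs-0001", "pH-probe-01", "pH", "BATCH-2026-001",
    7.02, datetime(2026, 1, 15, 8, 30, tzinfo=timezone.utc),
)
for s, p, o in triples:
    print(s, p, o)
```

The value alone is the same number an asset framework would store. What the triples add is a shared, published vocabulary for which sensor made the reading, what property it measured, and what it was a reading of.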

The bioprocess ontologies themselves

Now the formal ontologies — the ones that would actually represent the floor's meaning. They exist, they are good work, and not one of them has a named production deployment in biomanufacturing.

The BioPhorum biomanufacturing ontology ("Big Data to Smart Data," November 2023) is a system-independent ontology for biomanufacturing process data. It was validated at NC State's BTEC, where it reportedly cut an OD-probe calibration workflow across five bioreactors from four hours to thirty minutes [6]. That figure is a single consortium proof-of-concept datapoint — read it as one validation case, not a representative metric. (piloted)

Two academic ontologies apply this book's own foundations directly. NIST's "Towards Ontologizing a Digital Twin Framework for Manufacturing" (2023) uses BFO and IOF Core to formalize the ISO 23247 digital-twin framework, with a bioreactor as its worked biomanufacturing example [8]. And MCBO (the Mammalian Cell Bioprocessing Ontology) is a BFO + IOF Core hub-and-spoke ontology posted to bioRxiv in early 2026 — not yet peer-reviewed — validated on 723 curated cell-culture instances against SPARQL competency questions and MIT-licensed for behind-firewall use, with no named industrial deployment [9]. These are exactly the kind of model our bp: graph is a small cousin of: BFO-grounded, IOF-aligned, queryable. (academic)

The closest thing to a coordinated reference effort is the OAGi/NIIMBL IOF Biopharma ontologies — BFO- and IOF-Core-aligned reference ontologies drawing on ISA-88 and ISA-95. A public release appeared in December 2024, with a larger formal release announced for late 2025 [10]. A claim that the Pistoia Alliance's CMC Process Ontology imports these is not corroborated by Pistoia's own release, which cites only ISA-88/95 — so we do not assert the link. (proposed)

The digital twins on top

The digital twins that capture headlines are real and impressive — and they are built on hybrid mechanistic and machine-learning models, not on formal ontologies. Samsung Biologics' tri-modal bioreactor twin combines computational fluid dynamics, first-principles kinetics, and multivariate/ML modeling; GSK's closed-loop vaccine twin (built with Siemens and Atos from 2019) closes a control loop around a single process [7]. Both are scoped to a single unit operation, and the "closed-loop" characterization of the GSK case rests on 2019-era vendor PR, so read it as a vendor claim. The math is mature; the semantics underneath — the part that would let one twin's model mean the same thing as another's — is the missing layer. (piloted)

The unsolved part: the floor's meaning still lives in proprietary and statistical models

The honest summary is uncomfortable. On the GMP shop floor, the working semantics are structured (B2MML, the MES batch record, the asset framework) and statistical (the golden-batch CPV model). The formal-ontology layer is a frontier of proofs of concept and academic ontologies with no named production home. Two specific gaps echo verdicts this book reached earlier. First, continuous-processing individuation: when product flows continuously rather than in discrete lots, the clean batch and unit-operation boundaries our bp:BATCH-2026-001 relied on dissolve, and no settled ontology yet exists for time-bounded lots in a continuous line. Second, the data-standardization bottleneck — across the digital-twin literature, FAIR and semantic data standardization, not the modeling and not the math, is repeatedly named as the chief obstacle to scaling twins and manufacturing AI. That "modeling is solved, the data is not" framing leans heavily on a single review; treat it as a defensible opinion, not an established fact.

Why it matters

This is where the book's model meets its hardest reality. Our bp: graph treated a unit operation as a clean process node with a derivedFrom edge and a CQA attached. The real floor still encodes that same relationship in a proprietary master batch record or a statistical golden-batch model — both of which work, and neither of which is a governed, validated ontology you can reason over or share. Turning the floor's meaning into a formal model is unfinished, and the reason is not a shortage of cleverness. It is a shortage of discipline: shared identifiers, FAIR data, validated vocabularies. That is precisely why the closing chapters of this book are about discipline, not technology.

In the real world

What is actually shipping on the floor, versus what is piloted, sorts cleanly, and the evidence is worth gathering in one place rather than leaving scattered through the survey. In production: the ISA-95/B2MML information model moves work between ERP and MES, a pattern common enough after the documented Genentech SAP-IDoc-to-B2MML integration that SAP later shipped B2MML content packages [1]; the MES master batch record holds the genealogy this book models as bp:BATCH-2026-001 derivedFrom bp:SEED-001, on out to the released bp:DS-001 and bp:DP-001 lots, but in a vendor-proprietary schema rather than a portable graph; and multivariate CPV over a golden batch runs in commercial tooling, with an ISPE case applying it to a 5,000-L cell-culture batch [11]. Piloted: the BioPhorum biomanufacturing ontology was validated at NC State's BTEC as a single OD-probe-calibration proof of concept [6], and the Samsung Biologics and GSK digital twins are real but each scoped to one unit operation and built on hybrid mechanistic and ML models, not formal ontologies [7]. Academic or proposed: the NIST digital-twin framework [8], MCBO [9], and the OAGi/NIIMBL IOF Biopharma reference ontologies [10] are BFO- and IOF-grounded, exactly the family this book's bp: graph belongs to, and carry no named production deployment yet. The pattern is consistent: structured and statistical models ship; the formal, shareable semantic layer is the one still arriving.

Key terms

  • ISA-95 / B2MML — the IEC 62264 standard for enterprise-to-control integration and its XML serialization; a structured information model, not a formal ontology, that moves data between ERP and MES.
  • Master batch record (MBR) — the electronic recipe an MES executes and the as-executed record it produces, holding batch genealogy in a vendor-proprietary model.
  • Continued Process Verification (CPV) — FDA Stage 3 of process validation; on real floors, a multivariate statistical model (PCA/PLS over a golden batch), not a semantic one.
  • Golden batch — a reference profile distilled from historically good runs, against which live batches are compared by statistical distance.
  • Asset Administration Shell (AAS) — the Industrie 4.0 standardized digital-twin metamodel for an asset's properties and capabilities, piloted for pharma via ISPE.
  • SiLA 2 / AnIML — open, royalty-free standards for lab-device connectivity and analytical data, in early piloted deployment.
  • OPC UA LADS — OPC 30500, the first OPC UA companion specification for laboratory and analytical devices, demonstrated as proof of concept.
  • SOSA/SSN — the W3C/OGC Semantic Sensor Network ontology; the vendor-neutral, formal way to model sensor and PAT telemetry, counterpart to a proprietary asset framework.
  • Digital twin — a live computational model of a physical asset; on the biomanufacturing floor today, built on hybrid mechanistic and ML models rather than formal ontologies.
  • MCBO — the Mammalian Cell Bioprocessing Ontology, a BFO + IOF Core academic ontology validated on curated cell-culture instances, with no named industrial deployment.

Where this leads

The floor showed us a frontier where the math is mature and the meaning is still arriving — and it named the missing layer plainly: governed, shared, semantic data. The final chapter asks what becomes possible when that layer finally exists. The next chapter, The Frontier: Ontologies as the Ground Truth for AI, turns from where ontologies are arriving to why their arrival matters most: as the verifiable ground truth that keeps machine learning and large language models honest about a process where being wrong is not an option.