Why Numbers Don't Connect: The Semantic Interoperability Problem

📍 Where we are: Part III gave the data an organizational backbone — governance, quality, master data; Part IV now asks why, even with perfect plumbing and clean records, numbers from different systems still refuse to line up, and what it takes to make meaning travel.

In the last chapter we built the human and policy scaffolding around data — data governance (who decides the rules), the dimensions of data quality, and master data management (one trusted version of the things the business cares about, like products and materials). Governance can make every system's records clean, owned, and well described, and still leave a maddening problem untouched: when you finally pull two clean datasets side by side, the numbers do not connect. The pipes are perfect. The pump moves water flawlessly. And yet what comes out the other end does not add up.

This chapter is about that gap. It has a name, the semantic interoperability problem, and it is one of the biggest reasons that, after decades of digitization, biomanufacturers still cannot easily combine their own data. The FAIR principles (Findable, Accessible, Interoperable, Reusable) for scientific data management were formulated in direct response to this difficulty [1].

The simple version

Imagine three people describing the same event. One says "the meeting is at noon," another writes "12:00," a third notes "12h00 GMT." A human shrugs and understands all three. A computer does not. To a machine, "noon," "12:00," and "12h00 GMT" are three unrelated strings of characters — unless someone has told it, formally, that they mean the same instant in time. Semantic interoperability is the work of telling machines, once and for all, when two differently-spelled things are really the same thing.

What this chapter covers

We start by separating two ideas that sound alike — syntactic and semantic interoperability — with a worked example. Then we catalogue the sources of heterogeneity (the many ways the same fact gets recorded differently), show why the obvious fix of "just map them" collapses as systems multiply, and arrive at the first real remedy: controlled vocabularies, shared lists of agreed terms. We finish at the edge of the deeper solution — a shared model of meaning — which is the subject of the next chapter.

This problem is the data-management mirror of a physical reality. The bioreactor temperature this chapter keeps returning to is a real probe in a real vessel — the same vessel and the same physical measurement that the manufacturing book follows through production in the bioreactor. One physical fact; many recorded shadows. The whole burden of semantics is to teach machines that those shadows fall from a single object.

Two kinds of "talking": syntax versus meaning

Recall the distinction the connectivity chapter drew between format and meaning. Interoperability is the ability of separate systems to work together, and it comes in layers. A well-known model formalizes those layers further — the Levels of Conceptual Interoperability Model (LCIM) — naming them from no/technical interoperability up through syntactic (shared format) and semantic (shared meaning) [6]; later extensions add pragmatic, dynamic, and conceptual levels, giving the mature seven-level model.

Two of those layers matter here. Syntactic interoperability means systems agree on format: the file opens, the message parses, the fields line up. Semantic interoperability means they agree on meaning: both ends know that a value in one system and a value in another describe the same kind of real-world thing, on the same scale, under the same conditions [6]. Syntax moves the bytes. Semantics preserves the meaning. You can have the first completely and the second not at all.

Here is what that looks like in a real plant. Three systems — a process historian (a time-series database such as AVEVA PI System, formerly OSIsoft PI), a manufacturing execution system (MES) such as Emerson DeltaV MES (formerly Syncade), and a laboratory information management system (LIMS) such as LabWare — each record what is, physically, one measurement: the temperature inside bioreactor BR101. Each describes it differently:

Branch diagram: a single real-world fact, the temperature inside bioreactor BR101 is 37.0 degrees Celsius, fans out to three systems, a Historian time-series database, an MES batch record, and a LIMS sample log, each producing an irreconcilable record that disagrees on tag name, value, unit, and timestamp. One physical fact, three machine descriptions. Every system here is internally correct; none agrees with the others on name, number, unit, or time format. Original diagram by the authors, created with AI assistance.

Every box is valid. Every system is internally consistent and would pass its own data-quality checks. The three timestamps even denote the very same instant: epoch 1718271000 (seconds counted since 1970-01-01 UTC, the reference point computers measure time from), 05:30 EDT, and 09:30Z are one moment written three ways. Note the subtlety. Because June 13 falls in daylight saving time, US Eastern is EDT at UTC−4, so the matching wall-clock value is 05:30 EDT. Writing it as 05:30 EST (the standard-time offset, UTC−5) would silently name an instant an hour later, 10:30Z instead of 09:30Z, which is exactly the hazard this chapter warns about. Yet a program asked to "find all temperature readings for batch BATCH-2026-001 in BR101" would find none of them automatically, because nothing tells it that TIC101.PV (the historian's instrument-tag style: the present value of temperature-indicating-controller loop 101), Temperature, and temp_reactor are the same property of the same vessel, or that 37.0, 98.6, and 310.15 are the same value in Celsius, Fahrenheit, and Kelvin, recorded at one and the same time. The data moved perfectly. The meaning did not.

To make this concrete: a query that genuinely retrieves all three readings would have to encode, by hand, every reconciliation the systems left implicit — something like this sketch (not any real query language — just the reconciliation written out):

MATCH  property IN { TIC101.PV, Temperature, temp_reactor }
  AND  vessel   IN { BR-101, EQ-00457 }   // these two strings are the same asset — private knowledge
NORMALIZE  unit  ->  degC      // 37.0 = (98.6 - 32) / 1.8 = 310.15 - 273.15
NORMALIZE  time  ->  UTC       // 1718271000 = 2024-06-13 05:30 EDT = 09:30Z

Every one of those clauses is private knowledge that a person had to supply. Multiply this by hundreds of tags and dozens of systems and you have the everyday reality of biomanufacturing analytics — which is precisely what a shared layer of meaning is meant to eliminate.
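The same reconciliation can be sketched as runnable code. The Python below is a minimal illustration rather than any vendor's API: the record layout, the assignment of timestamp styles and vessel labels to particular systems, and the helper names `to_utc` and `normalize` are assumptions made for the example, while the tag names and values come from the example above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical records. Values follow the chapter's example; which system uses
# which timestamp style and which vessel label is assumed for illustration.
RECORDS = [
    {"system": "historian", "tag": "TIC101.PV", "vessel": "BR101",
     "value": 37.0, "unit": "degC", "time": 1718271000},
    {"system": "mes", "tag": "Temperature", "vessel": "BR-101",
     "value": 98.6, "unit": "degF", "time": "2024-06-13 05:30 EDT"},
    {"system": "lims", "tag": "temp_reactor", "vessel": "EQ-00457",
     "value": 310.15, "unit": "K", "time": "2024-06-13T09:30:00Z"},
]

# The "private knowledge": aliases, unit algebra, and zone offsets.
PROPERTY_ALIASES = {"TIC101.PV", "Temperature", "temp_reactor"}
VESSEL_ALIASES = {"BR101", "BR-101", "EQ-00457"}
TO_CELSIUS = {
    "degC": lambda v: v,
    "degF": lambda v: (v - 32.0) / 1.8,
    "K": lambda v: v - 273.15,
}
ZONE_OFFSETS = {"EDT": timedelta(hours=-4), "EST": timedelta(hours=-5)}


def to_utc(raw):
    """Convert epoch seconds, a 'Z'-suffixed string, or 'local ABBR' to UTC."""
    if isinstance(raw, (int, float)):
        return datetime.fromtimestamp(raw, tz=timezone.utc)
    if raw.endswith("Z"):
        parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
        return parsed.replace(tzinfo=timezone.utc)
    stamp, abbrev = raw.rsplit(" ", 1)
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
    zone = timezone(ZONE_OFFSETS[abbrev])
    return local.replace(tzinfo=zone).astimezone(timezone.utc)


def normalize(record):
    """Return (degC rounded to 2 places, UTC instant), or None if not a match."""
    if record["tag"] not in PROPERTY_ALIASES:
        return None
    if record["vessel"] not in VESSEL_ALIASES:
        return None
    celsius = TO_CELSIUS[record["unit"]](record["value"])
    return (round(celsius, 2), to_utc(record["time"]))


readings = [normalize(r) for r in RECORDS]
print(len(set(readings)) == 1)  # True when all three collapse to one reading
```

Every dictionary at the top of the sketch is knowledge that lives outside the data. Remove any one of them and the three records stop matching, which is the point of the paragraph above.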

The many faces of heterogeneity

Six kinds of mismatch

That single example already hides several distinct problems. Semantic heterogeneity — the umbrella term for the gap, meaning that the same things are conceived and labelled differently across systems and people — comes in six identifiable flavours, and naming them is the first step to fixing them [5].

Naming. The same property wears different labels: TIC101.PV, Temperature, temp_reactor. Conversely — and worse — the same label can mean different things in two systems (one plant's "yield" is mass; another's is a percentage).
Units of measure. Celsius, Fahrenheit, Kelvin; grams per litre versus milligrams per litre — the same titer written 2 and 2000, a thousand-fold disagreement waiting to corrupt a calculation. A number without its unit is meaningless, and a number with an implied unit is a trap. This is why standards bodies define the units themselves formally — the ISO/IEC 80000 series fixes the names and symbols of quantities and units — and why machine-readable code systems such as UCUM (the Unified Code for Units of Measure, a unit-code string carried alongside values in lab and HL7 data) and unit ontologies such as QUDT (which the mapping record later in this chapter anchors to) let a data format carry the unit with the value rather than leaving it in someone's memory.
Identifier schemes. Is the vessel BR-101, BR101, Bioreactor-1, or asset tag EQ-00457? Two systems can hold rich data about the same physical bioreactor and never realize it, because their names for the thing never match.
Timestamps and time zones. Epoch seconds, local wall-clock time, and UTC with a Z suffix can all denote the same instant — or, read naively, three different instants. A missing time zone has silently corrupted more analyses than almost any other single defect.
Granularity. One system logs every second; another stores a one-minute average; a third keeps a single per-batch number. Combining them requires knowing what each row actually represents — and reconciling them is not a free conversion like degC→K but an explicit resampling or aggregation rule (a time-weighted average, say, versus last-observation-carried-forward) that is itself part of the transformation. Joining a 1 Hz historian stream to a per-batch LIMS result without stating that rule is its own silent-drift hazard: an unstated aggregation method changes the number without changing anything visible.
Implicit context. The deadliest category, because it is invisible. A column called temp might mean the setpoint or the measured value; a result might assume a sample was filtered first. The knowledge that disambiguates it lives in an engineer's head, not in the data.
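The granularity point can be made concrete with a small sketch. The sample values below are invented for illustration, and the three function names are hypothetical; the code only shows that three defensible aggregation rules give three different "batch temperatures" from the same stream.

```python
# Hypothetical irregularly sampled readings: (seconds since start, degC).
# A short excursion to 39.0 is sampled densely near the end of the window.
samples = [(0, 37.0), (50, 37.0), (55, 39.0), (60, 37.0)]


def simple_mean(points):
    """Arithmetic mean of the recorded values, ignoring how long each lasted."""
    return sum(value for _, value in points) / len(points)


def time_weighted_mean(points):
    """Mean where each value is held until the next sample arrives."""
    total = 0.0
    for (t0, v0), (t1, _) in zip(points, points[1:]):
        total += v0 * (t1 - t0)
    return total / (points[-1][0] - points[0][0])


def last_observation(points):
    """Last-observation-carried-forward: the final recorded value."""
    return points[-1][1]


print(simple_mean(samples), time_weighted_mean(samples), last_observation(samples))
```

None of the three results is wrong. What makes a pooled number trustworthy is that the chosen rule is written down next to it.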

The FAIR principles address this directly. Their I, Interoperability, demands that data use shared, formally defined vocabularies and carry machine-actionable meaning, precisely so that the heterogeneity above can be resolved by software rather than by a human who happens to remember the context [1].

Clean data is not connected data

Clean data is not connected data. A dataset can score perfectly on the quality dimensions from the previous chapter — complete, accurate, consistent within itself — and still be semantically isolated. Quality is about being right; interoperability is about being combinable. They are different problems, and passing one does not pass the other.

Why "just map them" does not scale

The n-squared tangle

The intuitive fix is a translation table: write a converter that turns the Historian's vocabulary into the MES's, another for MES-to-LIMS, and so on. These are point-to-point mappings, and they fail for the same reason point-to-point connections failed in the connectivity chapter — they explode in number.

Two networks: on the left, every system connected to every other system by pairwise mappings forming a dense n-squared tangle; on the right, each system connected once to a central shared model, forming n clean spokes. Map each system once to a shared model, and the n² tangle of pairwise mappings collapses to one mapping per system (n). Original diagram by the authors, created with AI assistance.

With n systems, pairwise mapping needs on the order of n² converters, exactly n(n−1)/2 when one converter serves each pair: four systems need six, ten systems need forty-five. Each converter encodes one team's private understanding, and each breaks the moment a system is upgraded, a tag is renamed, or a unit convention shifts, which in a living plant happens constantly. This is why so much analytics effort goes not into analysis but into wrangling: hand-reconciling names, units, identifiers, and timestamps before any real question can be asked. Mediating meaning through one shared reference model turns that n² tangle into n mappings: each system aligns once, to the shared model, and is then comparable with every other [4]. The economics are the same lesson as connectivity standards, applied one level deeper: not to the wire, but to the words.

Two networks of six systems. On the left, a Historian, MES, LIMS, ELN, SCADA, and ERP are joined by fifteen pairwise rose lines forming a dense tangle, labelled 6 systems equals 15 converters. On the right, the same six systems each connect by a single violet spoke to a central shared model labelled IOF Core, labelled 6 systems equals 6 mappings. Pairwise converters grow as n(n−1)/2; mapping each system once to a shared model grows as n. Six systems need fifteen converters point-to-point but only six mappings through a shared model. Original diagram by the authors, created with AI assistance.
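The arithmetic behind the two diagrams is easy to check. This short Python sketch counts converters both ways; the function names are illustrative only.

```python
from itertools import combinations


def pairwise_converters(n: int) -> int:
    """Point-to-point: one converter per unordered pair of systems."""
    return n * (n - 1) // 2


def hub_mappings(n: int) -> int:
    """Shared model: one mapping per system."""
    return n


systems = ["Historian", "MES", "LIMS", "ELN", "SCADA", "ERP"]
pairs = list(combinations(systems, 2))  # every pair that would need a converter

for n in (4, 6, 10):
    print(n, pairwise_converters(n), hub_mappings(n))
```

If each direction of translation needs its own converter, the pairwise count doubles to n(n−1), which widens the gap further.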

This is not merely an integration headache; it is the place where the most expensive analytics failures hide. The pairwise tangle is also where meaning silently drifts: each converter encodes a private assumption that no one else can see, and when one of those assumptions quietly changes, nothing flags it — a hazard we return to at the end of this chapter.

A first remedy: reference data and controlled vocabularies

The first practical step toward shared meaning is modest and powerful: agree on the words. Reference data, introduced with master data in the previous chapter, is the set of standard, approved values a field is allowed to take, and a controlled vocabulary is an agreed, governed list of terms with their definitions — so that everyone writes degC, not also Celsius, °C, and centigrade. When a number must be reported in a unit drawn from one official list, two systems can finally be compared without a human in the loop.
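As a minimal sketch of what enforcing a controlled vocabulary looks like in code, the Python below accepts the spellings named above and rewrites them to the one approved term, rejecting anything else. The table `APPROVED_UNITS` and the function `canonical_unit` are hypothetical names, and treating legacy spellings as tolerated input is an assumption of the example, not a rule from any standard.

```python
# Hypothetical controlled vocabulary: one approved term per unit, plus the
# legacy spellings tolerated on input and rewritten to the approved term.
APPROVED_UNITS = {
    "degC": {"degC", "Celsius", "°C", "centigrade"},
}

# Reverse lookup from any tolerated spelling to its approved term.
# Matching is case-sensitive on purpose: case can carry meaning in unit codes.
_LOOKUP = {
    spelling: approved
    for approved, spellings in APPROVED_UNITS.items()
    for spelling in spellings
}


def canonical_unit(label: str) -> str:
    """Return the approved term for a unit label, or raise if it is not listed."""
    key = label.strip()
    if key not in _LOOKUP:
        raise ValueError(f"unit {label!r} is not in the controlled vocabulary")
    return _LOOKUP[key]


print(canonical_unit("centigrade"))
```

Rejecting an unlisted unit is the important behaviour: a vocabulary that silently passes unknown terms through is only a suggestion.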

International standards supply exactly such vocabularies. ISA-95, standardized as IEC 62264, defines a canonical set of object models and terms for enterprise-to-manufacturing data — what a production request, a material lot, or an equipment element formally is — so that interfaces between business and plant systems share one definition instead of negotiating their own [2]. In the analytical laboratory, the Allotrope Foundation — a consortium whose members include large pharmaceutical companies such as Merck, GSK, and Amgen — maintains shared vocabularies and ontologies (notably the Allotrope Foundation Ontologies, AFO) so that results, and the units, techniques, and identifiers attached to them, mean the same thing regardless of which vendor's instrument produced them, directly attacking the naming, units, and identifier heterogeneity above [8].

But a flat vocabulary — a simple list of approved terms — has a ceiling. A list can tell a machine that degC is an allowed unit; it cannot tell the machine that Celsius and Kelvin are both temperatures, related by a fixed formula, while grams-per-litre is something else entirely. A list of allowed equipment names cannot express that a bioreactor is a kind of vessel, which is part of an upstream suite, which participates in a fermentation. The relationships between terms — is a kind of, is part of, participates in — carry as much meaning as the terms themselves, and a flat list has nowhere to put them [5]. To capture relationships, you need not a list but a model.

The promise: a shared model of meaning

That model is an ontology: a formal, machine-readable specification of the concepts in a domain and the relationships among them — a shared map of what exists and how it connects [5]. In practice these models are written in the W3C's Web Ontology Language (OWL), built on RDF and serialized in formats such as RDF/XML or the more readable Turtle, so that the meaning is expressed in a standard a computer can load and reason over. Where a vocabulary lists words, an ontology states facts a computer can reason over: a bioreactor is a kind of equipment; a temperature reading is a measurement of a temperature; Celsius and Kelvin measure the same quantity. Once those facts are written down formally, software can do what the human did effortlessly at the start of this chapter — recognize that three different descriptions point to one reality — without a human in the loop.
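The difference between a list and a model can be shown with a toy example. The Python below is not OWL and not a reasoner; it is a sketch, under the assumption that a few relationships are stored as plain dictionaries, of the kind of question a model can answer and a flat list cannot. The relation tables and function names are hypothetical, and the link from vessel to equipment is assumed for the example.

```python
# Toy relationship tables (hypothetical; a real ontology would express these in OWL).
IS_A = {"bioreactor": "vessel", "vessel": "equipment"}
MEASURES = {"degC": "temperature", "K": "temperature", "g/L": "mass concentration"}


def is_kind_of(term: str, ancestor: str) -> bool:
    """Follow is-a links upward and report whether 'ancestor' is reached."""
    while term in IS_A:
        term = IS_A[term]
        if term == ancestor:
            return True
    return False


def comparable(unit_a: str, unit_b: str) -> bool:
    """Two units are comparable when both are known and measure the same quantity."""
    return unit_a in MEASURES and MEASURES.get(unit_a) == MEASURES.get(unit_b)


print(is_kind_of("bioreactor", "equipment"), comparable("degC", "g/L"))
```

Neither answer was stored directly. Both were derived from stated relationships, which is the step a flat vocabulary cannot take.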

Standards and governance: BFO, IOF, Allotrope, FAIR

To keep separate ontologies mutually compatible, the field anchors them to a shared top-level (or upper) ontology: a small, domain-neutral set of the most general categories — object, process, quality, role — that everything else inherits from. The international standard for one such foundation is the Basic Formal Ontology (BFO), standardized as ISO/IEC 21838-2, explicitly designed to support data exchange and integration across heterogeneous systems [7]. Building manufacturing ontologies on this common footing is the mission of the Industrial Ontologies Foundry (IOF), a community that articulated the semantic-interoperability problem for manufacturing and adopted shared principles for building reference ontologies that interoperate by design rather than by accident [3]. The IOF Core Ontology (the IOF Core) is a concrete, BFO-grounded mid-level model designed to harmonize heterogeneous manufacturing data — a path toward the single shared reference model that could turn the n² tangle into n mappings [4].

Why it matters

Regulatory traceability and the cost of wrangling

For data management, this chapter draws a hard line that is easy to miss. Every investment in the previous parts of this book (instruments, historians, integration standards, governance, integrity controls) can be flawless and still leave you unable to answer a simple cross-system question, because the data describes the same world in incompatible terms. Connectivity standards solved "can the bytes arrive?" Governance solved "are the records trustworthy and owned?" Neither solves "do two numbers mean the same thing?" Without that, every analytics project pays the wrangling tax again from scratch, and every combined dataset risks silently averaging Fahrenheit with Celsius. Semantics is not a finishing polish applied at the end; it is the layer that decides whether all the rest of your data can ever be used together.

The stakes are also regulatory. A batch record must be complete and traceable — in the United States, 21 CFR 211.192 requires that production and control records be reviewed and that any discrepancy be fully investigated — yet a record whose meaning is scattered across systems that disagree on names, units, and timestamps cannot easily be reconstructed end to end. The records themselves come from the plant information systems — historian, MES, LIMS — whose disagreement this chapter dissects, and the regulatory weight of getting that reconstruction right is the subject of the manufacturing book's chapter on quality, the regulatory framework, and the role of data. Technology transfer compounds the risk: when a method moves between sites, ICH Q2(R2) on analytical procedure validation [10] and ICH Q14 on analytical procedure development [11] — both adopted at Step 4 on 1 November 2023 — assume that a specification stated at one site means the same thing at the next, units included. Semantic heterogeneity quietly undermines exactly that assumption.

The transfer case deserves a closer look, because it is where semantics and operational validation collide. A scale-up or tech transfer moves a process from a development skid to a 2,000-litre production bioreactor, and that move is signed off only after IQ/OQ/PQ and cleaning validation establish the new line as equivalent. (IQ/OQ/PQ stands for Installation, Operational, and Performance Qualification: the staged proof that equipment was installed right, operates right, and performs right for its real workload, built out in the chapter on validating computerized systems.) But qualification proves the equipment and cleaning are sound; it does not prove the two sites describe a setpoint the same way. A feed rate transferred as "0.40" can be a relative vessel-volume-per-day fraction at the sending site and an absolute litres-per-hour value at the receiving one. Both are internally valid, neither is flagged, and the silent unit mismatch is precisely the kind that cleaning-validation and PQ runs are not designed to catch.

The discipline that does address it has a name and a recent shift. Computerized system validation (CSV) historically tested and documented every field uniformly, while the FDA's risk-based Computer Software Assurance (CSA) approach concentrates rigorous, scripted proof on the functions that actually bear on patient safety: the unit on a critical-process-parameter tag, not the colour of a trend line. That contrast is the whole subject of the chapter on validating computerized systems: GAMP 5 and the move to CSA. Underneath both, the integrity of the transferred record is judged against the ALCOA+ attributes (the regulators' data-integrity checklist: that data be attributable, legible, contemporaneous, original, and accurate, plus complete, consistent, enduring, and available) and made legally binding by 21 CFR Part 11 (US) and EU GMP Annex 11 (EU), the electronic-records-and-signatures rules. The uncomfortable truth this chapter keeps returning to is that a value can satisfy every ALCOA+ attribute within its own system and still be semantically wrong once pooled across sites: accuracy is judged inside one system's frame, and semantics is the cross-system frame those rules were never written to police.

Anatomy of a semantic-mapping record

Everything above converges on one concrete artifact: a single governed row in a tag dictionary that resolves the heterogeneity for one measurement, once, for everyone. It is the data object that turns "three irreconcilable records" into "one comparable quantity." Dissecting it shows exactly which private knowledge has to be written down — and where it then travels.

One row of a governed tag dictionary: it names the canonical concept, records how every system spells it, states the rules that reconcile units and time, and points to where the meaning is used downstream. Original diagram by the authors, created with AI assistance.

Read top to bottom, the record is the whole argument of this chapter made operational. The canonical_tag_name is the one agreed name — the very thing a flat list of allowed terms gives you. The isa95_position places it in the IEC 62264 equipment hierarchy so a machine knows which vessel and what kind of measurement, not just a string [2]. The per-system block is where heterogeneity lives in the open: the Historian's TIC101.PV in degC, the MES's Temperature in degF, the LIMS's temp_reactor in K, each with its own timestamp convention. The transformation_rule is the formerly-private knowledge — the unit algebra and the time-zone normalization — written down so software can apply it without a human. The vocabulary_source records which standards the terms and units are anchored to, and the governance_owner and last_verified_date are the governance hooks from the previous chapter: a mapping with no owner and no review date is exactly the silent liability the next section describes.
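For readers who think in code, the record described above can be sketched as a small data structure. The field names follow the ones named in the text; the Python class names, the exact shape of the per-system block, the timestamp-convention labels, and the helper `resolve` are assumptions for illustration, not the schema of any particular tag dictionary.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class SystemAlias:
    tag: str              # how this system spells the property
    unit: str             # the unit this system reports in
    time_convention: str  # how this system writes timestamps (assumed labels)


@dataclass
class MappingRecord:
    canonical_tag_name: str
    isa95_position: str
    per_system: Dict[str, SystemAlias]
    transformation_rule: Dict[str, str]
    vocabulary_source: List[str]
    governance_owner: str
    last_verified_date: date


record = MappingRecord(
    canonical_tag_name="bioreactor.BR101.temperature.measured",
    isa95_position="Site/Area/BR101/Temperature",
    per_system={
        "Historian": SystemAlias("TIC101.PV", "degC", "epoch seconds"),
        "MES": SystemAlias("Temperature", "degF", "local wall-clock"),
        "LIMS": SystemAlias("temp_reactor", "K", "UTC with Z suffix"),
    },
    transformation_rule={
        "degF": "(x - 32) / 1.8",
        "K": "x - 273.15",
        "time": "normalize to UTC",
    },
    vocabulary_source=["IEC 62264", "QUDT"],
    governance_owner="process-data-steward",
    last_verified_date=date(2026, 6, 1),
)


def resolve(rec: MappingRecord, system: str, tag: str) -> Optional[str]:
    """Return the canonical name if 'tag' is this system's alias, else None."""
    alias = rec.per_system.get(system)
    if alias is not None and alias.tag == tag:
        return rec.canonical_tag_name
    return None


print(resolve(record, "MES", "Temperature"))
```

The lookup is deliberately scoped by system: Temperature resolves for the MES but not for the LIMS, because the same label may mean something else elsewhere.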

This is also the anchor of a cross-book thread. The canonical name here is a data-management abstraction. In the open-source implementation it becomes a concrete tag in a Unified Namespace (a single real-time hierarchy that serves as the plant's one source of truth for current state), structured as asset.measurement.role and stored as a literal row in a tag-dictionary table; that is the subject of the open-source book's chapter on naming and the Unified Namespace. From there the same fact is restated as machine-reasonable RDF triples in the semantics and knowledge-graph chapter, which shows how the flat row becomes assertions a reasoner can traverse. Physical artifact, data record, code row: the temperature probe in the bioreactor becomes this dictionary row becomes a triple in a graph.

The mapping record as triples, and the gate that guards it

That "becomes a triple in a graph" is worth making literal, because it is where this chapter's first remedy meets the formal machinery the ontology book is built on. Written as RDF (the Resource Description Framework — the standard way to record data as subject-predicate-object triples, the atomic facts of a graph), the canonical tag and its readings stop being table cells and become statements a machine can reason over (@prefix lines simply alias a long namespace to a short one):

# The canonical concept and its per-system aliases as RDF triples.
@prefix tag:  <https://example.org/tag#> .
@prefix unit: <http://qudt.org/vocab/unit/> .        # QUDT unit vocabulary
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .  # needed for the xsd:date literal

tag:bioreactor.BR101.temperature.measured
    a                   tag:Mapping ;                 # typed, so a class-targeted shape applies
    tag:isa95Position   "Site/Area/BR101/Temperature" ;
    tag:unit            unit:DEG_C ;                  # the unit travels WITH the value
    tag:aliasHistorian  "TIC101.PV" ;
    tag:aliasMES        "Temperature" ;
    tag:aliasLIMS       "temp_reactor" ;
    tag:governanceOwner "process-data-steward" ;
    tag:lastVerified    "2026-06-01"^^xsd:date .

But a triple store will happily accept a mapping with no owner, no unit, or no review date — the very gaps that turn a dictionary row into the silent liability the next section describes. Catching an absent required field is not a question a query over the triples that exist can answer; it is a question about the triples that should exist and do not, which is exactly the closed-world check (a check that treats a missing required fact as a failure, not as merely unknown) that SHACL (the Shapes Constraint Language — a W3C language for validating that graph data has a required structure) performs. A shape can gate every mapping the way a release specification gates a lot:

# A SHACL shape: every mapping MUST carry a unit, an owner, and a verification date.
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tag: <https://example.org/tag#> .

tag:MappingShape a sh:NodeShape ;
    sh:targetClass tag:Mapping ;
    sh:property [ sh:path tag:unit ;            sh:minCount 1 ;
                  sh:message "Mapping has no unit: a number without a unit is meaningless." ] ;
    sh:property [ sh:path tag:governanceOwner ; sh:minCount 1 ;
                  sh:message "Unowned mapping: no steward to catch a drifted convention." ] ;
    sh:property [ sh:path tag:lastVerified ;    sh:minCount 1 ; sh:datatype xsd:date ;
                  sh:message "Mapping has no verification date typed as xsd:date." ] .

An unowned or unitless mapping is now a validation report, not a gap someone has to notice. It is the same mechanism the ontology book uses to gate a drug-substance lot's full CQA panel in its chapter on the release gate and SHACL, where presence and cardinality (sh:minCount/sh:maxCount) decide conformance. The reverse direction (which mappings are overdue for re-verification?) is a short SPARQL competency question. SPARQL is the standard query language for RDF, as SQL is for tables, and a competency question is a plain-English question the data must be able to answer, run as a pass/fail check. The query uses the same derivedFrom-style traversal the relations-and-genealogy chapter walks:

# CQ: list every mapping not verified since 2026-01-01 (the data steward's overdue queue).
PREFIX tag: <https://example.org/tag#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?mapping ?owner WHERE {
  ?mapping tag:lastVerified ?d ; tag:governanceOwner ?owner .
  FILTER (?d < "2026-01-01"^^xsd:date)
}
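The logic of both checks can be imitated without a triple store. The Python below is a plain-dictionary stand-in, not SHACL or SPARQL: it treats a missing required field as a failure (the closed-world idea) and lists mappings whose verification date is older than a cutoff. The second and third mappings and all function names are hypothetical.

```python
from datetime import date

REQUIRED_FIELDS = ("unit", "governance_owner", "last_verified")

# Hypothetical tag-dictionary rows; only the first comes from the chapter's example.
mappings = {
    "bioreactor.BR101.temperature.measured": {
        "unit": "degC",
        "governance_owner": "process-data-steward",
        "last_verified": date(2026, 6, 1),
    },
    "bioreactor.BR101.ph.measured": {
        "unit": "pH",
        "governance_owner": "process-data-steward",
        "last_verified": date(2025, 11, 15),
    },
    "bioreactor.BR101.do.measured": {
        "last_verified": date(2026, 2, 1),  # no unit and no owner recorded
    },
}


def violations(mapping: dict) -> list:
    """Closed-world gate: every required field that is absent is a failure."""
    return [name for name in REQUIRED_FIELDS if mapping.get(name) is None]


def overdue(all_mappings: dict, cutoff: date) -> list:
    """Names of mappings whose last verification predates the cutoff."""
    return sorted(
        name
        for name, mapping in all_mappings.items()
        if mapping.get("last_verified") is not None
        and mapping["last_verified"] < cutoff
    )


for name, mapping in mappings.items():
    print(name, violations(mapping))
print(overdue(mappings, date(2026, 1, 1)))
```

One difference from the query above is worth noting: the SPARQL pattern only matches mappings that also have an owner, whereas this sketch reports an overdue mapping whether or not it is owned.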

And the last_verified_date is not just a column but a provenance edge in the W3C PROV-O sense (the standard vocabulary for recording who or what produced a fact, and from which activity): the verification links the mapping by prov:wasGeneratedBy back to the review activity and prov:wasAttributedTo the steward who ran it, so an auditor reads not just the date but the act that asserted it. The flat tag-dictionary row and the gated, provenance-bearing graph are the same record in two notations — a checklist whose completeness a machine can guarantee. Anchoring units to a unit ontology like QUDT rather than a free-text string is what lets that guarantee mean something: the concepts (Celsius, Kelvin), not just their spellings, become things a reasoner can relate.

In the real world

The bioreactor temperature is the running example because it is simple, but the heterogeneity bites hardest downstream, where a single number can decide a viral-safety claim. Take the low-pH viral-inactivation hold that follows Protein A capture: the acid step that disrupts enveloped viruses, walked through in the manufacturing book's viral inactivation chapter. Its acceptance is a triple of pH ≤ 3.6, held for ≥ 60 minutes, at a controlled temperature, and every one of those three numbers is a heterogeneity trap. The historian may log the hold temperature with the same degC-versus-degF ambiguity as the bioreactor, but worse, a time mismatch here is a safety failure, not a rounding error. If the MES records the hold start in plant-local time and the historian timestamps the pH trend in UTC, a naive join can compute a hold duration that is off by the plant's UTC offset, an hour or more: enough to pass a 60-minute requirement that was actually missed, or to fail one that was met. The same Protein A step's pooling-window cut points (the two UV-trace thresholds, in milli-absorbance units, between which the elution peak is collected as product) are logged by the chromatography data system in its own tag style, while the LIMS records the resulting pool's host-cell-protein result in ng/mg and the historian holds the live UV trace in mAU. That makes three systems and one purification decision, with the same naming, unit, and timestamp gaps the bioreactor showed, now standing between a lot and its viral-clearance and impurity-clearance claims. Downstream is exactly where "clean data is not connected data" stops being abstract: each system's record is internally valid, yet the safety-relevant combination is the thing nothing reconciles automatically.
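The hold-time hazard is easy to reproduce. In the sketch below the timestamps are invented, the plant is assumed to sit at the EDT offset used earlier in the chapter (UTC−4), and the function names are hypothetical. Because the size of the error equals the plant's UTC offset, the naive answer here is four hours too long.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_HOLD = timedelta(minutes=60)
PLANT_ZONE = timezone(timedelta(hours=-4))  # assumed plant-local offset (EDT)

# Hypothetical records of one hold: the true duration is 45 minutes.
mes_hold_start = datetime(2024, 6, 13, 5, 30)  # wall-clock, no zone stored
historian_hold_end = datetime(2024, 6, 13, 10, 15, tzinfo=timezone.utc)


def naive_duration(start_local: datetime, end_utc: datetime) -> timedelta:
    """The mistake: subtract the two clock readings as if they shared a zone."""
    return end_utc.replace(tzinfo=None) - start_local


def correct_duration(start_local: datetime, end_utc: datetime, zone: timezone) -> timedelta:
    """Attach the plant's zone to the local start before subtracting."""
    return end_utc - start_local.replace(tzinfo=zone)


naive = naive_duration(mes_hold_start, historian_hold_end)
correct = correct_duration(mes_hold_start, historian_hold_end, PLANT_ZONE)
print(naive >= REQUIRED_HOLD, correct >= REQUIRED_HOLD)
```

The naive join reports a comfortable pass for a hold that in fact fell fifteen minutes short, and nothing in either source record looks wrong.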

This is not theory. ISA-95 is the de facto reference model for business-to-plant integration, supplying the shared object definitions and terminology that interfaces between enterprise and manufacturing systems are built against [2], and the Allotrope Foundation's shared ontologies are an active, pharma-industry response to laboratory data that no two instruments described the same way [8]. The Industrial Ontologies Foundry is building the manufacturing reference ontologies, on a BFO foundation, to make this map-once-to-a-shared-model approach practical across vendors and companies [3][4]. A shared semantic layer takes aim at the concrete heterogeneity this chapter catalogues: a sensor reporting TIC101.PV at one partner and temp_reactor at another reconciled to one agreed property; a Fahrenheit field and a Kelvin field resolved to a single Celsius quantity; an EQ-00457 asset tag and a BR101 label recognized as the same vessel; and epoch, local, and UTC timestamps normalized to one instant — so that a question can be asked once and answered across organizational boundaries, the FAIR Interoperability principle made operational [1].

What is still unsolved: silent semantic drift

Standards, dictionaries, and ontologies make meaning expressible. They do not make it stay put. The hardest open problem in this data flow is not building the mapping — it is detecting when a mapping has quietly stopped being true.

Consider a real-shaped failure. A company runs the same process at two sites and pools the data in a cloud lake that normalizes every temperature to an absolute reading in degrees Celsius. For a year the sister site's bioreactor historian reports temperature the same way the home site does, and the pooled data is clean. Then an instrument is replaced, and the new device's configuration, set by a local engineer following a local convention, reports temperature as a local offset rather than an absolute Celsius reading: an undocumented one-degree shift. Nothing in the pipeline breaks. The numbers are still numbers, still plausible, still in range. The historian validates, the MES validates, the cloud normalizer dutifully applies the old rule, and the pooled dataset is now one degree cold for that site. That one-degree error in a critical process parameter makes the two sites' batches no longer comparable, and no quality check on any single system can see it, because each system is internally consistent. The defect lives in the gap between systems, exactly where the pairwise tangle hid it. It surfaces only at audit, when an investigator reconstructs the batch end to end and finds two sites that should agree disagreeing by a degree.

Semantic drift is the data-management twin of model drift

The one-degree failure above is not only a data problem — it is the exact mechanism that silently breaks a machine-learning model, and seeing the two as one problem is how the methods of the ML book apply here. A model trained on the pooled data learns a relationship between inputs and an answer; the drifted site has changed what its inputs mean without changing how they look. The ML book gives that change two precise names. Covariate shift is when the distribution of the inputs moves while the underlying physics is unchanged — detectable without labels by watching the input distribution, for instance with a Population Stability Index. Concept drift is the dangerous twin: the relationship between input and answer changes while the inputs look perfectly normal, so it can only be caught once slow ground truth arrives — a lagging discovery. A convention that flips from absolute to offset Celsius under an unchanged tag name is concept drift dressed as a clean number, and the MLOps and lifecycle chapter builds the residual control charts and drift detectors that hunt exactly this.
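As a minimal sketch of the label-free check mentioned above, the Python below computes a Population Stability Index between a reference sample and a later one, using the usual form: the sum over bins of (actual − expected) × ln(actual / expected), where both terms are bin fractions. The readings, the bin edges, and the small floor that stands in for empty bins are all assumptions of the example.

```python
import math


def bin_fractions(values, edges, floor=1e-6):
    """Fraction of values in each [edges[i], edges[i+1]) bin.

    Empty bins are floored to avoid log(0). Values outside the edges are not
    counted, so the edges should span the data.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return [max(c / len(values), floor) for c in counts]


def psi(expected, actual, edges):
    """Population Stability Index between two samples over shared bins."""
    e = bin_fractions(expected, edges)
    a = bin_fractions(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


edges = [35.5, 36.0, 36.5, 37.0, 37.5, 38.0]
reference = [36.8, 36.9, 37.0, 37.0, 37.1, 37.2]  # hypothetical home-site readings, degC
shifted = [35.8, 35.9, 36.0, 36.0, 36.1, 36.2]    # the same readings one degree lower

print(psi(reference, reference, edges), psi(reference, shifted, edges))
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a substantial shift. The check only sees a change that moves the input distribution; a redefinition that is small relative to normal variation would pass unnoticed, which is why the text treats that case as the harder one.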

Two more ML disciplines fall straight out of the heterogeneity catalogue.

First, data leakage. If you train a release-prediction model on pooled multi-site data and split it by row rather than by batch, near-identical neighbouring rows land in both training and test. Worse, so can two sites' records of the same physical fact reconciled by a hidden mapping. Either way the reported accuracy is inflated into a number that collapses on a genuinely new batch. The fix is grouped, leave-one-batch-out cross-validation: hold all of one batch's rows out of training and score only on that unseen batch. The models and validation chapter shows why an honest split is the only admissible one.

Second, applicability domain. A model is valid only over the input region it was trained on (cell lines, scales, raw-material lots), and a semantically drifted feed is exactly an input that has quietly wandered outside that region. The same shift that signals a process leaving its envelope signals a model now extrapolating.

The governance hooks this chapter put on the mapping record are also the ones a deployed model needs. A model's lineage (which pinned dataset trained it, who promoted it, when it was last revalidated) is the governance_owner and last_verified_date discipline applied to a learned artifact rather than a tag. Because a regulated model is locked-then-relearn (frozen in production, never edited in place, with each retrain a new validated version under a Predetermined Change Control Plan), a mapping that silently drifts beneath it poisons the data the next retrain learns from. Semantic interoperability is, quietly, a precondition for trustworthy ML, not a separate concern.
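The batch-wise split described above can be sketched with the standard library alone. The helper name `leave_one_batch_out` and the batch labels are hypothetical; the point is only that every row of a held-out batch stays out of training.

```python
from collections import defaultdict

def leave_one_batch_out(batch_ids):
    """Yield (held_out_batch, train_idx, test_idx), one fold per batch."""
    rows_by_batch = defaultdict(list)
    for i, batch in enumerate(batch_ids):
        rows_by_batch[batch].append(i)
    for held_out, test_idx in rows_by_batch.items():
        train_idx = [
            i
            for batch, idx in rows_by_batch.items()
            if batch != held_out
            for i in idx
        ]
        yield held_out, train_idx, test_idx

# Hypothetical batch label for each row of a pooled dataset.
batch_ids = ["B1", "B1", "B1", "B2", "B2", "B3", "B3", "B3"]

folds = list(leave_one_batch_out(batch_ids))
for held_out, train_idx, test_idx in folds:
    train_batches = {batch_ids[i] for i in train_idx}
    test_batches = {batch_ids[i] for i in test_idx}
    # No batch may appear on both sides of a split.
    assert train_batches.isdisjoint(test_batches)
    print(held_out, train_idx, test_idx)
```

In practice a library splitter such as scikit-learn's `LeaveOneGroupOut` plays the same role. The hand-rolled version is shown only to make the grouping explicit. Note that it guards against row-level leakage within a batch; it does not by itself catch two sites recording the same physical fact under different batch labels.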

This is semantic drift, and it is genuinely hard because the controls we have were built for other failure modes. ISPE GAMP 5, in its data-integrity guidance, frames integrity around the data life cycle and the ALCOA+ attributes, the regulators' data-integrity checklist (attributable, legible, contemporaneous, original, accurate, plus complete, consistent, enduring, and available). But a value that is attributable, legible, contemporaneous, original, and accurate within its own system can still be semantically wrong once pooled, and GAMP 5 offers process discipline rather than an automatic detector for a convention that shifts under a stable name [9]. ICH Q2(R2) assumes a specification means the same thing across sites, units included, which is the very assumption drift violates, and it does not prescribe how to verify that assumption continuously [10]. And 21 CFR 211.192 mandates that the discrepancy be investigated once found, but says nothing about finding it before it reaches a record review. The mapping record from the previous section is the best defense we have: its last_verified_date and governance_owner exist precisely so a drifted convention has somewhere to be caught. Yet re-verification is still a human, periodic act, not a guarantee. Closing this gap, with machine-checkable constraints that fire when a stream's meaning shifts, is open work. The open-source book's semantics and knowledge-graph chapter shows one direction, encoding the expected relationships so a reasoner can flag a violation, but no industry-wide solution yet makes silent drift loud.
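The periodic re-verification step can at least be made machine-checkable. The sketch below is a minimal illustration, assuming mapping records are available as dictionaries: `last_verified_date` and `governance_owner` are the fields named above, while the `tag` key, the example tag names, and the 365-day window are invented. It flags records that are overdue or unowned; it does not detect the drift itself.

```python
from datetime import date, timedelta

def overdue_mappings(records, today, max_age_days=365):
    """Return tags whose mapping has no owner or no recent verification."""
    cutoff = today - timedelta(days=max_age_days)
    flagged = []
    for rec in records:
        no_owner = not rec.get("governance_owner")
        last = rec.get("last_verified_date")
        stale = last is None or last < cutoff
        if no_owner or stale:
            flagged.append(rec["tag"])
    return flagged

# Hypothetical mapping records for three temperature tags.
records = [
    {"tag": "SITE_A.TEMP", "governance_owner": "process-eng",
     "last_verified_date": date(2024, 1, 15)},
    {"tag": "SITE_B.TEMP", "governance_owner": "process-eng",
     "last_verified_date": date(2022, 3, 1)},   # verified too long ago
    {"tag": "SITE_C.TEMP", "governance_owner": "",
     "last_verified_date": date(2024, 5, 1)},   # nobody owns it
]

print(overdue_mappings(records, today=date(2024, 6, 1)))
# ['SITE_B.TEMP', 'SITE_C.TEMP']
```

A check like this narrows the window in which a drifted convention can hide, but the verification it schedules is still a human act.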

Key terms

Interoperability — the ability of separate systems to work together.
Syntactic interoperability — agreement on data format: the message parses and the fields line up.
Semantic interoperability — agreement on data meaning: both ends know two values describe the same kind of real-world thing.
LCIM (Levels of Conceptual Interoperability Model) — a layered model naming interoperability from technical connection up through syntactic and semantic to pragmatic, dynamic, and conceptual levels.
Semantic heterogeneity — the gap created when the same things are named and conceived differently across systems and people.
Wrangling — the manual reconciliation of names, units, identifiers, and timestamps before data can be analyzed.
Point-to-point mapping — a separate meaning-converter per pair of systems; suffers the n² problem.
Reference data — the set of standard, approved values a field is allowed to take.
Controlled vocabulary — a governed list of agreed terms with definitions.
UCUM (Unified Code for Units of Measure) — a machine-readable code system that carries a unit together with a value, so software can normalize Celsius/Fahrenheit/Kelvin without a human.
Ontology — a formal, machine-readable model of the concepts in a domain and the relationships among them.
Top-level / upper ontology — a small, domain-neutral set of the most general categories that domain ontologies inherit from.
BFO (Basic Formal Ontology, ISO/IEC 21838-2) — a standardized top-level ontology built to support data integration across heterogeneous systems.
IOF (Industrial Ontologies Foundry) — a community building shared, BFO-grounded reference ontologies for manufacturing.
IOF Core Ontology (IOF Core) — the IOF's BFO-grounded mid-level ontology that gives heterogeneous manufacturing systems one shared model to map to.
Tag dictionary — a governed table whose rows each map one canonical concept to how every system names, units, and timestamps it, with the transformation rules and the owner who maintains it.
Semantic drift — a mapping that silently stops being true because a system's convention changes under an unchanged name, producing values that are internally valid but no longer mean what the pooled model assumes.
RDF / triple — the Resource Description Framework, the standard way to write data as subject-predicate-object triples, the atomic facts of a graph.
SHACL (Shapes Constraint Language) — a W3C language that validates a graph against required structure under a closed-world reading, so a missing mandatory field (a mapping with no unit or no owner) becomes a validation report rather than an unnoticed gap.
SPARQL / competency question — the standard query language for RDF (as SQL is for tables), used to run a plain-English question — "which mappings are overdue for re-verification?" — as a pass/fail check.
PROV-O provenance — the W3C vocabulary for recording who or what produced a fact and from which activity; a last_verified_date becomes a prov:wasGeneratedBy edge back to the review that asserted it.
QUDT — a unit ontology that anchors a unit (Celsius, Kelvin) as a concept a reasoner can relate, not just a spelling, so a value carries its unit's meaning with it.
Covariate shift vs. concept drift — the input distribution moving (detectable without labels) versus the input-to-answer relationship changing while inputs look normal (caught only with slow ground truth); silent semantic drift is concept drift wearing a clean number.
Data leakage / grouped (leave-one-batch-out) cross-validation — information bleeding from test into training, which inflates a model's reported accuracy; the fix splits training and test data by batch, not by row, so near-identical rows from one batch cannot land on both sides.
Applicability domain — the input region (cell lines, scales, raw-material lots) a model was proven over; a semantically drifted input is one that has quietly left it, making the model extrapolate.
Locked-then-relearn / PCCP — freezing a deployed model in production and never editing it in place; each retrain is a new validated version promoted under a pre-approved Predetermined Change Control Plan, so a silently drifted mapping poisons the next retrain's data.
ALCOA+ — the regulators' data-integrity checklist (attributable, legible, contemporaneous, original, accurate, plus complete, consistent, enduring, available); a value can satisfy every attribute within one system and still be semantically wrong once pooled across sites.
IQ/OQ/PQ — Installation, Operational, and Performance Qualification, the staged proof that a transferred or scaled-up system was installed correctly, operates correctly, and performs correctly; this is equipment validation, and it does not by itself prove two sites describe a setpoint the same way.
CSV vs. CSA — Computerized System Validation's uniform test-everything posture versus the FDA's risk-based Computer Software Assurance, which concentrates rigorous proof on the functions (a critical-parameter tag's unit) that bear on patient safety.

Where this leads

We have arrived at the word ontology by necessity: flat vocabularies fix names and units but cannot carry the relationships that meaning actually depends on. The next chapter, Ontologies and FAIR Data, builds that idea from the ground up — what a class and a relation are, how BFO and the IOF stack fit together, how Allotrope's AFO and the IOF Biopharma ontologies apply it to our field, and how the FAIR principles turn all of this from a philosophy of meaning into a working discipline of data management.
