Naming Things: Tags, Hierarchies, and the Unified Namespace

📍 Where we are: Chapter 3 gave us the ISA-88/95 batch and equipment model in PostgreSQL. Before we capture a single live reading in Part II, this chapter gives every signal a name, one that the historian, the broker, the knowledge graph, and a regulator can all agree on.

Phil Karlton, a famous engineer at Netscape, supposedly said there are only two hard things in computer science: cache invalidation and naming things. In a biomanufacturing plant the second one quietly decides whether the first book's promise (trace any number back to its source) is achievable at all. A bioreactor temperature probe does not emit knowledge. It emits a tag: a short string and a float, like TT-101 = 37.02. If that same probe is called TIC_101.PV in the DeltaV controller, BR1_TEMP in the historian, and Reactor1 Temperature in the spreadsheet a scientist keeps, you do not have one signal; you have three, and no machine can tell they are the same thing. Multiply that by ten thousand tags across a SCADA, an MES, and an ERP, and the result is the single most common, least glamorous cause of data-management failure in the industry [1].

This chapter is the cure, and it is mostly discipline rather than code. We design one naming convention, ground it in ISA-95, project it into a Unified Namespace (UNS) topic tree and a Sparkplug B topic, store it as governed data, and, because this is the hands-on book, ship a linter that refuses to let a bad tag into the platform. The code is small. The payoff is everything downstream.

The simple version

A tag name is a postal address for data. BR101.Temp.PV is "the present value of the temperature measurement on bioreactor 101", and newark/upstream/BR101/Temp is the same address written as a folder path the whole plant can browse. Pick the address format once, write it down, and have a robot at the door turn away anything that doesn't match. After that, every later chapter just files its letters in the right box.

What this chapter covers

We start with why one signal must have exactly one canonical name, then build the convention on top of the ISA-95 hierarchy from Chapter 3. We turn each tag into a UNS path and a Sparkplug B topic, look at how ISA-5.1 instrument tags and ISA-95 Part 7 aliasing let the floor and the cloud disagree on labels without losing the thread, and store the whole dictionary as governed data in PostgreSQL. Finally we run the real linter from the companion repo against the simulator's 16 tags, watch it pass, and watch it reject the kinds of drift that happen in real plants.

One signal, one canonical name

Open the running case's golden data and look at the raw tags. This is a slice of datasets/fedbatch_timeseries_10min.sample.csv from the companion repo: the long-format time series our 14-day fed-batch CHO run produces.

datasets/fedbatch_timeseries_10min.sample.csv (excerpt)
ts,tag,value,unit,quality,batch_id
2026-01-05 00:00:00+00:00,BR101.Agitation.PV,81.4323,rpm,192,BATCH-2026-001
2026-01-05 00:00:00+00:00,BR101.Agitation.SP,81.6008,rpm,192,BATCH-2026-001
2026-01-05 00:00:00+00:00,BR101.DO.PV,40.8224,%sat,192,BATCH-2026-001
2026-01-05 00:00:00+00:00,BR101.DO.SP,40.0,%sat,192,BATCH-2026-001

Every row already carries a tag and a unit. The discipline question is: who guarantees that BR101.DO.PV always means the same probe, in the same units, forever, and that nobody invents BR101.DissolvedO2 next quarter? The answer is a convention plus a register plus a gate. The convention says what a legal tag looks like; the register is the single list of every legal tag; the gate is automation that fails loudly when reality drifts from the register.
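One narrow slice of that guarantee, "same tag, same unit, always", is easy to check mechanically. The sketch below is an illustration rather than code from the companion repo: it parses the four excerpt rows with Python's csv module and reports any tag seen with more than one unit. The names SAMPLE, units_per_tag, and unit_conflicts are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Four rows copied from the excerpt above; a real check would read the full file.
SAMPLE = """\
ts,tag,value,unit,quality,batch_id
2026-01-05 00:00:00+00:00,BR101.Agitation.PV,81.4323,rpm,192,BATCH-2026-001
2026-01-05 00:00:00+00:00,BR101.Agitation.SP,81.6008,rpm,192,BATCH-2026-001
2026-01-05 00:00:00+00:00,BR101.DO.PV,40.8224,%sat,192,BATCH-2026-001
2026-01-05 00:00:00+00:00,BR101.DO.SP,40.0,%sat,192,BATCH-2026-001
"""


def units_per_tag(csv_text: str) -> dict:
    """Collect every unit string seen for each tag."""
    seen = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        seen[row["tag"]].add(row["unit"])
    return dict(seen)


def unit_conflicts(csv_text: str) -> dict:
    """Return only the tags that appear with more than one unit."""
    return {tag: units for tag, units in units_per_tag(csv_text).items()
            if len(units) > 1}
```

On the excerpt, unit_conflicts(SAMPLE) is empty. Appending a row that reports BR101.DO.PV in plain % instead of %sat would surface that tag with both unit strings, which is the kind of silent drift the register exists to catch.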

The convention the repo uses is deliberately boring, three dotted segments:

<UNIT>.<Measurement>.<Attr> e.g. BR101.Temp.PV
  • UNIT is a piece of equipment from the ISA-95 model we seeded in Chapter 3: BR101 (the production bioreactor), N1SEED (a seed-train vessel), PA01 (the Protein A capture skid).
  • Measurement is the physical thing being measured: Temp, pH, DO, Agitation, Titer.
  • Attr is the role of this particular number: PV (present/process value), SP (setpoint), MV (manipulated/output value), Rate, or State.

That last segment matters more than it looks. The difference between a setpoint (what we asked for) and a process value (what we got) is the difference between an instruction and a measurement, and conflating them is a classic data-integrity error. Keeping BR101.Temp.SP and BR101.Temp.PV as distinct, named signals is the naming-layer expression of an ALCOA+ principle: a value must be attributable to its true source, whether that source is a person or, here, an automatically generated reading from a specific instrument [2][3].
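To make the three segments concrete, here is a minimal sketch that splits a tag into named fields and recognizes a setpoint/process-value pair on the same measurement. It is an illustration only: Tag, parse_tag, and is_setpoint_pair are hypothetical names, and the only check performed is the segment count and the closed attribute set listed above.

```python
from typing import NamedTuple

# The closed set of attribute roles from the convention above.
ATTRS = {"PV", "SP", "MV", "Rate", "State"}


class Tag(NamedTuple):
    unit: str          # equipment code, e.g. BR101
    measurement: str   # what is measured, e.g. Temp
    attr: str          # role of the number, e.g. PV or SP


def parse_tag(tag: str) -> Tag:
    """Split '<UNIT>.<Measurement>.<Attr>' into its three named segments."""
    parts = tag.split(".")
    if len(parts) != 3 or parts[2] not in ATTRS:
        raise ValueError(f"'{tag}' is not <UNIT>.<Measurement>.<Attr>")
    return Tag(*parts)


def is_setpoint_pair(a: str, b: str) -> bool:
    """True when a and b are the SP and PV of the same unit and measurement."""
    ta, tb = parse_tag(a), parse_tag(b)
    same_loop = (ta.unit, ta.measurement) == (tb.unit, tb.measurement)
    return same_loop and {ta.attr, tb.attr} == {"SP", "PV"}
```

With this, is_setpoint_pair("BR101.Temp.SP", "BR101.Temp.PV") is true while the two signals remain separately addressable, which is the point of giving the role its own segment.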

Grounding the names in ISA-95

A convention with no map underneath it is just a string format. The reason BR101 is meaningful is that it resolves to a real position in the ISA-95 equipment hierarchy (enterprise → site → area → work-center/unit), the technology-agnostic address space the standard defines for exactly this purpose [4]. Our seeded model places the production bioreactor at Newark site → upstream area → BR101, and the Protein A skid at Newark → downstream → PA01. The tag dictionary's job is to carry that placement around with every signal, so a number is never an orphan.

In the companion code, that mapping lives at the top of examples/chapters/04-naming-uns/naming.py as a small, explicit table, the same hierarchy the seeded Postgres model encodes:

examples/chapters/04-naming-uns/naming.py
import re

TAG_RE = re.compile(r"^[A-Z][A-Z0-9]+\.[A-Za-z][A-Za-z0-9]+\.(PV|SP|MV|Rate|State)$")

# the equipment hierarchy the tags live under (matches the seeded ISA-95 model)
UNIT_AREA = {"BR101": ("newark", "upstream"), "N1SEED": ("newark", "upstream"),
             "PA01": ("newark", "downstream"), "PBR201": ("newark", "upstream")}
# plant unit-of-measure string -> QUDT unit code
QUDT = {"degC": "DEG_C", "pH": "PH", "%sat": "PERCENT", "rpm": "REV-PER-MIN",
        "kg": "KiloGM", "bar": "BAR", "L": "L", "%": "PERCENT", "g/L": "GM-PER-L"}

Two design decisions are worth pausing on. First, TAG_RE is the machine-readable contract: a tag is legal only if it is an uppercase-led unit code, a measurement, and one of a closed set of attribute roles. That closed set is what stops the slow rot of synonyms. Second, the QUDT table maps each plant unit-of-measure string to a QUDT unit IRI (for example degC → http://qudt.org/vocab/unit/DEG_C). Units are part of a value's identity ("37" is meaningless until you know it is degrees Celsius), and pinning them to a global vocabulary now is what lets the Chapter 16 knowledge graph reason across systems later. The %sat versus plain % distinction in the data above is exactly the kind of unit ambiguity this table forces you to resolve once, on paper, instead of arguing about it during a deviation investigation.
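The unit-to-IRI step can be sketched as a single lookup. The QUDT mapping below is copied from the excerpt above; the helper names qudt_iri and unmapped_units are hypothetical, and the base IRI is taken from the degC example in the text.

```python
from typing import Iterable, Optional

QUDT_BASE = "http://qudt.org/vocab/unit/"

# Copied from the naming.py excerpt: plant unit string -> QUDT unit code.
QUDT = {"degC": "DEG_C", "pH": "PH", "%sat": "PERCENT", "rpm": "REV-PER-MIN",
        "kg": "KiloGM", "bar": "BAR", "L": "L", "%": "PERCENT", "g/L": "GM-PER-L"}


def qudt_iri(unit: str) -> Optional[str]:
    """Return the QUDT unit IRI for a plant unit string, or None if unmapped."""
    code = QUDT.get(unit)
    return QUDT_BASE + code if code else None


def unmapped_units(units: Iterable[str]) -> list:
    """List the unit strings that still need a decision, in sorted order."""
    return sorted({u for u in units if u not in QUDT})
```

Note that %sat and % both resolve to the same PERCENT IRI in this table, so the plant unit string stored alongside it is what preserves the distinction. A unit that returns None is a prompt for a governance decision, not something to guess at.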

From tag to topic: the Unified Namespace

A tag dictionary makes signals consistent. A Unified Namespace makes them browsable. The UNS idea, popularized in the manufacturing-IoT community, is a single real-time hierarchy, semantically organized like the business itself and broker-agnostic, that becomes the one place any system goes to find the current state of anything [5]. The strong recommendation, which we follow, is to shape that hierarchy on the ISA-95 levels (Enterprise / Site / Area / Line / Cell) rather than inventing a parallel taxonomy [6].

Concretely, the UNS is a tree of MQTT topics. MQTT is the lightweight publish/subscribe transport standardized by OASIS and as ISO/IEC 20922 [7]; a topic is just a slash-delimited path, like a folder. The community best practice is hierarchical, general-to-specific levels, all lowercase, no spaces, so that the path is predictable and the single-level (+) and multi-level (#) wildcards do something useful [8]. (Our site and area levels follow the lowercase rule; the unit and measurement levels keep their canonical case so that they match the tag exactly.) With that path in hand, a dashboard can subscribe to newark/upstream/+/Temp and instantly receive every reactor temperature in the upstream area without naming each one.
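The wildcard behavior is worth seeing without a broker in the loop. The function below is a simplified sketch of MQTT topic-filter matching written for this illustration (topic_matches is a hypothetical name, not a library call). It handles + and a trailing #, and deliberately ignores edge cases such as topics beginning with $.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT matching: '+' is one level, a trailing '#' is the rest."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            # '#' is only valid as the last level; it matches the parent
            # level and everything beneath it.
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False  # the filter is deeper than the topic
        if level != "+" and level != t_levels[i]:
            return False  # a literal level that does not match
    # No '#' was seen, so the depths must agree exactly.
    return len(f_levels) == len(t_levels)
```

The filter newark/upstream/+/Temp matches newark/upstream/BR101/Temp but not newark/downstream/PA01/Temp, while newark/# matches everything under the site. That only works because every path puts the same kind of thing at the same depth.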

naming.py derives the UNS path directly from the canonical tag, so the address space can never disagree with the dictionary:

examples/chapters/04-naming-uns/naming.py
def uns_path(tag: str) -> str:
    unit, measurement, _attr = tag.split(".")
    site, area = UNIT_AREA[unit]
    return f"{site}/{area}/{unit}/{measurement}"


def sparkplug_topic(tag: str, group: str = "newark", edge: str = "edge1") -> str:
    unit = tag.split(".")[0]
    return f"spBv1.0/{group}/DDATA/{edge}/{unit}"

The figure below shows how one signal projects into all three views at once.

[Figure: A bioreactor temperature signal shown projected into three aligned views: the canonical dotted tag, the slash-delimited UNS topic tree grounded in the ISA-95 enterprise-site-area-unit hierarchy, and the fixed-arity Sparkplug B topic.]

One signal, three coordinated addresses: the canonical tag is the system of record, the UNS path is the human- and dashboard-browsable tree, and the Sparkplug B topic is the wire format. All three are generated from the same dictionary, so they cannot drift apart.

Original diagram by the authors, created with AI assistance.

Sparkplug B: a stricter cousin

Plain MQTT topics are free-form, which is their charm and their danger: nothing stops two teams from organizing the tree differently. Sparkplug B (the Eclipse Sparkplug 3.0 specification) is an opinionated profile on top of MQTT that fixes the topic structure for device-level messages to exactly five levels, spBv1.0/<group_id>/<message_type>/<edge_node_id>/<device_id> (node-level messages omit the trailing device_id), and adds birth/death session-state management so subscribers always know whether a publisher is alive [9]. That is why sparkplug_topic() returns spBv1.0/newark/DDATA/edge1/BR101: the newark group, a DDATA (device-data) message, from edge node edge1, for device BR101.
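Because the shape is fixed, a subscriber can take a Sparkplug topic apart by position alone. The sketch below covers only the five-level device form described above; SparkplugTopic and parse_sparkplug_topic are hypothetical names, and message-type validation is left out.

```python
from typing import NamedTuple


class SparkplugTopic(NamedTuple):
    namespace: str      # always spBv1.0 for Sparkplug B
    group_id: str
    message_type: str   # e.g. DDATA
    edge_node_id: str
    device_id: str


def parse_sparkplug_topic(topic: str) -> SparkplugTopic:
    """Split a five-level device topic into named fields, or raise ValueError."""
    parts = topic.split("/")
    if len(parts) != 5 or parts[0] != "spBv1.0":
        raise ValueError(f"'{topic}' is not a five-level spBv1.0 device topic")
    return SparkplugTopic(*parts)
```

Parsing spBv1.0/newark/DDATA/edge1/BR101 yields device_id BR101 and message_type DDATA. A free-form UNS path such as newark/upstream/BR101/Temp is rejected, which illustrates the gap between a browsing tree and a transport contract.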

The two topic shapes are not redundant; they serve different audiences. The UNS path is the human-and-dashboard-browsable single source of truth; the Sparkplug topic is the disciplined, fixed-arity wire format an edge gateway actually publishes on, which we wire up for real in the next chapter. Generating both from one dictionary is how a plant keeps a flexible browsing tree and a rigid transport contract without the two ever contradicting each other [6].

Two names for one thing: ISA-5.1 and Part 7 aliasing

So far we have assumed a clean slate. Real plants are not clean slates. The instrument loop on the P&ID has been called TT-101 since before the historian existed. That is a tag in the form defined by ISA-5.1, the instrumentation symbols and identification standard, where the first letter denotes the measured variable, the succeeding letters the instrument's function, and the number the loop (TT = temperature transmitter, PIC = pressure indicating controller) [10]. The DCS exposes it as TIC_101.PV. The ERP knows it only as a cost-center asset number. Demanding that everyone rename to BR101.Temp.PV is a multi-year change-control project nobody will finance.

The standards-blessed answer is not to force one name but to declare them equivalent. ISA-95 Part 7, the Alias Service Model, exists precisely to reconcile the different identifiers different systems use for the same object, so the platform stays self-consistent even when one physical signal carries several names [11]. In our world this means the canonical BR101.Temp.PV is the system of record, and TT-101, TIC_101.PV, and the ERP asset number are aliases that all resolve to it. The tag dictionary is the natural home for that resolution table, and storing it as data (rather than burying it in glue code) is what makes the mapping auditable.
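As a rough illustration of that resolution table, the sketch below keys each alias by the system that uses it and maps it to the canonical tag. This is not the Part 7 service interface, only a dictionary-shaped stand-in. The system labels, the function names, and the ERP asset number ASSET-0001 are all hypothetical placeholders.

```python
from typing import Optional

# (source system, local name) -> canonical tag.
# The P&ID and DCS names come from the text; the ERP asset number is invented
# for illustration only.
ALIASES = {
    ("P&ID", "TT-101"): "BR101.Temp.PV",
    ("DCS", "TIC_101.PV"): "BR101.Temp.PV",
    ("ERP", "ASSET-0001"): "BR101.Temp.PV",  # hypothetical asset number
}


def resolve(system: str, name: str) -> Optional[str]:
    """Return the canonical tag for a system-local name, or None if unknown."""
    return ALIASES.get((system, name))


def aliases_of(canonical: str) -> list:
    """List every (system, name) pair that resolves to the canonical tag."""
    return sorted(key for key, tag in ALIASES.items() if tag == canonical)
```

Keying by system matters because the same local string can legitimately mean different things in two systems. An unknown alias returns None rather than a guess, which keeps the unresolved cases visible for review.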

Storing the dictionary as governed data

A convention that lives only in a Python file is a suggestion. A convention that lives in a governed database table, with a primary key the database itself enforces and a linter the test suite runs, is policy. The companion repo's examples/platform/db/40-gov.sql creates the schema where the dictionary lands once the full stack is up. Note the comments tying each column back to its downstream consumer:

examples/platform/db/40-gov.sql
-- The tag dictionary (Ch 4): every signal's canonical name, asset, unit, the
-- QUDT unit IRI, its UNS path and Sparkplug topic, and a deadband. The naming
-- linter (chapters/04-naming-uns/naming.py, exercised by tests/test_chapters.py
-- under `make test`) rejects any tag not matching the convention.
CREATE TABLE gov.tag_dictionary (
    tag             text PRIMARY KEY,   -- BR101.Temp.PV
    asset           text NOT NULL,      -- BR101
    measurement     text NOT NULL,      -- Temperature
    unit            text NOT NULL,      -- degC
    qudt_unit       text,               -- http://qudt.org/vocab/unit/DEG_C
    uns_path        text NOT NULL,      -- newark/upstream/BR101/Temperature
    sparkplug_topic text NOT NULL,      -- spBv1.0/newark/DDATA/edge1/BR101
    data_type       text NOT NULL DEFAULT 'Double',
    deadband        numeric NOT NULL DEFAULT 0
);

The tag column is the primary key: the database itself now enforces "one canonical name, used once." uns_path and sparkplug_topic are stored, not recomputed at read time, so the Chapter 5 publisher and the Chapter 14 contextualization views read the same topic the linter approved. The deadband column is a quiet but important detail: it is the change threshold below which a reading is not worth publishing, and storing it next to the tag keeps capture policy governed rather than scattered across collector configs. The same gov schema also holds the jurisdiction policy (Chapter 23) and the supplier register (Chapter 22), so all the data about the data sits in one place.
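How a collector might use the deadband column can be sketched as follows. This is one plausible reading of "change threshold", not the publisher's actual logic. It assumes an absolute deadband compared against the last published value, so the default of 0 publishes every reading. The names should_publish and apply_deadband are hypothetical.

```python
from typing import Iterable, Optional


def should_publish(last_published: Optional[float], value: float,
                   deadband: float) -> bool:
    """Publish the first reading, then only changes of at least `deadband`."""
    if last_published is None:
        return True
    return abs(value - last_published) >= deadband


def apply_deadband(values: Iterable[float], deadband: float) -> list:
    """Return the subset of readings a collector would actually publish."""
    published = []
    last = None
    for value in values:
        if should_publish(last, value, deadband):
            published.append(value)
            last = value  # compare against what subscribers last saw
    return published
```

With a deadband of 0.05 on a temperature loop, the readings 37.00, 37.01, 37.02, 37.10, 37.11 reduce to 37.00 and 37.10. Comparing against the last published value rather than the previous raw reading prevents slow drift from slipping through unreported.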

build_dictionary() in naming.py produces exactly the rows this table expects (asset, measurement, unit, QUDT IRI, UNS path, and Sparkplug topic) from the live tag set, which is how the running stack, the seed data, and the book can never quietly disagree.
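The body of build_dictionary() is not reproduced in this chapter, so the following is only a guess at its shape. The hypothetical build_rows function assembles one plain dict per tag from the tables and helpers shown earlier, where the repo version returns a pandas DataFrame.

```python
from typing import Optional

# Copied from the naming.py excerpts earlier in this chapter.
UNIT_AREA = {"BR101": ("newark", "upstream"), "N1SEED": ("newark", "upstream"),
             "PA01": ("newark", "downstream"), "PBR201": ("newark", "upstream")}
QUDT = {"degC": "DEG_C", "pH": "PH", "%sat": "PERCENT", "rpm": "REV-PER-MIN",
        "kg": "KiloGM", "bar": "BAR", "L": "L", "%": "PERCENT", "g/L": "GM-PER-L"}
QUDT_BASE = "http://qudt.org/vocab/unit/"


def uns_path(tag: str) -> str:
    unit, measurement, _attr = tag.split(".")
    site, area = UNIT_AREA[unit]
    return f"{site}/{area}/{unit}/{measurement}"


def sparkplug_topic(tag: str, group: str = "newark", edge: str = "edge1") -> str:
    unit = tag.split(".")[0]
    return f"spBv1.0/{group}/DDATA/{edge}/{unit}"


def build_rows(tags_units: dict) -> list:
    """Hypothetical stand-in for build_dictionary(): one dict per tag, sorted."""
    rows = []
    for tag, unit in sorted(tags_units.items()):
        asset, measurement, _attr = tag.split(".")
        code: Optional[str] = QUDT.get(unit)
        rows.append({
            "tag": tag,
            "asset": asset,
            "measurement": measurement,
            "unit": unit,
            "qudt_unit": QUDT_BASE + code if code else None,  # nullable column
            "uns_path": uns_path(tag),
            "sparkplug_topic": sparkplug_topic(tag),
        })
    return rows
```

Every derived column is a pure function of the tag and its unit, so regenerating the rows from the same tag set always yields the same dictionary.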

The linter: a robot at the door

Here is where the hands-on book earns its title. A convention is only real if something enforces it automatically, on every change, forever. That something is the linter, and it is genuinely tiny. From examples/chapters/04-naming-uns/naming.py:

examples/chapters/04-naming-uns/naming.py
def lint_tag(tag: str) -> str | None:
    """Return None if the tag is valid, else a reason string."""
    if not TAG_RE.match(tag):
        return f"'{tag}' does not match <UNIT>.<Measurement>.<Attr>"
    unit = tag.split(".")[0]
    if unit not in UNIT_AREA:
        return f"unit '{unit}' is not in the equipment hierarchy"
    return None


def lint_dataset() -> tuple[pd.DataFrame, list[str]]:
    df = pd.read_parquet(DATA / "fedbatch_timeseries.parquet")
    tags_units = dict(df.groupby("tag")["unit"].first())
    problems = [f"{t}: {lint_tag(t)}" for t in tags_units if lint_tag(t)]
    return build_dictionary(tags_units), problems

Two checks, two failure modes. First, does the tag match the structural contract TAG_RE? Second, even if well-formed, does its unit code correspond to a real piece of equipment in the hierarchy? A perfectly-shaped tag for a vessel that does not exist is just as wrong as a malformed one: it is an orphan with good handwriting.

Run it against the real golden dataset (python chapters/04-naming-uns/naming.py) and you get the actual, tested output: the first eight rows of the generated dictionary and a clean bill of health.

tag                 asset  measurement  unit  qudt_unit                               uns_path                         sparkplug_topic
BR101.Agitation.PV  BR101  Agitation    rpm   http://qudt.org/vocab/unit/REV-PER-MIN  newark/upstream/BR101/Agitation  spBv1.0/newark/DDATA/edge1/BR101
BR101.Agitation.SP  BR101  Agitation    rpm   http://qudt.org/vocab/unit/REV-PER-MIN  newark/upstream/BR101/Agitation  spBv1.0/newark/DDATA/edge1/BR101
BR101.DO.PV         BR101  DO           %sat  http://qudt.org/vocab/unit/PERCENT      newark/upstream/BR101/DO         spBv1.0/newark/DDATA/edge1/BR101
BR101.DO.SP         BR101  DO           %sat  http://qudt.org/vocab/unit/PERCENT      newark/upstream/BR101/DO         spBv1.0/newark/DDATA/edge1/BR101
BR101.FeedA.PV      BR101  FeedA        kg    http://qudt.org/vocab/unit/KiloGM       newark/upstream/BR101/FeedA      spBv1.0/newark/DDATA/edge1/BR101
BR101.FeedB.PV      BR101  FeedB        kg    http://qudt.org/vocab/unit/KiloGM       newark/upstream/BR101/FeedB      spBv1.0/newark/DDATA/edge1/BR101
BR101.OffgasCO2.PV  BR101  OffgasCO2    %     http://qudt.org/vocab/unit/PERCENT      newark/upstream/BR101/OffgasCO2  spBv1.0/newark/DDATA/edge1/BR101
BR101.OffgasO2.PV   BR101  OffgasO2     %     http://qudt.org/vocab/unit/PERCENT      newark/upstream/BR101/OffgasO2   spBv1.0/newark/DDATA/edge1/BR101

16 tags; lint problems: 0

All sixteen of the fed-batch tags pass. More importantly, the repo's test suite asserts both halves of the contract: that the real tags pass and that the linter actually rejects garbage. From examples/tests/test_chapters.py:

examples/tests/test_chapters.py
def test_ch04_naming_linter_passes_on_real_tags():
    import naming

    dictionary, problems = naming.lint_dataset()
    assert len(dictionary) == 16
    assert problems == []  # every simulator tag obeys the convention
    assert naming.lint_tag("badtag") is not None  # and the linter actually rejects bad ones
    assert naming.uns_path("BR101.Temp.PV") == "newark/upstream/BR101/Temp"

To see why the gate matters, here is lint_tag() run on the kinds of drift that actually show up in plants:

'BR101.Temp.PV' -> None
'TT101' -> 'TT101' does not match <UNIT>.<Measurement>.<Attr>
'br101.temp.pv' -> 'br101.temp.pv' does not match <UNIT>.<Measurement>.<Attr>
'TK205.Temp.PV' -> unit 'TK205' is not in the equipment hierarchy
'BR101.Temp.Value' -> 'BR101.Temp.Value' does not match <UNIT>.<Measurement>.<Attr>

The raw P&ID tag (TT101), the lowercase typo (br101.temp.pv), the well-formed-but-unknown vessel (TK205), and the synonym creep (Value instead of PV) are each caught with a specific reason. In the companion repo this contract is wired into the test suite: make test runs test_ch04_naming_linter_passes_on_real_tags, which fails if any real tag breaks the convention. Naming is therefore not a wiki page everyone ignores but an assertion that goes red. Promoting that same check to a pre-commit hook or a CI gate in your own deployment is the next step, and the one a GxP SOP will ultimately require.
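What such a gate could look like is sketched below, assuming the hook receives one candidate tag per line. TAG_RE, UNIT_AREA, and lint_tag are repeated from the excerpts above so the snippet stands alone; gate and main are hypothetical names, and how your pre-commit or CI tool feeds tags in is left open.

```python
import re
import sys
from typing import Iterable, Optional

# Repeated from the naming.py excerpts so this sketch is self-contained.
TAG_RE = re.compile(r"^[A-Z][A-Z0-9]+\.[A-Za-z][A-Za-z0-9]+\.(PV|SP|MV|Rate|State)$")
UNIT_AREA = {"BR101": ("newark", "upstream"), "N1SEED": ("newark", "upstream"),
             "PA01": ("newark", "downstream"), "PBR201": ("newark", "upstream")}


def lint_tag(tag: str) -> Optional[str]:
    """Return None if the tag is valid, else a reason string."""
    if not TAG_RE.match(tag):
        return f"'{tag}' does not match <UNIT>.<Measurement>.<Attr>"
    unit = tag.split(".")[0]
    if unit not in UNIT_AREA:
        return f"unit '{unit}' is not in the equipment hierarchy"
    return None


def gate(tags: Iterable[str]) -> int:
    """Print every problem to stderr; return a process exit code (0 = clean)."""
    problems = [reason for reason in map(lint_tag, tags) if reason]
    for reason in problems:
        print(f"naming: {reason}", file=sys.stderr)
    return 1 if problems else 0


def main() -> int:
    """Read one tag per line from stdin, skipping blank lines."""
    return gate(line.strip() for line in sys.stdin if line.strip())
```

A hook script would end with sys.exit(main()), so that a non-zero exit code blocks the commit or fails the pipeline. Reporting every problem before exiting, rather than stopping at the first, saves a round trip for whoever has to fix them.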

Why it matters

Naming is the first place ALCOA+ either succeeds or quietly fails. If you cannot point at a number and say which instrument produced it, in what units, on what equipment, in which batch, you cannot make it attributable or traceable to its source, and both MHRA and FDA data-integrity guidance treat that traceability as foundational [2][3]. Every later capability in this book (the historian's joins, the chromatography phase events, the knowledge graph's lineage queries, the audit trail's hash chain) assumes that the address on each packet of data is correct and unique. Get naming right and those layers compose. Get it wrong and you spend the rest of the program reconciling synonyms by hand during deviation investigations, which is precisely the failure mode the peer-reviewed Bioprocessing 4.0 literature identifies as endemic across MES/DCS/SCADA estates [1].

In the real world

The honest picture is that open source gets you most of the way here, and unusually cleanly. The convention is just discipline; the UNS is a topic tree on a free MQTT broker; the dictionary is a Postgres table; the linter is fifty lines of Python you fully own (MIT-licensed in the repo). There is no commercial product you need to buy to name things well, which is why naming is one of the few layers where the pure-OSS answer is genuinely complete.

What OSS does not hand you is the surrounding governance, and this is the candid part. Commercial UNS and namespace tooling (HighByte Intelligence Hub, the Ignition platform's tag system, AVEVA's asset framework) ships with graphical modelers, role-based change control, and bundled aliasing services that our gov.tag_dictionary + linter approach reproduces only if you wire the change-control around it. The standards we lean on are real and current: ISA-95 Part 1 is current as of its 2025 edition (ANSI/ISA-95.00.01-2025) [4], Part 7's Alias Service Model is the blessed way to reconcile names across systems [11], ISA-5.1 still governs the loop tags on the P&ID [10], and Sparkplug 3.0 and MQTT are the de-facto UNS transport [9][7]. But a standard is a contract, not an enforcer. The enforcer is your CI gate and, in a GxP context, the validated SOP that says no tag enters production without passing it. NIIMBL's SABRE facility, the NIIMBL / University of Delaware pilot-scale current Good Manufacturing Practice (cGMP) plant under construction in Newark, Delaware since its April 2024 groundbreaking, is exactly the kind of greenfield site where settling one namespace before the first probe is wired is the difference between a coherent data platform and a decade of reconciliation. We name our simulated site newark in quiet homage to that reality.

One more candid note for production: our linter checks structure and equipment membership, not semantic duplication. It will happily accept both BR101.DO.PV and a hypothetical BR101.DissolvedOxygen.PV, because the measurement segment is checked only for shape and never against a controlled vocabulary. Catching that (true synonym detection) needs the Part 7 alias table populated and reviewed by a human, which is governance, not regex.

Key terms

  • Tag: a named signal, here <UNIT>.<Measurement>.<Attr> (e.g. BR101.Temp.PV); the postal address of a data point.
  • Canonical name: the one official tag for a signal; the primary key of the dictionary and the system of record.
  • Unified Namespace (UNS): a real-time, broker-agnostic hierarchy that is the single source of truth for current plant state, organized like the business and (here) shaped on ISA-95 levels [5].
  • MQTT topic: a slash-delimited publish/subscribe path; supports + (single-level) and # (multi-level) wildcards [7][8].
  • Sparkplug B: an opinionated MQTT profile fixing the topic to spBv1.0/<group>/<msg_type>/<edge>/<device> and adding birth/death state [9].
  • ISA-5.1 instrument tag: the P&ID loop label (e.g. TT-101, PIC) where the first letter is the measured variable [10].
  • ISA-95 Part 7 aliasing: the Alias Service Model that declares different system identifiers equivalent to one canonical object [11].
  • QUDT unit IRI: a global identifier for a unit of measure (e.g. .../unit/DEG_C) that pins a value's units machine-readably.
  • Tag dictionary: the governed register (gov.tag_dictionary) of every legal tag, its hierarchy placement, unit, UNS path, and Sparkplug topic.
  • Linter (naming.py): the check that rejects any tag not matching the convention or not present in the equipment hierarchy; in the repo it runs via the test_ch04_naming_linter_passes_on_real_tags test under make test, and is the natural thing to promote to a pre-commit hook or CI gate in production.

Where this leads

Every signal now has a name the whole platform agrees on, a place in the ISA-95 tree, a UNS path to be browsed by, a Sparkplug topic to ride on, and a robot at the door that keeps it all honest. We have built the address book. The next chapter, Speaking OT: OPC UA, MQTT, and Sparkplug B, picks up the wire format we just designed and stands up the real connectivity backbone (a self-describing OPC UA server, a Mosquitto broker, and a Sparkplug B publisher with genuine certificate security) so those addressed signals start flowing live from the simulated bioreactor into the stack.