Skip to main content

Glossary

📍 Quick reference: This is your pocket dictionary for the whole stack — every protocol, layer, and regulation the book builds with. Bookmark it and come back anytime a word stops making sense.

Building an open-source bioprocess data platform pulls in three vocabularies at once: plant-floor protocols, database engineering, and pharmaceutical regulation. Here are the most important terms from this book, in plain words, listed alphabetically so they are easy to find. Each entry is a plain-language starting point; the chapter it points to gives the full, precise picture.

Alias (ISA-95 Part 7) — A way to declare that two different names for the same signal are equivalent, so one system can keep calling a probe TT-101 while another knows it as BR101.Temp.PV; the canonical name stays the system of record and the others resolve to it. (See Naming and the Unified Namespace.)

ALCOA+ — The regulators' shorthand for what makes data trustworthy: Attributable, Legible, Contemporaneous, Original, and Accurate, plus Complete, Consistent, Enduring, and Available. It is the recurring yardstick the whole book builds toward. (See ALCOA+ by construction.)

Annex 11 (EU) — The European Union's rules for computerised systems used in regulated manufacturing; the EU counterpart to the US Part 11, and in places stricter (for example, it expects a documented reason for every change). (See Electronic records and signatures.)

Applicability domain — The region of inputs a model was trained on and can be trusted inside; a reading outside it (or flagged uncertain) is something the model should refuse to score rather than guess at. (See Process analytics.)

Asset framework (AF) — A vendor-maintained layer that organizes raw historian tags into a hierarchy of equipment, attributes, and units, so data is queried by asset rather than by cryptic point name; AVEVA PI's AF is the commercial example, and our two SQL views are the open-source counterpart. (See Bridging to commercial historians.)

Audit trail — A secure, time-stamped, append-only record of who changed what data, when, and why, without hiding the prior value; here a PostgreSQL trigger fills an audit.change_log table automatically. (See ALCOA+ by construction.)

Batch / Lot — One manufacturing run: a recipe executed on a unit, with a lot number, status, and start/end times; everything made together that shares one history and quality record. (See The batch and equipment data model.)

Competency question — A plain-English question the data model or ontology must be able to answer, used as a pass/fail acceptance test (for example, "what was dissolved oxygen, by phase, for the golden batch?"). (See Semantics and the digital thread.)

Contextualization — The join that ties a raw sensor reading to the batch, equipment, ISA-88 phase, and recipe it belongs to, turning a bare number into process knowledge; here a single SQL view does it. (See Contextualization.)

CSA (Computer Software Assurance) — The FDA's risk-based, least-burdensome successor to heavy validation: spend assurance effort in proportion to patient and data risk, and lean on existing evidence such as logs and supplier records. (See Validating an open-source stack.)

Data residency — The requirement that certain data physically stay within a region; enforced here with database row-level security and, for China, a separate in-region deployment. (See Data across jurisdictions.)

DCS (Distributed Control System) — The validated control layer (Emerson DeltaV, Siemens) that runs the process loops in real time at ISA-95 Levels 1–2; the open-source stack reads from it but never writes into it. (See Bridging to DCS, MES and ERP.)

Deadband — The minimum change a reading must show before the historian bothers to store a new point; a storage-versus-fidelity trade-off that is also a data-integrity decision. (See Naming and the Unified Namespace.)

Digital thread — The end-to-end chain of linked records tracing a product across its lifecycle; here the genealogy that walks from drug product back to the cell bank. (See Semantics and the digital thread.)

Edge gateway — The device or software at the OT/IT seam that reads field protocols, transforms the data, buffers it, and routes it onward without touching the control loop. (See The edge gateway.)

ERP (Enterprise Resource Planning) — The Level-4 business system (SAP) that owns materials, lots, and work orders; the top of the ISA-95 ladder, above everything this book builds. (See Bridging to DCS, MES and ERP.)

ETL (Extract-Transform-Load) — The scheduled pipeline that copies and reconciles data between two separate products; the fragile alternative this stack avoids by keeping the historian and the batch model in one joinable database. (See The reference architecture.)

Golden batch — A reference run, here BATCH-2026-001, whose trajectory new batches are compared against; the book trends everything against it (with BATCH-2026-004 as the deliberate out-of-spec counterexample). (See The batch and equipment data model.)

GAMP 5 — The industry's risk-based framework for validating computerised systems; its software categories run from 1 (infrastructure) to 5 (custom code you wrote yourself), and the effort scales with the category. (See Validating an open-source stack.)

GxP — The umbrella for the "Good x Practice" quality regulations — Good Manufacturing, Laboratory, and Clinical Practice — that govern any record a regulator may inspect. (See ALCOA+ by construction.)

GMP / cGMP — Good Manufacturing Practice, the legally required rules for making medicine safely and consistently; the "c" means current. (See The batch and equipment data model.)

Grafana — The open-source dashboarding tool used to trend and alert on the stack's data; the dashboards, data sources, and alert rules are defined as version-controlled files, so a trend is a reproducible question and not a stored picture. (See Visualization and trending with Grafana.)

Hash chain — A sequence of audit rows where each stores a SHA-256 hash built from the previous row's hash, so any deleted, reordered, or relinked entry becomes detectable; it makes tampering evident, not impossible. (See ALCOA+ by construction.)

High availability (HA) — A design that survives a single component failure without losing service, usually through a redundant standby that takes over automatically; TimescaleDB's built-in HA is a licensed feature, so the single-node stack here does without it. (See Operating, scaling and securing.)

Historian — A database specialized for storing and serving high-rate, timestamped process signals; the open-source analogue of a commercial AVEVA/OSIsoft PI system. (See The open-source historian.)

Honest hybrid — The book's central stance: pure open source cleanly covers roughly the first 80% of the stack, while the regulated last mile (validation, electronic signatures, high availability, vendor accountability) is met with hardening or commercial systems. (See The honest verdict.)

Hypertable — A TimescaleDB table that behaves like one ordinary table but is automatically partitioned by time into chunks, so high-rate sensor data stays fast at scale while living inside PostgreSQL. (See The reference architecture.)

ISA-88 — The batch procedural standard (recipe → operation → phase) that defines how a batch is made, separate from the equipment it runs on. (See The batch and equipment data model.)

ISA-95 (IEC 62264) — The standard layered model (Levels 0–4) for integrating enterprise and control systems, used to place each tool at a legitimate level and to draw the boundaries between them. (See The reference architecture.)

IQ / OQ / PQ — Installation, Operational, and Performance Qualification: documented proof, in three stages, that a system is installed right, operates right, and performs for its real job over time. (See Validating an open-source stack.)

Knowledge graph — A web of RDF triples linking batch, equipment, material, recipe, and result into one navigable whole, so lineage can be queried across systems rather than reconstructed by hand. (See Semantics and the digital thread.)

LIMS (Laboratory Information Management System) — Sample-centric software that registers samples, schedules tests, captures results against specifications, and drives the release decision; here SENAITE. (See The analytical lab.)

MES (Manufacturing Execution System) — The Level-3 software that executes the recipe, captures electronic signatures, and produces the electronic batch record; there is no credible open-source GxP option, so this slot stays commercial. (See Bridging to DCS, MES and ERP.)

Modbus — A simple 1979 request/reply protocol using function codes and 16-bit registers, with no built-in authentication or encryption; still common on legacy skids, balances, and pumps. (See Connecting legacy and commercial skids.)

MQTT — A lightweight publish/subscribe messaging protocol where devices post to named topics and a broker fans the messages out to whoever subscribed; the OT-side transport this book runs on. (See Speaking OT.)

NAMUR Open Architecture (NOA) — The concept of a second, read-mostly data channel for monitoring and optimization that taps process data without altering the validated control system; the standard that blesses "we never write into the DCS." (See The reference architecture.)

Node-RED — A browser-based, low-code flow editor (flows stored as JSON, running on Node.js) used to wire and prototype edge data routing. (See The edge gateway.)

OPC UA — The platform-independent, self-describing protocol (IEC 62541) that carries each value plus its data type, engineering unit, timestamp, and quality flag from sensor to application; a client can browse a server it has never seen. (See Speaking OT.)

OT / IT — Operational Technology (the controllers, skids, and sensors on isolated control networks) versus Information Technology (the databases, dashboards, and analytics above them); the seam the edge gateway straddles. (See The edge gateway.)

PAT (Process Analytical Technology) — Real-time, in-process measurement (such as an in-line Raman analyzer) used to make quality decisions as the process runs. (See Process analytics.)

Phase — The smallest procedural step in an ISA-88 batch (Growth, Capture, Elution, and so on); the unit the contextualization view brackets each reading against. (See The batch and equipment data model.)

PLC (Programmable Logic Controller) — The ruggedized industrial computer that drives a single piece of process equipment; the device read over Modbus or Siemens S7. (See Connecting legacy and commercial skids.)

PLS (Partial Least Squares) — A regression method for wide, highly correlated data such as spectra; the workhorse behind the titer soft sensor. (See Process analytics.)

Process drift vs. model drift — Process drift is the living culture genuinely wandering batch to batch (a real signal to preserve); model drift is a deployed predictor going stale against that moving process (a defect to detect); conflating them makes a monitor cry wolf or miss a real shift. (See Process analytics.)

Protein A capture — The platform affinity step that selectively binds an antibody by its Fc stem, giving high purity in a single chromatography step; here it runs on skid PA01. (See Downstream capture.)

Quality flag — The per-reading trust code carried with every value; the historian stores the compact legacy OPC DA encoding (192 Good, 64 Uncertain, 0 Bad), while OPC UA's native Good is simply zero. Preserving it is the ALCOA+ "Original" attribute made into a column. (See The reference architecture.)

QUDT unit IRI — A global identifier for a unit of measure (for example .../unit/DEG_C) that pins a value's units machine-readably, so "37" is never ambiguous and a graph can reason across systems. (See Naming and the Unified Namespace.)

RDF triple — A single subject-predicate-object fact, the atom of a knowledge graph; the form one contextualized reading takes once its tag, unit, quality, and batch become explicit facts. (See Semantics and the digital thread.)

Reference architecture — The layered blueprint of the whole platform, where each layer takes the data below, adds one kind of meaning, and hands it up; every later chapter is a thin slice over it. (See The reference architecture.)

Row-level security (RLS) — A PostgreSQL feature that filters every query by session context, used here to enforce data residency so a record can be read or written only within its own region. (See Data across jurisdictions.)

SBOM (Software Bill of Materials) — A machine-readable inventory of every component, version, provenance, and license in the stack; the open-source operator's substitute for "call the vendor" and the basis for watching for new vulnerabilities. (See The honest verdict.)

SHACL (Shapes Constraint Language) — The W3C standard that validates an RDF graph against required structure; it is closed-world, so a missing required fact is a failure now, not an open question — the graph-side mirror of a database NOT NULL. (See Semantics and the digital thread.)

Siemens S7 — Siemens' proprietary PLC protocol stack; insecure by design and read here over the nested TPKT/COTP/S7comm layers using an open library like snap7. (See Connecting legacy and commercial skids.)

Skid — A packaged process unit — pumps, valves, sensors, and a controller mounted on one frame — delivered and qualified as a single machine, such as a chromatography or TFF skid, usually fronted by an OPC UA server. (See Upstream capture.)

Soft sensor (virtual sensor) — A model that infers a hard-to-measure quantity, such as titer, from easy-to-measure inputs like Raman spectra. (See Process analytics.)

Sparkplug B — An opinionated profile on top of MQTT that fixes the topic structure to five levels and adds birth/death session state, so a consumer always knows whether a device is alive and what it reports; the basis of the Unified Namespace on the wire. (See Naming and the Unified Namespace.)

SPARQL — The query language for a knowledge graph; a property path such as (derivedFrom)+ can walk a lineage chain of arbitrary depth in one statement. (See Semantics and the digital thread.)

SPC (Statistical Process Control) — Using a process's own variation to set control limits and flag special-cause deviations; control limits describe what the process does, separate from the specification of what it must do. (See Process analytics.)

System of record — The single authoritative, validated source for a kind of data (the DCS for control, the MES for execution, the LIMS for release); the open-source layer is a mirror or true copy, never the original record. (See Bridging to DCS, MES and ERP.)

Tag — A named signal in the form asset-dot-measurement-dot-attribute (for example BR101.Temp.PV); the postal address of a single data point. (See Naming and the Unified Namespace.)

Tag dictionary — The governed register (gov.tag_dictionary) holding every legal tag, its hierarchy placement, unit, UNS path, and Sparkplug topic, with the tag itself as the database-enforced primary key. (See Naming and the Unified Namespace.)

Tech transfer — Moving a validated process to a new scale or site; the ISA-88/95 model, contextualization, and genealogy travel unchanged, so the receiving site re-qualifies the equipment and load rather than the vocabulary. (See Capstone.)

Unified Namespace (UNS) — A single, real-time, broker-agnostic hierarchy that is the one place any system goes to find the current state of anything, organized like the business and shaped on the ISA-95 levels. (See Naming and the Unified Namespace.)

Validation — The documented, risk-based evidence that a computerised system does exactly what it is specified to do, maintained across its whole life; the regulated burden open source does not arrive with. (See Validating an open-source stack.)

21 CFR Part 11 — The US FDA rule setting the conditions under which electronic records and electronic signatures are trustworthy equivalents of paper and handwriting; no open-source tool is compliant out of the box. (See Electronic records and signatures.)

If a term here still feels fuzzy, follow it back into the chapter where it lives, and it will make far more sense in context.