Glossary
📍 Quick reference: This is your pocket dictionary for the whole book. Bookmark it and come back anytime a word — an acronym, a standard, a piece of jargon — stops making sense.
Data management has at least as much vocabulary as the manufacturing it documents, and much of it is acronyms. Here are the most important terms from this book, defined in plain words and listed alphabetically so they are easy to find. Each entry is only a starting point; the chapter it points to gives the full, precise picture.
Alarms and events (A&E) — the timestamped log a control system raises when a value crosses a limit, a pump starts, or an operator acts; one of the four data streams the automation layer emits, governed by the ISA-18.2 alarm standard. (See Automation and Process Control Data.)
ALCOA / ALCOA+ — the shorthand for what trustworthy data must be. ALCOA is Attributable, Legible, Contemporaneous, Original, and Accurate; ALCOA+ adds Complete, Consistent, Enduring, and Available. Regulators treat these nine attributes as the design requirements every record must satisfy. (See Data Integrity and ALCOA+.)
Applicability domain — the envelope of conditions a model was actually trained on. A prediction made on an input outside that envelope is extrapolation and should be flagged rather than trusted — the modeling cousin of a quality flag on a sensor reading. (See Machine Learning and Soft Sensors.)
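To make the envelope idea concrete, here is a minimal sketch. It assumes the simplest possible applicability domain, a per-variable min/max box around the training data; real models often use richer distance or leverage measures. The names `Envelope`, `fit_envelope`, and `out_of_domain`, and the variables `temp_c` and `ph`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Per-variable min/max seen in training (a deliberately simple domain)."""
    low: dict
    high: dict


def fit_envelope(training_rows):
    """Record the range each variable covered in the training data."""
    keys = training_rows[0].keys()
    low = {k: min(r[k] for r in training_rows) for k in keys}
    high = {k: max(r[k] for r in training_rows) for k in keys}
    return Envelope(low, high)


def out_of_domain(envelope, row):
    """Return the names of variables that fall outside the training range."""
    return [
        k for k in envelope.low
        if not (envelope.low[k] <= row[k] <= envelope.high[k])
    ]


# Hypothetical training conditions for illustration only.
training = [
    {"temp_c": 36.5, "ph": 6.9},
    {"temp_c": 37.0, "ph": 7.0},
    {"temp_c": 37.2, "ph": 7.1},
]
env = fit_envelope(training)

inside = out_of_domain(env, {"temp_c": 36.8, "ph": 7.0})   # nothing to flag
outside = out_of_domain(env, {"temp_c": 39.0, "ph": 7.0})  # temperature is extrapolation
```

A prediction whose input returns a non-empty list would be flagged rather than trusted, in the same spirit as a quality flag on a sensor reading.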
Audit trail — a secure, computer-generated, time-stamped record of who created or changed a record, when, and why, protected so it cannot be switched off or quietly altered; guidance now expects it to be reviewed, not just kept. (See Data Integrity and ALCOA+.)
Batch / Lot — one defined quantity of product, run as a sequence of ordered steps; the unit every record in the plant is ultimately keyed to. (See The Biologic and Its Data Shadow.)
Batch genealogy (lineage) — the chained, traceable record that links a finished vial back to every bioreactor run, media lot, method, and condition that shaped it; the join key that turns scattered system islands into one story. (See Where Process Data Is Born.)
Contextualization — attaching the surrounding meaning — unit, time, equipment, batch, phase, method — that turns a bare number into information you can trust and act on. (See The Lifecycle of a Data Point.)
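As a small illustration of what attaching context can look like in code, the sketch below wraps a bare number in a record that carries the six pieces of context named above. The class name `ContextualizedValue` and all field values (equipment `BR-101`, batch `B-0001`, and so on) are invented for the example, not taken from any real system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ContextualizedValue:
    """A reading plus the context that makes it interpretable."""
    value: float
    unit: str
    timestamp: datetime
    equipment: str
    batch_id: str
    phase: str
    method: str

    def describe(self) -> str:
        """Render the value together with its full context."""
        return (
            f"{self.value} {self.unit} on {self.equipment}, "
            f"batch {self.batch_id}, phase {self.phase}, "
            f"via {self.method} at {self.timestamp.isoformat()}"
        )


# Hypothetical reading: the bare number 37.0 with its context attached.
reading = ContextualizedValue(
    value=37.0,
    unit="degC",
    timestamp=datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc),
    equipment="BR-101",
    batch_id="B-0001",
    phase="growth",
    method="in-line RTD probe",
)
print(reading.describe())
```

Making the record immutable (`frozen=True`) is a design choice for the sketch: once a value and its context are joined, neither should be editable in place.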
Continuous / integrated continuous bioprocessing — manufacturing in which material flows nonstop through connected unit operations rather than stopping in discrete batches, so the data streams never stop and "the batch" has to be defined by a slice of time. (See Real-Time Integration and Pharma 4.0.)
CPP (Critical Process Parameter) — a process setting, such as bioreactor temperature or feed rate, that will change the product if it varies too much; the "how we made it" numbers. (See The Biologic and Its Data Shadow.)
CPV (Continued Process Verification) — Stage 3 of process validation: the perpetual, data-driven monitoring of every commercial batch to prove the process stays in a state of control. (See SPC, MVDA, and CPV.)
CQA (Critical Quality Attribute) — a property of the product itself — purity, potency, the pattern of sugar chains — that must stay within limits for the medicine to be safe and effective; the "what we made" numbers. (See The Biologic and Its Data Shadow.)
CSV / CSA — Computerized System Validation, the documented evidence that a regulated system does what it should, structured as the V-model's IQ/OQ/PQ rungs; and Computer Software Assurance, the FDA-led shift toward risk-based critical thinking that spends rigorous proof where patient risk is highest instead of testing every function identically. (See Validating Computerized Systems.)
Data governance — the system of decision rights and accountabilities over data: who may do what, with which data, under what rules, enforced through owners, stewards, and custodians. (See Data Governance.)
Data integrity — the degree to which data stays complete, consistent, and accurate throughout its entire life; regulators treat it as the bedrock of product quality. (See Data Integrity and ALCOA+.)
Data leakage — the cardinal modeling error in which information from the test set bleeds into training — for example, rows from one batch landing on both sides of a random split — inflating a model's score into a number it can never reproduce in production. (See The Lifecycle of a Data Point.)
Data lifecycle — the full journey of a data point through seven stages: generation/capture, processing, contextualization, review and use, reporting, retention and archival, and disposal. (See The Lifecycle of a Data Point.)
Data shadow — the full body of records a batch casts as it is made and tested: sensor traces, the batch record, test results, and signatures; as essential to the product as the molecule itself. (See The Biologic and Its Data Shadow.)
Data silo — valuable data trapped in a system that cannot easily share it; the opposite of FAIR, and the central problem the seams between plant systems create. (See Plant Information Systems.)
DCS (Distributed Control System) — a coordinated network of controllers that runs an entire process or plant, spanning basic control and supervision (ISA-95 Levels 1-2). (See Automation and Process Control Data.)
Deviation record / CAPA — the formal investigation opened when a process departs from approved limits, linking back to the batch, its genealogy, and the audit-trail rows that are its evidence; the corrective and preventive action (CAPA) is the permanent fix that follows and must close before release. (See Data Integrity and ALCOA+.)
Digital thread — one connected, traceable data record linking a medicine's whole lifecycle — development, process, product, patient — so genealogy becomes queryable end to end. (See The Digital Thread and Digital Twin.)
Digital twin — a virtual representation of a real process or asset, continuously updated by live data from its physical counterpart and able to feed decisions back to it (distinct from a one-way digital shadow or a hand-updated digital model). (See The Digital Thread and Digital Twin.)
EBR (electronic batch record) — the digital, signed account of how one batch was actually made — the recipe followed, materials added, parameters confirmed, and human sign-offs; the official as-executed record, held in the MES. (See Automation and Process Control Data.)
Electronic signature — a computer-based signing method, legally equivalent to a handwritten signature when the controls are met, that shows the signer's printed name, the date and time, and the meaning of the signing (reviewed, approved, or authored). (See 21 CFR Part 11 and EU Annex 11.)
ELN (Electronic Lab Notebook) — the digital descendant of the bound paper notebook, capturing the exploratory, narrative lab work — what was tried and why — as opposed to the LIMS's routine, structured testing. (See Plant Information Systems.)
ERP (Enterprise Resource Planning) — the enterprise-level (Level 4) business system that owns orders, inventory, materials, and finance, and exchanges information with the MES at the operations boundary. (See Plant Information Systems.)
FAIR — the principle that data should be Findable, Accessible, Interoperable, and Reusable, centered on rich, machine-actionable metadata; FAIR is about being usable by machines, and is not the same as being open. (See Ontologies and FAIR Data.)
GAMP 5 — ISPE's widely used, risk-based guide for validating regulated computerized systems, including the software categories that scale the validation effort to a system's type and risk. (See Validating Computerized Systems.)
GMP / cGMP — Good Manufacturing Practice (the "c" for current): the legally enforced rules for how medicines must be made, under which every record in this book must hold up to inspection. (See The Biologic and Its Data Shadow.)
Golden batch / fingerprint — a library of past successful batch trajectories, modeled (often by multiway PCA) into a multivariate fingerprint of what a good run looks like at every moment, so a new batch can be monitored against it in real time. (See SPC, MVDA, and CPV.)
Grouped (leave-one-batch-out) cross-validation — the honest way to score a bioprocess model: hold out whole batches, never individual rows, because rows within one run are correlated; only possible because every row carries its batch_id. (See The Lifecycle of a Data Point.)
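A minimal sketch of the splitting rule, written with the standard library only so that the logic is visible. The function name `leave_one_batch_out` and the toy rows are assumptions made for illustration; in practice a library splitter such as scikit-learn's `LeaveOneGroupOut` does the same job.

```python
def leave_one_batch_out(rows, key="batch_id"):
    """Yield (held_out_batch, train_rows, test_rows), one whole batch per fold."""
    batches = sorted({r[key] for r in rows})
    for held_out in batches:
        train = [r for r in rows if r[key] != held_out]
        test = [r for r in rows if r[key] == held_out]
        yield held_out, train, test


# Hypothetical rows: three batches, two time points each.
rows = [
    {"batch_id": "B1", "hour": 0, "titer": 0.1},
    {"batch_id": "B1", "hour": 24, "titer": 0.6},
    {"batch_id": "B2", "hour": 0, "titer": 0.1},
    {"batch_id": "B2", "hour": 24, "titer": 0.7},
    {"batch_id": "B3", "hour": 0, "titer": 0.2},
    {"batch_id": "B3", "hour": 24, "titer": 0.8},
]

folds = list(leave_one_batch_out(rows))
for held_out, train, test in folds:
    train_batches = {r["batch_id"] for r in train}
    test_batches = {r["batch_id"] for r in test}
    # No batch may appear on both sides of the split.
    assert train_batches.isdisjoint(test_batches)
```

Because the split is made on `batch_id` rather than on row index, correlated rows from one run can never land on both sides, which is exactly the leakage a random row split allows.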
Historian — a database built to ingest and store the plant's high-volume time series (tens of thousands of timestamped tags) and answer queries across years of them, using compression to keep storage affordable. (See Plant Information Systems.)
In-line / on-line / at-line / off-line — the four places a measurement can be taken, from a probe inside the stream (in-line), through a diverted side-stream (on-line) and a sample read nearby (at-line), to a sample carried to a distant lab (off-line); closer means faster, farther means more definitive. (See Instruments and Sensors.)
IQ / OQ / PQ — Installation, Operational, and Performance Qualification: the staged, documented proof that a system or instrument was installed right, operates right, and performs right on its real workload. (See Validating Computerized Systems.)
ISA-88 (IEC 61512) — the batch-control standard that separates the equipment (what you have) from the procedure (what you do with it) and defines recipe types and the phase, the smallest reusable action; it gives a batch its common structure and is the s88.batch anchor every source carries. (See Automation and Process Control Data.)
ISA-95 (IEC 62264) — the standard, built on the Purdue model, that defines the plant's Levels 0-4 and how the systems that control a factory connect to the systems that run the business. (See Architecture and Integration: ISA-95.)
The islands problem — the difficulty of joining the same batch across disconnected systems whose batch keys, clocks, and unit conventions do not match, forcing the same fact to be captured redundantly and reconciled by hand. (See The Lifecycle of a Data Point.)
LIMS (Laboratory Information Management System) — the QC lab's system of record, which assigns each sample an identity, routes it to tests, holds the specifications, and judges each result against them. (See Plant Information Systems.)
Master data — the stable reference information that does not change batch to batch: approved recipes, product specifications, material lists, the equipment register; the unchanging cast every batch shares. (See The Biologic and Its Data Shadow.)
MES (Manufacturing Execution System) — the operations-level (Level 3) system that governs how a batch is made: it dispatches instructions, enforces the recipe step by step, and assembles the electronic batch record; the system of record for batch execution. (See Plant Information Systems.)
Metadata — data about data: the unit, timestamp, equipment, batch, method, and operator that give a value its meaning and history. A bare "37" is meaningless; the metadata is what makes it a fact. (See The Lifecycle of a Data Point.)
Model drift — a deployed model going quietly stale as the world moves away from its training data, distinct from genuine process drift (the living process actually changing); telling the two apart needs the contextualized, timestamped data shadow. (See Where Process Data Is Born.)
Monoclonal antibody (mAb) — a Y-shaped protein medicine, every copy identical, engineered to lock onto one specific target; the book's recurring example product. (See The Biologic and Its Data Shadow.)
MQTT / Sparkplug B — a lightweight publish/subscribe transport (MQTT), where devices publish to topics and a broker fans messages out, disciplined for industry by Sparkplug B's strict topic structure and its birth/death lifecycle that announces a device's arrival and its loss. (See Connectivity and Interoperability Standards.)
MTP (Module Type Package) — a vendor-neutral digital manifest (VDI/VDE/NAMUR 2658) describing a skid's services, tags, screens, and alarms so a module integrates "plug-and-produce" rather than by custom mapping. (See Connectivity and Interoperability Standards.)
MVDA (Multivariate Data Analysis) — analyzing many correlated process variables together rather than one at a time, using latent-variable methods such as PCA and PLS, and charts like Hotelling's T-squared and SPE to ask "am I in a normal region, and do I still behave as the model expects?" (See SPC, MVDA, and CPV.)
OPC UA (Open Platform Communications Unified Architecture) — the dominant, vendor-neutral process-connectivity standard (IEC 62541) that carries not just a number but a self-describing information model — the value plus its type, units, timestamp, and quality — across system boundaries. (See Connectivity and Interoperability Standards.)
OT / IT — Operational Technology (the computers that directly run physical equipment, prioritizing availability and safety) versus Information Technology (databases and networks, prioritizing confidentiality); bringing them together is OT/IT convergence, with a DMZ and zones-and-conduits guarding the seam. (See Architecture and Integration: ISA-95.)
PAT (Process Analytical Technology) — the FDA framework for building quality in by measuring critical attributes as a process runs rather than only testing the final product; a system for designing, analyzing, and controlling, not merely a set of sensors. (See Instruments and Sensors.)
PLC (Programmable Logic Controller) — a rugged industrial computer at the basic-control level (ISA-95 Level 1) that reads sensors and drives actuators many times a second. (See Automation and Process Control Data.)
Quality by Design (QbD) — the development philosophy of building quality in deliberately by understanding which process parameters and product attributes truly matter, rather than testing quality in afterward (ICH Q8). (See The Biologic and Its Data Shadow.)
Raw data vs. processed data — the original, unaltered values exactly as the instrument first recorded them (raw, which regulators treat as sacred) versus the calibrated, averaged, or calculated results derived from them (processed); for a chromatogram the raw form is the complete electronic data file, not the printed peak table. (See The Lifecycle of a Data Point.)
Retention — keeping records readable for their required life; for a US-regulated medicine the floor is at least one year past the batch's expiry date, and firms in practice keep records far longer. (See The Lifecycle of a Data Point.)
SCADA / HMI — the supervisory layer (Supervisory Control and Data Acquisition) and its visible face, the Human-Machine Interface, the screens through which operators watch trends, change setpoints, and respond to alarms. (See Automation and Process Control Data.)
Semantic interoperability — agreement on meaning, not just format: two systems agree that one's pH and another's acidity name the same physical quantity in the same units. Moving the bytes (syntactic interoperability) is not the same as preserving the meaning. (See Connectivity and Interoperability Standards.)
SHACL (Shapes Constraint Language) — a way to gate graph data with required-structure rules in a closed world, where a missing required fact is a failure now rather than an open question; the graph analogue of a database NOT NULL constraint, used to enforce a release gate. (See Ontologies and FAIR Data.)
Single-use / skid — sterile, disposable plastic process assemblies (single-use) and pre-assembled, self-contained units that arrive with their own local controllers and tags (skids); each skid is a small island of automation the site must integrate. (See Automation and Process Control Data.)
Soft sensor — a software "virtual instrument" that infers a value no physical probe measures directly — biomass, growth rate, titer — by fusing frequent online signals with sparse lab values; the fast online stream gives it rhythm, the occasional lab value keeps it honest. (See Machine Learning and Soft Sensors.)
SPARQL — the standard query language for graph (RDF) data, as SQL is for tables; it turns a traceability or audit question — "show me everything about this batch" — into a one-line query, the executable form of a competency question. (See The Lifecycle of a Data Point.)
SPC (Statistical Process Control) — using statistics to tell ordinary background variation (common cause) from a real, assignable change (special cause), plotted on a control chart with statistical limits; control limits describe how the process behaves, distinct from the specification limits it must meet. (See SPC, MVDA, and CPV.)
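The difference between control limits and specification limits is easiest to see with numbers. The sketch below assumes a simple chart with limits at the baseline mean plus or minus three sample standard deviations; individuals charts in practice often estimate sigma from the moving range instead. The pH values and the specification range of 6.8 to 7.2 are invented for illustration.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits computed from baseline data."""
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation (an assumption of this sketch)
    return center - k * sigma, center, center + k * sigma


def special_cause_points(values, lcl, ucl):
    """Return the indices of points that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical pH readings from batches known to be in control.
baseline = [7.00, 7.02, 6.98, 7.01, 6.99, 7.00, 7.03, 6.97]
lcl, center, ucl = control_limits(baseline)

# Hypothetical specification limits: what the process must meet.
spec_low, spec_high = 6.8, 7.2

new_values = [7.01, 6.99, 7.15, 7.00]
flagged = special_cause_points(new_values, lcl, ucl)
print(f"control limits: {lcl:.2f} to {ucl:.2f}, flagged indices: {flagged}")
```

The third reading, 7.15, is still inside the hypothetical specification but outside the control limits: the product would pass, yet the process has told you something changed.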
Specification — the agreed pass/fail limits a result must meet, set using analytical procedures developed under ICH Q14 and validated under ICH Q2(R2); the formal definition of "good enough to release." (See Plant Information Systems.)
Titer — the concentration of product (here, antibody) the cells have made, in grams per liter; the headline measure of how productive a culture is. (See Instruments and Sensors.)
UNS (Unified Namespace) — a single, organized, real-time map of the whole plant where every piece of data lives at a meaningful address, and every system publishes to it and subscribes to what it needs instead of wiring point-to-point. (See Architecture and Integration: ISA-95.)
21 CFR Part 11 / EU Annex 11 — the US rule and its EU counterpart that make electronic records and electronic signatures the legal equals of paper and ink, under specified controls (audit trails, access control, validation); Annex 11 is being modernized for networked, cloud, and AI/ML systems. (See 21 CFR Part 11 and EU Annex 11.)
If a term here still feels fuzzy, follow it back into the chapter where it lives, and it will make far more sense in context.