Glossary

📍 Quick reference: This is your pocket dictionary for the whole book. Bookmark it and come back anytime a word — IRI, SHACL, continuant, competency question — stops making sense.

Building ontologies has its own vocabulary, layered on top of the manufacturing terms it models. Here are the most important terms from this book, listed alphabetically so they are easy to find. Each entry is a plain-language starting point; the chapter it points to gives the full, precise picture.

affectsQuality — The Quality-by-Design link, written as one edge in the graph: a process parameter (such as feed rate) affects a quality attribute (such as monomer purity). It turns a finding buried in a development report into a fact you can query, and carries its evidence — the operating ranges and the study that proved it — alongside. (See Relations and genealogy.)

Alignment — Declaring that a local term is a more specific kind of a term in a shared public ontology, by an rdfs:subClassOf edge up the stack, so a partner's tool recognizes your class without your having to invent a bridge. The book keeps all such edges in one file, align.ttl. (See Reuse survey and alignment.)

Allotrope (AFO / ASM) — A formal ontology and data formats for analytical-lab data, organized into Equipment, Material, Process, and Result. AFO is the ontology; ASM (the Allotrope Simple Model) is its lightweight JSON form, increasingly piped into knowledge graphs. (See The ontologies in use.)

Axiom — A standing rule written into the model that lets a machine check and extend it, not just label it — for example, that derivedFrom is transitive, or that two classes can never overlap. Axioms are what turn a list of nouns into an ontology a reasoner can act on. (See Axioms and restrictions.)

B2MML — A royalty-free XML (and JSON) format for exchanging batch recipes and records between systems, the on-the-wire serialization of the ISA-95 / ISA-88 manufacturing standards and the de facto exchange layer between plant systems. (See From the wire to the graph.)

BFO (Basic Formal Ontology) — The small, domain-neutral upper ontology, published as an international standard (ISO/IEC 21838-2), that science and industry have largely settled on. It contains no Bioreactor and no Antibody, only the most general categories everything falls under, so that ontologies built on it fit together. (See The upper spine.)

Class — A category of thing in the model, such as Batch, Bioreactor, or DrugSubstance. Every class is placed under one of BFO's top categories before it is defined, which is what makes it checkable. (See Classes and taxonomy.)

Closed-world assumption — The rule SHACL follows: whatever is not present in the data is treated as false, so a missing value counts as a failure. It is the opposite of OWL's open world, and it is exactly what a release decision needs: a missing sterility test is a failed lot, not an open question. (See The release gate and SHACL.)
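The difference between the two assumptions can be shown in a few lines. The sketch below is plain Python, not SHACL or OWL; the lot names, the required panel, and both helper functions are hypothetical and exist only to show how each assumption reads the same missing result.

```python
# Hypothetical required panel and lot records, for illustration only.
REQUIRED_TESTS = {"sterility", "monomerPct"}

lots = {
    "LOT-A": {"sterility": "pass", "monomerPct": "pass"},
    "LOT-B": {"monomerPct": "pass"},  # sterility result was never recorded
}

def closed_world_release(results):
    """Closed world: a required result that is absent counts as a failure."""
    return all(results.get(test) == "pass" for test in REQUIRED_TESTS)

def open_world_status(results, test):
    """Open world: an absent result is merely unknown, never false."""
    return results.get(test, "unknown")

for lot_id, results in lots.items():
    print(lot_id, closed_world_release(results), open_world_status(results, "sterility"))
```

LOT-B is refused by the closed-world check, while the open-world reading reports only that its sterility result is unknown.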

Competency question (CQ) — A plain-English question, written before any modeling, that the finished ontology must be able to answer ("when a lot fails, which other products share its lineage?"). It is the unit of requirement and, in this book, also the unit of test. There are 23, numbered CQ-01 to CQ-23. (See Specification and ORSD.)

contains — The transitive packing hierarchy (a carton contains vials, a case contains cartons, a pallet contains cases), kept deliberately separate from derivedFrom, because what a vial is packed inside now is mutable, while what it was made from is permanent. Confusing the two would break a recall trace. (See Relations and genealogy.)

Continuant — In BFO, a thing that persists through time as a whole, present in full at every instant it exists: a cell, a bioreactor, a vial, a batch, a purity, a role. The opposite of an occurrent. (See The upper spine.)

CPP (Critical Process Parameter) — A setting you control on the line — such as culture temperature or feed rate — whose variation has a demonstrated effect on product quality, so it is held within a proven range. The model links it by affectsQuality to the attribute it moves. (See Specification and ORSD.)

CQA (Critical Quality Attribute) — A measured property of the product — such as monomer purity or aggregate level — that must stay within its limits for the medicine to be safe and to work. The release gate checks a panel of these. (See The release gate and SHACL.)

Datatype property — A relation that links a thing to a literal value you can read (such as monomerPct to the number 98.611), as opposed to an object property, which links to another thing you can walk. Whether the target is a literal or another thing decides which one you author. (See Relations and genealogy.)

Deprecate, don't delete — The governance rule that an identifier, once issued, is never removed but marked obsolete and pointed at its replacement, so historical records stay interpretable. (See Governance and change.)
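As a rough illustration of the rule, the sketch below keeps an obsolete identifier in a registry and follows its replacement pointer on lookup. The registry layout, the term names under the bp: prefix, and the resolve helper are all assumptions made for this example, not the book's governance tooling.

```python
# Hypothetical identifier registry; term names and field names are illustrative.
registry = {
    "bp:OldLot": {"obsolete": False, "replaced_by": None},
    "bp:MaterialLot": {"obsolete": False, "replaced_by": None},
}

def deprecate(term, replacement):
    """Mark a term obsolete and point it at its replacement; never delete it."""
    registry[term]["obsolete"] = True
    registry[term]["replaced_by"] = replacement

def resolve(term):
    """Follow replacement pointers until a current (non-obsolete) term is reached."""
    seen = set()
    while registry[term]["obsolete"]:
        if term in seen:
            raise ValueError(f"replacement cycle at {term}")
        seen.add(term)
        successor = registry[term]["replaced_by"]
        if successor is None:
            raise LookupError(f"{term} is obsolete and has no replacement")
        term = successor
    return term

deprecate("bp:OldLot", "bp:MaterialLot")
print(resolve("bp:OldLot"))
```

Because the old entry is still present, a historical record that cites it can always be resolved to the current term.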

derivedFrom — The single most valuable relation in the book: the transitive genealogy edge that roots every material in the working cell bank. Only the immediate parent links are stated; because it is transitive, a reasoner infers the whole chain, so one query walks a drug-substance lot back to its frozen origin. (See Relations and genealogy.)
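What a reasoner does with a transitive relation can be approximated by a small closure loop. The sketch below is plain Python rather than OWL reasoning. It borrows identifiers from the running example, but the chain is shortened and the node SEED-X is a placeholder, so the edges should be read as illustrative, not as the dataset.

```python
# Asserted immediate-parent links only, as (child, parent). Illustrative chain.
asserted = {
    ("DP-001", "DS-001"),
    ("DS-001", "POLpool-001"),
    ("POLpool-001", "SEED-X"),    # placeholder intermediate, not from the dataset
    ("SEED-X", "WCB-CHO-001"),
}

def transitive_closure(edges):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing new appears."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

inferred = transitive_closure(asserted)

# Every ancestor of the filled vial, asserted or inferred.
ancestors = sorted(parent for child, parent in inferred if child == "DP-001")
print(ancestors)
```

Only four links are written down, yet the closure contains the edge from DP-001 all the way back to WCB-CHO-001, which is the edge a lineage query relies on.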

Disjointness — A declared rule that two categories can never overlap, so an entity typed as both is a flagged error — a batch typed also as a bioreactor, or a continuant typed as an occurrent. The guards that protect traceability. (See Axioms and restrictions.)
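A disjointness check amounts to looking for any entity that carries both classes of a declared pair. The sketch below is a plain-Python stand-in for what a reasoner reports as an inconsistency; the entity names BATCH-X and THING-Y and the helper function are hypothetical.

```python
# Declared disjoint pairs, taken from the examples in the entry above.
DISJOINT = [("Batch", "Bioreactor"), ("Continuant", "Occurrent")]

# Hypothetical type assertions; THING-Y carries a deliberate modeling error.
types = {
    "BATCH-X": {"Batch", "Continuant"},
    "BR-101": {"Bioreactor", "Continuant"},
    "THING-Y": {"Batch", "Bioreactor"},
}

def disjointness_violations(types, disjoint_pairs):
    """Return (entity, class_a, class_b) for each entity typed in both classes of a pair."""
    return [
        (entity, a, b)
        for entity, classes in sorted(types.items())
        for a, b in disjoint_pairs
        if a in classes and b in classes
    ]

print(disjointness_violations(types, DISJOINT))
```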

Disposition — A realizable entity in BFO: a real tendency a thing bears whether or not it is being measured — a resin's tendency to bind antibody, a molecule's tendency to aggregate — distinct from the assay result that is its evidence. (See Classes and taxonomy.)

Drug product (DP) / Drug substance (DS) — In the running example, the drug substance (DS-001) is the purified bulk antibody where eleven ancestors converge and release is decided; the drug products (DP-001, DP-002) are the filled vials made from it. (See Instances and the graph.)

Executable ORSD — This book's central move: the competency questions are not left as prose but compiled into a runnable artifact (cq-catalog.json), so the requirements document and the test suite are the same file and can never silently drift apart. (See Specification and ORSD.)

FAIR — The principle that data be Findable, Accessible, Interoperable, and Reusable. Global identifiers make data Findable; shared vocabularies and unit-bearing values make it Interoperable. Note that FAIR is not the same as open — restricted regulated data can still be fully FAIR. (See Publication and FAIR.)

Functional property — A relation that may hold at most one value per subject (a cell line is created by at most one transfection; a bank has at most one host). If two values are asserted, a reasoner concludes they must be the same thing, which gives a built-in way to merge a double-entered record. (See Axioms and restrictions.)
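The merging behavior can be pictured as grouping the values asserted for each subject. The sketch below is plain Python, not a reasoner; the record names and the inferred_same helper are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical assertions of one functional property, as (subject, value).
created_by = [
    ("CELL-LINE-X", "TRANSFECTION-7"),
    ("CELL-LINE-X", "TFX-0007"),        # same event, entered twice under another name
    ("CELL-LINE-Y", "TRANSFECTION-9"),
]

def inferred_same(assertions):
    """For a functional property, all values asserted for one subject must co-refer."""
    values = defaultdict(set)
    for subject, value in assertions:
        values[subject].add(value)
    # Only subjects with more than one value give rise to a sameness inference.
    return {s: sorted(v) for s, v in values.items() if len(v) > 1}

print(inferred_same(created_by))
```

In OWL itself, the same two values would produce an inconsistency instead of a merge if they had been declared different individuals.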

GraphRAG — Retrieval-augmented generation whose trusted store is a knowledge graph, so a model answers by walking typed edges (such as derivedFrom) and citing them, rather than guessing from training memory. The retrieval step is the very same lineage query the harness already runs. (See Ontologies and AI.)

Ground truth (for AI) — The verified, curated facts a learning or retrieval system is anchored to. In this book it is the reasoned, SHACL-gated knowledge graph, which supplies the substance a fluent model cannot supply for itself. (See Ontologies and AI.)

Grouped / leave-one-batch-out cross-validation — The honest way to evaluate a model that learns over campaign data: every record of one batch goes wholly to training or to test, never split row-wise, because sibling lots off one cell bank are near-twins, not independent samples. The derivedFrom lineage supplies the grouping key. (See Classes and taxonomy.)
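A minimal sketch of the split, using only the standard library, is given below. The records, the batch identifiers B1 to B3, and the function name are hypothetical; in practice the grouping key would come from the lineage, as the entry says.

```python
from collections import defaultdict

# Hypothetical records: (batch_id, feature, label).
records = [
    ("B1", 0.1, 0), ("B1", 0.2, 0),
    ("B2", 0.4, 1), ("B2", 0.5, 1),
    ("B3", 0.7, 0),
]

def leave_one_batch_out(records):
    """Yield (held_out_batch, train, test); each batch is held out once and never split."""
    by_batch = defaultdict(list)
    for rec in records:
        by_batch[rec[0]].append(rec)
    for held_out in sorted(by_batch):
        test = by_batch[held_out]
        train = [r for batch, recs in by_batch.items() if batch != held_out for r in recs]
        yield held_out, train, test

for held_out, train, test in leave_one_batch_out(records):
    print(held_out, len(train), len(test))
```

No batch ever appears on both sides of a fold, which is the property a row-wise shuffle cannot guarantee.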

Individual / instance — A specific named thing in the graph (WCB-CHO-001, DS-001), as opposed to the class it belongs to. Filling the classes with individuals is what turns a vocabulary into a manufacturing record you can query. (See Instances and the graph.)

IOF (Industrial Ontologies Foundry) — The manufacturing-side counterpart of the OBO Foundry. Its IOF Core is a BFO-grounded mid-level ontology supplying shared concepts (equipment, material, process), and its biopharma module adds unit-operation, equipment, and Quality-by-Design terms the running example reuses. (See The upper spine.)

IRI (Internationalized Resource Identifier) — A globally unique web name, like a URL, that RDF gives every subject, predicate, and resource object. Unlike a local database key, it means the same thing across systems and sites, so two systems agree on meaning by pointing at the same IRI. (See Identifiers and units.)

ISA-95 / ISA-88 — The standards that model the plant: ISA-95 (IEC 62264) keeps material lot, equipment, and activity as distinct objects, and ISA-88 (IEC 61512) structures a batch recipe into procedures, unit procedures, operations, and phases. The ontology re-expresses these distinctions on the BFO spine. (See Classes and taxonomy.)

ISO IDMP — The ISO family of standards for the Identification of Medicinal Products, giving a substance or product a machine-readable regulatory identity (such as a UNII code). The book attaches it to the very same node the release gate validated. (See Regulatory semantics.)

Knowledge graph — A store of facts as typed links — node, edge, node — that you can walk and query, for example lot — derivedFrom — batch. The whole dataset of the running example is one such graph. (See Ontologies and AI.)

LOT (Linked Open Terms) — An industry-oriented methodology for the back end of an ontology's life: requirements, implementation, publication, versioning, and FAIR release of a reusable vocabulary. (See Publication and FAIR.)

LRV (Log Reduction Value) — A measure of how far a purification step lowers virus levels, in powers of ten; because the values are logs, two independent (orthogonal) steps' values add — 4.5 plus 4.2 gives a total clearance of 8.7. Modeled as a validated capability of the step, not a per-batch measurement. (See Competency questions as queries.)
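The arithmetic behind "the values add" is that reduction factors multiply, and the logarithm of a product is the sum of the logarithms. The sketch below works through the two values quoted in the entry; the function name and the titer figures in the usage comment are assumptions for illustration.

```python
import math

def log_reduction_value(titer_in, titer_out):
    """LRV = log10(virus load before the step / virus load after it)."""
    return math.log10(titer_in / titer_out)

# The two step values quoted in the entry.
step_lrvs = [4.5, 4.2]

# Each step cuts the load by a factor of 10**LRV; independent steps multiply.
total_factor = math.prod(10 ** lrv for lrv in step_lrvs)

# Taking the log of the combined factor gives back the sum of the step values.
total_lrv = math.log10(total_factor)

print(round(sum(step_lrvs), 1), round(total_lrv, 1))
```

For example, a hypothetical step that takes a load from 1e6 to 1e2 has log_reduction_value(1e6, 1e2) equal to 4.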

NeOn — A scenario-based ontology-engineering methodology that frames building as flexible phases (specification, reuse, conceptualization, and so on) rather than one rigid waterfall; the source of the ORSD this book is built around. (See Specification and ORSD.)

NOR / PAR — The normal operating range (the tighter window a parameter is held to day-to-day) sitting inside the proven acceptable range (the wider region demonstrated to still yield acceptable product). Both are modeled as typed ranges rather than prose. (See Relations and genealogy.)
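Treating the two ranges as typed values rather than prose makes the nesting checkable. The sketch below is one possible shape, not the book's model; the Range class, the classify helper, and the temperature limits are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def contains(self, value):
        """True if the value lies inside the range, limits included."""
        return self.low <= value <= self.high

    def within(self, other):
        """True if this whole range sits inside the other range."""
        return other.low <= self.low and self.high <= other.high

# Hypothetical culture-temperature ranges in degrees Celsius.
nor = Range(36.5, 37.5)   # normal operating range
par = Range(35.0, 38.5)   # proven acceptable range

def classify(value):
    """Place a reading relative to the two ranges."""
    if nor.contains(value):
        return "within NOR"
    if par.contains(value):
        return "outside NOR, within PAR"
    return "outside PAR"

print(nor.within(par), classify(37.0), classify(38.0), classify(39.0))
```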

OBO Foundry — The life-sciences community whose coordinated, principle-based biomedical ontologies (such as the Gene Ontology and Protein Ontology) interlock instead of overlap because they share an upper grounding and design rules; it inspired the industrial equivalent, IOF. (See The upper spine.)

Object property — A relation that links a thing to another thing — an edge you can walk, like derivedFrom back to the cell bank — as opposed to a datatype property, which links to a literal value. (See Relations and genealogy.)

occursIn — The object property tying a process to the persisting equipment it ran in (the culture run to vessel BR-101), used instead of wrongly re-typing the batch material as the vessel. It keeps the run, the batch, and the vessel three separate things. (See Relations and genealogy.)

Occurrent — In BFO, a thing that happens and unfolds in time, never present all at once because it has temporal parts: a cell-culture run, a capture step, the whole campaign. The opposite of a continuant. (See The upper spine.)

Open-world assumption — The rule OWL reasoning follows: whatever you have not stated is treated as merely not yet known, never as false. This is why a missing required result reads as "unknown" to a reasoner, and why a closed-world SHACL gate is needed to catch the gap. (See The release gate and SHACL.)

ORSD (Ontology Requirements Specification Document) — The short brief, agreed before modeling, that fixes the ontology's purpose, scope, users, intended uses, functional requirements (the competency-question catalog), non-functional requirements, and a pre-glossary. (See Specification and ORSD.)

owl:sameAs — The strong assertion that two identifiers denote the identical individual, fusing all their facts in both directions. Powerful and easily misused; the book deliberately asserts none, reconciling differently-named records with attributed claims instead. (See Identifiers and units.)

OWL (Web Ontology Language) — The standard logic language the model's classes, relations, and axioms are written in, layered on RDF. A reasoner uses OWL to derive new facts that follow from the ones written down. (See Classes and taxonomy.)

OWL-RL closure — The reasoning step that spells out facts already implied by the written-down ones — every transitive ancestor, every equipment item tagged a material entity — before the queries run; this is why the running example's triple count grows from 2120 to 7137. (See The running example.)

Passage number / passage limit — How many times a living culture has been split and regrown, which tracks how long the cells have been propagated. A validated passage limit (here 40) bounds it, because cells drift as they divide; the seed train accumulates passage forward from the bank. (See Instances and the graph.)

Profile limit (transitivity) — The honest constraint that because derivedFrom is transitive, OWL 2 DL forbids also declaring it irreflexive or asymmetric, the axioms that would rule out cycles. So "nothing is its own ancestor" is enforced by SHACL and convention, not by the logic. Named openly rather than faked. (See Axioms and restrictions.)

PROV-based reconciliation — Resolving records that name one thing differently not by fusing identifiers but by keeping each source's assertion as an attributed claim and recording a steward's curation decision as a prov:Activity; keeps the audit trail and avoids an over-merge. (See Identifiers and units.)

Proof harness (validate.py) — The small program that parses the dataset into one graph, reasons over it, and runs all 23 competency questions as pass/fail acceptance tests, exiting cleanly only if every one passes. The inspector that keeps the model honest. (See The running example.)
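The general shape of such a harness can be sketched in a few lines. What follows is not validate.py and does not reproduce cq-catalog.json: the two catalog entries, the toy graph, the check functions, and every name in the block are hypothetical, and the reasoning step is left out entirely.

```python
import json

# Hypothetical two-entry catalog standing in for a real competency-question file.
CATALOG_JSON = """
[
  {"id": "CQ-A", "question": "Does some lineage reach a cell bank?", "check": "has_bank_root"},
  {"id": "CQ-B", "question": "Is nothing derived from itself?", "check": "no_self_loop"}
]
"""

# Toy graph of (subject, predicate, object) triples with placeholder names.
graph = {
    ("LOT-X", "derivedFrom", "POOL-X"),
    ("POOL-X", "derivedFrom", "WCB-X"),
}

def has_bank_root(g):
    return any(o.startswith("WCB") for _, p, o in g if p == "derivedFrom")

def no_self_loop(g):
    return all(s != o for s, p, o in g if p == "derivedFrom")

CHECKS = {"has_bank_root": has_bank_root, "no_self_loop": no_self_loop}

def run_catalog(catalog_json, g):
    """Run every catalog entry against the graph; map question id to pass/fail."""
    return {cq["id"]: bool(CHECKS[cq["check"]](g)) for cq in json.loads(catalog_json)}

def exit_code(results):
    """Zero only if every question passed, mirroring a clean exit."""
    return 0 if all(results.values()) else 1

results = run_catalog(CATALOG_JSON, graph)
print(results, exit_code(results))
```

A script built this way would hand exit_code(results) to sys.exit, so a single failing question fails the whole run.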

QUDT — The Quantities, Units, Dimensions and Types vocabulary, which lets a value carry its unit and its quantity kind as machine-readable identifiers rather than a string suffix, so 98.611 unambiguously means 98.611 percent. (See Identifiers and units.)

RDF (Resource Description Framework) — The graph data model the whole dataset is expressed in: every fact is a subject-predicate-object link, so the dataset is a web of connected nodes. (See The running example.)

Reasoner — Inference software that derives new facts implied by the model (for example, the transitive lineage edges of derivedFrom) rather than only reading the facts asserted by hand. (See Specification and ORSD.)

Relation Ontology (RO) — The biomedical community's effort to define relations (part of, participates in, derives from) as carefully as classes, so that two ontologies do not use one phrase to mean two things. The book's derivedFrom aligns up to RO's derives from. (See The upper spine.)

Restriction — A necessary condition written on a class that a reasoner can enforce — for example, that every cell line is the output of some transfection, or that a working cell bank bears at least one characterization result. (See Axioms and restrictions.)

Running example — The single CHO monoclonal-antibody campaign every chapter models, from the working cell bank WCB-CHO-001 to the filled vials DP-001/DP-002, with a deliberately out-of-spec sibling DP-004 that exercises the impact and release questions. (See The running example.)

SAMOD — The Simplified Agile Methodology for Ontology Development: a small, test-first loop — write the competency question, model the slice, test it against the data, refactor — repeated until the model is whole. The rhythm of every modeling chapter. (See Specification and ORSD.)

SHACL (Shapes Constraint Language) — The rule language that validates a graph against required shapes, asking "is anything required missing or out of spec?" — a closed-world question. The release, finish, and cell-bank gates are SHACL shapes. (See The release gate and SHACL.)

SPARQL — The query language for RDF graphs, the graph equivalent of SQL. A property path like (derivedFrom)+ walks a relationship one or more hops, so one query reconstructs a whole lineage. (See Competency questions as queries.)

Subsumption (subClassOf) — The honest, weaker claim that one class is a kind of another (a Batch is a kind of material artifact), rather than that the two are equivalent; asserting equivalence would over-commit and force a reasoner into false conclusions. (See The upper spine.)

Taxonomy — The tree of classes arranged by subsumption, with each class hanging from one BFO top category. A taxonomy of well-named classes is still only a taxonomy until relations and axioms wire it into a graph. (See Classes and taxonomy.)

Triple — One subject-predicate-object fact (DS-001 — derivedFrom — POLpool-001); the atomic unit of an RDF graph and the thing the harness counts. (See The running example.)

Turtle (.ttl) — The human-readable text syntax for writing an RDF graph; a means "is a," a semicolon continues another statement about the same subject, and a prefix like bp: is shorthand for a long web address. (See The running example.)

UCUM (Unified Code for Units of Measure) — A set of case-sensitive unit codes (%, Cel, g/L) designed to be unambiguous across software, the unit grammar embedded in clinical and laboratory data exchange and emitted on the wire by historians and OPC UA servers. (See Identifiers and units.)

Upper (foundational) ontology — A small, domain-neutral vocabulary of the most general categories — things that persist, things that happen, qualities, roles — that every domain term slots into, so that ontologies built by different teams stay compatible. BFO is the one this book uses. (See The upper spine.)

Validation paradox — The asymmetry that a learned model is checked against held-out data and can be confidently wrong off-distribution, whereas a reasoned, shape-validated graph is checked against axioms, so a category conflation is caught with certainty. That is why the ontology grounds the AI rather than the reverse. (See Classes and taxonomy.)

If a term here still feels fuzzy, follow it back into the chapter where it lives, and it will make far more sense in context.