The Batch & Equipment Data Model: ISA-88/95 in PostgreSQL

📍 Where we are: Part I · The Blueprint, Chapter 4. The stack is running (Chapter 2); now we give it a spine — the relational batch and equipment model (the tables describing every machine and recipe step) that every later number hangs from.

In Chapter 2 you ran make up and watched a CHO (Chinese Hamster Ovary cell) bioreactor simulator start pushing numbers into a historian (the time-series database that stores every sensor reading, not a person). Those numbers are real, but right now they are orphans: a value of 37.05 (degrees Celsius; a CHO culture is held near body temperature, and this one sits right on its 37.0 degC setpoint), a tag of BR101.Temp.PV (BR101 is the equipment, Temp the measurement, and PV the process value, meaning the actual reading as opposed to the setpoint), and a timestamp. Which batch was that? Which piece of equipment? Which step of the recipe was running at that instant? Without answers, you have telemetry, not a record.

This chapter builds the answers. We model the process the way the automation world already agrees to model it — ISA-88 for the recipe and ISA-95 for the equipment — and we do it in plain PostgreSQL, in about a hundred lines of SQL you can read in one sitting.

The simple version

Think of a theatre. ISA-95 is the building: the company owns several theatres (sites), each theatre has rooms (areas), each room has a stage (a unit like the bioreactor BR101). ISA-88 is the script: a play (recipe) has acts (operations) and scenes (phases), performed in order. A batch is one night's performance — a specific cast on a specific stage running a specific script. Our database stores the building, the script, and a log of exactly what happened on which stage, on which night. Get those three right and every sensor reading suddenly knows where it belongs.

What this chapter covers

Why two standards — ISA-88 and ISA-95 — and how they divide the world between them.
The PostgreSQL schema for the equipment hierarchy (enterprise → site → area → unit) and the procedural model (recipe → operation → phase), drawn straight from the companion repo.
The batch itself, its genealogy (the material family tree — seed culture → bioreactor → capture pool → drug substance → drug product, where drug substance is the purified antibody in bulk and drug product is the filled, finished vial), and the normalized-vs-JSONB tradeoff for recipe parameters.
Seeding one real fed-batch CHO + Protein A line — one production process making a single monoclonal antibody from Chinese-hamster-ovary cells (the physical steps are Book 1's subject; here it is the worked example the rest of this book reuses).
How a sensor reading finally meets its batch — and the test that proves it.
Where this model is honest open source and where the GMP record forces hard choices.

Two standards, one spine

Batch manufacturing rests on two interlocking ANSI/ISA standards. These are documents written by the International Society of Automation (ISA) and approved as US national standards by ANSI, and both are also published internationally as IEC (International Electrotechnical Commission) standards. It helps enormously to keep them in separate mental boxes.

ISA-88 (ANSI/ISA-88.00.01, also IEC 61512-1) describes how a batch is made: the procedural side. It gives us a clean nesting: a recipe carries a procedure, which decomposes into unit procedures, then operations, then phases, the smallest meaningful step [1]. Each tier simply groups the finer steps of the tier below it, the way chapters group sections and sections group paragraphs. The phase is the indivisible action at the bottom: "Add Feed A at 50 mL/min for 30 minutes" is a phase. ISA-88 deliberately separates this procedural logic from the physical equipment, so the same recipe can run on different reactors.

ISA-95 (ANSI/ISA-95.00.01, also IEC 62264-1) describes where it is made: the physical and organizational hierarchy that integrates the plant floor with the business, enterprise → site → area → unit [2]. ISA-95 also defines a work-center tier between area and unit, which in a batch plant is the process cell. We leave that tier out. A single-product line has only one process cell, so the tier would carry no extra information, and we collapse ISA-88's deeper procedural nesting below for the same reason. The authoritative source for these object models is IEC 62264-1:2013 itself (see its ISO/IEC catalog entry [3]), and it is what we normalize into tables.

The two standards meet at the unit. ISA-88 says a phase runs on a unit; ISA-95 says what a unit is. The academic literature has long noted that the terminology between the two overlaps and occasionally clashes, and that a reconciled, formalized entity model is needed to bridge them cleanly — which is exactly the modeling decision we are about to make [4].

We do not need the full B2MML object graph. B2MML/BatchML is the royalty-free XML schema implementation of ISA-95 and ISA-88, maintained by MESA International, and it is the right reference when you must exchange a recipe or batch record with another company's system [5]. Its object graph, the complete web of interlinked equipment, recipe, and segment objects and their relationships, is large, and most of it exists for cross-enterprise messaging rather than storage.

For our internal store we borrow its entities (Equipment, Recipe, Process Segment) but flatten its deeply recursive procedural tree into a far simpler parent-FK + seq_no pattern. (A tree is a structure where each item can nest more items beneath it, to any depth.) In the flattened pattern, every row stores two things:

- a foreign key (FK) naming its parent row, meaning a column whose value must match an existing row in another table;
- a seq_no (sequence number) that fixes its order among its siblings.

That single decision is what keeps the schema readable.

The equipment hierarchy in SQL

Everything lives in one PostgreSQL database. Chapter 2's image is timescale/timescaledb (PostgreSQL with the TimescaleDB extension), so the historian hypertable and this relational model share one database and one transaction boundary.

- A hypertable is TimescaleDB's name for a large time-series table that it automatically splits into time-based chunks, but that you still query as one ordinary table.
- The relational model is the classic database shape: tables of rows and columns, here the equipment and recipe tables, related to one another by foreign keys.

One transaction boundary means a single write can update both atomically, all or nothing. If the sensor write succeeds but the batch update fails, neither is kept, so the historian and the batch model can never end up half-written and out of sync. That is exactly the risk you take on once they live in two separate databases that must be kept in step by hand. Here there are no cross-database joins and no second connection to keep in sync.
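A minimal sketch of that all-or-nothing guarantee follows. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a server. The cut-down reading and batch tables, the record_end_of_batch helper, and the timestamp are hypothetical, not the companion repo's code. What carries over to PostgreSQL is the transaction shape: both writes commit together or not at all.

```python
import sqlite3

# In-memory SQLite stands in for the shared PostgreSQL database (an assumption
# made so the sketch runs anywhere; the tables are hypothetical cut-downs).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reading (tag TEXT NOT NULL, ts TEXT NOT NULL, value REAL NOT NULL);
    CREATE TABLE batch (batch_id TEXT PRIMARY KEY, status TEXT NOT NULL, end_ts TEXT);
    INSERT INTO batch VALUES ('BATCH-2026-001', 'in_progress', NULL);
""")

def record_end_of_batch(conn, value, ts, fail=False):
    """Write the last sensor reading and close the batch in ONE transaction.

    `with conn:` commits if the block finishes and rolls back if it raises,
    so either both writes persist or neither does. `fail` simulates a crash
    between the two writes.
    """
    with conn:
        conn.execute("INSERT INTO reading VALUES ('BR101.Temp.PV', ?, ?)", (ts, value))
        if fail:
            raise RuntimeError("simulated failure between the two writes")
        conn.execute(
            "UPDATE batch SET status = 'complete', end_ts = ? "
            "WHERE batch_id = 'BATCH-2026-001'",
            (ts,),
        )

try:
    record_end_of_batch(conn, 37.05, "2026-01-19T08:00:00Z", fail=True)
except RuntimeError:
    pass

# The failed attempt left no orphan reading and no half-closed batch.
print(conn.execute("SELECT COUNT(*) FROM reading").fetchone()[0])  # 0
print(conn.execute("SELECT status FROM batch").fetchone()[0])      # in_progress
```

In PostgreSQL the same shape is a BEGIN and COMMIT pair, or your driver's transaction context. The guarantee holds only because both tables sit behind one connection to one database.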

The schemas are created first, one per concern, in examples/platform/db/00-init.sql (here a PostgreSQL schema is a namespace — a named folder that groups related tables, like s88.unit — not a table design):

-- examples/platform/db/00-init.sql
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE EXTENSION IF NOT EXISTS pgcrypto;     -- digest() for the ALCOA+ hash chain

-- One schema per concern, mirroring the book's chapters.
CREATE SCHEMA IF NOT EXISTS s88;    -- ISA-88/95 batch + equipment model   (Ch 4)
CREATE SCHEMA IF NOT EXISTS ts;     -- time-series historian (hypertable)   (Ch 16)
CREATE SCHEMA IF NOT EXISTS lab;    -- samples, tests, results              (Ch 10/14)
CREATE SCHEMA IF NOT EXISTS events; -- operation events / equipment states  (Ch 9/13/15)
CREATE SCHEMA IF NOT EXISTS audit;  -- system-versioned history + hash chain (Ch 23/24)
CREATE SCHEMA IF NOT EXISTS gov;    -- tag dictionary, jurisdictions, suppliers (Ch 5/25/26)

(The pgcrypto extension and the audit schema's hash chain named in those comments are there for ALCOA+ — the data-integrity properties regulators expect of a GMP record, defined in full in Key terms below; we build that audit log in Chapter 23.) The schema name s88 is a small nod to the batch world's habit of calling these standards "S88" and "S95." This chapter owns s88; later chapters fill in their own. The physical hierarchy itself is four tables, each pointing at its parent, in examples/platform/db/10-isa88-95.sql:

-- examples/platform/db/10-isa88-95.sql  (ISA-95 equipment hierarchy)
CREATE TABLE s88.enterprise (
    enterprise_id text PRIMARY KEY,
    name          text NOT NULL
);

CREATE TABLE s88.site (
    site_id       text PRIMARY KEY,
    enterprise_id text NOT NULL REFERENCES s88.enterprise,
    name          text NOT NULL,
    country       text NOT NULL DEFAULT 'US'
);

CREATE TABLE s88.area (
    area_id text PRIMARY KEY,
    site_id text NOT NULL REFERENCES s88.site,
    name    text NOT NULL
);

CREATE TABLE s88.unit (                       -- the equipment a phase runs on
    unit_id   text PRIMARY KEY,               -- e.g. BR101
    area_id   text NOT NULL REFERENCES s88.area,
    name      text NOT NULL,
    unit_type text NOT NULL,                  -- bioreactor | chromatography | tff | fill_line ...
    vendor    text,
    model     text
);

Notice how unremarkable this is. There is no clever inheritance, no entity-attribute-value table (the EAV anti-pattern, which we come back to below), and no XML column.

The REFERENCES s88.enterprise (and REFERENCES s88.site, REFERENCES s88.area) clauses are foreign keys. Each child row must point at an existing parent row, so a site cannot exist without an enterprise to belong to. That is how the four tables snap together into the hierarchy.

Each level has a stable text primary key: its unique row identifier, here a readable code like BR101 rather than an opaque integer. We use these identifiers because the operators, the SCADA (Supervisory Control And Data Acquisition, the system that runs the plant floor), and the batch record already use them. They are business keys, identifiers that carry real-world meaning. Using them as primary keys lets a human read a row and know what it refers to without a second query to look the code up.

The unit_type column is the hinge to ISA-88. An operation declares the type of unit its phases need (bioreactor), and the batch binds that abstract requirement to a specific unit (BR101).
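To see the foreign keys do their work, here is a sketch that rebuilds the four tables in Python's built-in sqlite3, standing in for PostgreSQL. Note that SQLite only enforces REFERENCES after PRAGMA foreign_keys = ON. The sketch then walks from a unit up to its enterprise with three plain joins. The enterprise and site rows (ACME-BIO, NEWARK) and the unit_path helper are hypothetical placeholders. The area and unit rows mirror the seed shown later in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when asked
conn.executescript("""
    CREATE TABLE enterprise (enterprise_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE site (site_id TEXT PRIMARY KEY,
                       enterprise_id TEXT NOT NULL REFERENCES enterprise,
                       name TEXT NOT NULL, country TEXT NOT NULL DEFAULT 'US');
    CREATE TABLE area (area_id TEXT PRIMARY KEY,
                       site_id TEXT NOT NULL REFERENCES site, name TEXT NOT NULL);
    CREATE TABLE unit (unit_id TEXT PRIMARY KEY,
                       area_id TEXT NOT NULL REFERENCES area,
                       name TEXT NOT NULL, unit_type TEXT NOT NULL,
                       vendor TEXT, model TEXT);
    -- Hypothetical enterprise and site ids; parents are inserted before children.
    INSERT INTO enterprise VALUES ('ACME-BIO', 'Acme Biologics');
    INSERT INTO site (site_id, enterprise_id, name) VALUES ('NEWARK', 'ACME-BIO', 'Newark');
    INSERT INTO area VALUES ('UPSTREAM', 'NEWARK', 'Upstream');
    INSERT INTO unit VALUES ('BR101', 'UPSTREAM', 'Production Bioreactor 101',
                             'bioreactor', 'Sartorius', 'Biostat STR 50');
""")

def unit_path(conn, unit_id):
    """Return (enterprise, site, area, unit) ids for one unit via three joins."""
    return conn.execute("""
        SELECT e.enterprise_id, s.site_id, a.area_id, u.unit_id
        FROM unit u
        JOIN area a       ON a.area_id = u.area_id
        JOIN site s       ON s.site_id = a.site_id
        JOIN enterprise e ON e.enterprise_id = s.enterprise_id
        WHERE u.unit_id = ?
    """, (unit_id,)).fetchone()

print(unit_path(conn, "BR101"))  # ('ACME-BIO', 'NEWARK', 'UPSTREAM', 'BR101')

# The foreign key refuses an orphan: a unit in an area that does not exist.
try:
    conn.execute("INSERT INTO unit VALUES ('BR999', 'NO-SUCH-AREA', 'Ghost', "
                 "'bioreactor', NULL, NULL)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```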

The procedural model: recipe, operation, phase

Now the script. ISA-88's full nesting is recipe → procedure → unit procedure → operation → phase, but for a single-product mAb line that depth is mostly ceremony. We collapse it to recipe → operation → phase. Each child carries a seq_no so that order is data, not table position. Rows in a relational table have no inherent order, so the step sequence has to be stored in an explicit column rather than inferred from how rows happen to sit on disk. The tables live, again, in examples/platform/db/10-isa88-95.sql:

-- examples/platform/db/10-isa88-95.sql  (ISA-88 recipe / procedure)
CREATE TABLE s88.recipe (
    recipe_id   text PRIMARY KEY,
    product_id  text NOT NULL,
    name        text NOT NULL,
    version     int  NOT NULL DEFAULT 1
);

CREATE TABLE s88.operation (                  -- an ordered step of the recipe
    operation_id text PRIMARY KEY,
    recipe_id    text NOT NULL REFERENCES s88.recipe,
    seq_no       int  NOT NULL,
    name         text NOT NULL,               -- Inoculation | Fed-batch | Harvest | ProteinA ...
    unit_type    text NOT NULL
);

CREATE TABLE s88.phase (                       -- the smallest procedural element
    phase_id     text PRIMARY KEY,
    operation_id text NOT NULL REFERENCES s88.operation,
    seq_no       int  NOT NULL,
    name         text NOT NULL
);

This parent-FK-plus-seq_no shape is the whole simplification the chapter promises. A recipe's full procedural graph becomes two ordinary one-to-many joins. Reordering the steps is an UPDATE to seq_no, not a schema migration. And because operation.unit_type matches unit.unit_type, the model already knows that the ProteinA operation belongs on a chromatography unit, not the bioreactor — a constraint we can enforce or validate later.
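Here is a sketch of those two joins, again using Python's sqlite3 as a stand-in for PostgreSQL, with the s88 schema prefix dropped. Rows are inserted out of order on purpose, to show that ORDER BY seq_no, not insertion order, produces the recipe sequence. The recipe name and product id come from elsewhere in this chapter. The recipe_steps helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipe (recipe_id TEXT PRIMARY KEY, product_id TEXT NOT NULL,
                         name TEXT NOT NULL, version INTEGER NOT NULL DEFAULT 1);
    CREATE TABLE operation (operation_id TEXT PRIMARY KEY,
                            recipe_id TEXT NOT NULL REFERENCES recipe,
                            seq_no INTEGER NOT NULL, name TEXT NOT NULL,
                            unit_type TEXT NOT NULL);
    CREATE TABLE phase (phase_id TEXT PRIMARY KEY,
                        operation_id TEXT NOT NULL REFERENCES operation,
                        seq_no INTEGER NOT NULL, name TEXT NOT NULL);
    INSERT INTO recipe VALUES ('CHO-MAB-001', 'MAB-001', 'Fed-batch CHO mAb', 1);
    -- Deliberately out of order: physical row order carries no meaning.
    INSERT INTO operation VALUES
        ('OP4', 'CHO-MAB-001', 4, 'ProteinA',    'chromatography'),
        ('OP1', 'CHO-MAB-001', 1, 'Inoculation', 'bioreactor'),
        ('OP2', 'CHO-MAB-001', 2, 'Fed-batch',   'bioreactor'),
        ('OP3', 'CHO-MAB-001', 3, 'Harvest',     'bioreactor');
    INSERT INTO phase VALUES
        ('PH3', 'OP2', 2, 'Production'), ('PH1', 'OP1', 1, 'Inoculate'),
        ('PH5', 'OP4', 1, 'Capture'),    ('PH2', 'OP2', 1, 'Growth'),
        ('PH4', 'OP3', 1, 'Harvest');
""")

def recipe_steps(conn, recipe_id):
    """Flatten recipe -> operation -> phase with two joins, ordered by seq_no."""
    return conn.execute("""
        SELECT o.name, p.name
        FROM recipe r
        JOIN operation o ON o.recipe_id = r.recipe_id
        JOIN phase p     ON p.operation_id = o.operation_id
        WHERE r.recipe_id = ?
        ORDER BY o.seq_no, p.seq_no
    """, (recipe_id,)).fetchall()

for op, ph in recipe_steps(conn, "CHO-MAB-001"):
    print(f"{op:12} {ph}")

# Reordering is an UPDATE, not a migration (swap shown purely as a mechanism).
conn.execute("UPDATE phase SET seq_no = 3 - seq_no WHERE operation_id = 'OP2'")
```

The final UPDATE swaps Growth and Production only to show the mechanism. Reordering steps touches data alone, with no ALTER TABLE involved.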

Normalized vs JSONB — and where each wins

Recipes carry parameters: setpoints, durations, tolerances. Here the book makes a deliberate, slightly opinionated choice. The handful of parameters that are queried, trended, or compared across batches (temperature setpoint, pH setpoint, dissolved-oxygen setpoint) get their own typed, normalized table. "Normalized" means each fact lives in exactly one place, as its own typed column, rather than buried inside a larger blob. The table is also effective-dated, so Chapter 27 can version a recipe in place without destroying history: a change adds a new dated row rather than overwriting the old one:

-- examples/platform/db/10-isa88-95.sql  (effective-dated recipe parameters)
CREATE TABLE s88.recipe_parameter (
    recipe_id  text NOT NULL REFERENCES s88.recipe,
    name       text NOT NULL,
    value      numeric NOT NULL,
    unit       text NOT NULL,
    valid_from timestamptz NOT NULL DEFAULT now(),
    valid_to   timestamptz NOT NULL DEFAULT 'infinity',
    PRIMARY KEY (recipe_id, name, valid_from)
);

The valid_from/valid_to pair is a classic effective-dated (valid-time) trick. "What was the temperature setpoint at the moment BATCH-2026-004 started?" becomes a query for the row where valid_from <= that start_ts AND that start_ts < valid_to. Old values are never overwritten.

The half-open reading (inclusive start, exclusive end) is deliberate. When one row's valid_to equals the next row's valid_from, a BETWEEN test, which is inclusive at both ends, would match both rows at the changeover instant.

This tracks one of the two time axes a database can record: valid time, the period in the real world during which a value was true. The second axis, transaction time, is when the change was actually entered into the system, and the Chapter 23 ALCOA+ audit log supplies it separately. Carrying just the first makes the table effective-dated; carrying both at once would make it fully bitemporal.

All of this matters because a recipe change is a controlled, audited event in a GMP shop. You are not allowed to silently forget what the old setpoint was.
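Below is a minimal sketch of the as-of lookup, and of versioning by closing one row and opening the next. It makes three assumptions:

- Python's sqlite3 stands in for PostgreSQL.
- Timestamps are ISO-8601 text, which compares correctly as strings when every value uses one format.
- A far-future sentinel replaces 'infinity'.

The parameter name temp_setpoint, the 36.5 degC change, and the set_parameter helper are hypothetical, not Chapter 27's implementation. In PostgreSQL you would keep timestamptz and 'infinity', and the WHERE clause reads the same.

```python
import sqlite3

OPEN_END = "9999-12-31T23:59:59Z"  # hypothetical stand-in for PostgreSQL's 'infinity'

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE recipe_parameter (
        recipe_id  TEXT NOT NULL,
        name       TEXT NOT NULL,
        value      REAL NOT NULL,
        unit       TEXT NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to   TEXT NOT NULL DEFAULT '{OPEN_END}',
        PRIMARY KEY (recipe_id, name, valid_from)
    );
""")

def set_parameter(conn, recipe_id, name, value, unit, effective):
    """Version a parameter without overwriting: close the open row, add a new one."""
    with conn:
        conn.execute("""UPDATE recipe_parameter SET valid_to = ?
                        WHERE recipe_id = ? AND name = ? AND valid_to = ?""",
                     (effective, recipe_id, name, OPEN_END))
        conn.execute("""INSERT INTO recipe_parameter (recipe_id, name, value, unit, valid_from)
                        VALUES (?, ?, ?, ?, ?)""",
                     (recipe_id, name, value, unit, effective))

def parameter_as_of(conn, recipe_id, name, ts):
    """Half-open [valid_from, valid_to): exactly one row matches any covered instant."""
    row = conn.execute("""SELECT value FROM recipe_parameter
                          WHERE recipe_id = ? AND name = ?
                            AND valid_from <= ? AND ? < valid_to""",
                       (recipe_id, name, ts, ts)).fetchone()
    return row[0] if row else None

# Hypothetical history: 37.0 degC, later changed to 36.5 degC.
set_parameter(conn, "CHO-MAB-001", "temp_setpoint", 37.0, "degC", "2026-01-01T00:00:00Z")
set_parameter(conn, "CHO-MAB-001", "temp_setpoint", 36.5, "degC", "2026-03-01T00:00:00Z")

print(parameter_as_of(conn, "CHO-MAB-001", "temp_setpoint", "2026-02-10T08:00:00Z"))  # 37.0
print(parameter_as_of(conn, "CHO-MAB-001", "temp_setpoint", "2026-03-01T00:00:00Z"))  # 36.5
```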

So where does JSONB come in? It serves the long tail of loosely structured, rarely queried attributes: vendor-specific phase options, free-form notes, a nested table of bolus-feed times. For these, a jsonb column is the honest answer. The alternatives are worse:

- fifty sparse columns, where most columns are empty on most rows;
- an entity-attribute-value swamp (the EAV anti-pattern), which stores one fact per row as a generic (name, value) pair and loses all type-checking.

JSON (JavaScript Object Notation) is a common text format for nested name/value data. PostgreSQL's jsonb type stores it parsed, in an efficient binary form, and supports GIN indexing, so even those documents stay queryable when you need them [6]. (An index is a side structure the database keeps so it can find matching rows without scanning every one; GIN is the index kind suited to searching inside JSON documents.) Which of the two homes a given attribute belongs in has a crisp answer, and we promote it to a rule of thumb in its own section below.

A layered diagram showing the ISA-95 physical equipment hierarchy on the left (enterprise, site, area, unit) and the ISA-88 procedural model on the right (recipe, operation, phase), joined in the middle by a batch row that binds a recipe to a unit, with lot-genealogy edges flowing from seed bioreactor through Protein A pool to drug substance.

The two ISA standards meet at the batch: ISA-95 says where (BR101 in Newark Upstream), ISA-88 says how (the Fed-batch CHO mAb recipe), and the batch row binds them to one manufacturing run, with genealogy edges tracing material from seed train to drug substance and drug product.

Original diagram by the authors, created with AI assistance.

The batch — and its family tree

A batch is one manufacturing run: a specific recipe, on a specific unit, with a lot number and a status. Two more tables capture when each phase actually ran and the material genealogy, all in examples/platform/db/10-isa88-95.sql:

-- examples/platform/db/10-isa88-95.sql  (the batch and its genealogy)
CREATE TABLE s88.batch (
    batch_id  text PRIMARY KEY,
    product_id text NOT NULL,
    recipe_id text NOT NULL REFERENCES s88.recipe,
    unit_id   text NOT NULL REFERENCES s88.unit,
    lot       text,
    status    text NOT NULL DEFAULT 'in_progress',  -- in_progress | complete | released | rejected
    start_ts  timestamptz NOT NULL,
    end_ts    timestamptz
);

CREATE TABLE s88.batch_phase (                 -- when each phase actually ran for a batch
    batch_id     text NOT NULL REFERENCES s88.batch,
    phase_id     text NOT NULL REFERENCES s88.phase,
    unit_id      text NOT NULL REFERENCES s88.unit,
    start_ts     timestamptz NOT NULL,
    end_ts       timestamptz,
    PRIMARY KEY (batch_id, phase_id)
);

-- lot genealogy: directed edges child -> parent (material flows seed -> bioreactor -> pool -> DS -> DP)
CREATE TABLE s88.genealogy (
    batch_id    text REFERENCES s88.batch,
    child       text NOT NULL,
    child_type  text NOT NULL,
    parent      text NOT NULL,
    parent_type text NOT NULL,
    PRIMARY KEY (child, parent)
);

The split between phase and batch_phase is the difference between plan and actuals: phase says the recipe has a Growth phase; batch_phase records that for BATCH-2026-001, Growth ran from noon on Jan 5 to midnight on Jan 12. That table of actuals is what later turns a raw timestamp into "this reading happened during Growth."
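Here is one way that lookup could look, as a sketch. Given a unit and an instant, it finds the batch_phase window that contains the instant. Each window is treated as half-open, and a NULL end_ts means the phase is still running. Python's sqlite3 stands in for PostgreSQL. The timestamps and the second batch, BATCH-2026-009, are illustrative, and the context_for helper is hypothetical. The sketch also assumes phase windows on one unit never overlap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phase (phase_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE batch_phase (
        batch_id TEXT NOT NULL, phase_id TEXT NOT NULL REFERENCES phase,
        unit_id  TEXT NOT NULL, start_ts TEXT NOT NULL, end_ts TEXT,
        PRIMARY KEY (batch_id, phase_id));
    INSERT INTO phase VALUES ('PH2', 'Growth'), ('PH3', 'Production');
    -- Illustrative windows on BR101 (UTC, ISO-8601 text).
    INSERT INTO batch_phase VALUES
        ('BATCH-2026-001', 'PH2', 'BR101', '2026-01-05T12:00:00Z', '2026-01-12T00:00:00Z'),
        ('BATCH-2026-001', 'PH3', 'BR101', '2026-01-12T00:00:00Z', '2026-01-19T08:00:00Z'),
        ('BATCH-2026-009', 'PH2', 'BR101', '2026-05-02T12:00:00Z', NULL);  -- still running
""")

def context_for(conn, unit_id, ts):
    """Which batch and phase was running on `unit_id` at instant `ts`?

    Window is [start_ts, end_ts); a NULL end_ts means 'not finished yet'.
    """
    return conn.execute("""
        SELECT bp.batch_id, p.name
        FROM batch_phase bp JOIN phase p ON p.phase_id = bp.phase_id
        WHERE bp.unit_id = ?
          AND bp.start_ts <= ?
          AND (bp.end_ts IS NULL OR ? < bp.end_ts)
    """, (unit_id, ts, ts)).fetchone()

# A BR101.Temp.PV reading taken at this instant now knows where it belongs.
print(context_for(conn, "BR101", "2026-01-08T03:15:00Z"))  # ('BATCH-2026-001', 'Growth')
```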

Anatomy of a batch record: one s88.batch row

The batch table is the whole chapter in miniature, so it rewards reading one row field by field rather than as a CREATE TABLE block. Below is the golden batch's actual seeded row — BATCH-2026-001 from seed_cho_line.sql — dissected column by column. Notice how little of it is data in the ordinary sense: most columns are either a readable business key or a foreign key that binds this run to a thing defined elsewhere. A batch row does not describe the recipe or the equipment; it points at them, and that pointing is what makes the record both compact and auditable.

Identity card dissecting one row of the s88.batch table for BATCH-2026-001, listing each column — batch_id, product_id, recipe_id, lot, status, start_ts, end_ts — with its meaning, a highlighted unit_id row marking the ISA-88 / ISA-95 join, and a panel decoding the foreign-key edges to s88.recipe and s88.unit.

One row of s88.batch: the readable business key BATCH-2026-001, a recipe foreign key that pins the procedure to a version, the unit_id hinge to the equipment, and a status that decides whether the lot may ship.

Original diagram by the authors, created with AI assistance.

Where this row comes from

This s88.batch row is the code that closes the trilogy's loop. The manufacturing run it records is the physical journey Book 1 walks end to end in Bioprocessing Overview: seed train, production bioreactor, Protein A capture, fill. Modeling it as one auditable row that points rather than duplicates answers the open problem Book 2 frames as the data shadow. The genealogy edges below are exactly the lineage Book 2 calls the digital thread. Those chapters pose the problem; this one is the SQL that makes the answer real.

Reading the columns in order:

- batch_id is the PRIMARY KEY. It is a human-readable business key (BATCH-2026-001) for the same reason unit_id is BR101: a row is legible without a join.
- product_id (MAB-001) records which product the run makes.
- recipe_id (CHO-MAB-001) is a foreign key into s88.recipe. That recipe row carries a version column, so the batch is pinned to a recipe version, provided recipe rows are never edited in place. Nothing in the schema shown prevents an UPDATE to version. Together with start_ts and the effective-dated recipe_parameter rows, the pin lets a reviewer recover the procedure in force when the run started, not whatever the recipe looks like today.
- lot (L26001) is the lot number. A lot is the specific produced quantity of product that is released and shipped together, and this is the number that appears on its certificate of analysis, the quality document listing the test results for that lot.
- status walks a small controlled vocabulary (in_progress | complete | released | rejected). It is the column a reviewer reads to decide whether the lot may ship: the golden batch is released; its sibling BATCH-2026-004 is rejected. The vocabulary lives only in a column comment; the schema as shown adds no CHECK constraint to enforce it.
- start_ts and end_ts are timestamptz bookends. end_ts is NULL while a batch is still in_progress, which is also how a query finds the run that is happening right now.
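Here is a small sketch of the last two points, in Python's sqlite3 standing in for PostgreSQL. The running-batch query simply looks for a NULL end_ts. The CHECK constraint and the partial unique index (at most one open batch per unit) are hypothetical hardening, not part of the companion schema. BATCH-2026-009 and all timestamps are invented for illustration. Both constructs exist, with this syntax, in PostgreSQL too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batch (
        batch_id   TEXT PRIMARY KEY,
        product_id TEXT NOT NULL,
        recipe_id  TEXT NOT NULL,
        unit_id    TEXT NOT NULL,
        lot        TEXT,
        -- Hypothetical hardening: enforce the vocabulary the companion
        -- schema only documents in a comment.
        status     TEXT NOT NULL DEFAULT 'in_progress'
                   CHECK (status IN ('in_progress', 'complete', 'released', 'rejected')),
        start_ts   TEXT NOT NULL,
        end_ts     TEXT
    );
    -- Hypothetical: at most one open (end_ts IS NULL) batch per unit.
    CREATE UNIQUE INDEX one_open_batch_per_unit ON batch (unit_id) WHERE end_ts IS NULL;

    -- Illustrative rows; BATCH-2026-009 and all timestamps are invented.
    INSERT INTO batch VALUES
        ('BATCH-2026-001', 'MAB-001', 'CHO-MAB-001', 'BR101', 'L26001',
         'released', '2026-01-05T08:00:00Z', '2026-01-19T08:00:00Z'),
        ('BATCH-2026-009', 'MAB-001', 'CHO-MAB-001', 'BR101', NULL,
         'in_progress', '2026-05-02T09:00:00Z', NULL);
""")

def running_batch(conn, unit_id):
    """The run happening right now on a unit is the row whose end_ts is NULL."""
    row = conn.execute(
        "SELECT batch_id FROM batch WHERE unit_id = ? AND end_ts IS NULL", (unit_id,)
    ).fetchone()
    return row[0] if row else None

print(running_batch(conn, "BR101"))  # BATCH-2026-009

# A status outside the vocabulary is now refused by the database itself.
try:
    conn.execute("UPDATE batch SET status = 'shipped' WHERE batch_id = 'BATCH-2026-001'")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```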

The ISA-88 / ISA-95 hinge: unit_type

The single most load-bearing column in the batch row is unit_id, the green-highlighted field on the card, because it is where the two standards physically meet in one foreign key. unit_id = BR101 resolves through s88.unit to what the equipment is — ISA-95's answer (a bioreactor in the Newark Upstream area). The recipe it runs, reached through recipe_id, supplies how — ISA-88's answer. The same join key therefore carries both halves of the manufacturing fact, which is exactly the reconciliation the standards literature argues a bridging model has to make explicit [4].

The hinge is checked by type, not just by identity. Recall that s88.operation carries a unit_type, and so does s88.unit:

- the ProteinA operation declares unit_type = 'chromatography';
- BR101 has unit_type = 'bioreactor';
- PA01 has unit_type = 'chromatography'.

So the model already knows, independent of any batch, that the Protein A capture step belongs on PA01, not on the bioreactor. Execution binds the abstract requirement ("this phase needs a chromatography unit") to a concrete asset ("PA01"): batch_phase records the unit each phase actually ran on. The matching unit_type strings are the cheap check that keeps that binding honest. As the schema stands, this is a rule you validate with a query, not a constraint the database enforces. It is the distinction ISA-88 draws between a recipe (equipment-independent) and its execution on real equipment, expressed as two text columns that happen to compare equal.
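PostgreSQL does not support CHECK constraints that read other tables, so enforcing this rule inside the database would take a trigger. The lighter option sketched here is a validation query that lists every phase execution bound to a unit of the wrong type. It uses Python's sqlite3 in place of PostgreSQL with cut-down tables. BATCH-2026-099 and its bad binding are a hypothetical row planted to be caught, and binding_violations is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE unit (unit_id TEXT PRIMARY KEY, unit_type TEXT NOT NULL);
    CREATE TABLE operation (operation_id TEXT PRIMARY KEY, name TEXT NOT NULL,
                            unit_type TEXT NOT NULL);
    CREATE TABLE phase (phase_id TEXT PRIMARY KEY,
                        operation_id TEXT NOT NULL REFERENCES operation,
                        name TEXT NOT NULL);
    CREATE TABLE batch_phase (batch_id TEXT NOT NULL, phase_id TEXT NOT NULL,
                              unit_id TEXT NOT NULL, PRIMARY KEY (batch_id, phase_id));
    INSERT INTO unit VALUES ('BR101', 'bioreactor'), ('PA01', 'chromatography');
    INSERT INTO operation VALUES ('OP2', 'Fed-batch', 'bioreactor'),
                                 ('OP4', 'ProteinA',  'chromatography');
    INSERT INTO phase VALUES ('PH2', 'OP2', 'Growth'), ('PH5', 'OP4', 'Capture');
    INSERT INTO batch_phase VALUES
        ('BATCH-2026-001', 'PH2', 'BR101'),   -- correct binding
        ('BATCH-2026-001', 'PH5', 'PA01'),    -- correct binding
        ('BATCH-2026-099', 'PH5', 'BR101');   -- hypothetical bad binding
""")

def binding_violations(conn):
    """Phase executions whose unit type differs from what the operation requires."""
    return conn.execute("""
        SELECT bp.batch_id, p.name, bp.unit_id,
               o.unit_type AS required, u.unit_type AS actual
        FROM batch_phase bp
        JOIN phase p     ON p.phase_id = bp.phase_id
        JOIN operation o ON o.operation_id = p.operation_id
        JOIN unit u      ON u.unit_id = bp.unit_id
        WHERE o.unit_type <> u.unit_type
        ORDER BY bp.batch_id, p.name
    """).fetchall()

print(binding_violations(conn))
# [('BATCH-2026-099', 'Capture', 'BR101', 'chromatography', 'bioreactor')]
```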

The genealogy table is a deceptively small thing with a large regulatory weight. It stores directed child → parent edges, so a drug-product lot can be traced back through drug substance, the Protein A capture pool, the production bioreactor, and the seed train. This is not optional bookkeeping. U.S. cGMP — current Good Manufacturing Practice — requires that a batch production and control record exist for every batch, reproducing the master record [7]; the structure of that record is, quite literally, the skeleton our batch and batch_phase tables encode [8]. And 21 CFR 211.184 — a section of Title 21 of the US Code of Federal Regulations, the FDA's binding rules — requires component and reconciliation records sufficient to trace each finished batch back to the lots of material that went into it, which is precisely what the genealogy edges give you [9]. A self-join or recursive CTE over those edges reconstructs the full lineage on demand — a recursive CTE (Common Table Expression) being a query that follows the edges over and over, hopping from each material to its parent, then that parent's parent, until it reaches the top.

Genealogy: the child-to-parent edges

It is worth dissecting a genealogy row as carefully as we dissected the batch row, because its shape is unusual: it is an edge table, not an entity table. Each row in s88.genealogy is one directed edge — (batch_id, child, child_type, parent, parent_type) — and the primary key is the pair (child, parent), so a given material can have at most one recorded edge to any one direct parent — its more-distant ancestors are reached not by another row but by following these direct edges step by step, which is exactly what the recall walk below does. The lot_genealogy.csv dataset that make load ingests records five edges for the golden batch, and reading them as child → parent traces the run backward from the vial to the cell bank. (Note: make seed creates the s88.genealogy table but leaves it empty; the edges land with make load in Chapter 17, so select * from s88.genealogy returns nothing right after seeding.)

# examples/datasets/lot_genealogy.csv  (the five edges for BATCH-2026-001)
batch_id        child            child_type      parent           parent_type
BATCH-2026-001  SEED-001         seed_train      WCB-CHO-001      wcb
BATCH-2026-001  BATCH-2026-001   bioreactor      SEED-001         seed_train
BATCH-2026-001  PApool-001       capture_pool    BATCH-2026-001   bioreactor
BATCH-2026-001  DS-001           drug_substance  PApool-001       capture_pool
BATCH-2026-001  DP-001           drug_product    DS-001           drug_substance

Material flows forward through six stages. The physical steps behind these names are Book 1's subject.

- Working cell bank: the frozen, qualified stock of cells every campaign is started from.
- Seed train: the staged series of ever-larger cultures that grows enough cells to fill the production bioreactor.
- Bioreactor: the production culture itself.
- Protein A pool: the antibody collected off the Protein A capture column.
- Drug substance: the purified antibody in bulk.
- Drug product: the filled, finished vial.

The table stores the backward pointer, child to parent, because that is the direction a recall walks. Suppose a regulator asks, "lot DP-001 is suspect; what did it come from, and what else shares that lineage?" You start at DP-001 and follow parent edges up to its ancestors, then follow child edges back down from them to find related lots. If they ask, "this cell bank vial was contaminated; which products are downstream?", you follow child edges down. One edge table answers both questions with the same recursive CTE pointed in opposite directions, which is precisely the component-and-reconciliation traceability 21 CFR 211.184 requires [9].

Note one subtlety the data makes concrete. The row BATCH-2026-001 / bioreactor → SEED-001 / seed_train uses the batch id as a material node, because the bioreactor harvest is the batch. The same identifier therefore names both the run and the bulk it produced.

We also collapse the multi-stage seed train (vial thaw → shake flask → wave bag → N-3 → N-2 → N-1) into a single SEED-001 node here. These are the successive scale-up cultures, numbered backward from the production bioreactor N, so N-1 is the last seed step before it. A production system would record each expansion stage as its own genealogy edge.

The downstream side is collapsed the same way. The single PApool-001 → DS-001 edge stands in for the viral-inactivation, polishing-chromatography, viral-filtration, and UF/DF (final concentration and buffer exchange, on TFF01) steps that a real process runs between Protein A capture and drug substance. It is in fact UF/DF, not capture, that yields the drug substance, and a production system would record each of those steps as its own edge too.
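Here is a sketch of the recall walk in both directions. It loads the five edges shown above into Python's sqlite3, a stand-in for PostgreSQL, whose WITH RECURSIVE syntax is the same for this query. The upstream query starts at a child and follows parent edges. The downstream query starts at a parent and follows child edges. The sketch assumes the edges contain no cycles; PostgreSQL 14 and later also offer a CYCLE clause to guard against them. The walk helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genealogy (batch_id TEXT, child TEXT NOT NULL, child_type TEXT NOT NULL,
                            parent TEXT NOT NULL, parent_type TEXT NOT NULL,
                            PRIMARY KEY (child, parent));
    INSERT INTO genealogy VALUES
        ('BATCH-2026-001', 'SEED-001',       'seed_train',     'WCB-CHO-001',    'wcb'),
        ('BATCH-2026-001', 'BATCH-2026-001', 'bioreactor',     'SEED-001',       'seed_train'),
        ('BATCH-2026-001', 'PApool-001',     'capture_pool',   'BATCH-2026-001', 'bioreactor'),
        ('BATCH-2026-001', 'DS-001',         'drug_substance', 'PApool-001',     'capture_pool'),
        ('BATCH-2026-001', 'DP-001',         'drug_product',   'DS-001',         'drug_substance');
""")

# Ancestors: start from the direct parents, then repeatedly hop child -> parent.
UPSTREAM = """
    WITH RECURSIVE ancestors(node, depth) AS (
        SELECT parent, 1 FROM genealogy WHERE child = ?
        UNION
        SELECT g.parent, a.depth + 1
        FROM genealogy g JOIN ancestors a ON g.child = a.node
    )
    SELECT node FROM ancestors ORDER BY depth
"""

# Descendants: start from the direct children, then repeatedly hop parent -> child.
DOWNSTREAM = """
    WITH RECURSIVE descendants(node, depth) AS (
        SELECT child, 1 FROM genealogy WHERE parent = ?
        UNION
        SELECT g.child, d.depth + 1
        FROM genealogy g JOIN descendants d ON g.parent = d.node
    )
    SELECT node FROM descendants ORDER BY depth
"""

def walk(conn, query, start):
    return [row[0] for row in conn.execute(query, (start,))]

print(walk(conn, UPSTREAM, "DP-001"))
# ['DS-001', 'PApool-001', 'BATCH-2026-001', 'SEED-001', 'WCB-CHO-001']
print(walk(conn, DOWNSTREAM, "WCB-CHO-001"))
# ['SEED-001', 'BATCH-2026-001', 'PApool-001', 'DS-001', 'DP-001']
```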

Genealogy diagram for BATCH-2026-001: six material nodes in a row — working cell bank, seed train, the batch itself in the bioreactor, Protein A pool, drug substance, drug product — with a dashed forward material-flow arrow above and five violet child-to-parent edges below, each pointing leftward to mark how a recall walks the chain back.

The genealogy of BATCH-2026-001: material flows forward from working cell bank to drug product, but the table stores the backward child-to-parent edges a recall walks.

Original diagram by the authors, created with AI assistance.

The same edge as a triple: where this model meets the graph

The relational genealogy row is one shape of a fact that Book 4 models a second way, and it is worth seeing both at once because the page's central artifact — a directed child → parent edge — is exactly a graph edge wearing a foreign-key costume. The row DS-001 / drug_substance → PApool-001 / capture_pool is, read as a knowledge-graph triple (a single fact written as subject — predicate — object), simply bp:DS-001 bp:derivedFrom bp:PApool-001 in RDF (the Resource Description Framework — the W3C data model where every fact is one such triple). This book builds that graph for real in Semantics and the Digital Thread, loading these same edges into RDFLib and walking them in SPARQL; Book 4's Conceptualization chapter is where derivedFrom is defined — pinned with rdfs:domain bp:Material and rdfs:range bp:Material, and declared an owl:TransitiveProperty so a reasoner infers the full lineage from only the immediate edges we store.

Three things the relational model leaves implicit, the ontology makes explicit, and naming them sharpens what our genealogy table is really promising:

Transitivity becomes an axiom, not a query. Our recursive CTE computes the multi-hop ancestry each time it runs; declaring derivedFrom transitive lets a reasoner entail DS-001 derivedFrom WCB-CHO-001 once, from the same five stored edges. Same edges, the long-range link earned by an axiom instead of re-walked.
The domain/range pins become a SHACL shape. Our child → parent columns are both plain text; nothing in the SQL stops a careless load from pointing a lot's parent at the operator who ran it. Book 4 closes that hole with SHACL (the Shapes Constraint Language — RDF's validation layer), whose shape says a derivedFrom edge must run Material-to-Material, turning that silent corruption into a flagged violation — the graph-side equivalent of a foreign-key constraint our text columns cannot express.
The recall walk becomes a competency question. "Given a suspect drug-product lot, what does it derive from, to any depth?" is, in Book 4's ORSD, a numbered competency question (CQ — a plain-English question the ontology must answer, used as a pass/fail acceptance test) answered by one SPARQL property-path query — bp:DP-001 bp:derivedFrom+ ?ancestor, where the + means "follow this edge one or more hops." The genealogy table and the graph answer the same regulatory question; they differ only in whether the recursion is hand-written SQL or a path operator.
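To make the first and third points concrete without a third-party RDF library, here is a naive fixpoint in plain Python. It computes what declaring derivedFrom transitive would entail from the five stored edges. It then reads off the answer to the bp:DP-001 bp:derivedFrom+ ?ancestor pattern. This is not how an OWL reasoner or a SPARQL engine works internally; it is only a sketch that reaches the same result. The book's graph work itself uses RDFLib, as described above.

```python
# The five stored derivedFrom edges (child -> parent), as in lot_genealogy.csv.
EDGES = [
    ("SEED-001", "WCB-CHO-001"),
    ("BATCH-2026-001", "SEED-001"),
    ("PApool-001", "BATCH-2026-001"),
    ("DS-001", "PApool-001"),
    ("DP-001", "DS-001"),
]

def transitive_closure(edges):
    """Every (x, y) with x derivedFrom+ y: what a transitivity axiom entails.

    Naive fixpoint: keep joining pairs (a, b), (b, d) into (a, d) until
    no new pair appears.
    """
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

entailed = transitive_closure(EDGES)
print(("DS-001", "WCB-CHO-001") in entailed)                 # True, though never stored
print(len(EDGES), "stored ->", len(entailed), "entailed")    # 5 stored -> 15 entailed

# Same answer set as the SPARQL path  bp:DP-001 bp:derivedFrom+ ?ancestor
print(sorted(y for (x, y) in entailed if x == "DP-001"))
# ['BATCH-2026-001', 'DS-001', 'PApool-001', 'SEED-001', 'WCB-CHO-001']
```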

None of this replaces the relational store — the historian join still lives in SQL, and a text-column edge loaded at material-creation time is the cheap, durable record. The ontology is the second projection of the very same edges, and the bridge is deliberate: the genealogy you author here is the genealogy Book 4 reasons over. Map your (child_type, parent_type) strings onto the shared IOF/BFO classes once, and a regulator's lineage question is answerable in either language.

Normalized columns vs JSONB: the rule of thumb

Earlier we made a deliberate choice about where a recipe parameter lives: a typed, normalized, effective-dated column versus a jsonb document. That choice deserves promoting to a rule, because it is the decision most likely to come back to bite a batch record.

The rule of thumb the book follows: if you will filter, join, or trend on it, normalize it; if you will only ever read it back whole, JSONB it. A temperature setpoint that Chapter 27 versions, and that statistical-process-control charts trend across six campaigns, is a normalized recipe_parameter row. A nested table of bolus-feed times that no query ever reaches inside is an honest jsonb column.

jsonb is not a dumping ground, and its GIN indexing [6] does not change the rule. Burying a critical, queryable setpoint in JSONB "to save a migration" is exactly how a batch record becomes un-reviewable, because the reviewer can no longer see, trend, or constrain the value that matters.

When the genealogy edge is missing: a field-failure note

Genealogy is a first-class table from the schema's first migration, rather than something reconstructed after the fact, because broken traceability is one of the most consequential data-integrity failures in GMP manufacturing. The canonical A-Mab case study is an industry-published worked example of a fictional monoclonal antibody (mAb) process, widely used as a shared reference. It treats end-to-end lot traceability as a baseline expectation for a CHO-derived monoclonal antibody process, not an optional feature [10].

When the edge is absent or wrong, the failure mode is brutal. A contaminated working cell bank vial cannot be tied to the downstream lots that consumed it. A recall that should touch one lineage then implicates every campaign that drew from that working cell bank. Worse, a suspect lot may ship because no one could prove which finished products descended from it.

21 CFR 211.184 requires these records to exist up front [9], and for good reason: reconstructing the lineage from paper after an excursion (a measured value straying outside its acceptable range) is slow and error-prone. Recording the edge when the material is created means the answer is a query, not an investigation. In this book's demo, make load writes the lot_genealogy.csv rows the same way the historian and lab data land; a production system would write each edge at the moment its material is made.

Seeding one real line

Schema without data is an empty theatre. The seed in examples/platform/db/seed/seed_cho_line.sql stands up the equipment hierarchy, recipe, batches, and phase windows for the exact fed-batch CHO + Protein A line the entire book reuses, a process modeled on the A-Mab case study [10]. (The genealogy edges are not in this seed; make load loads them later from lot_genealogy.csv, alongside the historian and lab data. See Chapter 17.) The equipment and recipe load like this:

-- examples/platform/db/seed/seed_cho_line.sql  (equipment + recipe)
INSERT INTO s88.unit VALUES
    ('BR101',         'UPSTREAM',   'Production Bioreactor 101', 'bioreactor',     'Sartorius', 'Biostat STR 50'),
    ('N1SEED',        'UPSTREAM',   'N-1 Seed Bioreactor',       'bioreactor',     'Sartorius', 'Biostat STR 10'),
    ('PA01',          'DOWNSTREAM', 'Protein A Capture Skid',    'chromatography', 'Cytiva',    'AKTA process'),
    ('TFF01',         'DOWNSTREAM', 'UF/DF Skid',                'tff',            'Cytiva',    'AKTA flux'),
    ('FILL-LINE-01',  'FILL',       'Aseptic Fill Line',         'fill_line',      'Bausch+Stroebel', 'KSF')
    ON CONFLICT DO NOTHING;

INSERT INTO s88.operation VALUES
    ('OP1', 'CHO-MAB-001', 1, 'Inoculation', 'bioreactor'),
    ('OP2', 'CHO-MAB-001', 2, 'Fed-batch',   'bioreactor'),
    ('OP3', 'CHO-MAB-001', 3, 'Harvest',     'bioreactor'),
    ('OP4', 'CHO-MAB-001', 4, 'ProteinA',    'chromatography') ON CONFLICT DO NOTHING;

INSERT INTO s88.phase VALUES
    ('PH1', 'OP1', 1, 'Inoculate'),
    ('PH2', 'OP2', 1, 'Growth'),
    ('PH3', 'OP2', 2, 'Production'),
    ('PH4', 'OP3', 1, 'Harvest'),
    ('PH5', 'OP4', 1, 'Capture') ON CONFLICT DO NOTHING;

The fourth value in each s88.unit row, unit_type, names what the equipment does, and the last two are its real-world vendor and model (for example Cytiva / AKTA process). A chromatography unit is a skid (a self-contained equipment module on a frame) that purifies the antibody by passing it through a packed column; the Protein A Capture Skid PA01 is the capture step the whole "Protein A line" is named for. A tff unit performs tangential-flow filtration: here the UF/DF Skid TFF01 concentrates the bulk and exchanges its buffer (ultrafiltration / diafiltration) to turn it into drug substance. What these machines physically do is Book 1's subject; here they are just rows.
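The unit_type value is the hinge between the two standards: an ISA-88 operation names the kind of equipment it needs (the last value of each s88.operation row), and an ISA-95 unit declares what kind of equipment it is. A minimal Python sketch of that match, using tuples copied from the seed above; the tuple layouts are our reading of the positional VALUES lists (the text names only unit_type and the vendor/model pair), and candidate_units is a hypothetical helper. In SQL the same thing is a join on unit_type.

```python
from collections import defaultdict

# (unit_id, area, name, unit_type); vendor and model dropped for brevity
units = [
    ("BR101", "UPSTREAM", "Production Bioreactor 101", "bioreactor"),
    ("N1SEED", "UPSTREAM", "N-1 Seed Bioreactor", "bioreactor"),
    ("PA01", "DOWNSTREAM", "Protein A Capture Skid", "chromatography"),
    ("TFF01", "DOWNSTREAM", "UF/DF Skid", "tff"),
    ("FILL-LINE-01", "FILL", "Aseptic Fill Line", "fill_line"),
]
# (operation_id, recipe_id, seq_no, name, unit_type)
operations = [
    ("OP1", "CHO-MAB-001", 1, "Inoculation", "bioreactor"),
    ("OP2", "CHO-MAB-001", 2, "Fed-batch", "bioreactor"),
    ("OP3", "CHO-MAB-001", 3, "Harvest", "bioreactor"),
    ("OP4", "CHO-MAB-001", 4, "ProteinA", "chromatography"),
]

def candidate_units(operations, units) -> dict[str, list[str]]:
    """Map each operation to the units whose unit_type can execute it."""
    by_type: dict[str, list[str]] = defaultdict(list)
    for unit_id, _area, _name, unit_type in units:
        by_type[unit_type].append(unit_id)
    return {op_id: by_type.get(unit_type, [])
            for op_id, _recipe, _seq, _name, unit_type in operations}

for op_id, unit_ids in candidate_units(operations, units).items():
    print(op_id, unit_ids)
```

Because the recipe names a type rather than BR101 itself, the same CHO-MAB-001 recipe could in principle run on any bioreactor-type unit; the binding to one physical unit happens only when a batch row sets unit_id.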

Every INSERT ends with ON CONFLICT DO NOTHING so make seed is idempotent — running it twice does not duplicate the line, which matters when you re-seed between chapters. (make seed, like make up, make load, and make test used below, is a command you type at the shell; each runs a named target in the repo's Makefile.) The seed then loads six campaign batches, one of which is deliberately the cautionary tale:
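The idempotence claim is easy to check in miniature. This sketch uses Python's sqlite3 module as a stand-in for PostgreSQL; SQLite has accepted the same INSERT ... ON CONFLICT DO NOTHING clause since version 3.24. The one-table schema (unit with unit_id, area_id, unit_type) is a hypothetical cut-down s88.unit, not the repo's DDL.

```python
import sqlite3

# Hypothetical cut-down s88.unit: just enough columns to show the conflict handling.
SCHEMA = ("CREATE TABLE unit (unit_id TEXT PRIMARY KEY, "
          "area_id TEXT NOT NULL, unit_type TEXT NOT NULL)")

SEED = """
INSERT INTO unit (unit_id, area_id, unit_type) VALUES
    ('BR101', 'UPSTREAM',   'bioreactor'),
    ('PA01',  'DOWNSTREAM', 'chromatography')
    ON CONFLICT DO NOTHING;
"""

def unit_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT count(*) FROM unit").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executescript(SEED)
conn.executescript(SEED)   # re-seed: duplicate keys are skipped, neither inserted nor raised
print(unit_count(conn))    # 2
```

Without the ON CONFLICT clause the second run would stop with a primary-key violation instead, so a re-seed between chapters would fail rather than quietly do nothing.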

-- examples/platform/db/seed/seed_cho_line.sql  (excerpt: three of the six campaign batches; -004 is OOS)
INSERT INTO s88.batch (batch_id, product_id, recipe_id, unit_id, lot, status, start_ts, end_ts) VALUES
    ('BATCH-2026-001', 'MAB-001', 'CHO-MAB-001', 'BR101', 'L26001', 'released', '2026-01-05T00:00:00Z', '2026-01-19T00:00:00Z'),
    ('BATCH-2026-004', 'MAB-001', 'CHO-MAB-001', 'BR101', 'L26004', 'rejected', '2026-01-05T00:00:00Z', '2026-01-19T00:00:00Z'),
    ('BATCH-2026-006', 'MAB-001', 'CHO-MAB-001', 'BR101', 'L26006', 'complete', '2026-01-05T00:00:00Z', '2026-01-19T00:00:00Z')
    ON CONFLICT DO NOTHING;

BATCH-2026-001 is the golden batch the book trends everything against; BATCH-2026-004 carries a deliberate out-of-specification (OOS — a measured result outside the allowed range) excursion and a rejected status, so later chapters have a real failure to detect, investigate, and explain. The remaining batches give statistical-process-control charts something to chew on.

Finally the seed records the phase windows for the golden batch — the actuals that let a timestamp find its phase:

-- examples/platform/db/seed/seed_cho_line.sql  (phase windows for the golden batch)
INSERT INTO s88.batch_phase (batch_id, phase_id, unit_id, start_ts, end_ts) VALUES
    ('BATCH-2026-001', 'PH1', 'BR101', '2026-01-05T00:00:00Z', '2026-01-05T12:00:00Z'),
    ('BATCH-2026-001', 'PH2', 'BR101', '2026-01-05T12:00:00Z', '2026-01-12T00:00:00Z'),
    ('BATCH-2026-001', 'PH3', 'BR101', '2026-01-12T00:00:00Z', '2026-01-18T00:00:00Z'),
    ('BATCH-2026-001', 'PH4', 'BR101', '2026-01-18T00:00:00Z', '2026-01-19T00:00:00Z')
    ON CONFLICT DO NOTHING;
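Because the contextualization view assigns each reading to whichever window contains it, one batch's phase windows should tile the batch from start to end with no gaps and no overlaps. Here is a small Python check of that property over the four rows above. window_problems is a hypothetical helper, and the repo may enforce the rule differently. Inside PostgreSQL, for instance, an exclusion constraint over a time range can reject overlaps declaratively. Treat this as a sketch.

```python
from datetime import datetime

def parse(ts: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11, so normalise it
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def window_problems(windows, batch_start: str, batch_end: str) -> list[str]:
    """List gaps, overlaps and empty windows in half-open [start, end) phase windows."""
    problems = []
    cursor = parse(batch_start)
    for start, end, phase_id in sorted((parse(s), parse(e), pid) for pid, s, e in windows):
        if end <= start:
            problems.append(f"{phase_id} has zero or negative length")
        if start > cursor:
            problems.append(f"gap before {phase_id}: {cursor} to {start}")
        elif start < cursor:
            problems.append(f"{phase_id} overlaps the previous window")
        cursor = max(cursor, end)
    if cursor != parse(batch_end):
        problems.append(f"windows end at {cursor}, batch ends at {parse(batch_end)}")
    return problems

# (phase_id, start_ts, end_ts) for BATCH-2026-001 on BR101, copied from the seed
golden = [
    ("PH1", "2026-01-05T00:00:00Z", "2026-01-05T12:00:00Z"),
    ("PH2", "2026-01-05T12:00:00Z", "2026-01-12T00:00:00Z"),
    ("PH3", "2026-01-12T00:00:00Z", "2026-01-18T00:00:00Z"),
    ("PH4", "2026-01-18T00:00:00Z", "2026-01-19T00:00:00Z"),
]
print(window_problems(golden, "2026-01-05T00:00:00Z", "2026-01-19T00:00:00Z"))  # []
```

Each phase ends exactly where the next begins, which is what lets the half-open comparison put every instant in exactly one phase.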

When a sensor reading meets its batch

Here is the payoff. Once the model and seed are in place, the orphan reading from Chapter 2 can be contextualized. The view that does it lives in examples/platform/db/60-views.sql (built in full in Chapter 17, shown here because it is the reason the model exists):

-- examples/platform/db/60-views.sql  (a reading with its full batch + phase context)
CREATE OR REPLACE VIEW s88.v_batch_sensor AS
SELECT r.ts, r.tag, r.value, r.unit, r.quality, r.batch_id,
       b.product_id, b.recipe_id, b.unit_id,
       bp.phase_id, ph.name AS phase_name
FROM ts.sensor_reading r
JOIN s88.batch b              ON b.batch_id = r.batch_id
LEFT JOIN s88.batch_phase bp  ON bp.batch_id = r.batch_id
     AND r.ts >= bp.start_ts AND (bp.end_ts IS NULL OR r.ts < bp.end_ts)
LEFT JOIN s88.phase ph        ON ph.phase_id = bp.phase_id;

The LEFT JOIN on s88.batch_phase is the time-window join. A join stitches rows from two tables together wherever the ON condition holds; here each sensor reading is matched to the batch phase whose window contains its timestamp, which maps every instant to whatever phase was active then. The window is half-open (r.ts >= bp.start_ts AND r.ts < bp.end_ts), so a reading stamped exactly on a boundary belongs only to the phase that starts there, and a NULL end_ts stands for a phase that is still running. Because it is a LEFT join, a reading that falls in no window is kept, with its phase columns simply coming back empty. That orphan BR101.Temp.PV = 37.05 degC now reads as a row that knows it was Product MAB-001, Recipe CHO-MAB-001, on unit BR101, during the Growth phase. A bare tag has become knowledge.
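The time-window join can be tried without the full stack. This sketch rebuilds a cut-down version of the view in SQLite through Python's sqlite3 module. The tables are hypothetical miniatures of ts.sensor_reading, s88.batch_phase and s88.phase carrying only the columns the join needs, the inner join to s88.batch is omitted, and the reading values are made up. Timestamps are stored as ISO-8601 text, which compares correctly as strings only because every value uses the identical format. The output shows the half-open window at work: a reading stamped exactly 2026-01-05T12:00:00Z lands in Growth, not Inoculate, and a reading outside every window keeps its row with an empty phase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phase (phase_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE batch_phase (batch_id TEXT, phase_id TEXT, start_ts TEXT, end_ts TEXT);
CREATE TABLE sensor_reading (ts TEXT, tag TEXT, value REAL, batch_id TEXT);

INSERT INTO phase VALUES
  ('PH1', 'Inoculate'), ('PH2', 'Growth'), ('PH3', 'Production'), ('PH4', 'Harvest');
INSERT INTO batch_phase VALUES
  ('BATCH-2026-001', 'PH1', '2026-01-05T00:00:00Z', '2026-01-05T12:00:00Z'),
  ('BATCH-2026-001', 'PH2', '2026-01-05T12:00:00Z', '2026-01-12T00:00:00Z'),
  ('BATCH-2026-001', 'PH3', '2026-01-12T00:00:00Z', '2026-01-18T00:00:00Z'),
  ('BATCH-2026-001', 'PH4', '2026-01-18T00:00:00Z', '2026-01-19T00:00:00Z');
-- hypothetical readings: mid-window, exactly on a boundary, and after the last window
INSERT INTO sensor_reading VALUES
  ('2026-01-05T06:00:00Z', 'BR101.Temp.PV', 37.02, 'BATCH-2026-001'),
  ('2026-01-05T12:00:00Z', 'BR101.Temp.PV', 37.05, 'BATCH-2026-001'),
  ('2026-01-19T06:00:00Z', 'BR101.Temp.PV', 36.90, 'BATCH-2026-001');
""")

rows = conn.execute("""
SELECT r.ts, r.value, ph.name AS phase_name
FROM sensor_reading r
LEFT JOIN batch_phase bp ON bp.batch_id = r.batch_id
     AND r.ts >= bp.start_ts AND (bp.end_ts IS NULL OR r.ts < bp.end_ts)
LEFT JOIN phase ph ON ph.phase_id = bp.phase_id
ORDER BY r.ts
""").fetchall()

for row in rows:
    print(row)
# ('2026-01-05T06:00:00Z', 37.02, 'Inoculate')
# ('2026-01-05T12:00:00Z', 37.05, 'Growth')
# ('2026-01-19T06:00:00Z', 36.9, None)
```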

We do not just assert that this works; the companion repo proves it. In examples/tests/test_db.py, a pytest run (pytest is the standard Python testing tool; each def test_… below is one automated check) executes against the live stack: the set of services make up starts in Docker (the PostgreSQL/TimescaleDB database, the MQTT broker, Grafana, and the bioreactor simulator), running for real rather than mocked. It checks that the schema and batches are seeded and that readings in the golden batch resolve to all four of its named phases:

# examples/tests/test_db.py
# conn is a pytest fixture holding a live database connection, and _scalar is a
# helper that runs a query and returns its single value; both are defined
# outside this excerpt.
def test_schema_and_hypertable(conn):
    # sensor_reading must be a TimescaleDB hypertable, and all six batches seeded
    assert _scalar(conn, "select count(*) from timescaledb_information.hypertables "
                         "where hypertable_name='sensor_reading'") == 1
    assert _scalar(conn, "select count(*) from s88.batch") >= 6

def test_contextualization_joins_phase(conn):
    # readings in the golden batch should land in all four seeded BR101 phases
    n_phases = _scalar(conn, "select count(distinct phase_name) from s88.v_batch_sensor "
                             "where batch_id='BATCH-2026-001' and phase_name is not null")
    assert n_phases >= 4   # Inoculate, Growth, Production, Harvest

Run make up && make seed && make load, then make test, and these pass on a laptop and again on a clean CI runner — a fresh machine in a continuous-integration service that rebuilds the whole stack from scratch on every change — which is the book's standing promise that the model is real, not a diagram. The same make test also exercises the audit chain that Chapter 23 layers on top of this schema, but that is a later story.

Why the batch_id column is also a machine-learning safeguard

The batch_id that v_batch_sensor stamps onto every reading does a second job, invisible here but decisive later: it is the grouping key any honest model on this data must split by. Book 5's chapter on models and validation makes the point that a bioprocess dataset is not a bag of independent rows. Readings within one batch are heavily correlated, so a naive random train/test split leaks the answer: it scatters rows from the same run across both sides and lets a model "predict" values it has effectively already seen. The fix is grouped cross-validation (validation that holds out whole batches at a time, splitting by batch_id rather than by row) and its strict form, leave-one-batch-out, in which the held-out batch is one the model never touched, so the score approximates what you would see on the next campaign. A data model that cannot name which batch a reading belongs to cannot split this way, which is precisely why getting batch_id onto the row, in this chapter, is a prerequisite for any trustworthy model two books downstream.
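The split itself is only a few lines once batch_id is on every row. A minimal pure-Python sketch of leave-one-batch-out; the row shape (a dict with a batch_id key) and the rows themselves are hypothetical. In practice a library implementation such as scikit-learn's LeaveOneGroupOut, with batch_id passed as the groups argument, does the same job.

```python
from collections import defaultdict
from typing import Iterator

def leave_one_batch_out(rows: list[dict]) -> Iterator[tuple[str, list[dict], list[dict]]]:
    """Yield (held_out_batch, train_rows, test_rows), holding out one whole batch per fold."""
    by_batch: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        by_batch[row["batch_id"]].append(row)
    for held_out in sorted(by_batch):
        test = by_batch[held_out]
        train = [r for batch, batch_rows in by_batch.items() if batch != held_out
                 for r in batch_rows]
        yield held_out, train, test

# Hypothetical contextualized rows: three correlated readings per batch
rows = [{"batch_id": b, "ts": t}
        for b in ("BATCH-2026-001", "BATCH-2026-004", "BATCH-2026-006")
        for t in range(3)]

for held_out, train, test in leave_one_batch_out(rows):
    # no run may straddle train and test
    assert all(r["batch_id"] != held_out for r in train)
    print(held_out, len(train), len(test))
```

A random row-level split of the same nine rows would almost certainly put readings from one batch on both sides; grouping by batch_id rules that out by construction.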

The same row carries the seeds of three more ML disciplines, each of which fails without the model built here:

Applicability domain. A soft sensor is only trustworthy on inputs that look like its training data; the golden batch's phase windows are what let Book 5 fence that domain per phase, because a Raman model calibrated on Growth has no licence to extrapolate into Harvest.
Process drift versus model drift. When predictions start to wander, the MLOps chapter must tell apart the process genuinely shifting (a new raw-material lot, a scale move) from the model going stale — and both are diagnosed against the batch-and-phase context this table supplies. Without the contextualized record there is no baseline to measure a drift against.
Model lineage. The genealogy and effective-dated recipe_parameter patterns above are the data-side rehearsal for model lineage — a deployed model must record which exact recipe version and which batches it was trained on, the same auditable "points instead of duplicates" discipline, so that a retrained model is a versioned, traceable artifact rather than a silent in-place edit. The BATCH-2026-004 OOS run is deliberately kept in the data not only as an SPC counterexample but as a labelled failure a future classifier can learn from — a governed dataset is what makes a traceable model possible.
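The per-phase fence from the first item can start as simply as a range check. This sketch uses the crudest applicability-domain test, a per-phase bounding box over the training inputs; real soft sensors often use distance- or density-based measures instead. The feature names, values, and both function names are hypothetical.

```python
def fit_domain(training_rows) -> dict[str, dict[str, tuple[float, float]]]:
    """Per phase, record the (min, max) of every input feature seen in training."""
    domain: dict[str, dict[str, tuple[float, float]]] = {}
    for phase, features in training_rows:
        box = domain.setdefault(phase, {})
        for name, value in features.items():
            low, high = box.get(name, (value, value))
            box[name] = (min(low, value), max(high, value))
    return domain

def in_domain(domain, phase: str, features: dict[str, float]) -> bool:
    """True only if the phase was seen in training and every feature lies inside its range."""
    box = domain.get(phase)
    if box is None:
        return False  # never trained on this phase: no licence to extrapolate
    return all(name in box and box[name][0] <= value <= box[name][1]
               for name, value in features.items())

# Hypothetical training inputs (phase, features) for a soft sensor
training = [
    ("Growth", {"temp_degC": 36.9, "glucose_g_per_L": 5.8}),
    ("Growth", {"temp_degC": 37.1, "glucose_g_per_L": 4.2}),
]
domain = fit_domain(training)
print(in_domain(domain, "Growth", {"temp_degC": 37.0, "glucose_g_per_L": 5.0}))   # True
print(in_domain(domain, "Harvest", {"temp_degC": 37.0, "glucose_g_per_L": 5.0}))  # False
```

The phase key comes straight from the contextualized row, so the check is only possible because every reading already knows which phase it belongs to.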

In short, this chapter does not do machine learning, but it builds the one thing ML in a GMP plant cannot be trusted without: a record where every number knows its batch, its phase, and its version.

Why it matters

A batch record is the legal artifact a regulator reviews to decide whether a lot of medicine may be released. Everything else in this book — historian, dashboards, the knowledge graph (Book 4, Ontologies for Biopharmaceutical Manufacturing, lifts this same edge-and-table model into a queryable graph), analytics — is, in a sense, decoration hung on the spine you built in this chapter. If the spine is wrong, every downstream number inherits the error.

Modeling on ISA-88 and ISA-95 buys two concrete things. First, portability of meaning: an engineer who knows the standards can read your operation and unit tables without a tour, and a future MES (Manufacturing Execution System, the software that drives and records each recipe step on the floor) or commercial historian can map to them. Second, traceability by construction: genealogy and phase actuals are not bolted on after an audit finding — they are first-class tables, present from the schema's first migration (10-isa88-95.sql, before any data lands), which is exactly the posture cGMP expects.

In the real world

In a real plant, this relational model rarely lives alone. The batch's procedural execution is usually owned by a commercial Manufacturing Execution System (MES, introduced above) or electronic batch record (EBR, the digital equivalent of the paper batch record). Körber's Werum PAS-X, Tulip, and DeltaV/Syncade configurations are examples of such commercial MES/EBR systems; they are validated, vendor-supported, and decidedly not open source. The honest position this book takes is that PostgreSQL is an excellent system of context and analysis, the place you join time-series to batch to phase and ask questions across campaigns, but it is not, out of the box, a Part 11-compliant electronic batch record. (Part 11 is 21 CFR Part 11, the FDA rule that sets the requirements, such as audit trails, signatures, and access control, that electronic records must meet to be trusted in place of paper.) No open-source database is compliant by itself. Compliance is a property of a validated system plus procedures, not something you can CREATE TABLE your way into; GAMP 5 (the pharma industry's Good Automated Manufacturing Practice guide, which has a dedicated open-source software appendix) calls this out explicitly. We build the data-integrity scaffolding (system-versioned history, an audit log, a tamper-evident hash chain) in Chapters 23 and 24, and we are candid there about what a superuser can still bypass.

The standards themselves are genuinely industry baseline, not aspiration. ISA-88 and ISA-95 underpin essentially every MES and batch-historian integration in pharma, and B2MML is the lingua franca when two companies' systems must exchange a recipe or batch record.

One newer standard is worth naming for the single-use, modular skids this trilogy keeps gesturing at: the Module Type Package (MTP), VDI/VDE/NAMUR 2658, a standard from the German engineering and process-automation bodies. An MTP describes a skid's services, operator interface, and communication so that a higher-level system can orchestrate the module plug-and-produce: connected and put to work with little custom integration, the way a USB device just works when plugged in. We model equipment statically here (a unit row with a vendor and a model); a modular facility would additionally describe each skid with an MTP so the orchestration layer can discover what the unit offers without bespoke integration. Chapter 7 set MTP in its wider context: a modular-automation standard whose runtime rides on an OPC UA information model, sitting beside the OPC UA companion specifications proper (PA-DIM, PackML, LADS).

The intensified, continuous variant of our line runs nonstop instead of one batch at a time: perfusion upstream (the bioreactor is continuously fed fresh medium while spent medium and product are continuously drawn off) feeds multi-column continuous capture. There the equipment hierarchy barely changes (you add a perfusion unit and more chromatography columns), but the procedural model strains, because "phases" are harder to delimit when the process never stops. That tension is a recurring theme later in the book; the parent-FK + seq_no model absorbs it more gracefully than a rigidly nested one would, which is one more reason we chose the flat shape.

Key terms

ISA-88 (S88, IEC 61512) — the batch procedural standard: recipe → procedure → unit procedure → operation → phase. Separates how a batch is made from the equipment it runs on.
ISA-95 (S95, IEC 62264) — the physical/organizational standard: enterprise → site → area → unit (with a work-center / process-cell tier collapsed here for a single-product line). Integrates plant floor with the business.
B2MML / BatchML — MESA International's royalty-free XML schema implementation of ISA-95/ISA-88, used to exchange equipment, recipe, and batch records between systems.
Unit — the piece of equipment a phase runs on (e.g. BR101); the join point where ISA-88 and ISA-95 meet.
Phase — the smallest procedural step (e.g. Growth, Capture).
Batch — one manufacturing run: a recipe on a unit, with a lot number, status, and start/end times.
Batch record (s88.batch row) — one manufacturing run as a single row: a business-key batch_id, a product_id, foreign keys recipe_id (version-pinned) and unit_id, plus lot, status, and start/end timestamps. Mostly foreign keys: it binds, it does not duplicate.
Genealogy / lot traceability — directed child → parent edges linking a drug-product lot back through drug substance, capture pool, bioreactor, and seed train.
Edge table — a table whose rows are relationships rather than entities; s88.genealogy stores one (child, parent) edge per row, and a recursive CTE walks it up (recall) or down (impact) on demand.
Effective-dated (valid-time) parameter — a value with valid_from/valid_to so a recipe can be versioned without overwriting history; the separate transaction-time axis is supplied by the Chapter 23 ALCOA+ audit log.
JSONB — PostgreSQL's binary JSON type with GIN indexing, used for the loosely structured long tail of attributes, never for critical, queryable setpoints.
Contextualization — joining a raw sensor reading to its batch, equipment, and active phase; the view s88.v_batch_sensor does this.
RDF triple / derivedFrom — the same genealogy edge written as a graph fact, subject — predicate — object (bp:DS-001 bp:derivedFrom bp:PApool-001); Book 4 declares derivedFrom an owl:TransitiveProperty and validates it with a SHACL Material-to-Material shape, the graph-side twin of this chapter's foreign key.
Grouped / leave-one-batch-out cross-validation — model validation that holds out whole batches (split by batch_id, not by row) so correlated readings from one run never straddle train and test; the leakage-safe split the batch_id on every contextualized row makes possible.
Golden batch — BATCH-2026-001, the reference run the book trends everything against; BATCH-2026-004 is the deliberate OOS (out-of-specification) counterexample.
cGMP — current Good Manufacturing Practice, the regulatory framework under which every batch must have a batch record, and every batch must be traceable through it.
ALCOA+ — the data-integrity principles regulators expect of a GMP record: data must be Attributable, Legible, Contemporaneous, Original, and Accurate (the original ALCOA), plus Complete, Consistent, Enduring, and Available (the "+"). The Chapter 23 audit log and the pgcrypto hash chain are the technical controls this book builds to support them.

Where this leads

The model now knows about BR101 and the recipe — but Chapter 2's reading still arrived tagged as the bare string BR101.Temp.PV, and nothing yet guarantees that string is spelled the same way in the historian, the dashboard, and the MQTT topic. In the next chapter, Naming Things: Tags, Hierarchies, and the Unified Namespace, we build the controlled tag dictionary and the ISA-95-aligned Unified Namespace that turn ad-hoc tag strings into a governed, machine-checkable address space — and we write the linter that fails the build when someone invents a name that does not fit.
