The Batch & Equipment Data Model: ISA-88/95 in PostgreSQL
π Where we are: Part I, the blueprint. The stack is running (Chapter 2); now we give it a spine β the relational batch and equipment model every later number hangs from.
In Chapter 2 you ran make up and watched a CHO bioreactor simulator start pushing numbers into a historian. Those numbers are real, but right now they are orphans: a value of 37.05 with a tag of BR101.Temp.PV and a timestamp. Which batch was that? Which piece of equipment? Which step of the recipe was running at that instant? Without answers, you have telemetry, not a record.
This chapter builds the answers. We model the process the way the automation world already agrees to model it β ISA-88 for the recipe and ISA-95 for the equipment β and we do it in plain PostgreSQL, in about a hundred lines of SQL you can read in one sitting.
Think of a theatre. ISA-95 is the building: the company owns several theatres (sites), each theatre has rooms (areas), each room has a stage (a unit like the bioreactor BR101). ISA-88 is the script: a play (recipe) has acts (operations) and scenes (phases), performed in order. A batch is one night's performance β a specific cast on a specific stage running a specific script. Our database stores the building, the script, and a log of exactly what happened on which stage, on which night. Get those three right and every sensor reading suddenly knows where it belongs.
What this chapter coversβ
- Why two standards β ISA-88 and ISA-95 β and how they divide the world between them.
- The PostgreSQL schema for the equipment hierarchy (enterprise β site β area β unit) and the procedural model (recipe β operation β phase), drawn straight from the companion repo.
- The batch itself, its genealogy (seed β bioreactor β capture pool β drug substance), and the normalized-vs-JSONB tradeoff for recipe parameters.
- Seeding one real fed-batch CHO + Protein A line the rest of the book reuses.
- How a sensor reading finally meets its batch β and the test that proves it.
- Where this model is honest open source and where the GMP record forces hard choices.
Two standards, one spineβ
Batch manufacturing has two interlocking ANSI/ISA standards, both also published as IEC standards, and it helps enormously to keep them in separate mental boxes.
ISA-88 (ANSI/ISA-88.00.01, also IEC 61512-1) describes how a batch is made β the procedural side. It gives us a clean nesting: a recipe decomposes into a procedure, which decomposes into unit procedures, then operations, then phases, the smallest meaningful step [1]. "Add Feed A at 50 mL/min for 30 minutes" is a phase. ISA-88 deliberately separates this procedural logic from the physical equipment, so the same recipe can run on different reactors.
ISA-95 (ANSI/ISA-95.00.01, also IEC 62264-1) describes where it is made β the physical and organizational hierarchy that integrates the plant floor with the business: enterprise β site β area β work center / unit [2]. The ISO/IEC catalog entry for IEC 62264-1:2013 is the authoritative source for these object models, and it is what we normalize into tables [3].
The two standards meet at the unit. ISA-88 says a phase runs on a unit; ISA-95 says what a unit is. The academic literature has long noted that the terminology between the two overlaps and occasionally clashes, and that a reconciled, formalized entity model is needed to bridge them cleanly β which is exactly the modeling decision we are about to make [4].
We do not need the full B2MML object graph β it is large, and most of it is for cross-enterprise messaging, not storage. B2MML/BatchML is the royalty-free XML schema implementation of ISA-95 and ISA-88, maintained by MESA International, and it is the right reference when you must exchange a recipe or batch record with another company's system [5]. For our internal store we borrow its entities β Equipment, Recipe, Process Segment β but flatten its deeply recursive procedural tree into a far simpler parent-FK + seq_no pattern. That single decision is what keeps the schema readable.
The equipment hierarchy in SQLβ
Everything lives in one PostgreSQL database. Because Chapter 2's image is timescale/timescaledb (PostgreSQL with the TimescaleDB extension), the historian hypertable and this relational model share one database and one transaction boundary β no cross-database joins, no second connection.
The schemas are created first, one per concern, in examples/platform/db/00-init.sql:
-- examples/platform/db/00-init.sql
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE EXTENSION IF NOT EXISTS pgcrypto; -- digest() for the ALCOA+ hash chain
-- One schema per concern, mirroring the book's chapters.
CREATE SCHEMA IF NOT EXISTS s88; -- ISA-88/95 batch + equipment model (Ch 3)
CREATE SCHEMA IF NOT EXISTS ts; -- time-series historian (hypertable) (Ch 13)
CREATE SCHEMA IF NOT EXISTS lab; -- samples, tests, results (Ch 8/11)
CREATE SCHEMA IF NOT EXISTS events; -- operation events / equipment states (Ch 7/10/12)
CREATE SCHEMA IF NOT EXISTS audit; -- system-versioned history + hash chain(Ch 20/21)
CREATE SCHEMA IF NOT EXISTS gov; -- tag dictionary, jurisdictions, suppliers (Ch 4/22/23)
The schema name s88 is a small nod to the batch world's habit of calling these standards "S88" and "S95." This chapter owns s88; later chapters fill in their own. The physical hierarchy itself is four tables, each pointing at its parent, in examples/platform/db/10-isa88-95.sql:
-- examples/platform/db/10-isa88-95.sql (ISA-95 equipment hierarchy)
CREATE TABLE s88.enterprise (
enterprise_id text PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE s88.site (
site_id text PRIMARY KEY,
enterprise_id text NOT NULL REFERENCES s88.enterprise,
name text NOT NULL,
country text NOT NULL DEFAULT 'US'
);
CREATE TABLE s88.area (
area_id text PRIMARY KEY,
site_id text NOT NULL REFERENCES s88.site,
name text NOT NULL
);
CREATE TABLE s88.unit ( -- the equipment a phase runs on
unit_id text PRIMARY KEY, -- e.g. BR101
area_id text NOT NULL REFERENCES s88.area,
name text NOT NULL,
unit_type text NOT NULL, -- bioreactor | chromatography | tff | fill_line ...
vendor text,
model text
);
Notice how unremarkable this is. There is no clever inheritance, no entity-attribute-value table, no XML column. Each level has a stable text primary key (BR101, not an opaque integer) because these identifiers are the same ones the operators, the SCADA, and the batch record already use β they are business keys, and using them as primary keys means a human can read a row without a lookup. The unit_type column is the hinge to ISA-88: a phase declares the type of unit it needs (bioreactor), and the batch binds it to a specific unit (BR101).
The procedural model: recipe, operation, phaseβ
Now the script. ISA-88's full nesting is recipe β procedure β unit procedure β operation β phase, but for a single-product mAb line that depth is mostly ceremony. We collapse it to recipe β operation β phase, each child carrying a seq_no so order is data, not table position, again in examples/platform/db/10-isa88-95.sql:
-- examples/platform/db/10-isa88-95.sql (ISA-88 recipe / procedure)
CREATE TABLE s88.recipe (
recipe_id text PRIMARY KEY,
product_id text NOT NULL,
name text NOT NULL,
version int NOT NULL DEFAULT 1
);
CREATE TABLE s88.operation ( -- an ordered step of the recipe
operation_id text PRIMARY KEY,
recipe_id text NOT NULL REFERENCES s88.recipe,
seq_no int NOT NULL,
name text NOT NULL, -- Inoculation | Fed-batch | Harvest | ProteinA ...
unit_type text NOT NULL
);
CREATE TABLE s88.phase ( -- the smallest procedural element
phase_id text PRIMARY KEY,
operation_id text NOT NULL REFERENCES s88.operation,
seq_no int NOT NULL,
name text NOT NULL
);
This parent-FK-plus-seq_no shape is the whole simplification the chapter promises. A recipe's full procedural graph becomes two ordinary one-to-many joins. Reordering the steps is an UPDATE to seq_no, not a schema migration. And because operation.unit_type matches unit.unit_type, the model already knows that the ProteinA operation belongs on a chromatography unit, not the bioreactor β a constraint we can enforce or validate later.
Normalized vs JSONB β and where each winsβ
Recipes carry parameters: setpoints, durations, tolerances. Here the book makes a deliberate, slightly opinionated choice. The handful of parameters that are queried, trended, or compared across batches β temperature setpoint, pH setpoint, dissolved-oxygen setpoint β get their own typed, normalized table, and that table is effective-dated so Chapter 24 can version a recipe in place without destroying history:
-- examples/platform/db/10-isa88-95.sql (effective-dated recipe parameters)
CREATE TABLE s88.recipe_parameter (
recipe_id text NOT NULL REFERENCES s88.recipe,
name text NOT NULL,
value numeric NOT NULL,
unit text NOT NULL,
valid_from timestamptz NOT NULL DEFAULT now(),
valid_to timestamptz NOT NULL DEFAULT 'infinity',
PRIMARY KEY (recipe_id, name, valid_from)
);
The valid_from/valid_to pair is a classic bitemporal trick: "what was the temperature setpoint as of the day BATCH-2026-004 started?" is a WHERE 'date' BETWEEN valid_from AND valid_to query, and old values are never overwritten. That matters because a recipe change is a controlled, audited event in a GMP shop β you are not allowed to silently forget what the old setpoint was.
So where does JSONB come in? For the long tail of loosely structured, rarely queried attributes β vendor-specific phase options, free-form notes, a nested table of bolus-feed times β a jsonb column is the honest answer instead of forcing fifty sparse columns or an entity-attribute-value swamp. PostgreSQL's jsonb type stores parsed binary JSON and supports GIN indexing (via jsonb_ops or jsonb_path_ops) so even those documents stay queryable when you need them [6]. The rule of thumb the book follows: if you will filter, join, or trend on it, normalize it; if you will only read it back whole, JSONB it. Putting a critical setpoint in JSONB to "save a migration" is exactly the kind of shortcut that makes a batch record un-reviewable.
The two ISA standards meet at the batch: ISA-95 says where (BR101 in Newark Upstream), ISA-88 says how (the Fed-batch CHO mAb recipe), and the batch row binds them to one manufacturing run, with genealogy edges tracing material from seed train to drug substance.
Original diagram by the authors, created with AI assistance.
The batch β and its family treeβ
A batch is one manufacturing run: a specific recipe, on a specific unit, with a lot number and a status. Two more tables capture when each phase actually ran and the material genealogy, all in examples/platform/db/10-isa88-95.sql:
-- examples/platform/db/10-isa88-95.sql (the batch and its genealogy)
CREATE TABLE s88.batch (
batch_id text PRIMARY KEY,
product_id text NOT NULL,
recipe_id text NOT NULL REFERENCES s88.recipe,
unit_id text NOT NULL REFERENCES s88.unit,
lot text,
status text NOT NULL DEFAULT 'in_progress', -- in_progress | complete | released | rejected
start_ts timestamptz NOT NULL,
end_ts timestamptz
);
CREATE TABLE s88.batch_phase ( -- when each phase actually ran for a batch
batch_id text NOT NULL REFERENCES s88.batch,
phase_id text NOT NULL REFERENCES s88.phase,
unit_id text NOT NULL REFERENCES s88.unit,
start_ts timestamptz NOT NULL,
end_ts timestamptz,
PRIMARY KEY (batch_id, phase_id)
);
-- lot genealogy: directed edges child -> parent (seed -> bioreactor -> pool -> DS -> DP)
CREATE TABLE s88.genealogy (
batch_id text REFERENCES s88.batch,
child text NOT NULL,
child_type text NOT NULL,
parent text NOT NULL,
parent_type text NOT NULL,
PRIMARY KEY (child, parent)
);
The split between phase and batch_phase is the difference between plan and actuals: phase says the recipe has a Growth phase; batch_phase records that for BATCH-2026-001, Growth ran from noon on Jan 5 to midnight on Jan 12. That table of actuals is what later turns a raw timestamp into "this reading happened during Growth."
The genealogy table is a deceptively small thing with a large regulatory weight. It stores directed child β parent edges, so a drug-product lot can be traced back through drug substance, the Protein A capture pool, the production bioreactor, and the seed train. This is not optional bookkeeping. U.S. cGMP β current Good Manufacturing Practice β requires that a batch production and control record exist for every batch, reproducing the master record [7]; the structure of that record is, quite literally, the skeleton our batch and batch_phase tables encode [8]. And 21 CFR 211.184 requires component and reconciliation records sufficient to trace each finished batch back to the lots of material that went into it β which is precisely what the genealogy edges give you [9]. A self-join or recursive CTE over those edges reconstructs the full lineage on demand.
Seeding one real lineβ
Schema without data is an empty theatre. The seed in examples/platform/db/seed/seed_cho_line.sql stands up the equipment hierarchy, recipe, batches, and phase windows for the exact fed-batch CHO + Protein A line the entire book reuses β a process modeled on the canonical A-Mab case study, the industry's shared reference for a CHO-derived monoclonal antibody made with a Protein A capture step [10]. (The genealogy edges are not in this seed; they are loaded later by make load, alongside the historian and lab data, from lot_genealogy.csv β see Chapter 14.) The equipment, recipe, and batches load like this:
-- examples/platform/db/seed/seed_cho_line.sql (equipment + recipe)
INSERT INTO s88.unit VALUES
('BR101', 'UPSTREAM', 'Production Bioreactor 101', 'bioreactor', 'Sartorius', 'Biostat STR 50'),
('N1SEED', 'UPSTREAM', 'N-1 Seed Bioreactor', 'bioreactor', 'Sartorius', 'Biostat STR 10'),
('PA01', 'DOWNSTREAM', 'Protein A Capture Skid', 'chromatography', 'Cytiva', 'AKTA process'),
('TFF01', 'DOWNSTREAM', 'UF/DF Skid', 'tff', 'Cytiva', 'AKTA flux'),
('FILL-LINE-01', 'FILL', 'Aseptic Fill Line', 'fill_line', 'Bausch+Stroebel', 'KSF')
ON CONFLICT DO NOTHING;
INSERT INTO s88.operation VALUES
('OP1', 'CHO-MAB-001', 1, 'Inoculation', 'bioreactor'),
('OP2', 'CHO-MAB-001', 2, 'Fed-batch', 'bioreactor'),
('OP3', 'CHO-MAB-001', 3, 'Harvest', 'bioreactor'),
('OP4', 'CHO-MAB-001', 4, 'ProteinA', 'chromatography') ON CONFLICT DO NOTHING;
INSERT INTO s88.phase VALUES
('PH1', 'OP1', 1, 'Inoculate'),
('PH2', 'OP2', 1, 'Growth'),
('PH3', 'OP2', 2, 'Production'),
('PH4', 'OP3', 1, 'Harvest'),
('PH5', 'OP4', 1, 'Capture') ON CONFLICT DO NOTHING;
Every INSERT ends with ON CONFLICT DO NOTHING so make seed is idempotent β running it twice does not duplicate the line, which matters when you re-seed between chapters. The seed then loads six campaign batches, one of which is deliberately the cautionary tale:
-- examples/platform/db/seed/seed_cho_line.sql (the six campaign batches; -004 is OOS)
INSERT INTO s88.batch (batch_id, product_id, recipe_id, unit_id, lot, status, start_ts, end_ts) VALUES
('BATCH-2026-001', 'MAB-001', 'CHO-MAB-001', 'BR101', 'L26001', 'released', '2026-01-05T00:00:00Z', '2026-01-19T00:00:00Z'),
('BATCH-2026-004', 'MAB-001', 'CHO-MAB-001', 'BR101', 'L26004', 'rejected', '2026-01-05T00:00:00Z', '2026-01-19T00:00:00Z'),
('BATCH-2026-006', 'MAB-001', 'CHO-MAB-001', 'BR101', 'L26006', 'complete', '2026-01-05T00:00:00Z', '2026-01-19T00:00:00Z')
ON CONFLICT DO NOTHING;
BATCH-2026-001 is the golden batch the book trends everything against; BATCH-2026-004 carries a deliberate out-of-specification (OOS) excursion and a rejected status, so later chapters have a real failure to detect, investigate, and explain. The remaining batches give statistical-process-control charts something to chew on.
Finally the seed records the phase windows for the golden batch β the actuals that let a timestamp find its phase:
-- examples/platform/db/seed/seed_cho_line.sql (phase windows for the golden batch)
INSERT INTO s88.batch_phase (batch_id, phase_id, unit_id, start_ts, end_ts) VALUES
('BATCH-2026-001', 'PH1', 'BR101', '2026-01-05T00:00:00Z', '2026-01-05T12:00:00Z'),
('BATCH-2026-001', 'PH2', 'BR101', '2026-01-05T12:00:00Z', '2026-01-12T00:00:00Z'),
('BATCH-2026-001', 'PH3', 'BR101', '2026-01-12T00:00:00Z', '2026-01-18T00:00:00Z'),
('BATCH-2026-001', 'PH4', 'BR101', '2026-01-18T00:00:00Z', '2026-01-19T00:00:00Z')
ON CONFLICT DO NOTHING;
When a sensor reading meets its batchβ
Here is the payoff. Once the model and seed are in place, the orphan reading from Chapter 2 can be contextualized. The view that does it lives in examples/platform/db/60-views.sql (built in full in Chapter 14, shown here because it is the reason the model exists):
-- examples/platform/db/60-views.sql (a reading with its full batch + phase context)
CREATE OR REPLACE VIEW s88.v_batch_sensor AS
SELECT r.ts, r.tag, r.value, r.unit, r.quality, r.batch_id,
b.product_id, b.recipe_id, b.unit_id,
bp.phase_id, ph.name AS phase_name
FROM ts.sensor_reading r
JOIN s88.batch b ON b.batch_id = r.batch_id
LEFT JOIN s88.batch_phase bp ON bp.batch_id = r.batch_id
AND r.ts >= bp.start_ts AND (bp.end_ts IS NULL OR r.ts < bp.end_ts)
LEFT JOIN s88.phase ph ON ph.phase_id = bp.phase_id;
The LEFT JOIN ... AND r.ts >= bp.start_ts AND r.ts < bp.end_ts is the time-window join that maps each instant to whatever phase was active then. That orphan BR101.Temp.PV = 37.05 now reads as a row that knows it was Product MAB-001, Recipe CHO-MAB-001, on unit BR101, during the Growth phase. A bare tag has become knowledge.
We do not just assert that this works β the companion repo proves it. In examples/tests/test_db.py, a pytest run against the live stack checks that the hierarchy seeded and that every reading in the golden batch resolves to a named phase:
# examples/tests/test_db.py
def test_schema_and_hypertable(conn):
assert _scalar(conn, "select count(*) from timescaledb_information.hypertables "
"where hypertable_name='sensor_reading'") == 1
assert _scalar(conn, "select count(*) from s88.batch") >= 6
def test_contextualization_joins_phase(conn):
# every reading in the golden batch should resolve to a named phase
rows = _scalar(conn, "select count(distinct phase_name) from s88.v_batch_sensor "
"where batch_id='BATCH-2026-001' and phase_name is not null")
assert rows >= 4 # Inoculate, Growth, Production, Harvest
Run make up && make seed && make load, then make test, and these pass on a laptop and again on a clean CI runner β which is the book's standing promise that the model is real, not a diagram. The same make test also exercises the audit chain that Chapter 20 layers on top of this schema, but that is a later story.
Why it mattersβ
A batch record is the legal artifact a regulator reviews to decide whether a lot of medicine may be released. Everything else in this book β historian, dashboards, knowledge graph, analytics β is, in a sense, decoration hung on the spine you built in this chapter. If the spine is wrong, every downstream number inherits the error.
Modeling on ISA-88 and ISA-95 buys two concrete things. First, portability of meaning: an engineer who knows the standards can read your operation and unit tables without a tour, and a future MES or commercial historian can map to them. Second, traceability by construction: genealogy and phase actuals are not bolted on after an audit finding β they are first-class tables, present from the schema's first migration (10-isa88-95.sql, before any data lands), which is exactly the posture cGMP expects.
In the real worldβ
In a real plant, this relational model rarely lives alone. The batch's procedural execution is usually owned by a commercial Manufacturing Execution System or electronic batch record β Werum PAS-X, KΓΆrber, Tulip, or a DeltaV/Syncade configuration β and those systems are validated, vendor-supported, and decidedly not open source. The honest position this book takes is that PostgreSQL is an excellent system of context and analysis β the place you join time-series to batch to phase and ask questions across campaigns β but it is not, out of the box, a Part-11-compliant electronic batch record. No open-source database is. Compliance is a property of a validated system plus procedures (GAMP 5 calls this out explicitly, including its open-source software appendix), not a property you can CREATE TABLE your way into. We build the data-integrity scaffolding β system-versioned history, an audit log, a tamper-evident hash chain β in Chapters 20 and 21, and we are candid there about what a superuser can still bypass.
The standards themselves are genuinely industry baseline, not aspiration. ISA-88 and ISA-95 underpin essentially every MES and batch-historian integration in pharma, and B2MML is the lingua franca when two companies' systems must exchange a recipe or batch record. NIIMBL β the U.S. public-private Institute for Innovation in Manufacturing Biopharmaceuticals β and pilot-scale cGMP facilities such as SABRE, the NIIMBL / University of Delaware facility that broke ground in April 2024, run on exactly this kind of standards-based equipment-and-batch modeling; an interoperable data platform has no chance without it. For the intensified / continuous variant of our line β perfusion upstream with multi-column continuous capture β the equipment hierarchy barely changes (you add a perfusion unit and more chromatography columns), but the procedural model strains: "phases" become harder to delimit when the process never stops. That tension is a recurring theme later in the book; the parent-FK + seq_no model absorbs it more gracefully than a rigidly nested one would, which is one more reason we chose the flat shape.
Key termsβ
- ISA-88 (S88, IEC 61512) β the batch procedural standard: recipe β procedure β unit procedure β operation β phase. Separates how a batch is made from the equipment it runs on.
- ISA-95 (S95, IEC 62264) β the physical/organizational standard: enterprise β site β area β work center / unit. Integrates plant floor with the business.
- B2MML / BatchML β MESA International's royalty-free XML schema implementation of ISA-95/ISA-88, used to exchange equipment, recipe, and batch records between systems.
- Unit β the piece of equipment a phase runs on (e.g.
BR101); the join point where ISA-88 and ISA-95 meet. - Phase β the smallest procedural step (e.g. Growth, Capture).
- Batch β one manufacturing run: a recipe on a unit, with a lot number, status, and start/end times.
- Genealogy / lot traceability β directed child β parent edges linking a drug-product lot back through drug substance, capture pool, bioreactor, and seed train.
- Effective-dated (bitemporal) parameter β a value with
valid_from/valid_toso a recipe can be versioned without overwriting history. - JSONB β PostgreSQL's binary JSON type with GIN indexing, used for the loosely structured long tail of attributes, never for critical, queryable setpoints.
- Contextualization β joining a raw sensor reading to its batch, equipment, and active phase; the view
s88.v_batch_sensordoes this. - Golden batch β
BATCH-2026-001, the reference run the book trends everything against;BATCH-2026-004is the deliberate OOS (out-of-specification) counterexample. - cGMP β current Good Manufacturing Practice, the regulatory expectation a batch record exists for, and is traceable per, every batch.
Where this leadsβ
The model now knows about BR101 and the recipe β but Chapter 2's reading still arrived tagged as the bare string BR101.Temp.PV, and nothing yet guarantees that string is spelled the same way in the historian, the dashboard, and the MQTT topic. In the next chapter, Naming Things: Tags, Hierarchies, and the Unified Namespace, we build the controlled tag dictionary and the ISA-95-aligned Unified Namespace that turn ad-hoc tag strings into a governed, machine-checkable address space β and we write the linter that fails the build when someone invents a name that does not fit.