Managing Change: Process Changes, Equipment Swaps & Schema Evolution

📍 Where we are: Part VI, operating at scale. The platform runs, holds a tamper-evident audit trail (Chapter 23), and respects jurisdictional residency (Chapter 26). Now the process itself changes — and we must change the data without breaking the record.

A bioprocess platform is never finished. Six months after go-live, three things happen. The science team lifts the production pH setpoint by 0.1, from 6.95 to 7.05 (a setpoint being the target value a controller holds the process at). The maintenance team swaps a worn-out Protein A skid (a self-contained, skid-mounted process unit) for a newer model. And an instrument vendor ships a firmware update that quietly renames a column in the exported assay file. Each of these is, on the floor, a routine event. Each of them, in your database, is a chance to silently corrupt the historical record: to make the book's running example batch BATCH-2026-001 (the single manufacturing run this series follows end to end) look like it ran on a recipe it never ran, or to orphan three years of chromatography data (the time-series readings from the Protein A purification step) behind a tag name (the string under which each sensor signal is stored) that no longer exists.

This chapter treats change as what it really is: a first-class data problem with a regulatory deadline attached. We will version a recipe without overwriting history, swap a skid while keeping its genealogy intact, and migrate a changed data format with verification and a working rollback — all under the discipline that the audit trail must survive every one of these moves.

The simple version

Think of your database as a published book that regulators can re-read at any time. You are never allowed to erase a printed page. When the recipe changes, you do not paint over the old setpoint — you add a dated errata page that says "from 12 March, read pH 7.05 instead of 6.95," and the old page stays legible forever. When you replace a machine, you do not throw out its chapters — you write "this story continues in the new machine" and keep both. When the file format changes, you keep the old edition on the shelf until you have proven, line by line, that the new edition says exactly the same thing. Change control is just the rule that nobody edits the book without a signed, dated, reversible note saying what changed and why.

What this chapter covers

Why change control is a GMP requirement, not a nicety, and how Annex 11 (an EU rule for computerised systems), ICH Q10 (a quality-system guideline), and ICH Q12 (a guideline for changes after a medicine is approved) frame process, equipment, and data changes — defined where each is first used below.
Effective-dated recipes: versioning a setpoint in place using valid_from/valid_to, and the PostgreSQL (the open-source database) exclusion constraint that stops two versions from overlapping in time.
Reversible, validated schema migrations with Sqitch (and how Flyway compares) so the schema evolves without a break in the chain.
Swapping a skid or instrument while preserving genealogy and re-mapping tags so years of history stay joinable.
A data-format migration — legacy CSV to Parquet — with byte-level verification and a rollback path, plus where lakeFS/DVC fit.
Why pure open source gets you most of the way here, and where the GxP last mile stays hybrid (part open source, part commercial), as the chapter's close explains. GxP is the umbrella for the Good-x-Practice regulations (GMP, GLP, GCP) that govern any data a regulator may inspect.

Change is a regulated event

Before any code, the framing. In a GMP (Good Manufacturing Practice — the legally enforceable rules for making medicines) shop you are not free to change a production system on a whim. Annex 11 of the EU GMP guidelines — the European counterpart to Part 11 (FDA 21 CFR Part 11, the US rule on electronic records and electronic signatures) — is explicit: computerised systems must run a documented change and configuration management process (Annex 11 clause 10), and when data is transferred to another format or system, that migration must be checked to confirm the value and meaning of the data were not altered (Annex 11 clause 4.8) [1]. ICH Q10 makes this structural rather than incidental: a change management system is one of the four named elements of a pharmaceutical quality system, sitting alongside process performance monitoring, corrective action, and management review [2]. And ICH Q12 gives the post-approval machinery — the rules for changing a process after a medicine has been approved for sale (its marketing authorisation granted by the regulator) — Established Conditions that define what is legally fixed (the parameters and methods a firm cannot change without telling the regulator), and Post-Approval Change Management Protocols (PACMP) that pre-agree how a future change will be made and reported [3].

For us, three engineering rules fall straight out of those documents. They are the data-integrity expectations regulators summarise as ALCOA+ (that records be Attributable, Legible, Contemporaneous, Original, and Accurate, plus Complete, Consistent, Enduring, and Available) made enforceable: never destroying history keeps records Original and Enduring, reversibility keeps them Available, and proving old data is still readable keeps them Legible and Accurate:

Never destroy history. A change adds a new, dated truth; it does not overwrite the old one. The FDA's data-integrity guidance defines the audit trail as a secure, computer-generated, time-stamped record that allows reconstruction of the creation, modification, and deletion of a record — a property every migration and equipment swap must preserve, not break [4].
Make every change reversible. If a migration fails verification, you must be able to return to the prior, known-good state.
Prove old data is still readable. PIC/S (the Pharmaceutical Inspection Co-operation Scheme — an international cooperation of medicines inspectorates) guidance PI 041-1 is direct here: when software is updated, the firm must confirm that old data can still be read — either in its existing format or by a validated migration to a new one — and must retain the old system where migration is not possible [5].

The rest of the chapter is those three rules, in SQL and Python.

Effective-dating a recipe, for real

Recall the recipe-parameter table from Chapter 4. It was built effective-dated from day one — meaning each row carries the date range during which it was the truth — precisely so this chapter could exist. From examples/platform/db/10-isa88-95.sql:

-- examples/platform/db/10-isa88-95.sql  (effective-dated recipe parameters)
-- effective-dated recipe parameters (Ch 27 versions these in place)
CREATE TABLE s88.recipe_parameter (
    recipe_id  text NOT NULL REFERENCES s88.recipe,
    name       text NOT NULL,
    value      numeric NOT NULL,
    unit       text NOT NULL,
    valid_from timestamptz NOT NULL DEFAULT now(),
    valid_to   timestamptz NOT NULL DEFAULT 'infinity',
    PRIMARY KEY (recipe_id, name, valid_from)
);

The whole point of valid_from/valid_to is that a recipe change is an INSERT plus an UPDATE that only closes a window, never an UPDATE that overwrites a value. Here recipe_id CHO-MAB-001 (the recipe for a monoclonal antibody, MAB, grown in CHO, Chinese-hamster-ovary, cells) is the formula, while BATCH-2026-001 is a run that executed that recipe, so asking what setpoint governed the batch is the same as asking what the recipe said on the batch's start date. Suppose on 12 March 2026 the science team raises the production-phase pH setpoint from 6.95 to 7.05 under change control CC-2026-018. Even a 0.1-unit move can shift two product-quality attributes of the antibody (measurable properties that define whether the drug is the right molecule at the right purity): the charge-variant profile, charge variants being slightly altered copies of the protein, and the aggregate (HMW, high-molecular-weight) profile, aggregates being clumps of antibody molecules stuck together (Book 1, Biologic Drug Manufacturing, develops why). That is why pH is a controlled process parameter and the change goes through change control rather than a quiet edit. The correct move is to close the old row by setting its valid_to, then open a new row:

-- close the outgoing version at the effective instant, open the new one
UPDATE s88.recipe_parameter
   SET valid_to = '2026-03-12T00:00:00Z'
 WHERE recipe_id = 'CHO-MAB-001' AND name = 'pH_setpoint'
   AND valid_to = 'infinity';

INSERT INTO s88.recipe_parameter (recipe_id, name, value, unit, valid_from, valid_to)
VALUES ('CHO-MAB-001', 'pH_setpoint', 7.05, 'pH', '2026-03-12T00:00:00Z', 'infinity');

The history now reads cleanly. The question "what pH setpoint applied when BATCH-2026-001 started, back on 5 January?" is a point-in-time query, and it returns 6.95 because that is what was true then:

SELECT value, unit
  FROM s88.recipe_parameter
 WHERE recipe_id = 'CHO-MAB-001' AND name = 'pH_setpoint'
   AND '2026-01-05T00:00:00Z' >= valid_from
   AND '2026-01-05T00:00:00Z' <  valid_to;
--  value | unit
-- -------+------
--   6.95 | pH

Two rows now exist for the same parameter, and that is exactly right — both are true, each in its own window. A batch keeps showing the setpoint that governed it, and an auditor can reconstruct the recipe as of any date without a separate archive.
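The point-in-time rule is small enough to restate outside the database. What follows is a minimal Python sketch, assuming the two rows left behind by the UPDATE and INSERT above; the RecipeParameter class and the value_at helper are hypothetical names rather than companion-repo code, and the opening valid_from of the old row is an assumed date, since the chapter does not state it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal

# stand-in for PostgreSQL's 'infinity' timestamp
INFINITY = datetime.max.replace(tzinfo=timezone.utc)


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


@dataclass(frozen=True)
class RecipeParameter:
    recipe_id: str
    name: str
    value: Decimal
    unit: str
    valid_from: datetime
    valid_to: datetime = INFINITY


def value_at(rows, recipe_id, name, instant):
    """Return the one version whose half-open window [valid_from, valid_to) holds instant."""
    hits = [
        r for r in rows
        if r.recipe_id == recipe_id and r.name == name
        and r.valid_from <= instant < r.valid_to
    ]
    if len(hits) != 1:
        raise LookupError(f"expected exactly one version, found {len(hits)}")
    return hits[0]


rows = [
    # the 2025-09-01 start is an assumption made for this sketch only
    RecipeParameter("CHO-MAB-001", "pH_setpoint", Decimal("6.95"), "pH",
                    utc(2025, 9, 1), utc(2026, 3, 12)),
    RecipeParameter("CHO-MAB-001", "pH_setpoint", Decimal("7.05"), "pH",
                    utc(2026, 3, 12)),
]

batch_start = value_at(rows, "CHO-MAB-001", "pH_setpoint", utc(2026, 1, 5))
at_boundary = value_at(rows, "CHO-MAB-001", "pH_setpoint", utc(2026, 3, 12))
print(batch_start.value, at_boundary.value)  # 6.95 7.05
```

The boundary instant resolves to the new row only, because the comparison is inclusive at valid_from and exclusive at valid_to, which mirrors the two predicates in the SQL query.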

Stopping overlaps at the database

There is a subtle failure mode: a fat-fingered migration could leave two rows whose [valid_from, valid_to) windows overlap, and now the point-in-time query returns two pHs and the model lies. A plain UNIQUE constraint cannot catch this, because the conflict is not equality — it is range overlap. PostgreSQL's answer is an exclusion constraint, which uses a GiST index — an index being the lookup structure a database keeps so it can check a condition fast, and a GiST one being the flexible kind that can index ranges, not just plain values — to guarantee that no two rows satisfying the predicate can have overlapping values under an operator you choose [6]. Paired with a range type — tstzrange (a timestamp-with-time-zone range) and the related daterange/int4range types model a span with inclusive/exclusive bounds and an overlap operator && — it does exactly what we need; the documentation is explicit that UNIQUE is unsuitable for ranges while an exclusion constraint enforcing non-overlap is the right pattern [7].

In the companion stack we add this as a migration (below) rather than baking it into the day-one schema, because it is precisely the kind of integrity tightening that arrives after go-live. The constraint puts equality predicates on recipe_id and name into the same GiST index as a range overlap, and GiST has no built-in operator class for plain text columns. The btree_gist extension supplies one; without it PostgreSQL rejects the constraint with an error of the form 'data type text has no default operator class for access method "gist"', so the migration enables the extension first:

-- examples/platform/db/migrations/deploy/recipe_param_no_overlap.sql
-- btree_gist lets the text equality predicates share one GiST index with the range overlap
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- a GiST exclusion constraint: no two versions of the same parameter may overlap in time
ALTER TABLE s88.recipe_parameter
  ADD CONSTRAINT recipe_parameter_no_overlap
  EXCLUDE USING gist (
    recipe_id WITH =,
    name      WITH =,
    tstzrange(valid_from, valid_to, '[)') WITH &&
  );

Now the database itself refuses to accept a second pH_setpoint row whose window overlaps an existing one. Windows that merely touch, with one closing at the instant the next opens, are still accepted, because the ranges are half-open. The valid-time (effective-dated) discipline stops being a convention engineers must remember and becomes a rule the engine enforces, which is exactly the posture a data-integrity reviewer wants to see. (Pairing this valid-time table with the transaction-time audit log from Chapter 23 is what yields a fully bitemporal record, one with two independent time axes, matching the framing Chapter 4 set out.)
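Adding an exclusion constraint to a table that already holds overlapping rows fails, so it can help to look for offenders before deploying. The sketch below restates the overlap test in Python for half-open windows; overlaps and find_overlaps are hypothetical helper names, the 2025-09-01 start date is assumed, and the test presumes valid_from is strictly before valid_to on every row.

```python
from datetime import datetime, timezone
from itertools import combinations

# stand-in for PostgreSQL's 'infinity' timestamp
INFINITY = datetime.max.replace(tzinfo=timezone.utc)


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


def overlaps(a_from, a_to, b_from, b_to):
    """Half-open [from, to) overlap: the question && asks of two non-empty '[)' ranges."""
    return a_from < b_to and b_from < a_to


def find_overlaps(rows):
    """rows are (recipe_id, name, valid_from, valid_to) tuples; returns the conflicting pairs."""
    return [
        (a, b) for a, b in combinations(rows, 2)
        if a[0] == b[0] and a[1] == b[1] and overlaps(a[2], a[3], b[2], b[3])
    ]


# correctly closed: the old window ends at the instant the new one begins
good = [
    ("CHO-MAB-001", "pH_setpoint", utc(2025, 9, 1), utc(2026, 3, 12)),
    ("CHO-MAB-001", "pH_setpoint", utc(2026, 3, 12), INFINITY),
]
# fat-fingered: the old row was never closed, so both windows run to infinity
bad = [
    ("CHO-MAB-001", "pH_setpoint", utc(2025, 9, 1), INFINITY),
    ("CHO-MAB-001", "pH_setpoint", utc(2026, 3, 12), INFINITY),
]

print(len(find_overlaps(good)), len(find_overlaps(bad)))  # 0 1
```

The good pair shares a boundary instant yet produces no conflict, which is the behaviour the '[)' bounds in the constraint are chosen to give.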

Anatomy of an effective-dated recipe-parameter row (the central record, field by field)

The whole chapter turns on one record shape, so it is worth dissecting field by field. A row of s88.recipe_parameter is not a value — it is a value with a validity window, and a change produces a matched pair of rows: the OLD one closed, the NEW one open. The card below is the CC-2026-018 pH change as the database actually stores it.

One setpoint change stored as an audit-safe pair of rows — the old window bounded, the new window open — with the GiST exclusion predicate that guarantees the two never overlap. Original diagram by the authors, created with AI assistance.

Read it column by column. The recipe_id (CHO-MAB-001) and name (pH_setpoint) are the stable identity of the parameter; they never change across a version. The value (6.95 then 7.05) and its unit (pH) are the payload — and a value is never stored without its unit, the same discipline the earlier anatomy of reading the historian (the time-series database that holds process-sensor data) showed for sensor readings. valid_from and valid_to are the entire mechanism: the OLD row's valid_to, formerly 'infinity', is closed to 2026-03-12T00:00:00Z, and the NEW row opens at exactly that instant with valid_to = 'infinity'. The composite PRIMARY KEY (recipe_id, name, valid_from) is what permits two rows for the same parameter to coexist — a primary key being the column set a database requires to be unique per row, and a composite one spanning several columns. Because valid_from is part of the key, the new version carries a different key, so it is a new row, not a duplicate conflict.

The two derived pieces are where the integrity lives. The window is tstzrange(valid_from, valid_to, '[)') — a half-open interval that includes its start and excludes its end, so the boundary instant 2026-03-12 belongs unambiguously to the NEW row and to it alone; there is no instant that two windows both claim. The GiST EXCLUDE predicate — recipe_id WITH =, name WITH =, tstzrange(...) WITH && — reads as "for the same recipe and the same parameter, two time-ranges may not overlap (&&)." That single line is what a point-in-time query relies on: because no two windows overlap, the query for 5 January matches exactly one row and returns 6.95, never two pHs.

Reversible, validated migrations with Sqitch

Adding that constraint is a schema change — the schema being the structural definition of a database's tables, columns, and constraints, and migrating it meaning applying a recorded, repeatable change to that structure — and schema changes need the same change-control rigour as recipe changes. The companion repo manages them with Sqitch, a database-change framework whose entire model is built around the three rules above. Each change is a named trio of scripts — a deploy that applies it, a revert that undoes it, and a verify that asserts it actually took. By default sqitch deploy does not run verify scripts; with sqitch deploy --verify (or with deploy.verify enabled in sqitch.conf) Sqitch runs each verify during the deploy and reverts the change in the same run if a verify fails [8]. The companion sqitch.conf enables deploy.verify, so the gate is on by default for the reader. Sqitch is MIT-licensed — a permissive open-source licence that lets the book ship and modify it freely — which is why the book reaches for it rather than the more commercial-adjacent alternative (Flyway, compared below).

The migration directory lives at examples/platform/db/migrations, managed by Sqitch, and ships a real sqitch.conf and sqitch.plan with the recipe_param_no_overlap change committed. A change is added with sqitch add recipe_param_no_overlap -n 'enforce non-overlapping recipe versions', which scaffolds the trio. The deploy script holds the ALTER TABLE shown above; the revert and verify are its bookends. The committed scripts (Sqitch wraps each change in its own transaction on PostgreSQL, so there is no explicit BEGIN/COMMIT — that would interfere with the auto-revert this section relies on):

-- examples/platform/db/migrations/deploy/recipe_param_no_overlap.sql
CREATE EXTENSION IF NOT EXISTS btree_gist;
ALTER TABLE s88.recipe_parameter
  ADD CONSTRAINT recipe_parameter_no_overlap
  EXCLUDE USING gist (
    recipe_id WITH =, name WITH =,
    tstzrange(valid_from, valid_to, '[)') WITH &&
  );

-- examples/platform/db/migrations/revert/recipe_param_no_overlap.sql   (the reversibility rule, in one line)
ALTER TABLE s88.recipe_parameter DROP CONSTRAINT recipe_parameter_no_overlap;

-- examples/platform/db/migrations/verify/recipe_param_no_overlap.sql   (assert the change actually took)
SELECT 1 / CASE WHEN count(*) = 1 THEN 1 ELSE 0 END  -- divides by zero (fails) unless the constraint exists
  FROM pg_constraint
 WHERE conname = 'recipe_parameter_no_overlap';

The operator runs sqitch deploy --verify db:pg://..., and Sqitch applies the deploy, immediately runs the verify, and, if the verify raises, reverts the change in the same run so the database is never left half-changed. To back the change out deliberately you run sqitch revert --to @HEAD^1. In this Git-style notation @HEAD is the most recent change in sqitch.plan and @HEAD^1 is the one before it, so this reverts every change applied after that point and leaves the database at the named change. This is the engineering expression of "reversible, validated": every forward step has a tested backward step, and the verify is a gate, not a hope.
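The control flow of that gate can be shown in a few lines. This is a sketch of the deploy, verify, revert sequence only, not of how Sqitch is implemented; it runs against an in-memory SQLite database so it is self-contained, and apply_change, table_exists, and the demo_note table are hypothetical names. The verify here selects the expected columns rather than dividing by zero, because SQLite returns NULL for division by zero instead of raising an error.

```python
import sqlite3


def table_exists(conn, name):
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def apply_change(conn, deploy_sql, verify_sql, revert_sql):
    """Deploy, then verify; if the verify raises, run the revert. True means the change stayed."""
    conn.executescript(deploy_sql)
    try:
        conn.execute(verify_sql).fetchall()
    except sqlite3.Error:
        conn.executescript(revert_sql)
        return False
    return True


# demo_note is a hypothetical table used only for this sketch
VERIFY = "SELECT id, note FROM demo_note WHERE 0"  # raises if the table or a column is missing
REVERT = "DROP TABLE demo_note"
GOOD_DEPLOY = "CREATE TABLE demo_note (id INTEGER PRIMARY KEY, note TEXT NOT NULL)"
BAD_DEPLOY = "CREATE TABLE demo_note (id INTEGER PRIMARY KEY, nite TEXT NOT NULL)"  # column typo

conn = sqlite3.connect(":memory:")
bad_kept = apply_change(conn, BAD_DEPLOY, VERIFY, REVERT)
table_after_bad = table_exists(conn, "demo_note")
good_kept = apply_change(conn, GOOD_DEPLOY, VERIFY, REVERT)
table_after_good = table_exists(conn, "demo_note")
print(bad_kept, table_after_bad, good_kept, table_after_good)  # False False True True
```

The bad deploy leaves no trace because its revert ran as soon as the verify failed, and the good deploy then applies cleanly on the same database.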

Anatomy of a Sqitch change (deploy, revert, verify, and the plan line)

Where the recipe row encoded "reversible truth" in data, a Sqitch change encodes it in files. One change is not one script; it is a trio of scripts plus one append-only line in sqitch.plan. Dissecting that artifact shows where each of the three engineering rules physically lives.

A single change as Sqitch stores it — one plan line and three scripts — so the verify gate can promote a good deploy to live or auto-revert a bad one to a known-good state in the same run. Original diagram by the authors, created with AI assistance.

The sqitch.plan line is the project's ledger: recipe_param_no_overlap 2026-03-12T00:00:00Z <committer> # enforce non-overlapping recipe versions. Field by field, it carries the change name (which is also the basename of all three scripts), the planned timestamp, the committer identity, and a # note that records why the change exists — the audit-trail fields a reviewer reads first. A second plan line, tag_alias (the skid-swap change introduced later), declares requires: recipe_param_no_overlap, so the plan also encodes dependency order between changes, not just a flat list. Here the dependency is an ordering choice rather than a hard technical coupling — it simply pins tag_alias to deploy after the change that precedes it in the plan, so a fresh database always rebuilds the migrations in one deterministic sequence.

The three scripts are the chapter's three rules made executable. deploy/ holds the forward DDL: CREATE EXTENSION btree_gist, then the ALTER TABLE … ADD CONSTRAINT … EXCLUDE USING gist. revert/ is the reversibility rule in one line: DROP CONSTRAINT recipe_parameter_no_overlap. verify/ is the proof: SELECT 1 / CASE WHEN count(*) = 1 THEN 1 ELSE 0 END FROM pg_constraint, which deliberately divides by zero unless exactly one matching constraint exists; the resulting database error is how the script signals failure, and Sqitch treats any error from a verify script as a failed verify. The fourth piece is not in any script: it is the deploy.verify = true setting in sqitch.conf, the gate that makes the verify run as part of the deploy. With the gate on, a passing verify promotes the change to live with the audit trail intact; a failing verify triggers an auto-revert in the same run; either way the database lands on a known-good state and is never left half-changed.

It is worth being honest about the alternative. Flyway applies versioned migrations exactly once, fingerprints each with a checksum so an already-applied script cannot be silently edited, and offers paired Undo (U-prefixed) scripts — but its own documentation cautions that undo plus a restorable backup are both needed for true reversibility, because some DDL is not cleanly undoable [9]. That caveat applies to Sqitch too. The mature posture is: reversible migration script and a point-in-time-recovery backup taken immediately before the change. We configure that backup in the next chapter.
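The checksum idea is independent of any one tool: record a fingerprint when a script is applied, then compare it with the file on disk before every later run. The sketch below uses SHA-256 as an assumption made for illustration and does not claim to reproduce Flyway's own algorithm; fingerprint and detect_edits are hypothetical names.

```python
import hashlib


def fingerprint(script_text):
    """Digest of a migration script, with line endings normalised first."""
    # splitlines() treats CRLF and LF alike, so a checkout difference is not flagged as an edit
    normalised = "\n".join(script_text.splitlines())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_edits(recorded, on_disk):
    """Names of applied scripts whose current text no longer matches the recorded digest."""
    return sorted(
        name for name, digest in recorded.items()
        if name in on_disk and fingerprint(on_disk[name]) != digest
    )


name = "revert/recipe_param_no_overlap.sql"
original = "ALTER TABLE s88.recipe_parameter DROP CONSTRAINT recipe_parameter_no_overlap;\n"
recorded = {name: fingerprint(original)}  # what was stored when the script was applied

same_with_crlf = {name: original.replace("\n", "\r\n")}
edited = {name: original.replace("DROP CONSTRAINT", "DROP CONSTRAINT IF EXISTS")}

print(detect_edits(recorded, same_with_crlf))  # []
print(detect_edits(recorded, edited))          # ['revert/recipe_param_no_overlap.sql']
```

A check of this kind only detects tampering with an applied script; it does nothing to make the change reversible, which is why the backup still matters.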

Flow diagram of a Sqitch schema change: a change request under change control leads to sqitch add (deploy, revert, verify), then a sqitch deploy decision; on verify passes the change goes live with the audit trail intact, on verify fails it auto-reverts to known-good, and a deliberate rollback runs sqitch revert which also returns to the auto-revert known-good state.

Swapping a skid without orphaning history

The hardest change is physical. In March, PA01, the Cytiva ÄKTA process Protein A skid seeded back in Chapter 4, is retired and replaced by a newer unit, PA02. Three things must remain true after the swap: every old batch must still point at the equipment that actually made it; new batches must point at the new skid; and the time-series tags from the old skid (each tag being the string name under which one sensor signal is stored) must remain joinable, that is, matchable by that tag string, to the years of history they carry.

This is more than a like-for-like swap. PA01 runs the Protein A capture step as a single-column batch process. Capture is the first chromatography purification stage, in which the antibody-containing liquid is pumped through a column packed with resin (porous beads that selectively grab the antibody and let impurities wash past); Book 1's capture-chromatography chapter explains it. PA02, a Cytiva ÄKTA pcc 75, runs periodic counter-current (PCC) chromatography, loading the resin across several smaller columns running in sequence rather than one column at a time. Because that is a fundamentally different capture method, quality must formally re-prove the product is the same through a comparability assessment: the formal study, described in Book 1's capture-chromatography and quality chapters, showing the antibody made the new way is equivalent to the old. It touches loading regime (how much material is run onto the column and how fast), yield, and impurity clearance (how well the step removes contaminants such as host-cell protein, HCP, the residual proteins from the production cells, and aggregate). The data techniques here ride on top of that comparability work, not in place of it.

The equipment hierarchy makes the first two trivial, because unit_id is a stable business key — a real-world identifier that never changes once assigned — and batches reference it. We never rename PA01; we retire it and add PA02:

-- retire the old skid (keep the row — old batches still reference it), add the new one
INSERT INTO s88.unit VALUES
  ('PA02', 'DOWNSTREAM', 'Protein A Capture Skid 2', 'chromatography', 'Cytiva', 'AKTA pcc 75')
  ON CONFLICT DO NOTHING;

-- record the equipment lineage so reports know PA02 succeeded PA01
INSERT INTO s88.genealogy (batch_id, child, child_type, parent, parent_type)
VALUES (NULL, 'PA02', 'equipment', 'PA01', 'equipment');

BATCH-2026-001 keeps pointing at PA01; the April batches point at PA02; and the genealogy edge records that PA02 succeeded PA01 so an equipment-history report can walk the lineage. Nothing was overwritten.
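Walking that lineage is a short loop over child-to-parent edges. The sketch below assumes each unit has at most one recorded predecessor and takes the edges as plain (child, parent) tuples; lineage is a hypothetical helper, not a function in the companion repo.

```python
def lineage(succession, unit_id):
    """Follow child -> parent equipment edges from unit_id back to its oldest known predecessor."""
    parent_of = {child: parent for child, parent in succession}
    chain, seen = [unit_id], {unit_id}
    while chain[-1] in parent_of:
        nxt = parent_of[chain[-1]]
        if nxt in seen:
            # a cycle would mean the genealogy data itself is corrupt
            raise ValueError(f"cycle in equipment lineage at {nxt}")
        chain.append(nxt)
        seen.add(nxt)
    return chain


succession = [("PA02", "PA01")]  # (child, parent), as in the s88.genealogy row above
print(lineage(succession, "PA02"))  # ['PA02', 'PA01']
print(lineage(succession, "PA01"))  # ['PA01']
```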

The genuinely fiddly part is tag re-mapping. The old skid published tags like PA01.UV280.PV (the skid's UV-absorbance-at-280-nm reading, where .PV is the live process value); the new skid publishes PA02.UV280.PV. The governed tag dictionary from Chapter 5 (gov.tag_dictionary) — the single registry that lists every approved tag name and its metadata — is the one place that decides what is a legal tag (a tag the governance layer recognises, versus an ad-hoc name it will reject) — but as shipped (examples/platform/db/40-gov.sql) it keys on tag and carries no retire or effective-date columns, so it cannot by itself express "this signal used to be called something else." The swap therefore needs a companion table, added as a Sqitch migration the reader deploys (examples/platform/db/migrations/deploy/tag_alias.sql):

-- examples/platform/db/migrations/deploy/tag_alias.sql  (records old->new tag correspondence)
CREATE TABLE gov.tag_alias (
    old_tag   text NOT NULL,                 -- PA01.UV280.PV
    new_tag   text NOT NULL,                 -- PA02.UV280.PV
    effective timestamptz NOT NULL,          -- when the new skid took over
    reason    text,                          -- e.g. CC-2026-024 (skid swap)
    PRIMARY KEY (old_tag, new_tag)
);

-- resolution view: resolve any historic or current tag to its current canonical name
CREATE VIEW gov.v_tag_current AS
SELECT d.tag AS tag, d.tag AS current_tag
  FROM gov.tag_dictionary d
UNION
SELECT a.old_tag AS tag, a.new_tag AS current_tag
  FROM gov.tag_alias a;

The new tags are registered in the dictionary and the old ones are left in place (the dictionary has no retire column, and the old rows must stay so historic tags remain legal). The companion repo carries a small remapper, examples/tools/tag-remap/tag_remap.py, that reads an old→new mapping CSV, validates it, and applies it to gov.tag_alias (cloning the old tag's governed metadata onto the new tag where a dictionary row exists — in the shipped seed only the BR101.* fed-batch tags are dictionary-governed, so the three chromatography signals here have no source row to clone and only the alias is written). The idea of "one physical signal, several names over time" is the same logical-asset aliasing that ISA-95 Part 7 — part of the ISA-95 manufacturing-integration standard — formalises in its Alias Service Model, exercised in the naming-and-UNS chapter. The mapping file itself is plain, reviewable data:

# examples/tools/tag-remap/remap_PA01_to_PA02.csv  (old_tag,new_tag,effective,reason)
old_tag,new_tag,effective,reason
PA01.UV280.PV,PA02.UV280.PV,2026-03-15T00:00:00Z,CC-2026-024 skid swap
PA01.Cond.PV,PA02.Cond.PV,2026-03-15T00:00:00Z,CC-2026-024 skid swap
PA01.pH.PV,PA02.pH.PV,2026-03-15T00:00:00Z,CC-2026-024 skid swap
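Because the mapping is plain data, it can be checked before anything touches gov.tag_alias. The following sketch shows the kind of validation a remapper might run; it is not the shipped tag_remap.py, and both the validate_mapping name and the specific rules (exact header, non-empty tags, old and new differing, a time-zone-aware ISO 8601 timestamp, no repeated old_tag) are assumptions made for illustration.

```python
import csv
import io
from datetime import datetime

EXPECTED_HEADER = ["old_tag", "new_tag", "effective", "reason"]


def validate_mapping(text):
    """Return (rows, errors). The rules are illustrative assumptions, not the shipped tool's."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_HEADER:
        return [], [f"unexpected header: {reader.fieldnames}"]
    rows, errors, seen_old = [], [], set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        old = (row["old_tag"] or "").strip()
        new = (row["new_tag"] or "").strip()
        problems = []
        if not old or not new:
            problems.append("empty tag")
        if old and old == new:
            problems.append("old_tag equals new_tag")
        if old in seen_old:
            problems.append("duplicate old_tag")
        try:
            raw = (row["effective"] or "").strip().replace("Z", "+00:00")
            effective = datetime.fromisoformat(raw)
            if effective.tzinfo is None:
                problems.append("effective has no time zone")
        except ValueError:
            problems.append("effective is not an ISO 8601 timestamp")
        seen_old.add(old)
        if problems:
            errors.extend(f"line {line_no}: {p}" for p in problems)
        else:
            rows.append((old, new, effective, (row["reason"] or "").strip()))
    return rows, errors


GOOD = """old_tag,new_tag,effective,reason
PA01.UV280.PV,PA02.UV280.PV,2026-03-15T00:00:00Z,CC-2026-024 skid swap
PA01.Cond.PV,PA02.Cond.PV,2026-03-15T00:00:00Z,CC-2026-024 skid swap
PA01.pH.PV,PA02.pH.PV,2026-03-15T00:00:00Z,CC-2026-024 skid swap
"""
# deliberately broken: the tag maps to itself and the timestamp carries no time zone
BAD = """old_tag,new_tag,effective,reason
PA01.pH.PV,PA01.pH.PV,2026-03-15,CC-2026-024 skid swap
"""

good_rows, good_errors = validate_mapping(GOOD)
bad_rows, bad_errors = validate_mapping(BAD)
print(len(good_rows), good_errors)  # 3 []
print(bad_errors)
```

Rejecting the whole file when any line fails, rather than applying the valid subset, keeps the applied mapping identical to the one that was reviewed under change control.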

Crucially, we do not rewrite the historian. The 18 months of PA01.UV280.PV rows in ts.sensor_reading stay exactly as recorded — rewriting them would be the very destruction of history the audit trail forbids. Instead the alias table lets a cross-changeover query resolve both names to one logical measurement:

-- read 'Protein A UV280' across the swap without rewriting a single historic row
SELECT ts, value
  FROM ts.sensor_reading
 WHERE tag IN ('PA01.UV280.PV', 'PA02.UV280.PV')   -- old + new names, the pair recorded in gov.tag_alias
 ORDER BY ts;

History is preserved, the new skid is live, and a query that needs "the Protein A UV trace" no longer has to know that a skid was swapped on 15 March. That is the difference between a platform that ages gracefully and one that accretes scar tissue.
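The hard-coded pair of names in that query can instead be derived from the alias table. The sketch below demonstrates the idea against an in-memory SQLite database, with the schema prefixes flattened into plain table names because SQLite has no gov. or ts. schemas; the trace helper is hypothetical, the readings are placeholder values invented only to exercise the query, and the lookup follows a single alias hop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag_alias (
    old_tag   TEXT NOT NULL,
    new_tag   TEXT NOT NULL,
    effective TEXT NOT NULL,
    reason    TEXT,
    PRIMARY KEY (old_tag, new_tag)
);
CREATE TABLE sensor_reading (ts TEXT NOT NULL, tag TEXT NOT NULL, value REAL NOT NULL);

INSERT INTO tag_alias VALUES
  ('PA01.UV280.PV', 'PA02.UV280.PV', '2026-03-15T00:00:00Z', 'CC-2026-024 skid swap');

-- placeholder readings, not real process data
INSERT INTO sensor_reading VALUES
  ('2026-03-14T23:59:00Z', 'PA01.UV280.PV', 0.10),
  ('2026-03-15T00:01:00Z', 'PA02.UV280.PV', 0.20),
  ('2026-03-15T00:01:00Z', 'PA02.Cond.PV',  0.30);
""")


def trace(conn, current_tag):
    """Readings stored under current_tag or under any old name aliased directly to it."""
    return conn.execute(
        """
        SELECT r.ts, r.tag, r.value
          FROM sensor_reading r
         WHERE r.tag = :tag
            OR r.tag IN (SELECT old_tag FROM tag_alias WHERE new_tag = :tag)
         ORDER BY r.ts
        """,
        {"tag": current_tag},
    ).fetchall()


rows = trace(conn, "PA02.UV280.PV")
print([tag for _, tag, _ in rows])  # ['PA01.UV280.PV', 'PA02.UV280.PV']
```

The historic row comes back under its original tag name, so the caller can still see which skid produced each reading even though the trace is continuous.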

The skid-swap changeover, step by step (a numbered walkthrough)

Spelled out as a runbook, the PA01 → PA02 changeover under CC-2026-024 is five ordered moves, each one additive:

Add the new unit, retire (do not delete) the old. INSERT INTO s88.unit the PA02 row ('PA02', 'DOWNSTREAM', 'Protein A Capture Skid 2', 'chromatography', 'Cytiva', 'AKTA pcc 75') with ON CONFLICT DO NOTHING. The PA01 row stays exactly as seeded in Chapter 4 ('PA01', …, 'Cytiva', 'AKTA process') because BATCH-2026-001 still references it.
Record the lineage edge. One INSERT INTO s88.genealogy with child = 'PA02', parent = 'PA01', both *_type = 'equipment', so an equipment-history report can walk PA02 back to PA01.
Deploy the alias table. sqitch deploy the tag_alias change, which the plan orders after recipe_param_no_overlap. It creates gov.tag_alias and the gov.v_tag_current resolution view: a saved query that, with SQL's UNION (which stacks two result sets into one), presents the dictionary and the aliases as a single list mapping every known tag to its current name, the capability gov.tag_dictionary cannot express on its own.
Apply the mapping. Run tag_remap.py against remap_PA01_to_PA02.csv (the three signals UV280.PV, Cond.PV, pH.PV), which validates the file, upserts the three alias rows, and registers each new tag by cloning the old tag's governed dictionary metadata where a dictionary row exists (in the shipped seed the chromatography signals are not dictionary-governed, so this step only writes the aliases). Old dictionary rows are left in place so historic tags stay legal.
Read across the seam. A query selecting tag IN ('PA01.UV280.PV', 'PA02.UV280.PV') — or, generically, joining through gov.v_tag_current — returns one continuous Protein A UV trace, and not one row of ts.sensor_reading was rewritten.

The ordering matters: the alias table must exist (step 3) before the remapper writes to it (step 4), so the Sqitch deploy has to finish, verify included, before the tool runs.

The same swap, as triples: the knowledge-graph view

The two relational moves above — the s88.genealogy edge and the gov.tag_alias row — are not just rows; they are the digital-thread facts the knowledge-graph chapter loads into RDF, and seeing them as triples is what makes the swap survive tech transfer to another site whose tables are shaped differently. The equipment lineage edge is the open-source twin of the derivedFrom spine Book 4 builds — there the antibody lot's derivedFrom is declared a transitive property so a lineage walk reaches any ancestor in one query, and the equipment-genealogy modeling does the same for skids. Written as Turtle (RDF's human-readable text form, where a means "is of type"), the swap is three facts:

# the skid swap as RDF: a successor edge plus a value-identity bridge across the changeover.
@prefix bp:   <https://example.org/bioproc#> .
bp:PA02  a bp:ProteinASkid ; bp:succeeds bp:PA01 .          # genealogy edge (PA02 succeeded PA01)
bp:PA01.UV280.PV  bp:sameSignalAs  bp:PA02.UV280.PV .       # the tag alias as an identity bridge

That bp:sameSignalAs predicate is the alias table's semantic counterpart: it asserts that two named tags denote one logical measurement, the precise problem ISA-95 Part 7's Alias Service Model and Book 4's identifiers-and-units modeling exist to solve — one physical signal, several names over time, resolved to a single governed identifier rather than the same string free-typed and silently drifting. (It is deliberately not owl:sameAs: the two tags are distinct named resources, while the underlying signal is what they share — collapsing the tags with owl:sameAs would wrongly merge their differing metadata.)

Two further pieces map cleanly onto the ontology stack and are worth naming because they turn convention into an enforced gate. First, the swap answers a competency question — a plain-English query the model must satisfy, the acceptance-test discipline Book 4 runs as PASS/FAIL checks: "return the full Protein A UV trace for BATCH-2026-001, across any equipment changeover." A SPARQL property path resolves the alias and reads the historian in one statement, the graph analogue of the tag IN (...) query above:

PREFIX bp: <https://example.org/bioproc#>
SELECT ?ts ?value WHERE {
  ?tag (bp:sameSignalAs|^bp:sameSignalAs)* bp:PA01.UV280.PV .  # any name of this one signal
  ?tag bp:reading [ bp:at ?ts ; bp:val ?value ] .
} ORDER BY ?ts
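The property path (bp:sameSignalAs|^bp:sameSignalAs)* follows the alias edge zero or more times in either direction, so it collects every name the signal has ever had. A plain-Python sketch of the same closure, using a breadth-first walk over undirected edges, makes that explicit; same_signal_names is a hypothetical helper, and the PA03 edge is an invented later swap added only to show a two-hop chain.

```python
from collections import deque


def same_signal_names(edges, start):
    """All tags reachable from start over sameSignalAs edges in either direction, start included."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbours.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


edges = [
    ("PA01.UV280.PV", "PA02.UV280.PV"),
    ("PA02.UV280.PV", "PA03.UV280.PV"),  # hypothetical later swap, not part of the chapter's data
    ("PA01.Cond.PV", "PA02.Cond.PV"),
]
names = same_signal_names(edges, "PA01.UV280.PV")
print(sorted(names))  # ['PA01.UV280.PV', 'PA02.UV280.PV', 'PA03.UV280.PV']
```

Starting from any of the three UV names yields the same set, and the conductivity tags never leak in, which is the behaviour the competency question requires.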

Second, the "never destroy history" rule has a closed-world enforcement that an OWL reasoner alone cannot give — a SHACL shape (the Shapes Constraint Language, which validates an RDF graph against shape rules). The retire-don't-delete invariant becomes a constraint: every alias must point at a new_tag that resolves in the current dictionary while its old_tag is retained, so a load that drops the historic name fails validation rather than silently orphaning the trace — exactly the SHACL release-gate pattern Book 4 uses to make an integrity rule machine-checkable instead of a habit engineers must remember. The relational alias table and this shape are the same invariant expressed in two registers; the graph view is what lets a second site re-prove that invariant against its own systems during tech transfer without re-inventing the vocabulary.

Three kinds of change — a versioned recipe, a swapped skid, and a migrated data format — each handled by adding a dated truth rather than erasing an old one, so the audit trail (the unbroken line beneath) survives every move.

Original diagram by the authors, created with AI assistance.

Migrating a data format, with verification and rollback

The last and most error-prone change is a data-format migration. An instrument vendor's firmware update changes the offline-assay export from a loosely typed legacy CSV (a plain text file where every field is just characters, with no recorded notion of which columns are numbers and which are text) to a self-describing columnar file, one that stores data column-by-column rather than row-by-row and records its own column names and types. We also want to standardise the historical archive on Apache Parquet, partly for size and speed, and partly because Parquet files embed their own schema (their column names and types) in the file's metadata, so the file is its own documentation and supports schema evolution and verifiable round-tripping [10]. PIC/S is unambiguous that this is allowed only with a validated migration that proves the value and meaning of the data are unchanged, and that the old format is retained until that proof exists [5].

The companion repo's examples/tools/format-migrate/format_migrate.py follows a strict convert-then-verify-then-promote sequence, and it refuses to delete the source. The shape of it:

# examples/tools/format-migrate/format_migrate.py  (convert -> verify -> promote; never delete source)
import pandas as pd

def migrate(csv_path: str, parquet_path: str) -> None:
    src = pd.read_csv(csv_path, dtype={"sample_id": "string", "batch_id": "string"})
    src.to_parquet(parquet_path, engine="pyarrow", index=False)

    # VERIFY: read the new file back and assert it is value-identical to the source
    back = pd.read_parquet(parquet_path)
    assert list(back.columns) == list(src.columns), "schema drift on migration"
    pd.testing.assert_frame_equal(
        src.reset_index(drop=True), back.reset_index(drop=True),
        check_dtype=False,            # CSV is untyped; compare values, not storage dtype
    )
    # ROLLBACK is implicit: the source CSV is never touched, so failure leaves it intact.

Verification is the heart of it. We read the freshly written Parquet back, assert the column set is unchanged, and assert every value round-trips. (The check_dtype=False flag tells the comparison to check values, not storage types: a CSV is untyped, every field being just text, so the migrated Parquet will sensibly store a column as a number where the CSV had only characters, and we care that the values match, not that the storage type does. By default assert_frame_equal compares floating-point columns within a small relative tolerance; pass check_exact=True if the validation plan demands exact equality.) If assert_frame_equal raises, the migration aborts and the original CSV is untouched: rollback is "do nothing destructive in the first place." Only after verification passes does the tool's --promote step move the CSV into a retained-archive location; it is never deleted. The first rows of the real source, which the Parquet must reproduce value for value, look like this:

# examples/datasets/offline_assays.csv  (first rows; identical values after migration to Parquet)
sample_id,batch_id,sample_time,sample_point,VCD_e6_per_mL,viability_pct,glucose_g_L,lactate_g_L,glutamine_mM,ammonia_mM,osmolality_mOsm_kg,titer_g_L,pH_offline
BATCH-2026-001-OFF-001,BATCH-2026-001,2026-01-05 06:00:00+00:00,BR101,0.34,96.6,6.18,0.13,4.13,0.68,293,0.002,7.06
BATCH-2026-001-OFF-002,BATCH-2026-001,2026-01-05 18:00:00+00:00,BR101,0.43,96.6,6.26,0.19,4.31,0.38,292,0.008,7.04
BATCH-2026-001-OFF-003,BATCH-2026-001,2026-01-06 06:00:00+00:00,BR101,0.56,99.0,6.01,0.32,3.83,0.45,287,0.014,7.05

These are not opaque numbers: each row is a real process record. It holds viable cell density (VCD_e6_per_mL) and viability, the glucose/lactate and glutamine/ammonia metabolite pairs, osmolality, and the accumulating titer (the antibody product concentration) that is the batch's whole point. Transpose glucose_g_L and lactate_g_L, or titer_g_L and osmolality_mOsm_kg, and you have silently rewritten the process record while every column header still looks plausible, which is precisely the failure the round-trip assert_frame_equal is there to catch.
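
To make the transposition risk concrete, here is a minimal standard-library sketch that fingerprints each column and shows that a header-preserving swap is detected. It is an illustration under stated assumptions, not the repo's verification: the helper names are hypothetical, only three of the columns are used, and the comparison works on the raw text of each field, so a typed target such as Parquet would first need its values normalised to the same textual form.

```python
import csv
import hashlib
import io

# Three columns from the first rows of offline_assays.csv shown above.
SOURCE = """sample_id,glucose_g_L,lactate_g_L
BATCH-2026-001-OFF-001,6.18,0.13
BATCH-2026-001-OFF-002,6.26,0.19
BATCH-2026-001-OFF-003,6.01,0.32
"""


def column_fingerprints(text):
    """Map each column name to a SHA-256 digest of its values in row order."""
    rows = list(csv.DictReader(io.StringIO(text)))
    prints = {}
    for name in rows[0]:
        joined = "\n".join(row[name] for row in rows)
        prints[name] = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return prints


def swapped(text, a, b):
    """Simulate a faulty migration that keeps the headers but swaps two columns' values."""
    rows = list(csv.DictReader(io.StringIO(text)))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        row[a], row[b] = row[b], row[a]
        writer.writerow(row)
    return out.getvalue()


def changed_columns(before, after):
    """Return the sorted names of columns whose fingerprint differs."""
    b, a = column_fingerprints(before), column_fingerprints(after)
    return sorted(name for name in b if b[name] != a.get(name))


faulty = swapped(SOURCE, "glucose_g_L", "lactate_g_L")
detected = changed_columns(SOURCE, faulty)
```

The headers of the faulty output are identical to the source, yet both swapped columns are reported, which is the kind of evidence a header-only check would never produce.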

For migrations at the dataset scale — many files in object storage rather than rows in Postgres — the book reaches for Git-like data versioning. lakeFS gives object-storage datasets commit/branch/merge/revert with zero-copy branching, so you stage the whole format migration on a branch, run verification against it, and either merge it or revert to the prior immutable commit if the verify fails — a true, atomic rollback for terabytes [11]. DVC takes a lighter approach, capturing each dataset version as a small .dvc pointer file committed to Git, so the history of the data lives beside the history of the code and you can git checkout your way back to the exact prior content [12]. Both preserve dataset genealogy across change; lakeFS suits a shared S3-style store, DVC suits a repo-centric workflow. Either way the principle is identical to the SQL migrations: stage, verify, then promote — and keep a path back.
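
The stage, verify, promote principle can be sketched on a local filesystem without either tool. The following is a minimal illustration, not the lakeFS or DVC API: stage_verify_promote is a hypothetical helper, and the convert and verify callables are supplied by the caller.

```python
import os
import shutil
import tempfile
from pathlib import Path


def stage_verify_promote(source: Path, target: Path, convert, verify) -> bool:
    """Write to a staging file, verify it, and only then move it into place.

    The source is never modified or deleted. On a failed verification the
    staged file is discarded and the target is left exactly as it was.
    """
    # Stage beside the target so the final move stays on one filesystem.
    staging_dir = Path(tempfile.mkdtemp(dir=target.parent))
    staged = staging_dir / target.name
    try:
        convert(source, staged)
        if not verify(source, staged):
            return False
        os.replace(staged, target)  # single rename; atomic on POSIX filesystems
        return True
    finally:
        shutil.rmtree(staging_dir, ignore_errors=True)
```

Staging plays the role of the branch, the verify callable is the gate, and the rename is the merge. Because nothing destructive happens before the gate passes, the path back is simply the untouched source.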

Running it: deterministic outputs and the named tests that guard them

None of this is theory the reader takes on trust — the companion repo ships executable proof that each change behaves as described. Two pytest cases in examples/tests/test_chapters.py guard this chapter:

test_ch24_tag_remap_validates_committed_mapping loads the committed remap_PA01_to_PA02.csv through tag_remap.load_mapping and asserts the parsed aliases are exactly the three pairs (PA01.UV280.PV → PA02.UV280.PV), (PA01.Cond.PV → PA02.Cond.PV), (PA01.pH.PV → PA02.pH.PV), and that every effective date falls in 2026. If anyone edits the mapping into a malformed or duplicated state, the test fails before it ever reaches the database.
test_ch24_format_migrate_csv_to_parquet_roundtrips runs the real format_migrate.migrate against the real datasets/offline_assays.csv, then re-reads the Parquet and asserts the column set is unchanged, the row count matches the source (168 rows in the full shipped CSV — the snippet above shows only its first three), and the first sample_id is still BATCH-2026-001-OFF-001. This is the convert-verify step exercised end to end on production-shaped data.

Run them with pytest -k ch24 from examples/. The point is not the assertions themselves but what they encode: a migration whose correctness is asserted in code is one that a validation lifecycle can cite as evidence, rather than a screenshot of a successful run. (The validation lifecycle is the documented prove-it-works programme, computerised-system validation, that a firm must complete before relying on a GMP system; the industry abbreviates it CSV, which is unrelated to the comma-separated file format migrated above.)
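
As a rough picture of what "malformed or duplicated" means for a mapping file, here is a hedged standard-library sketch of such a check. It is not tag_remap.load_mapping, whose signature and file layout are not shown here: parse_mapping is a hypothetical function, the column names old_tag, new_tag and effective_date are assumptions, and the effective date in the sample text is a placeholder chosen only because it falls in 2026.

```python
import csv
import io
from datetime import date

# The three alias pairs named above; the effective_date values are placeholders.
MAPPING = """old_tag,new_tag,effective_date
PA01.UV280.PV,PA02.UV280.PV,2026-06-01
PA01.Cond.PV,PA02.Cond.PV,2026-06-01
PA01.pH.PV,PA02.pH.PV,2026-06-01
"""


def parse_mapping(text):
    """Parse alias rows; raise ValueError on a malformed, self-referential, or duplicate row."""
    pairs, seen = [], set()
    for row in csv.DictReader(io.StringIO(text)):
        old = (row.get("old_tag") or "").strip()
        new = (row.get("new_tag") or "").strip()
        if not old or not new or old == new:
            raise ValueError(f"malformed alias row: {row}")
        if old in seen:
            raise ValueError(f"duplicate old_tag: {old}")
        seen.add(old)
        # date.fromisoformat raises ValueError on a missing or invalid date.
        effective = date.fromisoformat((row.get("effective_date") or "").strip())
        pairs.append((old, new, effective))
    return pairs


aliases = parse_mapping(MAPPING)
```

A test in the style of the ones above would assert on the parsed pairs and on the year of each date, and would expect ValueError when a row is repeated.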

When migrations go wrong: the field-failure record

It is worth grounding why this discipline is non-negotiable, because the failure mode is not hypothetical. Data-integrity and audit-trail deficiencies are among the most-cited problems in GMP inspections: the FDA's own data-integrity guidance was issued precisely because "data integrity-related cGMP violations" had become a frequent basis for warning letters and import alerts, and it defines the audit trail as the secure, time-stamped record that allows reconstruction of every creation, modification, and deletion [4]. A migration that silently overwrites a setpoint, renames a tag in place, or transposes two columns destroys exactly that reconstructability — and it is invisible until an inspector asks the system to reproduce a historical batch and the numbers no longer match the record.

The regulatory texts are blunt about the specific risk. PIC/S PI 041-1 treats data migration as a validated activity in its own right: when software or formats change, the firm must demonstrate that the migrated data retain their original meaning and that old data remain readable, retaining the legacy system where a validated migration is not possible [5]. Annex 11 clause 4.8 says the same for any transfer to another format or system: validation "should include checks that data are not altered in value and/or meaning during this migration process" [1]. The convert-verify-promote tool above is the direct technical answer to that clause: the assert_frame_equal step is the documented check that value and meaning survived, and the refusal to delete the source is the legacy retention PIC/S requires. The lesson the field-failure record teaches is the one this whole chapter is built around: the changes that get firms cited by inspectors are almost never the ones that were reversible, validated, and additive.

Why a learning model cares which version was true

Every change in this chapter is also a moment of truth for any model trained on this plant — a soft sensor, a release predictor — because a model is only as honest as the timestamps under its training data. Two of this chapter's mechanisms are, read the right way, machine-learning safeguards.

The first is effective-dating as a leakage guard. When you assemble a training table by joining each batch to "the recipe it ran," the only correct join is the point-in-time one this chapter built: BATCH-2026-001 must be labelled with the pH 6.95 that was true on 5 January, not the 7.05 that became true in March. Join to the current row instead and you have committed label leakage — the model learns from a setpoint the batch never experienced, scores beautifully in backtest, and fails in production. The same valid_from/valid_to discipline that keeps an auditor honest is what keeps a feature table honest; Book 5's data chapter treats this temporal-join correctness as a precondition of any trustworthy fold.
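
The difference between the two joins is easy to see in a few lines. The sketch below is illustrative only: setpoint_as_of is a hypothetical helper, the year and the midnight-UTC instant of the 12 March change are assumptions, and the open-ended bounds are stand-ins for the database's infinity values.

```python
from datetime import datetime, timezone

UTC = timezone.utc
OPEN_START = datetime.min.replace(tzinfo=UTC)  # stand-in for an unbounded start
OPEN_END = datetime.max.replace(tzinfo=UTC)    # stand-in for valid_to = 'infinity'
CHANGE = datetime(2026, 3, 12, tzinfo=UTC)     # assumed year and time of the change

# (valid_from, valid_to, pH setpoint), each window half-open: [valid_from, valid_to)
PH_VERSIONS = [
    (OPEN_START, CHANGE, 6.95),
    (CHANGE, OPEN_END, 7.05),
]


def setpoint_as_of(versions, instant):
    """Return the one setpoint whose half-open window contains `instant`."""
    matches = [value for start, end, value in versions if start <= instant < end]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one version at {instant}, found {len(matches)}")
    return matches[0]


batch_start = datetime(2026, 1, 5, 6, 0, tzinfo=UTC)  # first sample of BATCH-2026-001
point_in_time = setpoint_as_of(PH_VERSIONS, batch_start)  # what the batch actually ran on
current_row = PH_VERSIONS[-1][2]                          # what a naive join would attach
```

The point-in-time lookup attaches 6.95 to the batch while the naive join attaches 7.05. Raising when the match count is not exactly one mirrors the guarantee the exclusion constraint gives at the database level.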

The second is the distinction between process drift and model drift, which the skid swap makes concrete. Replacing the PA01 batch skid with the PA02 PCC unit changes the input distribution a downstream model sees — different loading regime, different UV-trace shape — which is a covariate shift: the live process has genuinely moved, a real manufacturing signal the digital thread must preserve, not a defect. A model drifting stale against that moving process is the defect. Book 5's MLOps chapter draws exactly this line, and the change-control event here is its trigger: a skid swap or a recipe shift is precisely when a locked model must be re-checked for applicability domain — does a PA02 batch still resemble the PA01 batches the model was calibrated on? — and, if not, re-validated under its own change control rather than silently trusted out of its trained envelope. (This is the same calibration-transfer problem a Raman model faces on a probe swap, which Book 5's models-and-validation chapter treats as a fresh validation burden, never a free reset.)
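
One very simple form of an applicability-domain re-check is a range envelope: record the span of each input feature across the batches the model was calibrated on, then flag any new batch that falls outside it. The sketch below is an assumption-laden illustration, not Book 5's method: the feature names and every number are made up, and a real check would use a method justified in the model's validation plan.

```python
def envelope(training_rows):
    """Return {feature: (min, max)} over the rows the model was calibrated on."""
    names = training_rows[0].keys()
    return {
        name: (min(row[name] for row in training_rows),
               max(row[name] for row in training_rows))
        for name in names
    }


def outside_envelope(row, env):
    """Return the sorted features of `row` that fall outside the training range."""
    return sorted(name for name, (lo, hi) in env.items() if not lo <= row[name] <= hi)


# Made-up feature values, for illustration only.
PA01_TRAINING = [
    {"load_g_per_L": 30.0, "peak_uv_au": 1.8},
    {"load_g_per_L": 34.0, "peak_uv_au": 2.1},
]
PA02_BATCH = {"load_g_per_L": 52.0, "peak_uv_au": 2.0}

flagged = outside_envelope(PA02_BATCH, envelope(PA01_TRAINING))
```

A non-empty result is not a verdict on the model. It is the signal that the change-control event has moved the inputs and that re-validation needs to be considered.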

There is also a quieter benefit to the equipment genealogy and the unbroken tag history: they are the grouping key an honest validation split needs. Because every record traces — through s88.genealogy and the alias table — back to the batch and the skid that produced it, a leave-one-batch-out (grouped) cross-validation can put every reading from one batch wholly on one side of the train/test line, the discipline Book 5 makes default to stop near-twin sibling batches from leaking across the split. And when a model is itself deployed, its lineage — which dataset hash, which model version, which recipe epoch it scored against — is one more dated, additive fact recorded the same never-destroy-history way, so a later audit can walk from a released lot to the exact frozen model and the exact recipe version that touched it.
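
A grouped split of this kind needs nothing more than the batch identifier carried on every record. The following is a minimal sketch with hypothetical names: leave_one_batch_out is not a library function, BATCH-2026-002 is an invented second batch, and the x values are placeholders.

```python
def leave_one_batch_out(records, key="batch_id"):
    """Yield (held_out_batch, train, test) with each batch wholly on one side."""
    batches = sorted({record[key] for record in records})
    for held_out in batches:
        train = [r for r in records if r[key] != held_out]
        test = [r for r in records if r[key] == held_out]
        yield held_out, train, test


# Placeholder records; only the grouping key matters for the split.
RECORDS = [
    {"batch_id": "BATCH-2026-001", "x": 1},
    {"batch_id": "BATCH-2026-001", "x": 2},
    {"batch_id": "BATCH-2026-002", "x": 3},
]

folds = list(leave_one_batch_out(RECORDS))
```

The split is only as trustworthy as the key. If a skid swap had orphaned readings from their batch, those readings could not be assigned to a side at all, which is why the genealogy has to survive the change.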

Why it matters

Every other capability in this book (the historian, the contextualization views, the knowledge graph, the soft sensor) assumes the data underneath it is stable and truthful. Change is where that assumption goes to die. A recipe setpoint overwritten in place makes every historical batch report subtly wrong. A skid renamed instead of retired orphans years of chromatography traces. A format migration without verification can transpose two columns and nobody notices until a regulator does. The techniques here (effective-dating, exclusion constraints, reversible migrations, alias-based tag re-mapping, convert-verify-promote) are not gold-plating. They are the difference between a platform a quality unit will trust and one they will quarantine. And because every one of them adds a dated truth rather than erasing an old one, the audit trail you built in Chapter 23 survives intact through every change, which is precisely what Annex 11, ICH Q10, and the data-integrity guidance demand.

In the real world

In a validated GMP environment, none of these changes happens because an engineer feels like it. Each is preceded by a change-control record — a quality-managed document that states what is changing, the risk assessment, the validation impact, the approvals, and the back-out plan — and that is true whether your stack is open source or a wall of commercial systems. ICH Q12's Established Conditions and PACMP machinery exist precisely so that recurring changes (a feed-strategy tweak, a column resin re-qualification) can be pre-agreed with the regulator rather than re-litigated each time [3]. The tooling shown here implements the technical half of that; the SOPs, approvals, and validation deliverables are the operator's burden and are not something any download confers.

The honest open-source verdict for this chapter is comparatively kind. Schema migration is one area where OSS is genuinely strong: Sqitch and Flyway are mature, widely used, and produce exactly the deploy/revert/verify evidence a validation lifecycle wants, and Flyway's checksumming even gives you tamper-evidence on the migration scripts themselves. PostgreSQL's range types and exclusion constraints are a first-class answer to effective-dating that many expensive systems lack, needing nothing beyond the bundled btree_gist module when the constraint also compares a scalar key. The places where pure OSS still falls short are the familiar ones. The change-control workflow itself (electronic approvals, e-signatures on the change record, linkage to a validated quality system) is not something Sqitch provides: that lives in a commercial quality-management system or in the signing-service-plus-Keycloak hybrid the book builds in Chapter 24 (Keycloak being an open-source identity-and-access broker, paired here with a service that applies e-signatures). Automatic, validated point-in-time recovery, the safety net behind every migration, leans on the backup machinery configured in the next chapter. A shared, multi-partner data platform that cannot evolve without breaking its own history is not a platform anyone will build a process on.

Key terms

Change control — the GMP-mandated, quality-managed process for proposing, assessing, approving, and recording any change to a validated system or process. An element of the pharmaceutical quality system under ICH Q10.
Effective-dating (valid-time versioning) — storing a value with valid_from/valid_to so a recipe or mapping can be versioned without overwriting history; a point-in-time query returns whatever was true on a given date. Pairing this single (valid-time) axis with the Chapter 23 transaction-time audit log is what makes the record fully bitemporal.
Exclusion constraint — a PostgreSQL constraint (EXCLUDE USING gist ... WITH &&) that, unlike UNIQUE, forbids two rows whose time ranges overlap — the database-level guard for non-overlapping versions.
Range type — PostgreSQL's tstzrange/daterange types modeling a span with bounds and an overlap operator, the basis for effective-dated validity periods.
Version pair (closed/open row) — the two-row result of one effective-dated change: the outgoing row's valid_to is closed to the effective instant while a new row opens there with valid_to = 'infinity', so a point-in-time query for any date matches exactly one row.
Half-open window ([)) — an interval tstzrange(valid_from, valid_to, '[)') that includes its start and excludes its end, so the boundary instant belongs to exactly one version and adjacent windows can butt without overlapping.
Sqitch plan line — the single append-only entry in sqitch.plan for one change: its name, planned timestamp, committer, # note, and any requires: dependency — the change's audit-trail header.
Sqitch — an MIT-licensed database-change framework; each change is a trio of deploy, revert, and verify scripts, and a deploy run with verification enabled (sqitch deploy --verify) checks each change and reverts on failure.
Flyway — a versioned-migration tool that applies each script once with a checksum and offers U-prefixed Undo migrations (but advises pairing undo with a restorable backup).
Established Conditions / PACMP — ICH Q12 mechanisms: Established Conditions define what is legally fixed in a process, and a Post-Approval Change Management Protocol (PACMP) pre-agrees how future changes will be made and reported.
Tag re-mapping / alias — recording an old→new tag correspondence (per ISA-95 Part 7's Alias Service Model) so a measurement keeps one logical identity across an equipment swap, without rewriting historic readings.
Apache Parquet — a self-describing columnar file format that embeds its own schema, enabling schema evolution and verifiable format migration.
lakeFS / DVC — Git-like versioning for datasets: lakeFS gives commit/branch/revert over object storage with zero-copy branching; DVC tracks data versions via lightweight .dvc pointer files in Git.
Convert-verify-promote — the safe data-migration pattern: write the new format, read it back and assert value-identity, and only then promote it, never deleting the validated source.
Value-identity bridge (sameSignalAs) — the RDF/SHACL reading of the tag alias: a predicate asserting that two named tags denote one logical signal across an equipment swap, the open-source twin of ISA-95 Part 7's Alias Service Model and Book 4's identifier modeling; deliberately not owl:sameAs, which would wrongly merge the tags' distinct metadata.
Temporal-join leakage guard — using the point-in-time (valid_from/valid_to) join when building a model's training table, so a batch is labelled with the recipe that was true when it ran, not a later version; the same effective-dating that keeps an auditor honest prevents label leakage.
Process drift vs model drift — a recipe change or skid swap is process drift (the live process genuinely moving, a real signal the thread preserves) and is the trigger to re-check a model's applicability domain and re-validate it; model drift (a predictor going stale against that moving process) is the defect to detect.

Where this leads

The platform can now evolve — recipes versioned, skids swapped, formats migrated — without ever breaking the record. But evolution is only half of "operating at scale." A change is only as safe as the backup behind it, the monitoring that catches a failed migration at 3 a.m., the network segmentation that keeps the OT (operational technology — the plant-floor control and instrumentation network) side isolated, and the supply-chain discipline that keeps a pinned image from becoming a vulnerability. In the next chapter, Operating, Scaling & Securing the Platform, we turn this from a thing that runs on a laptop into a thing you could responsibly run in production: backup and point-in-time recovery, TLS (the encryption that protects data in transit) and zone-and-conduit segmentation (the industrial-network practice of grouping systems into trust zones and allowing traffic between them only through controlled conduits), self-monitoring, and the CVE-watch runbook (CVE being a publicly catalogued security vulnerability) that treats even the security scanners as validated suppliers.
