ALCOA+ by Construction: Integrity in Code

📍 Where we are: Part V · Trust — Chapter 23. The platform — the PostgreSQL-based data system we have been building since Chapter 1 — now captures, stores, contextualizes, and visualizes the batch (the production run whose records flow through it). This chapter makes the data trustworthy by construction: we encode the data-integrity rules every inspector cares about straight into PostgreSQL schema, triggers, and a hash chain — and we write tests that prove the guarantees hold.

The simple version

Imagine a lab notebook where every page is numbered, written in indelible ink, signed and dated as you go, and bound so tightly that you cannot tear a page out, slip one in, or shuffle the order without it showing. If someone did tear out or reshuffle pages, the page numbers would stop adding up and everyone would notice. That is what we build here, in code: every change to a regulated record is appended (never overwritten), stamped with who, when, and why, and cryptographically linked to the change before it, so the sequence of the history is tamper-evident. We will be precise about the limits: the link check catches a deleted, reordered, or spliced entry, but not a silent edit to an entry's contents that leaves the links intact. And tamper-evident is not tamper-proof, which no SQL file can promise. It is nonetheless the bar a regulator actually sets, and we will say exactly how far this reaches.

What this chapter covers

For nineteen chapters we have moved data from sensors into a clean, contextualized, queryable platform. Now we have to make a regulator trust it. The shorthand for that trust is ALCOA+ — data must be Attributable, Legible, Contemporaneous, Original, and Accurate (those first five letters spell ALCOA), plus the four the + adds: Complete, Consistent, Enduring, and Available [1]. It is the spine of the FDA's data-integrity expectations [2], EU Annex 11 (the annex of the EU's GMP — Good Manufacturing Practice, the legally enforced quality rules every medicine maker must follow — guide that governs computerized systems) [3], and the PIC/S (Pharmaceutical Inspection Co-operation Scheme — the international body that aligns GMP inspectors) inspector framework [4].

The temptation is to treat ALCOA+ as a checklist you audit after the fact. This chapter argues the opposite: integrity is cheapest and strongest when engineered in. We will:

map each ALCOA+ attribute to a concrete schema or pipeline mechanism (append-only, quality flags, attributable metadata);
build a trigger-based audit trail in PostgreSQL that records old/new/who/when/why for every change to a regulated table;
hash-chain that audit log so that deleting, reordering, or splicing entries becomes detectable;
and run a pytest suite that mechanically asserts the guarantees — append-only behavior, an intact chain, captured updates.

Apart from one line of setup and the clearly marked demos, every snippet here comes from two real, tested files in the companion repo. The first is the DDL (Data Definition Language, the SQL that creates tables, triggers, and functions) in examples/platform/db/50-alcoa.sql. It is applied automatically when the database first initializes: the db/ directory is mounted (made visible inside the container) at Postgres's /docker-entrypoint-initdb.d, a special folder Postgres auto-runs on first start, so make up brings the stack up and runs the DDL, and make seed then adds rows that trigger logging. The second is the set of assertions in examples/tests/test_db.py, run by make test. You can break the chain on your own laptop and watch the test go red.

ALCOA+ is a set of design requirements, not a poster

Before any code, it helps to read ALCOA+ as a list of engineering requirements, because that is what it becomes once you stop treating it as compliance wallpaper. The MHRA's (the UK's Medicines and Healthcare products Regulatory Agency) guidance is the cleanest statement of the attributes and of the crucial point that metadata is part of the record — the who and when around a value are as regulated as the value itself [5].

ALCOA+ attribute	What it demands of the system	Where we enforce it in code
Attributable	Every value traces to a person or device	`db_user` / `app_user` columns on the audit log
Legible	Records are readable and permanent	`jsonb` old/new snapshots, plain SQL, durable storage
Contemporaneous	Recorded at the time the event happened	`clock_timestamp()` server timestamp on each row
Original	The first capture is preserved, not overwritten	Append-only `change_log`; never `UPDATE`/`DELETE`
Accurate	Values are correct and quality-flagged	legacy OPC DA (an older industrial data protocol) `quality` flag on every reading
Complete	Nothing is silently dropped, including changes	A trigger fires on every INSERT/UPDATE/DELETE
Consistent	The sequence is intact and ordered	Monotonic (always-increasing) `seq` identity + the hash chain
Enduring / Available	Survives and is retrievable	PostgreSQL durability + retention (Ch 26)

Two of these were already paid for in earlier chapters. Accurate rides on the quality flag the historian (the time-series database that stores process readings, built in Chapter 9) carries on every reading — recall the column from examples/platform/db/20-historian.sql, where the legacy OPC DA status codes (192 Good, 64 Uncertain, 0 Bad — bit-coded values defined by the OPC standard, not arbitrary) Chapter 7 established are stored alongside the value so a downstream consumer can never mistake a Bad reading for a trustworthy one. Contemporaneous rides on the fact that the collector stamps each reading at acquisition time rather than at insert time. This chapter's job is the harder three — Attributable, Original, Complete — and the tamper-evidence that ties Consistent together. The remaining three — Legible, Enduring, Available — fall out of the table's design (the readable jsonb snapshots, PostgreSQL's durable storage, and the retention work in Chapter 26) rather than needing their own machinery here.

The audit trail: a trigger that watches the regulated tables

The mechanism is a single PostgreSQL trigger function attached to the tables that hold GMP records. A database trigger is a small function the database runs automatically whenever a row changes; a row-level trigger runs once per affected row and sees the OLD (before) and NEW (after) versions of that row plus the operation type — INSERT, UPDATE, or DELETE — in the variable TG_OP. That is exactly the raw material an audit trail needs [8]. Here is the append-only log table, from examples/platform/db/50-alcoa.sql:

CREATE TABLE audit.change_log (
    seq        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    ts         timestamptz NOT NULL DEFAULT clock_timestamp(),
    db_user    text NOT NULL DEFAULT current_user,
    app_user   text,                              -- set via SET app.user = '...'
    table_name text NOT NULL,
    action     text NOT NULL,                     -- INSERT | UPDATE | DELETE
    row_key    text,
    old_row    jsonb,
    new_row    jsonb,
    reason     text,                              -- set via SET app.reason = '...'
    prev_hash  text,
    row_hash   text NOT NULL
);

A word on the SQL types, for readers new to them: bigint is a large integer, text is a string, timestamptz is a timestamp with time zone, and jsonb is a JSON value stored in a binary form the database can index and query. NOT NULL forbids a missing (NULL) value, DEFAULT supplies a value when none is given, and PRIMARY KEY marks the column that uniquely identifies each row.

Read this table as ALCOA+ rendered in columns. db_user and app_user make a change Attributable — the first to the database role, the second to the human the application authenticated (we set it with SET app.user = '...', and Chapter 24 wires it to a real Keycloak identity — Keycloak being the open-source login/identity manager introduced there). ts defaulting to clock_timestamp() makes it Contemporaneous — note it is clock_timestamp(), the actual wall-clock instant the row is written, not now(), which would freeze at the start of the transaction (the group of statements committed together), reporting the same time for every row in a batch of writes. old_row and new_row as jsonb keep the record Legible and Original: the full before-and-after image of the row is preserved, not just a diff. And the table is append-only by use — nothing in the system ever issues an UPDATE or DELETE against it. The reason column is the regulator's favorite, capturing the why behind every change.

Anatomy of an audit.change_log row: twelve columns, one link

The whole chapter lives in one row of this table, so it is worth dissecting field by field — the way Chapter 7 took apart an OPC UA node (the modern successor to the legacy OPC DA named above) and Chapter 9 took apart one historian reading. Take the row a verify_chain()-clean log would hold after analyst jdoe corrects the glucose of sample BATCH-2026-001-OFF-014 to 7.8 g/L after re-reading the bench-analyzer printout, and read it column by column: the first six say what and who and when, the seventh (row_key) names which record changed, the next three carry the before, after, and why, and the last two — prev_hash and row_hash — are the link that makes the sequence tamper-evident.

Figure: One audit.change_log row as an identity card. Twelve columns (seq, ts, table_name, action, db_user, app_user, row_key, old_row, new_row, reason, prev_hash, row_hash) capture who changed what, when, why, the before-and-after image, and the SHA-256 link (prev_hash then row_hash) that binds the row to the one before it; the prev_hash and row_hash pair is highlighted as the most important field. Original diagram by the authors, created with AI assistance.

Four of the twelve columns carry the chapter's load. seq is GENERATED ALWAYS AS IDENTITY — a value PostgreSQL assigns and the application cannot override, so the order of history is the database's to keep, not the writer's. row_key is not a raw primary key but a coalesced one: the trigger computes coalesce(new_row ->> 'batch_id', old_row ->> 'batch_id', new_row ->> 'sample_id', old_row ->> 'sample_id'), where ->> pulls a named field out of the JSON row as text and coalesce(...) returns the first of those that is not null. So a change to a batch keys on its batch_id and a change to a lab result keys on its sample_id (here BATCH-2026-001-OFF-014, a real in-process sample from offline_assays.csv) — one column that points at the changed record whatever table it came from. And prev_hash/row_hash are the pair we unpack next: every other column is evidence, but these two are the chain.
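The coalescing rule can be mirrored in a few lines of Python to make its precedence visible. This is only an illustrative sketch: row_key() and KEY_FIELDS are hypothetical names, and plain dicts stand in for the jsonb snapshots the trigger actually sees.

```python
from typing import Optional

# Fields tried in order, mirroring the trigger's coalesce(...) argument list.
KEY_FIELDS = ("batch_id", "sample_id")


def row_key(old_row: Optional[dict], new_row: Optional[dict]) -> Optional[str]:
    """Return the first non-null key field, checking NEW before OLD per field."""
    for field in KEY_FIELDS:
        for row in (new_row, old_row):
            if row is not None and row.get(field) is not None:
                return str(row[field])  # ->> always yields text
    return None  # coalesce(...) over all NULLs is NULL


# UPDATE on a lab result: no batch_id present, so sample_id is the key.
print(row_key({"sample_id": "BATCH-2026-001-OFF-014"},
              {"sample_id": "BATCH-2026-001-OFF-014", "value": 7.8}))

# DELETE on a batch: NEW is absent, so the key falls back to OLD.
print(row_key({"batch_id": "BATCH-2026-001"}, None))
```

One consequence of the ordering is worth knowing: a row that carries both fields keys on batch_id, because it is tried first.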

Where this row comes from in the trilogy

This audit.change_log row is the third stop on a chain that runs the length of the trilogy. The physical event it records — an operator or analyst correcting a critical process parameter or an out-of-specification result — is the GMP change discussed in Book 1, Quality, Regulatory, and Data. Book 2 turns that change into a data-integrity problem, mapping the very 12-column row above onto the nine ALCOA+ attributes and asking how a system can make it trustworthy: Data Integrity and ALCOA+. This chapter is the answer in code — the trigger, the schema, and the hash chain that implement what those two chapters demand.

The trigger function does the work. Also from examples/platform/db/50-alcoa.sql:

CREATE OR REPLACE FUNCTION audit.log_change() RETURNS trigger AS $$
DECLARE
    v_prev  text;
    v_key   text;
    v_old   jsonb := CASE WHEN TG_OP = 'INSERT' THEN NULL ELSE to_jsonb(OLD) END;
    v_new   jsonb := CASE WHEN TG_OP = 'DELETE' THEN NULL ELSE to_jsonb(NEW) END;
    v_app   text  := current_setting('app.user', true);
    v_reason text := current_setting('app.reason', true);
    v_hash  text;
BEGIN
    SELECT row_hash INTO v_prev FROM audit.change_log ORDER BY seq DESC LIMIT 1;
    v_key := coalesce((v_new ->> 'batch_id'), (v_old ->> 'batch_id'),
                      (v_new ->> 'sample_id'), (v_old ->> 'sample_id'));
    -- chain hash = H(prev_hash || payload)
    v_hash := encode(digest(
        coalesce(v_prev, '') || TG_TABLE_NAME || TG_OP ||
        coalesce(v_old::text, '') || coalesce(v_new::text, '') ||
        coalesce(v_app, '') || clock_timestamp()::text, 'sha256'), 'hex');

    INSERT INTO audit.change_log(app_user, table_name, action, row_key,
                                 old_row, new_row, reason, prev_hash, row_hash)
    VALUES (v_app, TG_TABLE_NAME, TG_OP, v_key, v_old, v_new, v_reason, v_prev, v_hash);
    RETURN coalesce(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

Three lines deserve a closer look. to_jsonb(OLD) and to_jsonb(NEW) serialize the entire row to JSON, so the log is schema-agnostic — it works regardless of which columns a table has, letting the same function audit a lab result, a batch, or a recipe parameter without one column of bespoke (table-specific, hand-written) code. current_setting('app.user', true) reads a session variable the application sets at the start of a transaction; the true makes it return NULL rather than erroring if the variable was never set. And the digest(...,'sha256') call is the hash chain, which we unpack next.

The trigger is attached to exactly the tables that hold regulated records, also in examples/platform/db/50-alcoa.sql:

CREATE TRIGGER audit_result   AFTER INSERT OR UPDATE OR DELETE ON lab.result
    FOR EACH ROW EXECUTE FUNCTION audit.log_change();
CREATE TRIGGER audit_batch    AFTER INSERT OR UPDATE OR DELETE ON s88.batch
    FOR EACH ROW EXECUTE FUNCTION audit.log_change();
CREATE TRIGGER audit_recipe_p AFTER INSERT OR UPDATE OR DELETE ON s88.recipe_parameter
    FOR EACH ROW EXECUTE FUNCTION audit.log_change();

These are the three tables an inspector cares most about: analytical results (lab.result), the batch record itself (s88.batch), and the recipe parameters that define how the product is made (s88.recipe_parameter, which Chapter 27 versions in place using its valid_from/valid_to columns). Because the triggers fire on every INSERT OR UPDATE OR DELETE, nothing slips by — that is the Complete attribute made mechanical.

The hash chain: linking the records together

An append-only log is only as honest as the storage under it. A determined insider with table access could, in principle, reach into audit.change_log and reorder rows, delete a middle entry, or splice in a fabricated one. We make that class of attack detectable with a technique older than the cloud: a linked hash chain, first described by Haber and Stornetta in 1991 as a way to time-stamp digital documents so that the sequence cannot be quietly reordered or altered [7]. It is the same construction that later underpinned blockchains — but we need none of the distributed-consensus machinery, just the linking.

How prev_hash binds the rows

A SHA-256 hash is a short, fixed-length fingerprint (always 64 hexadecimal characters) computed from some input; change a single byte of the input and the fingerprint changes completely and unpredictably. That one property — same input always gives the same fingerprint, any change gives a different one — is what makes tampering detectable.
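The fingerprint property is easy to see with Python's standard hashlib, used here purely as a stand-in for the in-database digest() call; sha256_hex() is a hypothetical helper and the input strings are made up for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = sha256_hex("glucose=7.8")
b = sha256_hex("glucose=7.9")  # one character changed

# Fixed length, different for a changed input, identical for the same input.
print(len(a), a != b, a == sha256_hex("glucose=7.8"))  # prints: 64 True True
```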

The idea is simple. When the trigger writes a row, it computes a SHA-256 hash over that row's payload and the stored hash of the row before it, then saves both values: the prior row's hash in prev_hash and the new digest in row_hash. Conceptually:

row_hash[n] = SHA256( row_hash[n-1] || table || op || old || new || app_user || clock_timestamp() )
prev_hash[n] = row_hash[n-1]

The digest is not hashing one thing but a concatenation of seven: the prior row's hash, then this row's table, op, old, new, app_user, and a timestamp — joined with || (the SQL operator that glues strings end to end, not a logical OR here) and fed to digest(..., 'sha256') in one call. The first part is the link to the past; the rest is this row's own payload. Breaking it open part by part shows both why a relinked or reordered row stops verifying and why the digest is not reproducible from the saved columns:

Figure: The row_hash digest broken open into its seven concatenated parts. prev_hash links to the row before; this row's table, op, old, new, and app_user form the payload; and a fresh clock_timestamp(), never persisted, is folded in, which is why the 64-hex-character row_hash cannot be reproduced from the stored columns. Original diagram by the authors, created with AI assistance.
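The same construction can be sketched in Python to show both the link and the reproducibility problem. This is a model under stated assumptions, not the repo's code: hashlib stands in for pgcrypto, json.dumps stands in for PostgreSQL's jsonb-to-text rendering (so digests will not match the database's byte for byte), and chain_hash() and append_change() are hypothetical names.

```python
import hashlib
import json
from datetime import datetime, timezone


def _text(value) -> str:
    """Render a snapshot as text; None becomes '' like coalesce(x::text, '')."""
    return "" if value is None else json.dumps(value, sort_keys=True)


def chain_hash(prev_hash, table, op, old, new, app_user, stamp) -> str:
    """SHA-256 over the seven parts, concatenated in the trigger's order."""
    parts = [prev_hash or "", table, op, _text(old), _text(new),
             app_user or "", stamp]
    return hashlib.sha256("".join(parts).encode("utf-8")).hexdigest()


def append_change(log, table, op, old, new, app_user):
    """Append one row, linking it to the current tail of the log."""
    prev = log[-1]["row_hash"] if log else None
    stamp = datetime.now(timezone.utc).isoformat()  # hashed but never stored
    row = {
        "seq": len(log) + 1,
        "table_name": table,
        "action": op,
        "old_row": old,
        "new_row": new,
        "app_user": app_user,
        "prev_hash": prev,
        "row_hash": chain_hash(prev, table, op, old, new, app_user, stamp),
    }
    log.append(row)
    return row


log = []
append_change(log, "batch", "INSERT", None, {"batch_id": "BATCH-2026-001"}, None)
append_change(log, "result", "UPDATE",
              {"sample_id": "BATCH-2026-001-OFF-014"},
              {"sample_id": "BATCH-2026-001-OFF-014", "value": 7.8}, "jdoe")

# The second row's prev_hash is the first row's row_hash: that is the link.
print(log[1]["prev_hash"] == log[0]["row_hash"])
```

Because stamp is discarded once the row is written, nothing left in the row dict is enough to recompute row_hash later, which is the same gap the stored columns have.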

Two honesty notes about this digest, because the chapter's whole point is to claim only what the code delivers. First, the hash folds in clock_timestamp() evaluated inside the trigger — a wall-clock instant that is not the value persisted in the ts column (which has its own independent clock_timestamp() default). Nothing in the stored row records the exact timestamp that went into the hash, so row_hash is not reproducible from the saved columns. Second, and as a consequence, the shipped verifier never recomputes row_hash from the payload at all — it only checks that the links between the stored prev_hash and row_hash columns are consistent. We will see exactly what that does and does not catch.

The pgcrypto extension (an optional add-on module you turn on inside PostgreSQL) provides the in-database digest() and encode() functions that compute and hex-encode that SHA-256 [9]: digest(...) produces the raw 32-byte hash and encode(..., 'hex') renders it as the 64-character hex text you see stored. It is enabled once, at the top of the stack, in examples/platform/db/00-init.sql:

CREATE EXTENSION IF NOT EXISTS pgcrypto;     -- digest() for the ALCOA+ hash chain

Because each row stores the previous row's hash in its prev_hash column, the log becomes a chain whose links can be checked: delete a row, reorder rows, or overwrite a prev_hash/row_hash value, and the stored links stop lining up. That is the property the verifier tests — and, crucially, the only one it tests.

Figure: The audit log as a linked hash chain, drawn as a vertical chain of rows (each showing seq, action, who, and old/new JSON). Each row stores the prior row's row_hash in its prev_hash column, so deleting, reordering, or overwriting a hash-link column breaks the stored links, and verify_chain() flags the first seq where a prev_hash no longer matches the previous row's row_hash (shown as a red link). Note that the verifier checks link consistency only: it does not recompute hashes, so a silent edit to a row's payload that leaves the hash columns alone is not caught here. Original diagram by the authors, created with AI assistance.

Detecting a broken link is its own function, the last block in examples/platform/db/50-alcoa.sql:

-- Verify the chain is intact: returns rows where a stored prev_hash does not
-- equal the previous row's stored row_hash (a broken/reordered/deleted link).
CREATE OR REPLACE FUNCTION audit.verify_chain()
RETURNS TABLE(seq bigint, ok boolean) AS $$
    WITH chained AS (
        SELECT c.seq, c.row_hash, c.prev_hash,
               lag(c.row_hash) OVER (ORDER BY c.seq) AS expected_prev
        FROM audit.change_log c
    )
    SELECT seq, (prev_hash IS NOT DISTINCT FROM expected_prev) AS ok
    FROM chained
    WHERE prev_hash IS DISTINCT FROM expected_prev;
$$ LANGUAGE sql;

The lag(...) OVER (ORDER BY c.seq) window function walks the log in sequence order and, for each row, looks back at the previous row's stored row_hash. If that does not match the prev_hash the current row recorded, the stored link is broken (the IS DISTINCT FROM / IS NOT DISTINCT FROM operators do this comparison in a way that treats two NULLs as equal, so the first row's empty prev_hash is handled cleanly). A healthy chain returns zero rows — every link is consistent. Any row it does return points at where the stored links stopped lining up: that seq is the first place the chain failed to add up. Be precise about scope: this compares only the prev_hash and row_hash columns. It does not re-derive a hash from old_row/new_row/app_user, so it detects deleted, reordered, or relinked rows — not a silent edit to a row's payload that leaves the hash columns untouched. We will use exactly the attack it can catch in the demo below, and return to the gap it leaves in "In the real world." (The repo's own comment on this function is careful to match: it says the verifier "returns rows where a stored prev_hash does not equal the previous row's stored row_hash" and adds "this checks link consistency only; it does NOT recompute row_hash from the payload" — deliberately not describing it as recomputing a hash, because nothing is recomputed.)
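For readers more at home in Python than in window functions, the lag-and-compare logic can be modeled as a single pass over the rows. This is a sketch, not the shipped function: verify_chain() here is a Python stand-in operating on plain dicts, and the hash values are shortened placeholders.

```python
def verify_chain(rows):
    """Return the seq of every row whose stored prev_hash does not equal the
    previous row's stored row_hash, walking the rows in seq order."""
    broken = []
    expected_prev = None  # lag() yields NULL for the first row
    for row in sorted(rows, key=lambda r: r["seq"]):
        # None == None is True in Python, matching IS NOT DISTINCT FROM.
        if row["prev_hash"] != expected_prev:
            broken.append(row["seq"])
        expected_prev = row["row_hash"]
    return broken


# Placeholder hashes, shortened for readability.
healthy = [
    {"seq": 1, "prev_hash": None, "row_hash": "h1"},
    {"seq": 2, "prev_hash": "h1", "row_hash": "h2"},
    {"seq": 3, "prev_hash": "h2", "row_hash": "h3"},
]
print(verify_chain(healthy))  # []

# Overwrite one stored link.
relinked = [dict(r) for r in healthy]
relinked[2]["prev_hash"] = "deadbeef"
print(verify_chain(relinked))  # [3]

# Delete a middle row: row 3 now follows row 1, whose hash it never recorded.
middle_deleted = [r for r in healthy if r["seq"] != 2]
print(verify_chain(middle_deleted))  # [3]
```

Nothing in this pass reads old_row, new_row, or app_user, which is the code-level reason a payload edit is invisible to it.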

A healthy log looks like this when you query it directly. The rows are illustrative and the analyst names stylized: the seed leaves app_user NULL and the tests write as pytest, so these exact seed and jdoe rows are representative rather than seeded.

 seq |             ts             | app_user | table_name | action |        row_key         |  prev_hash  |  row_hash
-----+----------------------------+----------+------------+--------+------------------------+-------------+-------------
   1 | 2026-01-05 00:00:00.142+00 | seed     | batch      | INSERT | BATCH-2026-001         | (null)      | 9f2a...c41b
   2 | 2026-01-05 00:00:00.197+00 | seed     | result     | INSERT | BATCH-2026-001-OFF-001 | 9f2a...c41b | 1ce8...77a0
   3 | 2026-01-13 09:14:22.030+00 | jdoe     | result     | UPDATE | BATCH-2026-001-OFF-014 | 1ce8...77a0 | b430...e9f2

Row 3 is an analyst correcting a result on January 13 — sample BATCH-2026-001-OFF-014's glucose re-keyed to its true 7.8 g/L: app_user is jdoe, the action is UPDATE, and its prev_hash equals row 2's row_hash. The links add up, so verify_chain() returns nothing.

Figure: Flow from a regulated row through an AFTER ROW trigger that fans out to the append-only audit.change_log and the row_hash link, both feeding audit.verify_chain().

Proving it with tests, not promises

A guarantee you have not tested is a hope. The companion repo treats the integrity rules as executable acceptance criteria, asserting them with a pytest suite — pytest being Python's test runner, and an assert statement a line that fails the test if its condition is not true; pytest's plain-assert rewriting lets a one-line assertion stand in for a full integrity check [10]. These tests run against the live stack (the running set of services brought up by make up && make seed && make load — make load ingests the sample data — and torn down again) and skip cleanly if the database is unreachable. From examples/tests/test_db.py:

def test_alcoa_chain_intact(conn):
    assert _scalar(conn, "select count(*) from audit.change_log") > 0
    assert _scalar(conn, "select count(*) from audit.verify_chain()") == 0  # 0 broken links

That second assertion is the whole chapter in one line: the chain has entries, and zero of its links are broken. The next test is the more interesting one, because it actually exercises an edit and proves the trail captured it:

def test_audit_captures_update(conn):
    # an UPDATE must record old + new + who + why and keep the chain intact
    with conn.cursor() as cur:
        cur.execute("select set_config('app.user','pytest',false), "
                    "set_config('app.reason','test correction',false)")
        cur.execute("update lab.result set value = value where result_id = "
                    "(select result_id from lab.result limit 1)")
        conn.commit()
    last = _scalar(conn, "select action from audit.change_log "
                         "where app_user='pytest' order by seq desc limit 1")
    assert last == "UPDATE"
    assert _scalar(conn, "select count(*) from audit.verify_chain()") == 0

Walk through what it proves. set_config('app.user','pytest', ...) and set_config('app.reason', ...) set the attributable who and the reason why — exactly the session variables the trigger reads. The update lab.result set value = value is a deliberately trivial edit (it sets the value to itself), yet the trigger still fires and logs an UPDATE row, because the audit trail records the act of changing, not just net differences. The suite then asserts the most recent row by pytest is an UPDATE and that the chain is still intact afterward. Attributable, Original, Complete, Consistent — all four, mechanically verified, on a laptop. (This in-place UPDATE is purely to show the trigger captures any change; the production correction path is Chapter 10's pattern of appending a new verified status row under a UNIQUE(sample_id, test_id, result_ts) constraint — a rule that forbids two rows sharing that combination, so each correction lands as a distinct timestamped row — never mutating the original in place.)

What the verifier catches — and what it does not

To watch the verifier catch tampering, you do not need new code; you reach into the log and break a link directly. The attack verify_chain() is built to detect is one that disturbs the stored hash columns — deleting a row, reordering rows, or splicing the chain. Here we sever the link by overwriting one row's prev_hash:

-- Simulate an insider trying to splice the chain by relinking row 3.
UPDATE audit.change_log SET prev_hash = 'deadbeef'
WHERE seq = 3;
SELECT * FROM audit.verify_chain();

 seq | ok
-----+----
   3 | f

Row 3's stored prev_hash no longer equals row 2's stored row_hash, so the link no longer lines up — the verifier flags seq = 3 as the first broken link, and test_alcoa_chain_intact would turn red on the next make test. Deleting row 2 outright, or swapping the seq ordering, breaks the links the same way and is caught the same way. The tampering did not stay hidden.

Be equally clear about what this verifier does not catch, because it is the difference between tamper-evident and a claim we cannot back. If instead of breaking a link the insider silently edits a row's payload — UPDATE audit.change_log SET new_row = jsonb_set(new_row, '{value}', '99.9') WHERE seq = 2; — and leaves the prev_hash/row_hash columns alone, then verify_chain() returns zero rows: every stored link still lines up, and the edit goes undetected. Closing that gap needs a verifier that recomputes each row_hash from the payload and compares it to the stored value — which, as noted above, the current schema cannot support because the hashed clock_timestamp() is never persisted. We name that limitation honestly here and revisit the fix (hash the stored ts, store the chain head off-database) in "In the real world" and Chapter 24.

Why it matters

Data integrity is among the most frequently cited themes in FDA warning letters and EU GMP findings. That is not because companies set out to falsify data, but because their systems make integrity optional: an audit trail you can switch off, a value you can overwrite with no trace, a timestamp the user can set. ALCOA+ is the language regulators use to describe what "trustworthy" means, and Annex 11 and 21 CFR Part 11 (Part 11 of Title 21 of the Code of Federal Regulations, the FDA's electronic-records and electronic-signatures regulation) turn it into law for computerized systems [3][2].

What an audit-trail failure looks like in the field

The failure mode is not exotic. The FDA's own data-integrity guidance is organized around the questions it keeps having to answer — whether audit trails must be reviewed (yes, as part of routine record review), whether shared logins are acceptable (no, because they destroy Attributable), whether a system that lets users disable the audit trail or backdate an entry is acceptable (no) — which is a fair inventory of the deficiencies inspectors actually write up [1]. The recurring shapes are concrete: an analyst reusing a shared account so no change traces to a person; "testing into compliance" by overwriting an out-of-specification result (one that fell outside its approved acceptance limit) — say an aggregate-level result (SEC HMW%, the high-molecular-weight fraction measured by size-exclusion chromatography) that came back above its 3.0% spec — with a passing one and leaving no before image; a clock the operator can set, defeating Contemporaneous; or an audit-trail feature present in the software but switched off in configuration. PIC/S PI 041 catalogs the same patterns and tells inspectors to look for exactly them [4], and the MHRA's guidance pins down why they matter — the metadata around a value (who, when, why) is itself part of the regulated record, so losing it is losing the record [5].

Every one of those failures is designed out of the schema we built. A shared login no longer erases attribution, because the trigger records both db_user and the application identity in app_user on every write (provided the application sets app.user from an authenticated identity, which Chapter 24 wires up; the column is nullable, and the seed leaves it NULL). There is no overwrite-without-trace, because an UPDATE to lab.result also appends an audit row carrying the old and new images, so the before image survives. There is no user-settable clock, because ts is the server's clock_timestamp(). And there is no off-switch a routine user can reach, because the trigger fires on every operation by construction. The point of "by construction" is precisely that the common warning-letter findings stop being possible to commit by accident.

Building integrity into the schema flips the default. With the trigger attached, an ordinary database role has no path to change a regulated row that does not also write an attributable, reasoned, timestamped, hash-linked audit entry: the right thing is the only thing the database lets you do. (Privileged roles are the exception, taken up in "In the real world.") That is what "by construction" means, and it is far stronger than a procedure that asks people to remember to document changes. The FDA's own data-integrity guidance frames audit trails and attributable metadata as a CGMP expectation that must be implemented, not merely promised [1]; we have implemented it in about seventy lines of SQL.

It also makes review-by-exception possible — a reviewer inspects only the records that were changed or that deviated (a GMP deviation — a documented departure from the approved procedure or spec), not every record. An inspector or a QA reviewer can run one query — SELECT * FROM audit.change_log WHERE table_name = 'result' AND action = 'UPDATE' — and see every correction ever made to a result, with the before image, the after image, the analyst, and the reason. That is the audit-trail review that Part 11 and PIC/S PI 041 expect, and it is a query, not a forensic exercise [4].

In the real world

Here is the honest reckoning this book promises. The pattern we built is real, runnable, and standards-aligned: it implements the system-versioned-history idea from SQL:2011 (the 2011 revision of the SQL standard) — preserving the original row when it changes, the way a temporal table automatically keeps its own history. PostgreSQL (through version 18) still does not implement that SQL:2011 system-versioned temporal-table feature [6] natively, so a trigger is the idiomatic open-source way to get the behavior. The hash chain is the genuine Haber–Stornetta construction [7]. And the whole suite is runnable on every commit via make test.

Tamper-evident, not tamper-proof: the honest bound

But it is tamper-evident, not tamper-proof, and there are two distinct edges to be honest about. The first is a limit of the verifier itself, narrower than the tempting "any edit is caught" story: verify_chain() checks only that the stored prev_hash/row_hash links line up, so it catches a deleted, reordered, or relinked row but not a silent edit to a row's old_row/new_row/app_user/reason payload that leaves the hash columns alone. Worse, the current schema cannot easily be upgraded to a recomputing verifier, because the digest folds in a fresh clock_timestamp() that the row never stores, so even a correct re-derivation could not reproduce row_hash. The fix is a small repo change we flag for Chapter 24: hash the row's stored ts value (or persist the exact instant hashed) so row_hash becomes reproducible, then add a verifier that recomputes each digest from the saved columns and compares it. Because the digest as written covers the table, operation, old and new images, and app_user but not reason, reason would also have to be folded into the hash before an edit to it could be caught. Only then does "altering any historical entry is detected" become a true statement rather than an aspiration.
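What the proposed fix buys can be sketched in Python. Everything here is an assumption layered on the chapter's description rather than repo code: the digest hashes the stored ts instead of a fresh clock reading, json.dumps stands in for jsonb-to-text, the value 1.0 is made up, and digest(), append_change(), link_breaks(), and payload_breaks() are hypothetical names.

```python
import hashlib
import json


def _text(value) -> str:
    return "" if value is None else json.dumps(value, sort_keys=True)


def digest(row) -> str:
    """Recompute row_hash from stored columns only (ts replaces the fresh clock).
    reason is not among the chapter's hashed fields, so it is left out here too."""
    parts = [row["prev_hash"] or "", row["table_name"], row["action"],
             _text(row["old_row"]), _text(row["new_row"]),
             row["app_user"] or "", row["ts"]]
    return hashlib.sha256("".join(parts).encode("utf-8")).hexdigest()


def append_change(log, ts, table, action, old, new, app_user):
    row = {"seq": len(log) + 1, "ts": ts, "table_name": table,
           "action": action, "old_row": old, "new_row": new,
           "app_user": app_user,
           "prev_hash": log[-1]["row_hash"] if log else None}
    row["row_hash"] = digest(row)
    log.append(row)


def link_breaks(log):
    """Link-only check, in the spirit of the shipped verify_chain()."""
    return [b["seq"] for a, b in zip([None] + log, log)
            if b["prev_hash"] != (a["row_hash"] if a else None)]


def payload_breaks(log):
    """Recomputing check: flag rows whose stored hash no longer matches."""
    return [r["seq"] for r in log if r["row_hash"] != digest(r)]


log = []
append_change(log, "2026-01-05T00:00:00.142+00:00", "batch", "INSERT",
              None, {"batch_id": "BATCH-2026-001"}, "seed")
append_change(log, "2026-01-05T00:00:00.197+00:00", "result", "INSERT",
              None, {"sample_id": "BATCH-2026-001-OFF-001", "value": 1.0}, "seed")
append_change(log, "2026-01-13T09:14:22.030+00:00", "result", "UPDATE",
              {"sample_id": "BATCH-2026-001-OFF-014"},
              {"sample_id": "BATCH-2026-001-OFF-014", "value": 7.8}, "jdoe")

# Silent payload edit that leaves both hash columns alone.
log[1]["new_row"]["value"] = 99.9

print(link_breaks(log))     # []  : the link check sees nothing
print(payload_breaks(log))  # [2] : recomputation exposes the edited row
```

The design point is that the two checks are complementary: the link check guards the sequence, the recomputing check guards each row's hashed contents, and a production verifier would presumably run both.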

The second edge is operational and bigger. A PostgreSQL superuser (an all-powerful admin role) — or the owner of the audit.change_log table — can DISABLE TRIGGER to switch the audit logging off, edit a regulated row with no audit entry, then, because they hold the very same hashing tools the system uses, recompute the entire hash chain from the point of edit forward and leave a chain that verifies cleanly. Our defense makes casual tampering visible and raises the cost of deliberate tampering enormously, but it cannot defeat the database administrator. Closing that gap is not a code problem; it is an operational one: segregation of duties so the people who can administer the database are not the people who own the data, locked-down roles, an off-database copy of the chain head (the latest row_hash, written periodically to write-once storage the DBA cannot later alter — such as a SeaweedFS WORM (write-once-read-many) bucket we stand up in Chapter 24, or an RFC 3161 trusted timestamp, a standard cryptographic proof that a value existed at a given time), and pgAudit (a PostgreSQL extension that logs privileged database sessions) — all of which Chapter 24 and the validation work in Chapter 25 take on. No open-source component is 21 CFR Part 11-compliant out of the box, and this one is no exception: compliance is a property of the validated system and its procedures, never of a SQL file.
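Why an off-database copy of the chain head matters can be shown in a few lines. This is a hedged sketch, not the Chapter 24 implementation: the in-memory chain, the string payloads, and the anchor tuple standing in for write-once storage are all assumptions for illustration. It models a privileged user who rewrites one entry and recomputes every hash from that point forward.

```python
import hashlib

GENESIS = "0" * 64  # assumed starting prev_hash (hypothetical convention)


def digest(prev_hash: str, payload: str) -> str:
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(payloads: list) -> list:
    """Build a chain from scratch, as anyone holding the hashing tools could."""
    chain, prev = [], GENESIS
    for seq, payload in enumerate(payloads, start=1):
        row_hash = digest(prev, payload)
        chain.append({"seq": seq, "payload": payload,
                      "prev_hash": prev, "row_hash": row_hash})
        prev = row_hash
    return chain


def verifies(chain: list) -> bool:
    """Internal check: links line up and every digest re-derives."""
    prev = GENESIS
    for row in chain:
        if row["prev_hash"] != prev or digest(prev, row["payload"]) != row["row_hash"]:
            return False
        prev = row["row_hash"]
    return True


def matches_anchor(chain: list, anchor: tuple) -> bool:
    """External check: the row at the anchored seq must still carry the anchored hash."""
    seq, anchored_hash = anchor
    return any(r["seq"] == seq and r["row_hash"] == anchored_hash for r in chain)


honest = build_chain(["result 4.2 entered", "result corrected to 4.7", "batch released"])

# The chain head is copied somewhere the DBA cannot later alter.
anchor = (honest[-1]["seq"], honest[-1]["row_hash"])

# A privileged user rewrites the second entry and recomputes the chain forward.
forged = build_chain(["result 4.2 entered", "result corrected to 9.9", "batch released"])

print("forged chain verifies internally:", verifies(forged))
print("forged chain matches the anchor: ", matches_anchor(forged, anchor))
```

The forged chain is internally consistent, so no in-database verifier can flag it. Only the comparison against a value held outside the administrator's reach exposes the rewrite, which is why the defense is operational rather than purely a matter of code.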

This is also where the OSS-versus-commercial line falls. A validated (formally tested and documented to meet GMP, per Chapter 25) commercial historian or MES (Manufacturing Execution System — the software that runs and records production; examples are AVEVA PI with its audit subsystem, or a vendor MES electronic batch record, the digital version of the paper batch record) ships an audit trail you configure rather than build, backed by a supplier who carries the accountability and the validation package. With our stack, you own the audit logic — which means you own validating it, version-controlling the DDL (it lives in Git, a genuine advantage for change control), and defending it under qualification (the formal IQ/OQ/PQ proof that a computerized system is fit for GMP use, per Chapter 25).

The same record as a triple: gating it with SHACL, not just a trigger

The relational audit.change_log row has a clean semantic twin, and seeing it is worth a paragraph because it ties this chapter to the knowledge graph of Semantics & the Digital Thread. One change-event row is a small provenance graph. In W3C PROV-O (the standard ontology for provenance: who did what, to which thing, when) the act of changing the result is a prov:Activity, the analyst named in app_user is the prov:Agent it prov:wasAssociatedWith, the lab.result row keyed by row_key is the prov:Entity it prov:used and generated a new version of, and reason is attached to the activity as a bp:reason literal, the property the shape below requires. The trigger is, in graph terms, a PROV-O materializer running inside the database. And the closed-world completeness this chapter enforces in PL/pgSQL (every regulated change must carry a who, a when, and a why) is exactly what a SHACL shape expresses declaratively on the triple side, SHACL being the Shapes Constraint Language, a way to validate that graph data has the required structure:

# Illustrative: the closed-world gate the trigger enforces, written as a SHACL shape.
bp:AuditEntryShape a sh:NodeShape ;
    sh:targetClass prov:Activity ;
    sh:property [ sh:path prov:wasAssociatedWith ; sh:minCount 1 ;
                  sh:message "Every regulated change must be attributable (app_user)." ] ;
    sh:property [ sh:path prov:atTime ; sh:minCount 1 ; sh:maxCount 1 ] ;
    sh:property [ sh:path bp:reason ; sh:minCount 1 ] .

That sh:minCount 1 on the agent is the Attributable rule, and a missing who is a failure now. It is the same closed-world "is a required field present?" question Book 4 runs as a release gate, where OWL's open world would only call the absence "unknown." A regulator's audit-trail review then becomes a SPARQL competency question, "list every change to a result lacking a reason, with its agent and time", which is the graph analogue of the SELECT * FROM audit.change_log WHERE table_name = 'result' AND action = 'UPDATE' query we ran above. Book 4 develops exactly this open-world-OWL-versus-closed-world-SHACL division in The Release Gate and SHACL, and the bp:approvedBy signature it gates is the very hook Chapter 24 binds to a real identity. The audit row here, the PROV-O activity, and the released-lot signature are therefore three views of one regulated fact.
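The closed-world reading of the shape can be imitated in plain Python to make the logic concrete. This is only a sketch of the idea, not a SHACL engine and not how Book 4 runs the gate: the triples, the ex: identifiers, and the function name are invented for illustration, and a real deployment would hand the shape to a SHACL validator.

```python
# Predicates the illustrative shape requires at least once on every prov:Activity.
REQUIRED = ("prov:wasAssociatedWith", "prov:atTime", "bp:reason")

# Hypothetical triples for two change events; the second one has no reason.
triples = {
    ("ex:change1", "rdf:type", "prov:Activity"),
    ("ex:change1", "prov:wasAssociatedWith", "ex:analyst_a"),
    ("ex:change1", "prov:atTime", "2024-01-01T11:30:00Z"),
    ("ex:change1", "bp:reason", "transcription error"),
    ("ex:change2", "rdf:type", "prov:Activity"),
    ("ex:change2", "prov:wasAssociatedWith", "ex:analyst_b"),
    ("ex:change2", "prov:atTime", "2024-01-02T09:15:00Z"),
}


def violations(graph: set) -> list:
    """Closed-world check: a required predicate that is absent is a failure."""
    activities = sorted(
        s for s, p, o in graph if p == "rdf:type" and o == "prov:Activity"
    )
    found = []
    for node in activities:
        for predicate in REQUIRED:
            count = sum(1 for s, p, _ in graph if s == node and p == predicate)
            if count < 1:  # mirrors sh:minCount 1
                found.append((node, predicate))
    return found


report = violations(triples)
print(report)
```

An open-world reasoner given the same triples would draw no conclusion about ex:change2, because the reason might simply be unstated. The closed-world check treats the absence as a violation, which is the behavior a release gate needs.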

Why this is the floor under any model: trustworthy data is a precondition for ML

A learning model is only as trustworthy as the records it was fit on, which makes this chapter the unglamorous floor under everything Book 5 builds. A bioprocess model — a Raman soft sensor for titer, a release-prediction classifier — is trained on exactly the lab.result and s88.batch rows this trigger guards. Three of its needs are governance needs this audit trail supplies directly. First, reproducible training data: a model dossier must pin the exact rows it learned from, and a hash-chained, append-only log gives a model lineage record a dataset hash can anchor to — the examples/platform/ml/ suite logs precisely such a dataset hash on every retrain. Second, leakage-free, batch-aware validation: because every result carries its batch_id (the row_key here), the same context that makes a change attributable is what lets a model split its data by batch rather than by row — the grouped, leave-one-batch-out cross-validation that stops a model cheating by memorizing a batch it will also be tested on, the cardinal pitfall Book 5 hammers in Models and Validation. Third, drift versus tampering: an audit-trail break and a model's input-drift alarm are different signals on the same stream — a hash-chain failure says the record was altered, while a Population Stability Index shift says the process moved; an MLOps loop needs both, and the lineage that lets you replay a model's exact training set is the same lineage that lets you prove a corrected result was genuinely re-trained on, not silently overwritten (the lifecycle Book 5 details in MLOps and Lifecycle). The slogan is blunt: there is no trustworthy model on untrustworthy data, and this chapter is where the data earns the adjective.
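Two of these needs, a dataset hash and batch-grouped splitting, are small enough to sketch directly. The Python below is an illustration under stated assumptions, not the code in examples/platform/ml/: the row fields, the JSON canonicalization, and the function names are hypothetical.

```python
import hashlib
import json

# Hypothetical training rows; batch_id is the grouping key.
rows = [
    {"batch_id": "B001", "sample_id": "S1", "titer": 4.2},
    {"batch_id": "B001", "sample_id": "S2", "titer": 4.4},
    {"batch_id": "B002", "sample_id": "S3", "titer": 3.9},
    {"batch_id": "B002", "sample_id": "S4", "titer": 4.0},
    {"batch_id": "B003", "sample_id": "S5", "titer": 4.8},
]


def dataset_hash(data: list) -> str:
    """SHA-256 over a canonical, order-independent serialization of the rows."""
    canonical = json.dumps(sorted(data, key=lambda r: r["sample_id"]), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def leave_one_batch_out(data: list):
    """Yield (held_out_batch, train_rows, test_rows), one fold per batch."""
    for held_out in sorted({r["batch_id"] for r in data}):
        train = [r for r in data if r["batch_id"] != held_out]
        test = [r for r in data if r["batch_id"] == held_out]
        yield held_out, train, test


folds = list(leave_one_batch_out(rows))
for held_out, train, test in folds:
    print(held_out, "train rows:", len(train), "test rows:", len(test))

# A corrected result changes the dataset hash, so a retrain on it is provable.
original_hash = dataset_hash(rows)
corrected = [dict(r, titer=4.7) if r["sample_id"] == "S1" else r for r in rows]
corrected_hash = dataset_hash(corrected)
print("hash changed after correction:", original_hash != corrected_hash)
```

Because every fold holds out a whole batch, no batch contributes rows to both training and testing. Because the hash is taken over a canonical serialization, the same rows in a different order give the same hash, while a single corrected value gives a different one.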

Key terms

ALCOA+ — the data-integrity attributes (Attributable, Legible, Contemporaneous, Original, Accurate, plus Complete, Consistent, Enduring, Available) that define a trustworthy GMP record.
Audit trail — a secure, append-only record of who changed what, when, and why; here the audit.change_log table populated by a trigger.
Trigger-based audit — using a PostgreSQL AFTER ... FOR EACH ROW trigger to capture OLD/NEW row images on every change automatically, rather than relying on application code.
System-versioned (temporal) history — the SQL:2011 pattern of preserving a row's prior versions when it changes, so the original is never lost; emulated here with a trigger.
Hash chain — a sequence of records where each row stores a SHA-256 hash that incorporates the previous row's hash; here it makes a deleted, reordered, or relinked entry detectable (a silent edit to a row's payload that leaves the hash columns intact is not, unless the hash is recomputed from the payload).
Tamper-evident vs tamper-proof — the chain makes alteration detectable (evident); it does not make alteration impossible (proof), which a privileged DBA can still defeat.
pgcrypto — the PostgreSQL extension providing in-database cryptographic functions, including the digest()/SHA-256 used to build the chain.
Attributable metadata — the db_user/app_user/reason context recorded around a value, which is itself part of the regulated record.
GENERATED ALWAYS AS IDENTITY — a PostgreSQL column (here seq) whose value the database assigns and the application cannot override, so the order of the audit history is kept by the system, not the writer.
prev_hash / row_hash — the two link columns of each audit.change_log row: prev_hash is copied verbatim from the previous row's row_hash, and row_hash is the SHA-256 over prev_hash plus this row's payload; together they form the chain verify_chain() walks.
row_key — the coalesced batch_id/sample_id the trigger extracts from the changed row, so one column points at the affected record whatever regulated table it came from.
PROV-O — the W3C provenance ontology; one audit.change_log row maps onto a prov:Activity (the change), a prov:Agent (app_user), and a prov:Entity (the regulated row), so the trigger is in effect a provenance-graph materializer.
SHACL gate (closed-world) — the graph-side twin of the trigger's enforcement: a sh:NodeShape whose sh:minCount 1 on the agent, time, and reason makes a missing field a failure now — the same closed-world completeness check Book 4 runs as a release gate, where OWL's open world would only call the absence "unknown."
Model lineage / dataset hash — the reproducibility record a deployed model needs: which exact regulated rows it was trained on. A hash-chained, append-only audit log is the substrate a model's dataset hash anchors to, so a corrected result can be proven re-trained-on rather than silently overwritten.
Batch-grouped (leave-one-batch-out) validation — splitting a model's data by batch_id (the row_key here) rather than by row, so a model cannot cheat by being tested on a batch it memorized in training; the same attributable context that gates a change is what makes leakage-free validation possible.

Where this leads

We have made the data tamper-evident and attributable by construction. But an audit trail answers what changed; it does not yet answer who formally approved it with a legally meaningful signature. In Chapter 24 (Electronic Records & Signatures: Part 11 / Annex 11 with Open Source) we take the app.user and reason hooks from this chapter and bind them to real authenticated identities, add pgAudit for logging privileged database sessions, and add cryptographic e-signatures (tamper-evident digital approvals) via eLabFTW (an open-source electronic lab notebook) and a reason-for-change service. We also draw a brutally honest gap register showing exactly which Part 11 clauses open source satisfies and which still demand procedure or commercial tooling.
