ALCOA+ by Construction: Integrity in Code
π Where we are: Part V Β· Trust β Chapter 20. The platform now captures, stores, contextualizes, and visualizes the batch. This chapter makes the data trustworthy by construction: we encode the data-integrity rules every inspector cares about straight into PostgreSQL schema, triggers, and a hash chain β and we write tests that prove the guarantees hold.
Imagine a lab notebook where every page is numbered, written in indelible ink, signed and dated as you go, and bound so tightly that you cannot tear a page out, slip one in, or shuffle the order without it showing. If someone did tear out or reshuffle pages, the page numbers would stop adding up and everyone would notice. That is what we build here, in code: every change to a regulated record is appended (never overwritten), stamped with who, when, and why, and cryptographically linked to the change before it β so the sequence of the history is tamper-evident. We will be precise about the limits: the link check catches a deleted, reordered, or spliced entry, not a silent edit to an entry's contents that leaves the links intact β and certainly not tamper-proof, which no SQL file can promise. Tamper-evident is nonetheless the bar a regulator actually sets, and we will say exactly how far this reaches.
What this chapter coversβ
For nineteen chapters we have moved data from sensors into a clean, contextualized, queryable platform. Now we have to make a regulator trust it. The shorthand for that trust is ALCOA+ β data must be Attributable, Legible, Contemporaneous, Original, and Accurate, plus Complete, Consistent, Enduring, and Available [1]. It is the spine of the FDA's data-integrity expectations [2], EU Annex 11 [3], and the PIC/S inspector framework [4].
The temptation is to treat ALCOA+ as a checklist you audit after the fact. This chapter argues the opposite: integrity is cheapest and strongest when engineered in. We will:
- map each ALCOA+ attribute to a concrete schema or pipeline mechanism (append-only, quality flags, attributable metadata);
- build a trigger-based audit trail in PostgreSQL that records old/new/who/when/why for every change to a regulated table;
- hash-chain that audit log so any retroactive edit becomes detectable;
- and run a pytest suite that mechanically asserts the guarantees β append-only behavior, an intact chain, captured updates.
Every snippet here comes from two real, tested files in the companion repo: the DDL in examples/platform/db/50-alcoa.sql, applied by make seed, and the assertions in examples/tests/test_db.py, run by make test. Nothing is illustrative. You can break the chain on your own laptop and watch the test go red.
ALCOA+ is a set of design requirements, not a posterβ
Before any code, it helps to read ALCOA+ as a list of engineering requirements, because that is what it becomes once you stop treating it as compliance wallpaper. The MHRA's guidance is the cleanest statement of the attributes and of the crucial point that metadata is part of the record β the who and when around a value are as regulated as the value itself [1].
| ALCOA+ attribute | What it demands of the system | Where we enforce it in code |
|---|---|---|
| Attributable | Every value traces to a person or device | db_user / app_user columns on the audit log |
| Legible | Records are readable and permanent | jsonb old/new snapshots, plain SQL, durable storage |
| Contemporaneous | Recorded at the time the event happened | clock_timestamp() server timestamp on each row |
| Original | The first capture is preserved, not overwritten | Append-only change_log; never UPDATE/DELETE |
| Accurate | Values are correct and quality-flagged | OPC UA quality flag on every reading |
| Complete | Nothing is silently dropped, including changes | A trigger fires on every INSERT/UPDATE/DELETE |
| Consistent | The sequence is intact and ordered | Monotonic seq identity + the hash chain |
| Enduring / Available | Survives and is retrievable | PostgreSQL durability + retention (Ch 23) |
Two of these were already paid for in earlier chapters. Accurate rides on the quality flag the historian carries on every reading β recall the column from examples/platform/db/20-historian.sql, where OPC UA status codes (192 Good, 64 Uncertain, 0 Bad) are stored alongside the value so a downstream consumer can never mistake a Bad reading for a trustworthy one. Contemporaneous rides on the fact that the collector stamps each reading at acquisition time rather than at insert time. This chapter's job is the harder three β Attributable, Original, Complete β and the tamper-evidence that ties Consistent together.
The audit trail: a trigger that watches the regulated tablesβ
The mechanism is a single PostgreSQL trigger function attached to the tables that hold GMP records. PostgreSQL gives row-level triggers access to the OLD and NEW versions of a row and to the operation type via TG_OP, which is exactly the raw material an audit trail needs [5]. Here is the append-only log table, from examples/platform/db/50-alcoa.sql:
CREATE TABLE audit.change_log (
seq bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
ts timestamptz NOT NULL DEFAULT clock_timestamp(),
db_user text NOT NULL DEFAULT current_user,
app_user text, -- set via SET app.user = '...'
table_name text NOT NULL,
action text NOT NULL, -- INSERT | UPDATE | DELETE
row_key text,
old_row jsonb,
new_row jsonb,
reason text, -- set via SET app.reason = '...'
prev_hash text,
row_hash text NOT NULL
);
Read this table as ALCOA+ rendered in columns. db_user and app_user make a change Attributable β the first to the database role, the second to the human the application authenticated (we set it with SET app.user = '...', and Chapter 21 wires it to a real Keycloak identity). ts defaulting to clock_timestamp() makes it Contemporaneous β note it is clock_timestamp(), the actual wall-clock instant the row is written, not now(), which would freeze at the start of the transaction. old_row and new_row as jsonb keep the record Legible and Original: the full before-and-after image of the row is preserved, not just a diff. And the table is append-only by use β nothing in the system ever issues an UPDATE or DELETE against it. The reason column is the regulator's favorite, capturing the why behind every change.
The trigger function does the work. Also from examples/platform/db/50-alcoa.sql:
CREATE OR REPLACE FUNCTION audit.log_change() RETURNS trigger AS $$
DECLARE
v_prev text;
v_key text;
v_old jsonb := CASE WHEN TG_OP = 'INSERT' THEN NULL ELSE to_jsonb(OLD) END;
v_new jsonb := CASE WHEN TG_OP = 'DELETE' THEN NULL ELSE to_jsonb(NEW) END;
v_app text := current_setting('app.user', true);
v_reason text := current_setting('app.reason', true);
v_hash text;
BEGIN
SELECT row_hash INTO v_prev FROM audit.change_log ORDER BY seq DESC LIMIT 1;
v_key := coalesce((v_new ->> 'batch_id'), (v_old ->> 'batch_id'),
(v_new ->> 'sample_id'), (v_old ->> 'sample_id'));
-- chain hash = H(prev_hash || payload)
v_hash := encode(digest(
coalesce(v_prev, '') || TG_TABLE_NAME || TG_OP ||
coalesce(v_old::text, '') || coalesce(v_new::text, '') ||
coalesce(v_app, '') || clock_timestamp()::text, 'sha256'), 'hex');
INSERT INTO audit.change_log(app_user, table_name, action, row_key,
old_row, new_row, reason, prev_hash, row_hash)
VALUES (v_app, TG_TABLE_NAME, TG_OP, v_key, v_old, v_new, v_reason, v_prev, v_hash);
RETURN coalesce(NEW, OLD);
END;
$$ LANGUAGE plpgsql;
Three lines deserve a closer look. to_jsonb(OLD) and to_jsonb(NEW) serialize the entire row to JSON, so the log is schema-agnostic β the same function audits a lab result, a batch, or a recipe parameter without one column of bespoke code. current_setting('app.user', true) reads a session variable the application sets at the start of a transaction; the true makes it return NULL rather than erroring if the variable was never set. And the digest(...,'sha256') call is the hash chain, which we unpack next.
The trigger is attached to exactly the tables that hold regulated records, also in examples/platform/db/50-alcoa.sql:
CREATE TRIGGER audit_result AFTER INSERT OR UPDATE OR DELETE ON lab.result
FOR EACH ROW EXECUTE FUNCTION audit.log_change();
CREATE TRIGGER audit_batch AFTER INSERT OR UPDATE OR DELETE ON s88.batch
FOR EACH ROW EXECUTE FUNCTION audit.log_change();
CREATE TRIGGER audit_recipe_p AFTER INSERT OR UPDATE OR DELETE ON s88.recipe_parameter
FOR EACH ROW EXECUTE FUNCTION audit.log_change();
These are the three tables an inspector cares most about: analytical results (lab.result), the batch record itself (s88.batch), and the recipe parameters that define how the product is made (s88.recipe_parameter, which Chapter 24 versions in place using its valid_from/valid_to columns). Because the triggers fire on every INSERT OR UPDATE OR DELETE, nothing slips by β that is the Complete attribute made mechanical.
The hash chain: linking the records togetherβ
An append-only log is only as honest as the storage under it. A determined insider with table access could, in principle, reach into audit.change_log and reorder rows, delete a middle entry, or splice in a fabricated one. We make that class of attack detectable with a technique older than the cloud: a linked hash chain, first described by Haber and Stornetta in 1991 as a way to time-stamp digital documents so that the sequence cannot be quietly reordered or altered [6]. It is the same construction that later underpinned blockchains β but we need none of the distributed-consensus machinery, just the linking.
The idea is simple. When the trigger writes a row, it computes a SHA-256 hash over that row's payload and the stored hash of the row before it, then saves both values: the prior row's hash in prev_hash and the new digest in row_hash. Conceptually:
row_hash[n] = SHA256( row_hash[n-1] || table || op || old || new || app_user || clock_timestamp() )
prev_hash[n] = row_hash[n-1]
Two honesty notes about this digest, because the chapter's whole point is to claim only what the code delivers. First, the hash folds in clock_timestamp() evaluated inside the trigger β a wall-clock instant that is not the value persisted in the ts column (which has its own independent clock_timestamp() default). Nothing in the stored row records the exact timestamp that went into the hash, so row_hash is not reproducible from the saved columns. Second, and as a consequence, the shipped verifier never recomputes row_hash from the payload at all β it only checks that the links between the stored prev_hash and row_hash columns are consistent. We will see exactly what that does and does not catch.
The pgcrypto extension provides the in-database digest() and encode() functions that compute and hex-encode that SHA-256 [7]. It is enabled once, at the top of the stack, in examples/platform/db/00-init.sql:
CREATE EXTENSION IF NOT EXISTS pgcrypto; -- digest() for the ALCOA+ hash chain
Because each row stores the previous row's hash in its prev_hash column, the log becomes a chain whose links can be checked: delete a row, reorder rows, or overwrite a prev_hash/row_hash value, and the stored links stop lining up. That is the property the verifier tests β and, crucially, the only one it tests.
The audit log as a linked hash chain: each row stores the prior row's row_hash in its prev_hash column, so deleting, reordering, or overwriting a hash-link column breaks the stored links and verify_chain() flags the first seq where a prev_hash no longer matches the previous row's row_hash. Note the verifier checks link consistency only β it does not recompute hashes, so a silent edit to a row's payload that leaves the hash columns alone is not caught here.
Original diagram by the authors, created with AI assistance.
Detecting a broken link is its own function, the last block in examples/platform/db/50-alcoa.sql:
-- Verify the chain is intact: returns rows where a stored prev_hash does not
-- equal the previous row's stored row_hash (a broken/reordered/deleted link).
CREATE OR REPLACE FUNCTION audit.verify_chain()
RETURNS TABLE(seq bigint, ok boolean) AS $$
WITH chained AS (
SELECT c.seq, c.row_hash, c.prev_hash,
lag(c.row_hash) OVER (ORDER BY c.seq) AS expected_prev
FROM audit.change_log c
)
SELECT seq, (prev_hash IS NOT DISTINCT FROM expected_prev) AS ok
FROM chained
WHERE prev_hash IS DISTINCT FROM expected_prev;
$$ LANGUAGE sql;
The lag(...) OVER (ORDER BY c.seq) window function walks the log in sequence order and, for each row, looks back at the previous row's stored row_hash. If that does not match the prev_hash the current row recorded, the stored link is broken. A healthy chain returns zero rows β every link is consistent. Any row it does return points at where the stored links stopped lining up: that seq is the first place the chain failed to add up. Be precise about scope: this compares only the prev_hash and row_hash columns. It does not re-derive a hash from old_row/new_row/app_user, so it detects deleted, reordered, or relinked rows β not a silent edit to a row's payload that leaves the hash columns untouched. We will use exactly the attack it can catch in the demo below, and return to the gap it leaves in "In the real world." (The repo's own comment on this function is careful to match: it says the verifier "returns rows where a stored prev_hash does not equal the previous row's stored row_hash" and adds "this checks link consistency only; it does NOT recompute row_hash from the payload" β deliberately not describing it as recomputing a hash, because nothing is recomputed.)
A typical healthy log looks like this when you query it directly:
seq | ts | app_user | table_name | action | row_key | prev_hash | row_hash
-----+----------------------------+----------+------------+--------+----------------+-------------+-------------
1 | 2026-01-05 00:00:00.142+00 | seed | batch | INSERT | BATCH-2026-001 | (null) | 9f2a...c41b
2 | 2026-01-05 00:00:00.197+00 | seed | result | INSERT | S-001 | 9f2a...c41b | 1ce8...77a0
3 | 2026-01-13 09:14:22.030+00 | jdoe | result | UPDATE | S-014 | 1ce8...77a0 | b430...e9f2
Row 3 is an analyst correcting a result on January 13: app_user is jdoe, the action is UPDATE, and its prev_hash equals row 2's row_hash. The links add up, so verify_chain() returns nothing.
Proving it with tests, not promisesβ
A guarantee you have not tested is a hope. The companion repo treats the integrity rules as executable acceptance criteria, asserting them with a pytest suite β pytest's plain-assert rewriting lets a one-line assertion stand in for a full integrity check [8]. These tests run against the live stack (make up && make seed && make load) and skip cleanly if the database is unreachable. From examples/tests/test_db.py:
def test_alcoa_chain_intact(conn):
assert _scalar(conn, "select count(*) from audit.change_log") > 0
assert _scalar(conn, "select count(*) from audit.verify_chain()") == 0 # 0 broken links
That second assertion is the whole chapter in one line: the chain has entries, and zero of its links are broken. The next test is the more interesting one, because it actually exercises an edit and proves the trail captured it:
def test_audit_captures_update(conn):
# an UPDATE must record old + new + who + why and keep the chain intact
with conn.cursor() as cur:
cur.execute("select set_config('app.user','pytest',false), "
"set_config('app.reason','test correction',false)")
cur.execute("update lab.result set value = value where result_id = "
"(select result_id from lab.result limit 1)")
conn.commit()
last = _scalar(conn, "select action from audit.change_log "
"where app_user='pytest' order by seq desc limit 1")
assert last == "UPDATE"
assert _scalar(conn, "select count(*) from audit.verify_chain()") == 0
Walk through what it proves. set_config('app.user','pytest', ...) and set_config('app.reason', ...) set the attributable who and the reason why β exactly the session variables the trigger reads. The update lab.result set value = value is a deliberately trivial edit (it sets the value to itself), yet the trigger still fires and logs an UPDATE row, because the audit trail records the act of changing, not just net differences. The suite then asserts the most recent row by pytest is an UPDATE and that the chain is still intact afterward. Attributable, Original, Complete, Consistent β all four, mechanically verified, on a laptop.
To watch the verifier catch tampering, you do not need new code; you reach into the log and break a link directly. The attack verify_chain() is built to detect is one that disturbs the stored hash columns β deleting a row, reordering rows, or splicing the chain. Here we sever the link by overwriting one row's prev_hash:
-- Simulate an insider trying to splice the chain by relinking row 3.
UPDATE audit.change_log SET prev_hash = 'deadbeef'
WHERE seq = 3;
SELECT * FROM audit.verify_chain();
seq | ok
-----+----
3 | f
Row 3's stored prev_hash no longer equals row 2's stored row_hash, so the link no longer lines up β the verifier flags seq = 3 as the first broken link, and test_alcoa_chain_intact would turn red on the next make test. Deleting row 2 outright, or swapping the seq ordering, breaks the links the same way and is caught the same way. The tampering did not stay hidden.
Be equally clear about what this verifier does not catch, because it is the difference between tamper-evident and a claim we cannot back. If instead of breaking a link the insider silently edits a row's payload β UPDATE audit.change_log SET new_row = jsonb_set(new_row, '{value}', '99.9') WHERE seq = 2; β and leaves the prev_hash/row_hash columns alone, then verify_chain() returns zero rows: every stored link still lines up, and the edit goes undetected. Closing that gap needs a verifier that recomputes each row_hash from the payload and compares it to the stored value β which, as noted above, the current schema cannot support because the hashed clock_timestamp() is never persisted. We name that limitation honestly here and revisit the fix (hash the stored ts, store the chain head off-database) in "In the real world" and Chapter 21.
Why it mattersβ
Data integrity is the single most common theme in FDA warning letters and EU GMP findings β not because companies set out to falsify data, but because their systems make integrity optional: an audit trail you can switch off, a value you can overwrite with no trace, a timestamp the user can set. ALCOA+ is the language regulators use to describe what "trustworthy" means, and Annex 11 and Part 11 turn it into law for computerized systems [3][9].
Building integrity into the schema flips the default. With the trigger attached, there is no path to change a regulated row that does not write an attributable, reasoned, timestamped, hash-linked audit entry β the right thing is the only thing the database lets you do. That is what "by construction" means, and it is far stronger than a procedure that asks people to remember to document changes. The FDA's own data-integrity guidance frames audit trails and attributable metadata as a CGMP expectation that must be implemented, not merely promised [2]; we have implemented it in about seventy lines of SQL.
It also makes review-by-exception possible. An inspector or a QA reviewer can run one query β SELECT * FROM audit.change_log WHERE table_name = 'result' AND action = 'UPDATE' β and see every correction ever made to a result, with the before image, the after image, the analyst, and the reason. That is the audit-trail review that Part 11 and PIC/S PI 041 expect, and it is a query, not a forensic exercise [4].
In the real worldβ
Here is the honest reckoning this book promises. The pattern we built is real, runnable, and standards-aligned: it implements the SQL:2011 idea of system-versioned history β preserving the original row when it changes β using triggers, because PostgreSQL (through version 18) still does not implement SQL:2011 system-versioned temporal tables natively, so a trigger is the idiomatic open-source way to get the behavior [10]. The hash chain is the genuine HaberβStornetta construction [6]. And the whole thing is tested in CI on every commit.
But it is tamper-evident, not tamper-proof, and there are two distinct edges to be honest about. The first is a limit of the verifier itself, narrower than the strong "any edit is caught" story is tempting to tell: verify_chain() checks only that the stored prev_hash/row_hash links line up, so it catches a deleted, reordered, or relinked row but not a silent edit to a row's old_row/new_row/app_user/reason payload that leaves the hash columns alone. Worse, the current schema cannot easily be upgraded to a recomputing verifier, because the digest folds in a fresh clock_timestamp() that the row never stores β so even a correct re-derivation could not reproduce row_hash. The fix is a small repo change we flag for Chapter 21: hash the row's stored ts value (or persist the exact instant hashed) so row_hash becomes reproducible, then add a verifier that recomputes each digest from the saved columns and compares it. Only then does "altering any historical entry is detected" become a true statement rather than an aspiration.
The second edge is operational and bigger. A PostgreSQL superuser β or the owner of the audit.change_log table β can DISABLE TRIGGER, edit a regulated row with no audit entry, recompute the entire hash chain from the point of edit forward, and leave a chain that verifies cleanly. Our defense makes casual tampering visible and raises the cost of deliberate tampering enormously, but it cannot defeat the database administrator. Closing that gap is not a code problem; it is an operational one: segregation of duties so the people who can administer the database are not the people who own the data, locked-down roles, an off-database copy of the chain head (a periodic row_hash written to write-once storage such as the SeaweedFS WORM bucket we stand up in Chapter 23, or an RFC 3161 trusted timestamp), and pgAudit capturing privileged sessions β all of which Chapter 21 and the validation work in Chapter 22 take on. No open-source component is 21 CFR Part 11-compliant out of the box, and this one is no exception: compliance is a property of the validated system and its procedures, never of a SQL file.
This is also where the OSS-versus-commercial line falls. A validated commercial historian or MES (AVEVA PI with its audit subsystem, a vendor MES electronic batch record) ships an audit trail you configure rather than build, backed by a supplier who carries the accountability and the validation package. With our stack, you own the audit logic β which means you own validating it, version-controlling the DDL (it lives in Git, a genuine advantage for change control), and defending it under qualification. NIIMBL's SABRE facility β the NIIMBL / University of Delaware pilot-scale cGMP plant (cGMP being current Good Manufacturing Practice) that broke ground in April 2024 β is precisely the kind of multi-vendor, sensor-dense line where these audit and integrity controls have to span open and commercial systems alike; SABRE is a facility, not a data standard, but it is the physical setting this stack is built to serve.
Key termsβ
- ALCOA+ β the data-integrity attributes (Attributable, Legible, Contemporaneous, Original, Accurate, plus Complete, Consistent, Enduring, Available) that define a trustworthy GMP record.
- Audit trail β a secure, append-only record of who changed what, when, and why; here the
audit.change_logtable populated by a trigger. - Trigger-based audit β using a PostgreSQL
AFTER ... FOR EACH ROWtrigger to captureOLD/NEWrow images on every change automatically, rather than relying on application code. - System-versioned (temporal) history β the SQL:2011 pattern of preserving a row's prior versions when it changes, so the original is never lost; emulated here with a trigger.
- Hash chain β a sequence of records where each row stores a SHA-256 hash that incorporates the previous row's hash; here it makes a deleted, reordered, or relinked entry detectable (a silent edit to a row's payload that leaves the hash columns intact is not, unless the hash is recomputed from the payload).
- Tamper-evident vs tamper-proof β the chain makes alteration detectable (evident); it does not make alteration impossible (proof), which a privileged DBA can still defeat.
pgcryptoβ the PostgreSQL extension providing in-database cryptographic functions, including thedigest()/SHA-256 used to build the chain.- Attributable metadata β the
db_user/app_user/reasoncontext recorded around a value, which is itself part of the regulated record.
Where this leadsβ
We have made the data tamper-evident and attributable by construction. But an audit trail answers what changed; it does not yet answer who formally approved it with a legally meaningful signature. In Chapter 21 β Electronic Records & Signatures: Part 11 / Annex 11 with Open Source, we take the app.user and reason hooks from this chapter and bind them to real authenticated identities, add pgAudit for privileged-session logging and cryptographic e-signatures via eLabFTW and a reason-for-change service β and we draw a brutally honest gap register showing exactly which Part 11 clauses open source satisfies and which still demand procedure or commercial tooling.