Validating an Open-Source Stack: GAMP 5 & CSA
Where we are: Part V · Trust. We have a running stack with an audit trail and e-signatures; now we have to prove it is fit for GxP use, with no vendor to hide behind.
Buying validated software is like hiring a contractor who hands you a thick binder of paperwork and takes the blame if the roof leaks. Open source is like building the deck yourself: nobody hands you the binder, and nobody else signs off. That is not a deal-breaker (a home inspector still passes a well-built deck), but you have to keep the receipts, the photos, and a list of every board you used. This chapter is about keeping those receipts in a way an inspector will accept, and doing it as code you can re-run.
In the last chapters you turned on the trust profile, enabled the audit triggers, and stood up a signing service. Everything works. But "works on my laptop" is not "validated." A regulated manufacturer running this stack to make a monoclonal antibody (mAb) must be able to tell an inspector, in writing, why they trust it, and for open source there is no supplier to lean on.
This is the chapter where the honest-hybrid thesis bites hardest. Pure open source can carry you about 80% of the way to a defensible validation package. The last mile (the procedures, the change control, the human sign-off, the audited environment) is yours, and it is the same burden whether the software underneath cost zero dollars or a million.
What this chapter covers
- Why "validated software" is a myth and validation is a property of a system + procedures, framed by 21 CFR Part 11 §11.10(a), 21 CFR 820.70(i), and FDA's General Principles of Software Validation.
- GAMP 5 categorization of each stack component (PostgreSQL as infrastructure, our batch model as Category 5) and the CSA (Computer Software Assurance) shift from documentation to risk.
- Supplier and provenance assessment for community projects that have no supplier quality system, backed by an SBOM pinned to image digests.
- A URS → IQ/OQ/PQ traceability matrix and an IQ manifest locked to image digests.
- Automated OQ as pytest: the tests in examples/tests/ are the validation evidence, re-run by CI on every change.
"Validated software" does not exist
Start by deleting a phrase from your vocabulary: "validated software." No download is compliant. The law is precise about this. 21 CFR Part 11 §11.10(a) requires "validation of systems to ensure accuracy, reliability, consistent intended performance" [1]. For devices and quality systems, 21 CFR 820.70(i) is even more direct: when computers are used as part of production or the quality system, "the manufacturer shall validate computer software for its intended use according to an established protocol" [2]. Both put the obligation on the system as you use it, not on the vendor's box.
FDA's General Principles of Software Validation gives us the verbs (installation, operational, and performance qualification, or IQ/OQ/PQ) and frames validation as a lifecycle activity, not a one-time test [3]. The good news, which the rest of this chapter cashes out, is that for an infrastructure-as-code stack each of those verbs maps cleanly onto something you can run.
GAMP 5, Second Edition, is the industry's playbook for doing this proportionately, and crucially it does not treat open source as forbidden fruit: it provides a risk-based, critical-thinking approach and explicit guidance on software categories, suppliers, and even open-source software [4]. The companion Pharmaceutical Engineering article from the GAMP Community of Practice spells out the OSS angle: keep a catalog of the open-source components, assess the project's governance and sustainability, and verify the installed copy matches the intended version from a reputable source [5].
Categorize before you validate
The first GAMP 5 move is to categorize each component, because category drives effort. You do not validate PostgreSQL's B-tree implementation; you validate the application you built on top of it. Here is how our stack splits.
| Component | GAMP category | Why | Validation focus |
|---|---|---|---|
| PostgreSQL / TimescaleDB | 1 (Infrastructure) | Established platform software | Correct install, version pinned, configured per spec |
| Mosquitto, Grafana, Fuseki | 1 (Infrastructure) | Configurable platform tools, used unmodified | Install + configuration verification |
| Grafana dashboards, OPA policy, Telegraf TOML | 4 (Configured) | We configure, not code, the behavior | Verify the configuration meets the requirement |
| bioproc_sim, the ISA-88/95 model, audit triggers, the soft-sensor | 5 (Custom) | Code we wrote for this intended use | Full lifecycle: requirements → design → code review → test |
The category-1-vs-5 split is the whole economy of validation. The bulk of the stack is infrastructure or configuration; only the code we authored is Category 5 and needs the heavyweight treatment. That is exactly where we point our OQ tests.
CSA: stop writing, start thinking
For years, CSV (Computer System Validation) drifted into a documentation arms race: pages of screenshots proving a button is blue. FDA's Computer Software Assurance (CSA) guidance is the correction. Issued as a draft in 2022 [6] and finalized in 2025 (with an administrative update in early 2026) [7], CSA tells you to spend effort in proportion to risk: high-risk, patient-impacting functions get rigorous scripted testing; low-risk functions get lighter, unscripted or automated checks. The recordkeeping scales with the risk, not the other way around.
CSA is a gift to an OSS stack. It explicitly blesses leveraging existing evidence (logs, audit trails, automated test results) instead of re-deriving everything by hand. That is precisely what a Git repository full of pytest runs and CI logs gives you. The assurance activity becomes: identify the intended use, judge the risk, choose the lightest test that establishes confidence, and let the automation produce the record.
And this is not theory. The IMPALA Consortium (Roche, MSD, and Boehringer Ingelheim) independently validated a community-developed open-source R package to GxP/GCP standards and published exactly how they did it [8]. Regulated pharma can defensibly validate software with no vendor. The pattern they used is the pattern here: assess provenance, define intended use, test against requirements, keep the evidence.
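The risk-proportionate choice that CSA describes (identify the intended use, judge the risk, pick the lightest activity that establishes confidence) can be written down as data, so that the decision itself is reviewable. The sketch below is illustrative only: the `Function` record, the `ASSURANCE_BY_RISK` table, and the wording of each tier are assumptions made for this example, not text from the FDA guidance or from GAMP 5. The risk labels reuse the ones that appear in this chapter's traceability matrix.

```python
from dataclasses import dataclass

# Hypothetical policy table: the tiers and their wording are assumptions
# for illustration, not quotations from FDA's CSA guidance or GAMP 5.
ASSURANCE_BY_RISK = {
    "Critical": "scripted automated test on every commit, reviewed at release",
    "High": "scripted automated test on every commit",
    "Medium": "automated test or unscripted exploratory check",
    "Low": "project evidence plus install verification",
}


@dataclass(frozen=True)
class Function:
    """One function of the system, as a validation plan might list it."""
    name: str
    gamp_category: int
    risk: str


def assurance_activity(fn: Function) -> str:
    """Look up the assurance activity the policy table assigns to this risk."""
    if fn.risk not in ASSURANCE_BY_RISK:
        raise ValueError(f"unknown risk level: {fn.risk!r}")
    activity = ASSURANCE_BY_RISK[fn.risk]
    # Assumption: custom (Category 5) code always gets a code review as well.
    if fn.gamp_category == 5:
        activity += " + code review"
    return activity


if __name__ == "__main__":
    for fn in (
        Function("audit trigger", 5, "Critical"),
        Function("Grafana dashboard", 4, "Medium"),
    ):
        print(f"{fn.name}: {assurance_activity(fn)}")
```

Keeping the table in the repository means a change to the policy goes through the same review and change control as a change to the code.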
Supplier assessment when there is no supplier
GAMP 5 expects a supplier assessment, normally a questionnaire or audit of the vendor's quality system. Open source has no vendor to audit, so you assess the project and the artifact instead. Two questions: is the project a trustworthy "supplier," and is the bit-for-bit copy you are running the one you think it is?
For the first, you assess project health as a proxy: governance, release cadence, security response, community size, license [5]. For the second, you pin and verify. Our examples/platform/compose/compose.yaml pins every image by human-readable tag, and the repo's examples/platform/versions.lock records the matching immutable manifest digest for each one (regenerated by make lock):
# from examples/platform/compose/compose.yaml
postgres:
  image: timescale/timescaledb:2.17.2-pg17
  profiles: ["core"]
mosquitto:
  image: eclipse-mosquitto:2.0.22
  profiles: ["core"]
grafana:
  image: grafana/grafana-oss:11.4.0
  profiles: ["core"]
A tag like :2.17.2-pg17 is a human-friendly label that can be re-pointed; a digest (@sha256:…) is content-addressable and cannot. Pinning both is the difference between "we ran TimescaleDB 2.17" and "we ran this exact TimescaleDB 2.17." That distinction is what makes the install reproducible, and reproducibility is the foundation of every IQ claim.
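The tag-versus-digest distinction is easy to check mechanically. The sketch below is a minimal example, assuming image references are available as plain strings in the usual `name:tag@sha256:<64 hex>` form; it does not read the repo's versions.lock, whose format is not shown here, and the digest in the demo is a placeholder rather than a real image digest.

```python
import re

# A digest-pinned reference ends in "@sha256:" plus 64 lowercase hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """True when the reference carries an immutable sha256 digest."""
    return DIGEST_RE.search(image_ref) is not None


def unpinned(image_refs: list[str]) -> list[str]:
    """Return the references that rely on a tag alone."""
    return [ref for ref in image_refs if not is_digest_pinned(ref)]


if __name__ == "__main__":
    placeholder = "a" * 64  # placeholder, NOT a real image digest
    refs = [
        f"timescale/timescaledb:2.17.2-pg17@sha256:{placeholder}",
        "eclipse-mosquitto:2.0.22",
    ]
    # Only the Mosquitto reference is reported: it has a tag but no digest.
    print(unpinned(refs))
```

A check like this can run in CI before any test does, so a reference that lost its digest is caught as a provenance problem rather than discovered later as an unexplained behavior change.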
The machine-readable supplier dossier is a Software Bill of Materials (SBOM). We generate one with Syft, the Apache-2.0 tool that walks a container image and lists every package inside it [9]. The output is a standardized format (CycloneDX, now Ecma International ECMA-424 [10], or SPDX, ratified as ISO/IEC 5962:2021 [11]), so the inventory has formal, inspection-defensible footing rather than being a homemade spreadsheet. A trimmed CycloneDX row looks like this:
// illustrative SBOM row (CycloneDX 1.6), produced by `make sbom` (Syft)
{
  "type": "container",
  "name": "timescale/timescaledb",
  "version": "2.17.2-pg17",
  "purl": "pkg:docker/timescale/timescaledb@2.17.2-pg17",
  "hashes": [{ "alg": "SHA-256", "content": "sha256:3324f81c… (digest pinned in versions.lock)" }],
  "licenses": [
    { "license": { "name": "PostgreSQL License" } },
    { "license": { "name": "Timescale License (TSL)" } }
  ]
}
That single artifact answers three questions an inspector will ask: what is in here, where did it come from, and under what license. It also forces an honesty that prose alone can hide. The pinned timescale/timescaledb:2.17.2-pg17 is the Community bundle: PostgreSQL under the PostgreSQL License, plus TimescaleDB, whose core is Apache-2.0 and whose Community features are under the source-available Timescale License (TSL). So the SBOM row carries TSL, not a clean Apache-2.0 (the strictly Apache-only build would be the -oss tag). The book is deliberate about this trap (see the historian chapter): it uses the free TSL Community automation (continuous aggregates and add_retention_policy) while staying off the one TSL feature it does not need, Hypercore columnstore/compression; it ships VictoriaMetrics instead of InfluxDB v3; and it flags Grafana's AGPL for redistribution. The license column is not decoration. The supplier register and the SBOM share one source (versions.lock), so the recorded license is whatever the pinned image actually is, and the inventory cannot quietly drift away from the running stack.
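Because the SBOM is structured data, the license review can be a query rather than a reading exercise. The sketch below assumes a CycloneDX JSON document with a top-level `components` array whose entries carry a `licenses` list, as in the trimmed row above; the `REVIEW_NEEDED` set and the function names are hypothetical choices made for this example.

```python
# Hypothetical watch list: licenses that should trigger a documented review.
REVIEW_NEEDED = {"Timescale License (TSL)", "AGPL-3.0-only"}


def license_names(component: dict) -> list[str]:
    """Collect license ids, names, or expressions from one CycloneDX component."""
    names = []
    for entry in component.get("licenses", []):
        if "expression" in entry:
            names.append(entry["expression"])
        else:
            lic = entry.get("license", {})
            names.append(lic.get("id") or lic.get("name") or "UNKNOWN")
    return names


def flag_licenses(sbom: dict, review_needed: set[str]) -> dict[str, list[str]]:
    """Map component name to the licenses on it that need review."""
    flagged = {}
    for comp in sbom.get("components", []):
        hits = [name for name in license_names(comp) if name in review_needed]
        if hits:
            flagged[comp["name"]] = hits
    return flagged


# Illustrative document built from the trimmed SBOM row shown above.
SAMPLE_SBOM = {
    "bomFormat": "CycloneDX",
    "components": [
        {
            "type": "container",
            "name": "timescale/timescaledb",
            "version": "2.17.2-pg17",
            "licenses": [
                {"license": {"name": "PostgreSQL License"}},
                {"license": {"name": "Timescale License (TSL)"}},
            ],
        }
    ],
}

if __name__ == "__main__":
    print(flag_licenses(SAMPLE_SBOM, REVIEW_NEEDED))
```

For the sample document, the TimescaleDB component is flagged for the TSL entry and the PostgreSQL License passes without comment.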
The traceability matrix: URS → IQ/OQ/PQ
Validation is fundamentally about answering "did we build what we said we needed?" The artifact that proves it is a traceability matrix: every user requirement (URS) maps forward to a test, and every test maps back to a requirement. Nothing untested, nothing untraced.
Here is a slice of the requirements-to-test matrix for our stack, expressed as the CSV the repo would ship under compliance/:
# illustrative compliance/traceability.csv (URS -> test, generated from test IDs)
urs_id,requirement,gamp_cat,risk,verifies,test_id
URS-001,"Historian stores all bioreactor tags for a batch",1,High,test_historian_loaded,tests/test_db.py::test_historian_loaded
URS-002,"Every reading resolves to a named ISA-88 phase",5,High,test_contextualization_joins_phase,tests/test_db.py::test_contextualization_joins_phase
URS-003,"Record changes are attributable, reasoned, tamper-evident",5,Critical,test_audit_captures_update,tests/test_db.py::test_audit_captures_update
URS-004,"The audit hash chain has no broken links",5,Critical,test_alcoa_chain_intact,tests/test_db.py::test_alcoa_chain_intact
URS-005,"Generated process data is deterministic & reproducible",5,Medium,test_determinism_two_runs_identical,tests/test_simulator.py::test_determinism_two_runs_identical
The test_id column is the punchline: each requirement does not point to a paragraph of prose, it points to a function that runs. That is CSA in practice: the evidence is generated, not transcribed.
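"Nothing untested, nothing untraced" is itself a checkable property. The sketch below is a minimal example, assuming the matrix has the column layout shown above and that the set of existing test IDs comes from somewhere like `pytest --collect-only -q`; the function name and the `test_orphan` entry are hypothetical.

```python
import csv


def check_traceability(csv_text: str, collected: set[str]) -> dict[str, list[str]]:
    """Report requirements without a live test and tests without a requirement."""
    # Drop blank lines and '#' comment lines before the CSV reader sees them.
    lines = [
        line for line in csv_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    rows = list(csv.DictReader(lines))
    traced = {row["test_id"].strip() for row in rows if row["test_id"].strip()}
    return {
        # A requirement is untested if its test_id is empty or not collected.
        "untested_urs": [
            row["urs_id"] for row in rows
            if row["test_id"].strip() not in collected
        ],
        # A test is untraced if no requirement points at it.
        "untraced_tests": sorted(collected - traced),
    }


MATRIX = """\
# illustrative compliance/traceability.csv
urs_id,requirement,gamp_cat,risk,verifies,test_id
URS-003,"Record changes are attributable, reasoned, tamper-evident",5,Critical,test_audit_captures_update,tests/test_db.py::test_audit_captures_update
URS-004,"The audit hash chain has no broken links",5,Critical,test_alcoa_chain_intact,tests/test_db.py::test_alcoa_chain_intact
"""

COLLECTED = {
    "tests/test_db.py::test_audit_captures_update",
    "tests/test_db.py::test_orphan",  # hypothetical test with no URS
}

if __name__ == "__main__":
    print(check_traceability(MATRIX, COLLECTED))
```

With these inputs, URS-004 is reported as untested (its test was not collected) and `test_orphan` as untraced. Run as a CI gate, the same check keeps the matrix from drifting away from the test suite.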
From a written requirement to a runnable test: the OSS validation lifecycle treats GAMP categorization and risk as the dial that sets how much qualification each component gets, then captures IQ as a digest-pinned manifest, OQ as automated pytest, and PQ as an end-to-end batch, with every step traceable back to a URS.
Original diagram by the authors, created with AI assistance.
IQ: the install manifest, pinned to digests
Installation Qualification answers "is the right thing installed, configured correctly, in the right place?" For infrastructure-as-code this is almost free, because the install is a file. The IQ manifest is a snapshot of the running stack's components and their digests, the evidence that what is deployed matches what was specified:
// illustrative compliance/iq_manifest.json (captured from the running stack)
{
  "captured_utc": "2026-06-14T09:00:00Z",
  "compose_file_sha256": "…",
  "components": [
    { "service": "postgres", "image": "timescale/timescaledb:2.17.2-pg17",
      "digest": "sha256:3324f81c…", "gamp_category": 1, "profile": "core" },
    { "service": "grafana", "image": "grafana/grafana-oss:11.4.0",
      "digest": "sha256:d8ea3779…", "gamp_category": 1, "profile": "core" }
  ]
}
Because the manifest is generated from the same versions.lock digests that the SBOM and supplier register use, an IQ that passes proves three things at once: the right versions are installed, they match the license inventory, and they match what CI tested. If a digest drifts, the OQ suite in the next section is designed to fail loudly, which is the whole point.
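The IQ comparison itself is a few lines once both sides are in memory. The sketch below assumes the manifest has the shape shown above and that the lock has already been parsed into a mapping from image reference to digest; the real versions.lock format is not shown in this chapter, and the digests in the demo are placeholders.

```python
def iq_findings(manifest: dict, locked: dict[str, str]) -> list[str]:
    """Compare running components against the locked digests; empty means pass."""
    findings = []
    seen = set()
    for comp in manifest["components"]:
        image, digest = comp["image"], comp["digest"]
        seen.add(image)
        expected = locked.get(image)
        if expected is None:
            findings.append(f"{comp['service']}: {image} is not in the lock")
        elif digest != expected:
            findings.append(f"{comp['service']}: running {digest}, locked {expected}")
    # Anything locked but not running is also an installation finding.
    for image in sorted(set(locked) - seen):
        findings.append(f"{image} is locked but not running")
    return findings


if __name__ == "__main__":
    # Placeholder digests for illustration, NOT real image digests.
    good, drifted = "sha256:" + "a" * 64, "sha256:" + "b" * 64
    locked = {
        "timescale/timescaledb:2.17.2-pg17": good,
        "grafana/grafana-oss:11.4.0": good,
    }
    manifest = {
        "components": [
            {"service": "postgres", "image": "timescale/timescaledb:2.17.2-pg17", "digest": good},
            {"service": "grafana", "image": "grafana/grafana-oss:11.4.0", "digest": drifted},
        ]
    }
    for finding in iq_findings(manifest, locked):
        print(finding)
```

Here only the grafana service is reported, because its running digest differs from the locked one. An empty list is the IQ pass condition; anything else is a deviation to record.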
OQ: the tests in the repo are the evidence
Operational Qualification answers "does it do what it should, across its operating range?" Here is where the hands-on nature of this book pays off: we already wrote the OQ. The pytest suite [12] in examples/tests/ is not a developer convenience bolted on afterward; it is the operational evidence, and it runs against the live compose stack.
Look at examples/tests/test_db.py. These tests connect to the running PostgreSQL+TimescaleDB and assert that the system behaves to requirement. The historian and contextualization checks verify URS-001 and URS-002:
# from examples/tests/test_db.py
def test_historian_loaded(conn):
    n = _scalar(conn, "select count(*) from ts.sensor_reading where batch_id='BATCH-2026-001'")
    assert n > 300_000  # 16 tags x ~20160 minutes

def test_contextualization_joins_phase(conn):
    # every reading in the golden batch should resolve to a named phase
    rows = _scalar(conn, "select count(distinct phase_name) from s88.v_batch_sensor "
                         "where batch_id='BATCH-2026-001' and phase_name is not null")
    assert rows >= 4  # Inoculate, Growth, Production, Harvest
The two data-integrity tests are the most important, because they verify the Critical-risk URS-003 and URS-004, the ALCOA+ controls an inspector cares about most. test_audit_captures_update performs a real UPDATE, supplying a user and a reason, then asserts the change was captured and the tamper-evident hash chain is still intact:
# from examples/tests/test_db.py
def test_audit_captures_update(conn):
    # an UPDATE must record old + new + who + why and keep the chain intact
    with conn.cursor() as cur:
        cur.execute("select set_config('app.user','pytest',false), "
                    "set_config('app.reason','test correction',false)")
        cur.execute("update lab.result set value = value where result_id = "
                    "(select result_id from lab.result limit 1)")
    conn.commit()
    last = _scalar(conn, "select action from audit.change_log "
                         "where app_user='pytest' order by seq desc limit 1")
    assert last == "UPDATE"
    assert _scalar(conn, "select count(*) from audit.verify_chain()") == 0
That single test, run and logged, is an OQ script that exercises the audit trail end to end. It is better evidence than a screenshot, because anyone, including the inspector, can re-run it and watch it pass.
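To see why a hash chain makes edits detectable, it helps to model one outside the database. The sketch below is a pure-Python illustration, not the repo's SQL: the row layout, the `GENESIS` seed, and the way each hash is computed are assumptions chosen for clarity. Like the `audit.verify_chain()` call in the test above, `verify_chain` here returns the broken rows, so an empty result means the chain is intact.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed seed value for the first link


def link_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list[dict], payload: dict) -> None:
    """Add one audit row whose hash depends on everything before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "seq": len(chain) + 1,
        "payload": payload,
        "prev_hash": prev,
        "hash": link_hash(prev, payload),
    })


def verify_chain(chain: list[dict]) -> list[int]:
    """Return the seq numbers of rows whose link no longer verifies."""
    broken = []
    prev = GENESIS
    for row in chain:
        if row["prev_hash"] != prev or row["hash"] != link_hash(prev, row["payload"]):
            broken.append(row["seq"])
        prev = row["hash"]
    return broken


if __name__ == "__main__":
    chain: list[dict] = []
    append(chain, {"action": "INSERT", "user": "alice", "reason": "initial entry"})
    append(chain, {"action": "UPDATE", "user": "pytest", "reason": "test correction"})
    append(chain, {"action": "UPDATE", "user": "bob", "reason": "recalibration"})
    print(verify_chain(chain))           # intact chain: no broken rows
    chain[1]["payload"]["user"] = "eve"  # silently edit a stored row
    print(verify_chain(chain))           # row 2 no longer matches its hash
```

The model also shows the limit: someone who can rewrite every later hash can rebuild a consistent chain, so the chain needs access controls and independent review around it to count as a control.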
The simulator tests in examples/tests/test_simulator.py cover the reproducibility requirement (URS-005). Determinism is a validation property in its own right: if the same inputs do not produce the same outputs, you cannot qualify anything built on them.
# from examples/tests/test_simulator.py
def test_determinism_two_runs_identical():
    a = fed_batch.simulate("BATCH-2026-001").tags["value"].to_numpy()
    b = fed_batch.simulate("BATCH-2026-001").tags["value"].to_numpy()
    assert np.array_equal(a, b)
And the whole suite is fronted by a single command in examples/Makefile, which is the exact line a validation engineer (or CI) types to produce the operational evidence:
# from examples/Makefile
test: ## run the test suite (determinism + db + analytics)
	$(PY) -m pytest -q tests
make test is the OQ execution. The console output, captured to a file, is the OQ result. Re-running it on a clean machine, which CI does on every commit, is regression-testing your validated state for free.
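Console text is readable, but a machine-readable report is easier to archive and compare. pytest can write one with its `--junitxml=PATH` option; the sketch below assumes that report is available as text and condenses it into a small evidence record. The function name and the decision to treat skipped tests as a failed OQ are assumptions of this example, not something the repo prescribes.

```python
import hashlib
import xml.etree.ElementTree as ET

COUNTERS = ("tests", "failures", "errors", "skipped")


def summarize_junit(xml_text: str) -> dict:
    """Condense a JUnit-style XML report into an OQ evidence record."""
    root = ET.fromstring(xml_text)
    record = {key: 0 for key in COUNTERS}
    # Sum the counters over every <testsuite>, including the root if it is one.
    for suite in root.iter("testsuite"):
        for key in COUNTERS:
            record[key] += int(suite.get(key, 0))
    # Assumption: a skipped test produced no evidence, so it blocks a pass.
    record["passed"] = (
        record["tests"] > 0
        and record["failures"] == 0
        and record["errors"] == 0
        and record["skipped"] == 0
    )
    # Fingerprint of the raw report, so the archived file can be re-verified.
    record["report_sha256"] = hashlib.sha256(xml_text.encode("utf-8")).hexdigest()
    return record


SAMPLE_REPORT = (
    '<testsuites><testsuite name="pytest" tests="5" failures="0" '
    'errors="0" skipped="0"></testsuite></testsuites>'
)

if __name__ == "__main__":
    print(summarize_junit(SAMPLE_REPORT))
```

Storing the record next to the commit hash and the IQ manifest ties one OQ run to one exact installed state.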
PQ: the batch is the proof
Performance Qualification answers "does it work for the real job, under real conditions?" For us that is the capstone: a full simulated fed-batch CHO campaign flowing from sensor through historian and contextualization to a reviewable, signed dataset and an audit-trail review report. PQ is where the abstract requirements meet the actual mAb process. The capstone target exercises exactly that path end to end, and CI asserts the reviewable dataset exists and the report is non-empty: the headline "implementation is possible" proof for the whole book.
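The CI assertion described above (the dataset exists and the report is non-empty) can be as small as the sketch below. The function name and the file paths in the demo are hypothetical; the capstone's real output locations are not given in this chapter.

```python
from pathlib import Path


def missing_or_empty(paths: list[str]) -> list[str]:
    """Return one finding per expected PQ artifact that is absent or empty."""
    problems = []
    for path in map(Path, paths):
        if not path.is_file():
            problems.append(f"{path}: missing")
        elif path.stat().st_size == 0:
            problems.append(f"{path}: empty")
    return problems


if __name__ == "__main__":
    # Hypothetical artifact paths, for illustration only.
    expected = ["out/reviewable_dataset.parquet", "out/audit_trail_review.md"]
    findings = missing_or_empty(expected)
    for finding in findings:
        print(finding)
    raise SystemExit(1 if findings else 0)
```

Existence and size are a floor, not a ceiling: they show the pipeline ran end to end, while the reviewer's signature on the report is what turns the run into PQ evidence.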
Why it matters
Without this chapter, the previous twenty are a hobby project. A regulated facility cannot run software it cannot defend. The reason this matters so much for open source specifically is the missing vendor: when you buy a validated LIMS, the supplier's quality system absorbs part of your burden. With OSS, that burden lands entirely on you. It does not disappear, but CSA plus automated testing make it cheaper to discharge honestly than the old screenshot-driven CSV ever did. The repo turns validation from a documentation project into an engineering one, where the evidence is generated by the same CI that builds the software.
In the real world
The honest verdict: this approach is real, and it is being done. The IMPALA Consortium's published validation of a community R package is proof that big pharma will stand behind independently-assessed OSS in a GxP context [8]. GAMP 5 Second Edition's open-source appendix and the GAMP CoP guidance mean the framework explicitly supports it [4] [5]. And CSA's finalization in 2025 means the risk-based, evidence-leveraging posture this chapter takes is now the expected inspection-ready approach, not a clever workaround [7].
But be brutally clear about the limits. Automated pytest is excellent OQ evidence; it is not the whole validation package. You still owe: a validation plan and report signed by a qualified person, a change-control procedure (every digest change re-triggers assessment), periodic review, a deviation process, and, in the recurring honesty of this book, the recognition that no OSS component is Part 11-compliant out of the box. The Ch 20 hash chain makes tampering detectable, not impossible; a superuser who disables the trigger can still bypass it. The supplier-assessment effort for a thinly-maintained project can exceed the cost of buying a supported one, and that trade-off is a legitimate business decision, not a failure. Initiatives like NIIMBL (the U.S. public-private Manufacturing USA institute for biopharmaceutical innovation) and its SABRE pilot-scale current Good Manufacturing Practice (cGMP) facility at the University of Delaware (a building under construction, not a data program) exist precisely because getting from a working prototype to a validated, inspectable process is the hard, expensive part. This chapter shows you can get there with open source; it never pretends the last mile is free.
Key terms
- GAMP 5: ISPE's risk-based framework for validating GxP computerized systems; its categories (1 infrastructure → 5 custom) scale validation effort.
- CSA (Computer Software Assurance): FDA's risk-based successor to heavyweight CSV: spend assurance effort in proportion to patient/data risk; leverage existing evidence.
- CSV (Computer System Validation): the traditional, often document-heavy approach CSA reforms.
- IQ / OQ / PQ: Installation, Operational, and Performance Qualification: is it installed right, does it operate right, does it perform for the real job.
- URS (User Requirement Specification): the statements of what the system must do; the root of the traceability matrix.
- Traceability matrix: the mapping from each URS to the test(s) that verify it (and back).
- Supplier assessment: evaluating a software supplier's quality; for OSS, assessing the project's health and verifying the artifact instead.
- SBOM: Software Bill of Materials; a standardized inventory (CycloneDX/ECMA-424, SPDX/ISO 5962) of every component, its version, provenance, and license.
- Image digest: a content-addressable SHA-256 of a container image; pinning by digest makes an install bit-for-bit reproducible.
Where this leads
Validation answers "is this system trustworthy?", but trust is not global. A record that satisfies FDA's Part 11 may face different residency, retention, and audit-trail expectations in the EU, Korea, China, or Japan. The next chapter, Data Across Jurisdictions: FDA, EU, PIC/S, NMPA, PMDA, MFDS, takes the validated stack across borders and shows how to encode those differing rules as data and policy rather than as forked deployments.