The Honest Verdict: Open Source vs Commercial

📍 Where we are: the last chapter before the references. The stack is built, the capstone batch has flowed end to end, and now we add up the bill, honestly. Where did pure open source win, where did it fall short, and how do you write that decision down so the next engineer (or the next auditor) understands why?

The whole book has been one long argument that you can build most of a bioprocess data platform in open source, on a laptop, against a deterministic simulated fed-batch CHO + Protein A monoclonal-antibody (mAb) line. The companion repo proves the runnable half: clone it, type a handful of make targets, and the historian fills, the batches contextualize, the audit chain verifies, the soft-sensor trains. But "you can build it" was never the same claim as "you should build all of it." This chapter is the reckoning. It is deliberately the least code-heavy chapter in the book, because the artifact it produces is a decision, not a service.

The simple version

Building a data platform is like building a house. You can pour the slab, frame the walls, run the plumbing, and wire the lights yourself; open source hands you genuinely excellent, free materials for all of that. But when the building inspector signs the occupancy permit, you are on the hook, not the lumber yard. For the rooms the inspector cares about most (the ones that hold the legal record of what you made and shipped), most people hire a licensed contractor who carries insurance and signs their name to the work. This chapter is the honest scorecard of which rooms you frame yourself and which you contract out, plus a one-page form for writing down why you chose what you chose.

What this chapter covers

  • A layer-by-layer scorecard of what the pure open-source stack we built genuinely delivers, and where commercial or hybrid is the responsible choice.
  • The hidden costs the price tag never shows: validation burden, supply-chain ownership, and vendor accountability.
  • A build-vs-buy decision framework and an Architecture Decision Record (ADR) template you can paste into your own repo.
  • A reference "honest hybrid" target architecture: the shape a real facility actually ships.

The scorecard: what we actually built, and what it actually cost

Every chapter in this book turned on one more layer of the same stack and then told you, plainly, where that layer stopped being enough. Collect those verdicts in one place and a pattern appears. Open source wins decisively at the bottom and the middle of the stack (connectivity, ingestion, historization, contextualization, semantics, analytics), and the score tilts toward commercial or hybrid exactly where the validated GMP record of truth lives.

Here is the honest scorecard, layer by layer. The OSS tools and their pins are the real ones from the running stack defined in examples/platform/compose/compose.yaml; the verdicts are the ones the earlier chapters earned.

| Layer | OSS tool in this book | Pure-OSS verdict | Where commercial / hybrid wins |
| --- | --- | --- | --- |
| Edge connectivity | OPC UA (asyncua) | Wins. Mature, standards-based, free. | Vendor driver certification; the ~80% of field devices that mis-handle OPC UA security. |
| Message bus | MQTT + Sparkplug B (eclipse-mosquitto:2.0.22) | Wins. Light, ubiquitous, broker-agnostic. | Brokered HA clusters with vendor SLAs. |
| Historian / TSDB | TimescaleDB (timescale/timescaledb:2.17.2-pg17) | Wins for capture & query. | HA, compression at PB scale, and 30-year retention SLAs (the TSL/commercial tier). |
| Batch & equipment model | PostgreSQL (ISA-88/95), same timescale/timescaledb:2.17.2-pg17 container as the historian | Wins. A relational join backbone is a relational join backbone. | A validated, configurable MES product. |
| Contextualization | SQL views | Wins. This is just good schema design. | - |
| Semantics / digital thread | RDF / SPARQL (apache/jena-fuseki:5.2.0) | Wins. Open standards beat any proprietary graph here. | Managed graph + ontology curation services. |
| Visualization | Grafana (grafana/grafana-oss:11.4.0) | Wins for local use. | AGPL redistribution/SaaS triggers obligations; commercial dashboards bundle support. |
| Compliance / record of truth | Postgres audit + hash chain | Partial. Detectable, not tamper-proof. | Commercial/hybrid. Validated MES/eQMS, qualified e-signatures, vendor accountability. |
| Analytics / soft-sensor | Python (scikit-learn PLS) | Wins. The science is open. | Validated modelling platforms for release-impacting models. |

The analytics row is the cleanest "OSS wins." When we trained the Raman-to-titer soft-sensor in examples/analytics/soft_sensor.py, the open stack did not merely run; it produced a genuinely useful model:

PLS soft-sensor (titer from Raman): R2=0.9923 RMSE=0.1498 g/L (6 comps, 701 wavenumbers, 235 train / 101 test)
ASSERT ok: R2 > 0.85 - the Raman dataset is genuinely predictive of titer.

scikit-learn (BSD-licensed) gave us an R² of 0.99 against a held-out split, for nothing. There is no commercial PLS engine that would have done the chemometrics better. This is the side of the verdict where the answer is easy. Peer-reviewed surveys of "Bioprocessing 4.0" reach the same conclusion from the other direction: industry adoption of digital tools is held back far less by the availability of algorithms than by the absence of common standards and integration, which is exactly the gap an open, standards-based stack is good at closing [1].
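
For readers who want to see what the two reported numbers measure, here is a minimal sketch in plain Python of R² and RMSE on a held-out split. The function name and the toy titers are illustrative assumptions; they are not taken from soft_sensor.py, which may well compute the same metrics through scikit-learn instead.

```python
import math

def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for paired sequences of measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: what the model failed to explain.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: the spread of the measured values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)

# Toy held-out titers in g/L (made-up values, for illustration only).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2, rmse = r2_and_rmse(measured, predicted)
print(f"R2={r2:.4f} RMSE={rmse:.4f} g/L")
```

RMSE carries the units of the target (g/L here), which is why the chapter can read it directly as a typical titer error, while R² is unitless and only meaningful against the variance of the held-out set.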

The hard row is the compliance one, and it is hard for reasons the price tag never shows.

The hidden costs: validation burden, supply chain, accountability

A naive cost comparison says open source is free and commercial software is expensive. That comparison is wrong in both directions, and the chapter's whole point is why.

Validation burden does not disappear when the license fee does. No open-source component in this book ships as "21 CFR Part 11 compliant," because compliance is a property of a validated system plus procedures, not of a download, a theme the preface set and every Trust chapter reinforced. GAMP 5 (2nd edition) is explicit that regulated firms should maximize supplier involvement and leverage vendor documentation to reduce their own validation effort [2]. When you choose open source, there is no supplier to leverage: you are the supplier. You write the user requirements, the IQ/OQ/PQ, the traceability matrix, the supplier assessment, all of it. The good news, and the direction that GAMP 5 has been moving and that the FDA's Computer Software Assurance (CSA) guidance pushes hard, is that this burden is risk-based: low-risk, non-record components can lean on automated testing and existing evidence rather than heavyweight documentation [3]. That is why this book made the test suite (make test) the validation evidence and pinned every image by digest: to make the assurance effort least-burdensome, not to make it vanish.

You inherit the supply chain you no longer pay a vendor to own. Choosing open source means choosing to manage third-party and supply-chain risk yourself. NIST's Secure Software Development Framework names the obligation directly: you must vet the components you reuse and express security requirements you would otherwise delegate to a vendor [4]. The 2026 reality (a compromised release of a widely-used container scanner earlier in the year) is a reminder that even your security tooling is part of the threat surface. The open-source answer is transparency: a Software Bill of Materials gives you the supplier, component, and version visibility to own the record-of-truth without a vendor, while honestly documenting the maintenance you have just signed up for [5]. And the tooling for that is itself excellent open source: Syft generates an SBOM straight from a container image or filesystem [6]. The repo ships no SBOM target, so treat the command below as a representative Syft invocation you would add yourself. Run against the real historian image pinned in versions.lock, it inventories exactly what is running, by digest:

# Representative Syft invocation (not a make target in the repo). Inventory the
# real historian/relational image: the accountability OSS *can* deliver.
syft timescale/timescaledb:2.17.2-pg17 -o spdx-json > sbom.spdx.json

An SBOM with the NTIA minimum elements (supplier, component name, version, unique identifier, dependency relationship, author, timestamp) is the open-source operator's substitute for "call the vendor": you can answer "what is in my stack, and where did it come from?" without anyone's permission [5].
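
A generated SBOM is only useful if those elements are actually populated, so a small check is worth having. The sketch below is a hedged illustration, not part of the repo: the function name is hypothetical, and the mapping of NTIA elements onto SPDX 2.x JSON fields (`name`, `versionInfo`, `supplier`, `SPDXID`, `relationships`, `creationInfo.creators`, `creationInfo.created`) is an assumed, commonly used correspondence that you should confirm against the SPDX version your tool emits.

```python
import json

# Assumed mapping of per-component NTIA elements onto SPDX 2.x package fields.
PACKAGE_FIELDS = {
    "component name": "name",
    "version": "versionInfo",
    "supplier": "supplier",
    "unique identifier": "SPDXID",
}

def missing_minimum_elements(sbom):
    """Return a sorted list of gaps found in a parsed SPDX JSON document."""
    gaps = set()
    creation = sbom.get("creationInfo", {})
    if not creation.get("creators"):
        gaps.add("author (creationInfo.creators)")
    if not creation.get("created"):
        gaps.add("timestamp (creationInfo.created)")
    if not sbom.get("relationships"):
        gaps.add("dependency relationship (relationships)")
    packages = sbom.get("packages", [])
    if not packages:
        gaps.add("no packages listed")
    for pkg in packages:
        label = pkg.get("name", "<unnamed>")
        for element, field in PACKAGE_FIELDS.items():
            value = pkg.get(field)
            # SPDX uses the literal NOASSERTION when a value is unknown.
            if not value or value == "NOASSERTION":
                gaps.add(f"{element} missing for {label}")
    return sorted(gaps)

# Hand-written toy document (hypothetical values) standing in for sbom.spdx.json.
toy = json.loads("""
{
  "creationInfo": {"created": "2026-06-14T00:00:00Z", "creators": ["Tool: example"]},
  "packages": [
    {"name": "libexample", "SPDXID": "SPDXRef-Package-libexample",
     "versionInfo": "1.2.3", "supplier": "NOASSERTION"}
  ],
  "relationships": [
    {"spdxElementId": "SPDXRef-DOCUMENT", "relationshipType": "DESCRIBES",
     "relatedSpdxElement": "SPDXRef-Package-libexample"}
  ]
}
""")
for gap in missing_minimum_elements(toy):
    print(gap)
```

To run it against a real file, load the file with `json.load` and pass the result in. Any gap it reports is maintenance you now own: for example, a supplier recorded as unknown has to be filled in by you, because there is no vendor to ask.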

Accountability is the cost that has no open-source substitute. When an inspector finds a data-integrity problem in your batch record, EU Annex 11 is unambiguous about who answers for it: there must be formal agreements with suppliers stating their responsibilities, the supplier must be assessed, and the regulated manufacturer remains ultimately responsible [7]. A commercial vendor signs that formal agreement and carries that responsibility contractually. An open-source project's license, by design, disclaims all warranty. For the record-of-truth itself (the data that FDA guidance requires to be reliable, accurate, attributable, with reviewed audit trails [8]), that missing accountability is precisely the factor that tilts a responsible build-vs-buy decision toward a validated, vendor-backed system.

This is exactly why our compliance layer is honest about its own limits. The ALCOA+ audit chain in examples/platform/db/50-alcoa.sql is a beautiful, free mechanism, but read the comment at the top:

-- 50-alcoa.sql - ALCOA+ by construction (Chapters 20 & 21).
-- A generic, trigger-based audit trail: every INSERT/UPDATE/DELETE on a
-- registered table is appended to audit.change_log with who/what/when/old/new,
-- and each row is hash-chained to the previous one so tampering is *detectable*.
-- The book is explicit that a superuser who disables the trigger can still
-- bypass this - hash chaining makes tampering evident, not impossible.

make alcoa runs audit.verify_chain() and returns 0 broken links when the chain is intact. That is real, and it is worth having. But "detectable, not impossible" is the whole verdict in three words: the mechanism is open source; the compliance (the validated procedures, the change control, the accountable owner) is the hybrid last mile.

The build-vs-buy framework, and writing it down as an ADR
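
The repo implements this with SQL triggers; the following is a separate, minimal Python sketch of the general hash-chaining idea, offered only to make the mechanism concrete. The function names, the row layout, and the choice of SHA-256 over canonical JSON are all assumptions for illustration and do not describe how audit.verify_chain() works internally.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first row

def row_hash(prev_hash, payload):
    """Hash the previous link together with a canonical JSON form of the row."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

def append(log, payload):
    """Append a change record, chaining it to the last row in the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev_hash": prev_hash,
                "hash": row_hash(prev_hash, payload)})

def broken_links(log):
    """Count rows whose back-pointer or stored hash no longer matches."""
    broken = 0
    expected_prev = GENESIS
    for row in log:
        recomputed = row_hash(row["prev_hash"], row["payload"])
        if row["prev_hash"] != expected_prev or row["hash"] != recomputed:
            broken += 1
        expected_prev = row["hash"]
    return broken

log = []
append(log, {"who": "alice", "op": "INSERT", "new": {"titer": 4.1}})
append(log, {"who": "bob", "op": "UPDATE", "old": {"titer": 4.1}, "new": {"titer": 4.3}})
append(log, {"who": "alice", "op": "DELETE", "old": {"titer": 4.3}})
print(broken_links(log))  # chain as written

log[1]["payload"]["new"]["titer"] = 9.9  # edit a row after the fact
print(broken_links(log))  # the edited row no longer matches its stored hash
```

In this sketch, an actor who can rewrite a row and also recompute every later hash would leave a chain that still verifies. That is one more reason to treat the mechanism as evidence of tampering rather than protection from it, and to surround it with procedures and an accountable owner.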

So how do you actually decide, layer by layer? The framework is one question asked four ways:

  1. Does this layer touch the regulated control strategy or the record of truth? If a value flowing through it can change a release decision, the bar is "validated and accountable." If it is engineering, monitoring, or analytics downstream of the record, open source is usually the better choice. The A-Mab case study, the canonical fed-batch CHO + Protein A QbD reference grounded in ICH Q8(R2)/Q9/Q10, is the map of which parameters are critical and therefore which parts of the stack inherit that bar [9].
  2. What is the validation burden, and can CSA make it proportionate? A low-risk historian read-replica is cheap to assure; a system that captures the qualified electronic signature on a batch disposition is not [3].
  3. Who carries accountability if it fails an inspection? If the honest answer is "no one but us, with no warranty," that is a real cost, not a free one [7].
  4. What is the lock-in cost of the alternative? Open standards (OPC UA, MQTT/Sparkplug, ISA-88/95, RDF) are the strongest argument for OSS: they make every future migration cheaper, and an SBOM keeps the door open [5].
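
The four questions can be written down as a small, reviewable rule, which makes the ordering of priorities explicit. This is one possible encoding under stated assumptions, not a method the book or any standard prescribes: the `Layer` fields, the rule order, and the example answers for the two layers are all illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    name: str
    touches_record_of_truth: bool       # question 1
    validation_burden: str              # question 2: "low" or "high"
    needs_vendor_accountability: bool   # question 3
    built_on_open_standards: bool       # question 4

def recommend(layer: Layer) -> str:
    """Map the four answers to a coarse recommendation (illustrative rule order)."""
    if layer.validation_burden not in ("low", "high"):
        raise ValueError(f"unknown validation burden: {layer.validation_burden!r}")
    # Questions 1 and 3 dominate: a release-impacting record needs an accountable owner.
    if layer.touches_record_of_truth or layer.needs_vendor_accountability:
        return "buy or hybrid: validated, vendor-backed"
    # Question 2: a heavy assurance effort deserves a written case either way.
    if layer.validation_burden == "high":
        return "decide case by case and record it in an ADR"
    # Question 4: open standards keep future migrations cheap.
    if layer.built_on_open_standards:
        return "build on open source"
    return "build on open source, but record the lock-in risk"

layers = [
    Layer("Historian / TSDB", False, "low", False, True),
    Layer("Compliance / record of truth", True, "high", True, False),
]
for layer in layers:
    print(f"{layer.name}: {recommend(layer)}")
```

The value of writing the rule as code is less the output than the argument it forces: anyone who disagrees with the verdict has to say which answer or which rule they would change.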

The bioprocessing literature frames the same fork as three execution paths (partner with system integrators, buy off-the-shelf, or develop in-house) and documents that low Industry 4.0 adoption and a lack of common standards are what make the choice hard [1]. A scorecard stops being a sales pitch only when you record the reasoning behind it. The discipline that turns an opinion into a defensible engineering decision is the Architecture Decision Record: a short, dated document capturing context, the decision, and its consequences [10]. Here is the template the repo recommends you commit beside the code it governs:

# ADR-0028: Build-vs-buy for the GMP record-of-truth
- Status: Accepted Date: 2026-06-14
## Context
The fed-batch CHO + Protein A line needs a system of record for the
batch disposition. Open-source Postgres + audit hash chain makes
tampering *detectable* (make alcoa -> 0 broken links) but ships no
vendor accountability and is not Part 11-compliant out of the box.
## Decision
Use a VALIDATED commercial MES/eQMS for the signed record of truth.
Keep the OSS stack (TimescaleDB historian, contextualization, SPARQL
digital thread, PLS soft-sensor) as a read-mostly NOA-side layer.
## Consequences
+ Vendor carries the Annex 11 supplier agreement & accountability.
+ OSS layer stays cheap, fast to evolve, lock-in-resistant (open standards).
- Two systems to reconcile; the OSS<->MES boundary must stay explicit.
- We still own validation of the OSS layer (risk-based, CSA-leveraged).

The point of the ADR is not the verdict; it is that six months later, when someone asks "why didn't we just use Postgres for everything?", the answer is written down with its reasoning intact.
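
Because the reasoning only survives if every ADR actually carries it, a structural check in CI can help. The following is a hedged sketch of such a check for ADRs written in the style above; the function name and the specific rules are assumptions for illustration, not a tool the repo ships.

```python
import re

REQUIRED_SECTIONS = ("Context", "Decision", "Consequences")

def adr_problems(text):
    """Return a list of structural problems in an ADR written in the style above."""
    problems = []
    lines = text.splitlines()
    if not lines or not re.match(r"# ADR-\d+: \S", lines[0]):
        problems.append("first line is not an '# ADR-NNNN: title' heading")
    if not re.search(r"^- Status: \w+", text, flags=re.MULTILINE):
        problems.append("no '- Status:' line")
    if not re.search(r"Date: \d{4}-\d{2}-\d{2}", text):
        problems.append("no ISO date")
    # Collect second-level headings and compare them with the required set.
    headings = re.findall(r"^## (.+?)\s*$", text, flags=re.MULTILINE)
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing '## {section}' section")
    return problems

# A deliberately incomplete draft: the Consequences section was never written.
draft = """# ADR-0028: Build-vs-buy for the GMP record-of-truth
- Status: Accepted Date: 2026-06-14
## Context
(context text)
## Decision
(decision text)
"""
print(adr_problems(draft))
```

A check like this cannot judge whether the reasoning is good, only whether it is present, so it complements review rather than replacing it.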

The honest hybrid target architecture

Put the scorecard and the framework together and you get the architecture a real facility actually ships: not "all open source," not "all commercial," but a deliberate hybrid with the boundary drawn in ink.

A target architecture split into two zones. On the left, a large open-source zone holds edge connectivity (OPC UA), the MQTT/Sparkplug message bus, the TimescaleDB historian, the PostgreSQL ISA-88/95 batch model, SQL contextualization, the SPARQL digital thread, and the Python analytics and soft-sensor. On the right, a smaller commercial-and-validated zone holds the validated MES and eQMS that own the signed batch record and electronic signatures. A clearly marked, read-mostly NOA boundary connects them, with arrows showing the open-source layer reading from and feeding context to the validated record without writing into it.

The honest hybrid. The wide open-source zone (everything you built in this book) handles connectivity, historization, contextualization, semantics, and analytics, all on open standards that resist lock-in. The narrow commercial-and-validated zone owns the signed record of truth and the qualified signatures, where vendor accountability is non-negotiable. The line between them is the read-mostly NOA boundary: the open stack reads and models; the validated system holds the record. Original diagram by the authors, created with AI assistance.

This is not a compromise born of weakness. It is the strongest design: it spends open source where open source is genuinely better (speed, cost, openness, lock-in resistance) and spends accountability where accountability is genuinely required.

Why it matters

Get this verdict wrong in either direction and it costs you. Build the regulated record on unvalidated open source because it was free, and you discover during an inspection that "free" carried a validation-and-accountability bill you never budgeted for [7]. Buy everything commercial because it felt safe, and you pay vendor margins for connectivity, historization, and analytics that open standards do for nothing, while quietly locking yourself out of your own data. The engineers who get it right are the ones who can say, layer by layer: here is what we built, here is what we bought, and here is the written reasoning. That is exactly what the scorecard, the framework, and the ADR give you.

In the real world

The hybrid this chapter scores is the pattern that actually ships. A real CHO + Protein A mAb facility runs a validated commercial MES and historian for the signed record-of-truth, and increasingly runs an open-source layer alongside it for contextualization, analytics, and engineering, because that layer is cheaper to own and faster to evolve, and because open standards keep the data portable. NIIMBL, the U.S. public-private National Institute for Innovation in Manufacturing Biopharmaceuticals, funds work on precisely these data-integration and digital-maturity problems, and its SABRE facility (a pilot-scale current Good Manufacturing Practice, or cGMP, facility at the University of Delaware that broke ground in April 2024) is the kind of site where these boundaries have to be real rather than theoretical. Note carefully what SABRE is: a facility under construction, not a data program. And as of mid-2026, no open-source tool in this book (not PostgreSQL, not Grafana, not Fuseki, not Keycloak) ships as a "Part 11-compliant product." They ship the mechanisms; you build, validate, and procedurally surround the compliance. SENAITE, the OSS LIMS this book treated as a teaching system, is the cautionary footnote: its only published Part 11 gap analysis dates to 2019 and still lists real gaps in e-signatures, retention, and password controls. Marketing maturity is not validated maturity, and the honest verdict is the one that keeps that distinction in ink.

Key terms

  • Build-vs-buy: the decision, taken per layer, between assembling a capability from open-source parts (build) and licensing a vendor product (buy); often resolved as a deliberate hybrid.
  • Total cost of ownership (TCO): the full lifetime cost of a choice, including the costs no price tag shows, namely validation burden, supply-chain maintenance, and the cost of missing vendor accountability.
  • Architecture Decision Record (ADR): a short, dated, version-controlled document capturing the context, decision, and consequences of one architectural choice, so the reasoning survives the engineer who made it.
  • Validation burden: the IQ/OQ/PQ, traceability, and supplier-assessment effort a regulated firm must perform; reducible (not removable) by risk-based CSA and by leveraging supplier evidence, which open source lacks because you are the supplier.
  • Vendor accountability: the contractual responsibility a commercial supplier carries under agreements Annex 11 requires; the cost open source structurally cannot supply, because its licenses disclaim warranty.
  • Software Bill of Materials (SBOM): a formal inventory of every component, version, and supplier in a system; the open-source operator's substitute for "call the vendor," and the basis for owning supply-chain risk transparently.
  • Honest hybrid: the target architecture this book argues for, with a wide open-source zone for connectivity through analytics and a narrow commercial-and-validated zone for the signed record, joined at a read-mostly NOA boundary.

Where this leads

That is the verdict, and it is the end of the build. One short section remains: the References, the verified sources behind every claim in this book and the citations you can hand to a reviewer who, reasonably, wants to check that "every claim is runnable or cited" was a promise we kept.