
Bridging to Commercial & Open-Source LIMS

๐Ÿ“ Where we are: Part IV ยท Meeting Reality โ€” our platform now holds the time-series, the batch model, and the bridges to PI and the DCS/MES/ERP; this chapter connects the last system that owns a decision โ€” the laboratory that decides whether a batch may be released โ€” and is honest about who the system of record really is.

The simple version

Think of your data platform as a busy newsroom that prints a daily paper about every batch. It gathers facts from everywhere: the bioreactor, the chromatography skid, the dashboards. But there is one fact it is not allowed to make up: whether the drug passed its release tests. That verdict is written, signed, and locked away in the laboratory's own filing cabinet, the LIMS (Laboratory Information Management System). Our job is not to seize that cabinet. It is to politely ask for a photocopy of the certificate, file it neatly next to everything else, and never let our copy pretend to be the original. When the lab changes its mind (a result is corrected, a batch is failed), our copy must update faithfully, or it becomes a dangerous rumour. This chapter builds that polite, faithful, two-way photocopier.

What this chapter covers

The previous bridges pulled process data from systems that watch the plant. The lab is different: it does not just observe, it adjudicates. Final release testing, meaning the SEC, CEX, host-cell-protein, residual Protein A, DNA, and endotoxin assays that say a monoclonal-antibody (mAb) lot is fit to ship, usually lives in a commercial LIMS, and the Certificate of Analysis (CofA) it issues is a regulated record. We cover:

  • The vocabulary of lab informatics (LIMS, ELN, SDMS) and where each open-source tool fits.
  • How the OSS stack already has a place for lab results: the lab.sample / lab.test / lab.result tables, exchanged with a LIMS over REST/JSON and files.
  • A real CofA-in sync against a mock commercial LIMS, with idempotency and conflict resolution so re-running it is safe.
  • The honest open-source landscape: SENAITE for QC/release, openBIS for process development, and why LabKey's Part 11 features sit behind a paywall.
  • The discipline that keeps your copy from becoming a shadow record, the failure mode regulators care about most.

Everything that can run on a laptop runs against tables and a mock you already have. The real LIMS is the one thing we cannot ship, so we mock its API and are explicit about it.

Three letters that are not the same: LIMS, ELN, SDMS

Before wiring anything, get the vocabulary straight, because integration contracts depend on it. ASTM E1578, the standard guide for laboratory informatics, draws the lines the industry uses [1]. A LIMS is sample-centric: it registers a sample, schedules tests, captures results against specifications, and drives the release workflow. An ELN (Electronic Lab Notebook) is experiment-centric: it records what a scientist did and why, the narrative a paper notebook used to hold. An SDMS (Scientific Data Management System) is file-centric: it archives the raw instrument output (chromatograms, spectra) as the untouched original.

The running case touches all three. For release, a LIMS owns the SEC/CEX/endotoxin verdicts. In process development, an ELN captures the rationale behind a media tweak. And the raw HPLC trace behind a titer number belongs in an SDMS-style archive. Confusing these is how integrations rot: you cannot push a free-text experiment narrative into a LIMS result field and expect a clean release decision out the other side.

The OSS stack already has a lab schema

We do not need a new database for lab data; Chapter 8 already gave us one. In examples/platform/db/30-lab-events.sql, three tables model the sample-to-result spine that any LIMS exchange has to map onto:

-- examples/platform/db/30-lab-events.sql
CREATE TABLE lab.sample (
    sample_id    text PRIMARY KEY,
    batch_id     text REFERENCES s88.batch,
    sample_time  timestamptz NOT NULL,
    sample_point text NOT NULL,
    sample_type  text NOT NULL DEFAULT 'in_process'  -- in_process | release | stability
);

CREATE TABLE lab.test (
    test_id   text PRIMARY KEY,
    name      text NOT NULL,
    unit      text,
    spec_low  numeric,
    spec_high numeric
);

CREATE TABLE lab.result (
    result_id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    sample_id     text NOT NULL REFERENCES lab.sample,
    test_id       text REFERENCES lab.test,
    value         numeric,
    text_value    text,
    unit          text,
    result_ts     timestamptz NOT NULL DEFAULT now(),
    analyst       text,
    instrument_id text,
    status        text NOT NULL DEFAULT 'preliminary',  -- preliminary | verified | rejected
    UNIQUE (sample_id, test_id, result_ts)
);
CREATE INDEX ON lab.result (sample_id);

Two design choices in this schema are the whole game for a faithful bridge. First, every result carries its provenance (analyst, instrument_id, result_ts) and a status that distinguishes a preliminary value from a verified one. A LIMS release decision hangs on that distinction, so our copy must carry it too, or we will quietly elevate an unconfirmed number to fact. Second, the UNIQUE (sample_id, test_id, result_ts) constraint is our idempotency key: it lets the same result be re-sent any number of times without ever duplicating a row, provided test_id is populated, because PostgreSQL treats NULLs as distinct in a unique constraint and test_id is nullable in this schema. That single constraint is what turns a fragile one-shot import into a sync you can re-run after a crash.

The link back to manufacturing is lab.sample.batch_id, a foreign key into the ISA-88/95 s88.batch table we seeded in Chapter 3. That is what makes a lab result contextual: a CofA value is not floating data, it is attached to BATCH-2026-001, which ran on unit BR101, made product MAB-001.

What a Certificate of Analysis actually looks like

The simulator already generated a release dataset for the six-batch campaign. Below are representative release tests for the first batch from examples/datasets/hplc_results.csv (a subset; the real file also carries SEC_LMW_pct, CEX_acidic_pct, CEX_basic_pct, and bioburden_CFU_per_10mL rows between these), exactly the shape a CofA exchange delivers:

batch_id,test,value,unit,spec_low,spec_high,result
BATCH-2026-001,SEC_monomer_pct,98.611,%,95.0,100.0,PASS
BATCH-2026-001,SEC_HMW_pct,1.287,%,0.0,3.0,PASS
BATCH-2026-001,CEX_main_pct,70.686,%,60.0,80.0,PASS
BATCH-2026-001,HCP_ng_per_mg,28.203,ng/mg,0.0,100.0,PASS
BATCH-2026-001,residual_ProteinA_ng_per_mg,1.149,ng/mg,0.0,20.0,PASS
BATCH-2026-001,host_cell_DNA_ng_per_dose,0.939,ng/dose,0.0,10.0,PASS
BATCH-2026-001,endotoxin_EU_per_mL,0.215,EU/mL,0.0,5.0,PASS
# ... (SEC_LMW_pct, CEX_acidic_pct, CEX_basic_pct, bioburden_CFU_per_10mL omitted)

Notice the columns: a value, its unit, the specification window, and a PASS/OOS verdict computed against that window. The whole batch passes only if every test passes. Our campaign deliberately includes one failure: BATCH-2026-004 returns HCP_ng_per_mg,128.0 against a spec ceiling of 100.0, flagged OOS (out of specification), which is why its s88.batch.status is rejected, not released. A bridge that silently dropped or rounded that one number would be hiding a failed lot. That is the nightmare. Faithful sync is not a nicety here; it is patient safety.
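The verdict rule described above can be sketched in a few lines. This is a minimal illustration, not the simulator's code: the function names are hypothetical, the spec window is assumed to be inclusive at both ends (the source does not say), and the lower limit of 0.0 for BATCH-2026-004's HCP row is assumed to match BATCH-2026-001.

```python
from typing import Iterable, Mapping


def verdict(value: float, spec_low: float, spec_high: float) -> str:
    """Return 'PASS' if value lies inside the spec window, else 'OOS'.

    ASSUMPTION: both limits are inclusive.
    """
    return "PASS" if spec_low <= value <= spec_high else "OOS"


def batch_passes(rows: Iterable[Mapping]) -> bool:
    """A batch passes only if every release test passes.

    An empty result set is treated as a failure: no evidence is not a pass.
    """
    rows = list(rows)
    return bool(rows) and all(
        verdict(r["value"], r["spec_low"], r["spec_high"]) == "PASS" for r in rows
    )


# Three of the BATCH-2026-001 rows shown above.
batch_001 = [
    {"test": "SEC_monomer_pct", "value": 98.611, "spec_low": 95.0, "spec_high": 100.0},
    {"test": "HCP_ng_per_mg", "value": 28.203, "spec_low": 0.0, "spec_high": 100.0},
    {"test": "endotoxin_EU_per_mL", "value": 0.215, "spec_low": 0.0, "spec_high": 5.0},
]

# The failing HCP row for BATCH-2026-004 (spec_low assumed to be 0.0).
batch_004 = [
    {"test": "HCP_ng_per_mg", "value": 128.0, "spec_low": 0.0, "spec_high": 100.0},
]

print(batch_passes(batch_001), batch_passes(batch_004))
```

In the bridge itself the verdict is copied from the LIMS, never recomputed and substituted; a local calculation like this is only useful as a cross-check that the copied verdict agrees with the copied numbers.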

The release decision is a regulated act. Under 21 CFR 211.165 a drug product may not be released until laboratory tests have determined conformance to final specifications [2], and 21 CFR 211.194 defines the complete laboratory record (methods, raw data, who tested, who reviewed) that must stand behind every value on the CofA [3]. The LIMS holds that complete record. Our table holds a true copy of the parts the platform needs to contextualize and visualize. The difference between those two is the spine of this entire chapter.

A faithful, idempotent CofA-in sync

A real commercial LIMS (LabWare, STARLIMS, SampleManager) is proprietary and license-locked; there is no public image to run on a laptop. So, exactly as we did for AVEVA PI in Chapter 17, we point the bridge at a mock that honours the same REST contract, and we are explicit that the bridge code is real while the counterpart is simulated. The mock ships in the companion repo as services/lims-cofa-adapter and comes up behind the commercial Compose profile (docker compose --profile commercial up -d lims-cofa-adapter); the real bridge client is bridges/lims_cofa_adapter.py, and in production only the base URL and credentials change. The adapter exposes a CofA as JSON; this is the contract a real LIMS CofA endpoint returns:

// GET /api/v1/cofa/BATCH-2026-001 (served by services/lims-cofa-adapter; a real LIMS returns the same shape)
{
  "batch_id": "BATCH-2026-001",
  "lot": "L26001",
  "disposition": "released",
  "results": [
    {"test": "SEC_monomer_pct", "value": 98.611, "unit": "%",
     "spec_low": 95.0, "spec_high": 100.0, "result": "PASS",
     "analyst": "j.okafor", "instrument_id": "HPLC-07", "status": "verified",
     "result_ts": "2026-01-20T10:15:00Z"},
    {"test": "HCP_ng_per_mg", "value": 28.203, "unit": "ng/mg",
     "spec_low": 0.0, "spec_high": 100.0, "result": "PASS",
     "analyst": "j.okafor", "instrument_id": "ELISA-02", "status": "verified",
     "result_ts": "2026-01-20T11:02:00Z"}
  ]
}
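Before anything touches the database, the nested CofA document has to be flattened into one parameter set per result. The sketch below shows one way that mapping might look; it is not the repo's adapter code. Two things are assumptions made for illustration: that the release sample is identified as the batch ID plus a "-DS" suffix (borrowed from the sample identifier in the ASM example later in this chapter), and that the LIMS test name doubles as lab.test.test_id. The function name cofa_to_rows is hypothetical.

```python
from datetime import datetime


def cofa_to_rows(cofa: dict) -> list[dict]:
    """Flatten a CofA JSON document into one parameter dict per result row."""
    # ASSUMPTION: the release sample is '<batch_id>-DS'; a real LIMS would
    # normally supply its own sample identifier.
    sample_id = f"{cofa['batch_id']}-DS"
    rows = []
    for r in cofa["results"]:
        rows.append({
            "sample_id": sample_id,
            "test_id": r["test"],  # ASSUMPTION: test name is used as test_id
            "value": r["value"],
            "unit": r["unit"],
            # Parse the trailing 'Z' explicitly so this also works before Python 3.11.
            "result_ts": datetime.fromisoformat(r["result_ts"].replace("Z", "+00:00")),
            "analyst": r["analyst"],
            "instrument_id": r["instrument_id"],
            "status": r["status"],
        })
    return rows


cofa = {
    "batch_id": "BATCH-2026-001",
    "lot": "L26001",
    "disposition": "released",
    "results": [
        {"test": "SEC_monomer_pct", "value": 98.611, "unit": "%",
         "spec_low": 95.0, "spec_high": 100.0, "result": "PASS",
         "analyst": "j.okafor", "instrument_id": "HPLC-07", "status": "verified",
         "result_ts": "2026-01-20T10:15:00Z"},
        {"test": "HCP_ng_per_mg", "value": 28.203, "unit": "ng/mg",
         "spec_low": 0.0, "spec_high": 100.0, "result": "PASS",
         "analyst": "j.okafor", "instrument_id": "ELISA-02", "status": "verified",
         "result_ts": "2026-01-20T11:02:00Z"},
    ],
}

rows = cofa_to_rows(cofa)
for row in rows:
    print(row["sample_id"], row["test_id"], row["value"], row["status"])
```

Note that the mapping copies value, unit, and status verbatim. Any rounding, unit conversion, or status promotion at this step would already make the copy less than true.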

The bridge's job is to land those rows in lab.result idempotently, so the sync can run on a schedule and re-run after any failure without creating duplicates or, worse, second copies that disagree. The pattern is a PostgreSQL INSERT ... ON CONFLICT, keyed on the unique constraint the schema already defines:

-- examples/bridges/lims_cofa_adapter.py: the heart of the idempotent CofA-in upsert
INSERT INTO lab.result
    (sample_id, test_id, value, unit, result_ts, analyst, instrument_id, status)
VALUES
    (%(sample_id)s, %(test_id)s, %(value)s, %(unit)s,
     %(result_ts)s, %(analyst)s, %(instrument_id)s, %(status)s)
ON CONFLICT (sample_id, test_id, result_ts)
DO UPDATE SET
    value   = EXCLUDED.value,
    status  = EXCLUDED.status,
    analyst = EXCLUDED.analyst
WHERE lab.result.status <> 'verified';  -- never overwrite a verified record in place

Read that WHERE clause carefully, because it is the conflict-resolution policy in one line. We let a preliminary row be updated by a fresher value, but we refuse to overwrite a row already marked verified. Why? Because in a GxP world you do not edit a confirmed record; you supersede it, leaving the old one visible. A correction to a verified result must arrive as a new result_ts (a new row), not a quiet in-place change, so that the history of what the lab thought and when is never erased. Chapter 20 will make that append-only discipline structural with system-versioned history tables; here we simply refuse the destructive update.
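The policy is easiest to trust once you have watched it refuse an update. The sketch below replays the same upsert against an in-memory SQLite database, which is a stand-in chosen only so the example runs without a server: SQLite accepts the same ON CONFLICT ... DO UPDATE ... WHERE form, but the table is named lab_result (no schema), the parameters use SQLite's :name style instead of %(name)s, and the column types are simplified. The sample values other than the verified HCP result are made up for the demonstration.

```python
import sqlite3

UPSERT = """
INSERT INTO lab_result
    (sample_id, test_id, value, unit, result_ts, analyst, instrument_id, status)
VALUES
    (:sample_id, :test_id, :value, :unit, :result_ts, :analyst, :instrument_id, :status)
ON CONFLICT (sample_id, test_id, result_ts)
DO UPDATE SET
    value   = excluded.value,
    status  = excluded.status,
    analyst = excluded.analyst
WHERE lab_result.status <> 'verified'
"""


def make_db() -> sqlite3.Connection:
    """Create a simplified stand-in for lab.result with the same unique key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE lab_result (
            result_id     INTEGER PRIMARY KEY,
            sample_id     TEXT NOT NULL,
            test_id       TEXT NOT NULL,
            value         REAL,
            unit          TEXT,
            result_ts     TEXT NOT NULL,
            analyst       TEXT,
            instrument_id TEXT,
            status        TEXT NOT NULL DEFAULT 'preliminary',
            UNIQUE (sample_id, test_id, result_ts)
        )""")
    return conn


def upsert(conn: sqlite3.Connection, row: dict) -> None:
    conn.execute(UPSERT, row)
    conn.commit()


def snapshot(conn: sqlite3.Connection) -> list:
    return conn.execute(
        "SELECT value, status FROM lab_result ORDER BY result_ts, result_id"
    ).fetchall()


base = {
    "sample_id": "BATCH-2026-001-DS", "test_id": "HCP_ng_per_mg",
    "value": 28.0, "unit": "ng/mg", "result_ts": "2026-01-20T11:02:00Z",
    "analyst": "j.okafor", "instrument_id": "ELISA-02", "status": "preliminary",
}

conn = make_db()
upsert(conn, base)
upsert(conn, base)                       # replay after a crash: still one row
after_replay = snapshot(conn)

upsert(conn, {**base, "value": 28.203, "status": "verified"})
after_verify = snapshot(conn)            # preliminary row updated in place

upsert(conn, {**base, "value": 99.0})    # same key, but the row is now verified
after_refused = snapshot(conn)           # unchanged: the WHERE clause blocks it

# A correction arrives with a new result_ts, so it lands as a new row.
upsert(conn, {**base, "value": 27.9, "status": "verified",
              "result_ts": "2026-01-21T09:00:00Z"})
final = snapshot(conn)
print(final)
```

One consequence worth noticing: a refused update is silent. The statement succeeds and changes nothing, so a production bridge would probably want to check the affected-row count and log the refusal rather than let a rejected correction vanish.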

The richest single instrument record gets archived in its original, vendor-neutral form. The simulator also emits one Allotrope Simple Model document, examples/datasets/hplc_titer.asm.json, so the raw HPLC titer measurement survives as a self-describing original (the SDMS role), not just a number stripped of its context:

// examples/datasets/hplc_titer.asm.json (Allotrope Simple Model: the "original" the LIMS value derives from)
{
  "$asm.manifest": "http://purl.allotrope.org/manifests/core/REC/2024/06/manifest.schema",
  "measurement aggregate document": {
    "measurement document": [{
      "sample document": {"batch identifier": "BATCH-2026-001",
                          "sample identifier": "BATCH-2026-001-DS"},
      "device system document": {"device identifier": "HPLC-07",
                                 "model number": "OpenHPLC-1"},
      "measurement identifier": "BATCH-2026-001-titer",
      "protein concentration": {"value": 5.877, "unit": "g/L"},
      "measurement time": "2026-01-20T10:15:00Z"
    }]
  }
}

The CofA value and this ASM document share a batch identifier, an instrument_id/device identifier (HPLC-07), and a timestamp, so the platform can always walk from a released number back to the raw measurement it came from. That traceable thread is what a regulator means by reconstructable.
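Walking that thread can be as simple as matching the three shared keys. The function below is a hypothetical helper, not part of the repo, and it assumes the document layout shown above. It compares timestamps as plain strings, which only works while both sides use the same format; a sturdier version would parse them first.

```python
from typing import Optional


def find_asm_measurement(asm: dict, batch_id: str, instrument_id: str,
                         result_ts: str) -> Optional[dict]:
    """Return the ASM measurement document matching a result's provenance keys."""
    documents = asm["measurement aggregate document"]["measurement document"]
    for doc in documents:
        if (doc["sample document"]["batch identifier"] == batch_id
                and doc["device system document"]["device identifier"] == instrument_id
                and doc["measurement time"] == result_ts):  # string match: an assumption
            return doc
    return None


asm = {
    "measurement aggregate document": {
        "measurement document": [{
            "sample document": {"batch identifier": "BATCH-2026-001",
                                "sample identifier": "BATCH-2026-001-DS"},
            "device system document": {"device identifier": "HPLC-07",
                                       "model number": "OpenHPLC-1"},
            "measurement identifier": "BATCH-2026-001-titer",
            "protein concentration": {"value": 5.877, "unit": "g/L"},
            "measurement time": "2026-01-20T10:15:00Z",
        }]
    }
}

hit = find_asm_measurement(asm, "BATCH-2026-001", "HPLC-07", "2026-01-20T10:15:00Z")
miss = find_asm_measurement(asm, "BATCH-2026-001", "ELISA-02", "2026-01-20T11:02:00Z")
print(hit["measurement identifier"] if hit else None, miss)
```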

The shape of the bridge, end to end

Putting it together, the flow is small and deliberate. The diagram below shows where the decision lives versus where our copy lives.

Diagram showing release-test data flowing from instruments into a commercial LIMS that issues a signed Certificate of Analysis; a one-way verified true-copy sync over REST/JSON lands those results in the OSS lab.result table linked to the ISA-95 batch, while SENAITE and openBIS occupy the QC and process-development slots.

The LIMS adjudicates and signs; the open-source stack keeps a faithful, read-mostly true copy joined to the batch. The arrow of authority points one way. Original diagram by the authors, created with AI assistance.

The single most important property of this picture is the direction of the authority arrow. Data flows into our stack; the decision never flows out of it. The MHRA's data-integrity guidance is blunt about the trap we are avoiding: a copy of a regulated record is acceptable only as a verified true copy, and a parallel record that is treated as authoritative without the original's controls is a shadow record: a finding, not a feature [4]. Our status column, our refusal to overwrite verified rows, and our preservation of the ASM original are precisely the controls that keep the copy honest.
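One way to make "verified" more than a word is a reconciliation pass that compares the local copy with what the LIMS currently reports. The sketch below is an illustration under assumptions, not a control the source prescribes: the function name reconcile is hypothetical, local rows are assumed to have been mapped back to the CofA field names, and only value, unit, and status are compared.

```python
def reconcile(source_results: list, local_rows: list) -> list:
    """List discrepancies between the LIMS results and the local copy."""
    def key(r: dict) -> tuple:
        return (r["test"], r["result_ts"])

    local = {key(r): r for r in local_rows}
    problems = []
    for src in source_results:
        mine = local.get(key(src))
        if mine is None:
            problems.append(f"missing locally: {src['test']} @ {src['result_ts']}")
            continue
        for field in ("value", "unit", "status"):
            if mine[field] != src[field]:
                problems.append(
                    f"{src['test']} {field}: local={mine[field]!r} source={src[field]!r}"
                )
    return problems


source = [
    {"test": "SEC_monomer_pct", "value": 98.611, "unit": "%",
     "status": "verified", "result_ts": "2026-01-20T10:15:00Z"},
    {"test": "HCP_ng_per_mg", "value": 28.203, "unit": "ng/mg",
     "status": "verified", "result_ts": "2026-01-20T11:02:00Z"},
]

faithful = [dict(r) for r in source]
rounded = [dict(source[0]), {**source[1], "value": 28.2}]  # a silently rounded copy
dropped = [dict(source[0])]                                # a silently dropped row

print(reconcile(source, faithful))
print(reconcile(source, rounded))
print(reconcile(source, dropped))
```

A check like this catches exactly the two lossy failures named earlier, a rounded number and a dropped row, but it does not make the copy authoritative. It only gives you evidence that the copy still agrees with the original.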

The honest open-source LIMS landscape

If you do not have a commercial LIMS, can open source fill the slot? Partly, and the honest answer depends on which slot.

SENAITE is the strongest open-source fit for QC and release testing. It is a mature LIMS built on Plone/Zope, covering sample registration through to a published result report, and it speaks REST: senaite.jsonapi exposes create/read/update endpoints so our stack can register samples (as AnalysisRequests) and pull results over plain HTTP/JSON [5] [6]. But be precise about maturity: SENAITE is published under the GPL v2.0 and is not a Part 11-compliant system out of the box. The recurring "GxP last mile" items (a configured tamper-evident audit trail, point-of-signing e-signatures, validated retention controls, a documented password policy, and an IQ/OQ/PQ package) are configuration, procedure, and validation work the operator owns. To teach the shape of that gap, the repo ships an illustrative gap register at examples/compliance/gap-analyses/senaite-part11-gap.md and treats SENAITE as a teaching LIMS. It is an excellent QC backbone; it is not a download that satisfies Part 11.

openBIS, from ETH Zurich, fits the process-development and R&D slot: registering experiments, samples, and datasets with rich metadata. It is programmatically friendly through its V3 API and the pyBIS Python client, so PD samples can be registered and pulled with a few lines of Python [7]. It is Apache-2.0 licensed and the right tool when the question is "what did we try and what happened," not "is this lot releasable."

LabKey deserves a clear-eyed flag because its marketing blurs the line. The LabKey Community Edition is genuinely free, but the features you would actually need for a release LIMS, namely electronic signatures designed to comply with 21 CFR Part 11, are explicitly labelled a Premium feature, available only in the Enterprise Edition [8], and the editions page confirms that compliance capabilities require a paid licence [9]. This is the recurring theme of the whole book in miniature: pure OSS gets you the data model and the workflow; the regulated last mile (signatures, validated audit trail, vendor accountability) is paywalled or hybrid. Saying so plainly is more useful than pretending otherwise.

Why it matters

A bioprocess data platform that cannot answer "did this lot pass?" is a curiosity, not a system. But the release decision is the one fact the platform must never author. Get the bridge wrong in the lossy direction and you hide an OOS result; get it wrong in the authoritative direction and you create a shadow record that a regulator can cite. The schema choices in this chapter (provenance columns, a status lifecycle, a unique key for idempotency, and a conflict policy that supersedes rather than overwrites) are not database trivia. They are the difference between a faithful copy and a liability. They let the platform do what it is good at (contextualize, visualize, analyze the released number against the whole batch) while leaving authorship where it legally belongs.

In the real world

In a real fed-batch CHO + Protein A facility, the LIMS sits at the centre of QC and the OSS platform orbits it. The integration is rarely glamorous: a nightly REST pull or a watched CofA file drop, an upsert keyed on sample and test, and an alert when a batch flips to OOS. The intensified/continuous variant, perfusion with multi-column capture, only raises the stakes, because near-continuous harvest means near-continuous sampling, and the sync cadence shifts from per-batch to per-shift. The pattern does not change; the frequency does.
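The alerting step in that list is the easiest to get subtly wrong, because a naive check re-fires on every sync. A minimal sketch, assuming the bridge keeps the verdicts from the previous pull: compare the two snapshots and report only results that are OOS now and were not before. The function name and the snapshot shape (a dict keyed by batch and test) are hypothetical, and the two snapshots are illustrative rather than taken from the dataset.

```python
def newly_oos(previous: dict, current: dict) -> list:
    """Return (batch_id, test) keys that are OOS now but were not OOS before."""
    return sorted(
        key for key, result in current.items()
        if result == "OOS" and previous.get(key) != "OOS"
    )


# Snapshot after last night's pull: only BATCH-2026-001 had reported.
previous = {("BATCH-2026-001", "HCP_ng_per_mg"): "PASS"}

# Snapshot after tonight's pull: BATCH-2026-004's HCP result has arrived.
current = {
    ("BATCH-2026-001", "HCP_ng_per_mg"): "PASS",
    ("BATCH-2026-004", "HCP_ng_per_mg"): "OOS",
}

alerts = newly_oos(previous, current)
repeat = newly_oos(current, current)  # re-running the same sync raises nothing new
print(alerts, repeat)
```

Moving from per-batch to per-shift cadence changes how often this runs, not what it does, which is the point of the paragraph above.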

NIIMBL, the U.S. public-private Manufacturing Innovation Institute, exists precisely because these integration seams are where biomanufacturing loses time and trust. Its SABRE facility, a pilot-scale current Good Manufacturing Practice (cGMP) facility under construction at the University of Delaware with groundbreaking in April 2024, is a place to prove that a modern data platform can interoperate with the validated LIMS that QC actually runs, rather than replace it. (SABRE is a facility, not a data programme; the data discipline is ours to build.) GAMP 5, second edition, frames the risk picture cleanly: the LIMS that holds the release decision is a higher-risk system of record demanding rigorous assurance, while a read-mostly integration layer is risk-assessed and validated proportionately to its lighter intended use [10]. And the regulatory anchor never moves: wherever the LIMS holds the release record, its electronic records and signatures must meet 21 CFR Part 11, and the OSS layer must not become an uncontrolled parallel record alongside it [11]. Honest verdict: open source gives you a superb QC and PD data model and a clean exchange contract; it does not give you a validated, Part 11-signed release system. That last mile is hybrid, and pretending otherwise is exactly the mistake this book refuses to make.

Key terms

  • LIMS (Laboratory Information Management System): sample-centric software that registers samples, schedules tests, captures results against specs, and drives release; usually the system of record for the release decision.
  • ELN (Electronic Lab Notebook): experiment-centric software recording what a scientist did and why.
  • SDMS (Scientific Data Management System): file-centric archive for raw instrument output kept as the untouched original.
  • CofA (Certificate of Analysis): the regulated document summarizing a lot's release-test results against specifications, with a pass/fail disposition.
  • OOS (Out Of Specification): a result outside its specification window; in the campaign, BATCH-2026-004's HCP value, which fails the lot.
  • True copy / shadow record: a verified faithful copy of a regulated original (acceptable) versus an uncontrolled parallel record treated as authoritative (a data-integrity finding).
  • Idempotent sync: an import that can be re-run safely without duplicating or corrupting rows, here enforced by ON CONFLICT on (sample_id, test_id, result_ts).
  • ASM (Allotrope Simple Model): a vendor-neutral JSON format for analytical measurements, used to keep the raw HPLC original self-describing.
  • cGMP: current Good Manufacturing Practice, the FDA's enforceable quality framework for drug manufacturing.

Where this leads

We have been careful all chapter to supersede rather than overwrite, to mark a record verified, and to keep the original measurement traceable, but so far that discipline has lived in our habits and a single WHERE clause. The next chapter, ALCOA+ by Construction: Integrity in Code, makes it structural: append-only patterns, PostgreSQL trigger-based system-versioned history tables, and a hash-chain over the audit log, with a test suite that asserts the guarantees. We stop promising the copy is faithful and start proving it.