Bridging to Commercial & Open-Source LIMS

📍 Where we are: Part IV · Meeting Reality — our platform now holds the time-series, the batch model, and the bridges to PI and the DCS/MES/ERP; this chapter connects the last system that owns a decision — the laboratory that decides whether a batch may be released — and is honest about who the system of record really is.

The simple version

Think of your data platform as a busy newsroom that prints a daily paper about every batch. It gathers facts from everywhere: the bioreactor, the chromatography skid, the dashboards. But there is one fact it is not allowed to make up: whether the drug passed its release tests. That verdict is written, signed, and locked away in the laboratory's own filing cabinet, the LIMS (Laboratory Information Management System). Our job is not to seize that cabinet. It is to politely ask for a photocopy of the certificate, file it neatly next to everything else, and never let our copy pretend to be the original. When the lab changes its mind (a result is corrected, a batch is failed), our copy must update faithfully, or it becomes a dangerous rumour. This chapter builds that polite, faithful, one-way photocopier.

What this chapter covers

The previous bridges pulled process data from systems that watch the plant. The lab is different: it does not just observe, it adjudicates. Final release testing — the SEC (size-exclusion chromatography, which measures aggregation), CEX (cation-exchange chromatography, which measures charge variants), host-cell-protein, residual Protein A, DNA, and endotoxin assays that say a monoclonal-antibody (mAb) lot is fit to ship — usually lives in a commercial LIMS, and the Certificate of Analysis (CofA) it issues is a regulated record. (What each assay measures and why it gates patient safety — including how the dose-driven endotoxin limit is derived — is the subject of Book 1's Quality control and batch release and Measuring quality and keeping the protein stable; here we treat them as the results that cross the bridge.) We cover:

The vocabulary of lab informatics — LIMS, ELN, SDMS — and where each open-source tool fits.
How the OSS stack already has a place for lab results: the lab.sample / lab.test / lab.result tables, exchanged with a LIMS over REST/JSON and files.
A real CofA-in sync against a mock commercial LIMS, with idempotency and conflict resolution so re-running it is safe.
The honest open-source landscape: SENAITE for QC/release, openBIS for process development, and why LabKey's electronic-signature controls under the US 21 CFR Part 11 rule (Part 11 for short) sit behind a paywall.
The discipline that keeps your copy from becoming a shadow record — the failure mode regulators care about most.

Everything that can run on a laptop runs against tables and a mock you already have. The real LIMS is the one thing we cannot ship, so we mock its API and are explicit about it.

Three letters that are not the same: LIMS, ELN, SDMS

Before wiring anything, get the vocabulary straight, because integration contracts depend on it. ASTM E1578 (a standards-body guide — ASTM International publishes consensus technical standards) for laboratory informatics draws the lines the industry uses [1]. The three systems are not interchangeable; each owns a different shape of record, and an integration that mistakes one for another rots.

A LIMS is sample-centric: it registers a sample, schedules tests, captures results against specifications, and drives the release workflow. Its native object is a sample with results — a row that says "this lot, this test, this value, pass or fail." That is the shape our lab.result table mirrors and the shape a CofA delivers.

An ELN (Electronic Lab Notebook) is experiment-centric: it records what a scientist did and why — the method, the deviation, the reasoning — the narrative a paper notebook used to hold. Its native object is a signed, dated page, not a structured result row; pushing free-text rationale into a LIMS result field, or a numeric verdict into an ELN page, is exactly the category error that breaks integrations.

An SDMS (Scientific Data Management System) is file-centric: it archives the instrument's own output as the untouched original. This is the one most people flatten, so it is worth being precise — the SDMS keeps an original in three layers, and the chapter's whole "true copy" discipline depends on telling them apart:

The unprocessed instrument file — the vendor-specific binary the instrument actually wrote (a Waters MassLynx .raw directory, an ÄKTA/UNICORN .res or zip-of-XML). This is the irreducible original: years later it can be re-integrated with a corrected method, which a derived number never can.
A vendor-neutral export of that raw payload — mzML for a mass spectrum, ASTM ANDI/NetCDF for a chromatogram, or Allotrope's HDF5-based ADF cube for any dense curve — kept alongside the binary so the data survives the software that made it.
The processed result: the scalar the integration software computed from the trace (our titer 5.877 g/L, one of the sample's release scalars), written out as an Allotrope ASM JSON or an AnIML (Analytical Information Markup Language) XML. This is the thin summary the LIMS value derives from, not the original.

The SDMS keeps all three layers; the LIMS records only the top one. Fidelity lives at the bottom — the binary you can re-integrate — while convenience lives at the top — the scalar you can put in a table. The "true copy" this chapter syncs is a copy of the top layer, and knowing that is the whole discipline. Original diagram by the authors, created with AI assistance.

The running case touches all three systems. For release, a LIMS owns the SEC/CEX/endotoxin verdicts. In process development, an ELN captures the rationale behind a media tweak. And the raw HPLC trace behind a titer number belongs in an SDMS-style archive, in all three of its layers. Confusing these is how integrations rot: you cannot push a free-text experiment narrative into a LIMS result field and expect a clean release decision out the other side.

The OSS stack already has a lab schema

We do not need a new database for lab data; Chapter 10 already gave us one. In examples/platform/db/30-lab-events.sql, three tables model the sample-to-result spine that any LIMS exchange has to map onto:

-- examples/platform/db/30-lab-events.sql
CREATE TABLE lab.sample (
    sample_id    text PRIMARY KEY,
    batch_id     text REFERENCES s88.batch,
    sample_time  timestamptz NOT NULL,
    sample_point text NOT NULL,
    sample_type  text NOT NULL DEFAULT 'in_process'   -- in_process | release | stability
);

CREATE TABLE lab.test (
    test_id   text PRIMARY KEY,
    name      text NOT NULL,
    unit      text,
    spec_low  numeric,
    spec_high numeric
);

CREATE TABLE lab.result (
    result_id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    sample_id   text NOT NULL REFERENCES lab.sample,
    test_id     text REFERENCES lab.test,
    value       numeric,
    text_value  text,
    unit        text,
    result_ts   timestamptz NOT NULL DEFAULT now(),
    analyst     text,
    instrument_id text,
    status      text NOT NULL DEFAULT 'preliminary',   -- preliminary | verified | rejected
    UNIQUE (sample_id, test_id, result_ts)
);
CREATE INDEX ON lab.result (sample_id);

Two design choices in this schema are the whole game for a faithful bridge. First, every result carries its provenance — analyst, instrument_id, result_ts — and a status that distinguishes a preliminary value from a verified one. A LIMS release decision hangs on that distinction, so our copy must carry it too, or we will quietly elevate an unconfirmed number to fact. Second, the UNIQUE (sample_id, test_id, result_ts) constraint is our idempotency key: it lets the same result be re-sent any number of times without ever duplicating a row. That single constraint is what turns a fragile one-shot import into a sync you can re-run after a crash.

The link back to manufacturing is lab.sample.batch_id, a foreign key into the ISA-88/95 s88.batch table we seeded in Chapter 4. That is what makes a lab result contextual — a CofA value is not floating data, it is attached to BATCH-2026-001, which ran on unit BR101, made product MAB-001.

What a Certificate of Analysis actually looks like

The simulator already generated a release dataset for the six-batch campaign. Below are representative release tests for the first batch from examples/datasets/hplc_results.csv, exactly the shape a CofA exchange delivers. This is a subset: the real file also carries SEC_LMW_pct, CEX_acidic_pct, CEX_basic_pct, and bioburden_CFU_per_10mL rows between these. The rows shown are impurity, size, and charge tests. A real CofA also carries a potency result (the efficacy-linked CQA; a Critical Quality Attribute is a measurable property that must stay within limits to keep the product safe and effective) and an identity confirmation, because release is not just impurity clearance. Both are deliberately omitted here, since the bridge mechanics are identical for every test type:

batch_id,test,value,unit,spec_low,spec_high,result
BATCH-2026-001,SEC_monomer_pct,98.611,%,95.0,100.0,PASS
BATCH-2026-001,SEC_HMW_pct,1.287,%,0.0,3.0,PASS
BATCH-2026-001,CEX_main_pct,70.686,%,60.0,80.0,PASS
BATCH-2026-001,HCP_ng_per_mg,28.203,ng/mg,0.0,100.0,PASS
BATCH-2026-001,residual_ProteinA_ng_per_mg,1.149,ng/mg,0.0,20.0,PASS
BATCH-2026-001,host_cell_DNA_ng_per_dose,0.939,ng/dose,0.0,10.0,PASS
BATCH-2026-001,endotoxin_EU_per_mL,0.215,EU/mL,0.0,5.0,PASS
# ... (SEC_LMW_pct, CEX_acidic_pct, CEX_basic_pct, bioburden_CFU_per_10mL omitted)

Notice the columns: a value, its unit, the specification window, and a PASS/OOS verdict computed against that window. One subtlety is worth flagging: the endotoxin_EU_per_mL ceiling shown here as 5.0 is product- and dose-specific, not a universal limit. EU is the endotoxin unit, the measure of pyrogenic (fever-causing) contamination, and the limit for a parenteral (injected) drug is dose-driven: the bacterial-endotoxins test in USP General Chapter <85> sets the threshold at 5 EU/kg/hr (endotoxin units allowed per kilogram of patient body weight per hour), and dividing that constant by the maximum dose per kilogram per hour gives the product's limit. The EU/mL number on a real CofA therefore follows from the drug's dosing. Book 1 walks the per-kilogram-per-hour arithmetic (how the units cancel to a per-mass, and then per-mL, limit) in Quality control and batch release. The whole batch passes only if every test passes. Our campaign deliberately includes one failure: BATCH-2026-004 returns HCP_ng_per_mg,128.0 against a spec ceiling of 100.0, flagged OOS (out of specification), which is why its s88.batch.status is rejected, not released. A bridge that silently dropped or rounded that one number would be hiding a failed lot. That is the nightmare. Faithful sync is not a nicety here; it is patient safety.
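The every-test-must-pass rule can be sketched in a few lines of Python. This is a minimal illustration, not the companion repo's code: the function names are hypothetical, the BATCH-2026-001 rows are copied from the excerpt, and the BATCH-2026-004 row is reconstructed from the prose (its unit is assumed to match the same test on the first batch). The sketch reads the verdict column the lab supplied and never recomputes it.

```python
import csv
import io

# Rows copied from the excerpt, plus the BATCH-2026-004 HCP row
# reconstructed from the prose (128.0 against a ceiling of 100.0, flagged OOS).
CSV_TEXT = """\
batch_id,test,value,unit,spec_low,spec_high,result
BATCH-2026-001,SEC_monomer_pct,98.611,%,95.0,100.0,PASS
BATCH-2026-001,HCP_ng_per_mg,28.203,ng/mg,0.0,100.0,PASS
BATCH-2026-001,endotoxin_EU_per_mL,0.215,EU/mL,0.0,5.0,PASS
BATCH-2026-004,HCP_ng_per_mg,128.0,ng/mg,0.0,100.0,OOS
"""


def read_rows(csv_text):
    """Parse CofA rows, keeping every field as the text the lab sent."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def expected_disposition(rows):
    """Map batch_id -> 'released' only if every verdict for that batch is PASS."""
    all_pass = {}
    for row in rows:
        batch = row["batch_id"]
        all_pass[batch] = all_pass.get(batch, True) and row["result"] == "PASS"
    return {b: "released" if ok else "rejected" for b, ok in all_pass.items()}


def oos_rows(rows):
    """Return every non-PASS row untouched, so nothing is dropped or rounded."""
    return [row for row in rows if row["result"] != "PASS"]


rows = read_rows(CSV_TEXT)
print(expected_disposition(rows))
for row in oos_rows(rows):
    print(row["batch_id"], row["test"], row["value"], row["result"])
```

A check like this is best used as a cross-check against the disposition the LIMS reports, raising an alert on disagreement. It should not replace that disposition, because the release decision stays with the lab.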

The release decision is a regulated act. Under 21 CFR 211.165 (the Code of Federal Regulations — US federal law, here the FDA's drug-manufacturing rules) a drug product may not be released until laboratory tests have determined conformance to final specifications [2], and 21 CFR 211.194 defines the complete laboratory record — methods, raw data, who tested, who reviewed — that must stand behind every value on the CofA [3]. The LIMS holds that complete record. Our table holds a true copy of the parts the platform needs to contextualize and visualize. The difference between those two is the spine of this entire chapter.

Anatomy of a CofA result: the fields a release decision rides on

The CSV row is the human-readable view; the real artifact that crosses the bridge is one result object from the JSON the LIMS endpoint serves — the smallest unit on which a release decision turns. The mock adapter (services/lims-cofa-adapter/app.py) assembles it field by field for every test in hplc_results.csv, and a real LIMS returns the same shape. It is worth dissecting one of these objects the way the connectivity chapter dissected a historian reading, because each field is there for a regulatory reason, and each one has a destination column. The diagram below takes the SEC_monomer_pct result for BATCH-2026-001 apart.

One result object, dissected: test, value/unit, the spec_low/spec_high window, the PASS/OOS result, analyst, instrument_id, status, and result_ts — and where each one lands once the bridge copies it. Original diagram by the authors, created with AI assistance.

Walk the fields, because the bridge code (bridges/lims_cofa_adapter.py) treats none of them as decoration:

test (SEC_monomer_pct) names the assay and becomes lab.result.test_id; the bridge first registers it in lab.test so the foreign key resolves.
value + unit (98.611, %) are inseparable — a value without its unit is not a result, which is why the schema carries unit on every row rather than assuming it.
spec_low / spec_high (95.0 … 100.0) are the acceptance window. Note where they go: into lab.test, not lab.result. The specification belongs to the test definition, not to a single measurement, and conflating the two is a classic schema smell.
result (PASS) is the verdict the LIMS computed against that window — PASS or OOS. The bridge does not recompute it; recomputing a regulated verdict in the copy is exactly the kind of authorship the platform must not assume.
analyst (j.okafor) and instrument_id (HPLC-07) are the provenance. instrument_id is the thread that ties this scalar back to the ASM original from the same instrument.
status (verified) is the lifecycle flag — preliminary → verified → rejected — and it is the single field the conflict policy reads before deciding whether the row may be touched.
result_ts (2026-01-20T10:15:00Z) is one third of the idempotency key. It is not just when the result was produced; it is part of the row's identity, which is why a correction must arrive as a new result_ts rather than an edit.

The adapter fills analyst, instrument_id, and status deterministically (a small instrument map keyed by test prefix, so SEC/CEX resolve to HPLC-07, HCP/residual_ProteinA to ELISA-02) precisely so the copy carries the provenance a real LIMS would. It also assigns result_ts deterministically — a small per-test offset off a fixed t0, not a real lab clock-time — so re-runs are byte-stable, which is why the HCP row's stamp differs from the SEC row's by minutes rather than reflecting any physical gap. The bridge's sync_cofa function then derives the release sample once per batch — sample_id = f"{batch_id}-DS", the drug-substance pull (the purified bulk mAb after capture and polish, as opposed to the formulated drug product filled into vials) — and maps the rest one-to-one. That mapping is the true-copy discipline: nothing is invented, nothing is dropped, and the destination column for every field is fixed in advance.
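The field-by-field mapping can be sketched as a single pure function. This is an illustration under stated assumptions, not the body of sync_cofa: map_result is a hypothetical name, reusing the test code as lab.test.name is an assumption, and because the text does not say which column stores the PASS/OOS verdict, the sketch hands it back alongside the rows instead of guessing. Registering the lab.sample row is left out for the same reason (the schema requires sample_time and sample_point, which the CofA object does not carry).

```python
def map_result(batch_id, r):
    """Map one CofA result object to its destination rows, one-to-one.

    Returns (assay_row, result_row, verdict). The verdict is passed through
    untouched; this sketch does not guess which column stores it.
    """
    sample_id = f"{batch_id}-DS"  # the drug-substance release sample
    assay_row = {                 # destined for lab.test
        "test_id": r["test"],
        "name": r["test"],        # assumption: reuse the test code as the name
        "unit": r["unit"],
        "spec_low": r["spec_low"],    # the window belongs to the test definition
        "spec_high": r["spec_high"],
    }
    result_row = {                # destined for lab.result
        "sample_id": sample_id,
        "test_id": r["test"],
        "value": r["value"],
        "unit": r["unit"],
        "result_ts": r["result_ts"],
        "analyst": r["analyst"],
        "instrument_id": r["instrument_id"],
        "status": r["status"],
    }
    return assay_row, result_row, r["result"]


sec = {"test": "SEC_monomer_pct", "value": 98.611, "unit": "%",
       "spec_low": 95.0, "spec_high": 100.0, "result": "PASS",
       "analyst": "j.okafor", "instrument_id": "HPLC-07", "status": "verified",
       "result_ts": "2026-01-20T10:15:00Z"}
assay_row, result_row, verdict = map_result("BATCH-2026-001", sec)
print(result_row["sample_id"], assay_row["spec_low"], verdict)
```

Keeping the mapping free of I/O makes it easy to test that no field is invented or dropped before any database is involved.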

A faithful, idempotent CofA-in sync

A real commercial LIMS (LabWare, STARLIMS, SampleManager) is proprietary and license-locked; there is no public image to run on a laptop. So, exactly as we did for AVEVA PI in Chapter 20, we point the bridge at a mock that honours the same REST contract — and we are explicit that the bridge code is real while the counterpart is simulated. The mock ships in the companion repo as services/lims-cofa-adapter and comes up behind the commercial Compose profile (docker compose --profile commercial up -d lims-cofa-adapter); the real bridge client is bridges/lims_cofa_adapter.py, and in production only the base URL and credentials change. The adapter exposes a CofA as JSON — this is the contract a real LIMS CofA endpoint returns:

// GET /api/v1/cofa/BATCH-2026-001  (served by services/lims-cofa-adapter; a real LIMS returns the same shape)
{
  "batch_id": "BATCH-2026-001",
  "lot": "L26001",
  "disposition": "released",
  "results": [
    {"test": "SEC_monomer_pct", "value": 98.611, "unit": "%",
     "spec_low": 95.0, "spec_high": 100.0, "result": "PASS",
     "analyst": "j.okafor", "instrument_id": "HPLC-07", "status": "verified",
     "result_ts": "2026-01-20T10:15:00Z"},
    {"test": "HCP_ng_per_mg", "value": 28.203, "unit": "ng/mg",
     "spec_low": 0.0, "spec_high": 100.0, "result": "PASS",
     "analyst": "j.okafor", "instrument_id": "ELISA-02", "status": "verified",
     "result_ts": "2026-01-20T10:21:00Z"}
  ]
}
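Before any row is written, it helps to reject a payload that does not honour this contract. The sketch below is a hypothetical validator, not part of the companion repo: parse_cofa and parse_ts are illustrative names, the allowed status values come from the lab.result schema comment, and the allowed verdicts are the PASS/OOS pair the chapter names.

```python
import json
from datetime import datetime, timezone

RESULT_FIELDS = ("test", "value", "unit", "spec_low", "spec_high", "result",
                 "analyst", "instrument_id", "status", "result_ts")
STATUSES = {"preliminary", "verified", "rejected"}  # from the schema comment
VERDICTS = {"PASS", "OOS"}


def parse_ts(text):
    """Parse an ISO-8601 stamp; older Pythons reject a trailing 'Z', so normalise it."""
    ts = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        raise ValueError(f"timestamp has no time zone: {text!r}")
    return ts.astimezone(timezone.utc)


def parse_cofa(payload):
    """Validate a CofA payload and return it with result_ts parsed to datetime."""
    doc = json.loads(payload)
    for key in ("batch_id", "disposition", "results"):
        if key not in doc:
            raise ValueError(f"CofA is missing {key!r}")
    for r in doc["results"]:
        missing = [f for f in RESULT_FIELDS if f not in r]
        if missing:
            raise ValueError(f"{r.get('test', '?')} is missing {missing}")
        if r["status"] not in STATUSES or r["result"] not in VERDICTS:
            raise ValueError(f"{r['test']} has an unknown status or verdict")
        r["result_ts"] = parse_ts(r["result_ts"])
    return doc


PAYLOAD = """{"batch_id": "BATCH-2026-001", "lot": "L26001", "disposition": "released",
 "results": [{"test": "SEC_monomer_pct", "value": 98.611, "unit": "%",
   "spec_low": 95.0, "spec_high": 100.0, "result": "PASS",
   "analyst": "j.okafor", "instrument_id": "HPLC-07", "status": "verified",
   "result_ts": "2026-01-20T10:15:00Z"}]}"""
cofa = parse_cofa(PAYLOAD)
print(cofa["results"][0]["result_ts"].isoformat())
```

Failing loudly on a missing field is deliberate: a result that arrives without its unit, status, or timestamp cannot be filed as a true copy.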

The bridge's job is to land those rows in lab.result idempotently, so the sync can run on a schedule and re-run after any failure without creating duplicates or — worse — second copies that disagree. The pattern is a PostgreSQL INSERT ... ON CONFLICT — an upsert (insert-or-update: insert the row if it is new, otherwise update the existing one) — keyed on the unique constraint the schema already defines:

-- examples/bridges/lims_cofa_adapter.py — the heart of the idempotent CofA-in upsert
INSERT INTO lab.result
    (sample_id, test_id, value, unit, result_ts, analyst, instrument_id, status)
VALUES
    (%(sample_id)s, %(test_id)s, %(value)s, %(unit)s,
     %(result_ts)s, %(analyst)s, %(instrument_id)s, %(status)s)
ON CONFLICT (sample_id, test_id, result_ts)
DO UPDATE SET
    value      = EXCLUDED.value,
    status     = EXCLUDED.status,
    analyst    = EXCLUDED.analyst
WHERE lab.result.status <> 'verified';   -- never overwrite a verified record in place

The idempotent upsert, step by step

The bridge does not just fire that one statement; sync_cofa walks each result through a small, deliberate sequence whose whole purpose is to be safe to repeat. The flow below traces it from the scheduler trigger to the guarantee at the bottom — that re-running the sync produces the same database, every time.

Each step earns its place. The two parent registrations — INSERT ... ON CONFLICT (test_id) DO NOTHING for lab.test and the analogous DO NOTHING for lab.sample — exist so the foreign keys resolve without ever erroring on a re-run: the second time the same batch syncs, those rows already exist and the DO NOTHING makes the call a quiet no-op. Only then does the result row go in, keyed on (sample_id, test_id, result_ts). Because every parent insert is idempotent and the result insert is an ON CONFLICT upsert, the entire sync_cofa is idempotent end to end: run it once or run it ten times after a crashed scheduler, and the row count is identical. That is the property the dashed loop arrow stands for.
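The sequence can be tried on a laptop without a PostgreSQL server. The sketch below is a stand-in, not the repo's sync_cofa: it swaps in SQLite (which accepts the same ON CONFLICT syntax from version 3.24 onward), uses unqualified table names such as lab_result in place of lab.result, trims the tables to the columns it needs, and wraps each CofA in one transaction, which is an assumption of this sketch.

```python
import sqlite3

# SQLite stand-in for the PostgreSQL lab schema, trimmed to what the sketch needs.
DDL = """
CREATE TABLE lab_test (test_id TEXT PRIMARY KEY, name TEXT NOT NULL,
                       unit TEXT, spec_low REAL, spec_high REAL);
CREATE TABLE lab_sample (sample_id TEXT PRIMARY KEY, batch_id TEXT);
CREATE TABLE lab_result (
    sample_id TEXT NOT NULL REFERENCES lab_sample,
    test_id   TEXT NOT NULL REFERENCES lab_test,
    value REAL, unit TEXT, result_ts TEXT NOT NULL,
    analyst TEXT, instrument_id TEXT,
    status TEXT NOT NULL DEFAULT 'preliminary',
    UNIQUE (sample_id, test_id, result_ts));
"""


def sync_cofa(conn, cofa):
    """Register parents with DO NOTHING, then upsert each result."""
    sample_id = f"{cofa['batch_id']}-DS"
    with conn:  # assumption: one transaction per CofA, so a crash leaves no half-sync
        conn.execute(
            "INSERT INTO lab_sample (sample_id, batch_id) VALUES (?, ?) "
            "ON CONFLICT (sample_id) DO NOTHING", (sample_id, cofa["batch_id"]))
        for r in cofa["results"]:
            conn.execute(
                "INSERT INTO lab_test (test_id, name, unit, spec_low, spec_high) "
                "VALUES (:test, :test, :unit, :spec_low, :spec_high) "
                "ON CONFLICT (test_id) DO NOTHING", r)
            conn.execute(
                "INSERT INTO lab_result (sample_id, test_id, value, unit, result_ts, "
                "analyst, instrument_id, status) VALUES (:sample_id, :test, :value, "
                ":unit, :result_ts, :analyst, :instrument_id, :status) "
                "ON CONFLICT (sample_id, test_id, result_ts) DO UPDATE SET "
                "value = excluded.value, status = excluded.status, "
                "analyst = excluded.analyst WHERE lab_result.status <> 'verified'",
                {**r, "sample_id": sample_id})


def row_counts(conn):
    """Return (samples, tests, results) so repeated runs can be compared."""
    return tuple(conn.execute(f"SELECT count(*) FROM {t}").fetchone()[0]
                 for t in ("lab_sample", "lab_test", "lab_result"))


COFA = {"batch_id": "BATCH-2026-001", "results": [
    {"test": "SEC_monomer_pct", "value": 98.611, "unit": "%", "spec_low": 95.0,
     "spec_high": 100.0, "result": "PASS", "analyst": "j.okafor",
     "instrument_id": "HPLC-07", "status": "verified",
     "result_ts": "2026-01-20T10:15:00Z"},
    {"test": "HCP_ng_per_mg", "value": 28.203, "unit": "ng/mg", "spec_low": 0.0,
     "spec_high": 100.0, "result": "PASS", "analyst": "j.okafor",
     "instrument_id": "ELISA-02", "status": "verified",
     "result_ts": "2026-01-20T10:21:00Z"}]}

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
for _ in range(3):          # simulate a scheduler that re-runs after a crash
    sync_cofa(conn, COFA)
counts_after_three_runs = row_counts(conn)
print(counts_after_three_runs)
```

Three runs leave one sample, two tests, and two results, the same as a single run would.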

Reading the conflict-resolution policy line by line

Read that WHERE clause carefully, because it is the conflict-resolution policy in one line, and the three outcomes in the flow above are exactly its three branches. If no row matches the key, the ON CONFLICT does not fire and a fresh row is inserted. If a row matches and it is still preliminary, the WHERE lab.result.status <> 'verified' test is true, so the DO UPDATE refreshes value, status, and analyst to the newer reading. But if the matching row is already verified, the WHERE test is false, the DO UPDATE is silently skipped, and the confirmed row is left exactly as it was. Why refuse it? Because in a GxP (good-practice, e.g. GMP/GLP) world you do not edit a confirmed record — you supersede it, leaving the old one visible. A correction to a verified result must arrive as a new result_ts (a new row), not a quiet in-place change, so that the history of what the lab thought and when is never erased. Chapter 23 will make that append-only discipline structural with system-versioned history tables; here we simply refuse the destructive update.
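The three branches, plus the supersede-by-new-timestamp rule, can be exercised against an in-memory SQLite table. As before this is a stand-in for the PostgreSQL statement, with an unqualified table name; the preliminary value 98.6, the rejected overwrite 97.0, and the corrected reading 98.5 are hypothetical numbers chosen only to make each branch visible.

```python
import sqlite3

UPSERT = """
INSERT INTO lab_result (sample_id, test_id, value, result_ts, analyst, status)
VALUES (:sample_id, :test_id, :value, :result_ts, :analyst, :status)
ON CONFLICT (sample_id, test_id, result_ts) DO UPDATE SET
    value = excluded.value, status = excluded.status, analyst = excluded.analyst
WHERE lab_result.status <> 'verified'
"""


def run_policy_demo():
    """Walk one result through all three branches, then supersede it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE lab_result (
        sample_id TEXT NOT NULL, test_id TEXT NOT NULL, value REAL,
        result_ts TEXT NOT NULL, analyst TEXT, status TEXT NOT NULL,
        UNIQUE (sample_id, test_id, result_ts))""")
    base = {"sample_id": "BATCH-2026-001-DS", "test_id": "SEC_monomer_pct",
            "result_ts": "2026-01-20T10:15:00Z", "analyst": "j.okafor"}
    # Branch 1: no matching key, so a fresh preliminary row is inserted.
    conn.execute(UPSERT, {**base, "value": 98.6, "status": "preliminary"})
    # Branch 2: the key matches a preliminary row, so it is refreshed.
    conn.execute(UPSERT, {**base, "value": 98.611, "status": "verified"})
    # Branch 3: the key matches a verified row, so this write is skipped.
    conn.execute(UPSERT, {**base, "value": 97.0, "status": "preliminary"})
    # A correction supersedes: same sample and test, new result_ts, new row.
    conn.execute(UPSERT, {**base, "value": 98.5, "status": "verified",
                          "result_ts": "2026-01-21T09:00:00Z"})
    return conn.execute("SELECT result_ts, value, status FROM lab_result "
                        "ORDER BY result_ts").fetchall()


history = run_policy_demo()
for row in history:
    print(row)
```

The table ends with two verified rows: the original reading, untouched by the attempted overwrite, and the later correction beside it.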

The richest single instrument record gets archived in a self-describing, vendor-neutral form. The simulator also emits one Allotrope Simple Model (ASM) document as examples/datasets/hplc_titer.asm.json. ASM is the JSON rendering of an Allotrope Data Model, the format that The Analytical Lab: Instruments, LIMS & ELN unpacks layer by layer (the Allotrope stack: AFO ontology, ADM data model, ADF binary cube, ASM JSON). The file holds the HPLC titer alongside the sample's SEC monomer purity, so each measurement survives as a self-describing record with its context (the SDMS role), not just as a number stripped of it:

// examples/datasets/hplc_titer.asm.json (Allotrope Simple Model — the "original" the LIMS value derives from)
{
  "$asm.manifest": "http://purl.allotrope.org/manifests/core/REC/2024/06/manifest.schema",
  "measurement aggregate document": {
    "measurement document": [{
      "sample document": {"batch identifier": "BATCH-2026-001",
                           "sample identifier": "BATCH-2026-001-DS"},
      "device system document": {"device identifier": "HPLC-07",
                                 "model number": "OpenHPLC-1"},
      "measurement identifier": "BATCH-2026-001-titer",
      "protein concentration": {"value": 5.877, "unit": "g/L"},
      "measurement time": "2026-01-20T10:15:00Z"
    }, {
      "sample document": {"batch identifier": "BATCH-2026-001",
                           "sample identifier": "BATCH-2026-001-DS"},
      "device system document": {"device identifier": "HPLC-07",
                                 "model number": "OpenHPLC-1"},
      "measurement identifier": "BATCH-2026-001-sec-monomer",
      "monomer percentage": {"value": 98.611, "unit": "%"},
      "measurement time": "2026-01-20T11:30:00Z"
    }]
  }
}

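The CofA value and this ASM document share a batch identifier and an instrument_id/device identifier (HPLC-07), and both are timestamped, so the platform can always walk from a released number back to the measurement it came from. The stamps themselves are not identical in this synthetic dataset: the CofA's SEC result_ts is 10:15:00Z while the ASM SEC measurement time is 11:30:00Z, because the adapter assigns result_ts from a fixed offset, not from a lab clock. That traceable thread is what a regulator means by reconstructable.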
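The walk from a released number back to its measurement can be sketched as a lookup over the ASM document. The function name trace_to_measurements is hypothetical, and the ASM text below is the sample file trimmed to the fields the lookup reads. The sketch matches on batch identifier and device identifier only; it deliberately ignores timestamps, since in the sample data the CofA SEC result_ts (10:15:00Z) and the ASM SEC measurement time (11:30:00Z) differ.

```python
import json

# The sample ASM document, trimmed to the fields this lookup reads.
ASM_TEXT = """{"measurement aggregate document": {"measurement document": [
  {"sample document": {"batch identifier": "BATCH-2026-001",
                       "sample identifier": "BATCH-2026-001-DS"},
   "device system document": {"device identifier": "HPLC-07"},
   "measurement identifier": "BATCH-2026-001-titer",
   "measurement time": "2026-01-20T10:15:00Z"},
  {"sample document": {"batch identifier": "BATCH-2026-001",
                       "sample identifier": "BATCH-2026-001-DS"},
   "device system document": {"device identifier": "HPLC-07"},
   "measurement identifier": "BATCH-2026-001-sec-monomer",
   "measurement time": "2026-01-20T11:30:00Z"}]}}"""


def trace_to_measurements(asm, batch_id, instrument_id):
    """Return the measurement identifiers recorded for one batch on one instrument."""
    docs = asm["measurement aggregate document"]["measurement document"]
    return [d["measurement identifier"] for d in docs
            if d["sample document"]["batch identifier"] == batch_id
            and d["device system document"]["device identifier"] == instrument_id]


asm = json.loads(ASM_TEXT)
print(trace_to_measurements(asm, "BATCH-2026-001", "HPLC-07"))
print(trace_to_measurements(asm, "BATCH-2026-001", "ELISA-02"))
```

An empty list for ELISA-02 is the honest answer here: the sample file holds only HPLC-07 measurements, so an ELISA result has no archived original to walk back to in this dataset.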

Be honest about one thing, though: this ASM file is a processed original. The number 5.877 g/L is what the integration software computed from a peak (the SEC monomer % beside it from a separate size-exclusion trace); the SDMS's deeper job is to also keep the unprocessed instrument file — the full chromatogram or mass spectrum that the titer was integrated from — so that years later someone can re-integrate it with a corrected method. And that raw file is almost never a tidy CSV: it is heterogeneous, vendor-specific binary (a Waters MassLynx .raw directory, an ÄKTA/UNICORN .res or zip-of-XML — representative industry shapes, not a fixed inventory). You cannot diff it, query it, or trust it to outlive its vendor's software. That is exactly why a vendor-neutral export alongside the binary matters — mzML for mass spectra, the ASTM ANDI/NetCDF (.cdf) chromatography format the downstream-capture chapter (Chapter 13) recommends, or Allotrope's HDF5-based ADF for dense spectra and curves. The SDMS keeps the binary for fidelity and the open export for survival; the ASM JSON above is the thin, processed summary the LIMS value derives from. Those three are the SDMS's three layers again, and they map onto the Allotrope stack precisely: the binary is the vendor's own, the ADF cube (or mzML/NetCDF) is where the dense array belongs, and the ASM JSON here is the scalar rendering of the same Allotrope Data Model — the full AFO/ADM/ADF/ASM picture being the analytical-lab chapter's subject. The bridge in this chapter only ever copies that top, scalar layer; it never pretends to be the cube or the binary beneath it.

The shape of the bridge, end to end

Putting it together, the flow is small and deliberate. The diagram below shows where the decision lives versus where our copy lives.

Diagram showing release-test data flowing from instruments into a commercial LIMS that issues a signed Certificate of Analysis; a one-way verified true-copy sync over REST/JSON lands those results in the OSS lab.result table linked to the ISA-88/95 batch, while SENAITE and openBIS occupy the QC and process-development slots.

The LIMS adjudicates and signs; the open-source stack keeps a faithful, read-mostly true copy joined to the batch. The arrow of authority points one way. Original diagram by the authors, created with AI assistance.

The single most important property of this picture is the direction of the authority arrow. Data flows into our stack; the decision never flows out of it. The MHRA's (the UK drug regulator's) data-integrity guidance is blunt about the trap we are avoiding: a copy of a regulated record is acceptable only as a verified true copy, and a parallel record that is treated as authoritative without the original's controls is a shadow record — a finding, not a feature [4]. Our status column, our refusal to overwrite verified rows, and our preservation of the ASM original are precisely the controls that keep the copy honest.

What the field record shows: when the copy becomes a shadow

The "shadow record is a finding, not a feature" line is not rhetoric; it is the recurring shape of real inspection outcomes. The FDA's Data Integrity and Compliance With Drug CGMP guidance addresses the point directly: mishandled out-of-specification results and reliance on uncontrolled records kept outside the validated system (the classic example is the analyst's private spreadsheet shadowing the LIMS) are among the data-integrity failures the agency cites most, because they let a true value be quietly hidden, overwritten, or contradicted [12]. Two failure modes recur, and the bridge is built to prevent both.

The lossy failure is dropping or softening an OOS result. The campaign's BATCH-2026-004 is the deliberate test case: its HCP_ng_per_mg reads 128.0 against a 100.0 ceiling, so the result is OOS and the lot disposition is rejected. The companion repo asserts this end to end — tests/test_bridges.py::test_lims_mock_propagates_oos_lot checks that the adapter returns disposition == "rejected" and that the HCP result surfaces as result == "OOS" with value == 128.0, never silently rounded to passing. A bridge that hid that one number would be hiding a failed lot.

The authoritative failure is the subtler one, and it is the field-level danger the anatomy card was drawn to expose. If the bridge ever overwrote a verified row in place, editing a confirmed value instead of superseding it with a new result_ts, the platform's copy would diverge from the LIMS original and present itself as authoritative, which is the textbook definition of a shadow record. The status field on every result object, read by the one-line WHERE clause, is the guardrail: a verified row is immutable in the copy, so the copy can only ever lag the original or match it, never silently contradict it. The sibling test test_lims_mock_serves_real_release_dataset pins the agreement the other way: it asserts the mock's SEC_monomer_pct value is 98.611 with status == "verified", matching hplc_results.csv to the digit, so the copy and the "LIMS original" cannot drift apart unnoticed.
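The "lag or match, never contradict" property can also be checked from the outside by a periodic reconciliation. The sketch below is a hypothetical helper, not something the companion repo is stated to ship: reconcile compares the LIMS payload with the rows the platform holds for one release sample and classifies each difference. The copy value 98.6 is an invented tampered reading, used only to trigger the contradiction branch.

```python
def reconcile(lims_results, copy_rows):
    """Classify each LIMS result against the platform copy for one sample.

    'lagging' is tolerable (the next sync closes the gap); 'contradiction'
    means a verified copy row disagrees with the LIMS and needs investigation.
    """
    copy = {(row["test_id"], row["result_ts"]): row for row in copy_rows}
    findings = []
    for r in lims_results:
        key = (r["test"], r["result_ts"])
        mine = copy.get(key)
        if mine is None:
            findings.append((key, "lagging"))        # not synced yet
        elif (mine["value"], mine["status"]) == (r["value"], r["status"]):
            continue                                 # copy matches the original
        elif mine["status"] != "verified":
            findings.append((key, "lagging"))        # preliminary, will be refreshed
        else:
            findings.append((key, "contradiction"))  # verified copy disagrees
    return findings


lims = [
    {"test": "SEC_monomer_pct", "value": 98.611, "status": "verified",
     "result_ts": "2026-01-20T10:15:00Z"},
    {"test": "HCP_ng_per_mg", "value": 28.203, "status": "verified",
     "result_ts": "2026-01-20T10:21:00Z"},
]
copy_rows = [
    {"test_id": "SEC_monomer_pct", "value": 98.6, "status": "verified",
     "result_ts": "2026-01-20T10:15:00Z"},   # hypothetical divergent copy
]
findings = reconcile(lims, copy_rows)
for key, kind in findings:
    print(key, kind)
```

A contradiction is never repaired by editing the copy in place; under the chapter's policy the fix arrives from the LIMS as a superseding result with a new result_ts.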

The honest open-source LIMS landscape

If you do not have a commercial LIMS, can open source fill the slot? Partly — and the honest answer depends on which slot.

SENAITE is the strongest open-source fit for QC and release testing, and it earns the slot because its object model is the LIMS workflow. Built on the Plone/Zope content-management stack, it registers a sample as an AnalysisRequest carrying a set of Analyses, and walks each through the lifecycle that matters (received → submitted → verified → published) with a configurable rule that the verifier may not be the submitter, the four-eyes control this whole layer turns on. It speaks REST: senaite.jsonapi exposes create/read/search/update endpoints, so our stack registers samples as AnalysisRequests and pulls back only the verified analyses over plain HTTP/JSON [5] [6]. But be precise about maturity: SENAITE is published under the GPL v2.0 and is not a Part 11-compliant system out of the box. The recurring "GxP last mile" items are configuration, procedure, and validation work that 21 CFR Part 11 requires and that the operator owns: a configured tamper-evident audit trail (every create/change recorded with who, what, when, and why), e-signatures applied at the moment of signing, validated controls over how long records are retained, a documented password policy, and an IQ/OQ/PQ package (Installation, Operational, and Performance Qualification: the documented proof that the system was installed, runs, and performs to spec). To teach the shape of that gap, the repo ships an illustrative gap register at examples/compliance/gap-analyses/senaite-part11-gap.md and treats SENAITE as a teaching LIMS, not a compliant one out of the box. It is an excellent QC backbone; it is not a download that satisfies Part 11.
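Pulling back only verified analyses comes down to one search request. The sketch below builds that request with the standard library and is hedged throughout: the route prefix and the use of portal_type, review_state, and limit as query parameters reflect senaite.jsonapi's documented conventions as far as we know them, the "items" wrapper in the response is an assumption, and all of it should be checked against the version you run. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Route prefix as documented for senaite.jsonapi; verify it on your install.
API_ROUTE = "@@API/senaite/v1"


def verified_analyses_url(site_url, limit=50):
    """Build a search URL that asks only for analyses in the 'verified' state."""
    query = urlencode({"portal_type": "Analysis",
                       "review_state": "verified",
                       "limit": limit})
    return f"{site_url.rstrip('/')}/{API_ROUTE}/search?{query}"


def extract_items(body):
    """Assumption: the search response wraps its hits in an 'items' list."""
    return json.loads(body).get("items", [])


url = verified_analyses_url("http://localhost:8080/senaite/")
print(url)
print(extract_items('{"count": 0, "items": []}'))
```

Filtering on the server side matters for the true-copy discipline: the platform never sees an unverified analysis, so it cannot accidentally promote one.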

openBIS, from ETH Zurich's Scientific IT Services, fits the process-development and R&D slot — and it is really an ELN-and-LIMS hybrid, which is why it sits in a different place from SENAITE. Its data model is a hierarchy — Space → Project → Experiment/Collection → Object (sample) → DataSet — so a PD scientist registers what they did and attaches the files it produced, with rich, typed metadata on every node. It is programmatically friendly through its V3 API and the pyBIS Python client, so PD samples and datasets can be registered and pulled with a few lines of Python [7]. It is Apache-2.0 licensed and the right tool when the question is "what did we try and what happened," not "is this lot releasable."

LabKey deserves a clear-eyed flag because its marketing blurs the line. It is a capable assay-and-sample-management platform — a Java server over a relational database, with an assay framework and sample-type registry — and the Community Edition is genuinely free and open. But the features you would actually need for a release LIMS — electronic signatures designed to comply with 21 CFR Part 11 — are explicitly labelled a Premium feature, available only in the Enterprise Edition [8], and the editions page confirms that compliance capabilities require a paid licence [9]. This is the recurring theme of the whole book in miniature: pure OSS gets you the data model and the workflow; the regulated last mile — signatures, validated audit trail, vendor accountability — is paywalled or hybrid. Saying so plainly is more useful than pretending otherwise.

Where the copied result goes next: the graph, the gate, and the model

The lab.result row this bridge lands is not a dead end — it is the input two later disciplines build on, and the true-copy discipline is what makes both of them honest rather than hazardous.

As a triple and a shape. The same CofA result that crosses the bridge as a JSON object is also, in Semantics & the Digital Thread, one RDF triple — bp:BATCH-2026-001 bp:monomerPct "98.611"^^xsd:float — hung on the batch node so a single SPARQL walk can ask "what did this lot derive from, and what was its quality result?" in one statement. And the spec_low/spec_high window the bridge files into lab.test is exactly the release rule Book 4 makes executable: its release gate models the very same panel — monomer at or above 95 %, HMW at or below 2 %, CEX-main in its window, HCP at or below 100 ppm, plus a controlled releaseStatus and an attributable signature — as a SHACL (Shapes Constraint Language) bp:ReleaseShape. The mapping is one-to-one and worth seeing as a competency question the vocabulary must answer: does every released lot carry exactly one in-range value for every required CQA? SHACL answers it closed-world, where a missing required result is a failure now, not an open question — which is precisely the lossy failure mode this chapter's status lifecycle exists to prevent. The bridge guarantees the copy is faithful; the SHACL gate then proves it is complete and in range. One caution carries straight over from that chapter: SHACL checks completeness, not correctness — a plausible in-range value filed against the wrong vial passes the gate cleanly, so the bridge's refusal to overwrite a verified row, and the data-integrity discipline it protects, are what keep the copy correct, not just well-formed.
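The closed-world idea can be shown without an RDF toolchain. The sketch below is a plain-Python stand-in for a SHACL shape, not Book 4's bp:ReleaseShape: release_gate is a hypothetical name, and the windows are taken from this chapter's hplc_results.csv excerpt (where the HMW ceiling is 3.0), so they may differ from the limits that shape encodes.

```python
# Required CQAs and their windows, copied from this chapter's CSV excerpt.
REQUIRED_CQAS = {
    "SEC_monomer_pct": (95.0, 100.0),
    "SEC_HMW_pct": (0.0, 3.0),
    "CEX_main_pct": (60.0, 80.0),
    "HCP_ng_per_mg": (0.0, 100.0),
}


def release_gate(results):
    """Closed-world check over (test, value) pairs; an empty list means pass.

    A missing or duplicated required result is a violation now,
    not an open question to resolve later.
    """
    violations = []
    for test, (low, high) in REQUIRED_CQAS.items():
        values = [value for name, value in results if name == test]
        if len(values) != 1:
            violations.append(
                f"{test}: expected exactly one value, found {len(values)}")
        elif not low <= values[0] <= high:
            violations.append(f"{test}: {values[0]} is outside [{low}, {high}]")
    return violations


lot_001 = [("SEC_monomer_pct", 98.611), ("SEC_HMW_pct", 1.287),
           ("CEX_main_pct", 70.686), ("HCP_ng_per_mg", 28.203)]
print(release_gate(lot_001))        # complete and in range
print(release_gate(lot_001[:3]))    # HCP result missing
```

Like the SHACL gate, this checks completeness and range only. It cannot tell whether a plausible value was filed against the wrong vial, which is why the bridge's fidelity still matters.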

As a feature for a model. A governed, status-aware release dataset is also the raw material a release-prediction or soft-sensor model learns from — and the very columns this bridge is careful to copy are the ones that decide whether that model is trustworthy or fantasy. Two disciplines from Book 5 hang directly off them. First, leakage-free validation: the lab.sample.batch_id foreign key is the grouping key a model must split on, because sibling lots off one cell bank are near-twins, not independent rows — a row-wise random train/test split lets a near-twin land on both sides of the line and reports a fantasy score, which is why the models-and-validation chapter makes a grouped, leave-one-batch-out split (scikit-learn's GroupKFold) the default. Second, applicability domain and drift: a model should decline to guess on a lot unlike anything it trained on, and once deployed it must distinguish genuine process drift (the living cells wandering campaign to campaign — a real signal the thread should preserve) from model drift (the predictor going stale against that moving process — a defect to detect), the split the MLOps chapter builds detectors for. None of that works on a shadow record: a model trained on a copy that silently rounded an OOS to passing, or that elevated a preliminary value to fact, has learned a lie. The true-copy status discipline is therefore not just a regulatory nicety — it is the precondition for any model downstream of this table to be honest about what it knows.
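The grouped split is simple enough to write out by hand. The sketch below is a dependency-free stand-in for scikit-learn's grouped splitters (GroupKFold with one fold per batch behaves similarly), not that library's implementation; leave_one_batch_out is a hypothetical name, and the batch labels beyond BATCH-2026-001 are illustrative.

```python
def leave_one_batch_out(batch_ids):
    """Yield (held_out_batch, train_idx, test_idx), holding each batch out once.

    Rows are split by batch, never row by row, so results from the same
    batch can never sit on both sides of the train/test line.
    """
    for held_out in sorted(set(batch_ids)):
        test_idx = [i for i, b in enumerate(batch_ids) if b == held_out]
        train_idx = [i for i, b in enumerate(batch_ids) if b != held_out]
        yield held_out, train_idx, test_idx


# One entry per lab.result row: the batch_id reached via lab.sample.
batches = ["BATCH-2026-001", "BATCH-2026-001", "BATCH-2026-002",
           "BATCH-2026-003", "BATCH-2026-003"]
folds = list(leave_one_batch_out(batches))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Splitting on batch_id only removes leakage between rows of the same batch. Sibling lots from one cell bank are still near-twins across batches, so a stricter grouping key (for example the cell bank) may be needed where that matters.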

Why it matters

A bioprocess data platform that cannot answer "did this lot pass?" is a curiosity, not a system. But the release decision is the one fact the platform must never author. Get the bridge wrong in the lossy direction and you hide an OOS result; get it wrong in the authoritative direction and you create a shadow record that a regulator can cite. The schema choices in this chapter — provenance columns, a status lifecycle, a unique key for idempotency, and a conflict policy that supersedes rather than overwrites — are not database trivia. They are the difference between a faithful copy and a liability. They let the platform do what it is good at (contextualize, visualize, analyze the released number against the whole batch) while leaving authorship where it legally belongs.

In the real world

In a real fed-batch CHO + Protein A facility (a typical antibody plant: CHO = Chinese-hamster-ovary cells grown in fed-batch mode, with a Protein A capture step), the LIMS sits at the centre of QC and the OSS platform orbits it. The integration is rarely glamorous: a nightly REST pull or a watched CofA file drop, an upsert keyed on sample, test, and result timestamp, and an alert when a batch flips to OOS. The intensified/continuous variant (perfusion with multi-column capture) only raises the stakes, because near-continuous harvest means near-continuous sampling, and the sync cadence shifts from per-batch to per-shift. More than the sampling frequency shifts: under continuous capture a "lot" is itself redefined as a pooled-harvest time window rather than a discrete tank, so what intensification really demands of the bridge is a changed definition of a lot, not just a faster sync. The upsert pattern stays the same; the cadence and the lot definition change.
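The "alert when a batch flips to OOS" step amounts to comparing the dispositions seen at the previous sync with those seen now. The sketch below assumes each sync produces a simple batch-to-status mapping. The function name `newly_oos` and the status strings other than "OOS" are hypothetical, not the platform's actual vocabulary.

```python
from typing import Dict, List


def newly_oos(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return batches that are OOS now but were not OOS at the last sync.

    A batch seen for the first time and already OOS also counts, since the
    platform has never alerted on it before.
    """
    return sorted(
        batch
        for batch, status in current.items()
        if status == "OOS" and previous.get(batch) != "OOS"
    )


# Illustrative snapshots from two consecutive syncs.
last_sync = {"BATCH-2026-001": "pass", "BATCH-2026-004": "pending"}
this_sync = {"BATCH-2026-001": "pass", "BATCH-2026-004": "OOS"}

for batch in newly_oos(last_sync, this_sync):
    print(f"ALERT: {batch} flipped to OOS in the LIMS")
```

Re-running the comparison against an unchanged snapshot returns nothing, so the alert is as safe to repeat as the sync that feeds it.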

These integration seams are where biomanufacturing loses time and trust, and the goal is to prove that a modern data platform can interoperate with the validated LIMS that QC actually runs, rather than replace it. GAMP 5 (Good Automated Manufacturing Practice, ISPE's industry guidance on validating computerized systems), second edition, frames the risk picture cleanly: the LIMS that holds the release decision is a higher-risk system of record demanding rigorous assurance, while a read-mostly integration layer is risk-assessed and validated proportionately to its lighter intended use [10]. And the regulatory anchor never moves: wherever the LIMS holds the release record, its electronic records and signatures must meet 21 CFR Part 11, and the OSS layer must not become an uncontrolled parallel record alongside it [11]. Honest verdict: open source gives you a superb QC and PD data model and a clean exchange contract; it does not give you a validated, Part 11-signed release system. That last mile is hybrid, and pretending otherwise is exactly the mistake this book refuses to make.

Key terms

LIMS (Laboratory Information Management System): sample-centric software that registers samples, schedules tests, captures results against specs, and drives release; usually the system of record for the release decision.
ELN (Electronic Lab Notebook): experiment-centric software recording what a scientist did and why.
SDMS (Scientific Data Management System): file-centric archive for raw instrument output kept as the untouched original — both the unprocessed vendor binary (e.g. Waters .raw, ÄKTA/UNICORN .res) for later reprocessing and a vendor-neutral export (mzML, ANDI/NetCDF, ADF) for survival, distinct from the processed number a LIMS records.
CofA (Certificate of Analysis): the regulated document summarizing a lot's release-test results against specifications, with a pass/fail disposition.
OOS (Out Of Specification): a result outside its specification window; the campaign's example is BATCH-2026-004's HCP value, which fails the lot.
True copy / shadow record: a verified faithful copy of a regulated original (acceptable) versus an uncontrolled parallel record treated as authoritative (a data-integrity finding).
Idempotent sync: an import that can be re-run safely without duplicating or corrupting rows, here enforced by ON CONFLICT on (sample_id, test_id, result_ts).
Idempotency key: the column tuple that defines a row's identity for re-sends — here (sample_id, test_id, result_ts); because result_ts is part of it, a correction is a new row, not an edit of the old one.
Conflict-resolution policy: the rule an upsert applies when a row already exists — here the one-line WHERE lab.result.status <> 'verified', which lets a preliminary row be refreshed but refuses to overwrite a verified one in place.
ASM (Allotrope Simple Model): the JSON rendering of an Allotrope Data Model — a vendor-neutral format for analytical measurements (unpacked in full, alongside AFO/ADM/ADF, in the analytical-lab chapter) — here keeping the processed HPLC original self-describing. It is the top of the SDMS's three layers, not the binary beneath it.
cGMP: current Good Manufacturing Practice, the FDA's enforceable quality framework for drug manufacturing.
GxP: the umbrella for the "good practice" regulations governing regulated manufacturing and labs (GMP, GLP, GCP, etc.); cGMP is the manufacturing member of this family.
CQA (Critical Quality Attribute): a measurable product property (e.g. potency, aggregation, endotoxin) that must stay within limits to keep the product safe and effective, which is why it is release-critical.
Upsert: an insert-or-update operation — insert the row if it is new, otherwise update the existing one; here implemented as PostgreSQL INSERT ... ON CONFLICT.
SHACL (Shapes Constraint Language): the W3C standard for validating an RDF graph against shape constraints; the release spec this bridge copies becomes a bp:ReleaseShape that checks each lot's CQA panel is present, in range, and signed — closed-world, so a missing required result is a failure, not an open question. It checks completeness, not correctness.
Grouped (leave-one-batch-out) split: the validation discipline of putting every record from a batch wholly on one side of the train/test line, keyed on lab.sample.batch_id, so a model is scored on genuinely unseen lots rather than on near-twin siblings off the same cell bank.
Process drift vs. model drift: process drift is the living cells genuinely wandering campaign to campaign (a real signal to preserve); model drift is a deployed predictor going stale against that moving process (a defect to detect) — conflating them is how monitoring either cries wolf or misses a real shift.

Where this leads

We have been careful all chapter to supersede rather than overwrite, to mark a record verified, and to keep the original measurement traceable — but so far that discipline has lived in our habits and a single WHERE clause. The next chapter, ALCOA+ by Construction: Integrity in Code, makes it structural: append-only patterns, PostgreSQL trigger-based system-versioned history tables, and a hash-chain over the audit log, with a test suite that asserts the guarantees. We stop promising the copy is faithful and start proving it.
