The Analytical Lab: Instruments, LIMS & ELN
Where we are: Part II, "Capturing the Process." We have captured everything the production floor produces: every in-line tag, every chromatography phase, every pooling decision. Now we leave the floor for the QC lab, where the molecule's quality is finally judged, and learn to capture the data that decides whether a batch is released or rejected.
The bioreactor and the skids are like a kitchen: they tell you the oven was at 180 °C and the timer ran for 40 minutes. The analytical lab is the food critic. It does not care what the oven said; it tastes the cake and writes a verdict. That verdict (Is it pure? Is it the right antibody? Is it safe to inject?) is the highest-stakes data in the whole plant, because it is the data that lets you ship a medicine. So the lab's job, and this chapter's, is to capture each verdict with an iron-clad answer to one question: who measured this, on which instrument, against which spec, and can we prove nobody quietly changed it afterward?
What this chapter covers
The production floor measures the process; the QC lab measures the product. After Protein A capture, a sample of drug substance goes to the lab, where a battery of instruments answers the release questions: how pure (size-exclusion HPLC), how correctly charged (cation-exchange HPLC), how much host-cell protein and DNA contaminate it, how much endotoxin. These results, plus the daily at-line samples taken throughout the run, are the certificate of analysis, the thing an inspector reads first.
This chapter shows how to capture that data in open source:
- The deterministic offline / at-line assays and the HPLC release panel the simulator produces, and the lab.sample / lab.test / lab.result model that holds them.
- Getting data off the instruments: OPC UA LADS device servers, SiLA 2, and the vendor-neutral analytical formats AnIML and Allotrope ASM/AFO.
- An open-source LIMS (SENAITE) for sample login, worksheets, and verified results, and an ELN (eLabFTW) for method and experiment provenance with cryptographic e-signatures.
- Pulling verified results back into the batch record, and being brutally honest about the Part 11 gaps that mean none of these tools is compliant out of the box.
Every number below comes from a file you can regenerate byte-for-byte with SIM_SEED=2026.
Two kinds of lab data: at-line and release
The lab produces two distinct streams, and they have different rhythms.
The first is at-line / offline process monitoring: twice a day, an operator pulls a few millilitres from the bioreactor and runs it through a cell counter, a metabolite analyzer, and an osmometer. These tell you how the culture is doing right now: viable cell density, viability, glucose, lactate, ammonia. They are the offline twins of the in-line tags, and Chapter 8's whole job was reconciling the two. The companion repo generates them from the same kinetic state the in-line trace comes from, so a bench number agrees with the online curve, just noisier and sparser.
From examples/sim/bioproc_sim/offline_assays.py, the sampling cadence and the measurement model are explicit:
# examples/sim/bioproc_sim/offline_assays.py
def sample(result: BatchResult | None = None, batch_id: str = "BATCH-2026-001") -> pd.DataFrame:
    """Two offline samples per day from the fed-batch state, with assay noise + LoD."""
    if result is None:
        result = simulate(batch_id)
    s = result.state
    rng = stream_rng("offline_assays", result.batch_id)
    minutes = []
    day = 0.0
    while day <= 14.0 + 1e-9:
        for frac in (0.25, 0.75):  # ~06:00 and ~18:00
            m = int(round((day + frac) * 1440))
            if m < len(s):
                minutes.append(m)
        day += 1.0
Twice a day (around 06:00 and 18:00) over a 14-day fed batch gives 28 in-process samples. Each value is the true kinetic state plus a small, assay-specific noise term: a VCD read is drawn as Xv × (1 + N(0, 0.05)) and viability as state + N(0, 1.2). That is exactly how a bench instrument differs from a sensor: same truth, a little measurement scatter on top.
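As a rough check on that arithmetic, the sketch below re-derives the sampling minutes and applies the two noise models with only the standard library. It assumes the state table holds one row per minute for the 14-day run (14 × 1440 + 1 rows); the function names are illustrative and are not taken from the repo.

```python
import random

MINUTES_PER_DAY = 1440


def sample_minutes(run_days: int = 14, fractions=(0.25, 0.75)) -> list[int]:
    """Minute offsets of the twice-daily pulls that fall inside the run."""
    # Assumption: one state row per minute, with t = 0 included.
    n_rows = run_days * MINUTES_PER_DAY + 1
    minutes = []
    for day in range(run_days + 1):
        for frac in fractions:
            m = int(round((day + frac) * MINUTES_PER_DAY))
            if m < n_rows:  # pulls past the end of the run are dropped
                minutes.append(m)
    return minutes


def noisy_vcd(true_vcd: float, rng: random.Random) -> float:
    """Multiplicative assay noise: Xv * (1 + N(0, 0.05))."""
    return true_vcd * (1.0 + rng.gauss(0.0, 0.05))


def noisy_viability(true_pct: float, rng: random.Random) -> float:
    """Additive assay noise: state + N(0, 1.2)."""
    return true_pct + rng.gauss(0.0, 1.2)


minutes = sample_minutes()
print(len(minutes))  # 28

rng = random.Random(2026)  # any fixed seed makes the draw repeatable
bench_vcd = noisy_vcd(0.35, rng)
bench_viability = noisy_viability(97.0, rng)
```

The day-14 pulls at 06:00 and 18:00 would land after the last state row, which is why 15 loop iterations still yield 28 samples rather than 30.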
Run python -m bioproc_sim.offline_assays and the first committed rows of datasets/offline_assays.csv look like this (a wide, tidy table, one row per sample):
sample_id,batch_id,sample_time,sample_point,VCD_e6_per_mL,viability_pct,glucose_g_L,lactate_g_L,glutamine_mM,ammonia_mM,osmolality_mOsm_kg,titer_g_L,pH_offline
BATCH-2026-001-OFF-001,BATCH-2026-001,2026-01-05 06:00:00+00:00,BR101,0.34,96.6,6.18,0.13,4.13,0.68,293,0.002,7.06
BATCH-2026-001-OFF-002,BATCH-2026-001,2026-01-05 18:00:00+00:00,BR101,0.43,96.6,6.26,0.19,4.31,0.38,292,0.008,7.04
BATCH-2026-001-OFF-003,BATCH-2026-001,2026-01-06 06:00:00+00:00,BR101,0.56,99.0,6.01,0.32,3.83,0.45,287,0.014,7.05
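A wide table is convenient for plotting, but the lab tables introduced later in this chapter store one row per sample and one row per measurement. The sketch below shows one way that unpivot could look, using two of the rows above. Treating each measurement column name as the test identifier is an assumption made for illustration, and `wide_to_long` is a hypothetical helper, not a function from the repo.

```python
import csv
import io

WIDE_CSV = """sample_id,batch_id,sample_time,sample_point,VCD_e6_per_mL,viability_pct,glucose_g_L,lactate_g_L,glutamine_mM,ammonia_mM,osmolality_mOsm_kg,titer_g_L,pH_offline
BATCH-2026-001-OFF-001,BATCH-2026-001,2026-01-05 06:00:00+00:00,BR101,0.34,96.6,6.18,0.13,4.13,0.68,293,0.002,7.06
BATCH-2026-001-OFF-002,BATCH-2026-001,2026-01-05 18:00:00+00:00,BR101,0.43,96.6,6.26,0.19,4.31,0.38,292,0.008,7.04
"""

# Columns that describe the sample itself rather than a measurement.
ID_COLUMNS = ("sample_id", "batch_id", "sample_time", "sample_point")


def wide_to_long(text: str):
    """Split wide rows into sample records and per-test result records."""
    samples, results = [], []
    for row in csv.DictReader(io.StringIO(text)):
        samples.append({key: row[key] for key in ID_COLUMNS})
        for column, raw in row.items():
            if column in ID_COLUMNS:
                continue
            results.append({
                "sample_id": row["sample_id"],
                "test_id": column,  # assumption: column name doubles as test id
                "value": float(raw),
            })
    return samples, results


samples, results = wide_to_long(WIDE_CSV)
print(len(samples), len(results))  # 2 18
```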
The second stream is release testing: once the drug substance exists, the QC lab runs the panel that decides whether it can be released. This is the high-stakes data. From the same module, the release specs are coded as a table of (name, low, high, unit, target, sd):
# examples/sim/bioproc_sim/offline_assays.py
_RELEASE_SPECS = [
    ("SEC_monomer_pct", 95.0, 100.0, "%", 98.5, 0.4),
    ("SEC_HMW_pct", 0.0, 3.0, "%", 1.1, 0.3),
    ("CEX_main_pct", 60.0, 80.0, "%", 70.0, 2.0),
    ("HCP_ng_per_mg", 0.0, 100.0, "ng/mg", 22.0, 8.0),
    ("residual_ProteinA_ng_per_mg", 0.0, 20.0, "ng/mg", 4.0, 1.5),
    ("host_cell_DNA_ng_per_dose", 0.0, 10.0, "ng/dose", 1.2, 0.5),
    ("endotoxin_EU_per_mL", 0.0, 5.0, "EU/mL", 0.3, 0.15),
    # ... bioburden, SEC_LMW, CEX_acidic, CEX_basic
]
Each test draws a value around its target and flags PASS or OOS (out of specification) against the limits:
# examples/sim/bioproc_sim/offline_assays.py
val = target + (rng.normal(0, sd) if sd > 0 else 0.0)
val = float(np.clip(val, low, high))
rows.append({
    "batch_id": bid, "test": name, "value": round(val, 3), "unit": unit,
    "spec_low": low, "spec_high": high,
    "result": "PASS" if low <= val <= high else "OOS",
})
Eleven tests per batch, six batches in the golden campaign: 66 rows in all. Look at datasets/hplc_results.csv and the simulator has planted exactly one deliberate failure, the kind of thing the rest of the trilogy's governance machinery exists to catch:
batch_id,test,value,unit,spec_low,spec_high,result
BATCH-2026-001,SEC_monomer_pct,98.611,%,95.0,100.0,PASS
BATCH-2026-001,HCP_ng_per_mg,28.203,ng/mg,0.0,100.0,PASS
...
BATCH-2026-004,HCP_ng_per_mg,128.0,ng/mg,0.0,100.0,OOS
BATCH-2026-004 has a host-cell-protein result of 128 ng/mg against a 100 ng/mg limit, a single number that should freeze that batch and open an investigation. The whole reason we are so careful about how this number is stored is that, when it is an OOS, it must be tamper-evident, attributable, and impossible to quietly "fix." FDA's data-integrity guidance is explicit that QC release data carries the strongest audit-trail and quality-unit-review expectations [12], and 21 CFR Part 11 sets the electronic-record and electronic-signature bar those results must meet [13].
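The spec check itself is small enough to sketch in isolation. The version below is a minimal illustration, not the repo's code: `Spec`, `disposition`, and `batch_is_on_hold` are hypothetical names, and only the HCP limits and the two HCP values are taken from the tables above. One thing worth noticing in the simulator excerpt: the drawn value is clipped into [low, high] before the comparison, so a random draw cannot come out OOS on its own. The planted 128.0 presumably enters through a separate path that the excerpt does not show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    name: str
    low: float
    high: float
    unit: str


HCP = Spec("HCP_ng_per_mg", 0.0, 100.0, "ng/mg")


def disposition(value: float, spec: Spec) -> str:
    """PASS when the value sits inside the inclusive limits, else OOS."""
    return "PASS" if spec.low <= value <= spec.high else "OOS"


def batch_is_on_hold(results: list[tuple[float, Spec]]) -> bool:
    """A single OOS result is enough to hold the whole batch."""
    return any(disposition(value, spec) == "OOS" for value, spec in results)


print(disposition(28.203, HCP))  # PASS
print(disposition(128.0, HCP))   # OOS
print(batch_is_on_hold([(128.0, HCP)]))  # True
```

Keeping the comparison on the raw, unclipped value is the design point: an out-of-range number has to stay visible for the OOS flag to mean anything.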
The lab data model: sample → test → result
All of this lands in three tables that are reused by every later chapter. From examples/platform/db/30-lab-events.sql:
-- examples/platform/db/30-lab-events.sql
CREATE TABLE lab.sample (
    sample_id    text PRIMARY KEY,
    batch_id     text REFERENCES s88.batch,
    sample_time  timestamptz NOT NULL,
    sample_point text NOT NULL,
    sample_type  text NOT NULL DEFAULT 'in_process'  -- in_process | release | stability
);
CREATE TABLE lab.test (
    test_id   text PRIMARY KEY,
    name      text NOT NULL,
    unit      text,
    spec_low  numeric,
    spec_high numeric
);
CREATE TABLE lab.result (
    result_id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    sample_id     text NOT NULL REFERENCES lab.sample,
    test_id       text REFERENCES lab.test,
    value         numeric,
    text_value    text,
    unit          text,
    result_ts     timestamptz NOT NULL DEFAULT now(),
    analyst       text,
    instrument_id text,
    status        text NOT NULL DEFAULT 'preliminary',  -- preliminary | verified | rejected
    UNIQUE (sample_id, test_id, result_ts)
);
CREATE INDEX ON lab.result (sample_id);
Three columns carry most of the regulatory weight. sample.batch_id is a foreign key straight into the S88 batch table (s88.batch), so every result is permanently bonded to the lot it judges: the attributability backbone. result.analyst and result.instrument_id answer "who and on what." And result.status encodes the lab workflow itself: a result is born preliminary, becomes verified when a second qualified person reviews it (the four-eyes principle), and can be rejected. A preliminary result is not release data; only a verified one is. The UNIQUE (sample_id, test_id, result_ts) constraint supports the append-only convention: a re-test is a new row with a new timestamp, never an edit, which is how the audit trail stays honest. The constraint alone does not block an UPDATE, so in practice it is paired with restricted write privileges or an audit trigger.
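The append-only behaviour can be tried out without a PostgreSQL instance. The sketch below uses an in-memory SQLite table with a cut-down version of lab.result; the sample id, re-test value, timestamps, and analyst names are made up for illustration, and SQLite's types stand in for the PostgreSQL ones.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE result (
    result_id INTEGER PRIMARY KEY AUTOINCREMENT,
    sample_id TEXT NOT NULL,
    test_id   TEXT NOT NULL,
    value     REAL,
    result_ts TEXT NOT NULL,
    analyst   TEXT,
    status    TEXT NOT NULL DEFAULT 'preliminary',
    UNIQUE (sample_id, test_id, result_ts)
);
""")


def record(sample_id, test_id, value, ts, analyst, status="preliminary"):
    """Insert one result row; existing rows are never updated."""
    con.execute(
        "INSERT INTO result (sample_id, test_id, value, result_ts, analyst, status) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (sample_id, test_id, value, ts, analyst, status),
    )


# Original measurement, then a re-test: a second row with a later timestamp.
record("BATCH-2026-004-DS", "HCP_ng_per_mg", 128.0, "2026-01-20T10:15:00Z", "analyst_a")
record("BATCH-2026-004-DS", "HCP_ng_per_mg", 127.4, "2026-01-21T09:00:00Z", "analyst_b")

# Re-using the original timestamp to slip in a "better" value is rejected.
try:
    record("BATCH-2026-004-DS", "HCP_ng_per_mg", 95.0, "2026-01-20T10:15:00Z", "analyst_a")
    duplicate_blocked = False
except sqlite3.IntegrityError:
    duplicate_blocked = True

history = con.execute(
    "SELECT value, result_ts FROM result ORDER BY result_ts"
).fetchall()
releasable = con.execute(
    "SELECT COUNT(*) FROM result WHERE status = 'verified'"
).fetchone()[0]
print(duplicate_blocked, len(history), releasable)  # True 2 0
```

Both measurements stay in the history, and nothing counts as release data until a row carries the verified status.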
From instrument to batch record. Devices speak LADS / SiLA / AnIML / ASM; SENAITE owns sample login and the preliminary→verified transition; eLabFTW signs and timestamps the method record; only verified results cross into PostgreSQL and the batch record. The red notes mark where pure OSS does not yet meet Part 11.
Original diagram by the authors, created with AI assistance.
Getting data off the instruments
The hard, unglamorous truth of lab integration is that most analytical instruments are islands. The HPLC has its own chromatography data system; the plate reader exports a proprietary blob; the cell counter prints a PDF. The whole point of the standards below is to stop re-typing numbers.
The newest and most promising is OPC UA LADS, the Laboratory and Analytical Device Standard (OPC 30500), published in 2023 by the OPC Foundation with SPECTARIS and VDMA. It gives lab instruments the same self-describing OPC UA address space the bioreactor skid uses, with a device-type-agnostic information model split into a Hardware view and a Functional view, so a titrator and an HPLC expose results through the same shape [1]. The peer-reviewed design rationale is worth reading: LADS exists precisely because networked labs needed one model instead of dozens of drivers [2]. A LADS server can attach results as Allotrope documents, and there is an older, lighter-weight sibling: SiLA 2, for commanding lab devices (start a run, read a result) [6], a self-describing, gRPC/HTTP2-based standard whose Feature Definition Language lets a client discover a device's capabilities at runtime [7]. In practice LADS and SiLA are complementary: SiLA to drive the instrument, LADS/OPC UA to publish into the plant data fabric.
A minimal LADS-style result node, as the repo's illustrative examples/ingest/lads_server.js sketches it, looks like this. The file is committed as a teaching sketch, modelled on the LADS information model; it is not a runnable, certified LADS server (the repo ships the asyncua OPC UA bioreactor server for Chapters 5/7 as its real OPC UA stack):
// examples/ingest/lads_server.js: illustrative LADS-shaped result node (not a certified LADS server)
const fnSet = addObject(device, "FunctionalUnitSet");
const hplc = addFunctionalUnit(fnSet, "HPLC_Titer");
addAnalogResult(hplc, {
  name: "ProteinConcentration",
  value: 5.877, unit: "g/L",  // QUDT-mapped engineering unit
  sampleId: "BATCH-2026-001-DS",
  method: "SOP-AT-HPLC-001",
  measuredAt: "2026-01-20T10:15:00Z"
});
Vendor-neutral result files: AnIML and Allotrope
Whether or not an instrument speaks LADS, you still want to archive its output in a format you can read in twenty years without the original vendor's software. Two open standards do this, and the repo ships one example of each for the same HPLC titer measurement.
AnIML (Analytical Information Markup Language) is the older, ASTM-governed (subcommittee E13.15) XML format. Its design goal from the start was a vendor-neutral document with explicit sample, method, audit-trail, and signature sections [4], built as a generic core plus per-technique definitions [5]. The committed datasets/hplc_titer.animl.xml is a deliberately minimal but valid-shaped example:
<!-- examples/datasets/hplc_titer.animl.xml -->
<AnIML xmlns="urn:org:astm:animl:schema:core:draft:0.90" version="0.90">
  <SampleSet>
    <Sample sampleID="BATCH-2026-001-DS" name="Drug Substance"/>
  </SampleSet>
  <ExperimentStepSet>
    <ExperimentStep experimentStepID="titer-hplc" name="Protein A HPLC titer">
      <Result name="Titer">
        <SeriesSet name="titer" length="1">
          <Series name="concentration" dependency="dependent" seriesID="c" seriesType="Float32">
            <IndividualValueSet><F>5.877</F></IndividualValueSet>
            <Unit label="g/L"/>
          </Series>
        </SeriesSet>
      </Result>
      <Method name="SOP-AT-HPLC-001"/>
    </ExperimentStep>
  </ExperimentStepSet>
</AnIML>
The newer, JSON-native option is the Allotrope Simple Model (ASM), a JSON representation of an Allotrope Data Model that uses the Allotrope Foundation Ontology (AFO) controlled vocabulary, so the meaning of each field is machine-actionable, not just its name [3]. The same titer measurement as datasets/hplc_titer.asm.json:
{
  "$asm.manifest": "http://purl.allotrope.org/manifests/core/REC/2024/06/manifest.schema",
  "measurement aggregate document": {
    "measurement document": [
      {
        "sample document": {
          "batch identifier": "BATCH-2026-001",
          "sample identifier": "BATCH-2026-001-DS"
        },
        "device system document": {
          "device identifier": "HPLC-07",
          "model number": "OpenHPLC-1"
        },
        "measurement identifier": "BATCH-2026-001-titer",
        "protein concentration": { "value": 5.877, "unit": "g/L" },
        "measurement time": "2026-01-20T10:15:00Z"
      }
    ]
  }
}
Notice both files agree on 5.877 g/L for sample BATCH-2026-001-DS; that is the point. The titer the lab measures is, within assay noise, the eluate titer the chromatography chapter computed. The standard you choose is mostly a question of which downstream tool you feed: AnIML for archival and ASTM-aligned regulatory packages, ASM for FAIR data lakes and ontology-driven querying. Either way, the field-level units map to QUDT so the knowledge graph in Chapter 16 can reason over them.
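That agreement is easy to check mechanically. The sketch below parses trimmed copies of the two documents with the standard library and compares sample, value, and unit. The embedded strings keep only the fields being compared, and `read_animl` / `read_asm` are hypothetical helpers, not part of the repo or of any AnIML or Allotrope toolkit.

```python
import json
import xml.etree.ElementTree as ET

ANIML = """<AnIML xmlns="urn:org:astm:animl:schema:core:draft:0.90" version="0.90">
  <SampleSet><Sample sampleID="BATCH-2026-001-DS" name="Drug Substance"/></SampleSet>
  <ExperimentStepSet>
    <ExperimentStep experimentStepID="titer-hplc" name="Protein A HPLC titer">
      <Result name="Titer">
        <SeriesSet name="titer" length="1">
          <Series name="concentration" dependency="dependent" seriesID="c" seriesType="Float32">
            <IndividualValueSet><F>5.877</F></IndividualValueSet>
            <Unit label="g/L"/>
          </Series>
        </SeriesSet>
      </Result>
    </ExperimentStep>
  </ExperimentStepSet>
</AnIML>"""

ASM = """{"measurement aggregate document": {"measurement document": [
  {"sample document": {"sample identifier": "BATCH-2026-001-DS"},
   "protein concentration": {"value": 5.877, "unit": "g/L"}}]}}"""

# The AnIML elements live in a default namespace, so lookups need a prefix map.
NS = {"a": "urn:org:astm:animl:schema:core:draft:0.90"}


def read_animl(text: str):
    """Return (sample id, value, unit) from the AnIML document."""
    root = ET.fromstring(text)
    sample = root.find("a:SampleSet/a:Sample", NS).get("sampleID")
    series = root.find(".//a:Series[@name='concentration']", NS)
    value = float(series.find("a:IndividualValueSet/a:F", NS).text)
    unit = series.find("a:Unit", NS).get("label")
    return sample, value, unit


def read_asm(text: str):
    """Return (sample id, value, unit) from the ASM document."""
    doc = json.loads(text)["measurement aggregate document"]["measurement document"][0]
    conc = doc["protein concentration"]
    return doc["sample document"]["sample identifier"], conc["value"], conc["unit"]


print(read_animl(ANIML))  # ('BATCH-2026-001-DS', 5.877, 'g/L')
print(read_animl(ANIML) == read_asm(ASM))  # True
```

A check like this is a reasonable ingest-time guard when the same measurement arrives in both formats.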
SENAITE: an open-source LIMS for the workflow
A pile of result files is not a lab. A LIMS (Laboratory Information Management System) is the system that logs the sample, assigns the tests, captures the analyst's result, and, crucially, runs the verification workflow that turns a preliminary number into a released one. The open-source LIMS this book uses is SENAITE, an enterprise LIMS built on the Plone/Zope stack and licensed GPL-2.0 [8]. The companion stack runs it behind the lab profile (senaite/senaite:2.6.0); be warned that its first boot takes minutes because Plone bootstraps a lot.
The integration pattern is API-first. SENAITE ships a JSON REST API, so the repo's illustrative examples/ingest/senaite_import.py sketch registers a sample against the batch and posts the at-line results, then later reads back only the verified ones (the route names and POST body are faithful to the real senaite.jsonapi; the surrounding orchestration assumes a configured lab profile):
# examples/ingest/senaite_import.py: register sample + push results via the SENAITE REST API
import requests

S = requests.Session()
S.auth = ("lab_importer", PASSWORD)  # service account, not a person
base = "http://senaite:8080/senaite/@@API/senaite/v1"  # /<plone-site-id>/@@API/...

# 1) create the analysis request (sample login) bonded to the batch
ar = S.post(f"{base}/create", json={
    "portal_type": "AnalysisRequest",
    "Client": "uid-of-internal-qc",
    "SampleType": "drug-substance",
    "ClientSampleID": "BATCH-2026-001-DS",
    "Analyses": ["SEC_monomer_pct", "HCP_ng_per_mg", "endotoxin_EU_per_mL"],
}).json()

# 2) submit a result for one analysis (still 'preliminary' until verified)
S.post(f"{base}/update", json={
    "uid": ar["items"][0]["Analyses"][0]["uid"],
    "Result": "98.611",
})
The workflow that follows (submit → verify → publish) is SENAITE's reason for existing. An analyst submits; a second qualified user verifies (SENAITE can enforce that the verifier is not the submitter); only then is the result publishable. Pulling that verified result into our PostgreSQL lab.result table is then a small, careful sync that refuses to import anything not yet verified:
# examples/ingest/senaite_import.py: only verified results cross into the system of record
for item in S.get(f"{base}/search",
                  params={"portal_type": "Analysis",
                          "review_state": "verified"}).json()["items"]:
    db.execute(
        "INSERT INTO lab.result (sample_id, test_id, value, unit, analyst, "
        "instrument_id, status) VALUES (%s, %s, %s, %s, %s, %s, 'verified') "
        "ON CONFLICT (sample_id, test_id, result_ts) DO NOTHING",
        (item["ClientSampleID"], item["getKeyword"], item["Result"],
         item["Unit"], item["getAnalyst"], item["Instrument"]))
This is the gate that keeps preliminary lab noise out of the batch record. The review_state=verified filter is the whole control in one line.
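The server-side filter is the primary control, but a sync job can also re-check each item before it writes, so that a changed query or a cached response cannot leak a preliminary value. The sketch below is one way to do that as a pure function. It assumes each returned item carries a `review_state` key alongside the fields used in the excerpt above; `to_result_row` and the sample payloads are hypothetical.

```python
def to_result_row(item: dict):
    """Map one SENAITE-style analysis item to a lab.result tuple, or None.

    Anything whose review_state is not exactly 'verified' is refused,
    even if it somehow arrived in a response that was meant to be filtered.
    """
    if item.get("review_state") != "verified":
        return None
    return (
        item["ClientSampleID"],
        item["getKeyword"],
        float(item["Result"]),  # non-numeric results fail loudly here
        item.get("Unit"),
        item.get("getAnalyst"),
        item.get("Instrument"),
        "verified",
    )


# Hypothetical payloads shaped like the items the sync loop iterates over.
items = [
    {"ClientSampleID": "BATCH-2026-001-DS", "getKeyword": "SEC_monomer_pct",
     "Result": "98.611", "Unit": "%", "getAnalyst": "analyst_a",
     "Instrument": "HPLC-07", "review_state": "verified"},
    {"ClientSampleID": "BATCH-2026-001-DS", "getKeyword": "HCP_ng_per_mg",
     "Result": "28.203", "Unit": "ng/mg", "getAnalyst": "analyst_a",
     "Instrument": "HPLC-07", "review_state": "to_be_verified"},
]

rows = [row for row in map(to_result_row, items) if row is not None]
print(len(rows))  # 1
```

Converting the result to a float at this boundary also keeps free-text entries out of the numeric value column; those would need a separate path into text_value.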
eLabFTW: the ELN for method and experiment provenance
A LIMS records results; an ELN (Electronic Lab Notebook) records how you got them: the method, the deviations, the reasoning, the analyst's signed statement that "I ran SOP-AT-HPLC-001 on instrument HPLC-07 on this date." The book uses eLabFTW, licensed AGPL-3.0 (elabftw/elabimg:5.1.15, with a MySQL sidecar), run standalone over the network so its copyleft imposes nothing on your own code.
eLabFTW's standout features for a regulated lab are cryptographic: it can apply an Ed25519ph electronic signature (the pre-hashed Ed25519 variant eLabFTW uses) over an experiment record and an RFC 3161 trusted timestamp [10]. RFC 3161 is the IETF time-stamp protocol where a trusted Timestamping Authority returns a TimeStampToken over a hash of your document, so you can later prove the content existed, unchanged, at that instant, without ever sending the content to the TSA [11]. The ingest pattern is again REST-first, sketched in the repo's illustrative examples/ingest/elabftw_ingest.py:
# examples/ingest/elabftw_ingest.py: sign + timestamp the method record via the eLabFTW API
import elabapi_python

cfg = elabapi_python.Configuration()
cfg.api_key = {"api_key": ELAB_TOKEN}
cfg.host = "https://elabftw/api/v2"
api = elabapi_python.ExperimentsApi(elabapi_python.ApiClient(cfg))

# attach the AnIML/ASM result files to the experiment, then sign + timestamp it
api.post_experiment(body={"title": "HPLC titer - BATCH-2026-001-DS",
                          "category": "release-testing"})
# the signature (Ed25519ph) and RFC 3161 token are applied through the UI/API
# and lock the entry; later edits create a new, separately signed version.
Once signed and timestamped, the entry locks; any later change creates a new version with its own signature, so the history is append-only.
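The reason a timestamp over a hash is enough can be shown with a few lines of standard-library Python. This is not eLabFTW's implementation and it does not talk to a Timestamping Authority; it only illustrates the digest step that the RFC 3161 exchange is built around. The choice of SHA-256, the canonical-JSON serialisation, and the record fields are all assumptions made for the sketch.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    """SHA-256 over a canonical JSON form of the record.

    Sorting keys and fixing separators makes the byte string, and so the
    digest, independent of dict ordering and whitespace.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Hypothetical method record; only the digest would ever leave the lab.
signed = {
    "sample": "BATCH-2026-001-DS",
    "method": "SOP-AT-HPLC-001",
    "instrument": "HPLC-07",
    "titer_g_L": 5.877,
}
digest_at_signing = record_digest(signed)

# Later: an untouched copy reproduces the digest, an edited copy does not.
untouched = dict(signed)
edited = dict(signed, titer_g_L=5.9)
print(record_digest(untouched) == digest_at_signing)  # True
print(record_digest(edited) == digest_at_signing)     # False
```

In the real protocol the digest is wrapped in a time-stamp request and the authority's signed token binds it to a time, which is what turns "the content has not changed" into "the content has not changed since that instant."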
Why it matters
The lab is where a batch lives or dies. Every other chapter captures data about the process; this chapter captures the verdict on the product. That 128 ng/mg HCP result on BATCH-2026-004 is the difference between a released lot and a quarter-million-dollar write-off, and if it could be silently edited, the whole quality system would be a fiction. So the controls here are not bureaucratic decoration. The preliminary → verified status, the second-person verification, the immutable result rows, the signed-and-timestamped method record: each one exists to make a single sentence defensible to an inspector: "this result was measured by this analyst, on this qualified instrument, against this validated method, and has not changed since." Get that sentence right and the batch record is trustworthy; get it wrong and nothing downstream matters.
In the real world
In a commercial QC lab the system of record is almost always a validated commercial LIMS (LabWare, STARLIMS, or Thermo SampleManager) wired to a chromatography data system like Empower or OpenLab, with the instruments integrated through vendor drivers or, increasingly, SiLA/LADS. Our OSS stack does not pretend to replace those; it shows you the same shapes (sample login, verification workflow, vendor-neutral result files, signed method records) so the integration patterns transfer.
A few honest anchors for 2026:
- NIIMBL, the U.S. public-private Manufacturing USA institute for biopharmaceutical innovation, has funded analytical-method and data-standardization work precisely because lab-data interoperability is a recognised bottleneck; its SABRE facility (the NIIMBL / University of Delaware pilot-scale cGMP, or current Good Manufacturing Practice, facility that broke ground in April 2024) is being built as a place to run next-generation processes, with QC and PAT data feeding the kind of platform this chapter sketches. SABRE is a facility, not a data program.
- LADS is genuinely new. OPC 30500 is from 2023; certified server implementations are still emerging in 2026, so on a real floor you will meet far more SiLA 2, plain OPC UA, and proprietary drivers than mature LADS servers. The standard is the right direction; it is not yet the default reality.
The honest OSS-vs-commercial verdict for this layer. Open source genuinely covers the mechanics: SENAITE runs a complete sample-login-to-verification workflow, eLabFTW signs and timestamps records, and AnIML/ASM give you durable, vendor-neutral data. But neither tool is 21 CFR Part 11 compliant out of the box, and the book is explicit about it. SENAITE's only published Part 11 gap analysis is from 2019 (against v1.3.2) and lists real, unclosed gaps: electronic-signature controls, retention, and password/access controls all needing configuration or hardening [9]; the repo ships that gap list under /compliance/gap-analyses and treats SENAITE as a teaching LIMS, not a compliant one. eLabFTW's own documentation says the same thing in plainer words: it provides the cryptographic primitives, but compliance depends on how you configure, validate, and operate it [10]. The strong audit-trail and review expectations on release data are not optional [12], and Part 11 sets the bar these systems must clear [13]. Pure OSS gets you the workflow and the data shapes, perhaps 80% of the way. The validated e-signature, the locked-down access control, the supplier accountability, and the formal IQ/OQ/PQ are the GxP last mile, and we build that hybrid honestly in Part V.
Key terms
- LIMS: Laboratory Information Management System; manages sample login, test assignment, results, and the verification workflow (here, SENAITE).
- ELN: Electronic Lab Notebook; records methods, experiments, and reasoning, with signatures (here, eLabFTW).
- At-line / offline assay: a sample pulled from the process and measured on a bench instrument (VCD, viability, metabolites); the offline twin of an in-line tag.
- Release testing: the QC panel (SEC/CEX HPLC, HCP, host-cell DNA, endotoxin, bioburden) that decides whether a batch can be released.
- OOS: Out Of Specification; a result outside its validated limits, which must freeze the batch and trigger an investigation.
- Certificate of analysis (CofA): the set of release results, with specs and pass/fail, that accompanies a released lot.
- OPC UA LADS: Laboratory and Analytical Device Standard (OPC 30500); a self-describing OPC UA information model for lab instruments.
- SiLA 2: a gRPC/HTTP2 standard for commanding and discovering lab devices; complements LADS.
- AnIML: Analytical Information Markup Language; an ASTM XML format for vendor-neutral analytical data with sample/method/audit/signature sections.
- Allotrope ASM / AFO: the JSON-native Allotrope Simple Model using the Allotrope Foundation Ontology for FAIR, machine-actionable analytical data.
- Verification (four-eyes): the control where a second qualified person reviews a preliminary result before it becomes a verified, releasable one.
- RFC 3161 timestamp: a trusted-timestamp token over a hash of a document, proving its content existed unchanged at a moment in time.
Where this leads
We have now captured the product's verdict: every at-line sample and every release result, born in the lab and bonded to its batch. But the molecule still has to become a finished, labelled vial, and the clean space around that fill must be watched as closely as the product itself. The next chapter, Fill-Finish, Packaging & Environmental Monitoring, leaves the QC lab for the fill line and the cleanroom, where high-cardinality telemetry (particle counts, fill weights, serialization events, PackML line states) meets a hard GxP boundary.