The Analytical Lab: Instruments, LIMS & ELN

📍 Where we are: Part II, "Capturing the Process." We have captured everything the production floor produces — every in-line tag, every chromatography phase, every pooling decision. Now we leave the floor for the QC lab, where the molecule's quality is finally judged, and learn to capture the data that decides whether a batch is released or rejected.

The simple version

The bioreactor and the skids are like a kitchen: they tell you the oven was at 180 °C and the timer ran for 40 minutes. The analytical lab is the food critic. It does not care what the oven said — it tastes the cake and writes a verdict. That verdict (Is it pure? Is it the right antibody? Is it safe to inject?) is the highest-stakes data in the whole plant, because it is the data that lets you ship a medicine. So the lab's job, and this chapter's, is to capture each verdict with an iron-clad answer to one question: who measured this, on which instrument, against which spec, and can we prove nobody quietly changed it afterward?

What this chapter covers

The production floor measures the process; the QC lab measures the product. The drug substance (the purified antibody) exists once the purification train is complete: Protein A capture, viral inactivation/filtration, polishing, and UF/DF (ultrafiltration/diafiltration, the concentrate-and-buffer-exchange step), the sequence Biologic Drug Manufacturing walks step by step from Capture chromatography through UF/DF and the drug substance. A sample then goes to the lab, where a battery of instruments answers the release questions: how pure (size-exclusion HPLC, where HPLC is high-performance liquid chromatography), how correctly charged (cation-exchange HPLC), how much host-cell protein and DNA contaminate it, and how much endotoxin. These results, together with the at-line samples pulled twice a day throughout the run, feed the certificate of analysis (CofA): the lot's official quality summary and the thing an inspector reads first. (A lot is the batch once it is dispositioned for release; lot and batch are used interchangeably here.)

This chapter shows how to capture that data in open source:

The deterministic offline / at-line assays and the HPLC release panel the simulator produces, and the lab.sample / lab.test / lab.result model that holds them.
Getting data off the instruments: OPC UA LADS device servers and SiLA 2 commands, and the vendor-neutral analytical formats — AnIML and the full Allotrope stack (AFO, ADM, ADF, ASM), unpacked layer by layer.
An open-source LIMS (SENAITE) for sample login, worksheets, and verified results, and an ELN (eLabFTW) for method and experiment provenance with cryptographic e-signatures.
Pulling verified results back into the batch record — and being brutally honest about the Part 11 gaps that mean none of these tools is compliant out of the box.

Every number below comes from a file you can regenerate byte-for-byte with SIM_SEED=2026.

Two kinds of lab data: at-line and release

The lab produces two distinct streams, and they have different rhythms.

The first is at-line / offline process monitoring: twice a day, an operator pulls a few millilitres from the bioreactor and runs it through a cell counter, a metabolite analyzer, and an osmometer. These tell you how the culture is doing right now — how many living cells there are (viable cell density), what fraction of them are alive (viability), the nutrients it is consuming (glucose, glutamine) and the wastes it is making (lactate, ammonia), and how concentrated the medium is overall (osmolality, the total dissolved-particle concentration). They are the offline twins of the in-line tags — the measurements a sensor streams continuously from inside the bioreactor — and Chapter 10's whole job was reconciling the two (matching each bench number to the matching point on the online curve). The companion repo generates them from the same kinetic state the in-line trace comes from, so a bench number agrees with the online curve — just noisier and sparser.

From examples/sim/bioproc_sim/offline_assays.py, the sampling cadence and the measurement model are explicit:

# examples/sim/bioproc_sim/offline_assays.py
def sample(result: BatchResult | None = None, batch_id: str = "BATCH-2026-001") -> pd.DataFrame:
    """Two offline samples per day from the fed-batch state, with assay noise + LoD."""
    if result is None:
        result = simulate(batch_id)
    s = result.state
    rng = stream_rng("offline_assays", result.batch_id)

    minutes = []
    day = 0.0
    while day <= 14.0 + 1e-9:
        for frac in (0.25, 0.75):  # ~06:00 and ~18:00
            m = int(round((day + frac) * 1440))
            if m < len(s):
                minutes.append(m)
        day += 1.0

Twice a day (around 06:00 and 18:00) over a 14-day fed batch gives 28 in-process samples. (A fed batch is a run seeded with a modest number of cells that then grow, fed with periodic shots of concentrated nutrients as the culture builds, rather than the continuous feed-and-harvest of perfusion.) Each value is the true kinetic state plus a small, assay-specific noise term: a VCD (viable cell density) read is drawn as Xv × (1 + N(0, 0.05)) and viability as state + N(0, 1.2). Here Xv is the true cell count and N(mean, sd) is a random draw from a normal (Gaussian) distribution with that mean and standard deviation, so N(0, 0.05) is mean-zero relative noise with a standard deviation of 5% and N(0, 1.2) adds scatter with a standard deviation of 1.2 percentage points. That is exactly how a bench instrument differs from a sensor: same truth, a little measurement scatter on top.
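The cadence and the two noise models can be checked in a few lines. This is a minimal standalone sketch, not the repo's code: it assumes one kinetic-state row per minute from minute 0 through the end of day 14, uses the standard library's random.Random in place of the repo's stream_rng, and the true values 10.0 and 95.0 are made up for illustration.

```python
import random

MINUTES_PER_DAY = 1440
RUN_DAYS = 14

def sample_minutes(n_state_rows: int) -> list[int]:
    """Minute offsets of the ~06:00 and ~18:00 pulls that fall inside the run."""
    minutes = []
    for day in range(RUN_DAYS + 1):              # same day 0..14 sweep as the excerpt
        for frac in (0.25, 0.75):                # 06:00 and 18:00
            m = round((day + frac) * MINUTES_PER_DAY)
            if m < n_state_rows:                 # pulls past the end of the run are dropped
                minutes.append(m)
    return minutes

def noisy_read(true_vcd: float, true_viability: float,
               rng: random.Random) -> tuple[float, float]:
    """Apply the two noise models quoted in the text to one true state."""
    vcd = true_vcd * (1.0 + rng.gauss(0.0, 0.05))        # multiplicative, sd 5 %
    viability = true_viability + rng.gauss(0.0, 1.2)     # additive, sd 1.2 points
    return vcd, viability

# Assumption: the state array has one row per minute, minute 0 .. 14 * 1440.
pulls = sample_minutes(RUN_DAYS * MINUTES_PER_DAY + 1)
rng = random.Random(2026)        # stand-in seed, not the repo's stream_rng
vcd, viab = noisy_read(10.0, 95.0, rng)
print(len(pulls), pulls[:2])     # -> 28 [360, 1080]
```

Under that assumption the two day-14 pulls fall past the last state row, which is why a sweep over days 0 to 14 still yields 28 samples rather than 30.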

Run python -m bioproc_sim.offline_assays and the first committed rows of datasets/offline_assays.csv look like this — a wide, tidy table, one row per sample:

sample_id,batch_id,sample_time,sample_point,VCD_e6_per_mL,viability_pct,glucose_g_L,lactate_g_L,glutamine_mM,ammonia_mM,osmolality_mOsm_kg,titer_g_L,pH_offline
BATCH-2026-001-OFF-001,BATCH-2026-001,2026-01-05 06:00:00+00:00,BR101,0.34,96.6,6.18,0.13,4.13,0.68,293,0.002,7.06
BATCH-2026-001-OFF-002,BATCH-2026-001,2026-01-05 18:00:00+00:00,BR101,0.43,96.6,6.26,0.19,4.31,0.38,292,0.008,7.04
BATCH-2026-001-OFF-003,BATCH-2026-001,2026-01-06 06:00:00+00:00,BR101,0.56,99.0,6.01,0.32,3.83,0.45,287,0.014,7.05

The second stream is release testing: once the drug substance exists, the QC lab runs the panel that decides whether it can be released. This is the high-stakes data. From the same module, the release specs are coded as a table of (name, low, high, unit, target, sd):

# examples/sim/bioproc_sim/offline_assays.py
_RELEASE_SPECS = [
    ("SEC_monomer_pct", 95.0, 100.0, "%", 98.5, 0.4),
    ("SEC_HMW_pct", 0.0, 3.0, "%", 1.1, 0.3),
    ("CEX_main_pct", 60.0, 80.0, "%", 70.0, 2.0),
    ("HCP_ng_per_mg", 0.0, 100.0, "ng/mg", 22.0, 8.0),
    ("residual_ProteinA_ng_per_mg", 0.0, 20.0, "ng/mg", 4.0, 1.5),
    ("host_cell_DNA_ng_per_dose", 0.0, 10.0, "ng/dose", 1.2, 0.5),
    ("endotoxin_EU_per_mL", 0.0, 5.0, "EU/mL", 0.3, 0.15),
    # ... bioburden, SEC_LMW, CEX_acidic, CEX_basic
]

Reading the test names: SEC (size-exclusion chromatography) sizes the molecule, so SEC_monomer_pct is the percent that is the intact single antibody and SEC_HMW_pct the percent of unwanted high-molecular-weight aggregates; CEX (cation-exchange chromatography) sorts by charge, so CEX_main_pct is the percent in the correctly-charged main peak (the rest being slightly more acidic or basic variants); HCP is leftover host-cell protein and host_cell_DNA leftover host-cell DNA, both contaminants from the cells that made the antibody; residual_ProteinA is capture-resin that bled into the product; and endotoxin is a bacterial toxin measured in endotoxin units (EU). The (low, high) pair on each line is the validated acceptance window — e.g. the main charge peak must land between 60% and 80% — set by process characterisation, not chosen here.

Two of the assay groups the code leaves in its trailing comment, the CEX acidic/basic variants and bioburden, are worth a sentence each, because each is the direct readout of a specific downstream unit op. The acidic and basic species the CEX main peak excludes are charge variants: deamidation and sialylation push the molecule acidic, while C-terminal lysine and other modifications push it basic, and a polishing step (often the same cation-exchange or a mixed-mode column) is tuned precisely to trim those tails, so a rising acidic shoulder is the polishing chromatography's report card, not an isolated number. Bioburden, the count of viable microorganisms in a non-sterile in-process pool, is the readout of aseptic technique and of the viral filtration and 0.2-µm sterilising filters that bracket the train; it is the low-stakes, high-frequency sentinel that usually sits upstream of an endotoxin excursion or a sterility failure. Together with the remaining commented-out test, SEC_LMW (the low-molecular-weight fragments that mirror the SEC_HMW aggregates), they round out the panel from "what the molecule is" (SEC/CEX identity and purity) to "what the process let through" (HCP, DNA, residual Protein A, bioburden, endotoxin), with the capture, viral-inactivation, polishing, and UF/DF steps each owning one or more rows of the certificate.

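Each test draws a value around its target, clips it to the spec window, and flags PASS or OOS (out of specification) against the limits. Because of the clip, a random draw in this excerpt always lands inside the window, so an OOS can only come from a value placed outside it on purpose: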

# examples/sim/bioproc_sim/offline_assays.py
val = target + (rng.normal(0, sd) if sd > 0 else 0.0)
val = float(np.clip(val, low, high))
rows.append({
    "batch_id": bid, "test": name, "value": round(val, 3), "unit": unit,
    "spec_low": low, "spec_high": high,
    "result": "PASS" if low <= val <= high else "OOS",
})

Eleven tests per batch across the six batches of the golden campaign gives 66 rows in all. In datasets/hplc_results.csv the simulator has planted exactly one deliberate failure, the kind of thing the rest of the trilogy's governance machinery exists to catch:

batch_id,test,value,unit,spec_low,spec_high,result
BATCH-2026-001,SEC_monomer_pct,98.611,%,95.0,100.0,PASS
BATCH-2026-001,HCP_ng_per_mg,28.203,ng/mg,0.0,100.0,PASS
...
BATCH-2026-004,HCP_ng_per_mg,128.0,ng/mg,0.0,100.0,OOS

BATCH-2026-004 has a host-cell-protein result of 128 ng/mg against a 100 ng/mg limit: a single number that should freeze that batch and open an investigation. (That is 128 ppm, parts per million; because one milligram is a million nanograms, ng of host-cell protein per mg of antibody is exactly parts-per-million.) The whole reason we are so careful about how this number is stored is that, when it is an OOS, it must be tamper-evident, attributable, and impossible to quietly "fix." FDA's data-integrity guidance is explicit that QC release data carries the strongest audit-trail and quality-unit-review expectations [12], the quality unit being the independent quality-assurance group that must sign off a release. 21 CFR Part 11, the FDA rule that says when an electronic record and e-signature may stand in for signed paper, sets the bar those results must meet [13] (Part V builds those controls clause by clause).
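The verdict rule and the unit identity are small enough to write out. The sketch below is illustrative only: verdict and ng_per_mg_to_ppm are hypothetical helper names, and the three rows are copied from the CSV excerpt above.

```python
def verdict(value: float, spec_low: float, spec_high: float) -> str:
    """PASS when the value sits inside the inclusive acceptance window, else OOS."""
    return "PASS" if spec_low <= value <= spec_high else "OOS"

def ng_per_mg_to_ppm(ng_per_mg: float) -> float:
    """Convert ng of impurity per mg of product into parts per million by mass."""
    ng_per_one_mg = 1_000_000                    # 1 mg = 1e6 ng
    mass_fraction = ng_per_mg / ng_per_one_mg    # dimensionless ratio
    return mass_fraction * 1_000_000             # parts per million

# (batch_id, test, value, spec_low, spec_high), taken from the excerpt
rows = [
    ("BATCH-2026-001", "SEC_monomer_pct", 98.611, 95.0, 100.0),
    ("BATCH-2026-001", "HCP_ng_per_mg", 28.203, 0.0, 100.0),
    ("BATCH-2026-004", "HCP_ng_per_mg", 128.0, 0.0, 100.0),
]
for batch_id, test, value, low, high in rows:
    print(batch_id, test, value, verdict(value, low, high))
```

The window is inclusive at both ends, matching the `low <= val <= high` comparison in the simulator excerpt, so a result of exactly 100.0 ng/mg would still pass.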

The lab data model: sample → test → result

All of this lands in three tables that are reused by every later chapter. From examples/platform/db/30-lab-events.sql:

-- examples/platform/db/30-lab-events.sql
CREATE TABLE lab.sample (
    sample_id    text PRIMARY KEY,
    batch_id     text REFERENCES s88.batch,
    sample_time  timestamptz NOT NULL,
    sample_point text NOT NULL,
    sample_type  text NOT NULL DEFAULT 'in_process'   -- in_process | release | stability
);

CREATE TABLE lab.test (
    test_id   text PRIMARY KEY,
    name      text NOT NULL,
    unit      text,
    spec_low  numeric,
    spec_high numeric
);

CREATE TABLE lab.result (
    result_id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    sample_id   text NOT NULL REFERENCES lab.sample,
    test_id     text REFERENCES lab.test,
    value       numeric,
    text_value  text,
    unit        text,
    result_ts   timestamptz NOT NULL DEFAULT now(),
    analyst     text,
    instrument_id text,
    status      text NOT NULL DEFAULT 'preliminary',   -- preliminary | verified | rejected
    UNIQUE (sample_id, test_id, result_ts)
);
CREATE INDEX ON lab.result (sample_id);

Four columns carry most of the regulatory weight. sample.batch_id is a foreign key (a column whose value must match a real row in another table) pointing straight into the ISA-88/95 batch table (the plant's equipment-and-lot model, built in its own chapter), so every result is permanently bonded to the lot it judges: the attributability backbone. result.analyst and result.instrument_id answer "who and on what." And result.status encodes the lab workflow itself: a result is born preliminary, becomes verified when a second qualified person reviews it (the four-eyes principle: two people, four eyes, so no single analyst can wave a result through alone), and can be rejected. A preliminary result is not release data; only a verified one is. The UNIQUE (sample_id, test_id, result_ts) constraint is what lets a re-test land as a new row with a new timestamp instead of colliding with the first result, which is how the audit trail stays honest. The constraint does not by itself forbid an UPDATE of an existing row, so the never-edit rule also has to be enforced in the write path (for example by revoking UPDATE and DELETE on the table).
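The re-test behaviour can be tried without a PostgreSQL instance. The sketch below is an approximation under stated assumptions: it uses Python's built-in sqlite3, flattens the lab schema into lab_sample / lab_test / lab_result table names, drops the s88.batch reference, and invents the sample time, the sample point "DS", and the second (re-test) reading.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked
con.executescript("""
CREATE TABLE lab_sample (
    sample_id TEXT PRIMARY KEY, batch_id TEXT,
    sample_time TEXT NOT NULL, sample_point TEXT NOT NULL,
    sample_type TEXT NOT NULL DEFAULT 'in_process'
);
CREATE TABLE lab_test (
    test_id TEXT PRIMARY KEY, name TEXT NOT NULL, unit TEXT,
    spec_low REAL, spec_high REAL
);
CREATE TABLE lab_result (
    result_id INTEGER PRIMARY KEY AUTOINCREMENT,
    sample_id TEXT NOT NULL REFERENCES lab_sample(sample_id),
    test_id TEXT REFERENCES lab_test(test_id),
    value REAL, unit TEXT, result_ts TEXT NOT NULL,
    analyst TEXT, instrument_id TEXT,
    status TEXT NOT NULL DEFAULT 'preliminary',
    UNIQUE (sample_id, test_id, result_ts)
);
""")
con.execute("INSERT INTO lab_sample VALUES (?,?,?,?,?)",
            ("BATCH-2026-004-DS", "BATCH-2026-004",
             "2026-01-21T09:00:00+00:00", "DS", "release"))   # time and point are made up
con.execute("INSERT INTO lab_test VALUES (?,?,?,?,?)",
            ("HCP_ng_per_mg", "HCP", "ng/mg", 0.0, 100.0))

def record(sample_id: str, ts: str, value: float) -> None:
    """Insert one result row; never updates an existing one."""
    con.execute(
        "INSERT INTO lab_result (sample_id, test_id, value, unit, result_ts,"
        " analyst, instrument_id) VALUES (?,?,?,?,?,?,?)",
        (sample_id, "HCP_ng_per_mg", value, "ng/mg", ts, "auto", "ELISA-02"))

record("BATCH-2026-004-DS", "2026-01-21T14:32:07+00:00", 128.0)   # first result
record("BATCH-2026-004-DS", "2026-01-22T10:05:00+00:00", 126.4)   # hypothetical re-test

def rejected(sample_id: str, ts: str, value: float) -> bool:
    """True when the database refuses the insert."""
    try:
        record(sample_id, ts, value)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_rejected = rejected("BATCH-2026-004-DS", "2026-01-21T14:32:07+00:00", 95.0)
orphan_rejected = rejected("NO-SUCH-SAMPLE", "2026-01-23T08:00:00+00:00", 10.0)
n_rows = con.execute("SELECT COUNT(*) FROM lab_result").fetchone()[0]
print(duplicate_rejected, orphan_rejected, n_rows)
```

The re-test sits beside the original as a second row, a second insert at the same timestamp is refused by the uniqueness key, and a result for a sample that was never logged is refused by the foreign key.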

From instrument to batch record. Devices speak LADS / SiLA / AnIML / ASM; SENAITE owns sample login and the preliminary→verified transition; eLabFTW signs and timestamps the method record; only verified results cross into PostgreSQL and the batch record. The red notes mark where pure OSS does not yet meet Part 11.

Original diagram by the authors, created with AI assistance.

Anatomy of a verified result: dissecting one `lab.result` row

The three-table model is the skeleton; the muscle is one row. Chapter 7 unpacked an OPC UA DataValue (the structured record that wraps every sensor reading with its quality flag, timestamp, and typed value) to show that a sensor reading is never a bare number. In the same way, a release result is never a bare 128: it is an attributable, spec-framed, tamper-evident record. Take the one that decides BATCH-2026-004: the host-cell-protein assay came back at 128 ng/mg against a 100 ng/mg limit, the OOS line in examples/datasets/hplc_results.csv. Here is the lab.result row the verified-only sync writes for it. The value, sample_id, test_id, spec, and status are the real loaded data; the result_id and result_ts are shown illustratively, because the loader writes analyst='auto' and lets the database mint the identity and timestamp:

result_id     4087                       -- GENERATED ALWAYS AS IDENTITY, immutable
sample_id     BATCH-2026-004-DS          -- FK -> lab.sample -> s88.batch
test_id       HCP_ng_per_mg              -- FK -> lab.test (spec_low 0, spec_high 100)
value         128.0      unit  ng/mg
result_ts     2026-01-21 14:32:07+00     -- part of UNIQUE (sample_id, test_id, result_ts)
analyst       auto                       -- who (the loader writes analyst='auto')
instrument_id ELISA-02                   -- on what (HCP is an ELISA immunoassay)
status        verified                   -- preliminary -> verified -> (rejected)

Walk it the way UaExpert (the desktop browser Chapter 7 used to click through an OPC UA node field by field) walked the OPC UA node — every column earns its place:

result_id — the immutable identity. GENERATED ALWAYS AS IDENTITY means the database, not the application, mints it, and it is never reused. This is the OPC UA NodeId's analog: a stable handle you can cite in an investigation forever.
sample_id — the attributability bond. A foreign key to lab.sample, whose own batch_id keys into the ISA-88/95 s88.batch table. That chain is what lets you say this number belongs to this lot — the spine of attributability, and the reason a result can never float free of the batch it judges. (The batch model itself is The Batch & Equipment Data Model: ISA-88/95 in PostgreSQL.)
test_id — the verdict frame. A foreign key to lab.test, which carries spec_low and spec_high. The result does not store its own pass/fail; the spec lives with the test, so the verdict (128 > 100, therefore OOS) is derived against a controlled limit, not a flag typed in beside the number.
value + unit — the measurement, never a bare scalar. 128.0 is meaningless without ng/mg; the pair travels together, exactly as the OPC UA Variant carried Double 4.902 and the Allotrope ASM leaf carried a value-and-unit pair. The unit is QUDT-mappable — QUDT being the open Quantities, Units, Dimensions and Types vocabulary, which gives every unit a stable machine-readable identity so software can convert between them — so the knowledge graph (Chapter 19, where a machine reasons over linked facts) can reason over it.
result_ts: the timestamp that makes the trail honest. It is the third column of UNIQUE (sample_id, test_id, result_ts). Because the uniqueness key includes the time, a re-test does not collide with the first result; it lands as a new row with a new timestamp. The schema is shaped for an append-only history, although the constraint alone does not block an in-place UPDATE, so that guarantee still rests on how write access to the table is controlled.
analyst and instrument_id — who and on what. The two columns that turn a number into testimony: measured by this person, on this qualified instrument (HCP is run by ELISA — an enzyme-linked immunoassay, a plate-based antibody test — so the instrument here is ELISA-02, not the HPLC-07 that runs SEC/CEX). Drop them and the result is anonymous, which is precisely the failure mode an inspector hunts for.
status — the four-eyes gate, encoded. A result is born preliminary, becomes verified only when a second qualified person reviews it, and may be rejected. Crucially, the row above is verified — the reading is confirmed real — yet its verdict is OOS and the batch is what gets rejected. Verified-but-OOS is the honest state: a trustworthy record of a bad result, not a quiet edit that makes the bad result disappear.

One verified result, fully unpacked: identity, the batch bond, the spec frame, the value-and-unit measurement, the timestamp that forces append-only re-tests, who and on-what, and the status gate — the lab's exact analog of the OPC UA DataValue card. Original diagram by the authors, created with AI assistance.

That single row is the smallest defensible unit of a release decision. Everything the rest of this chapter builds — the LADS/SiLA capture, the Allotrope canonicalization, the SENAITE workflow — exists to fill it in without ever letting a hand touch the number between the instrument and the row.
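The column-by-column walk above can be restated as a small value object. This is a sketch for illustration, not the repo's loader: SpecFrame and LabResult are hypothetical class names, and the point is only that the verdict is derived from the test's spec while the status gate is a separate question.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SpecFrame:
    """The lab.test side: the controlled limits the verdict is derived against."""
    test_id: str
    unit: str
    spec_low: float
    spec_high: float

@dataclass(frozen=True)
class LabResult:
    """One lab.result row, with its spec attached the way the FK join would."""
    sample_id: str
    spec: SpecFrame
    value: float
    unit: str
    result_ts: str
    analyst: str
    instrument_id: str
    status: str = "preliminary"      # preliminary | verified | rejected

    @property
    def verdict(self) -> str:
        """Derived, never stored: compare the value with the test's window."""
        if self.unit != self.spec.unit:
            raise ValueError("result unit does not match the spec unit")
        in_spec = self.spec.spec_low <= self.value <= self.spec.spec_high
        return "PASS" if in_spec else "OOS"

    @property
    def is_release_data(self) -> bool:
        """Only a verified result counts as release data."""
        return self.status == "verified"

hcp_spec = SpecFrame("HCP_ng_per_mg", "ng/mg", 0.0, 100.0)
row = LabResult("BATCH-2026-004-DS", hcp_spec, 128.0, "ng/mg",
                "2026-01-21T14:32:07+00:00", "auto", "ELISA-02", "verified")
draft = replace(row, status="preliminary")   # same reading before review

print(row.verdict, row.is_release_data)      # -> OOS True
print(draft.verdict, draft.is_release_data)  # -> OOS False
```

The two properties are independent on purpose: the verified row is trustworthy and out of spec at the same time, which is the verified-but-OOS state described above.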

Where this sits in the trilogy

This lab.result row is the open-source implementation of an artifact the other two books describe. In Biologic Drug Manufacturing, Quality control and batch release is the physical step where a finished batch is judged and its Certificate of Analysis written, and Measuring quality and keeping the protein stable is where each critical quality attribute and its spec limit are defined (the developability Kd/Tm reads that begin the chain come from molecule discovery). In Data Management in Biopharmaceutical Manufacturing, Data Integrity and ALCOA+ frames the open problem — how to make that release verdict attributable, immutable, and tamper-evident. The verified row dissected above is where that spec and that data-point become running SQL.

The same row as a triple, a shape, and a provenance chain

The relational row is one rendering of the verdict; the same facts re-express cleanly as the RDF triples (subject–predicate–object facts) the knowledge-graph chapter builds, which is what lets a release result join the digital thread rather than sit in one table. The value-and-unit pair becomes a typed literal hung on the lot node, bp:DS-004 bp:hcpPpm "128.0"^^xsd:float, with the ppm unit carried by the property and mapped to QUDT, and the test_id/spec_low/spec_high frame becomes a constraint a validator can enforce. That is not a metaphor: Book 4 models the exact release decision as a SHACL shape, where the spec_high of 100 ppm is a sh:maxInclusive and the four-eyes requirement is a sh:minCount 1 on the signature (see The release gate and SHACL, Instances and the graph, and Identifiers and units for the QUDT mapping). The point the two books make from opposite ends is one point: the SQL UNIQUE/status/spec constraints and the SHACL sh:maxInclusive/sh:minCount shape are the same release rule written in two dialects.

A competency question — the question the data model must be able to answer — makes the link concrete. "Which released lots carry an out-of-spec HCP result, and who signed each?" is one SPARQL query over the verified rows once they are in the graph:

PREFIX bp: <https://example.org/bioproc#>
SELECT ?lot ?hcp ?signer WHERE {
  ?lot bp:hcpPpm ?hcp ; bp:approvedBy ?signer .
  FILTER(?hcp > 100.0)                       # the spec_high frame, as a filter
}

The same analyst, instrument_id, and result_ts columns are exactly what a PROV-O record needs — the W3C provenance vocabulary whose prov:wasGeneratedBy, prov:wasAttributedTo, and prov:generatedAtTime map one-to-one onto on-what, who, and when. A verified result is a prov:Entity generated by an assay activity, attributed to its analyst, and bonded by sample_id to the lot it judges — the attributability backbone re-stated as a provenance graph the genealogy chapter walks. So the row is not just storable; it is FAIR-shaped (Findable, Accessible, Interoperable, Reusable) the moment its units map to QUDT and its terms to a shared vocabulary, which is the whole reason the chapter insists on ASM over a bare CSV.
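To make the column-to-predicate mapping concrete, here is a dependency-free sketch that turns the dissected row into plain (subject, predicate, object) tuples. The three PROV-O predicates and the bp:hcpPpm property come from the text; the IRI patterns for the result, the assay activity, and the analyst, and the bp:judges predicate, are hypothetical. The signer join from the competency question is left out because the row carries no signer column.

```python
BP = "https://example.org/bioproc#"
PROV = "http://www.w3.org/ns/prov#"

row = {
    "result_id": 4087, "lot": "DS-004", "value": 128.0,
    "result_ts": "2026-01-21T14:32:07+00:00",
    "analyst": "auto", "instrument_id": "ELISA-02",
}

def row_to_triples(row: dict) -> list[tuple]:
    """Re-express one verified row as provenance triples."""
    result = f"{BP}result-{row['result_id']}"                            # hypothetical IRI
    activity = f"{BP}assay-{row['instrument_id']}-{row['result_id']}"    # hypothetical IRI
    lot = f"{BP}{row['lot']}"
    return [
        (lot, f"{BP}hcpPpm", row["value"]),                          # the measurement
        (result, f"{PROV}wasGeneratedBy", activity),                 # on what
        (result, f"{PROV}wasAttributedTo", f"{BP}analyst-{row['analyst']}"),  # who
        (result, f"{PROV}generatedAtTime", row["result_ts"]),        # when
        (result, f"{BP}judges", lot),                                # hypothetical predicate
    ]

triples = row_to_triples(row)

# The FILTER(?hcp > 100.0) clause of the competency query, over the tuples.
oos_lots = [s for s, p, o in triples if p == f"{BP}hcpPpm" and o > 100.0]
print(oos_lots)
```

A real pipeline would load these into a triple store and run the SPARQL query as written; the list comprehension only shows that the spec limit acts as a filter over the same facts.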

The verified row as a model input: leakage, grouping, and drift

A verified release row is also the trusted label every downstream model learns from, and the discipline this chapter enforces is precisely what keeps those models honest. A preliminary number that a later model treats as ground truth is a silent data-leakage bug, because the model trains on a value the lab had not yet stood behind. The review_state=verified gate (stored as status = 'verified' in lab.result) is therefore not only a release control but a clean-label contract for the release predictor and the soft sensors. The connections are direct:

One independent unit is the batch, not the row. Eleven tests on one lot share a cell bank, a campaign, and an analyst, so they are not eleven independent observations. A model validated on row-wise random splits leaks batch identity across train and test and reports a flattering score; the honest protocol is the grouped / leave-one-batch-out cross-validation the models-and-validation chapter builds, where every result from a lot stays on one side of the split. The sample_id → batch_id foreign key dissected above is the exact grouping key that cross-validation must group on.
The spec window is an applicability-domain boundary. A model asked to predict a CQA for a lot whose inputs sit outside the range the training lots spanned is extrapolating, and its number should be flagged before it is trusted. That flag is the applicability-domain gate the same chapter adds to its soft sensor: a check on Hotelling's T² and on SPE, the squared prediction error. The validated (spec_low, spec_high) acceptance window is the analytical analog: a result outside it is OOS; an input outside the model's training envelope is out-of-domain.
Process drift and model drift are different clocks. The living culture wanders batch to batch (process drift, caught by SPC, statistical process control, on the result stream); a static model watching that process decays separately (model drift, caught by PSI, the population stability index, on its inputs and by a residual chart against the slow offline reference). The verified lab.result stream is the ground truth both detections lean on: it is the lagging, authoritative signal the MLOps chapter charts residuals against, which is why a clean verify gate here is what makes drift detection downstream meaningful rather than a chase after un-verified noise.
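The grouping point in the first item above can be sketched in plain Python. This is a minimal illustration, not the models-and-validation chapter's implementation: leave_one_batch_out is a hypothetical helper, and the 66 rows are synthetic placeholders shaped like the golden campaign (six batches, eleven tests each).

```python
from collections import defaultdict

def leave_one_batch_out(rows: list[dict], group_key: str = "batch_id"):
    """Yield (held_out_batch, train_indices, test_indices), one fold per batch."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[group_key]].append(i)
    for held_out in sorted(groups):
        test_idx = groups[held_out]
        train_idx = [i for g, idx in groups.items() if g != held_out for i in idx]
        yield held_out, train_idx, test_idx

# Placeholder rows: 6 batches x 11 tests = 66, as in the golden campaign.
rows = [{"batch_id": f"BATCH-2026-{b:03d}", "test": f"test_{t}"}
        for b in range(1, 7) for t in range(11)]

folds = list(leave_one_batch_out(rows))
for held_out, train_idx, test_idx in folds:
    train_batches = {rows[i]["batch_id"] for i in train_idx}
    assert held_out not in train_batches      # no batch appears on both sides
print(len(folds), len(folds[0][1]), len(folds[0][2]))   # -> 6 55 11
```

With six batches there are only six folds, which is the honest sample size: the score is an average over six independent lots, not over 66 rows.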

Model lineage closes the loop: a model trained on these rows pins them by dataset hash, exactly as the verified row pins its own identity, so "which results trained this model?" is a hash that matches or does not — the same append-only, attributable discipline, one layer up.
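One way to pin a training set by hash is to canonicalise each row and digest the sorted result. The sketch below is an assumption about how such a pin could be computed, not the book's lineage tooling: dataset_hash is a hypothetical function, SHA-256 is an illustrative choice, and the two rows reuse values that appear earlier in the chapter.

```python
import hashlib
import json

def dataset_hash(rows: list[dict]) -> str:
    """Order-independent SHA-256 over canonical JSON renderings of the rows."""
    lines = sorted(
        json.dumps(r, sort_keys=True, separators=(",", ":")) for r in rows
    )
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")                 # row separator
    return digest.hexdigest()

verified = [
    {"sample_id": "BATCH-2026-001-DS", "test_id": "HCP_ng_per_mg",
     "value": 28.203, "status": "verified"},
    {"sample_id": "BATCH-2026-004-DS", "test_id": "HCP_ng_per_mg",
     "value": 128.0, "status": "verified"},
]
pinned = dataset_hash(verified)

# A quiet edit to one value changes the hash, so the pin no longer matches.
tampered = [dict(verified[0]), {**verified[1], "value": 98.0}]
print(pinned == dataset_hash(list(reversed(verified))), pinned == dataset_hash(tampered))
# -> True False
```

Sorting the canonical lines makes the hash independent of row order, so the same set of verified results always produces the same pin regardless of how the query returned them.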

Getting data off the instruments

The hard, unglamorous truth of lab integration is that most analytical instruments are islands. The HPLC has its own chromatography data system; the plate reader exports a proprietary blob; the cell counter prints a PDF. The whole point of the standards below is to stop re-typing numbers.

A reader's orienting note before the survey: the next several sections walk the full standards landscape — OPC UA LADS, SiLA 2, AnIML, and the Allotrope stack — because a real lab meets all of them, but you can skim the ones you do not have on your bench. The two you will actually open are the vendor-neutral result files, AnIML and Allotrope ASM — read those closely, since they are the formats the companion repo ships and the rest of the chapter consumes. LADS and SiLA are the transport that gets data to those files; treat them as background unless you are wiring an instrument yourself.

Some of those islands are binary, not text — and that changes the integration order. A representative industry case is liquid chromatography–mass spectrometry: a Waters MassLynx station writes its acquisition into a proprietary .raw directory, and the compact LC-MS instruments aimed at multi-attribute-method (MAM) and QC release work — the Waters BioAccord is one such device — feed that same native format. The vendor-neutral, durable export here is mzML, the HUPO-PSI open XML for mass-spectral data [14]. This matters for sequencing your pipeline: the AnIML and Allotrope ASM canonicalizers below operate on text, Excel, and CSV exports — they cannot read a proprietary binary blob directly — so for an instrument like this the chain is binary .raw → mzML export → canonicalize, with the binary-to-open-XML hop coming first.

OPC UA LADS: one self-describing model for every instrument

The newest and most promising path off an instrument is OPC UA LADS — the Laboratory and Analytical Device Standard, OPC 30500-1, version 1.0.0, published 2023-11-30 by a joint working group of the OPC Foundation, SPECTARIS, and VDMA [1]. It is one of the OPC UA companion specifications introduced in Chapter 7 (a companion spec being a standard information model layered on base OPC UA for one device class), and it does for a titrator (an instrument that measures concentration by adding a reagent until a reaction completes) or an HPLC exactly what the base spec did for our bioreactor: gives it a self-describing address space a client browses to discover — meaning the instrument publishes its own structure on the network so a client can walk that tree and learn what it offers, instead of the vendor having to ship a custom software driver for it. The peer-reviewed design rationale is worth reading — LADS exists because networked labs needed one model instead of dozens of drivers [2].

What makes LADS more than "OPC UA, but for labs" is that it splits every device into two views, because the lab and the maintenance shop ask different questions [1]. The Hardware view is the physical machine: nameplates (manufacturer, model, serial — carried in a MachineIdentificationType reused from OPC UA for Machinery), the sub-components (a centrifuge's rotor, drive, lid), the calibration and validation status, and the NAMUR NE 107 device-health state (NAMUR being the process-industry user association whose NE 107 recommendation defines this standard set of health signals — NORMAL, FAILURE, CHECK_FUNCTION, OFF_SPEC, MAINTENANCE_REQUIRED) — everything an asset-management or service system needs. The Functional view is the running instrument: the functions it performs, the programs it runs, and the results it produces. One physical box, two browsable trees, decoupled so a single device can present several independent "virtual instruments."

The Functional view has a shape worth knowing, because it is the answer to "where does a result actually live." A LADSDeviceType holds a FunctionalUnitSet; each FunctionalUnit is a virtual instrument carrying a FunctionSet of typed Functions — an AnalogScalarSensorFunctionType for a single reading, an AnalogControlFunctionType for a setpoint, and crucially an AnalogArraySensorFunctionType whose value is an OPC UA Double array rather than a scalar (the hook a spectrum or chromatogram hangs on). Alongside the functions sits a ProgramManager with an ActiveProgram (live progress — current step, estimated runtime) and a ResultSet. Because LADSDeviceType derives from the DI (Devices) DeviceType, it inherits the device-health model (those NAMUR NE 107 states) rather than reinventing it, and borrows its nameplate from OPC UA for Machinery [1].

A run produces a Result object, and this is where LADS earns its keep in a regulated lab: the result is immutable once the run completes and carries its own provenance. That provenance is the User who ran it, the Started and Stopped timestamps, a DeviceProgramRunId, the sample list, and an immutable copy of the program template actually used. The measured data is delivered either as OPC UA variables (a VariableSet) or as attached files (a ResultFileType carrying a MimeType and the bytes over the native OPC UA FileType). That file slot is the seam an AnIML or Allotrope document rides out through. One honest detail, though: the LADS spec defines the generic file-attachment mechanism but does not mandate any particular analytical format, and the working group's own reference servers have so far demonstrated Allotrope ASM (JSON) results rather than the binary ADF [1].
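The shape of such a result can be mirrored in plain Python to show what "immutable with its own provenance" means in practice. This is a modelling sketch only, not an OPC UA client or the LADS type definitions: RunResult and ResultFile are hypothetical classes, the field names paraphrase the members listed above, and every value (user, times, run id, template contents, file payload) is invented.

```python
from dataclasses import dataclass, FrozenInstanceError
from types import MappingProxyType

@dataclass(frozen=True)
class ResultFile:
    """An attached result file: a name, a MIME type, and the bytes."""
    name: str
    mime_type: str
    data: bytes

@dataclass(frozen=True)
class RunResult:
    """Provenance-carrying result of one program run; fields cannot be reassigned."""
    device_program_run_id: str
    user: str
    started: str
    stopped: str
    samples: tuple[str, ...]
    program_template: MappingProxyType     # read-only copy of the template used
    variables: MappingProxyType            # measured values, VariableSet-style
    files: tuple[ResultFile, ...] = ()

template = {"method": "SOP-AT-HPLC-001", "gradient": "standard"}   # invented contents
result = RunResult(
    device_program_run_id="RUN-0001",
    user="analyst.a",
    started="2026-01-20T10:00:00Z",
    stopped="2026-01-20T10:15:00Z",
    samples=("BATCH-2026-001-DS",),
    program_template=MappingProxyType(dict(template)),   # copy, then freeze
    variables=MappingProxyType({"ProteinConcentration": (5.877, "g/L")}),
    files=(ResultFile("titer.asm.json", "application/json", b"{}"),),
)

template["gradient"] = "edited later"      # changing the live template...
print(result.program_template["gradient"])  # -> standard (the copy is unaffected)

try:
    result.user = "someone.else"           # ...and reassigning a field both fail
except FrozenInstanceError:
    print("result is immutable")
```

Copying the template before freezing it is the important step: a later edit to the method on the instrument must not change what an already-completed run says it used.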

Driving the instrument is method calls, not magic: a client invokes StartProgram on the functional unit's state machine — passing the program-template id, key-value properties, supervisory job and task ids, and the sample list, and getting back a DeviceProgramRunId — and the unit then walks an ISA-88-flavoured state machine (Stopped → Running → Stopping, with Abort and Clear) that a client follows by subscribing to its CurrentState, exactly the subscription mechanism Chapter 7 built.
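The call-then-follow pattern can be imitated with a toy state machine. This is a deliberately simplified sketch, not the LADS state machine or an OPC UA client: FunctionalUnitSketch is hypothetical, the transition table keeps only the states named above plus an assumed "Aborted" state, the event names and run-id format are invented, and the subscription is an ordinary Python callback.

```python
class FunctionalUnitSketch:
    """Toy stand-in for a functional unit's program state machine."""

    # (current state, event) -> next state. Simplified; the event names are made up.
    TRANSITIONS = {
        ("Stopped", "start"): "Running",
        ("Running", "stop"): "Stopping",
        ("Stopping", "stopped"): "Stopped",
        ("Aborted", "clear"): "Stopped",
    }

    def __init__(self) -> None:
        self.current_state = "Stopped"
        self._subscribers = []
        self._run_counter = 0

    def subscribe(self, callback) -> None:
        """Register a callback fired on every state change (report by exception)."""
        self._subscribers.append(callback)

    def fire(self, event: str) -> None:
        if event == "abort":                     # abort is allowed from any state
            new_state = "Aborted"
        else:
            new_state = self.TRANSITIONS.get((self.current_state, event))
            if new_state is None:
                raise ValueError(f"{event!r} not allowed in {self.current_state}")
        self.current_state = new_state
        for callback in self._subscribers:
            callback(new_state)

    def start_program(self, template_id: str, properties: dict,
                      job_id: str, task_id: str, samples: list[str]) -> str:
        """Start a run and hand back a run id (format invented for this sketch)."""
        self.fire("start")
        self._run_counter += 1
        return f"RUN-{self._run_counter:04d}"

seen = []
unit = FunctionalUnitSketch()
unit.subscribe(seen.append)
run_id = unit.start_program("SOP-AT-HPLC-001", {"injection_uL": "10"},
                            "JOB-1", "TASK-1", ["BATCH-2026-001-DS"])
unit.fire("stop")
unit.fire("stopped")
print(run_id, seen)   # -> RUN-0001 ['Running', 'Stopping', 'Stopped']
```

The client never polls: it learns about each transition because the unit pushes it, which is the same subscription idea applied to CurrentState.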

One physical instrument, two trees: the Hardware view answers "what is this machine and is it healthy," the Functional view answers "what did it run and what did it produce." A scalar reads through a scalar-sensor function; a spectrum rides an array-sensor function or an attached result file. Original diagram by the authors, created with AI assistance.

A minimal LADS-style result node, as the repo's illustrative examples/ingest/lads_server.js sketches it, looks like this. The file is committed as a teaching sketch, modelled on the LADS information model — it is not a runnable, certified LADS server (the repo ships the asyncua OPC UA bioreactor server for Chapters 5/7 as its real OPC UA stack):

// examples/ingest/lads_server.js — illustrative LADS-shaped result node (not a certified LADS server)
const fnSet = addObject(device, "FunctionalUnitSet");
const hplc  = addFunctionalUnit(fnSet, "HPLC_Titer");
addAnalogResult(hplc, {
  name: "ProteinConcentration",
  value: 5.877, unit: "g/L",            // QUDT-mapped engineering unit
  sampleId: "BATCH-2026-001-DS",
  method: "SOP-AT-HPLC-001",
  measuredAt: "2026-01-20T10:15:00Z"
});

The scalar value: 5.877 above is the simple case, but a LADS result is not limited to one number: a full spectrum or chromatogram trace can ride a LADS array variable — an OPC UA array data type — so the whole curve lives in the same self-describing address space as the single titer, not in a side file.

SiLA 2: commanding the instrument

LADS publishes an instrument into the plant data fabric; SiLA 2 (Standardization in Lab Automation, version 2) comes at the same instrument from the other side — commanding it. Where LADS is an OPC UA companion spec, SiLA 2 is its own standard built on gRPC over HTTP/2 with Protocol Buffers (a modern, efficient way for programs to call each other over the network and exchange compact binary messages — replacing SiLA 1's older, verbose XML/SOAP web-service style), governed by the SiLA Consortium, with MIT-licensed reference stacks in Python, Java, and C# [6][7]. It was redesigned from the ground up — SiLA 1's XML/SOAP is gone, and SiLA 2 is deliberately not backward-compatible, a working-group decision to keep the design clean [7].

The self-description that LADS gets from a browsable address space, SiLA gets from a Feature. A feature is a unit of capability — "do HPLC titer," "report temperature" — described by a Feature Definition Language (FDL) document: an XML interface that lists the feature's Commands, Properties, Metadata, data-type definitions, and the errors it can raise [7]. Every SiLA server must implement one core feature, SiLAService, exposing ServerName, ServerType, ServerUUID, ServerVersion, and an ImplementedFeatures list; a client connects, reads that list, and calls GetFeatureDefinition to pull each feature's FDL at runtime — so it learns the device's full capability set with no pre-shared driver, the same payoff browsing buys in OPC UA. Servers announce themselves on the network by mDNS/DNS-SD (zero-configuration service discovery — the same local-network auto-advertising that lets a laptop find a printer) under the _sila._tcp service type, and the whole conversation is TLS-mandatory by spec (TLS being the standard transport encryption that secures HTTPS), with a self-signed server required to publish its CA in the discovery record.

The piece worth dwelling on is the observable command, because a lab instrument's job is rarely instantaneous. An unobservable command is one request/response RPC. An observable command — "run this 20-minute assay" — returns immediately with a command-execution id; the client then subscribes to an ExecutionInfo stream reporting a status (waiting, running, finishedSuccessfully, finishedWithError), a progress fraction, and an estimated time remaining, optionally receives intermediate responses along the way, and finally fetches the result by that id. It is the same report-by-exception spirit — send an update only when something changes, rather than re-polling — as an OPC UA subscription or a Sparkplug DDATA (both Chapter 7 push mechanisms where the device emits a message on change), expressed as a typed RPC lifecycle. Properties mirror the split: an unobservable one you read once, an observable one you subscribe to as a server-streamed sequence. Units and bounds are not free text either — a SiLA Unit constraint is built compositionally from SI base units with a factor and offset, the same machine-convertible discipline OPC UA's EngineeringUnits and AnIML's SIUnit enforce.
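The observable-command lifecycle described above can be sketched as a toy model in plain Python. This is not the API of the SiLA 2 reference libraries: `ToyInstrument`, `start_assay`, `execution_info`, and `result` are hypothetical names, and a generator stands in for the server-streamed ExecutionInfo. Only the status strings and the 5.877 g/L titer come from the text.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class ExecutionInfo:
    status: str                    # waiting | running | finishedSuccessfully | finishedWithError
    progress: float                # fraction between 0.0 and 1.0
    remaining_s: Optional[float]   # estimated time remaining (toy units)


class ToyInstrument:
    """Stand-in for a SiLA 2 server; every name here is hypothetical."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._steps: Dict[str, int] = {}
        self._results: Dict[str, float] = {}

    def start_assay(self, steps: int = 4) -> str:
        """Observable command: hand back an execution id immediately."""
        exec_id = f"exec-{next(self._ids)}"
        self._steps[exec_id] = steps
        return exec_id

    def execution_info(self, exec_id: str) -> Iterator[ExecutionInfo]:
        """Emit one update per state change, as a subscription would."""
        steps = self._steps[exec_id]
        yield ExecutionInfo("waiting", 0.0, None)
        for done in range(1, steps):
            yield ExecutionInfo("running", done / steps, float(steps - done))
        self._results[exec_id] = 5.877   # g/L, the chapter's example titer
        yield ExecutionInfo("finishedSuccessfully", 1.0, 0.0)

    def result(self, exec_id: str) -> float:
        """Fetch the final response by execution id."""
        return self._results[exec_id]


instrument = ToyInstrument()
exec_id = instrument.start_assay()
statuses = [info.status for info in instrument.execution_info(exec_id)]
titer = instrument.result(exec_id)
print(statuses[0], statuses[-1], titer)
```

The point of the shape is that the client never polls: it holds one id, consumes updates as they arrive, and asks for the result only after a finished status.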

So LADS and SiLA are not rivals; they are two halves of one workflow, "currently coexisting" in real bioprocess labs [2]. SiLA drives the instrument — start the run, watch it progress, pull the result — and LADS/OPC UA publishes that instrument and its results into the plant's OT fabric, where the historian, the MES, and the collector this book builds can reach them. A modern lab runs both, often bridged by a gateway.

Vendor-neutral result files: AnIML

Whether or not an instrument speaks LADS or SiLA, you still want to archive its output in a format you can open in twenty years without the vendor's software. Two open standards do this, and the repo ships one example of each for the same HPLC titer.

AnIML (Analytical Information Markup Language) is the older one — an XML format governed by ASTM subcommittee E13.15 (under Committee E13 on Molecular Spectroscopy and Separation Science), built to modernise the JCAMP-DX and netCDF lineages [4]. One caveat sets expectations honestly: the core schema is still a draft, version 0.90 — its namespace literally reads …:schema:core:draft:0.90, which is why the example below pins version="0.90" — so AnIML is widely used and stable in shape, but it is not yet a finished, numbered ASTM standard [5].

Its document tree is the thing to understand, because it is a general container the way OPC UA's address space is. An AnIML document holds, in order, a SampleSet (the materials), an ExperimentStepSet (what was done to them), and then two sections that make it a records format rather than a bare data dump: an AuditTrailEntrySet and a SignatureSet. Each ExperimentStep names its Technique, its Method (author, device, software), an Infrastructure block whose ParentDataPointReferenceSet wires one step's output to the next as provenance, and one or more Result blocks. The numbers themselves live in a SeriesSet of Series, each tagged independent or dependent and typed (Float32, Float64, Int32, …) — and AnIML carries a scalar and a dense curve in the same structure: a one-value IndividualValueSet for our single titer, an EncodedValueSet (base64-packed, little-endian) for a thousand-point spectrum, or an AutoIncrementedValueSet (a start value plus an increment) for a regular wavelength axis you needn't enumerate point by point. Units are not free text — a Unit is built from SI base tokens with a factor and exponent, so "mAU" is machine-convertible rather than a string to be guessed.
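To make the two dense encodings concrete, here is a minimal sketch of how a reader might unpack them, using only the standard library. The function names are hypothetical and the absorbance values are made-up placeholders; it assumes a Float32 series and takes the point count from the enclosing SeriesSet's length.

```python
import base64
import struct
from typing import List


def decode_encoded_value_set(b64_text: str) -> List[float]:
    """Unpack base64 text holding little-endian Float32 values."""
    raw = base64.b64decode(b64_text)
    count = len(raw) // 4                      # 4 bytes per Float32
    return list(struct.unpack(f"<{count}f", raw))


def expand_auto_incremented(start: float, increment: float, length: int) -> List[float]:
    """Rebuild a regular axis from a start value plus an increment."""
    return [start + i * increment for i in range(length)]


# Placeholder absorbances, chosen to be exactly representable as Float32.
absorbance = [0.0, 0.5, 1.25, 0.75]
encoded = base64.b64encode(struct.pack("<4f", *absorbance)).decode("ascii")

wavelengths = expand_auto_incremented(start=200.0, increment=2.0, length=4)
decoded = decode_encoded_value_set(encoded)
print(list(zip(wavelengths, decoded)))
```

The `<` in the format string is what pins the byte order to little-endian; without it `struct` would fall back to the host machine's native order.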

Two sections earn AnIML its place on a release bench. The AuditTrailEntrySet records, in the file itself, each created / modified / signed action with its author, timestamp, reason, and a machine-readable Diff; the SignatureSet is a standard W3C XML Signature, so tamper-evidence is a known cryptographic primitive embedded in the document, not a bespoke scheme. The permissive generic core is then specialised per instrument by a Technique Definition (an ATDD file) that says, for example, that a UV/Vis result must carry wavelength in nm — the constraint the bare core leaves open [5]. The committed datasets/hplc_titer.animl.xml is a deliberately minimal but valid-shaped example:

<!-- examples/datasets/hplc_titer.animl.xml -->
<AnIML xmlns="urn:org:astm:animl:schema:core:draft:0.90" version="0.90">
  <SampleSet>
    <Sample sampleID="BATCH-2026-001-DS" name="Drug Substance"/>
  </SampleSet>
  <ExperimentStepSet>
    <ExperimentStep experimentStepID="titer-hplc" name="Protein A HPLC titer">
      <Result name="Titer">
        <SeriesSet name="titer" length="1">
          <Series name="concentration" dependency="dependent" seriesID="c" seriesType="Float32">
            <IndividualValueSet><F>5.877</F></IndividualValueSet>
            <Unit label="g/L"/>
          </Series>
        </SeriesSet>
      </Result>
      <Method name="SOP-AT-HPLC-001"/>
    </ExperimentStep>
  </ExperimentStepSet>
</AnIML>
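Reading that file back needs nothing beyond a namespace-aware XML parser. The sketch below inlines a copy of the example as a string and pulls out the sample, the titer, its unit, and the method with Python's standard `xml.etree.ElementTree`; the variable names are illustrative, and a production reader would also validate against the schema and the relevant Technique Definition.

```python
import xml.etree.ElementTree as ET

ANIML = """<AnIML xmlns="urn:org:astm:animl:schema:core:draft:0.90" version="0.90">
  <SampleSet>
    <Sample sampleID="BATCH-2026-001-DS" name="Drug Substance"/>
  </SampleSet>
  <ExperimentStepSet>
    <ExperimentStep experimentStepID="titer-hplc" name="Protein A HPLC titer">
      <Result name="Titer">
        <SeriesSet name="titer" length="1">
          <Series name="concentration" dependency="dependent" seriesID="c" seriesType="Float32">
            <IndividualValueSet><F>5.877</F></IndividualValueSet>
            <Unit label="g/L"/>
          </Series>
        </SeriesSet>
      </Result>
      <Method name="SOP-AT-HPLC-001"/>
    </ExperimentStep>
  </ExperimentStepSet>
</AnIML>"""

# Every element lives in the draft-0.90 core namespace, so map a prefix to it.
NS = {"a": "urn:org:astm:animl:schema:core:draft:0.90"}

root = ET.fromstring(ANIML)
sample_id = root.find("a:SampleSet/a:Sample", NS).get("sampleID")
series = root.find(".//a:Series[@name='concentration']", NS)
value = float(series.find("a:IndividualValueSet/a:F", NS).text)
unit = series.find("a:Unit", NS).get("label")
method = root.find(".//a:Method", NS).get("name")

print(sample_id, value, unit, method)
```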

The Allotrope stack: AFO, ADM, ADF, ASM

AnIML is one vendor-neutral home; the other is the Allotrope Foundation framework, and it is worth unpacking properly, because the four acronyms it throws at you — AFO, ADM, ADF, ASM — are not four competing formats. They are four layers of one system, and this book leans on the lowest-overhead layer (ASM) precisely because the rest sits behind a membership wall [3]. Read the stack as a dictionary, a grammar, and two ways of writing it down:

AFO — the dictionary. The Allotrope Foundation Ontologies are an OWL/RDF ontology: a controlled vocabulary that fixes the meaning of every term ("peak area," "sample," "injection volume") as a stable IRI (Internationalized Resource Identifier — a globally unique web-style name for the term), aligned to the BFO upper ontology (a tiny, domain-neutral set of top categories — object, process, quality — that every more specific term hangs from, so independently built ontologies still agree on what kind of thing each term is; see Chapter 19) and published openly under CC-BY (a permissive Creative Commons "give credit" licence) [3]. This is what makes an Allotrope field machine-actionable and not merely machine-readable: the key is not the loose string "concentration," it is a term with a definition a reasoner can follow.
ADM — the grammar. An Allotrope Data Model is where the structure lives. Each ADM constrains how the AFO vocabulary may be assembled for one analytical technique — which classes are required, what cardinality, which units are legal — and it expresses those constraints as SHACL shapes (the W3C — World Wide Web Consortium — Shapes Constraint Language: RDF triples, the subject–predicate–object facts dissected in Chapter 19, that a validator checks a document against) [15]. There is roughly one model per technique — liquid chromatography, mass spectrometry, pH, balance, cell counting, and a couple of dozen more — and a model is not tied to one file format: the same ADM is rendered two ways.
ADF and ASM — the two renderings. The Allotrope Data Format is the heavy one: an HDF5 binary file (HDF5 being a standard scientific container format built to hold large numeric arrays efficiently) internally split into three parts — a Data Description (an RDF graph of the semantic metadata), an n-dimensional Data Cube (the array payload — a spectrum, a chromatogram), and a Data Package (a virtual filesystem for companion originals) [3]. The Allotrope Simple Model is the light one: a JSON serialization of the same model, made for the common case where the result is a handful of scalars rather than a giant array.

That relationship — one ADM, validated as RDF against its SHACL shapes when it is an ADF, and against a published JSON Schema when it is an ASM — is the whole architecture, and it is why Allotrope's catalogue lists an "ASM version" and an "ADM version" side by side for each technique. The releases come three to four times a year, each model carrying a maturity level (Working Draft → Candidate Recommendation → Recommendation) [3].

Four acronyms, one system: AFO fixes meaning, an ADM fixes structure as SHACL shapes, and the same model is written down two ways — the heavy ADF binary cube for arrays and the light ASM JSON for scalars. Only AFO and the ASM schemas are open; the ADF libraries and full model set are membership-gated. Original diagram by the authors, created with AI assistance.

For this book the ASM rendering is the one we can actually ship, so it is worth reading field by field. Here is the titer — now beside the SEC monomer purity, the two release scalars the document binds to one sample — as datasets/hplc_titer.asm.json:

{
  "$asm.manifest": "http://purl.allotrope.org/manifests/core/REC/2024/06/manifest.schema",
  "measurement aggregate document": {
    "measurement document": [
      {
        "sample document": {
          "batch identifier": "BATCH-2026-001",
          "sample identifier": "BATCH-2026-001-DS"
        },
        "device system document": {
          "device identifier": "HPLC-07",
          "model number": "OpenHPLC-1"
        },
        "measurement identifier": "BATCH-2026-001-titer",
        "protein concentration": { "value": 5.877, "unit": "g/L" },
        "measurement time": "2026-01-20T10:15:00Z"
      },
      {
        "sample document": {
          "batch identifier": "BATCH-2026-001",
          "sample identifier": "BATCH-2026-001-DS"
        },
        "device system document": {
          "device identifier": "HPLC-07",
          "model number": "OpenHPLC-1"
        },
        "measurement identifier": "BATCH-2026-001-sec-monomer",
        "monomer percentage": { "value": 98.611, "unit": "%" },
        "measurement time": "2026-01-20T11:30:00Z"
      }
    ]
  }
}

Walk it the way the AFO would. The outer measurement aggregate document wraps a list of measurement document entries — Allotrope's pattern of "one business object and all the measurements about it" — and the list earns its plural here: the Protein A titer (protein concentration, 5.877 g/L) and the size-exclusion monomer purity (monomer percentage, 98.611%) are two measurements about the same drug-substance sample, so each is its own measurement document that shares the one sample document. Inside each, a sample document and a device system document carry the identifiers, and the reading itself — protein concentration, monomer percentage — is not a bare number but a value-and-unit pair, where the unit resolves to a QUDT term (g/L, %) so a downstream tool knows what it means and could convert it [16]. Each such field carries, in the published schema behind it, an $asm.property-class annotation pointing at the exact AFO IRI that defines it — that is the machine-actionability AFO promised, made concrete. (One honest clarification for the careful reader: ASM achieves this through those Allotrope-specific schema annotations and the QUDT/AFO IRIs, not through a JSON-LD @context block — so treat an ASM document as faithfully RDF-mappable rather than literally JSON-LD.)

Two honest trims keep this example legible, and they answer the natural question — is a value-and-unit leaf really all a measurement document holds? In a production Allotrope liquid-chromatography ASM, no: each measurement document nests far more — an injection document (volume, sequence position), a device control aggregate document (the gradient and column actually run), and a processed data aggregate document whose peak list carries every peak's retention time and area — and the aggregate carries the whole release panel (CEX main %, HCP, residual Protein A, and the rest) as further measurement document siblings, not just these two. The dense signal behind each scalar — the raw UV chromatogram the titer and monomer % were integrated from — does not live here at all; that array rides an ADF cube, as the next section shows. What this file deliberately keeps is the part the rest of the chapter consumes: the attributable, QUDT-typed, AFO-named scalars.
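Because every reading is a value-and-unit leaf, flattening the document into result rows is a short walk. The sketch below works on a trimmed copy of the file (manifest, model number, and timestamps dropped for brevity) and assumes that any field holding both `value` and `unit` is a measurement; `flatten_asm` is a hypothetical helper, not part of any Allotrope tooling.

```python
import json

ASM = json.loads("""
{
  "measurement aggregate document": {
    "measurement document": [
      {
        "sample document": {"batch identifier": "BATCH-2026-001",
                            "sample identifier": "BATCH-2026-001-DS"},
        "device system document": {"device identifier": "HPLC-07"},
        "measurement identifier": "BATCH-2026-001-titer",
        "protein concentration": {"value": 5.877, "unit": "g/L"}
      },
      {
        "sample document": {"batch identifier": "BATCH-2026-001",
                            "sample identifier": "BATCH-2026-001-DS"},
        "device system document": {"device identifier": "HPLC-07"},
        "measurement identifier": "BATCH-2026-001-sec-monomer",
        "monomer percentage": {"value": 98.611, "unit": "%"}
      }
    ]
  }
}
""")


def flatten_asm(asm: dict) -> list:
    """Return (sample, device, attribute, value, unit) for each value-and-unit leaf."""
    rows = []
    for doc in asm["measurement aggregate document"]["measurement document"]:
        sample = doc["sample document"]["sample identifier"]
        device = doc["device system document"]["device identifier"]
        for key, leaf in doc.items():
            if isinstance(leaf, dict) and set(leaf) >= {"value", "unit"}:
                rows.append((sample, device, key, leaf["value"], leaf["unit"]))
    return rows


rows = flatten_asm(ASM)
for row in rows:
    print(row)
```

A real liquid-chromatography ASM nests deeper than this, so a production reader would follow the published schema rather than sniff for value-and-unit pairs.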

Two ASM results about one sample, fully unpacked: the aggregate-document nesting is the shape an ADM fixes, and the list earns its plural — a titer and a monomer-purity measurement document side by side; every leaf is a value-and-unit pair whose unit maps to QUDT and whose field name carries an AFO term IRI — meaning travelling with the number, exactly as the OPC UA DataValue carried it on the wire. Original diagram by the authors, created with AI assistance.

Notice both the AnIML and the ASM file agree on 5.877 g/L for sample BATCH-2026-001-DS — that is the point. The titer the lab measures is, within assay noise, the fed-batch harvest titer the upstream run reached — the same ~5.88 g/L the chromatography chapter feeds into Protein A capture as the load titer (the column then concentrates it ~3.8× to the 22.58 g/L eluate, which is a different number). The standard you choose is mostly a question of which downstream tool you feed: AnIML for archival and ASTM-aligned regulatory packages, ASM for FAIR (Findable, Accessible, Interoperable, Reusable) data lakes and ontology-driven querying — the data-stewardship goal taken up in Chapter 19 and Book 2's Ontologies and FAIR data. Either way the units map to QUDT so the knowledge graph (Chapter 19) can reason over them.
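The concentration factor quoted in that paragraph is plain division, worth checking once so the two titers are never confused. Both inputs are the figures given in the text; the variable names are illustrative.

```python
load_titer_g_per_l = 5.877    # titer fed to Protein A capture (AnIML and ASM agree)
eluate_g_per_l = 22.58        # Protein A eluate concentration

concentration_factor = eluate_g_per_l / load_titer_g_per_l
print(f"{concentration_factor:.2f}x")   # 3.84x
```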

When the result is a curve, not a number

Both files above carry a single scalar. But a full spectrum or chromatogram is a dense numeric array, and that is exactly what ADF's Data Cube was built for — thousands of (x, y) points with their axes and acquisition metadata intact, rather than one reading [3]. So the practical branch is clean: a scalar release result — the 5.877 g/L titer, an SEC monomer percentage — serialises happily into the ASM JSON or AnIML XML the repo already ships; a heavy curve payload — the raw UV trace behind that titer, a mass spectrum — belongs in ADF.

There is an access asymmetry worth being honest about, and it is the reason this book stops where it does. The AFO ontologies are open (CC-BY) and the ASM JSON schemas are published — that is why the small example file can sit in the companion repo at all. The ASM model itself is offered under a tri-license, one track of which is CC-BY-NC (non-commercial). But the full ADF binary libraries and the complete model set are membership-gated — you join the Allotrope Foundation to get them. So the line we draw is precise: we demonstrate the result-shaped ASM and AnIML files end to end, we describe ADF and point at where the dense-array payload would go, but the binary cube itself sits behind a wall the OSS reader may not want to climb.

In the real world: the allotropy library

You do not have to hand-write ASM. Benchling maintains allotropy, an open-source Python library that parses a long list of vendor instrument exports and emits canonical ASM JSON — turning a plate-reader text dump or an analyzer's Excel sheet into the schema-valid shape above. Its hard boundary is the one this whole section turns on: it ingests text, CSV, and Excel exports, not proprietary binary like an ÄKTA .res archive or a MassLynx .raw directory. So for a binary instrument you still run the vendor's own export step first (binary → text/CSV, or → mzML for LC-MS), and only then does allotropy canonicalize it.

The multi-attribute method (MAM)

The array-versus-scalar tension becomes vivid in the multi-attribute method (MAM), the technique that has reshaped biologic QC over the last decade. Instead of running a separate assay per quality attribute, MAM uses a single LC-MS peptide-mapping run — digest the antibody into peptides and weigh the fragments by mass spectrometry — to monitor many critical quality attributes (CQAs — the molecular properties that must stay in spec for the drug to be safe and effective) at once — oxidation, deamidation, glycosylation, sequence variants — plus a new-peak detection step that flags any unexpected species against a reference. One injection, a dozen verdicts.

The data shape shifts accordingly. A classic release test gives one-scalar-per-test; a MAM run produces a spectral/array record — a mass spectrum at each retention time (the elution time at which each peptide leaves the chromatography column) — that is irreducibly a dense numeric array, not a single reading. Yet the two worlds reconcile cleanly in the model this chapter already built: the interpreted outcome still resolves into several (attribute, value) rows in lab.result (one row for oxidation_Met256_pct, one for main_glycan_G0F_pct, and so on), exactly the scalar shape SENAITE verifies and the batch record consumes, while the underlying raw mass-spectrum-over-retention-time belongs in SDMS-style archival — an SDMS being a Scientific Data Management System, the raw-file vault — as mzML or ADF, alongside, not inside, the result table.
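A minimal sketch of that split, assuming a hypothetical `MamRun` container: the interpreted attributes become scalar rows shaped for lab.result, while the raw spectra stay behind as a file reference for the SDMS. The two attribute names come from the text; the percentages and the file path are placeholders, not measured data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MamRun:
    sample_id: str
    attributes: Dict[str, float]   # interpreted CQA values, in percent
    raw_file: str                  # raw LC-MS export, archived as-is


def split_mam_run(run: MamRun) -> Tuple[List[tuple], dict]:
    """Scalar rows go to the result table; the raw array stays in the archive."""
    rows = [(run.sample_id, name, value, "%")
            for name, value in sorted(run.attributes.items())]
    archive_ref = {"sample_id": run.sample_id, "raw_file": run.raw_file}
    return rows, archive_ref


run = MamRun(
    sample_id="BATCH-2026-001-DS",
    attributes={"oxidation_Met256_pct": 1.2,      # placeholder value
                "main_glycan_G0F_pct": 41.5},     # placeholder value
    raw_file="sdms/BATCH-2026-001-DS/mam_run.mzML",  # hypothetical path
)
rows, archive_ref = split_mam_run(run)
print(rows)
print(archive_ref)
```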

In the real world: in-line glycan analysis

This is not hypothetical. A published in-line analytical testbed automates HILIC-HPLC (hydrophilic-interaction liquid chromatography) with fluorescence detection to produce a glycan chromatogram, together with a Protein A column reading the UV titer — characterising a trastuzumab biosimilar (a follow-on copy of an approved antibody drug) made in CHO cells (Chinese-hamster-ovary cells, the workhorse mammalian line for antibody production) in near-real time. It is the perfect anchor for everything above: the titer is a scalar that drops straight into lab.result, while the glycan chromatogram is exactly the dense (retention time, fluorescence) curve that wants an ADF cube or an mzML/AnIML array, not a single cell. Treat any specific vendor instrument named in this chapter as a representative industry example.

SENAITE: an open-source LIMS for the workflow

A pile of result files is not a lab. A LIMS (Laboratory Information Management System) is the system that logs the sample, assigns the tests, captures the analyst's result, and — crucially — runs the verification workflow that turns a preliminary number into a released one. The open-source LIMS this layer would use is SENAITE, an enterprise LIMS built on the Plone/Zope stack (a mature Python content-management framework) and licensed GPL-2.0 (the GNU General Public License — a copyleft open-source licence that requires anything you redistribute built on it to stay open) [8]. SENAITE is not in the shipped compose stack: this repo carries the integration sketch (examples/ingest/senaite_import.py) and the Part 11 gap register rather than a running service. (Note the cited sketch's own header still describes a lab compose profile and a pinned senaite/senaite:2.6.0 image; that profile is aspirational and is not in the shipped compose.yaml, so treat the file as an illustrative sketch only — docker compose --profile lab up brings up nothing today.) To stand it up yourself you would pin the Docker image senaite/senaite:2.6.0 (the :2.6.0 tag locks an exact version so the stack rebuilds reproducibly), and be warned that its first boot takes minutes because Plone bootstraps a lot.

The integration pattern is API-first. SENAITE ships a JSON REST API, so the repo's illustrative examples/ingest/senaite_import.py sketch registers a sample against the batch and posts the at-line results, then later reads back only the verified ones (the route names and POST body are faithful to the real senaite.jsonapi; the surrounding orchestration assumes a SENAITE instance you have stood up yourself):

# examples/ingest/senaite_import.py — register sample + push results via the SENAITE REST API
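import os

import requests

# Assumption: the service-account password is read from an environment variable
# (the variable name is illustrative; the sketch otherwise leaves PASSWORD undefined).
PASSWORD = os.environ["SENAITE_PASSWORD"]

S = requests.Session()
S.auth = ("lab_importer", PASSWORD)            # service account, not a person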
base = "http://senaite:8080/senaite/@@API/senaite/v1"  # /<plone-site-id>/@@API/...

# 1) create the analysis request (sample login) bonded to the batch
ar = S.post(f"{base}/create", json={
    "portal_type": "AnalysisRequest",
    "Client": "uid-of-internal-qc",
    "SampleType": "drug-substance",
    "ClientSampleID": "BATCH-2026-001-DS",
    "Analyses": ["SEC_monomer_pct", "HCP_ng_per_mg", "endotoxin_EU_per_mL"],
}).json()

# 2) submit a result for one analysis (still 'preliminary' until verified)
S.post(f"{base}/update", json={
    "uid": ar["items"][0]["Analyses"][0]["uid"],
    "Result": "98.611",
})

The workflow that follows — submit → verify → publish — is SENAITE's reason for existing. An analyst submits; a second qualified user verifies (SENAITE can enforce that the verifier is not the submitter); only then is the result publishable. Pulling that verified result into our PostgreSQL lab.result table is then a small, careful sync that refuses to import anything not yet verified:

# examples/ingest/senaite_import.py — only verified results cross into the system of record
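# db is assumed to be an open DB-API cursor on the PostgreSQL system of record;
# result_ts is absent from the column list because the table sets it server-side.
for item in S.get(f"{base}/search",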
                   params={"portal_type": "Analysis",
                           "review_state": "verified"}).json()["items"]:
    db.execute(
        "INSERT INTO lab.result (sample_id, test_id, value, unit, analyst, "
        "instrument_id, status) VALUES (%s, %s, %s, %s, %s, %s, 'verified') "
        "ON CONFLICT (sample_id, test_id, result_ts) DO NOTHING",
        (item["ClientSampleID"], item["getKeyword"], item["Result"],
         item["Unit"], item["getAnalyst"], item["Instrument"]))

This is the gate that keeps preliminary lab noise out of the batch record. The review_state=verified filter is the whole control in one line.
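The same gate can be restated as a pure function that re-checks the state on the client side, so a changed query parameter cannot silently let preliminary rows through. This is a sketch only: the field names follow the listing above, `to_lab_result_rows` is a hypothetical helper, the `to_be_verified` state name is an assumption about the LIMS's pre-verification state, and the HCP value, analyst names, and plate-reader id are placeholders.

```python
def to_lab_result_rows(items):
    """Map LIMS items to lab.result tuples, dropping anything not verified."""
    rows = []
    for item in items:
        if item.get("review_state") != "verified":
            continue                      # preliminary data never crosses
        rows.append((item["ClientSampleID"], item["getKeyword"],
                     float(item["Result"]), item["Unit"],
                     item["getAnalyst"], item["Instrument"]))
    return rows


items = [
    {"ClientSampleID": "BATCH-2026-001-DS", "getKeyword": "SEC_monomer_pct",
     "Result": "98.611", "Unit": "%", "getAnalyst": "analyst_a",
     "Instrument": "HPLC-07", "review_state": "verified"},
    {"ClientSampleID": "BATCH-2026-001-DS", "getKeyword": "HCP_ng_per_mg",
     "Result": "42.0", "Unit": "ng/mg", "getAnalyst": "analyst_a",
     "Instrument": "PLATE-READER-01", "review_state": "to_be_verified"},
]

rows = to_lab_result_rows(items)
print(rows)
```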

The verification lifecycle: submit, verify, publish

That one-line filter is the end of a workflow worth drawing in full, because every release decision walks the same five steps — and the OOS row we dissected is what is riding through them:

Sample login. The analyst (or the importer service account) registers an AnalysisRequest and bonds the sample — BATCH-2026-004-DS — to its batch. Nothing can be measured against a lot it is not tied to.
Submit. The analyst enters the number: HCP_ng_per_mg = 128.0 ng/mg. It lands as status = preliminary — visible in the LIMS, but not yet a record. A preliminary result is lab noise, not release data.
Verify — the four-eyes gate. A second qualified user reviews and verifies it; SENAITE can enforce that the verifier is not the submitter. The result flips to review_state = verified. The other exit matters too: a result can be rejected, and the re-test that follows a rejection is recorded as a new row, never an edit, so even a thrown-out reading leaves a trace.
Publish — the one guarded crossing. Only now does the verified-only sync (the review_state=verified filter above) INSERT the row into PostgreSQL lab.result. This is the single point where data crosses from the workflow tool into the system of record, and preliminary rows are physically excluded from it.
Verdict. The row is verified, but 128.0 exceeds the 100.0 limit, so the spec frame returns OOS: BATCH-2026-004 is frozen and an investigation opens — while the verified row itself stands, immutable. A trustworthy record of a bad result is exactly what a defensible quality system looks like.

This mirrors Chapter 7's protocol walkthroughs — an OPC UA handshake, a Sparkplug birth-to-death lifecycle — but for people and verdicts rather than packets: the same idea that a record is only trustworthy if you can name every transition it went through. The verified rows are also exactly what the contextualization layer later joins against the time-series stream, so a clean verify gate here pays off two chapters downstream.
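The five steps can be sketched as a tiny state machine. This is an illustration of the rules as the walkthrough states them, not SENAITE's implementation: the `Result` class, `verify`, and `verdict` are hypothetical names, the analyst ids are placeholders, and only the 128.0 ng/mg reading and the 100.0 limit come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HCP_LIMIT_NG_PER_MG = 100.0   # spec limit quoted in the walkthrough


@dataclass
class Result:
    sample_id: str
    keyword: str
    value: float
    submitted_by: str
    state: str = "preliminary"
    verified_by: Optional[str] = None
    history: List[str] = field(default_factory=lambda: ["submit"])


def verify(result: Result, verifier: str) -> Result:
    """Four-eyes gate: only a different user may verify a preliminary result."""
    if result.state != "preliminary":
        raise ValueError(f"cannot verify a result in state {result.state!r}")
    if verifier == result.submitted_by:
        raise PermissionError("verifier must differ from submitter")
    result.state = "verified"
    result.verified_by = verifier
    result.history.append("verify")
    return result


def verdict(result: Result, limit: float) -> str:
    """Judge against the spec only once the result is verified."""
    if result.state != "verified":
        raise ValueError("only verified results get a verdict")
    return "OOS" if result.value > limit else "PASS"


r = Result("BATCH-2026-004-DS", "HCP_ng_per_mg", 128.0, "analyst_a")
try:
    verify(r, "analyst_a")
    self_verify = "allowed"
except PermissionError:
    self_verify = "blocked"

verify(r, "analyst_b")
outcome = verdict(r, HCP_LIMIT_NG_PER_MG)
print(self_verify, r.state, outcome)
```

Note that the verdict does not touch the row: an OOS outcome is a judgement about a verified value, not a change to it.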

When release data goes wrong: the field record

These controls are not hypothetical hygiene; they are the scar tissue of real enforcement. Across recent FDA drug-GMP inspections (GMP = Good Manufacturing Practice, the manufacturing-quality rules a plant must follow), data integrity is the single most-cited deficiency class — a data-integrity finding appears in the large majority of warning letters (a warning letter being the FDA's formal written notice that a firm is in significant violation, the document that can halt shipments), and the recurring mechanisms read like a checklist of attacks on exactly the columns above [17]:

"Testing into compliance." An analyst runs trial injections of a real sample, sees a failing result, discards it as a "test," and keeps only a passing rerun — the OOS that never officially happened. The UNIQUE (sample_id, test_id, result_ts) append-only rule is the direct countermeasure: a discarded injection is a row, not a void.
Disabled audit trails. Switching a chromatography data system's audit trail off, making changes, then switching it back on. FDA's data-integrity guidance is explicit that an HPLC run's audit trail must capture the user, the date and time, the integration parameters, and any reprocessing [12] — which is why analyst, instrument_id, and result_ts are not optional columns.
Deleted or reprocessed injections, and backdated timestamps. Removing inconvenient runs, or stamping a sample as tested on a different day. An immutable result_id and a server-set result_ts inside the uniqueness key are what make this visible rather than silent.
Shared logins. One generic account for the whole bench, so no result is attributable to a person. The analyst column only means something if the identity behind it is real — which is exactly the access-control hardening pure OSS leaves to you (see the verdict below).

Map each failure to its control and the chapter's obsession with how a number is stored stops looking like ceremony. Every column on that lab.result row is a closed door that an inspector has, somewhere, watched stand open.
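The append-only countermeasure can be demonstrated in a few lines. The sketch below uses an in-memory SQLite table as a stand-in for the book's PostgreSQL lab.result, with hypothetical trigger names; the passing rerun value, analyst, and instrument id are placeholders, and the timestamps are passed explicitly only to keep the demo deterministic (the text has the server set them).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE lab_result (
    result_id     INTEGER PRIMARY KEY,
    sample_id     TEXT NOT NULL,
    test_id       TEXT NOT NULL,
    value         REAL NOT NULL,
    analyst       TEXT NOT NULL,
    instrument_id TEXT NOT NULL,
    result_ts     TEXT NOT NULL,
    UNIQUE (sample_id, test_id, result_ts)
);
CREATE TRIGGER lab_result_no_update BEFORE UPDATE ON lab_result
BEGIN SELECT RAISE(ABORT, 'lab_result is append-only'); END;
CREATE TRIGGER lab_result_no_delete BEFORE DELETE ON lab_result
BEGIN SELECT RAISE(ABORT, 'lab_result is append-only'); END;
""")


def record(value: float, ts: str) -> None:
    """Every injection becomes its own row, including the failing one."""
    db.execute(
        "INSERT INTO lab_result (sample_id, test_id, value, analyst, "
        "instrument_id, result_ts) VALUES (?, ?, ?, ?, ?, ?)",
        ("BATCH-2026-004-DS", "HCP_ng_per_mg", value, "analyst_a", "ELISA-02", ts))


def attempt(sql: str) -> str:
    """Run a statement and report whether the table let it through."""
    try:
        db.execute(sql)
        return "allowed"
    except sqlite3.DatabaseError:
        return "blocked"


record(128.0, "2026-01-20T10:15:00Z")   # the failing reading
record(96.0, "2026-01-20T11:40:00Z")    # a later rerun (placeholder value)

update_attempt = attempt("UPDATE lab_result SET value = 96.0 WHERE value > 100")
delete_attempt = attempt("DELETE FROM lab_result WHERE value > 100")
row_count = db.execute("SELECT COUNT(*) FROM lab_result").fetchone()[0]
print(update_attempt, delete_attempt, row_count)
```

Both readings survive, so the rerun sits beside the failure instead of replacing it, which is exactly what an investigator needs to see.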

eLabFTW: the ELN for method and experiment provenance

A LIMS records results; an ELN (Electronic Lab Notebook) records how you got them — the method, the deviations, the reasoning, the analyst's signed statement that "I ran SOP-AT-HPLC-001 on instrument HPLC-07 on this date." The open-source ELN this layer would use is eLabFTW, licensed AGPL-3.0 (an even stronger copyleft that extends the same share-alike obligation to software offered over a network) — and, like SENAITE, it is not in the shipped compose stack; the repo carries the integration sketch (examples/ingest/elabftw_ingest.py), not a running service. (As with SENAITE, the cited sketch's header still names a lab compose profile and a pinned elabftw/elabimg:5.1.15 image; that profile is aspirational and is not in the shipped compose.yaml — treat the file as an illustrative sketch only.) To stand it up yourself you would pin the Docker image elabftw/elabimg:5.1.15 alongside a MySQL sidecar (a companion database container) and run it standalone over the network, where its copyleft imposes nothing on your own code.

eLabFTW's standout features for a regulated lab are cryptographic: it can apply an Ed25519ph electronic signature (a modern public-key digital-signature scheme — the signer's private key stamps the record so anyone with the matching public key can verify it was signed and not altered; the ph is the pre-hashed variant eLabFTW uses) over an experiment record and an RFC 3161 trusted timestamp [10]. RFC 3161 is the IETF time-stamp protocol where a trusted Timestamping Authority returns a TimeStampToken over a hash of your document — so you can later prove the content existed, unchanged, at that instant, without ever sending the content to the TSA [11]. The ingest pattern is again REST-first, sketched in the repo's illustrative examples/ingest/elabftw_ingest.py:

# examples/ingest/elabftw_ingest.py — sign + timestamp the method record via the eLabFTW API
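import os

import elabapi_python

# Assumption: the API key is read from an environment variable (name is illustrative).
ELAB_TOKEN = os.environ["ELABFTW_API_KEY"]

cfg = elabapi_python.Configuration()
cfg.api_key = {"api_key": ELAB_TOKEN}
cfg.host = "https://elabftw/api/v2"
api = elabapi_python.ExperimentsApi(elabapi_python.ApiClient(cfg))

# create the experiment entry; attaching the AnIML/ASM result files and
# signing + timestamping it are separate, later steps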
api.post_experiment(body={"title": "HPLC titer — BATCH-2026-001-DS",
                          "category": "release-testing"})
# the signature (Ed25519ph) and RFC 3161 token are applied through the UI/API
# and lock the entry; later edits create a new, separately signed version.

Once signed and timestamped, the entry locks; any later change creates a new version with its own signature, so the history is append-only.
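The reason the content never has to leave the lab is that an RFC 3161 request carries only a hash of the document (the message imprint). The sketch below shows that idea with the standard library alone; it does not contact a timestamping authority or call eLabFTW, SHA-256 is an assumed choice of hash, and `message_imprint` is a hypothetical helper.

```python
import hashlib


def message_imprint(document: bytes) -> str:
    """Digest that would be sent for timestamping in place of the document."""
    return hashlib.sha256(document).hexdigest()


record = b"HPLC titer, BATCH-2026-001-DS, SOP-AT-HPLC-001, HPLC-07"
imprint = message_imprint(record)

# Any later edit, however small, yields a different digest, so the
# timestamp token issued over the original no longer matches.
tampered = message_imprint(record.replace(b"HPLC-07", b"HPLC-08"))

print(imprint != tampered)
```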

What "qualified instrument" actually costs: IQ/OQ/PQ and tech transfer

The verdict sentence this chapter is building toward — "measured on this qualified instrument, against this validated method" — hides two whole disciplines that the data model can only assume. The word qualified points at the GAMP 5 V-model and its IQ/OQ/PQ rungs (Installation, Operational, and Performance Qualification — documented proof that an instrument was installed to spec, operates to spec, and performs to spec on real samples), which Data Management in Biopharmaceutical Manufacturing builds in Validating computerized systems: GAMP 5 and CSA. For the HPLC and the ELISA reader behind these rows, that means: IQ confirms the chromatography data system is installed at its pinned version with the right detectors; OQ proves wavelength accuracy, injector precision, and the integration parameters hold in a test environment; and PQ proves the method — SOP-AT-HPLC-001 run on real drug substance — meets its accuracy and precision acceptance criteria in this lab. The instrument's own instrument_id column is the hook every qualification record hangs from.

Two adjacent disciplines complete the picture. First, the modern reading is risk-based: the FDA's shift from blanket CSV (Computerized System Validation) to CSA (Computer Software Assurance) says you spend scripted IQ/OQ/PQ effort where a wrong number reaches a patient — the SEC and HCP assays that decide release — and a lighter, unscripted check on a cosmetic report layout, exactly the discrimination the CSV-to-CSA chapter draws. A LIMS like SENAITE is itself a GAMP 5 Category 4 (configured product) system whose configuration must be qualified, not just installed. Second, a release method is rarely born in the lab that runs it: analytical method transfer moves a validated assay from development (or a sending site) to the QC lab, with a documented comparability study proving the receiving lab reproduces the method's accuracy and precision before a single release result it produces counts — the analytical mirror of the process tech transfer and scale-up that moves the molecule from the pilot suite to the plant. None of this lives in the three tables; all of it is the qualification evidence that makes a verified row mean what it claims, and it is the GxP last mile Part V assembles.

Why it matters

The lab is where a batch lives or dies. Every other chapter captures data about the process; this chapter captures the verdict on the product. That 128 ng/mg HCP result on BATCH-2026-004 is the difference between a released lot and a written-off batch — and if it could be silently edited, the whole quality system would be a fiction. So the controls here are not bureaucratic decoration. The preliminary → verified status, the second-person verification, the immutable result rows, the signed-and-timestamped method record: each one exists to make a single sentence defensible to an inspector — "this result was measured by this analyst, on this qualified instrument, against this validated method, and has not changed since." Get that sentence right and the batch record is trustworthy; get it wrong and nothing downstream matters.

In the real world

In a commercial QC lab the system of record is almost always a validated commercial LIMS — LabWare, STARLIMS, or Thermo SampleManager — wired to a chromatography data system like Empower or OpenLab, with the instruments integrated through vendor drivers or, increasingly, SiLA/LADS. The electronic-notebook side looks the same: where this book runs the open-source eLabFTW, biologics R&D and process-development groups overwhelmingly standardize on a cloud ELN-and-registry platform — most commonly Benchling, with Dotmatics the other name you will meet — for molecule registration, assay data, and experiment provenance. Our OSS stack does not pretend to replace those; it shows you the same shapes — sample login, verification workflow, vendor-neutral result files, signed method records — so the integration patterns transfer.

A few honest anchors for 2026:

LADS is genuinely new. OPC 30500 is from 2023; certified server implementations are still emerging in 2026, so on a real floor you will meet far more SiLA 2, plain OPC UA, and proprietary drivers than mature LADS servers. The standard is the right direction; it is not yet the default reality.

The honest OSS-vs-commercial verdict for this layer. Open source genuinely covers the mechanics: SENAITE runs a complete sample-login-to-verification workflow, eLabFTW signs and timestamps records, and AnIML/ASM give you durable, vendor-neutral data. But neither tool is 21 CFR Part 11 compliant out of the box, and the book is explicit about it. SENAITE's only published Part 11 gap analysis is from 2019 (against v1.3.2) and lists real, unclosed gaps — electronic-signature controls, retention, and password/access controls all needing configuration or hardening [9]; the repo ships that gap list under /compliance/gap-analyses and treats SENAITE as a teaching LIMS, not a compliant one. eLabFTW's own documentation says the same thing in plainer words: it provides the cryptographic primitives, but compliance depends on how you configure, validate, and operate it [10]. The strong audit-trail and review expectations on release data are not optional [12], and Part 11 sets the bar these systems must clear [13]. Pure OSS gets you the workflow and the data shapes — perhaps 80% of the way. The validated e-signature, the locked-down access control, the supplier accountability, and the formal IQ/OQ/PQ (Installation, Operational, and Performance Qualification — the documented proof that an instrument was installed, runs, and performs to spec) are the GxP (the umbrella for the Good-x-Practice rules — GMP, GLP, GCP — that regulators enforce) last mile, and we build that hybrid honestly in Part V.

Key terms

LIMS — Laboratory Information Management System; manages sample login, test assignment, results, and the verification workflow (here, SENAITE).
ELN — Electronic Lab Notebook; records methods, experiments, and reasoning, with signatures (here, eLabFTW).
At-line / offline assay — a sample pulled from the process and measured on a bench instrument (VCD, viability, metabolites); the offline twin of an in-line tag.
Release testing — the QC panel (SEC/CEX HPLC, HCP, host-cell DNA, endotoxin, bioburden) that decides whether a batch can be released.
Multi-attribute method (MAM) — a single LC-MS peptide-mapping run that monitors many CQAs (oxidation, deamidation, glycosylation, sequence variants) plus new-peak detection; its raw output is a mass-spectrum-over-retention-time array (an n-dimensional record), distinct from the single scalar a classic test writes to one lab.result row.
OOS — Out Of Specification; a result outside its validated limits, which must freeze the batch and trigger an investigation.
Certificate of analysis (CofA) — the set of release results, with specs and pass/fail, that accompanies a released lot.
OPC UA LADS — Laboratory and Analytical Device Standard (OPC 30500-1); a self-describing OPC UA information model for lab instruments, splitting each device into a Hardware view (nameplate, components, health) and a Functional view (functions, programs, results), built on the OPC UA DI base spec [1].
SiLA 2 — Standardization in Lab Automation v2; a gRPC/HTTP2 + Protocol Buffers standard for commanding and discovering lab devices. Capabilities are declared as Features in a Feature Definition Language (FDL); long-running work uses observable commands (a command-execution id plus a progress stream). Complements LADS — SiLA drives the instrument, LADS publishes it [7].
HILIC — hydrophilic-interaction liquid chromatography; the separation mode that resolves released glycans into a chromatogram (as in the in-line glycan-analysis testbed).
AnIML — Analytical Information Markup Language; an ASTM (E13.15) XML format, core schema still draft 0.90, with a SampleSet / ExperimentStepSet / Result→SeriesSet structure and built-in audit-trail and W3C XML-Signature sections; specialised per technique by ATDD definitions [4][5].
AFO — Allotrope Foundation Ontologies; the OWL/RDF dictionary fixing each term's meaning as a stable IRI (BFO-aligned, openly CC-BY) — the source of Allotrope's machine-actionability.
ADM — Allotrope Data Model; the grammar that constrains how AFO terms are assembled for one analytical technique, expressed as SHACL shapes. One ADM is rendered two ways — as ADF and as ASM — so the catalogue versions "ASM" and "ADM" separately [3].
Allotrope ASM — Allotrope Simple Model; the JSON rendering of an ADM, a measurement aggregate document whose leaves are value-and-unit pairs mapped to QUDT and tagged with AFO IRIs. Openly published (tri-licensed, one track CC-BY-NC); the JSON home for scalar results.
Allotrope ADF — Allotrope Data Format; the HDF5 binary rendering of an ADM, internally a Data Description (RDF graph) + an n-dimensional Data Cube (the array payload) + a Data Package, for dense spectra/chromatograms/curves. The ADF libraries and full model set are membership-gated.
SHACL / QUDT — the W3C Shapes Constraint Language an ADM uses to define and validate its structure, and the units ontology Allotrope (and SiLA, and OPC UA) resolve engineering units against so a g/L is convertible, not a string [15][16].
Verification (four-eyes) — the control where a second qualified person reviews a preliminary result before it becomes a verified, releasable one.
RFC 3161 timestamp — a trusted-timestamp token over a hash of a document, proving its content existed unchanged at a moment in time.
lab.result row (system of record) — the single verified, append-only row that carries a release result: identity (result_id), the batch bond (sample_id), the spec frame (test_id), the value-and-unit measurement, result_ts, analyst, instrument_id, and the preliminary → verified → rejected status. The lab's analog of an OPC UA DataValue.
Append-only re-test — because result_ts is part of UNIQUE (sample_id, test_id, result_ts), a repeat measurement is stored as a new row rather than overwriting the old one, so the audit trail can never be silently rewritten.
Charge variants (acidic / basic) — antibody forms shifted off the main CEX peak by post-translational modifications (deamidation/sialylation push acidic; C-terminal lysine pushes basic); a rising acidic shoulder is the polishing-chromatography step's report card.
Bioburden — the count of viable microorganisms in a non-sterile in-process pool; the high-frequency sentinel for aseptic technique and the sterilising/viral filters, usually upstream of any endotoxin or sterility failure.
Competency question / SHACL shape — a release rule re-expressed as data: a SPARQL question the graph must answer ("which lots are OOS on HCP and who signed them?"), and the same rule as a SHACL constraint (sh:maxInclusive 100, sh:minCount 1 on the signature) — the SQL spec/UNIQUE constraints written in RDF; see The release gate and SHACL.
PROV-O — the W3C provenance vocabulary whose wasGeneratedBy / wasAttributedTo / generatedAtTime map one-to-one onto the row's on-what (instrument_id), who (analyst), and when (result_ts), making the verified result a FAIR-shaped provenance entity.
Grouped / leave-one-batch-out CV — the cross-validation protocol that keeps every result from a lot on one side of the train/test split (grouping on sample_id → batch_id), so a model's score is not inflated by leaking batch identity; the honest validation a release-data model must use (see Models and validation).
Applicability domain — the input region a model was trained over; a prediction for a lot outside it is extrapolation and should be flagged (a Hotelling T²/SPE gate) — the model analog of the validated (spec_low, spec_high) acceptance window.
Process drift vs. model drift — the living culture wandering batch to batch (caught by SPC on the result stream) versus a static model decaying against it (caught by PSI on inputs and a residual chart against the verified offline reference); the verified lab.result stream is the ground truth both lean on (see MLOps and lifecycle).
IQ/OQ/PQ — Installation, Operational, and Performance Qualification; the GAMP 5 V-model evidence that an instrument was installed, operates, and performs to spec, plus the analytical method transfer that proves the QC lab reproduces a validated assay — the GxP last mile a verified row assumes (see GAMP 5 and CSA).
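
The append-only re-test entry above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3, not the book's actual schema: the table name lab_result, the column types, the sample id S-001, the instrument id, the analyst names, and the 96.0 re-test value are all assumptions made for the example. Only the column names, the UNIQUE key, the three status values, and the 128 ng/mg HCP figure come from the text.

```python
import sqlite3

# Hypothetical, simplified stand-in for lab.result. Column names follow the
# glossary entry; the table name and column types are assumptions.
DDL = """
CREATE TABLE lab_result (
    result_id     INTEGER PRIMARY KEY,
    sample_id     TEXT NOT NULL,
    test_id       TEXT NOT NULL,
    value         REAL NOT NULL,
    unit          TEXT NOT NULL,
    result_ts     TEXT NOT NULL,
    analyst       TEXT NOT NULL,
    instrument_id TEXT NOT NULL,
    status        TEXT NOT NULL DEFAULT 'preliminary'
                  CHECK (status IN ('preliminary', 'verified', 'rejected')),
    UNIQUE (sample_id, test_id, result_ts)
);
"""


def record_result(conn, sample_id, test_id, value, unit,
                  result_ts, analyst, instrument_id):
    """Insert one measurement as a new row; never updates an existing one."""
    cur = conn.execute(
        "INSERT INTO lab_result "
        "(sample_id, test_id, value, unit, result_ts, analyst, instrument_id) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (sample_id, test_id, value, unit, result_ts, analyst, instrument_id),
    )
    conn.commit()
    return cur.lastrowid


def history(conn, sample_id, test_id):
    """Return every stored measurement for one sample/test, oldest first."""
    return conn.execute(
        "SELECT value, result_ts FROM lab_result "
        "WHERE sample_id = ? AND test_id = ? ORDER BY result_ts",
        (sample_id, test_id),
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)

    # Original measurement (the 128 ng/mg HCP figure from the text).
    record_result(conn, "S-001", "HCP", 128.0, "ng/mg",
                  "2026-03-01T10:00:00Z", "analyst_a", "ELISA-01")

    # Re-test: a later result_ts makes a distinct key, so this is a new row.
    record_result(conn, "S-001", "HCP", 96.0, "ng/mg",
                  "2026-03-02T09:30:00Z", "analyst_b", "ELISA-01")

    # Re-using the original key is refused by the UNIQUE constraint.
    try:
        record_result(conn, "S-001", "HCP", 50.0, "ng/mg",
                      "2026-03-01T10:00:00Z", "analyst_c", "ELISA-01")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True

    return history(conn, "S-001", "HCP"), duplicate_rejected


if __name__ == "__main__":
    rows, duplicate_rejected = demo()
    print(rows, duplicate_rejected)
```

Both measurements survive in the history, and the attempt to write a second row under the original key fails. Note that the UNIQUE key only guarantees a re-test cannot collide with an existing row; by itself it does not stop an UPDATE or DELETE. A real deployment would pair it with revoked write privileges, triggers, or a separate audit table, all of which this sketch leaves out.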

Where this leads

We have now captured the product's verdict — every at-line sample and every release result, born in the lab and bonded to its batch. But the molecule still has to become a finished, labelled vial, and the clean space around that fill must be watched as closely as the product itself. The next chapter, Fill-Finish, Packaging & Environmental Monitoring, leaves the QC lab for the fill line and the cleanroom, where high-cardinality telemetry — many fast-changing, fine-grained data streams: particle counts, fill weights, serialization events, and PackML line states (PackML being the ISA-88-based standard that gives packaging machines a common set of operating states) — meets a hard GxP boundary.
