Modeling Analytical Methods and Results: Allotrope and OBI
📍 Where we are: Part II · Discovery and Development, modeled — Chapter 8. The control strategy says which quality attributes must be checked. This chapter models the checking — methods, assays, and results — and draws the line between the number the graph keeps and the data it only points at.
Every CQA in the last chapter's control strategy is verified by a measurement, and a measurement is far more structured than the number it yields. The 98.611 % monomer purity that has threaded this whole book is the tip of an iceberg: beneath it sits a method, a sample, an instrument, an analyst, a chromatogram of thousands of points, and a chain of reasoning from raw signal to reported result. This chapter models that iceberg with the ontologies built for it and, just as importantly, decides which parts belong in the graph and which belong in a file the graph merely references.
A lab result is like a photograph with a caption. The caption — "monomer purity 98.611%, by size-exclusion chromatography, on this sample, this day" — is a compact fact you want in your searchable records. The photograph itself — the full high-resolution image, here a chromatogram of thousands of data points — is huge, and you keep it in an archive and link to it, not pasted into every index card. This chapter models the caption as facts a graph can query and the photograph as a file the graph points at, so the number is findable without drowning the graph in raw signal.
What this chapter covers
We model the method as a plan, the assay as an occurrent, and the result as a typed, vendor-neutral fact, reaching for OBI for the investigation and Allotrope (AFO) for the analytical meaning. We dissect one HPLC release result, draw the index-versus-payload boundary that keeps a scalar in the graph and a chromatogram in an ADF file, and close on the cost and the still-incomplete alignment of the three ontology worlds this chapter touches.
Method, assay, and result are three different kinds of thing
The same trio-of-categories discipline from the molecule chapter applies here, and it prevents the most common analytical-modeling error: treating "the test" as one node. A method — the validated procedure for size-exclusion chromatography — is a plan, a generically dependent continuant: information, copyable, the same SEC method whether run today or next year. An assay — the actual running of that method on a specific sample on a specific day — is an occurrent, a process, exactly the kind OBI was built to model [2]. And the result — 98.611 % monomer — is a specifically dependent continuant, a measured value that is about the sample and, through it, about the batch. Keep these three apart and the graph can say that one method was run as many assays producing many results, that two results came from the same method, or that a method was revised between two assays — none of which a single fused "test" node can express.
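The three-way split can be sketched in plain Python before any ontology is involved. This is a minimal illustration, not the dataset's model: the class and function names are invented here, and the method version labels, the second assay, and its 98.2 value are hypothetical placeholders added only so that there is something to compare.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Method:
    """A plan: information that stays the same however often it is run."""
    iri: str
    version: str


@dataclass(frozen=True)
class Result:
    """A measured value that is about a sample."""
    iri: str
    is_about: str
    monomer_pct: float


@dataclass(frozen=True)
class Assay:
    """An occurrent: one dated running of a method on one sample."""
    iri: str
    realizes: Method
    has_input: str
    assay_date: date
    has_result: Result


# Version labels, the second assay, and its value are hypothetical.
sec_v1 = Method("bp:SEC-Method", version="1")
sec_v2 = Method("bp:SEC-Method", version="2")

ASSAYS = [
    Assay("bp:SEC-Assay-001", sec_v1, "bp:SMP-DS-001", date(2026, 3, 10),
          Result("bp:SEC-Result-001", "bp:SMP-DS-001", 98.611)),
    Assay("bp:SEC-Assay-002", sec_v2, "bp:SMP-DS-002", date(2026, 4, 2),
          Result("bp:SEC-Result-002", "bp:SMP-DS-002", 98.2)),
]


def results_of(method_iri):
    """All results produced by any running of the given method."""
    return [a.has_result for a in ASSAYS if a.realizes.iri == method_iri]


def method_revised_between(first, second):
    """True when two assays ran different versions of the same method."""
    same_method = first.realizes.iri == second.realizes.iri
    return same_method and first.realizes.version != second.realizes.version
```

With the three kinds kept apart, "which results came from this method" and "was the method revised between these two runs" are each a one-line question. A single fused test record would have to answer both by string matching.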
The sample deserves its own node too, because it is the bridge from the lab back to the process: a sample is a material entity derivedFrom the batch it was pulled from, so a result about the sample is, transitively, evidence about the batch. This is how an analytical number rejoins the genealogy: not by being stamped with a batch string, but by being about a sample that derives from a batch, a chain the transitive derivedFrom walks automatically.
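The transitive walk can be sketched with nothing more than a dictionary of direct edges. Only the edge from bp:SMP-DS-001 to bp:DS-001 comes from this chapter; the upstream node bp:UFDF-Pool-001 and the function name are hypothetical, added to show that the walk continues past the lot. In SPARQL the same walk is usually written as a property path such as `bp:derivedFrom+`.

```python
from collections import deque

# Direct derivedFrom edges: child -> set of parents.
DERIVED_FROM = {
    "bp:SMP-DS-001": {"bp:DS-001"},
    "bp:DS-001": {"bp:UFDF-Pool-001"},  # hypothetical upstream material
}


def derived_from_closure(node, edges):
    """Every entity reachable from node by following derivedFrom one or more times."""
    seen = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in edges.get(current, ()):
            if parent not in seen:  # guards against cycles and repeated work
                seen.add(parent)
                queue.append(parent)
    return seen


# A result about the sample is evidence about everything in this set.
evidence_targets = derived_from_closure("bp:SMP-DS-001", DERIVED_FROM)
```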
Allotrope gives the result one meaning across vendors
The reason an analytical result needs more than QUDT is that its meaning is vendor-entangled. The same SEC purity can come off an Agilent or a Waters system, each exporting its own file format and its own field names: the heterogeneity the data book met at the instrument. Allotrope addresses this with a framework of data formats and ontologies, the Allotrope Foundation Ontologies (AFO) chief among them, that gives laboratory results, instruments, samples, and methods one standardized meaning regardless of which vendor produced them [1]. Modeled against AFO, a monomerPct result is not just a number with a unit; it is a result of a known measurement type, produced by a typed device, on a typed sample, by a named method, and it is the same fact whether the lab swaps instruments or the work moves to a contract lab. AFO is to the lab what the IOF biopharma ontologies are to the plant: the domain vocabulary that lets the knowledge graph merge lab and process data instead of adapting one to the other.
In the dataset that meaning is one alignment file: the sample, the assay, its result, and its column each subclass a real, verified AFO or OBI term. The split is deliberate. Allotrope supplies the analytical-result meaning (af-p:AFP_0000843 size-exclusion chromatography, af-r:AFR_0000410 the chromatogram, af-e:AFE_0000354 the device), while OBI supplies the investigation frame (obo:OBI_0000070 assay, obo:OBI_0000747 specimen, obo:OBI_0000618 the column as an OBI artifact). The two are not redundant: a result is both an Allotrope analytical record and an OBI assay output, and the host-cell-protein (HCP) ELISA reuses the same OBI frame (bp:HCPAssay → obo:OBI_0000661, ELISA) where AFO has no single verified IRI:
# align.ttl — the assay/result/column/sample typed up to Allotrope (AFO) and OBI (verified IRIs).
@prefix bp: <https://example.org/bioproc#> .
@prefix af-p: <http://purl.allotrope.org/ontologies/process#> .
@prefix af-r: <http://purl.allotrope.org/ontologies/result#> .
@prefix af-e: <http://purl.allotrope.org/ontologies/equipment#> .
@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
bp:Sample rdfs:subClassOf obo:OBI_0000747 . # OBI 'specimen' (material sample taken for study)
bp:Assay rdfs:subClassOf obo:OBI_0000070 . # OBI 'assay' (a planned process)
bp:SECAssay rdfs:subClassOf af-p:AFP_0000843 . # AFO 'size-exclusion chromatography'
bp:SECResult rdfs:subClassOf af-r:AFR_0000410 . # AFO 'size-exclusion chromatogram'
bp:SECColumn rdfs:subClassOf obo:OBI_0000618 , af-e:AFE_0000354 . # OBI 'size exclusion column' / AFO 'device'
bp:HCPAssay rdfs:subClassOf obo:OBI_0000661 . # OBI 'ELISA' (the HCP assay; verified via OLS4)
bp:ResidualDNAAssay rdfs:subClassOf obo:OBI_0000415 . # OBI 'PCR' (the residual-DNA qPCR; verified via OLS4)
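The vendor-neutral idea behind this alignment can be illustrated in a few lines of Python. This is a sketch under loud assumptions: the two vendor labels and every raw field name below are invented for illustration and are not actual Agilent or Waters export keys; only the AFO IRI and the sample identifier come from the chapter.

```python
# AFO result type used in align.ttl for the SEC result.
AFO_SEC_RESULT = "http://purl.allotrope.org/ontologies/result#AFR_0000410"

# Hypothetical per-vendor maps: raw export key -> neutral key.
FIELD_MAPS = {
    "vendor_a": {"MonomerArea%": "monomerPct", "SampleName": "isAbout"},
    "vendor_b": {"pct_monomer": "monomerPct", "sample_id": "isAbout"},
}


def normalize(vendor, export):
    """Rename vendor-specific keys and attach the shared AFO type."""
    mapping = FIELD_MAPS[vendor]
    record = {neutral: export[raw] for raw, neutral in mapping.items()}
    record["monomerPct"] = float(record["monomerPct"])  # one numeric type for all vendors
    record["type"] = AFO_SEC_RESULT
    return record


from_a = normalize("vendor_a", {"MonomerArea%": "98.611", "SampleName": "bp:SMP-DS-001"})
from_b = normalize("vendor_b", {"pct_monomer": 98.611, "sample_id": "bp:SMP-DS-001"})
```

Both exports collapse to the same record, which is the property the alignment file is meant to guarantee: a change of instrument changes the adapter, not the fact.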
With the types pinned, the release result becomes the structured iceberg the prose described — a method that is a plan; an assay that is an occurrent carrying its device, analyst, and date; a sample that traces back to the lot; and a scalar result holding its spec limit and verdict and pointing at its chromatogram rather than embedding it:
# instances.ttl: method (plan) / assay (occurrent) / sample / result, with the curve referenced.
@prefix bp: <https://example.org/bioproc#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
bp:SEC-Method a bp:Method . # the validated SEC method (information)
bp:SMP-DS-001 a bp:Sample ; bp:derivedFrom bp:DS-001 . # the sample, traceable to the lot
bp:SEC-Assay-001 a bp:SECAssay ; # one running of the method (an occurrent)
    bp:realizes bp:SEC-Method ;
    bp:hasInput bp:SMP-DS-001 ;
    bp:hasDevice bp:HPLC-07 ; # the typed instrument
    bp:performedBy bp:Analyst-AB ; # the analyst (a prov:Agent)
    bp:assayDate "2026-03-10"^^xsd:date ;
    bp:hasResult bp:SEC-Result-001 .
bp:SEC-Result-001 a bp:SECResult ; # the result, about the sample
    bp:isAbout bp:SMP-DS-001 ;
    bp:monomerPct "98.611"^^xsd:float ;
    bp:specLow "95.0"^^xsd:float ; # typed like monomerPct; a bare 95.0 would parse as xsd:decimal
    bp:verdict "PASS" ; # spec limit + verdict, in the graph
    bp:hasChromatogram bp:ADF-SEC-001 . # the heavy curve, referenced not embedded
bp:ADF-SEC-001 a bp:AnalyticalDataFile ; # the Allotrope ADF payload node
    rdfs:seeAlso <https://example.org/adf/SEC-Result-001.adf> .
This is not the only assay on that one sample. The same shape repeats for the rest of the release panel against the same bp:SMP-DS-001: a bp:CEX-Assay-001 yields bp:CEX-Result-001 carrying bp:cexMainPct "70.686", and a bp:HCP-Assay-001, typed up to OBI's ELISA, yields bp:HCP-Result-001 carrying bp:hcpPpm "12.0" — three assays, three results, one specimen, each result isAbout the sample and so transitively about the lot. That is the payoff of keeping method, assay, sample, and result as separate nodes: the panel is a set of results sharing a sample, not a row of columns sharing a string.
One release result, fully modeled: an OBI assay running an AFO-typed method on a sample that derives from the drug-substance lot, yielding a QUDT-typed value with its spec and verdict — and a pointer to the chromatogram file rather than the curve itself.
Original diagram by the authors, created with AI assistance.
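The release panel described above (three results sharing one specimen) can be queried as a set of results about a sample. The sketch below holds the chapter's values as plain tuples; the function name is invented here, and a real graph would answer the same question with a SPARQL query, not a list comprehension.

```python
# (subject, predicate, object) facts taken from the chapter's release panel.
TRIPLES = [
    ("bp:SEC-Result-001", "bp:isAbout", "bp:SMP-DS-001"),
    ("bp:SEC-Result-001", "bp:monomerPct", 98.611),
    ("bp:CEX-Result-001", "bp:isAbout", "bp:SMP-DS-001"),
    ("bp:CEX-Result-001", "bp:cexMainPct", 70.686),
    ("bp:HCP-Result-001", "bp:isAbout", "bp:SMP-DS-001"),
    ("bp:HCP-Result-001", "bp:hcpPpm", 12.0),
]


def panel_for(sample):
    """Map each result about the sample to its measured attributes."""
    results = [s for s, p, o in TRIPLES if p == "bp:isAbout" and o == sample]
    return {
        r: {p: o for s, p, o in TRIPLES if s == r and p != "bp:isAbout"}
        for r in results
    }


panel = panel_for("bp:SMP-DS-001")
```

The panel is recovered by following isAbout from the sample, not by reading adjacent columns in a row, so adding a fourth assay later means adding triples and changing no schema.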
The index-versus-payload boundary
Now the decision that keeps the graph usable. A scalar result — 98.611, a single typed number — maps cleanly into a triple, because one typed number is a fact the graph can reason over, constrain with SHACL, and compare to a spec. A chromatogram is not: it is a dense series of thousands of intensity-versus-time points, and a spectrum is a thousand intensity-versus-wavenumber points. Flattening either into subject-predicate-object triples would explode the graph and still lose the array's shape — the identical boundary the open-source chapter drew. So the heavy numeric payload lives where it belongs: in a vendor-neutral analytical container — Allotrope ADF (an HDF5 binary built around an n-dimensional data cube) or AnIML (the ASTM open XML format for analytical data) — and the graph holds a triple like result hasChromatogram <file://…/run.adf>, a pointer rather than the curve [1][3].
This is not a compromise; it is the right architecture. The graph is the index of analytical knowledge — every result findable, typed, spec-checked, and tied to its batch — and the analytical files are the warehouse of raw signal. An investigator queries the graph to find which results failed and what they were about, then follows the IRI to the chromatogram only when they need to re-examine the raw data. The graph stays small and queryable; the arrays stay in formats built to hold them. Drawing this line wrong — stuffing arrays into the graph, or leaving results as un-typed blobs outside it — is the single most common way an analytical graph becomes either unusable or useless.
The boundary that keeps an analytical graph honest: typed scalar results live in the graph as queryable facts; chromatograms and spectra live in ADF or AnIML files the graph points at by IRI — the index, not the warehouse.
Original diagram by the authors, created with AI assistance.
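The boundary can be made concrete with a small stdlib-only sketch. Two assumptions are worth stating plainly: a JSON file stands in for the ADF or AnIML container (neither format is written here), and the 6000-point Gaussian trace is synthetic. The function name is also invented for illustration.

```python
import json
import math
import tempfile
from pathlib import Path


def index_result(result_iri, value, signal, payload_path):
    """Write the array to a file; return only the scalar and a pointer as triples."""
    payload_path.write_text(json.dumps(signal))  # JSON stands in for ADF/AnIML here
    return [
        (result_iri, "bp:monomerPct", value),
        (result_iri, "bp:hasChromatogram", payload_path.as_uri()),
    ]


# Synthetic chromatogram: 6000 (time, intensity) points around one Gaussian peak.
signal = [(t / 100, math.exp(-((t / 100 - 30) ** 2) / 2)) for t in range(6000)]

with tempfile.TemporaryDirectory() as tmp:
    payload = Path(tmp).resolve() / "run.json"
    triples = index_result("bp:SEC-Result-001", 98.611, signal, payload)
    reloaded = json.loads(payload.read_text())  # follow the pointer only when needed

# Flattening would need at least one triple per stored number.
flattened_triples = 2 * len(signal)
print(len(triples), "triples in the graph versus at least", flattened_triples, "if flattened")
```

The graph side stays at two triples however long the run is, while the file side keeps the array in order and intact. That asymmetry is the reason for the boundary.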
The unsolved part: three ontology worlds that do not yet fully align
The honest difficulty is that this chapter sits at the meeting point of three large ontology efforts that share a goal but not yet a seamless join. OBI models the investigation, AFO models the analytical result, and the IOF biopharma ontologies model the manufacturing process the sample came from — and while all are reconcilable in principle, there is no single turnkey mapping that fuses them. A result modeled in AFO and a batch modeled in IOF meet only through a crosswalk a team authors, the same OBO–IOF seam the discovery chapter named, now widened by a third party. Choosing how monomerPct in AFO relates to the bp:monomerPct the open-source loader attaches to a batch is a real modeling decision, not an import.
The second difficulty is cost. AFO is large and genuinely complex, and full adoption — emitting every result as a conformant ADF with complete AFO annotation — is a heavy lift that many labs have not made, which is why the data book noted FAIR-in-fact lags FAIR-in-claim. A plant can hold thousands of results as bare numbers with vendor metadata and call them standardized while none of them carry AFO meaning. The tooling exists and the standard is real; the discipline and the migration cost are the barrier. So the standard this chapter sets is sober: the architecture (index scalars, reference arrays, type everything) is clear and correct, while the adoption of the vendor-neutral semantics that make it pay off remains, in 2026, uneven and expensive.
Why it matters
Analytical results are the evidence that gates a batch, and a release decision is only as trustworthy as the results behind it are well-modeled. A result that carries its method, sample lineage, device, unit, and spec as typed facts can be checked mechanically at release — does every CQA have a conformant result about a sample that derives from this batch? — while a result that is a bare number in a vendor file requires a human to vouch for all of that context. The index-versus-payload discipline is what lets a plant hold millions of results without either drowning the graph or losing the raw data, and the vendor-neutral semantics are what let lab and process data finally merge. This chapter is where the data shadow's largest, richest stream becomes part of the queryable whole.
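The mechanical release check posed above can be sketched as a set difference. Several things here are assumptions and not part of the dataset: the required-CQA set is simply the three panel attributes named in this chapter, the PASS verdicts for the CEX and HCP results are assumed (only the SEC verdict is given), and bp:DS-002 is a hypothetical second lot with no results.

```python
# Assumed required set: the three panel attributes named in this chapter.
REQUIRED_CQAS = {"bp:monomerPct", "bp:cexMainPct", "bp:hcpPpm"}

# Direct derivedFrom edges (child -> parent), as in the chapter's Turtle.
DERIVED_FROM = {"bp:SMP-DS-001": "bp:DS-001"}

# One row per result. The SEC verdict is from the chapter; the CEX and HCP
# verdicts are assumed for illustration.
RESULTS = [
    {"attribute": "bp:monomerPct", "is_about": "bp:SMP-DS-001", "verdict": "PASS"},
    {"attribute": "bp:cexMainPct", "is_about": "bp:SMP-DS-001", "verdict": "PASS"},
    {"attribute": "bp:hcpPpm", "is_about": "bp:SMP-DS-001", "verdict": "PASS"},
]


def derives_from(node, batch):
    """Walk derivedFrom upward until the batch is found or the chain ends."""
    seen = set()
    while node in DERIVED_FROM and node not in seen:
        seen.add(node)
        node = DERIVED_FROM[node]
        if node == batch:
            return True
    return False


def release_gaps(batch):
    """CQAs that lack a passing result about a sample derived from the batch."""
    covered = {
        r["attribute"]
        for r in RESULTS
        if r["verdict"] == "PASS" and derives_from(r["is_about"], batch)
    }
    return REQUIRED_CQAS - covered
```

An empty set from `release_gaps("bp:DS-001")` means every required attribute has typed, passing evidence tied to that lot; for the hypothetical `bp:DS-002` the function returns all three attributes, which is the list a reviewer would otherwise assemble by hand.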
From the wire to the graph
Before any of this lands in the graph it has to come off the instrument, and the wire that carries it is its own standards story. The acquisition path — the analyzer streaming its measurement to the lab system — is increasingly modeled with SiLA 2 and OPC UA LADS, the device-integration standards that would hand off an SEC run as a structured payload rather than a vendor file dropped on a share. Both are real and being piloted in biopharma labs today; neither is yet the production default, so most plants still receive the result as an AnIML or ADF export. Either way, the payload that arrives needs to become the same typed bp:SEC-Result-001 the Turtle asserts.
Allotrope offers two ingest paths to that fact. The heavy one is a full ADF — an HDF5 data cube with complete AFO annotation. The lightweight one is ASM, the Allotrope Simple Model: a JSON-LD document whose @context maps each plain key onto the very same bp:, af-r:, and qudt: IRIs the Turtle uses. The companion examples/platform/ontology/asm-sec-result.jsonld is exactly that — cheaper to emit and parse, identical in meaning:
{
  "@context": {
    "bp": "https://example.org/bioproc#",
    "af-r": "http://purl.allotrope.org/ontologies/result#",
    "qudt": "http://qudt.org/schema/qudt/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "isAbout": { "@id": "bp:isAbout", "@type": "@id" },
    "monomerPct": { "@id": "bp:monomerPct", "@type": "xsd:float" },
    "specLow": { "@id": "bp:specLow", "@type": "xsd:float" },
    "verdict": { "@id": "bp:verdict" },
    "hasChromatogram": { "@id": "bp:hasChromatogram", "@type": "@id" }
  },
  "@id": "bp:SEC-Result-001",
  "@type": [ "bp:SECResult", "af-r:AFR_0000410" ],
  "isAbout": "bp:SMP-DS-001",
  "monomerPct": "98.611",
  "specLow": "95.0",
  "verdict": "PASS",
  "hasChromatogram": "bp:ADF-SEC-001"
}
The loader examples/platform/ontology/asm_to_rdf.py parses this one document into RDF and checks that the headline triples match the dataset — the same bp:SECResult typed up to af-r:AFR_0000410, the same monomerPct of 98.611, the same PASS verdict, the same hasChromatogram pointer to bp:ADF-SEC-001 that instances.ttl asserts in Turtle. The cheaper Allotrope ingest, the identical graph fact. Beyond this one SEC bridge, the companion datasets carry the campaign's batch titer as concrete container instances — a full Allotrope Simple Model document (examples/datasets/hplc_titer.asm.json, keyed on Allotrope's public manifest) and an ASTM AnIML file (examples/datasets/hplc_titer.animl.xml) — the lightweight-JSON and the open-XML forms of the vendor-neutral analytical containers this chapter names, captured in running code by the open-source analytical-lab chapter. Which of these standards a plant actually runs in production versus pilots is exactly the tiered reality Part VII maps in the ontologies and controlled vocabularies actually in use.
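The contents of asm_to_rdf.py are not reproduced in this chapter, so the following is not that loader. It is a hand-rolled, stdlib-only sketch of what such a bridge has to do for this one document: expand each compact name through the @context and emit triples. It handles only the flat shape shown above, and a production loader would more plausibly rely on a full JSON-LD processor. All function and constant names are illustrative.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

DOC = json.loads("""
{
  "@context": {
    "bp": "https://example.org/bioproc#",
    "af-r": "http://purl.allotrope.org/ontologies/result#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "isAbout": { "@id": "bp:isAbout", "@type": "@id" },
    "monomerPct": { "@id": "bp:monomerPct", "@type": "xsd:float" },
    "specLow": { "@id": "bp:specLow", "@type": "xsd:float" },
    "verdict": { "@id": "bp:verdict" },
    "hasChromatogram": { "@id": "bp:hasChromatogram", "@type": "@id" }
  },
  "@id": "bp:SEC-Result-001",
  "@type": [ "bp:SECResult", "af-r:AFR_0000410" ],
  "isAbout": "bp:SMP-DS-001",
  "monomerPct": "98.611",
  "specLow": "95.0",
  "verdict": "PASS",
  "hasChromatogram": "bp:ADF-SEC-001"
}
""")


def expand(name, ctx):
    """Expand a compact IRI such as 'bp:x' using the prefixes in the context."""
    prefix, sep, local = name.partition(":")
    base = ctx.get(prefix)
    return base + local if sep and isinstance(base, str) else name


def to_triples(doc):
    """Turn one flat JSON-LD node into (subject, predicate, object) tuples."""
    ctx = doc["@context"]
    subject = expand(doc["@id"], ctx)
    types = doc.get("@type", [])
    types = [types] if isinstance(types, str) else types
    triples = [(subject, RDF_TYPE, expand(t, ctx)) for t in types]
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # keywords were handled above
        term = ctx[key]
        predicate = expand(term["@id"], ctx)
        coercion = term.get("@type")
        if coercion == "@id":
            obj = expand(value, ctx)  # an IRI reference
        else:
            # A literal as (lexical form, datatype IRI or None).
            obj = (value, expand(coercion, ctx) if coercion else None)
        triples.append((subject, predicate, obj))
    return triples


TRIPLES = to_triples(DOC)
```

Comparing these tuples with the statements in instances.ttl is the kind of check the chapter attributes to the loader: same subject, same type, same monomerPct literal, same chromatogram pointer.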
In the real world
Allotrope's ontologies and data formats are deployed in real laboratories, giving chromatography and spectroscopy results vendor-neutral meaning, and AnIML is an ASTM standard for analytical data interchange [1][3]; OBI underpins how investigations are described across biomedical research [2]. The open-source analytical-lab chapter shows the same pattern in running code — an HPLC purity result captured with its standardized meaning rather than as a loose column, and the raw file referenced rather than flattened. What remains the active frontier is making every result in a plant carry that meaning by default, and stitching the OBI, AFO, and IOF models into one graph without a hand-built adapter at every join — the work that turns a pile of conformant files into a single queryable body of analytical knowledge. Part VII surveys this ecosystem head-on: the Allotrope Foundation and its peer the Pistoia Alliance — the pre-competitive consortia that maintain these analytical-lab vocabularies — and the tiered reality of what is actually in production, where AFO and its lightweight JSON sibling ASM sit at the very top of the maturity list.
Key terms
- Method — the validated analytical procedure, modeled as a plan (information) distinct from any one running of it.
- Assay — one execution of a method on a specific sample, modeled as an occurrent (an OBI process) with its date, analyst, and device.
- Result — the measured value, modeled as a typed fact (98.611, xsd:float, QUDT unit:PERCENT) that is about the sample.
- Sample — the material pulled from a batch, modeled as derivedFrom the batch, so a result about it is transitively evidence about the batch.
- Allotrope Foundation Ontologies (AFO) — the ontologies giving analytical results, instruments, samples, and methods one vendor-neutral meaning.
- ADF / AnIML — vendor-neutral analytical data containers (Allotrope's HDF5-based Data Format; the ASTM XML format) that hold raw arrays the graph references by IRI.
- Index versus payload — the boundary that keeps scalar results in the graph as queryable facts and dense arrays (chromatograms, spectra) in referenced files, never flattened into triples.
Where this leads
We can model the methods and results that verify quality, and we have drawn the line between the number and the raw signal. Part II's final entity is the package of knowledge that ships from development to the plant. The next chapter, Modeling the Recipe and Tech Transfer: Portable Process Knowledge, models the recipe as an information artifact realized in production, the equipment it requires versus the equipment a site has, and the uncomfortable reality that a process modeled perfectly at one site must still survive the move to another.