The Running Example and the Proof Harness
📍 Where we are: Part I · Specification — the second move. The requirements are written. Now meet the one campaign the whole book models, the files that hold it, and the harness that proves the model answers its 23 competency questions.
The previous chapter wrote down what the ontology must answer. This one introduces what it answers about — a single monoclonal-antibody manufacturing campaign, the same batch the rest of the series follows — and the runnable proof harness that keeps the ORSD honest. Every Turtle, SPARQL, and SHACL snippet in this book is a true excerpt of the files introduced here, and every competency question in the catalog is a test those files pass.
Instead of a dozen toy examples, this book follows one medicine all the way through, like a single patient's chart that every specialist annotates. One vial of frozen cells becomes a batch, becomes a purified substance, becomes filled vials — and at each step the chart gains a few facts. The "proof harness" is just a program that re-reads the whole chart and checks that all 23 questions in the brief still have answers. If you change one fact, the program tells you which questions you broke.
Requirements as tests: each competency question is bound in cq-catalog.json to the artifact that answers it, and validate.py runs all 23 as pass/fail acceptance tests.
Original diagram by the authors, created with AI assistance.
One campaign, end to end
The running example is one CHO-cell monoclonal antibody, carried from the discovery target to the patient's cold-chain vial. Its genealogy is the spine the whole graph hangs from:
A working cell bank, WCB-CHO-001, seeds a seed train, SEED-001, which inoculates the production bioreactor batch BATCH-2026-001. Its harvest is clarified (CLAR-001), captured on Protein A (PApool-001), cleared of virus (VIpool-001, VFpool-001), polished (POLpool-001), and concentrated into a drug-substance lot, DS-001, which is filled into drug-product lots DP-001 and DP-002.
Each step in that chain becomes one bp:derivedFrom edge, pointing from the derived material back to its source. Because that relation is transitive, DS-001 still traces back to WCB-CHO-001 in a single walk, even though several intermediates sit between them. That is what makes CQ-01 and CQ-02 answerable. The campaign also carries a deliberately out-of-spec (OOS) sibling: drug-product lot DP-004, derived from a separate substance lot DS-004 but sharing the cell bank WCB-CHO-001. That OOS lot lets the model answer the question an investigator actually asks, "what else shares this lot's lineage?" (CQ-04). It also lets the release gate demonstrate that it fails on exactly one path (CQ-11).
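The transitive walk behind CQ-01 and CQ-04 can be sketched in plain Python, without a triplestore. This is a minimal sketch under stated assumptions:

- The edge list holds only the nodes named above, written as (child, parent) pairs for bp:derivedFrom.
- DS-004's own upstream chain is not spelled out in this chapter, so it is collapsed into one hypothetical direct edge to WCB-CHO-001.
- Lots and cell banks are picked out by their DP- and WCB- name prefixes, where the real model would test rdf:type.
- The helper names index and reach are illustrative and are not part of validate.py.

```python
from collections import defaultdict

# Hypothetical (child, parent) pairs, read as "child bp:derivedFrom parent".
# Only the nodes named in this chapter appear; the real instances.ttl holds more.
DERIVED_FROM = [
    ("SEED-001", "WCB-CHO-001"),
    ("BATCH-2026-001", "SEED-001"),
    ("CLAR-001", "BATCH-2026-001"),
    ("PApool-001", "CLAR-001"),
    ("VIpool-001", "PApool-001"),
    ("VFpool-001", "VIpool-001"),
    ("POLpool-001", "VFpool-001"),
    ("DS-001", "POLpool-001"),
    ("DP-001", "DS-001"),
    ("DP-002", "DS-001"),
    # Assumption: DS-004's upstream chain is collapsed into one edge to the shared bank.
    ("DS-004", "WCB-CHO-001"),
    ("DP-004", "DS-004"),
]


def index(edges):
    """Build parent and child adjacency maps from (child, parent) pairs."""
    parents, children = defaultdict(set), defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
        children[parent].add(child)
    return parents, children


def reach(start, adjacency):
    """Every node reachable from start by repeatedly following adjacency."""
    seen, stack = set(), [start]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


parents, children = index(DERIVED_FROM)

# CQ-01-style: everything DS-001 derives from, however many steps back.
ancestors = reach("DS-001", parents)

# CQ-04-style: other drug-product lots that share DP-004's cell bank.
banks = {n for n in reach("DP-004", parents) if n.startswith("WCB-")}
siblings = sorted({
    lot
    for bank in banks
    for lot in reach(bank, children)
    if lot.startswith("DP-") and lot != "DP-004"
})
print(len(ancestors), siblings)  # prints: 8 ['DP-001', 'DP-002']
```

This sketch includes only the named nodes, so it finds 8 ancestors of DS-001 rather than the 11 rows CQ-01 returns on the full dataset. The sibling answer agrees with CQ-04's [DP-001, DP-002]. In SPARQL, the property path bp:derivedFrom+ can express the same walk declaratively.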
Throughout, we use the illustrative namespace bp: for https://example.org/bioproc#, the same one the open-source companion book uses, so a class here is the same class there.
The dataset: six files that load as one
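The campaign lives in examples/platform/ontology/ as a small set of files that validate.py loads together. The three data files parse into a single RDF graph, and the shapes, catalog, and queries run against that graph: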
| file | what it holds |
|---|---|
| bioproc.ttl | the local bp: vocabulary; classes and relations across the BFO spine (Material / Equipment / Quality / RealizableEntity / InformationArtifact / Process) with their load-bearing OWL axioms |
| align.ttl | the alignment up to public ontologies: BFO, IOF Core and biopharma, OBO biology, Allotrope, QUDT, PROV, SOSA (the subject of Part II) |
| instances.ttl | the running example as individuals: every node in the genealogy above, plus the design space, analytics, release panel, serialization, distribution, and provenance |
| shapes.ttl | the SHACL gates: the release specification, the finish gate, the cell-bank gate, and the disjointness guards |
| cq-catalog.json | the executable ORSD: the 23 competency questions, each bound to the artifact that answers it |
| queries/CQ-*.rq | one SPARQL file per query-backed competency question |
The Turtle, SPARQL, and JSON are language-neutral, so the Korean edition quotes the very same files — there is no translated copy of the model, only of the prose around it.
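The sketch below shows one possible form for a catalog whose entries are "bound to the artifact that answers it". This chapter does not reproduce the catalog's real schema. The entry fields (id, group, kind, artifact), the names of the three kinds, and the artifact paths are therefore hypothetical; only the three-way split into query-backed, gate, and structural questions comes from the text. The point is that each question carries exactly one pointer to the query, shape, or axiom that answers it, and a loader can check this before any test runs.

```python
import json
from collections import Counter

# Hypothetical cq-catalog.json fragment: field names, "kind" values and paths are
# assumptions for illustration, not the real catalog schema.
CATALOG_JSON = """
[
  {"id": "CQ-01", "group": "lineage", "kind": "query",
   "artifact": "queries/CQ-01.rq"},
  {"id": "CQ-17", "group": "characterization", "kind": "gate",
   "artifact": "shapes.ttl"},
  {"id": "CQ-22", "group": "structural", "kind": "structural",
   "artifact": "bioproc.ttl"}
]
"""


def load_catalog(text):
    """Parse the catalog; reject duplicate ids and questions with no artifact."""
    entries = json.loads(text)
    ids = [e["id"] for e in entries]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate competency-question id")
    for e in entries:
        if not e.get("artifact"):
            raise ValueError(f"{e['id']} is not bound to an artifact")
    return entries


catalog = load_catalog(CATALOG_JSON)
kinds = Counter(e["kind"] for e in catalog)
for e in catalog:
    print(f"{e['id']}  {e['kind']:<10}  {e['artifact']}")
```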
The harness: validate.py
validate.py is the inspector from the previous chapter's analogy. It does three things, in order:
- Parse the three data files (bioproc.ttl, align.ttl, instances.ttl) into one graph.
- Reason over it with an OWL-RL closure [1], which derives the entailed facts. The most important are the long-range transitive derivedFrom edges and the inference that equipment is a BFO material entity; the structural competency question CQ-22 checks both.
- Run the competency questions from cq-catalog.json and print a pass/fail line per question. Each query-backed question runs as a SPARQL evaluation [2], each gate as a SHACL validation [3], and each structural question as a reasoner check.
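The closure step can be illustrated in miniature. The sketch below is not an OWL-RL reasoner. It applies only three rules from the OWL 2 RL rule set, by naive iteration to a fixpoint:

- prp-trp, for transitive properties;
- scm-sco, for subclass chains;
- cax-sco, for inherited typing.

It runs over string triples whose abbreviated names (bp:ProductionBioreactor, bfo:MaterialEntity, and so on) are assumptions standing in for the real IRIs. That is enough to show where the long-range derivedFrom edges and the equipment-is-material inference come from.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
TYPE, SUB, TRANS = "rdf:type", "rdfs:subClassOf", "owl:TransitiveProperty"


def closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply prp-trp, scm-sco and cax-sco until no new triple appears (naive fixpoint)."""
    graph = set(triples)
    while True:
        transitive = {s for s, p, o in graph if p == TYPE and o == TRANS}
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                if o != s2:
                    continue  # all three rules join one triple's object to another's subject
                if p in transitive and p2 == p:
                    new.add((s, p, o2))     # prp-trp: x p y, y p z => x p z
                if p == SUB and p2 == SUB:
                    new.add((s, SUB, o2))   # scm-sco: A sub B, B sub C => A sub C
                if p == TYPE and p2 == SUB:
                    new.add((s, TYPE, o2))  # cax-sco: x a A, A sub B => x a B
        if new <= graph:
            return graph
        graph |= new


# Hypothetical, abbreviated facts standing in for bioproc.ttl, align.ttl and instances.ttl.
facts = {
    ("bp:derivedFrom", TYPE, TRANS),
    ("bp:ProductionBioreactor", SUB, "bp:Equipment"),
    ("bp:Equipment", SUB, "bfo:MaterialEntity"),
    ("BR-101", TYPE, "bp:ProductionBioreactor"),
    ("DS-001", "bp:derivedFrom", "POLpool-001"),
    ("POLpool-001", "bp:derivedFrom", "VFpool-001"),
    ("VFpool-001", "bp:derivedFrom", "VIpool-001"),
}

inferred = closure(facts)
print(len(facts), "->", len(inferred), "triples")
```

A production reasoner indexes triples instead of re-scanning every pair, but the result has the same shape. Inferred facts are added to the graph, and the competency queries then run against the enlarged graph.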
validate.py exits zero only if all 23 pass. Here is its real output, the green baseline this entire book is engineered to keep:
```text
[1] parsed 2120 triples (bioproc + align + instances)
[2] reasoned: 2120 -> 7137 triples after OWL-RL closure
[3] competency questions (ORSD v1.0.0 acceptance tests):
CQ     GROUP             RESULT  DETAIL
CQ-01  lineage           PASS    11 row(s)
CQ-02  impact            PASS    descendant superset of {DP-001, DP-002, DP-004, DS-001} (26 total)
CQ-03  lineage           PASS    row {batch: BATCH-2026-001, monomer: 98.611} present
CQ-04  impact            PASS    affected = [DP-001, DP-002]
CQ-05  trajectory        PASS    material superset of {PApool-001, POLpool-001} (2 total)
CQ-06  qbd               PASS    parameter superset of {FeedRate, Temperature} (2 total)
CQ-07  qbd               PASS    row {parameter: FeedRate, lot: DS-001} present
CQ-08  release           PASS    DS-001 release panel complete and in spec
CQ-09  release           PASS    ASK = True
CQ-10  release           PASS    DP-001/DP-002 pass release + finish gates
CQ-11  release           PASS    OOS [DP-004, DS-004] on path [hmwPct]
CQ-12  viral             PASS    sum(lrv) = 8.7 over 2 step(s)
CQ-13  packaging         PASS    package = [CARTON-001, CASE-001, PALLET-001]
CQ-14  packaging         PASS    ASK = False
CQ-15  provenance        PASS    claim = [claim-batch-001, claim-vessel-001]
CQ-16  provenance        PASS    ASK = True
CQ-17  characterization  PASS    WCB-CHO-001 conforms to the cell-bank gate
CQ-18  characterization  PASS    ASK = True
CQ-19  units             PASS    0 row(s)
CQ-20  units             PASS    row {host: CHO-host, taxon: NCBITaxon_10029} present
CQ-21  structural        PASS    row {run: CCP-001, vessel: BR-101, vesselType: ProductionBioreactor} present
CQ-22  structural        PASS    transitive lineage + equipment-is-material inferred
CQ-23  structural        PASS    Batch-as-process and Batch-as-bioreactor both caught
23/23 competency questions PASS
ALL CHECKS PASSED
```
Read a few lines and the design becomes concrete. CQ-01 walks 11 ancestors from DS-001. CQ-04 returns exactly the two siblings that share the cell bank with the failed lot. CQ-11 shows the release gate failing on only hmwPct for only the two -004 lots — every other panel value in spec — which is what a realistic out-of-spec event looks like. CQ-14 is an ASK that is correctly False: nothing a vial is packed inside is also something it was made from, so containment and genealogy stay separate. The DETAIL column is the evidence; the RESULT column is the contract.
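A small loop can sketch the RESULT/DETAIL contract and the rule that the harness exits zero only when every question passes. Everything here is hypothetical:

- the Check record and the run_harness function;
- the vial identifier VIAL-0001;
- the two toy checks, modeled loosely on CQ-13 (what is this vial packed in?) and CQ-14 (is anything both a container of the vial and an ancestor of it?).

The real validate.py dispatches to SPARQL, SHACL, and the reasoner rather than to plain Python functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical toy relations, one parent per node: "packed in" and "derived from".
PACKED_IN: Dict[str, str] = {
    "VIAL-0001": "CARTON-001",
    "CARTON-001": "CASE-001",
    "CASE-001": "PALLET-001",
}
DERIVED_FROM: Dict[str, str] = {"VIAL-0001": "DP-001", "DP-001": "DS-001"}
VIAL = "VIAL-0001"


def walk(start: str, step: Dict[str, str]) -> List[str]:
    """Follow a single-valued relation until it runs out; return the nodes visited."""
    path, node = [], start
    while node in step:
        node = step[node]
        path.append(node)
    return path


@dataclass
class Check:
    cq: str
    group: str
    run: Callable[[], Tuple[bool, str]]  # returns (passed, detail)


def cq13() -> Tuple[bool, str]:
    package = walk(VIAL, PACKED_IN)
    return package == ["CARTON-001", "CASE-001", "PALLET-001"], f"package = {package}"


def cq14() -> Tuple[bool, str]:
    # ASK-style question: is anything both a container of the vial and an ancestor of it?
    ask = bool(set(walk(VIAL, PACKED_IN)) & set(walk(VIAL, DERIVED_FROM)))
    return not ask, f"ASK = {ask}"  # the check passes when the answer is False


def run_harness(checks: List[Check]) -> int:
    """Print one RESULT/DETAIL line per question; return 0 only if every check passes."""
    print(f"{'CQ':<7}{'GROUP':<18}{'RESULT':<8}DETAIL")
    passed = 0
    for check in checks:
        ok, detail = check.run()
        passed += ok
        print(f"{check.cq:<7}{check.group:<18}{'PASS' if ok else 'FAIL':<8}{detail}")
    print(f"{passed}/{len(checks)} competency questions pass")
    return 0 if passed == len(checks) else 1


# A command-line script would hand this value to sys.exit().
exit_code = run_harness([Check("CQ-13", "packaging", cq13), Check("CQ-14", "packaging", cq14)])
```

Returning the exit code rather than calling sys.exit inside the loop keeps the same function usable from a test runner as well as from a CI step.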
Truth as a build rule
The non-functional requirement that "every snippet is a true excerpt" is not a stylistic promise; it is enforced. The snippets are excerpts of bioproc.ttl, instances.ttl, and shapes.ttl, and validate.py runs those same files. A change to the model that breaks a competency question therefore fails the build, and a snippet whose text drifts from the files is caught by review against the live output. The numbers in this book (98.611% monomer, a total LRV of 8.7, 11 ancestors, 2120 triples closing to 7137) are not illustrative round numbers; they are what the harness prints. When a later chapter shows you a Turtle block, you are reading the model, not a sketch of it.
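Part of that review can be mechanized. The sketch below extracts every fenced snippet from a chapter and flags any that does not occur, whitespace-normalized, inside one of the dataset files. It rests on two assumptions: chapters fence their Turtle and SPARQL snippets with a language tag, and whitespace differences should not count as drift. The fence convention, the file contents, and the function name drifted_snippets are hypothetical and do not describe the book's actual tooling. A stricter check would parse both sides and compare triples rather than text.

```python
import re
from typing import Dict, List

TICKS = "`" * 3  # the Markdown fence marker, spelled out so this source holds no literal fence
FENCE = re.compile(TICKS + r"(?:turtle|ttl|sparql)\n(.*?)" + TICKS, re.DOTALL)


def normalise(text: str) -> str:
    """Collapse runs of whitespace so indentation changes do not count as drift."""
    return " ".join(text.split())


def drifted_snippets(chapter: str, sources: Dict[str, str]) -> List[str]:
    """Return fenced snippets that are not a (whitespace-normalised) excerpt of any source."""
    bodies = [normalise(text) for text in sources.values()]
    return [
        snippet
        for snippet in FENCE.findall(chapter)
        if not any(normalise(snippet) in body for body in bodies)
    ]


# Hypothetical file contents and chapter text, for illustration only.
SOURCES = {
    "instances.ttl": "bp:DS-001 bp:derivedFrom bp:POLpool-001 .\n"
                     "bp:DP-001 bp:derivedFrom bp:DS-001 .\n",
}
CHAPTER = (
    f"{TICKS}turtle\nbp:DP-001 bp:derivedFrom bp:DS-001 .\n{TICKS}\n"
    f"{TICKS}turtle\nbp:DP-001   bp:derivedFrom bp:DS-002 .\n{TICKS}\n"
)

for snippet in drifted_snippets(CHAPTER, SOURCES):
    print("not a true excerpt:", normalise(snippet))
```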
The unsolved part: green is necessary, not sufficient
A passing harness proves the model is consistent and complete against its own questions. It does not prove the model is true. validate.py confirms WCB-CHO-001 is fully characterized and within its passage limit (CQ-17, CQ-18); it cannot confirm the vial in the freezer is actually WCB-CHO-001 and not a mislabeled neighbor. It confirms every quantity carries a unit (CQ-19); it cannot confirm the analyst entered the right number. The harness closes the loop between requirements and model; the loop between model and reality is closed by wet-lab characterization, data integrity, and human judgment — the limits the verdict returns to. Keep the distinction in view as the green table reappears, chapter after chapter: it certifies that the model does what it promised, which is a smaller and more honest claim than that the model is right.
Why it matters
A running example plus an executable harness is what makes the rest of this book trustworthy rather than merely plausible. Every modeling decision ahead can be checked the same way: change the model, run the harness, read 23 lines. If a class earns its place, a competency question depends on it and stays green; if it does not, nothing breaks when you remove it — which is the cleanest possible test of whether a class belongs. The medicine gives the book its concreteness; the harness gives it its rigor.
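That removal test is easy to automate. The sketch below drops every triple that mentions a class, reruns the checks, and reports which ones flip from pass to fail. It assumes a toy graph of string triples and two hypothetical checks, modeled loosely on CQ-01 and CQ-21. The class bp:LegacyVessel and the helper names are invented for illustration. An empty result marks a class that no competency question depends on.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
CheckFn = Callable[[Set[Triple]], bool]

# Hypothetical toy graph; bp:LegacyVessel stands for a class no question uses.
GRAPH: Set[Triple] = {
    ("DS-001", "bp:derivedFrom", "POLpool-001"),
    ("BR-101", "rdf:type", "bp:ProductionBioreactor"),
    ("BR-101", "rdf:type", "bp:LegacyVessel"),
    ("bp:ProductionBioreactor", "rdfs:subClassOf", "bp:Equipment"),
}

# Hypothetical checks, each a predicate over the graph.
CHECKS: Dict[str, CheckFn] = {
    "lineage": lambda g: ("DS-001", "bp:derivedFrom", "POLpool-001") in g,
    "vessel-type": lambda g: ("BR-101", "rdf:type", "bp:ProductionBioreactor") in g,
}


def without_class(graph: Set[Triple], cls: str) -> Set[Triple]:
    """Remove every triple that mentions cls: its instances' typing and its axioms."""
    return {t for t in graph if cls not in t}


def broken_by_removing(graph: Set[Triple], checks: Dict[str, CheckFn], cls: str) -> List[str]:
    """Names of checks that pass on the full graph but fail once cls is removed."""
    pruned = without_class(graph, cls)
    return sorted(name for name, check in checks.items() if check(graph) and not check(pruned))


for cls in ["bp:ProductionBioreactor", "bp:LegacyVessel"]:
    broken = broken_by_removing(GRAPH, CHECKS, cls)
    print(cls, "->", broken or "nothing breaks: a candidate for removal")
```

The toy removes only asserted triples. Against a reasoned graph, the same test would need to rerun the closure after each removal, because a class can also be reached through inferred typing.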
In the real world
A single, fully worked example backed by a runnable validator is how serious ontologies are actually shipped and regression-tested — the pattern SAMOD formalizes and the one large vocabulary projects use to keep a model from rotting as it grows [2][3]. On real platforms the same derivedFrom walk is a Palantir Foundry object link or a Neo4j Cypher traversal, and the same SHACL gate runs in a triplestore's validation step; the dataset here is small enough to read in an afternoon and complete enough to exercise every question a production graph must answer, which is exactly what a teaching artifact should be.
Key terms
- Running example: the one CHO mAb campaign (WCB-CHO-001 → … → DP-001, with OOS sibling DP-004) modeled end to end, so every chapter builds on the same concrete graph.
- Proof harness (validate.py): the program that parses, reasons over, and runs the 23 competency questions against the dataset, printing a pass/fail table and gating on it.
- OWL-RL closure: the rule-based reasoning step that derives entailed facts (transitive lineage, equipment typing) before the queries run.
- True excerpt: a code snippet that is a verbatim part of the loadable dataset, not a simplified illustration; the book's standing rule.
- Green baseline: the all-pass state of the harness (23/23) that every later chapter must preserve.
Where this leads
The requirements are written and the example loads green. Now the lifecycle proper begins, and its first rule is: do not build what you can borrow. The next chapter, The Upper Spine: Continuants, Occurrents, and Why Everyone Builds on BFO, opens Part II (Reuse) at the very top of the model. It covers the small, domain-neutral set of categories every later class hangs from, and the first and most consequential reuse decision an ontology makes.