The Running Example and the Proof Harness

📍 Where we are: Part I · Specification — the second move. The requirements are written. Now meet the one campaign the whole book models, the files that hold it, and the harness that proves the model answers its 23 competency questions.

The previous chapter wrote down what the ontology must answer. This one introduces what it answers about — a single monoclonal-antibody manufacturing campaign, the same batch the rest of the series follows — and the runnable proof harness that keeps the ORSD honest. Every Turtle, SPARQL, and SHACL snippet in this book is a true excerpt of the files introduced here, and every competency question in the catalog is a test those files pass.

The simple version

Instead of a dozen toy examples, this book follows one medicine all the way through, like a single patient's chart that every specialist annotates. One vial of frozen cells becomes a batch, becomes a purified substance, becomes filled vials — and at each step the chart gains a few facts. The "proof harness" is just a program that re-reads the whole chart and checks that all 23 questions in the brief still have answers. If you change one fact, the program tells you which questions you broke.

Requirements as tests: each competency question is bound in cq-catalog.json to the artifact that answers it, and validate.py runs all 23 as pass/fail acceptance tests. Original diagram by the authors, created with AI assistance.

One campaign, end to end

The running example is one CHO-cell monoclonal antibody, carried from the discovery target to the patient's cold-chain vial. Its genealogy is the spine the whole graph hangs from:

a working cell bank WCB-CHO-001 seeds a seed train SEED-001, which inoculates the production bioreactor batch BATCH-2026-001, whose harvest is clarified (CLAR-001), captured on Protein A (PApool-001), cleared of virus (VIpool-001, VFpool-001), polished (POLpool-001), and concentrated into a drug-substance lot DS-001, which is filled into drug-product lots DP-001 and DP-002.

Each step in that chain becomes one bp:derivedFrom edge. Because that relation is transitive, DS-001 still traces back to WCB-CHO-001 in a single walk even though every intermediate sits between them, which is what makes CQ-01 and CQ-02 answerable. The campaign also carries a deliberately out-of-spec (OOS) sibling: drug-product lot DP-004, derived from a separate substance lot DS-004 but sharing the cell bank WCB-CHO-001. That OOS lot lets the model answer the question an investigator actually asks: what else shares this lot's lineage? (CQ-04). It also lets the release gate demonstrate that it fails on exactly one path (CQ-11).
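The transitive walk described above can be sketched in plain Python. This is an illustration only, not the book's harness, which gets the same effect from an OWL-RL reasoner. It uses only the lots named in the prose, it assumes the prose order of the pools is the derivation order, and it assumes a direct DS-004 to WCB-CHO-001 edge because the text does not name the intermediates on that branch. The helper names (`closure`, `ancestors`, `descendants`, `drug_product_siblings`) are hypothetical.

```python
from collections import defaultdict

# Direct derivedFrom edges as (child, parent), transcribed from the prose.
DERIVED_FROM = [
    ("SEED-001", "WCB-CHO-001"),
    ("BATCH-2026-001", "SEED-001"),
    ("CLAR-001", "BATCH-2026-001"),
    ("PApool-001", "CLAR-001"),
    ("VIpool-001", "PApool-001"),
    ("VFpool-001", "VIpool-001"),
    ("POLpool-001", "VFpool-001"),
    ("DS-001", "POLpool-001"),
    ("DP-001", "DS-001"),
    ("DP-002", "DS-001"),
    ("DS-004", "WCB-CHO-001"),  # ASSUMED shortcut: intermediates not named in the text
    ("DP-004", "DS-004"),
]


def closure(start, edges):
    """Return every node reachable from `start` by following `edges`."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
    seen, stack = set(), [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def ancestors(node):
    """Everything `node` is (transitively) derived from."""
    return closure(node, DERIVED_FROM)


def descendants(node):
    """Everything (transitively) derived from `node`."""
    return closure(node, [(parent, child) for child, parent in DERIVED_FROM])


def drug_product_siblings(lot):
    """Other DP-* lots that share at least one ancestor with `lot`."""
    shared = set()
    for anc in ancestors(lot):
        shared |= descendants(anc)
    return sorted(n for n in shared if n.startswith("DP-") and n != lot)


print("WCB-CHO-001" in ancestors("DS-001"))
print(drug_product_siblings("DP-004"))
```

Because the sketch holds only the nodes named in this chapter, its ancestor counts will not match the row counts the real dataset produces.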

Throughout, we use the illustrative namespace bp: for https://example.org/bioproc#, the same one the open-source companion book uses, so a class here is the same class there.

The dataset: six files that load as one

The campaign lives in examples/platform/ontology/ as a small set of files that parse into a single RDF graph:

| File | What it holds |
| --- | --- |
| `bioproc.ttl` | the local `bp:` vocabulary: classes and relations across the BFO spine (Material / Equipment / Quality / RealizableEntity / InformationArtifact / Process) with their load-bearing OWL axioms |
| `align.ttl` | the alignment up to public ontologies: BFO, IOF Core and biopharma, OBO biology, Allotrope, QUDT, PROV, SOSA (the subject of Part II) |
| `instances.ttl` | the running example as individuals: every node in the genealogy above, plus the design space, analytics, release panel, serialization, distribution, and provenance |
| `shapes.ttl` | the SHACL gates: the release specification, the finish gate, the cell-bank gate, and the disjointness guards |
| `cq-catalog.json` | the executable ORSD: the 23 competency questions, each bound to the artifact that answers it |
| `queries/CQ-*.rq` | one SPARQL file per query-backed competency question |

The Turtle, SPARQL, and JSON are language-neutral, so the Korean edition quotes the very same files — there is no translated copy of the model, only of the prose around it.

The harness: `validate.py`

validate.py is the inspector from the previous chapter's analogy. It does three things, in order:

1. Parse the three model files (bioproc.ttl, align.ttl, and instances.ttl) into one graph.
2. Reason over it with an OWL-RL closure [1], which derives the entailed facts. The most important are the long-range transitive derivedFrom edges and the inference that equipment is a BFO material entity (the structural competency question CQ-22).
3. Run the competency questions from cq-catalog.json and print a pass/fail line per question: each query-backed one as a SPARQL evaluation [2], each gate one as a SHACL validation [3], each structural one as a reasoner check.
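The dispatch in the third step can be sketched as follows. This is a minimal illustration under stated assumptions, not the real validate.py: the catalog fields (`id`, `group`, `kind`) are a hypothetical schema, since the chapter does not show the layout of cq-catalog.json, and the handlers are stubs standing in for the SPARQL, SHACL, and reasoner checks.

```python
import json

# HYPOTHETICAL catalog shape; the real cq-catalog.json schema is not shown here.
CATALOG_JSON = """
[
  {"id": "CQ-01", "group": "lineage",    "kind": "query"},
  {"id": "CQ-08", "group": "release",    "kind": "gate"},
  {"id": "CQ-22", "group": "structural", "kind": "structural"}
]
"""


def run_catalog(catalog, handlers):
    """Run each question through the handler registered for its kind.

    Returns (report_lines, exit_code); exit_code is 0 only if every question passes.
    """
    lines, failures = [], 0
    for cq in catalog:
        passed, detail = handlers[cq["kind"]](cq)
        if not passed:
            failures += 1
        result = "PASS" if passed else "FAIL"
        lines.append(f"{cq['id']:<6} {cq['group']:<15} {result:<6} {detail}")
    return lines, (0 if failures == 0 else 1)


# Stub handlers: each returns (passed, detail). They perform no real evaluation.
STUB_HANDLERS = {
    "query": lambda cq: (True, "stub: query returned the expected rows"),
    "gate": lambda cq: (True, "stub: shapes conform"),
    "structural": lambda cq: (True, "stub: entailment present"),
}

report, exit_code = run_catalog(json.loads(CATALOG_JSON), STUB_HANDLERS)
print("\n".join(report))
print("exit code:", exit_code)
```

Keeping one handler per kind means a new competency question is a catalog entry rather than new control flow. The stubs mark the places where validate.py would call its SPARQL, SHACL, and reasoner checks.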

It exits zero only if all 23 pass. Here is its real output — the green baseline this entire book is engineered to keep:

[1] parsed 2120 triples (bioproc + align + instances)
[2] reasoned: 2120 -> 7137 triples after OWL-RL closure
[3] competency questions (ORSD v1.0.0 acceptance tests):

      CQ     GROUP           RESULT DETAIL
      CQ-01  lineage         PASS   11 row(s)
      CQ-02  impact          PASS   descendant superset of {DP-001, DP-002, DP-004, DS-001} (26 total)
      CQ-03  lineage         PASS   row {batch: BATCH-2026-001, monomer: 98.611} present
      CQ-04  impact          PASS   affected = [DP-001, DP-002]
      CQ-05  trajectory      PASS   material superset of {PApool-001, POLpool-001} (2 total)
      CQ-06  qbd             PASS   parameter superset of {FeedRate, Temperature} (2 total)
      CQ-07  qbd             PASS   row {parameter: FeedRate, lot: DS-001} present
      CQ-08  release         PASS   DS-001 release panel complete and in spec
      CQ-09  release         PASS   ASK = True
      CQ-10  release         PASS   DP-001/DP-002 pass release + finish gates
      CQ-11  release         PASS   OOS [DP-004, DS-004] on path [hmwPct]
      CQ-12  viral           PASS   sum(lrv) = 8.7 over 2 step(s)
      CQ-13  packaging       PASS   package = [CARTON-001, CASE-001, PALLET-001]
      CQ-14  packaging       PASS   ASK = False
      CQ-15  provenance      PASS   claim = [claim-batch-001, claim-vessel-001]
      CQ-16  provenance      PASS   ASK = True
      CQ-17  characterization PASS  WCB-CHO-001 conforms to the cell-bank gate
      CQ-18  characterization PASS  ASK = True
      CQ-19  units           PASS   0 row(s)
      CQ-20  units           PASS   row {host: CHO-host, taxon: NCBITaxon_10029} present
      CQ-21  structural      PASS   row {run: CCP-001, vessel: BR-101, vesselType: ProductionBioreactor} present
      CQ-22  structural      PASS   transitive lineage + equipment-is-material inferred
      CQ-23  structural      PASS   Batch-as-process and Batch-as-bioreactor both caught

      23/23 competency questions PASS

ALL CHECKS PASSED

Read a few lines and the design becomes concrete. CQ-01 walks 11 ancestors from DS-001. CQ-04 returns exactly the two siblings that share the cell bank with the failed lot. CQ-11 shows the release gate failing on only hmwPct for only the two -004 lots — every other panel value in spec — which is what a realistic out-of-spec event looks like. CQ-14 is an ASK that is correctly False: nothing a vial is packed inside is also something it was made from, so containment and genealogy stay separate. The DETAIL column is the evidence; the RESULT column is the contract.
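The CQ-14 result can be pictured as a set-intersection check. The sketch below is an illustration in plain Python rather than the book's SPARQL. The packaging names come from the CQ-13 line of the output; the genealogy set is limited to nodes named in this chapter, and it assumes the vial belongs to lot DP-001, which the chapter does not state. `ask_shared_node` is a hypothetical helper.

```python
# Packaging chain, from the CQ-13 line of the harness output.
PACKED_INSIDE = {"CARTON-001", "CASE-001", "PALLET-001"}

# Materials the vial's lot was made from, limited to nodes named in this chapter.
# ASSUMPTION: the vial belongs to DP-001; the chapter does not say which lot.
MADE_FROM = {
    "DS-001", "POLpool-001", "VFpool-001", "VIpool-001", "PApool-001",
    "CLAR-001", "BATCH-2026-001", "SEED-001", "WCB-CHO-001",
}


def ask_shared_node(left, right):
    """Behave like a SPARQL ASK: True if any node appears in both sets."""
    return bool(left & right)


print(ask_shared_node(PACKED_INSIDE, MADE_FROM))  # False
```

A False answer is the passing result here: the question asks whether containment and genealogy ever overlap, and the model is correct only if they never do.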

Truth as a build rule

The non-functional requirement that "every snippet is a true excerpt" is not a stylistic promise — it is enforced. Because the snippets are excerpts of bioproc.ttl, instances.ttl, and shapes.ttl, and because validate.py runs the same files, a snippet that drifts from the dataset either breaks a competency question (and fails the build) or is caught by review against the live output. The numbers in this book — 98.611 % monomer, an 8.7 total LRV, 11 ancestors, 2120 triples closing to 7137 — are not illustrative round numbers; they are what the harness prints. When a later chapter shows you a Turtle block, you are reading the model, not a sketch of it.
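One way the review half of that rule could be automated is a verbatim-excerpt check. The sketch below is an assumption about how such a check might look, not a description of the book's tooling: it normalizes whitespace and tests whether a snippet's lines appear contiguously in a source file. The function names and the sample Turtle (including the class name `bp:DrugSubstanceLot`) are hypothetical.

```python
def normalize(text):
    """Collapse runs of whitespace and drop blank lines so layout changes do not matter."""
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def is_true_excerpt(snippet, source):
    """True if the snippet's lines appear contiguously, in order, in `source`."""
    needle, haystack = normalize(snippet), normalize(source)
    if not needle:
        return False
    width = len(needle)
    return any(
        haystack[i:i + width] == needle
        for i in range(len(haystack) - width + 1)
    )


# Illustrative stand-in for a dataset file; NOT quoted from instances.ttl.
SOURCE = """
bp:DS-001 a bp:DrugSubstanceLot ;
    bp:derivedFrom bp:POLpool-001 .
"""

print(is_true_excerpt("bp:derivedFrom   bp:POLpool-001 .", SOURCE))  # True
print(is_true_excerpt("bp:derivedFrom bp:POLpool-002 .", SOURCE))    # False
```

A line-based comparison catches drift in identifiers and values while tolerating re-indentation; it would not recognize a snippet that reflows one statement across different line breaks.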

The unsolved part: green is necessary, not sufficient

A passing harness proves the model is consistent and complete against its own questions. It does not prove the model is true. validate.py confirms WCB-CHO-001 is fully characterized and within its passage limit (CQ-17, CQ-18); it cannot confirm the vial in the freezer is actually WCB-CHO-001 and not a mislabeled neighbor. It confirms every quantity carries a unit (CQ-19); it cannot confirm the analyst entered the right number. The harness closes the loop between requirements and model; the loop between model and reality is closed by wet-lab characterization, data integrity, and human judgment — the limits the verdict returns to. Keep the distinction in view as the green table reappears, chapter after chapter: it certifies that the model does what it promised, which is a smaller and more honest claim than that the model is right.

Why it matters

A running example plus an executable harness is what makes the rest of this book trustworthy rather than merely plausible. Every modeling decision ahead can be checked the same way: change the model, run the harness, read 23 lines. If a class earns its place, a competency question depends on it and stays green; if it does not, nothing breaks when you remove it — which is the cleanest possible test of whether a class belongs. The medicine gives the book its concreteness; the harness gives it its rigor.

In the real world

A single, fully worked example backed by a runnable validator is how serious ontologies are actually shipped and regression-tested — the pattern SAMOD formalizes and the one large vocabulary projects use to keep a model from rotting as it grows [2][3]. On real platforms the same derivedFrom walk is a Palantir Foundry object link or a Neo4j Cypher traversal, and the same SHACL gate runs in a triplestore's validation step; the dataset here is small enough to read in an afternoon and complete enough to exercise every question a production graph must answer, which is exactly what a teaching artifact should be.

Key terms

Running example — the one CHO mAb campaign (WCB-CHO-001 → … → DP-001, with OOS sibling DP-004) modeled end to end, so every chapter builds on the same concrete graph.
Proof harness (validate.py) — the program that parses, reasons over, and runs the 23 competency questions against the dataset, printing a pass/fail table and gating on it.
OWL-RL closure — the rule-based reasoning step that derives entailed facts (transitive lineage, equipment typing) before the queries run.
True excerpt — a code snippet that is a verbatim part of the loadable dataset, not a simplified illustration; the book's standing rule.
Green baseline — the all-pass state of the harness (23/23) that every later chapter must preserve.

Where this leads

The requirements are written and the example loads green. Now the lifecycle proper begins — and its first rule is do not build what you can borrow. The next chapter, The Upper Spine: Continuants, Occurrents, and Why Everyone Builds on BFO, opens Part II (Reuse) at the very top of the model: the small, domain-neutral set of categories every later class hangs from, and the first and most consequential reuse decision an ontology makes.
