
Validation: Competency Questions as Executable Queries

📍 Where we are: Part VI · Validation — the phase where the model is judged, not built. The governing methodology is SAMOD's test-first loop (write the question, model the slice, test it) closing over the whole ontology, and NeOn's principle that an ontology is validated by answering its requirements. This chapter is where the executable ORSD stops being a promise and becomes a run.

For five parts we have laid edges through a single mAb campaign: a working cell bank derived from its master and research banks, a seed train, a bioreactor batch, a capture pool, two orthogonal viral-clearance pools, a polished pool, the drug substance, and the drug-product lots filled from it. Each chapter trusted that those edges could answer a real manufacturing question — trace this vial to its cell bank, scope a failed lot's fate, prove the viral barriers sum. Validation is the moment we stop trusting and start running. The specification chapter fixed 23 competency questions in cq-catalog.json and called them an executable requirements brief; the running example showed all 23 pass as a single green table. This chapter opens that table and reads the queries underneath it — the actual SPARQL that turns each manufacturing question into a tested fact over a loaded, reasoned graph.

The simple version

A batch record promises that you can trace a vial back to the exact cell bank it came from. The quality reviewer does not re-read the promise; she runs the trace. A competency question is the promise the batch record makes ("can you tell me everything this vial was made from?"), the SPARQL query is the trace being run, and the expected result in the catalog is the answer the record must produce: for the drug substance the vials are filled from, eleven ancestors ending at the research cell bank. This chapter is the walkthrough where the reviewer runs every trace the campaign promises and writes PASS or FAIL next to each. One trace is supposed to come back empty, because keeping what-it-is-packed-in separate from what-it-was-made-from is itself a requirement.

Start from the questions

This chapter is the home of the question-as-query, so it carries the largest share of the ORSD catalog, and every question is a real thing an investigator, a release reviewer, or a recall team asks about this antibody. The lineage questions are the traces a deviation investigation starts with: what does this lot derive from, to any depth? (CQ-01) and what is the originating bioreactor batch and its monomer value? (CQ-03). The impact questions are the ones a recall team fires first: what across the whole campaign descends from the working cell bank? (CQ-02) and which sibling lots share a failed lot's fate? (CQ-04). The trajectory question (CQ-05) follows a critical quality attribute down the purification chain; the QbD questions (CQ-06, CQ-07) cross from a feed-rate parameter set in development to the monomer purity of a released lot; the viral question (CQ-12) sums two orthogonal clearance barriers; the packaging pair (CQ-13, CQ-14) keeps what a vial is packed inside separate from what it was made from; provenance (CQ-15, CQ-16), characterization (CQ-18), and the units/structural questions (CQ-19, CQ-20, CQ-21) round out the set. Each is judged not by prose but by whether its query returns exactly what the catalog says it must.

Validate: the lineage walk answers CQ-01

The signature query is the lineage walk — reconstruct everything a lot derives from, however many hops away. This is the question a manufacturing record exists to answer: a cell bank is the single root every batch in the campaign descends from, and a misidentified or contaminated bank propagates with full confidence to every downstream material, so being able to walk back to it from any lot is not a convenience but a regulatory and safety necessity. Because derivedFrom is transitive, the walk is a single SPARQL property path: (bp:derivedFrom)+ follows one-or-more hops, so the same line works whether a lineage is three steps or twenty [1]. queries/CQ-01.rq walks back from the drug substance and binds the type of each ancestor:

```sparql
# queries/CQ-01.rq: lineage walk. Everything DS-001 derives from, to any depth, in one property path.
# (bp:derivedFrom)+ = one-or-more hops; works whether the lineage is 3 steps or 20.
PREFIX bp: <https://example.org/bioproc#>
SELECT ?ancestor ?type WHERE {
  bp:DS-001 (bp:derivedFrom)+ ?ancestor .
  ?ancestor a ?type .
  FILTER(STRSTARTS(STR(?type), STR(bp:)))
} ORDER BY ?ancestor
```

The catalog's check is { "type": "row_count", "value": 11 }, and over the reasoned graph the query returns exactly eleven ancestors — every unit-operation intermediate the coarse CSV chain collapses. validate.py reports the pass as a single row of its acceptance table:

```text
CQ-01  lineage     PASS    11 row(s)
```

Those eleven ancestors are the eleven bound ?ancestor rows the SELECT walks back through (BATCH-2026-001, CLAR-001, MCB-CHO-001, PApool-001, POLpool-001, RCB-CHO-001, SEED-001, SEEDFLASK-001, VFpool-001, VIpool-001, WCB-CHO-001). The depth retraces the whole physical process in reverse: DS-001 traces back through the polishing pool (POLpool-001), the two orthogonal viral-clearance pools (VFpool-001, VIpool-001), the protein-A capture pool and clarified harvest (PApool-001, CLAR-001), the bioreactor batch, the seed train (SEED-001, SEEDFLASK-001), and the three cell-bank tiers — working, master, and research bank (WCB-CHO-001, MCB-CHO-001, RCB-CHO-001). Those three tiers are not an arbitrary hierarchy: the research bank seeds the master bank, the master bank seeds the working bank, and only the working bank is expanded into production, so that the genetically characterized origin of the antibody is preserved across the years a commercial product is made. The long-range reachability from DS-001 to RCB-CHO-001 is inferred by the transitive closure, never asserted by hand. That inference is exactly what CQ-01's PASS certifies: lay the edges faithfully — one per real material transformation — and the lineage is computed, not reconstructed by spreadsheet archaeology when an investigation needs it under deadline.
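What (bp:derivedFrom)+ computes can be checked outside any triple store. The Python below is a minimal sketch, not the book's tooling. It assumes the derivedFrom edges form the single linear chain described above, with SEEDFLASK-001 placed upstream of SEED-001 (an ordering this sketch assumes), and it computes the one-or-more-hop closure with a breadth-first search that reports each ancestor once.

```python
from collections import deque

# Hypothetical toy data: child -> list of parents (a pooling fork would list several).
# The chain follows the order this chapter describes; SEEDFLASK-001 upstream of
# SEED-001 is an assumption made for this sketch.
DERIVED_FROM = {
    "DS-001": ["POLpool-001"],
    "POLpool-001": ["VFpool-001"],
    "VFpool-001": ["VIpool-001"],
    "VIpool-001": ["PApool-001"],
    "PApool-001": ["CLAR-001"],
    "CLAR-001": ["BATCH-2026-001"],
    "BATCH-2026-001": ["SEED-001"],
    "SEED-001": ["SEEDFLASK-001"],
    "SEEDFLASK-001": ["WCB-CHO-001"],
    "WCB-CHO-001": ["MCB-CHO-001"],
    "MCB-CHO-001": ["RCB-CHO-001"],
}


def ancestors(start, edges):
    """Every node reachable from `start` in one or more hops, like (bp:derivedFrom)+."""
    seen = set()
    queue = deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # an ancestor reached along two paths is reported once
        seen.add(node)
        queue.extend(edges.get(node, []))
    return seen


lineage = sorted(ancestors("DS-001", DERIVED_FROM))  # sorted, as ORDER BY ?ancestor sorts rows
print(len(lineage), lineage[0], lineage[-1])  # 11 BATCH-2026-001 WCB-CHO-001
```

Sorting the result reproduces the eleven names in the same order as the list above. A pooling fork, where a material has two parents that share an ancestor, would not inflate the count, because the search skips nodes it has already seen.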

Validate: impact analysis answers CQ-02 and CQ-04

Impact analysis is the inverse, outward query a recall investigation actually fires. CQ-02 matches every node whose (bp:derivedFrom)+ path reaches WCB-CHO-001, so it returns the whole campaign that descends from that bank: every batch, pool, substance, and product lot that traces to it, which is the set a cell-bank-level concern (a misidentification flag, a sterility question) puts at risk in a single step. CQ-04 is the harder one. When DP-004 fails release, the question is what else is affected? The graph answers by walking up DP-004's lineage to a shared ancestor, then back down to every other drug product that traces to it:

```sparql
# queries/CQ-04.rq: impact analysis. When DP-004 fails, what shares its fate?
PREFIX bp: <https://example.org/bioproc#>
SELECT DISTINCT ?affected WHERE {
  bp:DP-004 (bp:derivedFrom)+ ?shared .   # an ancestor of the failed lot
  ?affected (bp:derivedFrom)+ ?shared .   # anything else derived from it
  ?affected a bp:DrugProduct .
  FILTER(?affected != bp:DP-004)
} ORDER BY ?affected
```

The failure that drives CQ-04 is concrete: DP-004 comes from the sibling lineage where polishing did not clear enough high-molecular-weight aggregate, so its hmwPct lands at 2.41 % against a 2.0 % release ceiling while its monomer purity stays in spec. That is the realistic out-of-spec mode, and it makes "what else is affected?" an urgent question rather than a hypothetical. The catalog demands { "type": "equals", "var": "affected", "values": ["DP-001", "DP-002"] }, and validate.py returns precisely those two siblings: the released lots filled from DS-001, which share DP-004's WCB-CHO-001 cell bank. The equals check prints the matched set in the acceptance table:

```text
CQ-04  impact      PASS    affected = [DP-001, DP-002]
```

Because every batch in the campaign traces to that one cell bank, a contamination concern is answerable across the whole campaign in a single traversal — the difference between a recall scoped by query and a blind campaign-wide quarantine. The forward fork the model carries (bp:DS-001 bp:fillsInto bp:DP-001 , bp:DP-002) is what makes the affected set complete by construction rather than by luck: it is the same shared-fate fork in the forward direction, mirroring the backward pooling forks the downstream chapters laid, so a single drug substance fanning out into multiple filled lots is fully traversable. CQ-04's equals check — not merely contains — is what holds the model to returning every sibling and no spurious one, which is exactly the standard a recall scope has to meet to be defensible. The same walk would also surface siblings sharing a capture resin or a filter if the failure traced to a consumable rather than the bank; here the failed path runs through the shared cell bank, so the bank is the ancestor the walk lands on.
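CQ-02 and CQ-04 rest on the same primitive, ancestor reachability, read in two directions. The sketch below uses a hypothetical toy graph: DP-004's own drug substance and polishing pool are not named in this chapter, so DS-SIB-hyp and POLpool-SIB-hyp are invented stand-ins, and the steps between each polishing pool and the cell bank are collapsed into one edge.

```python
from collections import deque

# Hypothetical toy data. DS-SIB-hyp and POLpool-SIB-hyp stand in for DP-004's own
# lineage, which the chapter does not name; steps between each polishing pool and the
# cell bank are collapsed into a single edge.
DERIVED_FROM = {
    "DP-001": ["DS-001"],
    "DP-002": ["DS-001"],
    "DP-004": ["DS-SIB-hyp"],
    "DS-001": ["POLpool-001"],
    "DS-SIB-hyp": ["POLpool-SIB-hyp"],
    "POLpool-001": ["WCB-CHO-001"],
    "POLpool-SIB-hyp": ["WCB-CHO-001"],
}
TYPES = {"DP-001": "DrugProduct", "DP-002": "DrugProduct", "DP-004": "DrugProduct"}


def ancestors(start, edges):
    """Nodes reachable from `start` in one or more derivedFrom hops."""
    seen, queue = set(), deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(edges.get(node, []))
    return seen


def descendants(root, edges):
    """CQ-02 shape: every node whose derivedFrom+ path reaches `root`."""
    return {node for node in edges if root in ancestors(node, edges)}


def impact(failed, edges, types):
    """CQ-04 shape: other drug products sharing at least one ancestor with `failed`."""
    shared = ancestors(failed, edges)
    return sorted(
        node
        for node in edges
        if types.get(node) == "DrugProduct"
        and node != failed
        and ancestors(node, edges) & shared
    )


print(sorted(descendants("WCB-CHO-001", DERIVED_FROM)))
print(impact("DP-004", DERIVED_FROM, TYPES))  # ['DP-001', 'DP-002']
```

Because impact intersects whole ancestor sets, any shared ancestor qualifies. Here only the cell bank is shared; in a lineage that forked later, a shared pool or batch would pull in the same siblings.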

Validate: the cross-lifecycle walk answers CQ-07

The third class crosses the seams between development, manufacturing, and release — the question only this book's modeling makes askable. It is the Quality-by-Design promise made queryable: in development a design-of-experiments study finds that feed rate moves monomer purity, and that finding is recorded as an authored affectsQuality edge from a process parameter to a quality attribute rather than buried in a study report's prose. CQ-07 starts at that critical process parameter, follows its development-era affectsQuality link, finds the run that realized that parameter as a setpoint on the production bioreactor, steps to the batch the run output, and walks derivedFrom forward to the released drug-substance lot carrying the monomer result:

```sparql
# queries/CQ-07.rq: from a CPP, through affectsQuality, to the run that REALIZED it,
# to the batch it output, forward down derivedFrom to the released DS lot.
PREFIX bp: <https://example.org/bioproc#>
SELECT ?parameter ?attribute ?lot WHERE {
  ?parameter bp:affectsQuality ?attribute .
  ?setting   bp:parameterType ?parameter .
  ?phase     bp:realizesParameter ?setting .
  ?run       bp:hasPhase ?phase ;
             bp:hasOutput ?batch .
  ?lot       bp:derivedFrom+ ?batch ;
             bp:monomerPct ?m ;
             a bp:DrugSubstance .
}
```

The catalog's check is { "type": "row_match", "match": { "parameter": "FeedRate", "lot": "DS-001" } } — and the query lands the feed-rate CPP on the released lot it ultimately shaped, joining three systems a fragmented plant keeps in three dialects: a development study, a plant historian, and a QC release record. The row_match check confirms the required row is present (the bound row carries ?attribute = MonomerPct-CQA, the monomer attribute the feed rate affects):

```text
CQ-07  qbd         PASS    row {parameter: FeedRate, lot: DS-001} present
```

That is the single query the whole book was building toward [2]: a development-era affectsQuality assertion, the realized feed-rate setting on this run, and the QC verdict on DS-001, joined in one statement. Without the shared model these three live in three filing systems and the question "did the parameter we tuned in development actually land the purity we promised at release?" is a manual cross-reference; with it, the QbD knowledge is a connected, queryable structure.
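CQ-07 is a conjunctive join, and its join order is the chain the prose walks. The sketch below evaluates the same basic graph pattern with nested loops over a toy set of triples. SET-hyp, PHASE-hyp, and RUN-hyp are hypothetical names for the setting, phase, and run the chapter does not name; the pools between DS-001 and the batch are collapsed; and the monomer result is represented only as present (LOTS_WITH_MONOMER) so that no value is invented.

```python
# Hypothetical toy triples. SET-hyp, PHASE-hyp and RUN-hyp are invented names; the
# chapter names only FeedRate, MonomerPct-CQA, BATCH-2026-001 and DS-001.
TRIPLES = {
    ("FeedRate", "affectsQuality", "MonomerPct-CQA"),
    ("SET-hyp", "parameterType", "FeedRate"),
    ("PHASE-hyp", "realizesParameter", "SET-hyp"),
    ("RUN-hyp", "hasPhase", "PHASE-hyp"),
    ("RUN-hyp", "hasOutput", "BATCH-2026-001"),
    ("DS-001", "derivedFrom", "POLpool-001"),
    ("POLpool-001", "derivedFrom", "BATCH-2026-001"),  # intermediate pools collapsed
    ("DS-001", "type", "DrugSubstance"),
}
# Stands in for "?lot bp:monomerPct ?m": which lots carry a monomer result (no value invented).
LOTS_WITH_MONOMER = {"DS-001"}


def objects(s, p):
    return {o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p}


def subjects(p, o):
    return {s for (s, p2, o2) in TRIPLES if p2 == p and o2 == o}


def reaches(start, p, target):
    """True if `target` is reachable from `start` via one or more `p` edges (p+)."""
    frontier, seen = set(objects(start, p)), set()
    while frontier:
        node = frontier.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            frontier |= objects(node, p) - seen
    return False


def cq07():
    """Evaluate CQ-07's pattern by following each join in the order the prose walks it."""
    rows = []
    for parameter, pred, attribute in TRIPLES:
        if pred != "affectsQuality":
            continue
        for setting in subjects("parameterType", parameter):
            for phase in subjects("realizesParameter", setting):
                for run in subjects("hasPhase", phase):
                    for batch in objects(run, "hasOutput"):
                        for lot in subjects("type", "DrugSubstance"):
                            if lot in LOTS_WITH_MONOMER and reaches(lot, "derivedFrom", batch):
                                rows.append((parameter, attribute, lot))
    return rows


print(cq07())  # [('FeedRate', 'MonomerPct-CQA', 'DS-001')]
```

Remove any one link (the setting, the phase, the run's output) and the row disappears, which is the point: the answer exists only because the development, manufacturing, and release facts share one graph.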

Figure: Three competency questions, one graph. A property path walks lineage back to the cell bank (CQ-01), an impact query scopes a failure across siblings via the shared cell bank (CQ-04), and a cross-lifecycle link ties a process parameter to a released attribute (CQ-07), each run by validate.py and checked against the catalog. Original diagram by the authors, created with AI assistance.

Validate: the quality trajectory answers CQ-05

CQ-05 asks where did monomer purity get cleaned up? — and the honest answer is a trajectory, not a single point. The high-molecular-weight (HMW) aggregate fraction matters because aggregates are an immunogenicity risk; the release gate caps them, and the whole downstream train works to drive them down. But no single step "owns" the final number: capture raises purity, viral safety protects it, and polishing refines it — purity is a cumulative property of the whole purification sequence, almost emergent, never the act of one operation [3]. Polishing earns its place by separating the antibody from closely-related product variants — the aggregate bp:AGG-1 and the acidic charge variant bp:CHV-1, both modeled as real entities derivedFrom the viral-filtered pool and routedTo the polishing waste stream, so the graph records what was removed and to where, not just that the number improved. The query collects every in-process result carrying an HMW value:

```sparql
# queries/CQ-05.rq: quality-attribute trajectory. The in-process HMW aggregate value at
# each step along the purification chain, so an analyst can see WHERE a quality changed.
PREFIX bp: <https://example.org/bioproc#>
SELECT ?material ?hmw WHERE {
  ?result a bp:InProcessResult ;
          bp:isAbout ?material ;
          bp:hmwPct ?hmw .
} ORDER BY DESC(?hmw)
```

The catalog requires the result to contain PApool-001 and POLpool-001, and the query returns the two-point chain — HMW falling from 4.1 % after capture to 1.4 % after polishing, the loadable, ordered history of one quality attribute. This is derivedFrom read not just as lineage but as a chain of custody for quality: each edge carries the material and the evolving state of its attributes, so the graph answers not only "where did this come from?" but "how did its purity get here, step by step?" — which turns a genealogy into a process-understanding record an analyst can replay against the design space. CQ-05's PASS certifies that history is queryable rather than scattered across per-step test reports filed in different systems. What CQ-05 deliberately does not claim is causation: the trajectory shows where the value changed, but because the steps are not independent — what polishing can achieve depends on what capture left it — the graph cannot apportion credit among them. That apportionment is a causal-inference question the design-space models answer, not the lineage; the chapter's discipline is to record the trajectory and refuse to assert a single "purity-determining step."
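The trajectory is an ordered read plus a containment check. A minimal sketch, assuming each in-process result can be flattened into a record with a material and an HMW percentage (a hypothetical layout standing in for the bp:InProcessResult triples); the two values are the ones reported above.

```python
# Hypothetical flattened records; the HMW values are the two this chapter reports.
RESULTS = [
    {"material": "POLpool-001", "hmw_pct": 1.4},
    {"material": "PApool-001", "hmw_pct": 4.1},
]

# ORDER BY DESC(?hmw): highest aggregate first, i.e. earliest in the purification chain here.
trajectory = sorted(RESULTS, key=lambda r: r["hmw_pct"], reverse=True)
materials = [r["material"] for r in trajectory]

# The catalog check passes when every required material appears; extra rows are allowed.
required = {"PApool-001", "POLpool-001"}
passes = required <= set(materials)

drop = round(trajectory[0]["hmw_pct"] - trajectory[-1]["hmw_pct"], 1)
print(materials, passes, drop)  # ['PApool-001', 'POLpool-001'] True 2.7
```

The drop is a difference between two points. Like the query, it says where HMW fell, not which step deserves the credit.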

Validate: orthogonal viral clearance answers CQ-12

Viral safety is where the model's honesty is tested, because the quantity that matters is not a product property at all but a measure of risk reduction — assurance about a contaminant you hope was never there. You cannot measure "zero virus" in a batch; instead the process is built from orthogonal barriers that clear virus by different mechanisms, so a virus surviving one is unlikely to survive the other. The running example carries the two standard barriers: a low-pH hold (VI-001) that inactivates enveloped viruses, and a virus-retentive nanofiltration step (VF-001) that removes virus particles by size. Because the mechanisms are independent, their log reduction values (LRVs) are independent too, and total clearance is their sum — a sum that is only defensible because the steps are orthogonal [4]. The model keeps a sharp distinction the safety case depends on: each step's LRV (4.5, 4.2) is a validated capability, proven once in dedicated clearance studies and hung off the step via bp:hasClearanceCapability, while the per-batch conditions those studies relied on — the pH 3.6 and 60-minute hold actually achieved this run — are measured qualities inhering in the cleared material. CQ-12 walks each step's validated capability and reads its LRV and mechanism:

```sparql
# queries/CQ-12.rq: viral safety. Each clearance step's VALIDATED log reduction value
# (a capability, not a per-batch measurement) and its mechanism.
PREFIX bp: <https://example.org/bioproc#>
SELECT ?step ?lrv ?mechanism WHERE {
  ?step bp:hasClearanceCapability ?cap .
  ?cap  bp:lrvValue ?lrv ;
        bp:clearanceMechanism ?mechanism .
} ORDER BY ?step
```

The catalog's check is { "type": "sum", "var": "lrv", "round": 1, "value": 8.7 }. The query returns [('VF-001', 4.2), ('VI-001', 4.5)]: the nanofiltration step's size-based removal at 4.2 and the low-pH hold's inactivation of enveloped viruses at 4.5. Their sum, 8.7, matches the derived bp:TotalClearance-mAb-A individual exactly, the orthogonal total established once by bp:VClear-Study. The LRV hangs off the step through bp:hasClearanceCapability precisely so it can never be confused with a per-batch measurement. If a graph recorded an LRV as if it were observed on this batch's product, it would assert something no instrument saw, the single most safety-critical misrepresentation the model can make. CQ-12 validates that the safety argument is a connected, checkable structure, with each LRV linked to its mechanism and the orthogonal sum derivable, rather than a manual tally in two unrelated filing cabinets.
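The catalog's sum check rounds before comparing, and the reason is arithmetic rather than biology. The sketch below applies the check to the two validated LRVs reported above. The distinct-mechanism guard is a crude stand-in, assumed for illustration, for the explicit orthogonalTo link the model carries.

```python
# Validated capabilities as this chapter reports them: (step, LRV, mechanism).
CAPABILITIES = [
    ("VF-001", 4.2, "size-based removal"),
    ("VI-001", 4.5, "low-pH inactivation"),
]
CHECK = {"type": "sum", "var": "lrv", "round": 1, "value": 8.7}

# Crude stand-in for orthogonality (an assumption of this sketch): refuse to add two
# LRVs that claim the same mechanism. The model itself records orthogonalTo explicitly.
mechanisms = [mech for _, _, mech in CAPABILITIES]
if len(set(mechanisms)) != len(mechanisms):
    raise ValueError("LRVs only add across independent mechanisms")

total = sum(lrv for _, lrv, _ in CAPABILITIES)
# 4.5 + 4.2 need not equal 8.7 exactly in binary floating point, so compare after rounding.
passes = round(total, CHECK["round"]) == CHECK["value"]
print(f"sum(lrv) = {round(total, CHECK['round'])} over {len(CAPABILITIES)} step(s):", passes)
```

Without the rounding, an exact equality test on floats could fail a correct safety case for a reason that has nothing to do with the model.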

Figure: Two orthogonal barriers, summed and checked. CQ-12 walks bp:hasClearanceCapability to each step's validated LRV (4.5 for the low-pH hold VI-001, 4.2 for the nanofiltration step VF-001), and the catalog verifies that their sum is the total clearance of 8.7: a derived, queryable safety fact, not a spreadsheet total. Original diagram by the authors, created with AI assistance.

Validate: containment is not genealogy (CQ-13 and CQ-14)

Once the drug substance is filled into vials and the vials are packed for distribution, two transitive hierarchies meet on the same physical object — and the model must keep them sharp. Containment (what a vial is packed inside, now — a pallet holds a case holds a carton holds the vial, and a vial can be repacked) is not genealogy (what it was made from — the cell bank, the batch, the substance, permanent and backward in time). Conflating them would let a packaging change look like a lineage change, which is exactly the kind of error that breaks a recall trace. CQ-13 walks the transitive bp:contains aggregation hierarchy:

```sparql
# queries/CQ-13.rq: packaging containment (NOT lineage). Walk the aggregation hierarchy
# pallet -> case -> carton -> serialized vial with the transitive bp:contains property.
PREFIX bp: <https://example.org/bioproc#>
SELECT ?package ?content WHERE {
  ?package bp:contains+ ?content .
  ?content a bp:SerializedUnit .
} ORDER BY ?package
```

It returns the packages [CARTON-001, CASE-001, PALLET-001], matching the catalog. CQ-14 is the one query that is supposed to come back empty — it asks whether any node is on both the containment and the genealogy path of the serialized vial VIAL-DP-001-000042, and the model is correct only when the answer is False:

```sparql
# queries/CQ-14.rq: containment is NOT genealogy. The model is correct when this ASK is FALSE.
PREFIX bp: <https://example.org/bioproc#>
ASK {
  ?container bp:contains+ bp:VIAL-DP-001-000042 .
  bp:VIAL-DP-001-000042 bp:derivedFrom+ ?container .
}
```

The catalog's check is { "type": "ask", "value": false }, and validate.py reports ASK = False. This is the subtle part of validation that prose alone misses: a correct model here produces no match, because nothing a vial is packed inside is also something it was made from. The carton is part of the vial's present packaging, not its manufacturing history; the cell bank is part of its history, not its packaging. An empty result is the requirement, and CQ-14's PASS is the proof the two transitive hierarchies stay disjoint — so a repack never looks like a re-manufacture, and a recall trace down derivedFrom never wanders off into the shipping container.
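Both packaging questions reduce to reachability over two separate edge sets. The sketch below is a minimal illustration on hypothetical toy data: it assumes the vial derives from lot DP-001 and collapses the genealogy below DS-001. The last lines deliberately record a carton as a manufacturing input, showing the ASK flip to True, which is the modeling error CQ-14 exists to catch.

```python
from collections import deque

# Hypothetical toy data using the chapter's names. Containment: package -> direct contents.
CONTAINS = {
    "PALLET-001": ["CASE-001"],
    "CASE-001": ["CARTON-001"],
    "CARTON-001": ["VIAL-DP-001-000042"],
}
# Genealogy: item -> what it was made from. Assumes the vial derives from lot DP-001;
# intermediate steps below DS-001 are collapsed.
DERIVED_FROM = {
    "VIAL-DP-001-000042": ["DP-001"],
    "DP-001": ["DS-001"],
    "DS-001": ["WCB-CHO-001"],
}


def reachable(start, edges):
    """Nodes reachable from `start` in one or more hops (p+)."""
    seen, queue = set(), deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(edges.get(node, []))
    return seen


def containers_of(item, contains):
    """CQ-13 read from the vial's side: every package that contains+ `item`."""
    return {pkg for pkg in contains if item in reachable(pkg, contains)}


def ask_cq14(vial, contains, derived_from):
    """CQ-14: True if some node both contains+ the vial and is a derivedFrom+ ancestor of it."""
    return bool(containers_of(vial, contains) & reachable(vial, derived_from))


vial = "VIAL-DP-001-000042"
print(sorted(containers_of(vial, CONTAINS)))  # ['CARTON-001', 'CASE-001', 'PALLET-001']
print(ask_cq14(vial, CONTAINS, DERIVED_FROM))  # False

# A deliberately wrong edge (a carton recorded as a manufacturing input) flips the ASK.
bad = {**DERIVED_FROM, "DS-001": ["WCB-CHO-001", "CARTON-001"]}
print(ask_cq14(vial, CONTAINS, bad))  # True
```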

Validate: stable host identity answers CQ-20

CQ-20 closes the set with interoperable identity: what host organism, by stable identifier, does the line express its product in? The answer matters because "CHO" as a bare string is ambiguous across labs and decades, while the antibody's expression host is a regulated fact a partner organization must be able to read without guessing. The query resolves the cell bank's host through its class's alignment to a public taxonomy IRI rather than the local label:

```sparql
# queries/CQ-20.rq: the working cell bank's host organism is named by a STABLE public identifier.
PREFIX bp:   <https://example.org/bioproc#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo:  <http://purl.obolibrary.org/obo/>
SELECT ?host ?taxon WHERE {
  bp:WCB-CHO-001 bp:hasHostOrganism ?host .
  ?host a ?cls .
  ?cls rdfs:subClassOf ?taxon .
  FILTER(STRSTARTS(STR(?taxon), STR(obo:)))
}
```

The catalog requires the row { "host": "CHO-host", "taxon": "NCBITaxon_10029" }, and the query lands on obo:NCBITaxon_10029 (Cricetulus griseus), the Chinese hamster whose CHO cells dominate commercial antibody production. "What species?" therefore resolves to a stable IRI a partner organization can dereference, not a local string. This is also where the cross-ontology seam shows: the biological side of the cell bank reaches up to an NCBI Taxonomy IRI, while its manufacturing description reaches up to the IOF biopharma module, and that bridge is authored and maintained in the alignment, not silently imported.
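The resolution step in CQ-20 is two hops and a namespace filter. In the sketch below the class names ChineseHamsterHost-hyp and Organism-hyp are hypothetical, since the chapter does not name the host's class, and only one rdfs:subClassOf hop is followed. The query relies on the OWL-RL reasoner to make subClassOf transitive, which this sketch does not reproduce.

```python
OBO = "http://purl.obolibrary.org/obo/"
BP = "https://example.org/bioproc#"

# Hypothetical toy data: ChineseHamsterHost-hyp and Organism-hyp are invented class names.
HOST_OF = {BP + "WCB-CHO-001": BP + "CHO-host"}
TYPE_OF = {BP + "CHO-host": [BP + "ChineseHamsterHost-hyp"]}
SUBCLASS_OF = {BP + "ChineseHamsterHost-hyp": [BP + "Organism-hyp", OBO + "NCBITaxon_10029"]}


def local(iri, ns):
    """Strip a namespace for display, matching the local names in the catalog's expected row."""
    return iri[len(ns):] if iri.startswith(ns) else iri


def stable_taxa(cell_bank):
    """Host -> its classes -> their direct superclasses, keeping only OBO IRIs (the FILTER)."""
    host = HOST_OF[cell_bank]
    return sorted(
        (local(host, BP), local(parent, OBO))
        for cls in TYPE_OF.get(host, [])
        for parent in SUBCLASS_OF.get(cls, [])
        if parent.startswith(OBO)
    )


print(stable_taxa(BP + "WCB-CHO-001"))  # [('CHO-host', 'NCBITaxon_10029')]
```

The local superclass Organism-hyp is dropped by the namespace filter, which is why the answer is a public identifier rather than whatever label the local model happens to use.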

Evaluation: the whole catalog, green

This chapter has opened eight of the twenty-three queries and described a ninth (CQ-02). validate.py reads the entire catalog, runs every entry against the loaded and OWL-RL-reasoned graph, prints a PASS/FAIL line per competency question, and exits non-zero if even one fails. The running-example chapter shows the full table; the rows for this chapter's questions read:

```text
CQ     CLASS       RESULT  DETAIL
CQ-01  lineage     PASS    11 row(s)
CQ-02  impact      PASS    descendant superset of {DP-001, DP-002, DP-004, DS-001} (26 total)
CQ-04  impact      PASS    affected = [DP-001, DP-002]
CQ-05  trajectory  PASS    material superset of {PApool-001, POLpool-001} (2 total)
CQ-07  qbd         PASS    row {parameter: FeedRate, lot: DS-001} present
CQ-12  viral       PASS    sum(lrv) = 8.7 over 2 step(s)
CQ-13  packaging   PASS    package = [CARTON-001, CASE-001, PALLET-001]
CQ-14  packaging   PASS    ASK = False
CQ-20  units       PASS    row {host: CHO-host, taxon: NCBITaxon_10029} present
```

The RESULT column is the contract; the DETAIL column is the evidence. A SPARQL query answers seventeen of the twenty-three. The closed-world completeness questions are answered by SHACL gates the next chapter opens, because "is anything missing?" is not a question about the triples that exist. Among them: is the working cell bank fully characterized for identity, sterility, viral safety, and genetic stability? (CQ-17); does the released lot carry every required CQA? (CQ-08); is the out-of-spec lot flagged on exactly the failing path? (CQ-11). Those four closed-world questions are where the polishing-driven release specification and the cell-bank characterization requirements become enforceable rather than merely traceable. Validation, in short, is not a reading exercise: it is a run, and the model is validated exactly when every line says PASS.
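The acceptance table comes from a catalog-driven runner. The sketch below is not the book's validate.py: query execution is stubbed with precomputed results, and the check name contains is an assumed label for the superset checks behind CQ-02 and CQ-05, whose exact catalog key the chapter does not show. It does follow the behavior described above: one PASS/FAIL line per question, and a non-zero exit code if any check fails.

```python
# The check types mirror the catalog entries quoted in this chapter; "contains" is an
# assumed name for the superset checks, whose exact key the chapter does not show.
def check(spec, rows):
    """Return (passed, detail) for one catalog check; `rows` is a list of dicts, or a bool for ASK."""
    kind = spec["type"]
    if kind == "row_count":
        return len(rows) == spec["value"], f"{len(rows)} row(s)"
    if kind == "equals":
        got = sorted(r[spec["var"]] for r in rows)
        return got == sorted(spec["values"]), f"{spec['var']} = {got}"
    if kind == "contains":
        got = {r[spec["var"]] for r in rows}
        return set(spec["values"]) <= got, f"{spec['var']} superset ({len(got)} total)"
    if kind == "row_match":
        hit = any(all(r.get(k) == v for k, v in spec["match"].items()) for r in rows)
        return hit, f"row {spec['match']} {'present' if hit else 'missing'}"
    if kind == "sum":
        total = round(sum(r[spec["var"]] for r in rows), spec["round"])
        return total == spec["value"], f"sum({spec['var']}) = {total}"
    if kind == "ask":
        return rows == spec["value"], f"ASK = {rows}"
    raise ValueError(f"unknown check type: {kind}")


def run(catalog, results):
    """Print one PASS/FAIL line per CQ and return the number of failures."""
    failures = 0
    for cq_id, spec in catalog.items():
        passed, detail = check(spec, results[cq_id])
        if not passed:
            failures += 1
        print(f"{cq_id:6} {'PASS' if passed else 'FAIL'}  {detail}")
    return failures


CATALOG = {
    "CQ-04": {"type": "equals", "var": "affected", "values": ["DP-001", "DP-002"]},
    "CQ-12": {"type": "sum", "var": "lrv", "round": 1, "value": 8.7},
    "CQ-14": {"type": "ask", "value": False},
}
# Stubbed query results standing in for SPARQL execution over the reasoned graph.
RESULTS = {
    "CQ-04": [{"affected": "DP-001"}, {"affected": "DP-002"}],
    "CQ-12": [{"step": "VF-001", "lrv": 4.2}, {"step": "VI-001", "lrv": 4.5}],
    "CQ-14": False,
}

exit_code = 1 if run(CATALOG, RESULTS) else 0
print("exit code:", exit_code)  # a real runner would hand this to sys.exit()
```

Keeping the expected results as data, separate from the queries, is what lets the catalog serve as both the requirements brief and the regression suite.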

The unsolved part: a passing query proves the model answers, not that the data is true

The competency-question suite certifies that the model can answer its requirements, but it runs over a derived view, and a query is only as true as its last load. The graph's triples are copied from the relational batch records, the plant historian, and the LIMS that are the actual sources of truth, so unless the load is validated, complete, and re-run under change control, the graph can silently drift from the systems it mirrors. The danger is sharpest exactly where this chapter's value lives: a lineage walk that looks authoritative can be quietly incomplete (a derivedFrom edge the loader skipped, a source edit made after the last load), and if the loader that fills the graph mistakes a batch for the bioreactor it ran in, or drops a fork to a sibling lot, the genealogy collapses and a recall trace silently misses a vial. CQ-01 returning eleven ancestors proves the property path works and the loaded edges connect; it cannot prove those eleven are the right eleven if the load that produced them was partial. The deeper limit is federation: the cross-lifecycle and lineage walks are ironclad inside the factory one organization controls, but the moment the thread reaches distributors and the supply chain, it depends on other parties' graphs keyed to their own identifiers, with no one able to mandate the whole. The query was never the hard part; keeping the graph it runs over true is, and that is engineering and governance work, not a missing SELECT.
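One way to start validating the load itself is to reconcile what the source system says against what the graph holds. The sketch assumes both sides can export derivedFrom edges as (child, parent) pairs; the export format and the skipped edge are hypothetical, chosen to mirror the dropped-fork failure described above.

```python
def reconcile(source_edges, graph_edges):
    """Return (edges the load dropped, edges the graph has that the source does not)."""
    source, graph = set(source_edges), set(graph_edges)
    return sorted(source - graph), sorted(graph - source)


# Hypothetical exports: the loader skipped the fork edge from DP-002 to DS-001.
SOURCE = [("DS-001", "POLpool-001"), ("DP-001", "DS-001"), ("DP-002", "DS-001")]
GRAPH = [("DS-001", "POLpool-001"), ("DP-001", "DS-001")]

missing, extra = reconcile(SOURCE, GRAPH)
print("missing:", missing)  # missing: [('DP-002', 'DS-001')]
print("extra:", extra)      # extra: []
```

A clean reconciliation shows only that the graph matches the export; whether the source record itself is right remains the governance question.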

Why it matters

Validation is where modeling stops being a claim and becomes a measurable property — and for a regulated antibody the stakes are concrete. Without the executable catalog, "can this model trace a vial to its cell bank, or scope a failed lot's recall?" has no test; with it, those questions reduce to running validate.py and reading the table. Every modeling decision in the book — the upper spine, the typed values, the faithful derivedFrom edges per real transformation, the orthogonal-clearance shape, the disjointness guards — exists to make these queries trustworthy, and each chapter earns its classes by pointing at the competency question they serve. A model that answers all its questions can still be useless for the twenty-fourth no one wrote down, but a model that fails one of its own questions — that cannot sum the viral barriers, or returns the wrong recall scope — is broken now, visibly, in the build.

In the real world

The competency-question discipline is the spine of the methodologies industry adopts, and on real platforms these same walks are everyday operations: the derivedFrom lineage path is a Palantir Foundry object link or a Neo4j Cypher traversal, and impact analysis is the query a recall team fires the moment a lot fails. The orthogonal-clearance summation and the cell-bank lineage trace map directly onto the viral-safety dossiers and bank-characterization records every commercial mAb process already maintains — the modeling advance is to link them so the safety argument and the genealogy are connected, queryable structures rather than separate filing cabinets reconciled by hand. What is still rare — and what this book argues for — is making the competency questions executable, so the requirements and the regression suite never drift apart [1]. The digital-thread idea, semantic links as lifecycle-spanning connective tissue, is established in the smart-manufacturing and digital-twin literature [2]; what separates a demo from a production thread is exactly this chapter's unsolved part — the validated, drift-controlled load — which is governance work, not a query no one has written.

Key terms

  • Competency question (CQ) — a query the finished ontology must answer, fixed before modeling; here, a real manufacturing question (trace, recall scope, QbD link) run by validate.py and checked against cq-catalog.json.
  • SPARQL property path — the (bp:derivedFrom)+ "one-or-more hops" construct that expresses a lineage or containment walk of arbitrary depth in one line.
  • Lineage walk (CQ-01) — the property-path query reconstructing all eleven ancestors of DS-001 back through the pools, batch, seed train, and three cell-bank tiers, with long-range reachability inferred by the transitive closure.
  • Impact analysis (CQ-04) — the inverse, outward query that scopes a failed lot's fate by walking up to the shared cell bank and back down to its two siblings DP-001 and DP-002 via the fillsInto forward fork.
  • Cross-lifecycle query (CQ-07) — the QbD question crossing development, manufacturing, and release seams (feed-rate parameter → realized run → released lot), made askable only by the shared model.
  • Quality trajectory (CQ-05) — the in-process HMW values down the chain (4.1 % → 1.4 %), derivedFrom read as a chain of custody for quality; shows where purity changed, not which step caused it.
  • Orthogonal clearance (CQ-12) — the sum of two mechanistically independent LRVs (low-pH inactivation 4.5 + size-based nanofiltration 4.2 = 8.7), valid because the barriers are orthogonal; a validated capability of the step, not a per-batch measurement.
  • Containment-is-not-genealogy (CQ-14) — the ASK that is correct only when False: no node is on both a vial's contains and its derivedFrom path.
  • Derived view / drift — the graph's triples are copied from source systems, so a passing query proves the model answers, not that an ungoverned, stale load is still true.

Where this leads

The competency questions a SPARQL query can answer are answered and green. But four of the twenty-three are closed-world questions a query cannot pose — is the cell bank fully characterized? is this lot in spec on every required attribute? The next chapter, The Release Gate and SHACL: Validating Completeness and Specification, turns to those: it builds the SHACL shapes that gate the release decision, shows why the out-of-spec sibling DP-004 fails on exactly one path (hmwPct at 2.41 % against a 2.0 % ceiling) and nowhere else, and completes the validation phase with the closed-world half the queries cannot reach.