Modeling QC and the Release Gate: Specifications as SHACL

📍 Where we are: Part V · Fill-Finish and Release, modeled — Chapter 18. The medicine exists as sealed units. This chapter models the gate that decides whether it may ship — the destination the entire quality thread has been heading toward since the design space.

Every quality attribute this book has modeled — the monomer purity, the variant levels, the viral clearance, the container-closure integrity — exists to answer one question: may this lot be released to patients? Quality control (QC) runs the release tests; release is the formal decision, made against the specification, that the lot meets every acceptance criterion and the records are complete and signed. This is where a graph stops being a description and becomes a control: the release specification, modeled as SHACL shapes, becomes a gate that a lot's data must pass before it may claim release — the exact mechanism the open-source knowledge-graph chapter runs in code.

The simple version

Before a plane takes off, a checklist must be complete — every item present, checked, and initialed. A blank or a missing signature stops the flight, no matter how good the plane looks. Release is that checklist for a batch: every required test result present, within limits, recorded once, and signed off. Modeling the specification as a machine-readable checklist — a SHACL shape — means the graph itself can refuse to call a lot released until the checklist is genuinely complete. This chapter builds that gate, and is honest that a complete checklist is not the same as a good flight.

What this chapter covers

We model the specification as a set of SHACL shapes, the release decision as the gate those shapes enforce, and the certificate of analysis and signatures as the records that make release attributable. We dissect the release gate and its validation report, and close on the sharp, important limit: SHACL guarantees a record is complete and in range, not that it is true — the difference between a batch that passes the gate and a batch that is actually good.

The specification is a shape, and release is conformance

A release specification lists, for the drug substance and drug product, the required tests and their acceptance criteria — monomer purity within limits, aggregates below a threshold, the charge-variant main peak in its window, host-cell protein and protein concentration in range, sterility, and the rest, structured by guidance like ICH Q6B [2]. In Part I we learned that OWL reasons but SHACL validates: OWL's open world treats a missing result as "unknown," which is exactly wrong for release, where a missing required test is a failure. So the specification is naturally a SHACL shape — a closed-world rule that a valid released lot must satisfy [1]. This is the actual bp:ReleaseShape the running example validates against — the full release CQA panel, targeting both the drug substance and the drug product:

@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix bp: <https://example.org/bioproc#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

bp:ReleaseShape a sh:NodeShape ;
    sh:targetClass bp:DrugSubstance , bp:DrugProduct ;

    # Exactly one monomer result, a float, at or above the spec floor.
    sh:property [
        sh:path bp:monomerPct ;
        sh:name "SEC %monomer" ;
        sh:minCount 1 ; sh:maxCount 1 ;
        sh:datatype xsd:float ;
        sh:minInclusive 95.0 ;
        sh:message "Monomer purity is missing, duplicated, or below the 95.0 % release limit." ] ;

    # HMW aggregate at or below its upper limit: the criterion DS-004/DP-004 trip.
    sh:property [
        sh:path bp:hmwPct ;
        sh:name "SEC %HMW aggregate" ;
        sh:minCount 1 ; sh:maxCount 1 ;
        sh:datatype xsd:float ;
        sh:maxInclusive 2.0 ;
        sh:message "HMW aggregate is missing or above the 2.0 % release limit." ] ;

    # CEX main charge-variant peak within its window.
    sh:property [
        sh:path bp:cexMainPct ;
        sh:name "CEX %main (charge variant)" ;
        sh:minCount 1 ; sh:maxCount 1 ;
        sh:datatype xsd:float ;
        sh:minInclusive 60.0 ; sh:maxInclusive 80.0 ;
        sh:message "CEX main peak is missing or outside the 60.0-80.0 % window." ] ;

    # Host-cell protein at or below its upper limit.
    sh:property [
        sh:path bp:hcpPpm ;
        sh:name "host-cell protein (ppm)" ;
        sh:minCount 1 ; sh:maxCount 1 ;
        sh:datatype xsd:float ;
        sh:maxInclusive 100.0 ;
        sh:message "HCP is missing or above the 100 ppm release limit." ] ;

    # Protein concentration within the formulation window.
    sh:property [
        sh:path bp:proteinConcMgPerMl ;
        sh:name "protein concentration (mg/mL)" ;
        sh:minCount 1 ; sh:maxCount 1 ;
        sh:datatype xsd:float ;
        sh:minInclusive 45.0 ; sh:maxInclusive 55.0 ;
        sh:message "Protein concentration is missing or outside the 45-55 mg/mL window." ] ;

    # Release status drawn from a controlled set.
    sh:property [
        sh:path bp:releaseStatus ;
        sh:minCount 1 ; sh:maxCount 1 ;
        sh:in ( "PASS" "OOS" "PENDING" ) ] ;

    # An attributable signature (21 CFR Part 11 / Annex 11).
    sh:property [
        sh:path bp:approvedBy ;
        sh:minCount 1 ;
        sh:message "Release record is unsigned." ] .

Read it as the checklist made executable: the lot must carry exactly one monomer result (minCount/maxCount — no cherry-picking among repeats) that is at or above the spec floor (minInclusive 95.0); it must carry an HMW aggregate result at or below the upper limit (maxInclusive 2.0); its charge-variant main peak (cexMainPct), host-cell protein (hcpPpm), and protein concentration (proteinConcMgPerMl) must each fall in their declared windows; its releaseStatus must be drawn from a controlled set; and the lot must bear a signature (approvedBy). A separate bp:DrugProductFinishShape targets only bp:DrugProduct and adds the finish-specific criteria the bulk substance does not carry — sterility (sterilityResult in STERILE), an appearance description, and a plausible fill volume. Run the validator and a conformant lot reports sh:conforms true; a lot missing a test, carrying an out-of-range value, or unsigned reports sh:conforms false. The release gate is not prose in an SOP a human must remember to apply — it is a rule the graph enforces on every lot, automatically and identically.
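
The shape's logic can be mirrored in a few lines of plain Python to make the closed-world reading concrete. This is a minimal sketch, not SHACL and not the running example's code: `SPEC`, `ALLOWED_STATUS`, and `check_lot` are hypothetical names, the limits are copied from bp:ReleaseShape, and the CEX, HCP, concentration, and signer values in the sample lots are illustrative placeholders (only the monomer and HMW figures come from the running example).

```python
# Plain-Python analogue of bp:ReleaseShape (illustrative; not SHACL itself).
# Each entry: path -> (lower limit or None, upper limit or None).
SPEC = {
    "monomerPct": (95.0, None),
    "hmwPct": (None, 2.0),
    "cexMainPct": (60.0, 80.0),
    "hcpPpm": (None, 100.0),
    "proteinConcMgPerMl": (45.0, 55.0),
}
ALLOWED_STATUS = {"PASS", "OOS", "PENDING"}


def check_lot(lot):
    """Return a list of (path, problem) tuples; an empty list means the lot conforms.

    `lot` maps each path to a LIST of recorded values, so a missing or
    duplicated result stays visible instead of being silently overwritten.
    """
    violations = []
    for path, (low, high) in SPEC.items():
        values = lot.get(path, [])
        if len(values) < 1:
            violations.append((path, "MinCount"))  # missing is a failure, not "unknown"
        if len(values) > 1:
            violations.append((path, "MaxCount"))  # no cherry-picking among repeats
        for v in values:
            if low is not None and v < low:
                violations.append((path, "MinInclusive"))
            if high is not None and v > high:
                violations.append((path, "MaxInclusive"))

    statuses = lot.get("releaseStatus", [])
    if len(statuses) < 1:
        violations.append(("releaseStatus", "MinCount"))
    if len(statuses) > 1:
        violations.append(("releaseStatus", "MaxCount"))
    for s in statuses:
        if s not in ALLOWED_STATUS:
            violations.append(("releaseStatus", "In"))

    if not lot.get("approvedBy"):
        violations.append(("approvedBy", "MinCount"))  # unsigned record
    return violations


ds_001 = {
    "monomerPct": [98.611], "hmwPct": [1.287],
    # Placeholder values below: the source does not state them.
    "cexMainPct": [70.0], "hcpPpm": [20.0], "proteinConcMgPerMl": [50.0],
    "releaseStatus": ["PASS"], "approvedBy": ["qa-reviewer"],
}
ds_004 = dict(ds_001, monomerPct=[98.687], hmwPct=[2.41], releaseStatus=["OOS"])
unsigned = {k: v for k, v in ds_001.items() if k != "approvedBy"}

print(check_lot(ds_001))    # []
print(check_lot(ds_004))    # [('hmwPct', 'MaxInclusive')]
print(check_lot(unsigned))  # [('approvedBy', 'MinCount')]
```

Storing each result as a list rather than a scalar is the design choice that matters here: it is what lets a duplicated or absent test show up as a violation, which is the behavior minCount and maxCount give the real shape.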

The validation report is a structured verdict, not a yes/no

SHACL does not merely pass or fail; it returns a validation report that is itself an RDF graph, so a failure is queryable like any other fact [1]. When the OOS lots fail, the report names each offending lot in sh:focusNode, the failing test in sh:resultPath, the rule that fired in sh:sourceConstraintComponent, a severity of sh:Violation, and a human-readable message. The honest detail of the running example is what kind of failure this is: the drug-substance lot DS-004 and its filled product DP-004 both have a monomer purity of 98.687 %, comfortably above the 95.0 % floor, so they pass the monomerPct shape — and every other panel value (CEX main, HCP, protein concentration) is in spec too; what trips the gate is their HMW aggregate of 2.41 %, above the 2.0 % limit — a sh:MaxInclusiveConstraintComponent violation at sh:resultPath bp:hmwPct. This is the realistic out-of-spec mode: a lot can be pure by one criterion and still fail on an aggregate, and the failure stays isolated to exactly the one path that is genuinely out of range. Here is the real pySHACL report for the whole-graph validation — two results, one per OOS lot, both on the same path:

Validation Report
Conforms: False
Results (2):
Constraint Violation in MaxInclusiveConstraintComponent (http://www.w3.org/ns/shacl#MaxInclusiveConstraintComponent):
    Severity: sh:Violation
    Source Shape: [ sh:datatype xsd:float ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:maxInclusive Literal("2.0", datatype=xsd:decimal) ; sh:message Literal("HMW aggregate is missing or above the 2.0 % release limit.") ; sh:minCount Literal("1", datatype=xsd:integer) ; sh:name Literal("SEC %HMW aggregate") ; sh:path bp:hmwPct ]
    Focus Node: bp:DP-004
    Value Node: Literal("2.41", datatype=xsd:float)
    Result Path: bp:hmwPct
    Message: HMW aggregate is missing or above the 2.0 % release limit.
Constraint Violation in MaxInclusiveConstraintComponent (http://www.w3.org/ns/shacl#MaxInclusiveConstraintComponent):
    Severity: sh:Violation
    Source Shape: [ sh:datatype xsd:float ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:maxInclusive Literal("2.0", datatype=xsd:decimal) ; sh:message Literal("HMW aggregate is missing or above the 2.0 % release limit.") ; sh:minCount Literal("1", datatype=xsd:integer) ; sh:name Literal("SEC %HMW aggregate") ; sh:path bp:hmwPct ]
    Focus Node: bp:DS-004
    Value Node: Literal("2.41", datatype=xsd:float)
    Result Path: bp:hmwPct
    Message: HMW aggregate is missing or above the 2.0 % release limit.

Because every part of the report is a triple, it drops back into the same graph machinery: you can ask "which lots failed which criteria across the campaign" rather than read a stack trace. This is the difference between a constraint language and a bare assertion: the failure carries enough structured context to route an investigation, linking straight to the digital thread that scopes which sibling lots share the problem. The question "which lots failed, on which path, with what value" is itself a SPARQL query over the report graph:

PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX bp: <https://example.org/bioproc#>
SELECT ?focus ?path ?value ?component WHERE {
  ?r a sh:ValidationResult ;
     sh:focusNode ?focus ;
     sh:resultPath ?path ;
     sh:value ?value ;
     sh:sourceConstraintComponent ?component .
}
ORDER BY ?focus   # makes the row order deterministic
# -> bp:DP-004 bp:hmwPct 2.41 sh:MaxInclusiveConstraintComponent
# -> bp:DS-004 bp:hmwPct 2.41 sh:MaxInclusiveConstraintComponent
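
Once the query rows are in hand, routing an investigation is a grouping problem. The following is a minimal sketch in plain Python, assuming the rows have already been fetched from the report graph as tuples; the `rows` literal simply restates the two results listed above, and `group_failures` is a hypothetical helper name, not part of pySHACL or the running example.

```python
from collections import defaultdict

# Rows as the SELECT would return them: (focus, path, value, component).
rows = [
    ("bp:DP-004", "bp:hmwPct", 2.41, "sh:MaxInclusiveConstraintComponent"),
    ("bp:DS-004", "bp:hmwPct", 2.41, "sh:MaxInclusiveConstraintComponent"),
]


def group_failures(rows):
    """Group failing lots by (path, constraint component), so each distinct
    failure mode becomes one investigation rather than one per lot."""
    grouped = defaultdict(list)
    for focus, path, _value, component in rows:
        grouped[(path, component)].append(focus)
    return {key: sorted(lots) for key, lots in grouped.items()}


for (path, component), lots in group_failures(rows).items():
    print(f"{path} [{component}]: {', '.join(lots)}")
# bp:hmwPct [sh:MaxInclusiveConstraintComponent]: bp:DP-004, bp:DS-004
```

Both OOS lots collapse into a single failure mode on bp:hmwPct, which matches the observation that the drug product inherits the aggregate problem from its drug substance.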

Figure (hero diagram of the release gate). On the left, the DS-001 drug-substance node with its CQA results (monomerPct 98.611, HMW 1.287, sterility) and a signature. In the center, a SHACL gate box holding the release specification as shapes (each required test present, in range, exactly one, signed). A conformant lot flows through to a green released state with releaseStatus PASS; below it, the OOS lot DP-004 hits the gate and is diverted to a red conforms-false path that emits a validation report (focus node, result path, constraint, severity) feeding an investigation and a digital-thread impact query over sibling lots. A caption in the diagram reads: the specification is a shape, release is conformance.

Release as a gate: a lot's CQA results and signature are checked against the specification expressed as SHACL shapes; conformance yields a released status, while a violation emits a structured report that routes an investigation and an impact query. This is the quality thread's destination, made enforceable. Original diagram by the authors, created with AI assistance.

Signatures and the certificate make release attributable

Release is a regulated act, so the gate must check not only the science but the attribution. Under the electronic-records rules — 21 CFR Part 11 in the US and Annex 11 in the EU — a released record must be signed, attributable, and audit-trailed [3]. The model treats the signature as a first-class entity (who approved, when, in what role) that the SHACL gate requires, and the certificate of analysis (CofA) — the document summarizing the results against spec — as an information artifact derivedFrom the lot's results. Modeling these means "released" is not a bare status flag anyone could set; it is a state that cannot be asserted unless the signed, complete, in-spec evidence exists in the graph to back it. The data-integrity discipline of the whole series becomes, here, a precondition the gate enforces.
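
One way to picture "released" as an earned state rather than a settable flag is a record whose status is computed from its evidence. The sketch below is a plain-Python illustration under stated assumptions: `Signature` and `ReleaseRecord` are hypothetical classes, not the book's ontology terms, and the signer, role, and timestamp are placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Signature:
    """A first-class signature: who approved, in what role, and when."""
    signer: str
    role: str
    signed_at: datetime


@dataclass
class ReleaseRecord:
    lot_id: str
    violations: list = field(default_factory=list)  # output of the spec check
    signatures: list = field(default_factory=list)  # Signature entities

    @property
    def released(self) -> bool:
        """Derived, never assigned: true only when the evidence backs it."""
        return not self.violations and len(self.signatures) >= 1


record = ReleaseRecord("DS-001")
print(record.released)  # False: in spec, but nobody has signed

record.signatures.append(
    Signature("qa-reviewer", "Quality Assurance",
              datetime(2024, 1, 1, tzinfo=timezone.utc))  # placeholder values
)
print(record.released)  # True: no violations and at least one signature

try:
    record.released = True  # attempting to set the flag by hand
except AttributeError:
    print("released is derived from evidence and cannot be assigned")
```

Because `released` is a read-only property, the only way to change it is to change the evidence, which is the same discipline the gate imposes on the graph.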

Figure (identity card dissecting the release gate and its report). A target row (drug-substance / drug-product lot); a required-results block listing each CQA shape with its constraints: present (minCount 1), single (maxCount 1), typed (xsd), in range (the spec limit); an attribution block requiring a Part 11 / Annex 11 compliant signature and linking the certificate of analysis as an artifact derived from the results; a verdict row showing sh:conforms true or false; and a validation-report block for a failing lot naming focusNode DP-004, resultPath hmwPct, sourceConstraint MaxInclusive, value 2.41, severity Violation, and a message, annotated that the report is itself queryable RDF that routes an investigation.

The gate, unpacked: shapes for every required result plus an attributable signature and a derived certificate, yielding a conformance verdict and, on failure, a structured RDF report, so "released" is a state the evidence must earn, not a flag someone sets. Original diagram by the authors, created with AI assistance.

The unsolved part: SHACL checks completeness, not correctness

Here is the limit that matters most, and it generalizes the axioms chapter's warning that consistent is not correct. A SHACL gate verifies that a lot's record is complete, well-formed, in range, and signed. It cannot verify that the record is true. A result that was confidently mislabeled — the right number entered against the wrong test, a sample mix-up, a transcription error that happens to land in range — passes the gate cleanly, because every structural rule is satisfied. SHACL is a powerful guard against the common failures (a missing test, an out-of-range value, an unsigned record, a duplicated result), and those are most of them; but it is blind to a plausible, in-range falsehood. The gate proves the checklist is filled, not that the checks were honest.
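
The blind spot is easy to reproduce. A minimal sketch, assuming a single range check in plain Python standing in for the hmwPct constraint: `hmw_in_range` is a hypothetical helper, 2.41 is the DS-004 value from the running example, and 1.41 is an invented keying error used only for illustration.

```python
HMW_UPPER_LIMIT = 2.0  # % upper limit from the release shape


def hmw_in_range(recorded_pct: float) -> bool:
    """Structural check only: is the recorded value at or below the limit?"""
    return recorded_pct <= HMW_UPPER_LIMIT


true_hmw = 2.41     # what the assay measured (the DS-004 value)
transcribed = 1.41  # hypothetical keying error: one digit wrong, lands in range

print(hmw_in_range(true_hmw))     # False: the honest record fails the gate
print(hmw_in_range(transcribed))  # True: the false record passes it
```

Nothing in the check can distinguish the two inputs except their magnitude, so the only defense against the second case sits upstream, in how the value reached the record.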

This is why release, in reality, is never only a gate. The SHACL pass is a necessary precondition that automation can enforce tirelessly and identically — a genuine advance over a human re-checking a paper checklist — but the release decision still rests on QC judgment, deviation review, and the qualified-person sign-off that the gate records but does not replace. A graph that conflates "passed the SHACL gate" with "is a good batch" commits exactly the over-trust this book warns against at every step. The honest standard: model the specification as SHACL so completeness and range are mechanically guaranteed and investigations are routable, and be clear that the gate defends against incompleteness and out-of-range error, while correctness — that each result truly describes this lot — depends on data integrity upstream and human judgment the model supports rather than supplants.

Why it matters

Release is the decision the entire process and its data exist to serve, and modeling the specification as SHACL turns a remembered, manually-applied checklist into an enforced, auditable gate. Every CQA the book modeled — back to the design space that defined which attributes are critical — converges here as a shape a lot must satisfy, and a failure routes straight into the impact analysis that scopes the damage. This is where the quality thread becomes a control rather than a record. And the gate's honest limit — completeness, not correctness — is the clearest statement of what an ontology is for: it makes the structure of trust enforceable, while the substance of trust still depends on the integrity of what enters it.

From the wire to the graph

The release gate this chapter built signs off on bp:DS-001 as an internal genealogy node: a drug-substance lot reachable by derivedFrom from its pool, carrying its CQA results, a signature, and a certificate of analysis. But the moment that lot enters a regulated submission, the regulator does not care about your internal IRI; it wants the substance's official identity, the one it can re-check against its own master-data registry. So the running example gives DS-001 a second, regulated name without forking it into a parallel system: in examples/platform/ontology/instances.ttl, the same graph node the bp:ReleaseShape gate already validated acquires an IDMP identity record attached directly to it, carrying an FDA UNII (GSRS) code, which is the substance identifier in the ISO 11238 sense, and an IDMP Medicinal Product Identifier (MPID), which strictly belongs to the product-level ISO 11615 standard.

@prefix bp: <https://example.org/bioproc#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# ISO IDMP substance identity: the regulator's name for the substance the graph tracks as DS-001
bp:DS-001 bp:hasSubstanceIdentifier bp:IDMP-DS-001 .

bp:IDMP-DS-001 a bp:SubstanceIdentifier ;
    rdfs:label "IDMP substance identity for DS-001" ;
    bp:isAbout bp:DS-001 ;
    bp:uniiCode "ILLUSTRATIVE-UNII-0001" ;  # an FDA UNII / GSRS code (value illustrative)
    bp:mpid "ILLUSTRATIVE-MPID-mAb-A" .     # an IDMP Medicinal Product Identifier (value illustrative)

Because IDMP-DS-001 isAbout the very same bp:DS-001 — not a copy of it — the released lot crosses into a PQ-CMC / eCTD Module 3 / SPL submission carrying the identity the regulator re-checks, rather than being re-described from scratch in a disconnected filing system. The UNII and MPID strings here are illustrative placeholders, not assigned codes; the loadable artifact is the structural fact that the regulated identity hangs off the exact node the gate signed. And unlike the piloted shop-floor standards, IDMP/SPOR, UNII, and SPL are mandated, production-grade regulatory semantics — explored next in the regulatory semantics already mandated.

In the real world

Releasing a lot against a specification, with signed records under Part 11 and Annex 11, is the legally binding reality of GMP manufacturing, and the structure of biotech specifications is long codified [2][3]. SHACL is a settled W3C standard, and the open-source book runs exactly this kind of BatchShape gate in tested code — sh:conforms false with a validation result naming the focus node and constraint — proving the mechanism is not hypothetical [1]. What remains a human and organizational reality, not a solved technical one, is everything the gate cannot see: the deviation investigations, the data-integrity culture, and the qualified-person judgment that stand behind a release the graph can structurally verify but not vouch for. This gate has a real regulatory cousin: the regulatory-semantics chapter shows the FDA’s KASA platform applying rule-based structured checks to CMC data, and IDMP’s mandated structured master data carrying checkable meaning at the submission boundary — even as the GxP validation regime keeps a live reasoner off the floor.

Key terms

  • Release specification — the required tests and acceptance criteria a lot must meet, modeled as SHACL shapes the lot's data must satisfy.
  • SHACL release gate — the closed-world validation that every required CQA is present, single, typed, in range, and signed before a lot may claim release.
  • Validation report — the queryable RDF graph SHACL returns on failure (focus node, result path, constraint, severity, message), used to route an investigation.
  • Certificate of analysis (CofA) — the information artifact summarizing results against spec, modeled as derived from the lot's results.
  • Attribution (signature) — the Part 11 / Annex 11 requirement that release be signed and audit-trailed, modeled as a first-class entity the gate requires.
  • Completeness versus correctness — SHACL guarantees a record is complete, well-formed, and in range, not that it is true; a plausible in-range falsehood passes, so the gate supports but does not replace human release judgment.

Where this leads

The lot is released — proven complete, in spec, and signed. The next chapter, Modeling Packaging and Serialization: GS1 and Unit Identity, follows it into packaging, where the lot finally becomes individually identified units — each vial a unique serialized item under GS1 standards — and the model must reconcile its derivedFrom lineage with a second, global identity system built for tracking medicines across the supply chain.