Modeling Formulation and Fill-Finish: From Substance to Product

📍 Where we are: Part V · Fill-Finish and Release, modeled — Chapter 17. The drug substance is the release-anchoring lot. This chapter models turning it into the thing a patient actually receives — and the moment the process stops being about bulk material and starts being about countable units.

The drug substance is a bulk liquid, formulated and tested, but it is not yet a medicine anyone can use. Formulation and fill-finish turn it into the drug product (DP): the DS is combined with excipients into the final formulation, then filled — under sterile conditions — into vials or syringes, stoppered, and sealed [1]. In our campaign this yields DP-001. Two modeling shifts happen here, both consequential: the lineage forks forward (one DS lot fills many DP lots), and the product changes kind — from a bulk material measured by concentration to a population of discrete, countable units that, after serialization, will each have their own identity.

The simple version

A vat of soup becomes lunch only when it is ladled into bowls. The vat is one thing measured by volume; the bowls are many things you count. Filling changes what you are tracking — and one vat fills many bowls, so if a problem turns up in the vat, every bowl from it is implicated. Fill-finish ladles the drug substance into vials. This chapter models that change from a bulk you measure to units you count, and the one-to-many link that means a single substance lot's fate is shared by every product lot filled from it.

What this chapter covers

We model formulation as combining the DS with excipients, fill-finish as the process producing discrete drug-product units, and the container-closure system as a modeled entity that is part of the product's identity. We make the DS-to-DP forward fork concrete, dissect the DP-001 node, and confront the shift from bulk to countable units — and the question of whether the model tracks individual vials or only the lot.

Formulation, container-closure, and the change of kind

Formulation combines the drug substance with excipients (buffers, stabilizers, surfactants) that keep the antibody stable and deliverable. In our dataset these are real individuals: bp:EXC-PS80 (a bp:Surfactant, polysorbate 80), bp:EXC-HIS (a bp:BufferExcipient, histidine), and bp:EXC-SUC (a bp:Stabilizer, sucrose). Each is attached to the product with bp:hasComponent, so the formulation is traceable as structure rather than buried in a batch record. The three bp:Excipient subclasses align up to a real ChEBI anchor: align.ttl asserts bp:Excipient rdfs:subClassOf obo:CHEBI_24431, ChEBI's generic chemical entity, exactly as bp:Antibody aligns up to GO's obo:GO_0071735 IgG immunoglobulin complex. What remains honestly illustrative is only the leaf level: a production model would point each excipient at a specific ChEBI leaf (a polysorbate-80, an L-histidine, a sucrose IRI), and the dataset deliberately carries no such leaf, anchoring at the generic class instead.

Fill-finish then produces the drug-product lot. bp:FILL-001 is a bp:FillFinishProcess whose bp:hasInput is DS-001 and whose bp:hasOutput is DP-001: a sterile filling occurrent whose output is a population of filled, sealed units.

The container-closure system (the vial, stopper, and seal) is not packaging incidental to the product; it is part of what the product is, because container-closure integrity is a quality attribute that affects sterility and stability over shelf life [3]. The model treats it as the real entity bp:CCS-001, a bp:ContainerClosureSystem that the product bp:isContainedIn. bp:CCS-001 carries bp:containerClosureIntegrity "PASS", a quality the release gate will check.

Those excipient and process classes are grounded up in align.ttl, with the genuine IOF gap named rather than faked:

# align.ttl — formulation and fill-finish grounded UP (excerpt).
bp:DrugProduct        rdfs:subClassOf iof:MaterialProduct .               # IOF biopharma 'material product'
bp:FormulationProcess rdfs:subClassOf iof:DrugProductFormulationProcess . # IOF biopharma 'drug product formulation process' (Released)
bp:Excipient          rdfs:subClassOf obo:CHEBI_24431 .                   # ChEBI 'chemical entity' (generic anchor; a polysorbate-80 / L-histidine / sucrose leaf stays ILLUSTRATIVE)
bp:derivedFrom        rdfs:subPropertyOf obo:RO_0001000 .                 # RO 'derives from' (also owl:TransitiveProperty)
# An honest IOF gap: there is NO fill-finish / aseptic-fill / lyophilization class in IOF biopharma, so
# bp:FillFinishProcess and bp:ContainerClosureSystem stay ILLUSTRATIVE local classes; bp:fillsInto is local too.

The deeper shift is one of kind. Everything upstream was bulk material, individuated by unit operation and measured by concentration. The drug product is discrete: a lot is now fundamentally a count of units, each fillable, inspectable, and eventually serializable. This is the same continuant the model has tracked, but its relevant granularity has changed — from "how much" to "how many" — and the model must be ready for the day, at serialization, when "which one" becomes answerable too.

The forward fork: one substance, many products, shared fate

The drug-substance chapter flagged that one DS lot typically fills many DP lots, and here that fork is laid down as real edges: DP-001 derivedFrom DS-001 and DP-002 derivedFrom DS-001, with the convenience forward edge bp:DS-001 bp:fillsInto bp:DP-001 , bp:DP-002 capturing the one-to-many direction explicitly. Far from a bare stub, the DP-001 node in our running example carries the lineage edge up to its substance, the excipient components, the container-closure system it is contained in, the product concept it conformsTo, the full release panel, the finish-gate qualities (sterility, appearance, fill volume), and an attributable signature [2]:

@prefix bp:  <https://example.org/bioproc#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

bp:DP-001 a bp:DrugProduct ;
    bp:derivedFrom bp:DS-001 ;
    bp:conformsTo bp:PC-mAb-A ;
    bp:isContainedIn bp:CCS-001 ;
    bp:hasComponent bp:EXC-PS80 , bp:EXC-HIS , bp:EXC-SUC ;
    bp:releaseStatus "PASS" ;
    bp:monomerPct "98.611"^^xsd:float ;
    bp:hmwPct "1.287"^^xsd:float ;
    bp:cexMainPct "70.686"^^xsd:float ;
    bp:hcpPpm "12.0"^^xsd:float ;
    bp:proteinConcMgPerMl "50.2"^^xsd:float ;
    bp:sterilityResult "STERILE" ;
    bp:appearance "clear, colourless, essentially free of visible particles" ;
    bp:fillVolumeMl "1.0"^^xsd:float ;
    bp:approvedBy bp:SIG-DP-001 .
bp:DS-001 bp:fillsInto bp:DP-001 , bp:DP-002 .

Because bp:derivedFrom is declared owl:TransitiveProperty (and align.ttl makes it a sub-property of the OBO Relation Ontology's obo:RO_0001000, derives from), this single edge transitively reconstructs the whole genealogy back through DS-001, POLpool-001, VFpool-001, VIpool-001, PApool-001, CLAR-001, BATCH-2026-001, SEED-001, SEEDFLASK-001, to the working, master, and research cell banks: twelve ancestors in all, with no join table, just inference. This forward branching is the mirror of capture's backward pooling fork, and it carries a sharp consequence the graph makes computable: shared fate. If DS-001 is later found defective, every DP lot deriving from it (DP-001, DP-002) is implicated, and a single traversal down the fork finds them all.

The dataset's OOS lot makes the dual reasoning concrete. DP-004 derives not from DS-001 but from a separate substance lot, DS-004, yet the two forks share the cell bank WCB-CHO-001. So when DP-004 fails, queries/impact.rq walks up DP-004's lineage to the shared ancestor and back down to every other drug product reachable from it, returning ['DP-001', 'DP-002'], the siblings whose fate is now in question. The OOS failure mode is itself instructive: DP-004 is in-spec on monomer purity (98.687) yet still flagged OOS, because the SEC high-molecular-weight aggregate (2.41) sits above the release limit of 2.0:

@prefix bp:  <https://example.org/bioproc#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

bp:DP-004 a bp:DrugProduct ; bp:derivedFrom bp:DS-004 ;
    bp:releaseStatus "OOS" ;
    bp:monomerPct "98.687"^^xsd:float ;
    bp:hmwPct "2.41"^^xsd:float ;
    bp:cexMainPct "69.40"^^xsd:float ;
    bp:hcpPpm "18.0"^^xsd:float ;
    bp:proteinConcMgPerMl "49.7"^^xsd:float ;
    bp:sterilityResult "STERILE" ;
    bp:appearance "clear, colourless, essentially free of visible particles" ;
    bp:fillVolumeMl "1.0"^^xsd:float ;
    bp:approvedBy bp:SIG-DP-004 .

The release gate trips this on a SHACL sh:maxInclusive 2.0 constraint over bp:hmwPct, not on monomer: the realistic OOS mode, in which one passing attribute does not rescue a lot. Running the whole graph through SHACL therefore reports conforms: False, with the violation isolated to a single path, hmwPct, on exactly the focus nodes DS-004 and DP-004; the DS-001-only subgraph reports conforms: True. This is precisely the impact analysis the knowledge-graph chapter demonstrated with the OOS lot, and fill-finish is where the forward fork that makes it possible is created. Modeling the fork explicitly, rather than treating each DP lot as an island, is what turns a DP failure from a guessing game into a query.
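The gate logic can be illustrated in plain Python. This is a minimal sketch, not the book's SHACL shape: the only rule it encodes is the inclusive 2.0 ceiling on bp:hmwPct named in the text, the attribute values are copied from the DP-001 and DP-004 nodes shown earlier, and the dictionary layout and the function name hmw_violations are assumptions made for illustration.

```python
# Release-gate sketch: one inclusive upper limit on HMW aggregate.
HMW_MAX_INCLUSIVE = 2.0  # mirrors the sh:maxInclusive 2.0 constraint on bp:hmwPct

# Values copied from the DP-001 and DP-004 nodes; the dict layout is hypothetical.
lots = {
    "DP-001": {"monomerPct": 98.611, "hmwPct": 1.287},
    "DP-004": {"monomerPct": 98.687, "hmwPct": 2.41},
}


def hmw_violations(lots):
    """Return the sorted lot IDs whose hmwPct exceeds the inclusive maximum."""
    return sorted(
        lot_id
        for lot_id, attrs in lots.items()
        if attrs["hmwPct"] > HMW_MAX_INCLUSIVE
    )


violations = hmw_violations(lots)
conforms = not violations  # the graph conforms only if no lot violates the limit
print(conforms, violations)  # False ['DP-004']
```

Note that DP-004 has the higher monomer purity of the two lots and still fails: the check never looks at monomerPct, which is the point the text makes about one passing attribute not rescuing a lot.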

The drug-product lot, unpacked: it derives from the DS as one branch of a forward fork (the root of shared-fate impact analysis), combines excipients, lives in a modeled container-closure, and is counted in units rather than measured in concentration — the change from bulk to discrete. Original diagram by the authors, created with AI assistance.

One substance fills many lots, and every lot shares a fate back to the bank: DS-001 forks forward into sibling lots DP-001 and DP-002, while the out-of-spec DP-004 traces through DS-004 to the same WCB-CHO-001 — the structure that scopes a recall by query rather than guesswork. Original diagram by the authors, created with AI assistance.
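The walk behind shared-fate impact analysis can be sketched in plain Python. This is an illustration under stated assumptions, not the contents of queries/impact.rq: the DP-to-DS edges come from the text, but the two direct DS-to-WCB-CHO-001 edges are a simplification that collapses the intermediate pools, batch, and seed steps into one hop, and the names reachable and impacted_products are hypothetical.

```python
from collections import defaultdict

# derivedFrom edges as (child, parent). The DP -> DS edges come from the text.
# ASSUMPTION: the DS -> WCB-CHO-001 edges collapse every intermediate step.
DERIVED_FROM = [
    ("DP-001", "DS-001"),
    ("DP-002", "DS-001"),
    ("DP-004", "DS-004"),
    ("DS-001", "WCB-CHO-001"),
    ("DS-004", "WCB-CHO-001"),
]
DRUG_PRODUCTS = {"DP-001", "DP-002", "DP-004"}

parents = defaultdict(set)   # child -> direct parents
children = defaultdict(set)  # parent -> direct children
for child, parent in DERIVED_FROM:
    parents[child].add(parent)
    children[parent].add(child)


def reachable(start, neighbours):
    """All nodes reachable from start by following neighbours transitively."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def impacted_products(failed_lot):
    """Drug products sharing any ancestor with failed_lot, excluding the lot itself."""
    hits = set()
    for ancestor in reachable(failed_lot, parents):
        hits |= reachable(ancestor, children) & DRUG_PRODUCTS
    return sorted(hits - {failed_lot})


# Walk up from the failed product, then back down from each ancestor.
print(impacted_products("DP-004"))  # ['DP-001', 'DP-002']
# A substance defect only needs the downward walk.
print(sorted(reachable("DS-001", children) & DRUG_PRODUCTS))  # ['DP-001', 'DP-002']
```

The reachable helper is the transitive closure that an OWL reasoner or a SPARQL property path would otherwise supply; the same function serves both directions because only the adjacency map changes.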

The unsolved part: lot versus item, and where to stop individuating

The honest question fill-finish forces is how far down to individuate. A lot may be tens of thousands of vials. Does the model mint a node per vial? Almost never at this stage — it models the lot as the unit, because that is what gets filled, inspected, and released together, and a node per vial would explode the graph for no query anyone yet asks. But serialization is about to change that: regulation will require each saleable unit to carry a unique identity, so the graph must be ready to individuate to the item level for some purposes (track-and-trace) while keeping lot-level granularity for others (release, quality). Holding two granularities — lot and item — coherently, without either drowning the graph in vial-nodes or losing the item identity regulation demands, is a real and unresolved modeling tension, and fill-finish is where it is born even though serialization is where it bites.

There is a subtler issue too: the change of kind from bulk to discrete is not clean. During filling, the bulk and the units coexist — the lot is becoming discrete vial by vial — and modeling that transition crisply (when does "the bulk DP" become "the filled units"?) is the same individuation softness, now at the bulk-to-item boundary. The pragmatic convention is to model the filled lot as a new material with a unit count and defer item identity to serialization, which works — but it is a convention papering over a continuous filling process, and a model that treats the lot count as a hard fact should remember it is an abstraction over thousands of individual fill events. The honest standard is that lot-level modeling is right for now, item-level identity is coming, and the model must be designed to hold both rather than retrofitted in a panic when serialization arrives.
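One way to hold both granularities can be sketched as a lot record that carries a unit count from the start and mints item identities only when serialization happens. Everything here is hypothetical: the class name DrugProductLot, its fields, the tiny unit count of 3, and the SN-0001 serial format are illustrative assumptions, not part of the dataset or of any serialization standard.

```python
from dataclasses import dataclass, field


@dataclass
class DrugProductLot:
    """Lot-level record: what is filled, inspected, and released together."""
    lot_id: str
    unit_count: int       # an abstraction over the individual fill events
    release_status: str   # a lot-level fact, never copied onto items
    items: dict = field(default_factory=dict)  # serial -> item record; empty until serialization

    def serialize(self, serials):
        """Mint item-level identities without touching lot-level facts."""
        serials = list(serials)
        if len(set(serials)) != len(serials) or any(s in self.items for s in serials):
            raise ValueError("serial numbers must be unique")
        if len(self.items) + len(serials) > self.unit_count:
            raise ValueError("more serials than filled units")
        for serial in serials:
            # Each item points back to its lot rather than duplicating lot data.
            self.items[serial] = {"lot": self.lot_id}
        return len(self.items)


# Hypothetical lot of 3 units; a real lot may hold tens of thousands.
lot = DrugProductLot("DP-001", unit_count=3, release_status="PASS")
lot.serialize(["SN-0001", "SN-0002"])
print(lot.release_status, len(lot.items))  # PASS 2
```

The design choice worth noticing is that release status lives only on the lot and each item holds a back-reference, so item identity arrives as an extension of the lot model rather than a rebuild of it.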

Why it matters

Fill-finish creates the forward fork that makes shared-fate impact analysis possible, and it is the hinge where the product becomes the discrete, countable thing a patient receives. Model the DS-to-DP fork explicitly and a DP failure becomes a scoped, queryable investigation across siblings; model the container-closure as part of the product and its integrity is a checkable quality rather than an afterthought; anticipate the bulk-to-item shift and serialization becomes an extension rather than a rebuild. Treat each DP lot as an island and the most consequential question in a recall — what else is affected? — reverts to manual cross-referencing exactly when speed matters most. This chapter is where the genealogy fans out toward the patient.

In the real world

Formulation with excipients, sterile fill-finish into a qualified container-closure system, and the release of discrete drug-product lots are universal, and container-closure integrity is an explicit regulatory concern with its own guidance [1][2][3]. Plants already track the DS-to-DP relationship for exactly the recall-scoping reason this chapter models — it is one of the most safety-critical links in the supply chain. The modeling advance is to make the forward fork a first-class, traversable structure and to design the lot model so item-level serialization slots in cleanly, which the serialization chapter builds on directly. On the floor the fill line itself runs as an ISA-88 PackML state machine (Idle → Starting → Execute → Holding and back), and the companion examples/datasets/packml_log.csv is that state sequence for BATCH-2026-001 on FILL-LINE-01, captured live in the open-source fill-finish chapter. The graph indexes the resulting drug-product lot and its release facts, not the raw state log — the same index-versus-payload discipline that keeps dense machine and sensor streams out of the triples.

Key terms

Drug product (DP) — the final formulated, filled, sealed medicine in its container (DP-001); a population of discrete units, derivedFrom the drug substance.
Formulation / excipients — combining the DS with stabilizing components, modeled as lineage plus component edges rather than a buried recipe.
Container-closure system — the vial, stopper, and seal, modeled as part of the product's identity with integrity as a checkable quality.
DS-to-DP forward fork — the one-to-many lineage where a DS lot fills many DP lots; the structure enabling shared-fate impact analysis.
Shared fate — the computable consequence of the fork: a DS defect implicates every sibling DP lot, and a DP failure scopes to its siblings, by traversal.
Bulk versus discrete (lot versus item) — the change of kind from a concentration-measured bulk to a counted population of units, with item-level identity deferred to serialization but designed for in advance.

Where this leads

The medicine now exists as discrete, sealed units — but it cannot ship until it is proven good. The next chapter, Modeling QC and the Release Gate: Specifications as SHACL, models the decision that has been the quality thread's destination all along: checking that every required CQA has a conformant, in-specification, signed result before a lot may claim release — and shows how a release specification becomes a SHACL shape that gates the graph, the exact mechanism the open-source book runs in code.
