Modeling the Production Bioreactor: A Process, Its Phases, and Its Parameters
📍 Where we are: Part III · Upstream, modeled — Chapter 11. The seed train delivered enough cells. Now they go into the production bioreactor, where the molecule is actually made — and where the continuant/occurrent distinction from Part I stops being theory and becomes the most consequential modeling choice in the book.
This is the chapter the whole upstream points at. Cells from SEED-001 inoculate a large stirred tank, and over a couple of weeks they grow, then switch to producing antibody, monitored every few seconds by a battery of probes. It is also where a careless model quietly breaks by fusing three things Part I insisted are different into one fuzzy node: the vessel, the batch of material, and the process of making it. Keep them apart and the graph can express equipment reuse, lineage, and per-phase control; collapse them and it can barely say what ran where.
Think of a concert. There is the hall (used for many concerts), the performance (one evening, with a beginning, middle, and end), and the recording of it. Confuse the hall with the performance and you cannot say two concerts happened in the same hall. A production bioreactor is the same: there is the vessel (used for many batches), the cell-culture run (one performance, with growth and production movements), and the batch of product it yields. This chapter keeps the three distinct — which is what lets the graph say this vessel hosted that run which made this batch.
What this chapter covers
We model the vessel as equipment playing a role, the batch as the material it produces, and the cell-culture process as the occurrent that links them — the run occursIn the vessel and hasOutput the batch — then model the run's phases (growth, then production) as sub-processes and attach the design space's parameters to this real run. We dissect the BATCH-2026-001 node — typed as a Batch and nothing else, with the vessel kept as a separate individual — and draw the now-familiar line that keeps dense PAT sensor streams indexed in the graph but stored in the historian.
Three entities where it is tempting to model one
The vessel, the batch, and the run fall into three BFO categories, and the spine chapter already classified them. The bioreactor is a material entity (equipment that persists across campaigns) that bears the role of "this campaign's production reactor," a role it sheds when the campaign ends. The batch (BATCH-2026-001) is a different material entity: the broth, then the antibody-bearing culture, which exists only for this run. And the cell-culture process is an occurrent in which the cells participate, which occursIn the vessel and hasOutput the batch material, and which realizes the recipe. Modeling all three means the graph can state that the same BR-101 vessel ran a different batch last month (a new material, a new process, the same equipment) without any contradiction, which is exactly the capability the disjointness axioms from the axioms chapter protect. Batch owl:disjointWith CellCultureProcess makes the "the batch is the run" conflation a flagged error; Material owl:disjointWith Equipment does the same for the subtler "the batch is the vessel" conflation. The run-to-vessel link therefore rides on an explicit occursIn edge, never on a second rdf:type smuggled onto the batch.
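A minimal sketch of how those two axioms catch the conflations, assuming a toy subclass table in place of an OWL reasoner. The subclass links (Batch under Material, ProductionBioreactor and Bioreactor under Equipment) are illustrative assumptions, not the book's ontology file; the two disjoint pairs are the ones named above.

```python
from itertools import combinations

# Hypothetical, simplified subclass table (child -> parent); not the book's ontology file.
SUBCLASS_OF = {
    "Batch": "Material",
    "ProductionBioreactor": "Bioreactor",
    "Bioreactor": "Equipment",
    "CellCultureProcess": "Process",
}

# The two disjointness axioms named in the text.
DISJOINT = {
    frozenset({"Batch", "CellCultureProcess"}),
    frozenset({"Material", "Equipment"}),
}


def superclasses(cls: str) -> set:
    """Return cls plus every ancestor reachable through SUBCLASS_OF."""
    found = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.add(cls)
    return found


def disjointness_violations(types: dict) -> list:
    """List (individual, class_a, class_b) for each individual under two disjoint classes."""
    violations = []
    for individual, asserted in sorted(types.items()):
        # Propagate asserted types up the hierarchy before checking disjoint pairs.
        inferred = set().union(*(superclasses(t) for t in asserted))
        for a, b in combinations(sorted(inferred), 2):
            if frozenset({a, b}) in DISJOINT:
                violations.append((individual, a, b))
    return violations


# Naive union of two sources: the batch typed as both a Batch and a Bioreactor.
naive = {"BATCH-2026-001": {"Batch", "Bioreactor"}, "BR-101": {"ProductionBioreactor"}}
# Faithful model: each individual typed once, the run kept as its own occurrent.
faithful = {"BATCH-2026-001": {"Batch"}, "BR-101": {"ProductionBioreactor"},
            "CCP-001": {"CellCultureProcess"}}

print(disjointness_violations(naive))     # [('BATCH-2026-001', 'Equipment', 'Material')]
print(disjointness_violations(faithful))  # []
```

A real reasoner would report an inconsistent ontology rather than a list, but the logic is the same: the "batch is the vessel" error only surfaces because types are propagated up to Material and Equipment before the disjoint pairs are checked.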
The genealogy edge is laid here too: BATCH-2026-001 derivedFrom SEED-001, recording the inoculation that handed the seed train's cells into the production vessel. From this node the chain continues downstream (the harvest will in turn be derivedFrom this batch), so the production bioreactor sits at the hinge of the whole lineage, receiving from the seed train and handing on to downstream.
Phases are sub-processes, not labels
A cell-culture run is not uniform. It passes through distinct phases that obey different rules, a growth phase where cells multiply and then a production phase where they make antibody, and the design space often specifies different parameter ranges for each [1]. The model treats each phase as a sub-process: a temporal part of the overall cell-culture occurrent, with its own start and end process boundaries. This is what lets the graph attach a parameter setpoint to the phase it applies to (a feed rate that is critical during production but not growth) rather than to the run as an undifferentiated whole. Phases are the occurrent-side analogue of the recipe's ISA-88 operations: the recipe prescribes the phases as information, the run realizes them as happenings, and the two line up step for step. In the loadable dataset the run is one occurrent with two sub-phases, the batch is its output, and a per-phase parameter range hangs on the phase it governs:
# instances.ttl: the run as an occurrent with growth/production sub-phases.
bp:CCP-001 a bp:CellCultureProcess ;
    bp:occursIn bp:BR-101 ;                 # the run occurs IN the vessel...
    bp:hasInput bp:SEED-001 ;
    bp:hasOutput bp:BATCH-2026-001 ;        # ...and the batch is its OUTPUT, not the run
    bp:hasPhase bp:CCP-001-growth , bp:CCP-001-production ;
    bp:realizes bp:Recipe-mAb-A ;
    bp:hasTrace bp:Trace-BR101-Temp .

bp:CCP-001-production a bp:Phase ;                          # a temporal sub-process...
    bp:setpoint 36.5 ; bp:norLow 36.0 ; bp:norHigh 37.0 ;   # ...with its own parameter range
    bp:realizesParameter bp:RPS-temp-CCP001 , bp:RPS-feedrate-CCP001 .  # the run's actual settings

bp:BATCH-2026-001 a bp:Batch ;             # the material, typed ONCE, not also a Bioreactor
    bp:participatesIn bp:CCP-001 .

bp:BR-101 a bp:ProductionBioreactor .      # the vessel: a separate Equipment individual
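The per-phase attachment can be sketched in plain Python, assuming a simple record per realized parameter. The numbers are the ones the dataset carries (temperature setpoint 36.5 with NOR 36.0 to 37.0 on the production phase; feed rate setpoint 0.40 with NOR 0.35 to 0.45); the class names, the "Temperature"/"FeedRate" labels, and the out-of-range value 0.47 are hypothetical, and the growth phase carries no ranges because the text gives none.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RealizedParameter:
    """One run's setting for one type-level parameter (hypothetical record shape)."""
    iri: str
    parameter_type: str  # plays the role of bp:parameterType
    setpoint: float
    nor_low: float
    nor_high: float

    def within_nor(self, value: float) -> bool:
        return self.nor_low <= value <= self.nor_high


@dataclass
class Phase:
    """A temporal sub-process of the run; ranges hang here, not on the run as a whole."""
    iri: str
    parameters: list = field(default_factory=list)


@dataclass
class CellCultureRun:
    iri: str
    occurs_in: str   # the vessel (bp:occursIn)
    has_output: str  # the batch material (bp:hasOutput)
    phases: list = field(default_factory=list)


production = Phase("CCP-001-production", [
    RealizedParameter("RPS-temp-CCP001", "Temperature", 36.5, 36.0, 37.0),
    RealizedParameter("RPS-feedrate-CCP001", "FeedRate", 0.40, 0.35, 0.45),
])
run = CellCultureRun("CCP-001", occurs_in="BR-101", has_output="BATCH-2026-001",
                     phases=[Phase("CCP-001-growth"), production])


def check(run: CellCultureRun, phase_iri: str, parameter_type: str, value: float) -> bool:
    """Check an observed value against the NOR of the phase that parameter governs."""
    for phase in run.phases:
        if phase.iri == phase_iri:
            for rps in phase.parameters:
                if rps.parameter_type == parameter_type:
                    return rps.within_nor(value)
    raise LookupError(f"no {parameter_type} range attached to {phase_iri}")


print(check(run, "CCP-001-production", "Temperature", 36.51))  # True
print(check(run, "CCP-001-production", "FeedRate", 0.47))      # False (0.47 is made up)
```

Asking the growth phase for a feed-rate range raises LookupError rather than silently borrowing the production range, which is the point of attaching ranges per phase.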
The parameters from development finally have a run to attach to
In process development we modeled bp:FeedRate bp:affectsQuality bp:MonomerPct-CQA as a relationship with evidence and ranges, but those were type-level facts about the process in general. Here, in a specific run, the parameter becomes an instance: this run's feed rate had this setpoint, held within this NOR, during this phase, and produced this monomer result. The dataset carries each as a real node, bp:RPS-feedrate-CCP001 (setpoint 0.40, NOR 0.35–0.45) and bp:RPS-temp-CCP001 (setpoint 36.5), attached to the production phase by bp:realizesParameter and each pinned to its type-level parameter by bp:parameterType. The design space said what should be true; the run records what was. That link is what lets the release gate, and any investigation, check a real batch against the knowledge that governs it. It is also loadable end to end: queries/cross-lifecycle.rq walks from a CPP, through its affectsQuality link, to the run that realized it, then downstream to the released drug-substance lot that carries the result (following derivedFrom in reverse, since each lot points back at its parent), returning (FeedRate, MonomerPct-CQA, DS-001) in one query. The abstract affectsQuality graph from development and the concrete record from the plant meet on this node.
The upstream hinge, modeled: a persisting vessel bears a role, a two-week cell-culture process with growth and production phases yields the batch material, the design space's parameters attach to the phase they govern, and the second-by-second sensor stream stays in the historian, referenced by IRI rather than flattened into triples.
Original diagram by the authors, created with AI assistance.
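The shape of the cross-lifecycle walk can be sketched without a triple store, assuming an in-memory set of (subject, predicate, object) tuples in place of queries/cross-lifecycle.rq, whose actual SPARQL is not reproduced here. HARVEST-001 and the hasResultFor predicate are hypothetical stand-ins for the downstream lots and result edges of the real dataset. The key move is that derivedFrom points from child to parent, so walking downstream means following the edge in reverse.

```python
from collections import deque

TRIPLES = {
    ("FeedRate", "affectsQuality", "MonomerPct-CQA"),  # type-level, from development
    ("RPS-feedrate-CCP001", "parameterType", "FeedRate"),
    ("CCP-001-production", "realizesParameter", "RPS-feedrate-CCP001"),
    ("CCP-001", "hasPhase", "CCP-001-production"),
    ("CCP-001", "hasOutput", "BATCH-2026-001"),
    ("HARVEST-001", "derivedFrom", "BATCH-2026-001"),  # hypothetical intermediate lot
    ("DS-001", "derivedFrom", "HARVEST-001"),
    ("DS-001", "hasResultFor", "MonomerPct-CQA"),      # hypothetical result edge
}


def objects(s, p):
    return sorted(o for s2, p2, o in TRIPLES if s2 == s and p2 == p)


def subjects(p, o):
    return sorted(s for s, p2, o2 in TRIPLES if p2 == p and o2 == o)


def downstream(lot):
    """All lots derived, transitively, from lot: derivedFrom walked child <- parent."""
    found, queue = [], deque([lot])
    while queue:
        for child in subjects("derivedFrom", queue.popleft()):
            if child not in found:
                found.append(child)
                queue.append(child)
    return found


def cross_lifecycle(cpp):
    """CPP -> affected CQA -> realizing run -> its batch -> downstream lot with a result."""
    rows = []
    for cqa in objects(cpp, "affectsQuality"):
        for rps in subjects("parameterType", cpp):
            for phase in subjects("realizesParameter", rps):
                for run in subjects("hasPhase", phase):
                    for batch in objects(run, "hasOutput"):
                        for lot in downstream(batch):
                            if (lot, "hasResultFor", cqa) in TRIPLES:
                                rows.append((cpp, cqa, lot))
    return rows


print(cross_lifecycle("FeedRate"))  # [('FeedRate', 'MonomerPct-CQA', 'DS-001')]
```

In SPARQL the reverse transitive hop would typically be a property path such as ^bp:derivedFrom+, but the nested loops make each hop of the walk explicit.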
Anatomy of the batch node, and a multi-source conflict resolved cleanly
Unpack BATCH-2026-001 and it carries the run's whole story, as one kind of thing: a Batch, the material the run produced. That single typing is a deliberate fix for a real integration wrinkle that the open-source knowledge-graph chapter surfaces: two systems described this physical run. The batch register (MES) called it a batch; a genealogy loader recorded that the capture pool's parent "ran in a bioreactor." A naive union of those sources typed the one node as both a Batch and a Bioreactor, fusing the material with the vessel that held it, the exact category error this chapter warns against. The faithful resolution is not to keep both types but to send the genealogy's "bioreactor" where it belongs: the separate vessel BR-101, with the run occursIn it, plus a small PROV-O trail recording that each rdf:type claim came from a different source. The conflict is reconciled, provenance is kept, and no node wears two incompatible BFO categories. The batch node then points derivedFrom at SEED-001; participates in the cell-culture process and, through its phases, reaches the run's realized parameter values; and bears the SEC monomerPct of 98.611. One honest caveat: monomer is really a drug-substance-stage release result. It is carried here on the upstream batch so the digital thread can read the originating value; a fully faithful release model hangs it on the drug-substance lot.
The batch node, fully unpacked: typed once as a Batch, with the vessel kept separate (the run occursIn BR-101), deriving from the seed train, participating in a phased process, carrying its realized parameters and CQA, and pointing at the historian for the raw stream — the upstream lineage and quality at one address, with the multi-source vessel/material conflict reconciled by provenance, not by a second type.
Original diagram by the authors, created with AI assistance.
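The reconciliation rule can be sketched as a small routing function, assuming claims arrive as (source, subject, predicate, object) tuples. The source names are hypothetical, and pairing each triple with its source is a simplified stand-in for the PROV-O trail the text describes; a real graph would attach that provenance through reification, RDF-star, or named graphs.

```python
# Each claim: (source system, subject, predicate, object). Source names are hypothetical.
CLAIMS = [
    ("mes-batch-register", "bp:BATCH-2026-001", "rdf:type", "bp:Batch"),
    ("genealogy-loader", "bp:BATCH-2026-001", "rdf:type", "bp:Bioreactor"),
]
EQUIPMENT_CLASSES = {"bp:Bioreactor", "bp:ProductionBioreactor"}


def reconcile(claims, run, vessel):
    """Route equipment-type claims to the vessel, link the run to it, keep each claim's source."""
    triples, provenance = [], []
    for source, subject, predicate, obj in claims:
        if predicate == "rdf:type" and obj in EQUIPMENT_CLASSES and subject != vessel:
            # The "bioreactor" belongs to the vessel; the run-to-vessel fact becomes an edge.
            routed = [(vessel, "rdf:type", obj), (run, "bp:occursIn", vessel)]
        else:
            routed = [(subject, predicate, obj)]
        for triple in routed:
            if triple not in triples:
                triples.append(triple)
            # Simplified stand-in for PROV-O: which source each triple came from.
            provenance.append((triple, "prov:wasDerivedFrom", source))
    return triples, provenance


triples, provenance = reconcile(CLAIMS, run="bp:CCP-001", vessel="bp:BR-101")
for t in triples:
    print(t)
```

The batch ends up with exactly one rdf:type, the vessel picks up the equipment type, and the only trace of the genealogy loader's claim on the run is the occursIn edge plus its provenance record.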
The graph indexes the sensor stream; the historian holds it
A production run produces a torrent of PAT data — temperature, pH, dissolved oxygen, and more, sampled every few seconds [1][3]. It is the same shape as the chromatogram: a dense time series that must not be flattened into triples. So the graph applies the same index-versus-payload boundary — it holds the fact that this batch hasTrace a temperature signal, names the tag (BR101.Temp.PV), and points by IRI at the historian or data shadow where the millions of points live, while a single setpoint, an excursion event, or a phase-average result can sit in the graph as a typed value. This is the precise architecture the trilogy keeps converging on: the tag from the preface and the historian row from the data book are the payload; the graph is the index that makes them findable and ties them to the batch.
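A sketch of that index-versus-payload boundary, assuming a synthetic trace in place of the real historian payload: sixty readings collapse into a handful of triples (the hasTrace index edge, one phase-average value, and one excursion event), and the points themselves are never emitted. The predicates bp:phaseMeanTemp, bp:Excursion, bp:startSecond and bp:endSecond are hypothetical, not terms from the book's ontology.

```python
from statistics import mean

TAG_IRI = "https://example.org/historian/tag/BR101.Temp.PV"
NOR_LOW, NOR_HIGH = 36.0, 37.0  # production-phase NOR from the dataset


def excursions(readings, low, high):
    """Collapse contiguous out-of-range points into (start, end) intervals in seconds."""
    events, start, last = [], None, None
    for t, value in readings:
        if not (low <= value <= high):
            if start is None:
                start = t
            last = t
        elif start is not None:
            events.append((start, last))
            start = None
    if start is not None:  # an excursion still open at the end of the trace
        events.append((start, last))
    return events


def index_triples(batch, readings):
    """Emit a bounded set of triples about a dense trace: the index plus a few typed values."""
    triples = [
        (batch, "bp:hasTrace", TAG_IRI),
        (batch, "bp:phaseMeanTemp", round(mean(v for _, v in readings), 3)),
    ]
    for n, (start, end) in enumerate(excursions(readings, NOR_LOW, NOR_HIGH), start=1):
        event = f"{batch}-excursion-{n}"
        triples += [(event, "rdf:type", "bp:Excursion"),
                    (event, "bp:startSecond", start),
                    (event, "bp:endSecond", end)]
    return triples


# Synthetic payload (hypothetical): 60 readings at 10 s spacing, one brief high excursion.
readings = [(t, 37.1 if 300 <= t < 330 else 36.5) for t in range(0, 600, 10)]
for triple in index_triples("BATCH-2026-001", readings):
    print(triple)
```

The triple count grows with the number of events, not the number of samples, which is what keeps the graph queryable while the historian keeps the stream.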
So how does a PI tag actually become a graph node? Not by copying points, but by a mapping that a virtual-graph engine or an R2RML/RML processor runs over the historian's own rows. The companion historian-map.rml.ttl maps one historian row — the shape the PI Web API bridge produces, (ts, tag, value, unit, quality, batch_id) — into a SOSA observation, the W3C vocabulary built for exactly this [4][5]:
# historian-map.rml.ttl: one PI/historian row (tag BR101.Temp.PV) -> one sosa:Observation.
ex:ObservationMap a rr:TriplesMap ;
    rml:logicalSource [ rml:source "sensor_reading.csv" ; rml:referenceFormulation ql:CSV ] ;
    rr:subjectMap [ rr:template "https://example.org/historian/obs/{tag}/{ts}" ;
                    rr:class sosa:Observation ] ;
    rr:predicateObjectMap [ rr:predicate sosa:observedProperty ;
                            rr:objectMap [ rr:template "https://example.org/historian/tag/{tag}" ] ] ;
    rr:predicateObjectMap [ rr:predicate sosa:hasSimpleResult ;
                            rr:objectMap [ rml:reference "value" ; rr:datatype xsd:float ] ] ;
    rr:predicateObjectMap [ rr:predicate bp:fromBatch ;
                            rr:objectMap [ rr:template "https://example.org/bioproc#{batch_id}" ] ] .
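What that map does to a single row can be sketched by hand, assuming one CSV row assembled from values that appear elsewhere in the chapter (36.51, Cel, quality 192); this is an illustration, not an RML processor, and the function names are hypothetical. R2RML specifies that values substituted into an rr:template are percent-encoded unless they are IRI-unreserved; the quote call below approximates that rule for ASCII values, so the colons in the timestamp become %3A.

```python
import csv
import io
from urllib.parse import quote

OBS = "https://example.org/historian/obs/{tag}/{ts}"
TAG = "https://example.org/historian/tag/{tag}"
BATCH = "https://example.org/bioproc#{batch_id}"


def iri_safe(value: str) -> str:
    """ASCII approximation of R2RML's IRI-safe encoding: keep only unreserved characters."""
    return quote(value, safe="-._~")


def fill(template: str, row: dict) -> str:
    """Substitute IRI-safe row values into a template (unused columns are ignored)."""
    return template.format(**{k: iri_safe(v) for k, v in row.items()})


def map_row(row: dict) -> list:
    """The four predicate-object maps of ex:ObservationMap, applied to one row."""
    subject = fill(OBS, row)
    return [
        (subject, "rdf:type", "sosa:Observation"),
        (subject, "sosa:observedProperty", fill(TAG, row)),
        (subject, "sosa:hasSimpleResult", float(row["value"])),  # rr:datatype xsd:float
        (subject, "bp:fromBatch", fill(BATCH, row)),
    ]


SAMPLE = """ts,tag,value,unit,quality,batch_id
2026-03-02T08:00:10Z,BR101.Temp.PV,36.51,Cel,192,BATCH-2026-001
"""
for row in csv.DictReader(io.StringIO(SAMPLE)):
    for triple in map_row(row):
        print(triple)
```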
A companion map emits one bp:hasTrace edge per batch-and-tag — the index — while the points stay in PI. Running historian_to_rdf.py over a few recorded values from the PI Web API stub shows the discipline holding: a bounded set of triples, one trace edge per signal, no flattened stream [6]:
historian -> RDF: 23 triples from 3 rows
bp:hasTrace index edges (one per batch/tag — the stream stays in PI):
BATCH-2026-001 hasTrace https://example.org/historian/tag/BR101.Temp.PV
BATCH-2026-001 hasTrace https://example.org/historian/tag/BR101.pH.PV
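The de-duplication behind that output can be sketched as below. The three rows are illustrative (the PI stub's actual values are not shown above), arranged so that two temperature points and one pH point yield the two index edges printed. Deduplicating on (batch, tag) is what bounds the index by the number of signals rather than the number of points.

```python
HISTORIAN_TAG = "https://example.org/historian/tag/{tag}"
BIOPROC = "https://example.org/bioproc#{batch_id}"


def trace_index(rows):
    """One bp:hasTrace edge per distinct (batch, tag), however many points each tag has."""
    edges = (
        (BIOPROC.format(**row), "bp:hasTrace", HISTORIAN_TAG.format(**row))
        for row in rows
    )
    return list(dict.fromkeys(edges))  # de-duplicate, keeping first-seen order


# Three illustrative rows (values are made up): two temperature points, one pH point.
rows = [
    {"ts": "2026-03-02T08:00:00Z", "tag": "BR101.Temp.PV", "value": "36.50", "batch_id": "BATCH-2026-001"},
    {"ts": "2026-03-02T08:00:10Z", "tag": "BR101.Temp.PV", "value": "36.51", "batch_id": "BATCH-2026-001"},
    {"ts": "2026-03-02T08:00:10Z", "tag": "BR101.pH.PV", "value": "7.05", "batch_id": "BATCH-2026-001"},
]
for subject, predicate, obj in trace_index(rows):
    print(subject, predicate, obj)
```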
The unsolved part: time, continuity, and how much of a run to model
The honest difficulties cluster around time. A graph of triples is fundamentally timeless — BATCH-2026-001 derivedFrom SEED-001 has no clock — yet a bioreactor run is time: phases, ramps, excursions, setpoint changes. Representing temporal extent in RDF is possible (process boundaries, time intervals, reified time-stamped states) but verbose and contested, with no single dominant pattern, so most graphs model time coarsely — a phase has a start and end — and leave the fine-grained temporal behavior in the historian. That is a reasonable division, but it means the graph cannot answer "what was the pH trajectory's shape?" without leaving the graph; it can only point at where the answer lives. The boundary between what the graph models temporally and what it defers is a judgment, and modeling too much time into triples is as much a failure as modeling too little.
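The coarse division described above (a phase has a start and an end; everything finer stays in the historian) can be sketched with half-open intervals. The phase boundaries are hypothetical, since the chapter gives none; the first timestamp is the one from the OPC UA read later in the chapter. The graph can say which phase a moment fell in, but nothing here knows the shape of the trajectory inside the interval.

```python
from datetime import datetime, timezone
from typing import Optional


def utc(text: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp written with a trailing Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# HYPOTHETICAL phase boundaries: the chapter does not give the run's dates.
PHASES = {
    "CCP-001-growth": (utc("2026-02-24T08:00:00Z"), utc("2026-03-01T08:00:00Z")),
    "CCP-001-production": (utc("2026-03-01T08:00:00Z"), utc("2026-03-10T08:00:00Z")),
}


def phase_at(moment: datetime) -> Optional[str]:
    """Return the phase whose half-open [start, end) interval contains moment, if any."""
    for phase, (start, end) in PHASES.items():
        if start <= moment < end:
            return phase
    return None


print(phase_at(utc("2026-03-02T08:00:10Z")))  # CCP-001-production
print(phase_at(utc("2026-03-01T08:00:00Z")))  # CCP-001-production (boundary is not shared)
print(phase_at(utc("2026-03-20T00:00:00Z")))  # None: outside the modeled run
```

Half-open intervals are a deliberate choice: the instant the growth phase ends belongs to exactly one phase, so no boundary event is double-counted.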
The second difficulty is the granularity of the run itself, the same individuation question the seed train raised. How many phases? Is a feed addition an event-node, or just a point in the historian? Is a perfusion run one continuous process or a sequence of daily cycles? There is no canonical answer, and the right one depends on the questions the graph must serve. The model is genuinely powerful — it ties a real run to the knowledge that governs it and the lineage around it — but it is an abstraction whose resolution is chosen, not given, and a graph that looks authoritative can still be silently coarser than the process it claims to describe.
Why it matters
The production bioreactor is the hinge of the entire genealogy and the place where development knowledge meets manufacturing evidence. Model the vessel, batch, and process as distinct entities and the graph can reason about equipment reuse, run lineage, and per-phase control all at once; attach the realized parameters to the design space and a release decision can be checked against the knowledge that justifies it; index the sensor stream rather than swallowing it and the graph stays queryable while the raw data stays where it belongs. Get this node right and the upstream half of the digital thread is sound; fuse its entities or drown it in time-series triples and the most data-rich step in manufacturing becomes the least trustworthy node in the graph.
From the wire to the graph
The historian-map.rml.ttl bridge above starts from a PI row: (ts, tag, value, unit, quality, batch_id). But that row is not where the signal begins. One hop below, on the real bioreactor, the probe is an OPC UA variable node on the DCS/PLC, identified by a NodeId; a read returns a DataValue (a Value, a SourceTimestamp, and a StatusCode), and the node's EUInformation property supplies the engineering unit. The companion examples/platform/ontology/opcua_to_rdf.py maps those reads into the same sosa:Observation the historian map mints. That works because the NodeId's string identifier is the historian tag, so ns=2;s=BR101.Temp.PV and tag BR101.Temp.PV land on the same observation IRI, and EUInformation "Cel" resolves to qudt:hasUnit unit:DEG_C plus qudt:ucumCode "Cel". The whole path is now continuous: probe → OPC UA node → historian row → sosa:Observation → bp:hasTrace index. (OPC UA is in production at the wire; OPC UA LADS for lab devices is still at the pilot stage.)
# opcua_to_rdf.py (excerpt): a read maps into the SAME sosa:Observation the historian map mints.
from rdflib import Namespace

UNIT = Namespace("http://qudt.org/vocab/unit/")  # QUDT unit vocabulary

OPCUA_READS = [
    {"node_id": "ns=2;s=BR101.Temp.PV", "value": 36.51, "source_timestamp": "2026-03-02T08:00:10Z",
     "status_code": "Good", "eu_ucum": "Cel", "batch_id": "BATCH-2026-001"},
    # excerpt: only the first read is shown here
]
STATUS_TO_QUALITY = {"Good": 192}      # OPC UA StatusCode -> PI quality
UCUM_TO_QUDT = {"Cel": UNIT["DEG_C"]}  # EUInformation UCUM -> QUDT unit
# obs/{tag}/{ts} -> same IRI as the historian; the NodeId identifier IS the tag.
# Full script output: OPC UA -> RDF: 16 triples from 2 variable reads
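The convergence claim can be checked in a few lines, assuming NodeIds in the string-identifier form shown (numeric, GUID, and opaque identifiers are out of scope for this sketch) and the same obs/{tag}/{ts} template on both sides. The function names are hypothetical, not taken from opcua_to_rdf.py.

```python
import re
from urllib.parse import quote

# Handles only string-identifier NodeIds written as "ns=<index>;s=<identifier>" (sketch assumption).
STRING_NODE_ID = re.compile(r"^ns=(\d+);s=(.+)$")
STATUS_TO_QUALITY = {"Good": 192}  # OPC UA StatusCode -> PI quality, as in the excerpt


def tag_from_node_id(node_id: str) -> str:
    match = STRING_NODE_ID.match(node_id)
    if match is None:
        raise ValueError(f"not a string-identifier NodeId: {node_id!r}")
    return match.group(2)  # the identifier IS the historian tag


def observation_iri(tag: str, timestamp: str) -> str:
    """Fill the obs/{tag}/{ts} template, percent-encoding characters outside the unreserved set."""
    tag_part, ts_part = (quote(v, safe="-._~") for v in (tag, timestamp))
    return f"https://example.org/historian/obs/{tag_part}/{ts_part}"


read = {"node_id": "ns=2;s=BR101.Temp.PV", "source_timestamp": "2026-03-02T08:00:10Z",
        "status_code": "Good"}
row = {"tag": "BR101.Temp.PV", "ts": "2026-03-02T08:00:10Z", "quality": 192}

from_wire = observation_iri(tag_from_node_id(read["node_id"]), read["source_timestamp"])
from_historian = observation_iri(row["tag"], row["ts"])
print(from_wire == from_historian)                               # True
print(STATUS_TO_QUALITY[read["status_code"]] == row["quality"])  # True
```

Because both paths mint the IRI from the same two strings through the same template, an observation read at the wire and the same observation archived by the historian collapse onto one node instead of two.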
The unit no longer travels bare. The realized setpoint bp:RPS-temp-CCP001 now carries bp:setpointValue bp:RPS-temp-CCP001-qv, a qudt:QuantityValue with qudt:numericValue 36.5, qudt:hasUnit unit:DEG_C, qudt:hasQuantityKind qkind:Temperature, and qudt:ucumCode "Cel" — and historian_to_rdf.py emits the unit too ("historian -> RDF: 23 triples from 3 rows"). The same discipline reaches the as-run record: b2mml_to_rdf.py reads examples/platform/ontology/b2mml/batch-production-record.xml and turns its batch ID BATCH-2026-001 into bp:BATCH-2026-001, typed once as a bp:Batch, while the record’s equipment reference to BR-101 becomes bp:CCP-001 bp:occursIn bp:BR-101 on the run — an edge, never a second rdf:type smuggled onto the batch — with the per-phase actuals carried as QUDT quantity values ("B2MML -> RDF: 8 triples from the master recipe, 25 more from the as-run batch record, 33 total"). Real MES platforms like Körber PAS-X and Siemens Opcenter (production) exchange exactly this B2MML.
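The mapping rule for the as-run record can be sketched with the standard-library XML parser, assuming a deliberately simplified record: the element names below are hypothetical and do not follow the B2MML schema, and the value 36.5 on the production phase is illustrative, as is the bp:actualValue predicate. What the sketch does show is the rule the paragraph states: the batch ID yields exactly one rdf:type, the equipment reference lands on the run as occursIn, and the phase actual becomes a QUDT quantity value with unit, quantity kind, and UCUM code.

```python
import xml.etree.ElementTree as ET

# A deliberately SIMPLIFIED as-run record; element names are hypothetical, NOT the B2MML schema.
RECORD = """
<BatchRecord>
  <BatchID>BATCH-2026-001</BatchID>
  <ProcessID>CCP-001</ProcessID>
  <EquipmentID>BR-101</EquipmentID>
  <Actual phase="CCP-001-production" value="36.5" ucum="Cel"/>
</BatchRecord>
"""
UCUM_TO_QUDT = {"Cel": ("unit:DEG_C", "qkind:Temperature")}  # UCUM -> (QUDT unit, quantity kind)


def record_to_triples(xml_text: str) -> list:
    root = ET.fromstring(xml_text.strip())
    batch = "bp:" + root.findtext("BatchID")
    run = "bp:" + root.findtext("ProcessID")
    vessel = "bp:" + root.findtext("EquipmentID")
    triples = [
        (batch, "rdf:type", "bp:Batch"),  # the batch is typed once...
        (run, "bp:hasOutput", batch),
        (run, "bp:occursIn", vessel),     # ...and the equipment reference becomes an edge on the run
    ]
    for n, actual in enumerate(root.iter("Actual"), start=1):
        qv = f"{run}-actual-{n}-qv"
        ucum = actual.get("ucum")
        unit, kind = UCUM_TO_QUDT[ucum]
        triples += [
            ("bp:" + actual.get("phase"), "bp:actualValue", qv),  # hypothetical predicate
            (qv, "rdf:type", "qudt:QuantityValue"),
            (qv, "qudt:numericValue", float(actual.get("value"))),
            (qv, "qudt:hasUnit", unit),
            (qv, "qudt:hasQuantityKind", kind),
            (qv, "qudt:ucumCode", ucum),
        ]
    return triples


for triple in record_to_triples(RECORD):
    print(triple)
```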
The vessel itself acquires its context here. ISA-95 places it in a hierarchy — bp:BR-101 bp:isEquipmentPartOf bp:CELL-BR101 → bp:AREA-USP → bp:SITE-A → bp:ENT-Acme (unit → process cell → area → site → enterprise, production) — and an Asset Administration Shell (IEC 63278, piloted in pharma) names the same physical thing: bp:AAS-BR-101 a bp:AssetAdministrationShell ; bp:describesAsset bp:BR-101 ; bp:assetSerialNumber "BR101-SN-0001" ; bp:maxWorkingVolumeL 2000.0. The twin's asset node is the ontology's equipment node, which is what lets the shop floor and the digital twin speak about one bioreactor in one identity.
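A sketch of how the shell and the hierarchy meet on one identity, using the isEquipmentPartOf chain and the describesAsset link given above; the dictionaries and function names are illustrative, not the dataset's loader. Resolving the AAS to the equipment node first, then walking upward, is what lets a twin-side query and a shop-floor query land on the same bioreactor.

```python
# bp:isEquipmentPartOf edges from the dataset (child -> parent).
PART_OF = {
    "BR-101": "CELL-BR101",
    "CELL-BR101": "AREA-USP",
    "AREA-USP": "SITE-A",
    "SITE-A": "ENT-Acme",
}
LEVEL = {"BR-101": "unit", "CELL-BR101": "process cell", "AREA-USP": "area",
         "SITE-A": "site", "ENT-Acme": "enterprise"}
DESCRIBES_ASSET = {"AAS-BR-101": "BR-101"}  # bp:describesAsset


def hierarchy(equipment: str) -> list:
    """Walk isEquipmentPartOf upward to the top, refusing to loop on a malformed cycle."""
    chain = [equipment]
    while chain[-1] in PART_OF:
        parent = PART_OF[chain[-1]]
        if parent in chain:
            raise ValueError(f"cycle in equipment hierarchy at {parent}")
        chain.append(parent)
    return chain


def context_of_shell(shell: str) -> list:
    """Resolve an AAS to the ontology's equipment node, then report its ISA-95 context."""
    return [(node, LEVEL[node]) for node in hierarchy(DESCRIBES_ASSET[shell])]


for node, level in context_of_shell("AAS-BR-101"):
    print(f"{level:>12}: {node}")
```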
In the real world
The phased cell-culture run, its critical parameters, and its dense PAT monitoring are exactly how commercial antibody production operates, and real-time spectroscopic monitoring of multiple parameters in mammalian-cell bioreactors is established practice [1][3]. The vessel-batch-process distinction is implicit in the ISA-88 batch model every control system already follows [2]; the open-source upstream chapter shows the live capture of these signals, and its historian is the very store this chapter's graph points its hasTrace IRIs at. What the ontology adds on top is the ability to ask cross-cutting questions — which batches ran on this vessel, within passage limits, holding which parameters, yielding which CQA — that no single control system or historian answers on its own.
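The first of those cross-cutting questions, which batches ran on this vessel, reduces to a two-hop join: from the vessel back through occursIn to each run, then forward through hasOutput to its batch. A sketch over in-memory triples; CCP-000 and BATCH-2026-000 are a hypothetical earlier campaign on BR-101, added only so the vessel has more than one run. Because the batch is never typed as equipment, the join needs no special cases.

```python
TRIPLES = [
    ("CCP-001", "occursIn", "BR-101"),
    ("CCP-001", "hasOutput", "BATCH-2026-001"),
    ("BATCH-2026-001", "monomerPct", 98.611),
    # Hypothetical earlier campaign on the same vessel, for illustration only.
    ("CCP-000", "occursIn", "BR-101"),
    ("CCP-000", "hasOutput", "BATCH-2026-000"),
]


def first_object(subject, predicate):
    return next((o for s, p, o in TRIPLES if s == subject and p == predicate), None)


def batches_on_vessel(vessel):
    """vessel <-occursIn- run -hasOutput-> batch, plus the batch's monomer result if recorded."""
    rows = []
    for run, p, where in TRIPLES:
        if p == "occursIn" and where == vessel:
            for s, p2, batch in TRIPLES:
                if s == run and p2 == "hasOutput":
                    rows.append((run, batch, first_object(batch, "monomerPct")))
    return sorted(rows, key=lambda row: row[0])


for row in batches_on_vessel("BR-101"):
    print(row)
```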
Key terms
- Bioreactor (vessel): the equipment, a persisting material entity that bears the role of production reactor for one campaign; distinct from the batch and the run.
- Batch: the material the run produces (BATCH-2026-001); a material entity existing only for this run, derivedFrom the seed train.
- Cell-culture process: the occurrent that occursIn the vessel, that the cells participate in, and that outputs the batch; disjoint from the batch, so the two cannot be conflated.
- occursIn: the explicit edge linking the run to the persisting vessel it ran in (aligned to BFO 'occurs in'); the proper home for the run-to-vessel fact, in place of typing the batch as equipment.
- Phase (growth / production): a temporal sub-process of the run, with its own boundaries, to which phase-specific parameter ranges attach.
- Realized parameter: a run's actual setpoint value, the instance-level counterpart of the design space's type-level affectsQuality ranges.
- Multi-source reconciliation: when two systems describe one physical thing with conflicting types, the fix is to map each claim to the right entity (here, the vessel to BR-101) and record provenance, rather than keep both rdf:types on one IRI, which Material owl:disjointWith Equipment flags as an error.
- hasTrace (index vs payload): the edge naming a dense PAT signal and pointing by IRI to the historian, so the second-by-second stream is referenced, not flattened into triples.
Where this leads
The product is made; it is suspended in a tank full of cells. The next chapter, Modeling Harvest and Clarification: A Material Transformation, models the first downstream step — separating the antibody-bearing fluid from the cells — as a unit operation that consumes one material and produces two, and uses it to confront the question lurking under every derivedFrom edge: where exactly one material ends and the next begins.