Conceptualization: Classes and the Taxonomy Under BFO
📍 Where we are: Part III · Conceptualization — the third phase of the lifecycle, governed by NeOn (which frames conceptualization as its own scenario) and run on SAMOD's test-first rhythm. The requirements are written and the reuse decisions are made. Now we do the act everyone wrongly thinks comes first: we name the entities of the antibody campaign.
The temptation, on the day you finally get to draw classes for a monoclonal-antibody program, is to draw the ones that feel most concrete — Bioreactor, Batch, Antibody — and worry about where they sit later. That is exactly backwards. A class without a category is a label, not a model: it does not tell a reasoner what kind of thing it is, what it can and cannot be confused with, or which relations may attach to it. So conceptualization here means fixing, for every entity the competency questions demand — the target the antibody binds, the engineered CHO line, the banked vials, the production run, the released vial — its place under the BFO upper spine before anything else, because that placement is what makes the rest of the model checkable. The molecule that travels the whole length of the process is the one to get right first, and it is the one that fools you most.
Before you stock a warehouse you decide on aisles: this aisle is for tools (they persist, you reuse them), this one for stock (it flows through and ships out), this one for the logbook (information about the other two). Put a forklift in the stock aisle and inventory counts it as sellable goods — a category error that costs you. An ontology's "aisles" are BFO's top categories: a thing that persists (a bioreactor), a thing that flows through (a batch of broth), a happening (a two-week cell-culture run), a measurable feature (monomer purity), and information (the antibody's sequence, the product concept). This chapter assigns every entity of the campaign to its aisle, and the most important assignment in the whole book is keeping the vessel, the batch, and the run in three different ones.
Start from the questions
Conceptualization is still answerable to the brief. The classes named below exist to make specific competency questions about a real campaign askable. The continuant/occurrent split this chapter draws is what underwrites the structural CQ-21 — which vessel did a cell-culture run occur in? — a question that is only coherent if the run (an occurrent) and the vessel (equipment) are different entities joined by an edge, never one node wearing two types. It sets up CQ-20 (the host organism the line expresses its product in — for this program, Chinese hamster), the lineage questions CQ-01 and CQ-02 (which need a derivedFrom spine over materials so a drug-substance lot can be walked back to the working cell bank), and the disjointness guards CQ-23 tests (a Batch typed as a process, or as a bioreactor, must be caught). Every class below earns its place by pointing at one of these — and a class that serves no question does not get drawn.
The spine made local: six top categories, one occurrent
The first move is to make BFO's categories local — to declare the handful of top classes everything else hangs from, so the model never has to import all of BFO to say which aisle a thing lives in. Five are continuants (things that persist through time); one is the occurrent (a happening). These are the verbatim declarations the running example opens with:
# bioproc.ttl — the BFO spine, made local. Five continuants and one occurrent.
bp:Material a owl:Class ; rdfs:label "Bioprocess material" ;
skos:definition "A physical material that participates in or is produced by the process: cells, media, broth, harvest, pools, drug substance, drug product, excipients, packaging." .
bp:Equipment a owl:Class ; rdfs:label "Equipment" ;
skos:definition "A persisting piece of equipment or consumable apparatus; distinct from the material it processes — it survives across many batches." .
bp:Quality a owl:Class ; rdfs:label "Quality" ;
skos:definition "A measurable attribute that inheres in a material entity (purity, concentration, temperature, turbidity, integrity)." .
bp:RealizableEntity a owl:Class ; rdfs:label "Realizable entity" ;
skos:definition "A specifically dependent continuant — a role, disposition, or function — that inheres in a bearer and is realized in some process." .
bp:InformationArtifact a owl:Class ; rdfs:label "Information artifact" ;
skos:definition "Copyable information (a generically dependent continuant): a recipe, specification, record, certificate, model, identifier, or claim." .
bp:Process a owl:Class ; rdfs:label "Bioprocess process" ;
skos:definition "Something that happens and unfolds in time (a fermentation, a unit operation, an assay, a shipment) — an occurrent with temporal parts." .
This is the continuant/occurrent cut drawn at the very top, and it is the most consequential line in the model. The split is not philosophy for its own sake — it is what buys batch traceability. The fermentation is an occurrent: it happened once, over roughly two weeks, and is gone. The batch it produced is a continuant: a material that persists, gets sampled, gets a release verdict, and ships. Because the two are different aisles, the graph can say the same vessel hosted a different run last month, attach a purity quality to the batch but a duration to the run, and trace a lot back through every material that preceded it — none of which is possible if the run and the batch are one fuzzy node. Material, Equipment, Quality, RealizableEntity, and InformationArtifact are all continuants; Process is the occurrent; everything downstream is a subclass of exactly one of these six, so a reasoner can tell, of any entity, which aisle it lives in. The OWL 2 substrate that lets these declarations carry logical force is the standard RDF/OWL stack the vocabulary chapter introduced [1][2].
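The claim that a reasoner can tell, of any entity, which aisle it lives in can be sketched in a few lines of plain Python. This is an illustration only, not the book's validate.py: the `SUBCLASS_OF` map copies a handful of subclass edges from this chapter by hand, and the helper names `top_category` and `is_occurrent` are hypothetical.

```python
# Hypothetical child -> parent map mirroring a few rdfs:subClassOf edges
# from this chapter; the six spine classes are the roots.
SUBCLASS_OF = {
    "CellBank": "Material",
    "WorkingCellBank": "CellBank",
    "Batch": "Material",
    "Bioreactor": "Equipment",
    "ProductionBioreactor": "Bioreactor",
    "CellCultureProcess": "Process",
    "ProductConcept": "InformationArtifact",
    "Disposition": "RealizableEntity",
    "Developability": "Disposition",
}

CONTINUANTS = {"Material", "Equipment", "Quality",
               "RealizableEntity", "InformationArtifact"}
OCCURRENTS = {"Process"}
SPINE = CONTINUANTS | OCCURRENTS


def top_category(cls):
    """Follow subclass edges upward until one of the six spine classes is hit."""
    seen = set()
    while cls not in SPINE:
        if cls in seen or cls not in SUBCLASS_OF:
            raise ValueError(f"{cls!r} is not anchored under the spine")
        seen.add(cls)
        cls = SUBCLASS_OF[cls]
    return cls


def is_occurrent(cls):
    """True when the class sits under Process, the single occurrent."""
    return top_category(cls) in OCCURRENTS


print(top_category("ProductionBioreactor"))  # Equipment
print(is_occurrent("CellCultureProcess"))    # True
```

A class that cannot be walked up to one of the six roots raises an error, which is the toy version of "a class without a category is a label, not a model."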
These six categories are the shared currency of the OBO Foundry and the IOF. The Relation Ontology (RO) standardizes the verbs that join them (a run participates in nothing, but cells participate in it; a lot derives from its parent), and a relation declared once against Material works for every material the campaign ever names. That standardization is why the discovery side of the program (GO, PRO, OBI) and the manufacturing side (IOF) can meet at all, a seam this chapter returns to.
Middle-out modeling: name the obvious things, then reach up and down
You do not build a taxonomy strictly top-down (start from Material and refine forever) or strictly bottom-up (list every leaf and generalize later). The practical method, and the one this model follows, is middle-out: start from the entities the domain experts name without hesitation — Batch, DrugSubstance, CellLine, Bioreactor — then reach up to confirm each sits under the right top category, and down to add the distinctions a competency question needs.
The cell-bank tier is the cleanest example, and it shows why a manufacturing program cannot model the bank as one flat thing. A scientist names "the cell bank" without hesitation — but a manufacturing program does not hold one bank, it holds a tightly governed three-tier hierarchy, and the tiers exist for release reasons, not tidiness. A research cell bank (RCB) is the early aliquot; a master cell bank (MCB) is the regulatory anchor, made once and characterized exhaustively; a working cell bank (WCB) is drawn from the MCB and is what actually inoculates a campaign. Each tier sits at a different passage number, carries a different burden of testing, and answers a different release question. Middle-out modeling reaches down from "the cell bank" to those three tiers and up to anchor them all in Material:
# bioproc.ttl: cell-line materials, reached by middle-out modeling from "the cell bank".
bp:CellLine a owl:Class ; rdfs:subClassOf bp:Material ; rdfs:label "Cell line" ;
skos:definition "An engineered CHO production cell line expressing the product antibody; the population the cell banks are aliquoted from." .
bp:CellBank a owl:Class ; rdfs:subClassOf bp:Material ; rdfs:label "Cell bank" .
bp:ResearchCellBank a owl:Class ; rdfs:subClassOf bp:CellBank ; rdfs:label "Research cell bank (RCB)" .
bp:MasterCellBank a owl:Class ; rdfs:subClassOf bp:CellBank ; rdfs:label "Master cell bank (MCB)" .
bp:WorkingCellBank a owl:Class ; rdfs:subClassOf bp:CellBank ; rdfs:label "Working cell bank (WCB)" .
bp:HostOrganism a owl:Class ; rdfs:subClassOf bp:Material ; rdfs:label "Host organism" ;
skos:definition "The biological host of the cell line (Chinese hamster, Cricetulus griseus)." .
Making CellBank a superclass with three subtypes, rather than one flat class with a "tier" string, is what lets the impact and release CQs be class queries ("return every working cell bank within passage limit"), and it lets formalization state that the tiers are mutually exclusive: a vial is an RCB, an MCB, or a WCB, never two at once. The instances chapter carries this slice forward with that disjointness axiom and the passage-limit and characterization constraints. The detailed work of the conceptualization phase is exactly this: deciding, question by question, how far down to refine and where to stop. The RCB/MCB/WCB tiers are not optional good practice but the structure that regulatory guidance on cell substrates expects, and that expectation is the reason CHO production lines (sequenced, public genome, the dominant host for commercial antibodies) are banked this way at all.
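To make the class-query point concrete, here is a minimal sketch in plain Python of "return every working cell bank within passage limit" answered over subclass edges rather than a tier string. Everything except the class names and the identifier WCB-CHO-001 is an assumption made for illustration: the other vial identifiers, the passage numbers, and the limit of 30 are invented.

```python
# Subclass edges copied from the cell-bank declarations above.
SUBCLASS_OF = {
    "ResearchCellBank": "CellBank",
    "MasterCellBank": "CellBank",
    "WorkingCellBank": "CellBank",
    "CellBank": "Material",
}

# Hypothetical instances: (identifier, asserted class, passage number).
BANKS = [
    ("RCB-CHO-001", "ResearchCellBank", 3),
    ("MCB-CHO-001", "MasterCellBank", 8),
    ("WCB-CHO-001", "WorkingCellBank", 12),
    ("WCB-CHO-002", "WorkingCellBank", 41),
]


def is_a(cls, ancestor):
    """True if cls equals ancestor or reaches it by subclass edges."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def banks_of(cls, max_passage=None):
    """Identifiers of banks typed under cls, optionally within a passage limit."""
    return [
        name
        for name, asserted, passage in BANKS
        if is_a(asserted, cls) and (max_passage is None or passage <= max_passage)
    ]


print(banks_of("WorkingCellBank", max_passage=30))  # ['WCB-CHO-001']
print(banks_of("CellBank"))  # all four banks, whatever their tier
```

The same function answers a question about one tier or about all banks, because the tier is a class in the hierarchy and not a value that each query has to parse.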
Naming the first entities: the target and the product concept
Conceptualization begins at the program's first day, where the medicine is still an idea about a disease. The target — the disease-relevant protein the antibody will bind — and the product concept — what the program intends to make — are named here, and their category assignment is the lesson. The target is a bp:Material (it is a physical protein, even if named by reference to a Protein Ontology IRI so the program is wired into public biomedicine from its first entry), while the product concept is an InformationArtifact — a generically dependent continuant that exists on day one, before any cell makes the molecule [3]:
# bioproc.ttl — the first entities, each placed in its BFO aisle.
bp:Target a owl:Class ; rdfs:subClassOf bp:Material ; rdfs:label "Molecular target (antigen)" ;
skos:definition "The disease-relevant protein the antibody is designed to bind; named by a Protein Ontology IRI." .
bp:ProductConcept a owl:Class ; rdfs:subClassOf bp:InformationArtifact ; rdfs:label "Product concept" ;
skos:definition "The day-one information artifact naming what the program intends to make; the released drug product conformsTo it, closing the loop from idea to vial." .
bp:MonoclonalAntibodyProduct a owl:Class ; rdfs:subClassOf bp:ProductConcept ; rdfs:label "Monoclonal antibody product (concept)" .
Typing the product concept as information — not as a material that does not yet exist — solves a real program need rather than a philosophical one: on the day the program begins, no cell makes the antibody and no vial holds it, yet the team reasons about it constantly (its intended target, its planned indication, its desired properties). Categorizing the concept as a generically dependent continuant lets the graph hold that design intent before there is anything to point a sensor at, and lets the realized drug-product lot, years later, conformsTo it — closing an unbroken thread from idea to vial across discovery, cell-line, manufacturing, and release. The mechanism of action that links target to concept is modeled the same way: as relations (the antibody binds the target; the binding inhibits a molecular function), not a sentence in a slide, so a later question — which programs target this pathway? — is a query rather than a literature review. This is the BFO category buying a real modeling capability.
The molecule is three things at once
The lead molecule looks like one thing and is modeled as several, because a single antibody is genuinely three kinds of thing and a graph that fuses them loses its most important claims. There is its sequence: the amino-acid string, pure information you can copy and email, a generically dependent continuant in the same aisle as the product concept and the recipe. There is each batch of protein actually expressed from it: material entities that exist only once a cell makes them. And there is its modality (monoclonal antibody, bispecific, antibody-drug conjugate): a class the candidate instantiates, supplied by the Protein Ontology and modality vocabularies. In musical terms these are the score, the performances, and the genre. Confuse them and you cannot say "the same molecule was made twice." Keep them apart and the graph can state that two physically distinct lots are "the same molecule" precisely because they realize the same sequence and instantiate the same shared type:
# bioproc.ttl — the molecule is a Material; its developability is a branch of RealizableEntity.
bp:Antibody a owl:Class ; rdfs:subClassOf bp:Material ; rdfs:label "Antibody molecule" .
bp:CandidateMolecule a owl:Class ; rdfs:subClassOf bp:Antibody ; rdfs:label "Candidate antibody molecule" .
bp:Disposition a owl:Class ; rdfs:subClassOf bp:RealizableEntity ; rdfs:label "Disposition" ;
skos:definition "A realizable tendency of a material entity to behave a certain way under triggering conditions (a resin's tendency to bind antibody; a molecule's tendency to aggregate)." .
bp:Developability a owl:Class ; rdfs:subClassOf bp:Disposition ; rdfs:label "Developability disposition" ;
skos:definition "A molecule's manufacturability tendency assessed in discovery (aggregation propensity, thermal stability, expression titer, viscosity)." .
bp:AggregationPropensity a owl:Class ; rdfs:subClassOf bp:Developability ; rdfs:label "Aggregation propensity" .
The modality is not idle taxonomy: it decides which downstream process classes even apply. A bispecific implies steps and quality attributes a plain mAb does not, so typing the candidate by modality lets the graph reason about which manufacturing template fits — the reason CandidateMolecule sits under Antibody (and through it under the shared IgG class the alignment file borrows) rather than floating free.
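The sequence, lot, and modality split can be sketched with three small Python types. This is an illustration under stated assumptions and not the running example's model: the class names `AntibodySequence` and `ProteinLot`, the identifiers, and the placeholder residue string are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AntibodySequence:
    """Information: the amino-acid string, copyable without loss."""
    sequence_id: str
    residues: str


@dataclass(frozen=True)
class ProteinLot:
    """Material: protein that exists only once a cell has expressed it."""
    lot_id: str
    realizes: AntibodySequence
    modality: str  # the shared type the lot instantiates


def same_molecule(a, b):
    """Two lots count as the same molecule when sequence and modality match."""
    return a.realizes == b.realizes and a.modality == b.modality


seq = AntibodySequence("SEQ-HYPOTHETICAL-1", "EVQLVESGG")  # placeholder residues
lot_a = ProteinLot("LOT-A", seq, "monoclonal antibody")
lot_b = ProteinLot("LOT-B", seq, "monoclonal antibody")
print(lot_a != lot_b, same_molecule(lot_a, lot_b))  # True True
```

The two lots are unequal as materials yet pass `same_molecule`, which is the statement a fused node cannot express.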
The molecule's developability — its aggregation propensity, thermal stability, expression titer, viscosity — is the harder modeling call, and the categories pull it apart again. A candidate can bind the target perfectly and still be undevelopable: prone to aggregation, unstable, viscous at concentration, hard to express. The naive model records these as a column of numbers. The better model, and the one BFO makes available, treats each as a disposition — a realizable entity, a real tendency the molecule bears whether or not it is currently measured — distinct from the assay result that is the evidence for it [4]. "This candidate aggregates" is a claim about the molecule's nature; "the assay read a high-molecular-weight fraction" is one measurement supporting it. Placing Developability under RealizableEntity rather than Quality is what lets the graph accumulate multiple lines of evidence for one underlying property, flag when two assays disagree, and — the payoff — carry an aggregation tendency forward as a manufacturability risk into process development, where it becomes a specific aggregation CQA to control and a specific release test in QC. Model the disposition and that forward thread is automatic; model it as a number in a report and the manufacturing team re-derives from scratch what discovery already knew.
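A rough Python sketch of the difference between a disposition and the assay results that are evidence for it follows. The structure and names (`Disposition`, `AssayResult`, `assays_disagree`) are assumptions made for illustration, as are the candidate identifier and the two assay labels.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AssayResult:
    assay: str                # illustrative assay label
    indicates_tendency: bool  # does this reading support the disposition?


@dataclass
class Disposition:
    """A tendency the molecule bears, kept separate from any one measurement."""
    bearer: str
    kind: str
    evidence: list = field(default_factory=list)

    def add_evidence(self, result):
        self.evidence.append(result)

    def assays_disagree(self):
        """True when the recorded readings point in different directions."""
        return len({r.indicates_tendency for r in self.evidence}) > 1


risk = Disposition(bearer="CANDIDATE-HYPOTHETICAL-7", kind="AggregationPropensity")
risk.add_evidence(AssayResult("size-exclusion chromatography", True))
risk.add_evidence(AssayResult("dynamic light scattering", False))
print(len(risk.evidence), risk.assays_disagree())  # 2 True
```

Because the disposition is its own object, a second assay adds evidence to it without overwriting the first, and a disagreement is something the model can report.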
The selection itself is worth a category. Discovery is a sieve — a program generates many candidates, measures them, and narrows to one lead — and that screening campaign is an occurrent: a happening with candidate participants, assays, and ranking criteria. Typing it as a bp:Process means the graph records not just that this lead was chosen but which candidates it beat, on which assays, by which criteria, so when a regulator asks "why this molecule?" the answer is a reconstructible subgraph rather than a recollection. The losing candidates stay linked as participants; the decision stays defensible years later.
The conceptualization that matters most, drawn: a persisting vessel (equipment), a batch (material), and a phased cell-culture run (occurrent) are three different entities in three different BFO aisles — the run occursIn the vessel and hasOutput the batch, joined by edges, never fused into one node.
Original diagram by the authors, created with AI assistance.
The sharp cut: a vessel, a batch, and a run are three things
This is the chapter's hardest and most valuable distinction, and it is the one a careless model quietly breaks by fusing three entities into one fuzzy node — at the most data-rich step in the whole campaign. The bioreactor BR-101 is Equipment: it persists across campaigns and ran a different batch last month. The batch BATCH-2026-001 is a Material: the broth, then the antibody-bearing culture, which exists only for this run. The cell-culture process CCP-001 is an occurrent: the two-week happening that occursIn the vessel and hasOutput the batch, and which is itself split into a growth phase (cells multiply) and a production phase (cells make antibody), each obeying different parameter ranges from the design space. Three BFO categories, named as three branches of the taxonomy:
# bioproc.ttl — three categories where it is tempting to model one.
bp:Batch a owl:Class ; rdfs:subClassOf bp:Material ; rdfs:label "Bioreactor batch" ;
skos:definition "The material produced by one cell-culture run; exists only for that run (distinct from the vessel that held it)." .
bp:Bioreactor a owl:Class ; rdfs:subClassOf bp:Equipment ; rdfs:label "Bioreactor" .
bp:ProductionBioreactor a owl:Class ; rdfs:subClassOf bp:Bioreactor ; rdfs:label "Production bioreactor" .
bp:CellCultureProcess a owl:Class ; rdfs:subClassOf bp:Process ; rdfs:label "Cell-culture process" .
Keeping these apart is what lets the graph state that the same BR-101 vessel ran a different batch last month — a new material, a new process, the same equipment — without any contradiction, and it is what makes CQ-21 ("which vessel did this run occur in?") a coherent question rather than a tautology. The run-to-vessel fact rides on an explicit occursIn edge from the process to the equipment; it is never a second rdf:type smuggled onto the batch. This is also where the design space's parameters finally have a real run to attach to: the type-level fact that feed rate affects monomer purity becomes, in this run, a setpoint held within a phase — the design space said what should be true, the run records what was. And because phases are sub-processes, not labels, a feed rate critical during production but irrelevant during growth attaches to the phase it governs, not to the run as an undifferentiated whole. The standard ISA-88 batch model every control system already follows draws this same vessel-batch-process line implicitly [2] — the ontology just makes it explicit and checkable.
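As a sketch of why CQ-21 becomes an ordinary lookup once the three entities are separate nodes, the following plain-Python fragment stores the run, vessel, and batch as triples. It is not the book's SPARQL or its instance data: the earlier run CCP-000 and its batch BATCH-2025-012 are invented to show vessel reuse, and the helper names are hypothetical.

```python
# (subject, predicate, object) triples. CCP-001, BR-101 and BATCH-2026-001
# come from the chapter; CCP-000 and BATCH-2025-012 are hypothetical.
TRIPLES = {
    ("CCP-001", "rdf:type", "CellCultureProcess"),
    ("CCP-001", "occursIn", "BR-101"),
    ("CCP-001", "hasOutput", "BATCH-2026-001"),
    ("BR-101", "rdf:type", "Bioreactor"),
    ("BATCH-2026-001", "rdf:type", "Batch"),
    ("CCP-000", "rdf:type", "CellCultureProcess"),
    ("CCP-000", "occursIn", "BR-101"),
    ("CCP-000", "hasOutput", "BATCH-2025-012"),
    ("BATCH-2025-012", "rdf:type", "Batch"),
}


def objects(subject, predicate):
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def subjects(predicate, obj):
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == obj)


def vessel_of(run):
    """CQ-21 in miniature: which vessel did this run occur in?"""
    return objects(run, "occursIn")


def batches_hosted_by(vessel):
    """Every batch output by any run that occurred in the vessel."""
    return sorted(
        batch
        for run in subjects("occursIn", vessel)
        for batch in objects(run, "hasOutput")
    )


print(vessel_of("CCP-001"))         # ['BR-101']
print(batches_hosted_by("BR-101"))  # ['BATCH-2025-012', 'BATCH-2026-001']
```

The vessel appears once, with two runs pointing at it, and each batch keeps exactly one rdf:type.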
A second value of the cut is integration honesty. When two source systems describe one run — the MES batch register calls it a batch, a genealogy loader notes its parent "ran in a bioreactor" — a naive union types the one node as both a Batch and a Bioreactor, fusing the material with the vessel. Because the taxonomy keeps Batch (Material) and Bioreactor (Equipment) in disjoint aisles, that conflation is a flagged error, and the faithful resolution is forced: send the genealogy's "bioreactor" to the separate vessel BR-101, with the run occursIn it, and record with provenance that each rdf:type claim came from a different source. Conflict reconciled, no node wearing two incompatible categories. (The torrent of PAT sensor data the run produces is indexed in the graph and stored in the historian, never flattened into triples — a discipline the relations chapter draws with the hasTrace edge.)
The branches kept disjoint
A taxonomy of well-named classes is still only a taxonomy until the categories are declared mutually exclusive where BFO says they must be. The running example states the continuant/occurrent and material/equipment disciplines as disjointness axioms over the top categories — the formal teeth that turn "a batch is not a run" from advice into an enforced rule:
# bioproc.ttl — the spine's disjointness: nothing is two top categories at once.
bp:Material owl:disjointWith bp:Process , bp:Equipment , bp:Quality , bp:RealizableEntity , bp:InformationArtifact .
bp:Process owl:disjointWith bp:Quality , bp:RealizableEntity , bp:InformationArtifact , bp:Equipment .
bp:Quality owl:disjointWith bp:RealizableEntity , bp:InformationArtifact .
The full force of these axioms — how a reasoner uses them to catch a Batch typed also as a Bioreactor, and why the runnable pipeline needs a SHACL guard alongside them — belongs to the next phase, formalization. Conceptualization's job is to draw the disjoint branches correctly; formalization gives them their logical bite. (The class-level bp:Batch owl:disjointWith bp:CellCultureProcess axiom is also declared, pinning the specific conflation this chapter warns against, and the three cell-bank tiers are declared pairwise disjoint so a vial cannot be both a master and a working bank.)
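Before formalization takes over, the intent of the disjointness axioms can be previewed with a small Python check. This is a sketch of the idea and neither a reasoner nor the SHACL guard: the `DISJOINT_WITH` table copies the three axioms above, while `ancestors` and `category_clashes` are hypothetical helper names.

```python
from itertools import combinations

# The three owl:disjointWith statements above, copied as data.
DISJOINT_WITH = {
    "Material": ["Process", "Equipment", "Quality",
                 "RealizableEntity", "InformationArtifact"],
    "Process": ["Quality", "RealizableEntity", "InformationArtifact", "Equipment"],
    "Quality": ["RealizableEntity", "InformationArtifact"],
}
DISJOINT_PAIRS = {
    frozenset((a, b)) for a, others in DISJOINT_WITH.items() for b in others
}

SUBCLASS_OF = {
    "Batch": "Material",
    "Bioreactor": "Equipment",
    "CellCultureProcess": "Process",
}


def ancestors(cls):
    """The class itself plus everything above it by subclass edges."""
    found = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.add(cls)
    return found


def category_clashes(asserted_types):
    """Pairs of disjoint top categories implied by one node's asserted types."""
    inferred = set()
    for cls in asserted_types:
        inferred |= ancestors(cls)
    return [
        (a, b)
        for a, b in combinations(sorted(inferred), 2)
        if frozenset((a, b)) in DISJOINT_PAIRS
    ]


print(category_clashes(["Batch", "Bioreactor"]))          # [('Equipment', 'Material')]
print(category_clashes(["Batch", "CellCultureProcess"]))  # [('Material', 'Process')]
print(category_clashes(["Batch"]))                        # []
```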
Anatomy of a single campaign batch, all six aisles at work
The payoff of the taxonomy is visible in one slice of the campaign, where every BFO category does a job at once. The batch material BATCH-2026-001 is a Material; the vessel BR-101 that held it is Equipment; the cell-culture run that linked them is a Process (an occurrent); the monomer purity measured on the lot downstream is a Quality; the production-reactor role the vessel bears for this one campaign is a RealizableEntity; and the master batch record that prescribed the run, like the recipe and the antibody's sequence, is an InformationArtifact. Six entities, six aisles, joined by relations — and because each is categorized, the graph can ask of the same batch which vessel hosted it, which parent it derived from, what purity it carried, and which recipe it realized, all without any of those facts colliding. That is the single capability the whole conceptualization phase exists to buy.
Evaluation: does the taxonomy hold its categories?
The conceptualization is validated the SAMOD way — by loading the vocabulary and the campaign's instances and confirming the categories survive contact with the data. validate.py parses the model and reasons over it; the structural checks that exercise this chapter's splits report the categories are kept clean:
# Real output from validate.py (owlrl OWL-RL closure + SHACL over the running example).
[1] parsed 2120 triples (bioproc + align + instances)
[2] reasoned: 2120 -> 7137 triples after OWL-RL closure
[3] competency questions (ORSD v1.0.0 acceptance tests):
CQ GROUP RESULT DETAIL
----- -------------- ----- ----------------------------------------
CQ-01 lineage PASS 11 row(s)
CQ-22 structural PASS transitive lineage + equipment-is-material inferred
CQ-23 structural PASS Batch-as-process and Batch-as-bioreactor both caught
The two planted conflations (a Batch typed as a process, and a Batch typed also as a bioreactor) are both caught, which is CQ-23 passing: the vessel-batch-run cut this chapter drew is not just documented but enforced. (Because the OWL-RL closure step here materializes inferred triples and does not by itself fail the run on an owl:disjointWith clash, the runnable catch is the SHACL guard; a full DL reasoner such as HermiT, or ELK within the OWL 2 EL profile, enforces the disjointness axioms directly [5].) The long-range derivedFrom inference confirms the Material branch is wired into one transitive lineage spine, eleven ancestors deep, from the drug substance back to the research cell bank. That spine is the precondition for CQ-01 and CQ-02.
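The transitive lineage walk behind CQ-01 can be sketched without a reasoner. The chain below is a deliberately shortened, hypothetical stand-in for the real eleven-ancestor spine: DS-001, BATCH-2026-001 and WCB-CHO-001 appear in the chapter, while the MCB and RCB identifiers and the direct links between these nodes are assumptions.

```python
from collections import deque

# child -> direct parents along derivedFrom (shortened, partly hypothetical).
DERIVED_FROM = {
    "DS-001": ["BATCH-2026-001"],
    "BATCH-2026-001": ["WCB-CHO-001"],
    "WCB-CHO-001": ["MCB-CHO-001"],
    "MCB-CHO-001": ["RCB-CHO-001"],
}


def lineage(start):
    """All ancestors of start, nearest first, following derivedFrom transitively."""
    seen, ordered, queue = set(), [], deque([start])
    while queue:
        node = queue.popleft()
        for parent in DERIVED_FROM.get(node, []):
            if parent not in seen:
                seen.add(parent)
                ordered.append(parent)
                queue.append(parent)
    return ordered


print(lineage("DS-001"))
# ['BATCH-2026-001', 'WCB-CHO-001', 'MCB-CHO-001', 'RCB-CHO-001']
```

A reasoner reaches the same ancestors by declaring derivedFrom transitive; the breadth-first walk just makes the closure visible.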
The cell bank's category buys safety-critical genealogy
The hardest entity to categorize is the one with the most at stake: the cell bank is a living thing, and the taxonomy is what lets the graph treat it as a stable manufacturing root despite that. A WCB is typed as a Material so its vials can be derivedFrom the MCB and can hand cells forward to the seed train; but the choices the taxonomy forces (a host organism, a creating transfection, and a burden of characterization) are what make the root trustworthy. Two of those relations are deliberately functional, which deduplicates provenance: a functional property allows a bank at most one clone and at most one transfection, so two source systems both asserting createdBy cannot mint two parents. A reasoner instead infers that the two asserted values name the same individual, collapsing duplicate provenance assertions into one fact. The transfection itself is an occurrent (bp:Transfection rdfs:subClassOf bp:Process), with the genetic construct as input and the engineered line as output: the event that created the line, modeled as a happening, not a static attribute.
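What a functional property does to duplicate provenance can be sketched in Python as follows. This imitates the inference only in outline, and every specific is an assumption: the transfection identifiers, the source-system names, and the helper `infer_same_as` are hypothetical.

```python
from collections import defaultdict

# Hypothetical createdBy assertions: (bank, transfection, source system).
ASSERTIONS = [
    ("WCB-CHO-001", "TFX-001", "MES"),
    ("WCB-CHO-001", "TFX-001-LIMS", "LIMS"),
    ("WCB-CHO-001", "TFX-001", "genealogy loader"),
]


def infer_same_as(assertions):
    """For a functional property, distinct values on one subject must co-refer.

    Returns the set of (a, b) pairs a reasoner would conclude are the same
    individual, instead of treating them as two separate parents.
    """
    values_by_subject = defaultdict(set)
    for subject, value, _source in assertions:
        values_by_subject[subject].add(value)
    same = set()
    for values in values_by_subject.values():
        ordered = sorted(values)
        for other in ordered[1:]:
            same.add((ordered[0], other))
    return same


print(infer_same_as(ASSERTIONS))  # {('TFX-001', 'TFX-001-LIMS')}
```

If the two identifiers had been declared different individuals, the same inference would surface a contradiction and not a merge, which is the stricter reading of "cannot mint two parents."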
The category also lets formalization make characterization mandatory. A working cell bank is constrained, open-world, to bear at least one characterization result, and the runnable SHACL gate enforces the same condition closed-world — a bank with no characterization recorded is "incomplete," which for a manufacturing root is a hard failure, not a charitable "unknown":
# shapes.ttl — the cell-bank gate (closed-world): a WCB must be characterized and have a passage count.
bp:CellBankShape a sh:NodeShape ;
sh:targetClass bp:WorkingCellBank ;
sh:property [
sh:path bp:hasCharacterization ;
sh:minCount 1 ;
sh:message "A working cell bank must carry at least one characterization result." ] ;
sh:property [
sh:path bp:passageNumber ;
sh:minCount 1 ; sh:maxCount 1 ; sh:datatype xsd:integer ;
sh:message "A working cell bank must record exactly one passage count." ] .
The four characterizations WCB-CHO-001 actually carries are not generic completeness fields; each prevents a specific catastrophe. Identity/authentication guards against the misidentification that has cross-contaminated cell lines across the life sciences for decades — and for a manufacturing root node, misidentification is the worst possible error, because it propagates transitively through every derivedFrom edge to every downstream batch with full confidence, and no downstream integrity check can catch it. Sterility/mycoplasma, adventitious-agent/viral safety, and genetic stability each close a different hole that regulatory guidance on cell substrates requires closed. The passage number, meanwhile, is bounded by a validated passage limit because passage count tracks culture duration, productivity drift, and product-quality shift: a culture at passage 60 is not obviously "the same" as the one at passage 5, and owl:sameAs cannot answer whether a living, mutating population is still itself after N generations. The taxonomy gives the bank a stable IRI; it cannot make the biology stand still, and naming that limit honestly is part of the model.
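The closed-world reading of the cell-bank gate can be paraphrased in Python to show what "incomplete is a failure" means in practice. This is a hand-written approximation and not a SHACL engine: the function name and the dictionary layout are assumptions, and a real validator would report the count and datatype constraints as separate results.

```python
def check_working_cell_bank(bank):
    """Return gate messages for one working cell bank; an empty list means it passes.

    `bank` maps property names to lists of values, a hypothetical stand-in
    for the triples attached to the bank.
    """
    messages = []
    if len(bank.get("hasCharacterization", [])) < 1:
        messages.append(
            "A working cell bank must carry at least one characterization result."
        )
    passages = bank.get("passageNumber", [])
    is_integer = all(
        isinstance(p, int) and not isinstance(p, bool) for p in passages
    )
    if len(passages) != 1 or not is_integer:
        messages.append("A working cell bank must record exactly one passage count.")
    return messages


complete = {
    "hasCharacterization": ["identity", "sterility", "viral safety", "genetic stability"],
    "passageNumber": [12],  # hypothetical passage count
}
print(check_working_cell_bank(complete))      # []
print(len(check_working_cell_bank({})))       # 2
```

A bank with nothing recorded fails both checks. Under the open-world OWL reading the same bank would simply be one whose characterization is not yet known.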
Anatomy of one class definition
To see naming as a craft rather than a list, take bp:Batch apart field by field — every line a real declaration, every line buying something. The rdfs:label ("Bioreactor batch") is the human-readable name, distinct from the IRI a machine uses. The rdfs:subClassOf bp:Material places it in the material aisle and, through that, under IOF Core and BFO. The skos:definition records why the class exists and what makes it distinct — here, that it "exists only for that run," the very phrase that keeps it apart from the persisting vessel. The disjointness it inherits from Material is what guards the category error. And the release attributes a lot carries — releaseStatus, monomerPct — are scoped to Material, not to Batch, on purpose: it is the released drug-substance and drug-product lots, also materials, that actually carry a release verdict and a monomer result, so scoping these to Batch would have wrongly typed every lot as a batch. This is the same identity-card discipline the series applies to a data point, now applied to the schema: a class travels with its alignment, its category, and its definition, so the meaning is in the model, not in a modeler's memory.
One class, fully unpacked: a label for humans, a subclass chain that places it in the material aisle and aligns it upward, a disjointness that guards a category error, the properties it bears, and the annotation that records why it exists — the conceptualization captured per class.
Original diagram by the authors, created with AI assistance.
The forward fork and the bulk-to-discrete shift
The taxonomy also has to span the moment the product changes kind. The drug substance DS-001 is one material; from it two distinct drug-product lots, DP-001 and DP-002, are filled — a forward fork in the genealogy, the mirror image of the backward pooling fork upstream. Because both lots are Material typed and joined to DS-001 by derivedFrom, the fork is traversable structure: when one lot goes out of specification, a query walks up to the shared substance and back down to its siblings, computing shared-fate impact rather than guessing it. The campaign's deliberately out-of-spec sibling makes this concrete — DP-004 is OOS on HMW aggregate (above the release limit) while still in spec on monomer — and the impact walk from it climbs derivedFrom to the cell bank the whole campaign shares. The release attributes the categories support are exactly the threads that gate disposition: monomer purity, HMW aggregate, fill volume, sterility, appearance — each a Quality on a lot, each a reason a vial does or does not ship.
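The up-then-down impact walk over the forward fork can be sketched in a few lines of Python. The map below is a simplified assumption: it links DP-001 and DP-002 directly to DS-001 and DS-001 directly to the batch, omitting any intermediate pools, and the helper names are hypothetical.

```python
# child -> parent along derivedFrom (simplified; intermediate pools omitted).
DERIVED_FROM = {
    "DP-001": "DS-001",
    "DP-002": "DS-001",
    "DS-001": "BATCH-2026-001",
}


def children_of(parent):
    """Every material derived directly from the given parent."""
    return sorted(child for child, p in DERIVED_FROM.items() if p == parent)


def shared_fate(lot):
    """Walk one step up derivedFrom, then back down to the lot's siblings."""
    parent = DERIVED_FROM.get(lot)
    if parent is None:
        return []
    return [child for child in children_of(parent) if child != lot]


print(shared_fate("DP-001"))  # ['DP-002']
```

When one lot fails a release test, the siblings returned here are the lots that share its drug substance and therefore need review.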
The fill step is also where the modeled kind shifts from bulk to discrete: upstream the material is measured by concentration and titer; the drug product is, in the end, a counted population of vials. The container-closure system becomes part of product identity, its integrity a quality on which sterility and stability depend, and excipients enter as components aligned to ChEBI. The taxonomy carries the lot-versus-item individuation tension forward — when does a "lot" stop being the unit of release and a serialized vial become the thing tracked? — but defers it to serialization, designing the classes wide enough that the future answer fits without re-categorizing.
The unsolved part: where to stop refining, and the seam you must own
Conceptualization has no natural floor. Should Batch have subclasses for fed-batch versus perfusion? Should Disposition enumerate every developability property, or stop at four? Every extra distinction is a class a reasoner must hold and a maintainer must keep true, and a taxonomy that refines without discipline becomes the boundaryless model the ORSD was written to prevent. The honest standard is not "is the taxonomy complete?" — no model of a living, mutating CHO culture ever is — but "does each class earn its place by serving a competency question, and does the model stop there?"
A second honest difficulty is the seam this chapter straddles. The classes that best describe the target and the molecule — the Gene Ontology, the Protein Ontology, OBI — grew up in the OBO Foundry, the biomedical world; the classes that best describe making the antibody — IOF Core, the IOF biopharma ontologies — grew up in the Industrial Ontologies Foundry. Both descend from BFO, which is exactly why a Target (a PRO protein) and a Batch (an IOF material) can live in one taxonomy; but "can meet" is not "have met." There is no single settled bridge that says how an OBI assay relates to an IOF measurement process, which is why the running example keeps ProductConcept an information artifact and parks the OBO–IOF boundary between the research terms and the making terms rather than through the concept itself. That seam is the program's to author, review, and maintain — not a feature it imports — and it is where this book's two ontology worlds shake hands. Both judgments are made by people, against the brief, which is why conceptualization remains a reviewed, governed activity rather than a generated artifact.
Why it matters
The category you assign an entity is the most leveraged decision in the whole model, because every later axiom, every relation's domain and range, and every disjointness guard is stated in terms of these six top classes. Place the batch, the vessel, and the run in three aisles, and the graph can reason about equipment reuse, run lineage, and per-batch quality all at once; fuse them into one node, and the most data-rich step in manufacturing becomes the least trustworthy node in the graph. Model the molecule as a sequence, a material, a modality, and a set of evidenced dispositions, and a discovery aggregation risk travels forward as a controlled CQA; flatten it to a name at the tech-transfer wall, and the manufacturing team re-derives what discovery already knew. Conceptualization is cheap to do well on day one and ruinous to retrofit, because re-categorizing a class after instances and axioms depend on it touches everything downstream.
In the real world
Naming entities under a shared upper ontology is established practice in research informatics and a maturing one in manufacturing. The BFO spine and the OBO Foundry's coordinated ontologies give a program ready-made categories for biological entities [3], and the developability-aware modeling of candidate molecules mirrors how antibody discovery actually reasons about manufacturability risk [4]. The RCB/MCB/WCB hierarchy and its characterization burden are how every commercial CHO antibody line is actually banked, driven by regulatory expectations on cell substrates rather than by ontology taste. What is still uneven is the discipline of drawing the manufacturing taxonomy — the vessel, the batch, the run, the pools, the lots — under the same spine as the research entities, so the category cut this chapter draws is honored across a plant's many source systems rather than re-broken at every integration. The vessel-batch-process distinction is implicit in the ISA-88 model every control system follows [2]; making it an explicit, checkable taxonomy is the value the ontology adds.
Key terms
- Continuant / occurrent — the load-bearing cut: a continuant persists whole through time (a vessel, a batch, a quality, a sequence); an occurrent happens and has temporal parts (a cell-culture run, a screening campaign, a transfection). The top of the taxonomy, and what buys batch traceability.
- Top category — one of the six local spine classes: Material, Equipment, Quality, RealizableEntity, InformationArtifact (continuants) and Process (the occurrent); every class is a subclass of exactly one.
- Sequence / substance / modality — the antibody as three things at once: its amino-acid sequence (information), each protein batch made from it (material), and its kind (a class that decides which manufacturing template applies).
- Vessel / batch / run — three entities it is tempting to model as one: Bioreactor (Equipment, persists), Batch (Material, exists only for the run), CellCultureProcess (the occurrent); joined by occursIn and hasOutput, never by a shared rdf:type.
- Cell-bank tiers (RCB/MCB/WCB) — the three-tier CellBank hierarchy a campaign holds; the working bank is the genealogy's root, constrained to carry passage count and four characterizations (identity, sterility, viral safety, genetic stability).
- Disposition — a RealizableEntity: a real tendency a material bears (a molecule's developability) that is realized in a process and measured by an assay, distinct from the value that is its evidence — the thread that carries a manufacturability risk forward.
- Information artifact — a generically dependent continuant (the product concept, the sequence, the recipe, a result); copyable information that can exist before — or about — a material that does not yet exist.
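The "subclass of exactly one top category" rule in the terms above can be checked without a reasoner. The sketch below is illustrative only: it uses plain Python dictionaries rather than OWL, the edge table holds just the placements this chapter states, and the fused class `BioreactorRun` is a hypothetical counter-example, not a class in the model.

```python
# The six local spine classes named in the key terms.
TOP_CATEGORIES = frozenset({
    "Material", "Equipment", "Quality",
    "RealizableEntity", "InformationArtifact", "Process",
})

# Subclass edges (child -> direct parents), limited to placements stated in the text.
SUBCLASS_OF = {
    "Bioreactor": {"Equipment"},
    "Batch": {"Material"},
    "CellCultureProcess": {"Process"},
    "Disposition": {"RealizableEntity"},
    "ProductConcept": {"InformationArtifact"},
}


def top_categories_of(cls, edges):
    """Collect every top category reachable from cls by following subclass edges."""
    found, seen, stack = set(), set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in TOP_CATEGORIES:
            found.add(current)
        stack.extend(edges.get(current, ()))
    return found


def category_violations(edges):
    """Map each class that is not under exactly one top category to what it reached."""
    violations = {}
    for cls in edges:
        reached = top_categories_of(cls, edges)
        if len(reached) != 1:
            violations[cls] = reached
    return violations


# Hypothetical fused node: vessel, batch, and run collapsed into one class.
fused = dict(SUBCLASS_OF, BioreactorRun={"Equipment", "Material", "Process"})

print(category_violations(SUBCLASS_OF))  # {}
print(sorted(category_violations(fused)["BioreactorRun"]))
```

The clean table passes, and the fused class is reported as reaching three categories at once. In an OWL model the same guard would normally be expressed as disjointness axioms between the top classes and left to a reasoner; this check is only a lightweight stand-in for that.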
Where this leads
The entities are named and placed in their aisles, the branches drawn disjoint. But a taxonomy of well-categorized classes still cannot do anything until the threads between them are declared. The next chapter, Conceptualization: Relations and the Genealogy Spine, wires the classes together — the transitive derivedFrom lineage that makes a drug substance walkable back to its working cell bank, the occursIn edge that keeps the run tied to its vessel without re-fusing them, and the participation and quality relations that turn six disconnected aisles into one navigable graph.