The Upper Spine: Continuants, Occurrents, and Why Everyone Builds on BFO

📍 Where we are: Part II · Reuse — the first move of standing on what already exists. The preface promised a model built step by step. We start at the very top of that model, with the small set of categories every later term hangs from.

The preface made a promise that sounds almost too easy: build one shared model and every system can read every notebook as one story. But there is a trap hiding in it. If we let the cell-culture team build their ontology and the lab build theirs and the warehouse build theirs, we have not escaped the heterogeneity problem — we have promoted it one level up. A biologist's "process" and an engineer's "process" will drift apart exactly the way BR-101 and Lot 26-001 did, only now the disagreement is buried in the structure of the model itself, where it is far harder to see and fix.

The cure is to agree, before anyone writes a domain term, on what the most general kinds of thing even are. That agreement is an upper ontology, and this chapter is about the one most of science and industry has settled on.

The simple version

Before you can write a dictionary, you need parts of speech. "Noun," "verb," and "adjective" are not in any particular dictionary — they are the categories every dictionary's entries fall into, which is exactly why dictionaries written by different people are still compatible. An upper ontology is parts of speech for reality: a tiny, domain-neutral set of categories — things that persist, things that happen, qualities, roles — that every bioprocess term, however specialized, slots into. Agree on those first, and a Bioreactor built by one team and a CellCultureProcess built by another cannot help but fit together.

What this chapter covers

We meet the Basic Formal Ontology (BFO) — the standardized upper ontology — and its one load-bearing distinction between continuants (things that persist) and occurrents (things that happen). We classify the everyday furniture of a bioprocess onto that spine, dissect a single batch to see all of BFO's categories at work at once, and then descend one level to the IOF Core mid-level that makes the abstraction usable for manufacturing. We close on the honest limit: a shared spine guarantees compatibility of structure, not agreement of choice.

What an upper ontology is, and why bioprocess needs one

An ontology, in this book's sense, is an explicit, machine-readable vocabulary of the kinds of thing in a domain and how they relate. An upper (or foundational) ontology is a small, domain-neutral vocabulary of the most general categories that everything falls under — independent of biology, of chemistry, of manufacturing [1]. It deliberately contains no Bioreactor and no Antibody. What it contains is the answer to "what kind of thing is a bioreactor, fundamentally?" — an object that persists — versus "what kind of thing is a fermentation?" — a process that unfolds. Build every domain ontology on the same upper categories and they become reusable and combinable by construction, the way the data-management book first argued; this chapter shows the spine itself.

The upper ontology that science and engineering have largely converged on is BFO, the Basic Formal Ontology. It is not a hobby project or one vendor's invention: it is published as an international standard, ISO/IEC 21838-2, which certifies BFO as a conformant top-level ontology [1], and its design rationale is laid out at book length by its authors [2]. When the Industrial Ontologies Foundry and the biopharma working groups chose a top, they chose this one — so learning BFO is learning the grammar the manufacturing ontologies you will actually import are written in.

The one distinction that does the work: continuants and occurrents

BFO's core move is to split everything in the world into two families, and almost every modeling decision in this book traces back to which side of the line a thing sits on [2].

A continuant is a thing that persists through time as a whole, keeping its identity while it gains and loses parts. A cell, a bioreactor, a vial, a batch of drug substance, the purity of that batch, the role a vessel plays as "the production reactor" — all continuants. They exist in full at every instant they exist at all. If you point at one now and again in an hour, you are pointing at the same whole thing.

An occurrent is a thing that happens — it unfolds in time and is never present all at once, because it has temporal parts. A cell-culture run, a Protein A capture step, the act of taking a sample, the whole manufacturing campaign — all occurrents. You cannot point at a cell-culture run the way you point at a tank; at any instant you catch only a slice of it.

The split sounds abstract until you notice how often people model the wrong one. A "batch" is a continuant — the material — but the making of that batch is an occurrent. Conflate them and your graph cannot say that one bioreactor (a persisting object) hosted three different fermentations (three distinct happenings) last month, because you have only one fuzzy "batch" idea doing both jobs. Keeping the two apart is what lets the model speak precisely about equipment, material, and activity all at once.
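A minimal sketch of that separation, in plain Python rather than RDF. The class names, the day numbers, and the second and third runs (CCP-002, CCP-003) with their batches are hypothetical, invented only to show one persisting vessel hosting three distinct happenings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vessel:            # continuant: persists across many runs
    tag: str


@dataclass(frozen=True)
class BatchMaterial:     # continuant: the material itself
    lot: str


@dataclass(frozen=True)
class CultureRun:        # occurrent: unfolds between a start and an end
    run_id: str
    occurs_in: Vessel
    output: BatchMaterial
    start_day: int
    end_day: int


br101 = Vessel("BR-101")
runs = [
    CultureRun("CCP-001", br101, BatchMaterial("BATCH-2026-001"), 0, 14),
    CultureRun("CCP-002", br101, BatchMaterial("BATCH-2026-002"), 16, 30),  # hypothetical
    CultureRun("CCP-003", br101, BatchMaterial("BATCH-2026-003"), 32, 46),  # hypothetical
]


def runs_hosted_by(vessel, all_runs):
    """Return the ids of every run (occurrent) that occurred in this vessel."""
    return [r.run_id for r in all_runs if r.occurs_in == vessel]


hosted = runs_hosted_by(br101, runs)
```

Because the vessel, the material, and the run are three different types, "which runs did BR-101 host?" is a one-line question; with a single fused "batch" record there would be nothing to filter on.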

Inside the continuants: objects, qualities, and realizables

Continuants subdivide in a way bioprocess uses constantly [2]. An independent continuant is a thing that exists on its own — a material entity like the bioreactor, the cell, the drug-substance lot. A specifically dependent continuant is a thing that exists only by inhering in something else: a quality such as the 98.611 % monomer purity (the fraction of the product that is intact, single-molecule antibody rather than clumped aggregate — a higher number is better), which cannot float free but must be the purity of some lot; or a realizable entity — a role (a vessel playing the role of production reactor in this campaign, a part it could stop playing), a disposition (a Protein A resin's affinity for the antibody's Fc region — a tendency it has whether or not any antibody is present), or a function (the designed purpose of a viral filter). (The mechanics of each of these process exemplars — how Protein A capture binds the antibody, how a viral filter clears virus — are the subject of Book 1's capture-chromatography and viral-filtration chapters; here they are only illustrations of the categories.) A generically dependent continuant is the odd, important one: information that can be copied across carriers — the recipe, the master batch record, the specification, the certificate of analysis. The spec is not the paper it is printed on and not the screen it shows on; it is the content, which BFO lets us model as a thing in its own right that those carriers bear.

That three-way cut — object, quality, information — is the difference between a model that can and cannot answer real questions. "What is the purity?" asks for a quality inhering in a lot. "What does the spec require?" asks about a generically dependent continuant. "What vessel was it?" asks about an independent continuant playing a role. One word, "batch," cannot carry all three; BFO's categories can.

And in a regulated plant that quality is not just any quality. SEC (size-exclusion chromatography) %monomer is the product's critical quality attribute (CQA) for size purity, where a CQA is an attribute whose value must stay within limits to assure the product's quality, safety, and efficacy. The line that decides whether a lot ships (is released to market), SEC monomer at least 95.0 %, is a requirement specification: a generically dependent continuant that the lot's value is checked against. So one familiar fact, "the batch is 98.611 % monomer," already splits three ways on the spine: a quality (the measured purity inhering in the lot), a generically dependent continuant (the acceptance criterion in the spec), and the occurrent of the release test that produced the number. The running example carries exactly this: bp:MonomerPct-CQA constrained by bp:AC-monomer (bp:specLow 95.0), with 98.611 the value inhering in bp:DS-001. (The prefix before the colon in each name, bp: for our local bioprocess vocabulary and obo: and iof: for the shared ones introduced below, is just shorthand for the first part of a long web address, the full IRI; the same rule decodes every later bp:/obo:/iof: token.) The release-gate chapter turns that same line into the SHACL constraint that completes the quality-to-spec-to-gate arc.
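The quality-versus-criterion split can be sketched in Python as two separate records and one comparison. This is only a stand-in for the SHACL constraint the release-gate chapter builds; the class and function names are assumptions, while the identifiers and numbers are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCriterion:   # information: the line in the spec
    name: str
    spec_low: float          # lower limit, in percent monomer


@dataclass(frozen=True)
class MeasuredQuality:       # quality: the value inhering in one lot
    bearer: str
    value: float             # percent monomer


def meets(quality, criterion):
    """True when the measured value is at least the lower spec limit."""
    return quality.value >= criterion.spec_low


ac_monomer = AcceptanceCriterion("bp:AC-monomer", 95.0)
monomer = MeasuredQuality("bp:DS-001", 98.611)
passes = meets(monomer, ac_monomer)
```

Keeping the criterion and the measurement as different objects is what allows the same criterion to be reused across lots, and the same lot to be checked against a revised criterion later.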

These categories are not vague labels — each is a published BFO 2020 term with a stable IRI (Internationalized Resource Identifier — a globally unique web address that names the term), and the alignment file pins every local class to one. Material entity is obo:BFO_0000040, quality is obo:BFO_0000019, and process is obo:BFO_0000015. The realizable family is three sibling terms under realizable entity (obo:BFO_0000017): role (obo:BFO_0000023), disposition (obo:BFO_0000016), and function (obo:BFO_0000034) — which is why a resin's binding tendency, a vessel's production-reactor role, and a viral filter's designed purpose each land in a different box rather than blurring into one "property" field. Those opaque numeric IRIs are the point: a label like "role" drifts between languages and teams, but obo:BFO_0000023 is the same identifier in every BFO-grounded ontology on earth, so two systems that both align to it agree on meaning without ever having compared their English.

The six categories this chapter leans on, with the published BFO 2020 IRI each local class pins to and the bioprocess exemplar that lands in it:

BFO category	IRI	Side of the line	Bioprocess exemplar
material entity	`obo:BFO_0000040`	continuant (independent)	the batch material; the bioreactor vessel
quality	`obo:BFO_0000019`	continuant (specifically dependent)	the monomer purity of the lot
role	`obo:BFO_0000023`	continuant (realizable)	the vessel's production-reactor role
disposition	`obo:BFO_0000016`	continuant (realizable)	the resin's antibody-binding tendency
function	`obo:BFO_0000034`	continuant (realizable)	the viral filter's designed purpose
process	`obo:BFO_0000015`	occurrent	the cell-culture run

Every later chapter that says "a quality (obo:BFO_0000019)" is pointing back at this table.
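How a prefixed name decodes into its full IRI can be shown with a short Python sketch. The two namespace addresses are the ones declared in this chapter's alignment file; the function name expand is an assumption for illustration.

```python
PREFIXES = {
    "bp": "https://example.org/bioproc#",
    "obo": "http://purl.obolibrary.org/obo/",
}


def expand(curie):
    """Turn a prefixed name such as 'obo:BFO_0000023' into its full IRI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


role_iri = expand("obo:BFO_0000023")
```

Two systems that both store the expanded form agree on the term by string equality alone, whatever label each one shows its users.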

The BFO spine with bioprocess furniture hung on it: things that persist (objects, the qualities that inhere in them, the roles they play, the information about them) on the left; things that happen (the processes that make and purify) on the right. Original diagram by the authors, created with AI assistance.

Anatomy of one batch, placed on the spine

The payoff of an upper ontology is best seen by taking one familiar thing apart and watching every BFO category show up inside it. We use the batch for this — rather than, say, the vial or the chromatography column — because it is the one entity in the whole campaign that is quietly all six categories at once: object, separate-object equipment, process, quality, role, and information all collide in the single word "batch," which is exactly why conflating them is the most common modeling mistake on the floor. Take BATCH-2026-001 — not as a single record, but as the cluster of distinct entities a careful model sees there.

The batch material itself — the cells and broth and, later, the purified protein — is a material entity, an independent continuant. The production bioreactor that held it is also a material entity, but a different one: equipment persists across many batches, so the model must not fuse the vessel into the material. The cell-culture run — the days of growth and production — is an occurrent, a process, with the batch material and the vessel both participating in it. The 98.611 % monomer purity is a quality, a specifically dependent continuant that inheres in the drug-substance lot and nowhere else. The vessel's standing as "this campaign's production reactor" is a role, a realizable entity it bears for the duration and could shed. And the master batch record that prescribed the whole thing is a generically dependent continuant — information, copyable, distinct from any printout.

One batch is many BFO entities at once: material (the broth and the vessel, kept separate), a process (the cell-culture run they participate in), a quality (the purity that inheres in the lot), a role (what the vessel is being used as), and information (the record that prescribed it). Original diagram by the authors, created with AI assistance.

Notice what the discipline bought us. Because the vessel and the material are separate material entities, the graph can later say a different batch ran in the same vessel without contradiction. Because the cell-culture run is an occurrent that both participate in, "when did this happen?" has somewhere to attach. Because purity is a quality of the lot, it cannot be accidentally asserted of the empty tank. The upper ontology did not add facts; it gave every fact the right kind of home, which is what keeps the model honest as it grows.

There is a subtlety worth making explicit, because it is where OWL and SHACL divide the labour. (OWL, the Web Ontology Language, is what the categories and axioms are written in; SHACL is a companion language for data-validation rules; a reasoner is the program that draws logical conclusions from the axioms. The classes-and-taxonomy and release-gate chapters introduce them in full; here we only need that they exist.) Saying a Batch is a continuant and a CellCultureProcess an occurrent is not just commentary: the model writes it as an axiom, bp:Batch owl:disjointWith bp:CellCultureProcess. Under a description-logic reasoner (HermiT, ELK) that axiom makes any graph that types one node as both classes logically inconsistent, and the reasoner reports the inconsistency instead of classifying the graph. The running example's offline validator, however, runs an OWL-RL closure (owlrl), which is a forward-chaining materialization rather than a consistency check that rejects data. The OWL 2 RL rule set does include a disjointness rule (cax-dw), but a rule engine at most records the violation alongside the inferred triples; it does not stop the load. So to actually catch a planted batch-typed-as-a-run at validation time, the project adds a closed-world SHACL guard (bp:BatchNotProcessShape in shapes.ttl) that fails loudly. This is the practical division every adopter eventually meets: OWL disjointness is checked by a DL reasoner for consistency; SHACL enforces the same firewall as a data-validation rule. Competency question CQ-23 exists precisely to prove the guards fire, a thread the release-gate and competency-questions chapters pick up.
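The kind of closed-world check such a guard performs can be sketched in a few lines of Python. This is not the project's bp:BatchNotProcessShape, only an illustration of the idea over a list of triples; the node bp:BAD-001 is a hypothetical planted error, and the sketch looks at asserted types only, without following subclass edges.

```python
RDF_TYPE = "rdf:type"


def typed_as_both(triples, class_a, class_b):
    """Return every node that is asserted to be an instance of both classes."""
    types = {}
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            types.setdefault(subject, set()).add(obj)
    return sorted(s for s, ts in types.items() if class_a in ts and class_b in ts)


graph = [
    ("bp:BATCH-2026-001", RDF_TYPE, "bp:Batch"),
    ("bp:CCP-001", RDF_TYPE, "bp:CellCultureProcess"),
    ("bp:BAD-001", RDF_TYPE, "bp:Batch"),                # planted error (hypothetical)
    ("bp:BAD-001", RDF_TYPE, "bp:CellCultureProcess"),
]

violations = typed_as_both(graph, "bp:Batch", "bp:CellCultureProcess")
```

A validator built this way fails the load whenever violations is non-empty, which is the "fails loudly" behaviour the text asks of the guard.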

Written as triples in the running example, that one batch fans out into the distinct BFO kinds: a material entity, a separate material entity for the vessel, the role it bears, and the occurrent that outputs the batch and occurs in the vessel. Each line below is an RDF triple (subject, then predicate, which is the relationship, then object) written in Turtle notation, where the keyword a means "is a," a semicolon (;) continues with another statement about the same subject, and ^^xsd:float tags a value as a floating-point number; the inline comments explain the rest:

# instances.ttl — one batch, its vessel, and its run, each a different BFO kind.
@prefix bp:  <https://example.org/bioproc#> .         # same local namespace as align.ttl
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
bp:BATCH-2026-001 a bp:Batch ;                    # a material entity (independent continuant), typed once
    bp:derivedFrom bp:SEED-001 ;
    bp:participatesIn bp:CCP-001 ;                # ...that participates in an occurrent
    bp:monomerPct "98.611"^^xsd:float .           # ...and records a quality VALUE (a datatype-property shortcut)
bp:BR-101  a bp:ProductionBioreactor ;            # the vessel — a SEPARATE material entity
    bp:hasRole bp:BR-101-role .                   # ...bearing a realizable role
bp:BR-101-role a bp:ProductionReactorRole .       # a role (realizable entity)
bp:CCP-001 a bp:CellCultureProcess ;              # the run — an occurrent (process)
    bp:occursIn  bp:BR-101 ;                      # ...that occurs in the vessel
    bp:hasOutput bp:BATCH-2026-001 .              # ...and outputs the batch material

One honest seam to flag before we climb: the body above calls the 98.611 % purity a quality that inheres in the lot, but the running graph records it as a datatype-property literal (bp:monomerPct), not yet a free-standing quality individual the lot bears. That is the pragmatic shortcut almost every working ontology takes — a literal is cheaper to store and query than a reified entity — and it is exactly the kind of how deep do we model? judgment the formalization chapter makes deliberately, where the fully reified quality is shown alongside the shortcut. The category is right; the encoding is a choice.
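The two encodings can be put side by side in a small Python sketch. The shortcut triple mirrors the listing above; the reified form is only a guess at the shape, and the names bp:hasQuality, bp:hasValue, and bp:BATCH-2026-001-monomer are hypothetical, not the terms the formalization chapter uses.

```python
LOT = "bp:BATCH-2026-001"

# Shortcut: the value hangs directly off the lot as a literal.
shortcut = [
    (LOT, "bp:monomerPct", 98.611),
]

# Reified: the quality is its own node, which the lot bears (hypothetical names).
reified = [
    (LOT, "bp:hasQuality", "bp:BATCH-2026-001-monomer"),
    ("bp:BATCH-2026-001-monomer", "rdf:type", "bp:Quality"),
    ("bp:BATCH-2026-001-monomer", "bp:hasValue", 98.611),
]


def purity_shortcut(triples, lot):
    """One hop: lot to literal."""
    for s, p, o in triples:
        if s == lot and p == "bp:monomerPct":
            return o
    return None


def purity_reified(triples, lot):
    """Two hops: lot to quality node, quality node to literal."""
    quality_nodes = [o for s, p, o in triples if s == lot and p == "bp:hasQuality"]
    for s, p, o in triples:
        if s in quality_nodes and p == "bp:hasValue":
            return o
    return None
```

Both functions return the same number; the reified form costs an extra hop and buys a node that a unit, a method, or a timestamp could later attach to.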

The same three-way cut is what lets the plant data systems load cleanly. A historian tag stream off BR-101 — temperature, pH, dissolved oxygen sampled every few seconds — is not the run and not the vessel; it is a quality trace inhering in a continuant (the companion models it as bp:Trace-BR101-Temp a bp:Quality). The MES batch record that prescribed the run is a generically dependent continuant, copyable across the executed-record screen and the archived PDF. And the run itself, bp:CCP-001, is the occurrent the timestamps actually belong to. When the OPC UA and historian loaders in from-the-wire-to-the-graph and the open-source knowledge-graph chapter turn wire data into RDF, this is the spine they hang it on: traces become qualities, records become information artifacts, and time attaches to the process — not to the tank. That same wire-to-graph mapping is the ISA-95 and OPC UA information model re-expressed on a foundational spine: ISA-95's equipment hierarchy lands on the independent continuant branch, its process segments on the occurrent branch, and a B2MML batch record on the generically dependent continuant branch, which is exactly the semantic-interoperability bridge the data-management book argues turns four standards into one queryable graph rather than four silos.

Why a learning model wants the spine underneath it

The same discipline that keeps the graph honest is what makes it usable as the ground truth a machine-learning model is grounded against — the thread the ML/AI book opens on and this book closes on in Ontologies as the ground truth for AI. A model learns the shape of an answer; it does not learn what is true of BATCH-2026-001. Put the right kind of home under every fact and three things a learning system needs fall out of the spine for free.

First, the continuant/occurrent split is the honest validation fold. Because the bp:derivedFrom genealogy edge is a typed relation on the spine (a material entity to the parent material it came from), a (bp:derivedFrom)+ walk back to the shared cell bank recovers exactly which lots are not independent observations — sibling batches off one WCB-CHO-001 are near-twins, not separate draws. That walk is the group key for a leave-one-batch-out (or leave-one-cell-bank-out) split, the discipline Book 5's data chapter makes the default and the models-and-validation chapter turns into GroupKFold and nested cross-validation. The graph that traces a lot's lineage and the graph that defines an honest train/test boundary are the same graph; getting the spine wrong — fusing the vessel and the material, so a "batch" silently spans three reuses of one tank — is how a row-wise split sneaks past review and a model reports a fantasy score.
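A sketch of how the genealogy walk becomes a group key, in plain Python. Only the edge from BATCH-2026-001 to SEED-001 and the bank WCB-CHO-001 come from the running example; the hop from seed to bank, the sibling batches, and the second bank WCB-CHO-002 are assumptions added so the split has something to separate. The sketch also assumes each material has a single parent.

```python
# child -> parent, one bp:derivedFrom edge each (single-parent assumption)
derived_from = {
    "BATCH-2026-001": "SEED-001",
    "SEED-001": "WCB-CHO-001",       # assumed intermediate hop
    "BATCH-2026-002": "SEED-002",    # hypothetical sibling batch
    "SEED-002": "WCB-CHO-001",
    "BATCH-2026-003": "SEED-003",    # hypothetical batch from another bank
    "SEED-003": "WCB-CHO-002",
}


def root_of(node, parent):
    """Follow derivedFrom edges until a node with no parent is reached."""
    seen = set()
    while node in parent:
        if node in seen:
            raise ValueError(f"cycle in genealogy at {node}")
        seen.add(node)
        node = parent[node]
    return node


def leave_one_group_out(items, group_keys):
    """Yield (held_out_group, train_items, test_items) for each distinct group."""
    for held in sorted(set(group_keys)):
        train = [i for i, g in zip(items, group_keys) if g != held]
        test = [i for i, g in zip(items, group_keys) if g == held]
        yield held, train, test


lots = ["BATCH-2026-001", "BATCH-2026-002", "BATCH-2026-003"]
groups = [root_of(lot, derived_from) for lot in lots]
folds = list(leave_one_group_out(lots, groups))
```

The groups list is the kind of array a grouped splitter such as scikit-learn's GroupKFold takes as its groups argument, so sibling lots off one bank never land on both sides of a fold.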

Second, the quality category is the typed, unit-bearing feature a model can reason over instead of a bare float. A historian tag that arrives as BR101.Feed.PV = 0.40 is semantically mute; the same reading landed on the spine as a feed-rate quality inhering in the production phase of BATCH-2026-001, carrying its unit and its lineage, is a feature a model can actually use to ask whether a feed deviation explains an aggregate excursion. The classification is what makes the feature retrievable, which is the whole point of the AI-readiness work the frontier survey catalogs.
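A rough Python sketch of the difference between the bare reading and the typed one. The mapping table, the field names, the bearer string, and in particular the unit "L/h" are assumptions for illustration; the source gives only the tag name and the value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityReading:
    kind: str         # what quality this is
    value: float
    unit: str         # assumed unit, not stated in the source
    bearer: str       # what the quality inheres in
    source_tag: str   # lineage back to the historian


# Hypothetical tag-to-quality mapping a loader might maintain.
TAG_MAP = {
    "BR101.Feed.PV": ("feed-rate", "L/h", "BATCH-2026-001/production-phase"),
}


def lift(tag, value):
    """Attach kind, unit, and bearer to a bare historian reading."""
    kind, unit, bearer = TAG_MAP[tag]
    return QualityReading(kind, value, unit, bearer, tag)


reading = lift("BR101.Feed.PV", 0.40)
```

A model or query can now select readings by kind and bearer instead of pattern-matching tag strings.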

Third — and this is the validation paradox worth naming — a reasoned graph and a learned model check different things, and a serious pipeline needs both. The SHACL release gate (bp:BatchNotProcessShape and the spec shapes the release-gate chapter builds) certifies that a subgraph is complete and well-typed — every lot has its bp:derivedFrom parent, every CQA its value — before that subgraph is ever handed to a model or a retrieval-augmented LLM. A graph that fails its shapes is the hollow, mislabeled input a fluent model will cheerfully complete from training memory; SHACL is how you catch it before the model does, confidently and at scale. The model supplies fluency over the facts; only the graph, grounded on this spine, supplies the facts — and only the spine makes the graph mean what it claims, the structure-versus-substance distinction restated as ground truth.
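The completeness half of that gate can be sketched in Python as a required-property check. It is a simplification of what SHACL shapes express, not the book's shapes; the second lot, BATCH-2026-002, is hypothetical and deliberately missing its purity value.

```python
def completeness_violations(triples, lots, required):
    """List every (lot, property) pair where a required property is absent."""
    present = {(s, p) for s, p, _ in triples}
    return sorted(
        (lot, prop) for lot in lots for prop in required if (lot, prop) not in present
    )


graph = [
    ("bp:BATCH-2026-001", "bp:derivedFrom", "bp:SEED-001"),
    ("bp:BATCH-2026-001", "bp:monomerPct", 98.611),
    ("bp:BATCH-2026-002", "bp:derivedFrom", "bp:SEED-002"),  # hypothetical lot, no purity
]

lots = ["bp:BATCH-2026-001", "bp:BATCH-2026-002"]
required = ["bp:derivedFrom", "bp:monomerPct"]

violations = completeness_violations(graph, lots, required)
ready_for_model = not violations
```

Only a subgraph with no violations would be passed on; here the missing value is reported by name rather than left for a model to fill in.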

From spine to plant: the IOF Core mid-level

BFO is deliberately too abstract to model with directly — it has no Equipment and no Material, only object and process. The bridge from that thin top to a usable manufacturing vocabulary is a mid-level ontology, and for industry that is the IOF Core, published by the Industrial Ontologies Foundry as a BFO-grounded set of concepts every manufacturing domain can share — equipment, materials, processes, capabilities, and the relations among them [5]. IOF, in turn, was modeled explicitly on the OBO Foundry, the life-sciences community that proved coordinated, principle-based ontology building works at scale: dozens of biomedical ontologies that interlock instead of overlap because they share an upper grounding and design rules [3].

So the stack has three rungs, and the knowledge-graph chapter of the open-source book climbs all of them: BFO at the top (what kind of thing), IOF Core in the middle (generic manufacturing), and a biopharma domain ontology at the bottom (Bioreactor, MasterRecipe, ChromatographyColumn). Our local bp: namespace is the fourth rung — a tiny site-specific vocabulary that aligns up to the domain ontology rather than reinventing it. Each rung specializes the one above, so under a reasoner a bp:Batch inherits its IOF type (an IOF material artifact) and its BFO type (a material entity, an independent continuant), with all the compatibility that guarantees. There is no IOF class literally named "material entity" — that term is BFO's; the IOF rung contributes material artifact, manufacturing process, piece of equipment, and the like.

Four rungs, each specializing the one above: a neutral upper ontology, an industrial mid-level modeled on the OBO Foundry, the biopharma domain, and the local vocabulary that aligns up to it — so a local class inherits compatibility for free. Original diagram by the authors, created with AI assistance.

Those four rungs are not a metaphor — they are literally what the alignment file asserts. Each local bp: class declares itself a subclass of a real, published upper term, so a bp:Batch asserts it is an IOF material artifact and a BFO material entity. A caveat on provenance: these IRIs were verified, but not all from one place — the BFO and OBO terms against the EBI Ontology Lookup Service (OLS4), and the IOF terms against the published IOF release on GitHub, because OLS does not host IOF. Each IRI is live and dereferenceable:

# align.ttl — local vocabulary aligned UP to BFO 2020 + IOF Core + IOF biopharma (verified IRIs).
@prefix bp:  <https://example.org/bioproc#> .
@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix iof: <https://spec.industrialontologies.org/ontology/construct/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

bp:Material              rdfs:subClassOf obo:BFO_0000040 .          # BFO 'material entity'
bp:Process               rdfs:subClassOf obo:BFO_0000015 .          # BFO 'process'
bp:CellCultureProcess    rdfs:subClassOf iof:ManufacturingProcess . # IOF Core mid-level
bp:Batch                 rdfs:subClassOf iof:MaterialArtifact .      # IOF Core
bp:CaptureChromatography rdfs:subClassOf iof:CaptureStep .           # IOF biopharma unit op
bp:CellLine              rdfs:subClassOf iof:CellLine .              # IOF biopharma material
bp:Quality               rdfs:subClassOf obo:BFO_0000019 .          # BFO 'quality'
bp:InformationArtifact   rdfs:subClassOf iof:InformationContentEntity .

Note the relation used: every edge is rdfs:subClassOf, never owl:equivalentClass or owl:sameAs. A bp:Batch is a specialization of an IOF material artifact, not a synonym for it — asserting equivalence would over-commit, forcing the reasoner to conclude that every IOF material artifact anywhere is one of our batches. Subsumption is the honest, weaker claim, and it is all the interoperability we need: a downstream IOF-aware tool lines up on the shared superclass without our local leaf leaking back up into its world.
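The over-commitment can be made concrete with a tiny Python sketch of type inference. It treats an equivalence as a pair of subclass edges, one in each direction, which is what owl:equivalentClass entails; the "foreign" individual stands for some other site's IOF material artifact and is hypothetical.

```python
def inferred_types(asserted, sub_class_of):
    """Close a set of asserted types under subClassOf edges."""
    types, stack = set(asserted), list(asserted)
    while stack:
        for parent in sub_class_of.get(stack.pop(), ()):
            if parent not in types:
                types.add(parent)
                stack.append(parent)
    return types


# What align.ttl asserts: one edge, pointing up.
subsumption = {"bp:Batch": ["iof:MaterialArtifact"]}

# What an equivalence would entail: edges in both directions.
equivalence = {
    "bp:Batch": ["iof:MaterialArtifact"],
    "iof:MaterialArtifact": ["bp:Batch"],
}

foreign = {"iof:MaterialArtifact"}   # another site's artifact (hypothetical)
under_subsumption = inferred_types(foreign, subsumption)
under_equivalence = inferred_types(foreign, equivalence)
```

Under subsumption the foreign artifact stays what it was; under equivalence it is also inferred to be a bp:Batch, which is the leak the paragraph above warns about.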

How each alignment line was written. The two edges above — bp:Batch rdfs:subClassOf iof:MaterialArtifact and bp:CellCultureProcess rdfs:subClassOf iof:ManufacturingProcess — were not guessed; each is the output of a short, repeatable placement procedure, the Reuse-stage analogue of Part I's competency-question recipe. Worked end to end on bp:Batch:

1. Ask the one question that splits the spine: does it persist, or does it happen? A batch of broth is there in full at every instant you point at it; it does not unfold. So it is a continuant, not an occurrent. This is the first fork, and the one where most placement errors happen.
2. If continuant, ask which of the three dependence kinds applies. Does it exist on its own (independent), only by inhering in something else (specifically dependent), or as copyable information (generically dependent)? The batch exists on its own, so it is an independent continuant, a material entity; its purity would inhere in it (specifically dependent) and its batch record would be copyable (generically dependent). bp:CellCultureProcess exits this fork early: it happens, so it is an occurrent (a process) and the dependence question never arises.
3. Pin the placement to a published IRI, not an English label. Read the BFO 2020 term off the table above (material entity is obo:BFO_0000040) and confirm it is live (the BFO/OBO terms against OLS4, the IOF terms against the GitHub release, since OLS does not host IOF).
4. Assert rdfs:subClassOf up the BFO → IOF → domain → bp stack at the most specific rung that already exists. Rather than wiring bp:Batch straight to BFO, attach it to the mid-level term that already carries the BFO grounding, iof:MaterialArtifact (which IOF itself places under BFO material entity), so one edge buys the whole chain on import. Where the most specific rung is missing (no iof:Specification), drop to the nearest real term (iof:RequirementSpecification) and flag the gap rather than minting a synonym.

A placement that survives those four asks lands in exactly one box and inherits everything above it for free; the same procedure run on bp:CellCultureProcess answers "it happens" at the first ask, skips the dependence question, and pins to iof:ManufacturingProcess.
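The first two asks of the procedure reduce to a small decision function, sketched here in Python. The function name and the string answers are assumptions; the two IRIs are the ones given in the category table earlier in the chapter, and the remaining asks (pinning and asserting) are left to the alignment file.

```python
def place(persists, dependence=None):
    """Return the BFO branch for an entity, given the answers to the first two asks."""
    if not persists:
        # It happens: an occurrent, and the dependence question never arises.
        return "occurrent: process (obo:BFO_0000015)"
    branches = {
        "independent": "independent continuant: material entity (obo:BFO_0000040)",
        "specific": "specifically dependent continuant (quality or realizable entity)",
        "generic": "generically dependent continuant (information)",
    }
    if dependence not in branches:
        raise ValueError("a continuant needs dependence = independent, specific, or generic")
    return branches[dependence]


batch_branch = place(persists=True, dependence="independent")
run_branch = place(persists=False)
record_branch = place(persists=True, dependence="generic")
```

Forcing the caller to answer the dependence question for every continuant is the point: the function refuses to place anything that has not been through both forks.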

There is a gap worth naming, because it is where most adoptions trip. Asserting rdfs:subClassOf iof:MaterialArtifact gives a real, shared IRI another team's IOF-aware tool can line up with — but on its own it does not make a reasoner conclude that a bp:Batch is a BFO material entity, because the chain from iof:MaterialArtifact up to BFO lives inside the IOF ontology, which the alignment file does not load. To turn the assertion into an inference, you owl:imports IOF Core — which itself imports BFO 2020 — and the reasoner then classifies your terms through the whole stack and checks them for consistency against it. Because BFO (and the OBO/IOF stack it grounds) ships under an open license, importing and redistributing it carries no licensing cost — which is why "standing on what already exists" is actually free; the reuse survey weighs license and import cost as an explicit selection criterion.
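The difference between asserting and inferring can be sketched in Python by computing superclasses with and without the imported axioms. The single edge from iof:MaterialArtifact to obo:BFO_0000040 is a simplification assumed for the sketch; in the published ontology the chain may pass through intermediate classes.

```python
def superclasses(cls, sub_class_of):
    """Every class reachable by following subClassOf edges upward."""
    found, stack = set(), [cls]
    while stack:
        for parent in sub_class_of.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# What the alignment file holds on its own.
local_axioms = {"bp:Batch": ["iof:MaterialArtifact"]}

# What arrives with owl:imports (simplified to one edge, see lead-in).
imported_axioms = {"iof:MaterialArtifact": ["obo:BFO_0000040"]}

without_import = superclasses("bp:Batch", local_axioms)
with_import = superclasses("bp:Batch", {**local_axioms, **imported_axioms})
```

Without the import the walk stops at the shared IOF term; with it, the BFO grounding is reached through the same local edge.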

IOF is also more than Core: it ships domain modules, and the biopharma module turns out to be substantial. Audited directly against the published release (Release_202602), it defines 171 classes across the biopharma modules — all marked Released, not provisional — including 44 unit-operation classes and 17 Quality-by-Design (QbD)-parameter classes (plus equipment, material, and recipe terms), of which the running example reuses 20: the 27 distinct IOF classes it consumes are 7 Core + 20 biopharma, recorded in align.ttl. That same audit is the single source of truth the standards chapter and the-ontologies-in-use cite, so the three chapters stay numerically consistent. So the running example reuses real, verified IOF terms across the whole interior rather than re-minting them: iof:Bioreactor, iof:ChromatographyColumn, and iof:MasterRecipe for the equipment and recipe; iof:CaptureStep, iof:ViralClearance, iof:ViralInactivation, iof:ViralFiltration, iof:PolishingProcess, and iof:DrugProductFormulationProcess for the unit operations; iof:CellLine and iof:ClonedCellLine for the line and the clone it descends from; and iof:QualityAttribute, iof:ProcessParameter, and the normal-operating-range (NOR) and proven-acceptable-range (PAR) expressions — the operating windows QbD defines around each process parameter — for the QbD scaffolding.

Where IOF genuinely has no settled term, the alignment stays honest about the gap rather than papering over it: there is no bare iof:Specification (so bp:Specification aligns to iof:RequirementSpecification instead), and there is no fill-finish, aseptic-fill, or lyophilization class at all — so bp:FillFinishProcess stays a flagged local class. The running example does both halves — its align.ttl reuses the IOF terms it could confirm, and a companion bioproc-imports.ttl carries the real owl:imports — while keeping the offline validator scoped to what it can prove without fetching the external stack. The honest one-line summary: the alignment buys shared vocabulary for free and cross-ontology reasoning only once you import.

The relations are part of the spine too

An upper ontology standardizes not only the kinds of thing but the relations between them, and getting those right matters just as much. The biomedical community wrote them down early in the Relation Ontology (RO), arguing that relations like part of, participates in, has participant, is about, and derives from must be defined as carefully as classes, or two ontologies will use "part of" to mean two different things [4]. BFO supplies the backbone relations: a continuant participates in an occurrent (the batch participates in the cell-culture run), a quality inheres in a continuant (the purity inheres in the lot), a role is realized in a process. Each of those backbone relations is itself a real RO or BFO term the alignment pins to, not loose English: participates in is obo:RO_0000056, has participant its inverse obo:RO_0000057, has output obo:RO_0002234, inheres in the BFO 2020 relation obo:BFO_0000197, realized in obo:BFO_0000054, and occurs in obo:BFO_0000066. Our workhorse derivedFrom, the genealogy edge from the preface, is a domain relation that sits cleanly on this backbone, relating one material entity to the parent material it came from. It is declared rdfs:subPropertyOf obo:RO_0001000 ('derives from') and, because origination is genuinely chainable, owl:TransitiveProperty (if A derives from B and B from C, then A derives from C), so a reasoner can infer every ancestor from the direct edges. A query can make the same walk over the asserted edges alone: in SPARQL (the query language for RDF graphs) the property path (bp:derivedFrom)+, where + means "follow this edge one or more hops," recovers a drug-substance lot's whole material genealogy back to the cell bank (the frozen stock of cells the whole campaign descends from). That is the query behind competency question CQ-01 and the genealogy chapter. Declaring the relation once, with a clear definition, is what stops derivedFrom, madeFrom, and comesFrom from multiplying into three half-synonyms nobody can query across.
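What "one or more hops" means can be sketched with a breadth-first walk in Python. The chain below is hypothetical apart from the edge from BATCH-2026-001 to SEED-001 and the node names themselves; the real genealogy in the running example may contain different intermediate materials.

```python
# child -> list of direct parents (direct bp:derivedFrom edges only)
edges = {
    "DS-001": ["CLAR-001"],           # assumed
    "CLAR-001": ["BATCH-2026-001"],   # assumed
    "BATCH-2026-001": ["SEED-001"],
    "SEED-001": ["WCB-CHO-001"],      # assumed
}


def derived_from_plus(start, direct_edges):
    """Everything reachable from start in one or more hops, nearest ancestors first."""
    reached = []
    frontier = list(direct_edges.get(start, []))
    while frontier:
        node = frontier.pop(0)
        if node in reached:
            continue              # already visited; also guards against cycles
        reached.append(node)
        frontier.extend(direct_edges.get(node, []))
    return reached


genealogy = derived_from_plus("DS-001", edges)
```

The walk needs only the direct edges, which is why a single well-defined relation matters more than any amount of precomputed closure: three half-synonyms would each hold a fragment of the chain.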

The unsolved part: a shared spine is necessary, not sufficient

It is tempting to read all this as: adopt BFO and the meaning problem is solved. It is not, and the gap is worth naming exactly. A shared upper ontology guarantees that two independently built ontologies are structurally compatible — that both agree a process is an occurrent and a purity is a quality. It does not guarantee that two modelers facing the same plant make the same choices: one may model "harvest" as a process, another as the material that results from it, and both can be BFO-conformant while still failing to line up. Compatibility of structure is not agreement of content. The running example shows exactly this fork in miniature: this book models bp:CLAR-001, the clarified harvest, as a material entity (the clarified broth that flows on to capture) produced by a separate bp:HARV-001 process — but a perfectly BFO-conformant colleague could model "harvest" as the process alone and attach no material at all, and both graphs would pass every consistency check while refusing to line up at the seam. The spine forbids the impossible; it does not legislate the merely different.

Worse, the abstraction has a real cost that the field argues about openly. Classifying every entity correctly onto the spine takes genuine ontological expertise: the continuant/occurrent line is sharp in theory and slippery in practice. Is a "batch" the material, the record, or the run? The honest answer is "three different entities," which is more work than most projects budget for. Over-modeling is its own failure: a graph where every trivial thing is decomposed into roles and dispositions becomes unusable. So the upper spine is a foundation, not a finished building. It rules out whole categories of incompatibility and silent error, which is a lot, but the discipline of choosing the same classes, and of stopping the modeling at a useful depth, remains a human practice this book returns to in model governance and in the final verdict. BFO tells you what kinds of thing exist; it cannot tell you that your colleague modeled the harvest the way you did.

Why it matters

Every later chapter in this book names new entities (a target, a design space, a chromatography pool, a release specification), and every one of them will be placed on this spine. The reason to do that work is leverage: a term anchored to BFO is structurally compatible with every other BFO-grounded ontology without a bespoke adapter for the upper-level categories, which is the interoperability promise of the FAIR principles made structural. Skip the spine and you get a vocabulary that works inside one project and nowhere else: the private-dialect trap, rebuilt in more expensive materials.

In the real world

BFO's reach is not aspirational. It is the upper ontology under a large share of the OBO Foundry's biomedical ontologies, it is an ISO/IEC standard, and it is the grounding the Industrial Ontologies Foundry chose for manufacturing — including the biopharma ontologies aimed at monoclonal-antibody lines like the one this book models [1][5]. In practice you rarely hand-classify entities against raw BFO; you import a domain ontology that already did it, and you author your local terms in an editor like Protégé that can check your alignment. The spine is mostly invisible in daily use, exactly as parts of speech are invisible while you write — present in every sentence, noticed only when something is grammatically wrong.

Key terms

Upper (foundational) ontology — a small, domain-neutral vocabulary of the most general categories, on which domain ontologies build so they stay compatible.
Basic Formal Ontology (BFO) — the standardized upper ontology (ISO/IEC 21838-2) used across science and industry.
Continuant — an entity that persists through time as a whole, present in full at every instant it exists (a cell, a vessel, a lot, a purity, a role).
Occurrent — an entity that happens and unfolds in time, with temporal parts (a fermentation, a capture step, a campaign).
Independent continuant / material entity — a thing that exists on its own, such as the bioreactor or the drug-substance lot.
Specifically dependent continuant — a quality (the monomer purity) or realizable entity (role, disposition, function) that exists only by inhering in something else.
Generically dependent continuant — copyable information, such as the recipe, master batch record, specification, or certificate of analysis.
IOF Core — the BFO-grounded mid-level manufacturing ontology that supplies shared concepts (equipment, material, process) for domain ontologies to specialize.
OBO Foundry — the life-sciences community whose coordinated, principle-based ontologies inspired the industrial equivalent.
Relation Ontology (RO) — the effort to define relations (part of, participates in, derives from) as carefully as classes so they mean the same thing across ontologies.
IRI (Internationalized Resource Identifier) — a globally unique web address that names a term, so two systems agree on meaning by pointing at the same IRI rather than comparing English labels.
SEC (size-exclusion chromatography) — the analytical method that separates molecules by size; SEC %monomer reports the fraction of intact, single-molecule product.
SPARQL — the query language for RDF graphs; a (predicate)+ path walks a relationship across one or more hops.
Ground truth (for AI) — the verified, classified facts a learning model is anchored to; here the knowledge graph and the BFO spine beneath it, which supply the substance a fluent model cannot.
Leave-one-batch-out / grouped split — a validation scheme in which all rows from one batch stay wholly in train or test, so the genealogy (bp:derivedFrom)+ walk that finds the non-independent sibling lots becomes the group key for an honest train/test fold.

Where this leads

We have the spine: the categories every term hangs from and the relations that wire them. The next chapter, Classes, Relations, and Axioms: Building the Vocabulary, descends from the upper categories to the actual act of authoring — defining a Batch class, a derivedFrom relation, and the axioms that turn a loose vocabulary into something a reasoner can check and extend. We move from what kinds of thing exist to how you write them down.
