
Classes, Relations, and Axioms: Building the Vocabulary

📍 Where we are: Part I · Foundations of the Model — Chapter 2. With the upper spine in place, we move from what kinds of thing exist to the craft of writing them down — a class, a relation, and the axioms that make the vocabulary do work.

The last chapter classified the furniture of a bioprocess onto BFO's categories. That tells us a Batch is a continuant and a fermentation is an occurrent — but it does not yet give us a Batch we can use. To get there we have to author: declare the class, declare the derivedFrom relation, and — the part that separates a real ontology from a glorified glossary — write the axioms that let a computer reason over what we declared. This chapter is that act of authoring, in the same RDF/OWL/SHACL languages the data-management book introduced, now turned on our own vocabulary.

The simple version

Think of baking. A class is a cookie cutter — the shape "Batch." An instance is one actual cookie stamped from it — BATCH-2026-001. A relation is a thread tying cookies together — this cookie derivedFrom that one. And an axiom is a rule the kitchen inspector enforces without being asked each time: "every cookie has exactly one label," "a cookie is never also an oven," "if A came from B and B from C, then A came from C." Classes and instances give you shapes and things; axioms are what let a machine check and extend the whole tray on its own.

What this chapter covers

We author a small bioprocess vocabulary in OWL, written as RDF triples in Turtle. We then add the axioms that earn their keep — subclass, domain and range, transitivity, disjointness — and watch a reasoner use them to infer new facts and catch contradictions. We dissect one class definition field by field, show how Quality by Design's parameter-to-attribute link becomes an authored relation, and end on the open-world surprises and over-axiomatization traps that make this craft harder than it looks.

From spine to syntax: classes and properties in OWL

Recall the three atoms from the data book: a class is a category (Batch, CapturePool), an instance is a concrete member (BATCH-2026-001), and a relation (or property) connects things (derivedFrom) [1]. The foundation for writing them down is RDF, which represents every fact as a triple of subject, predicate, and object, with each resource named by a globally unique IRI and the object optionally a literal value instead [2]. OWL 2, the Web Ontology Language, is the layer on top that lets you say what the classes and properties are, formally enough for a reasoner to act on [1].

OWL draws one distinction worth fixing early: an object property links a thing to another thing (derivedFrom relates a lot to a parent lot), while a datatype property links a thing to a literal value (monomerPct relates a lot to the number 98.611). That is the same fork the open-source knowledge-graph chapter calls "an edge or a value," and it is the difference between a relationship you can walk and a measurement you can read. Here is the seed of our vocabulary, in Turtle — the same shape the companion repo aligns to IOF and Allotrope:

# bioproc.ttl: the local vocabulary, aligned up to IOF Core (illustrative).
@prefix bp: <https://example.org/bioproc#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

bp:Material a owl:Class ; rdfs:label "Bioprocess material" .
bp:Batch a owl:Class ; rdfs:label "Bioreactor batch" ;
    rdfs:subClassOf bp:Material .
bp:CapturePool a owl:Class ; rdfs:label "Protein A capture pool" ;
    rdfs:subClassOf bp:Material .

bp:derivedFrom a owl:ObjectProperty ;
    rdfs:label "derived from" .
bp:monomerPct a owl:DatatypeProperty ;
    rdfs:label "SEC %monomer" ;
    rdfs:range xsd:float .

Nothing here is doing logic yet — these are declarations, the nouns and verbs of the vocabulary. The intelligence comes from what we say about them.
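To make the edge-versus-value fork tangible, here is a minimal plain-Python sketch that holds a few of these facts as (subject, predicate, object) tuples. It deliberately uses no RDF library; the `Literal` wrapper and the `edges` and `values` helpers are illustrative names invented for this sketch, and the two instance facts about DS-001 are borrowed from the running example.

```python
from typing import NamedTuple, Union

class Literal(NamedTuple):
    """A typed literal value: the object of a datatype property."""
    value: object
    datatype: str

# An IRI is modelled as a prefixed-name string for brevity (an assumption
# of this sketch; real RDF uses full IRIs).
Node = Union[str, Literal]

triples = {
    ("bp:Batch", "rdf:type", "owl:Class"),
    ("bp:Batch", "rdfs:subClassOf", "bp:Material"),
    ("bp:derivedFrom", "rdf:type", "owl:ObjectProperty"),
    ("bp:monomerPct", "rdf:type", "owl:DatatypeProperty"),
    ("bp:DS-001", "bp:derivedFrom", "bp:POLpool-001"),
    ("bp:DS-001", "bp:monomerPct", Literal(98.611, "xsd:float")),
}

def edges(graph):
    """Triples whose object is another node: relationships you can walk."""
    return {t for t in graph if not isinstance(t[2], Literal)}

def values(graph):
    """Triples whose object is a literal: measurements you can read."""
    return {t for t in graph if isinstance(t[2], Literal)}

for s, p, o in sorted(values(triples)):
    print(s, p, o.value, o.datatype)
```

Splitting on the type of the object is the whole distinction: an object property always lands in `edges`, a datatype property always lands in `values`.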

The axioms that earn their keep

An axiom is a logical statement that constrains the model so a computer can reason over it [1]. Four kinds do most of the work in a bioprocess ontology, and each one buys a concrete capability.

Subclass (rdfs:subClassOf) says one class is a kind of another. Declaring bp:ProductionBioreactor rdfs:subClassOf bp:Bioreactor means every production bioreactor is a bioreactor, so any fact true of bioreactors is automatically inherited — you state a constraint once at the general class and every specialization gets it for free.
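What a reasoner does with a subclass axiom can be sketched in a few lines of plain Python. This is an illustration of the inheritance rule only, not a reasoner: the instance name `bp:BR-2000L-01` is hypothetical, and the link from `bp:Bioreactor` up to `bp:Equipment` is assumed from the way this chapter describes bioreactors as equipment.

```python
# Subclass axioms as a mapping from class to its direct superclasses.
SUB_CLASS_OF = {
    "bp:ProductionBioreactor": {"bp:Bioreactor"},
    "bp:Bioreactor": {"bp:Equipment"},  # assumed link, see lead-in
}

# One asserted type on a hypothetical instance.
ASSERTED_TYPES = {"bp:BR-2000L-01": {"bp:ProductionBioreactor"}}

def superclasses(cls):
    """All classes reachable from `cls` by following subClassOf upward."""
    seen, stack = set(), [cls]
    while stack:
        for parent in SUB_CLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def inferred_types(instance):
    """Asserted types plus every type inherited through subClassOf."""
    types = set(ASSERTED_TYPES.get(instance, ()))
    for asserted in list(types):
        types |= superclasses(asserted)
    return types

print(sorted(inferred_types("bp:BR-2000L-01")))
```

Only one type was asserted; the other two arrive through the subclass chain, which is the "state it once" payoff described above.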

Domain and range pin what a property may connect. bp:derivedFrom rdfs:domain bp:Material ; rdfs:range bp:Material says both ends must be materials. Now if someone writes BATCH-2026-001 derivedFrom SomeOperator, a reasoner can infer that SomeOperator must be a bp:Material — and if it has been declared a Person and Person is disjoint from Material, the graph is flagged as inconsistent. Domain and range turn a loose edge into a typed one.
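The counterintuitive part, that domain and range infer types rather than reject edges, is easy to see in a small plain-Python sketch of the two RDFS rules. The function name and the dictionary layout are assumptions of this sketch, not the API of any library.

```python
# Domain and range axioms for the one property declared so far.
DOMAIN = {"bp:derivedFrom": "bp:Material"}
RANGE = {"bp:derivedFrom": "bp:Material"}

def infer_types_from_domain_range(facts):
    """Apply the domain and range rules once over (s, p, o) facts."""
    inferred = set()
    for s, p, o in facts:
        if p in DOMAIN:
            inferred.add((s, "rdf:type", DOMAIN[p]))  # subject gets the domain
        if p in RANGE:
            inferred.add((o, "rdf:type", RANGE[p]))   # object gets the range
    return inferred

facts = {("bp:BATCH-2026-001", "bp:derivedFrom", "bp:SomeOperator")}
new_types = infer_types_from_domain_range(facts)

for triple in sorted(new_types):
    print(triple)
```

Note that nothing is refused here: the operator is simply typed as a material. The error only becomes visible once a disjointness axiom says a person cannot also be a material.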

Transitivity is the single most valuable axiom in this whole book. Declaring bp:derivedFrom a owl:TransitiveProperty tells any reasoner that if POLpool-001 derivedFrom VFpool-001 and VFpool-001 derivedFrom VIpool-001, then POLpool-001 derivedFrom VIpool-001. Hop after hop, it also follows that DS-001 derivedFrom BATCH-2026-001 and on up to WCB-CHO-001, without anyone asserting any of those long-range links. Only the immediate parent edges are stated. The running example's lineage is eleven materials deep (POLpool-001, VFpool-001, VIpool-001, PApool-001, CLAR-001, BATCH-2026-001, SEED-001, SEEDFLASK-001, WCB-CHO-001, MCB-CHO-001, RCB-CHO-001), and the single declaration owl:TransitiveProperty is what lets a lineage question reach the research cell bank from the drug substance in a single hop of meaning, complementing the SPARQL property path the open-source chapter uses to walk it.
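A rough plain-Python sketch shows what the transitivity rule produces on this lineage. It is a naive fixpoint loop, not how a production reasoner is implemented, and it assumes the materials are listed in parent order, so that each one is derived from the next.

```python
# Lineage from the running example, drug substance first. The listed order
# is assumed to be the parent order (each material derives from the next).
chain = [
    "DS-001", "POLpool-001", "VFpool-001", "VIpool-001", "PApool-001",
    "CLAR-001", "BATCH-2026-001", "SEED-001", "SEEDFLASK-001",
    "WCB-CHO-001", "MCB-CHO-001", "RCB-CHO-001",
]

# Only the immediate parent edges are asserted.
asserted = set(zip(chain, chain[1:]))

def transitive_closure(edges):
    """Repeat 'A->B and B->C gives A->C' until no new pair appears."""
    closure = set(edges)
    while True:
        new_pairs = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        } - closure
        if not new_pairs:
            return closure
        closure |= new_pairs

closure = transitive_closure(asserted)
inferred = closure - asserted

print(len(asserted), "asserted,", len(inferred), "inferred")
print(("DS-001", "RCB-CHO-001") in inferred)
```

The eleven asserted edges expand to every ancestor pair along the chain, including the link from the drug substance to the research cell bank that nobody wrote down.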

Disjointness catches the exact errors the spine chapter warned about. bp:Batch owl:disjointWith bp:CellCultureProcess says nothing can be both a batch (continuant) and a fermentation (occurrent) — the classic "the batch is the run" conflation. And bp:Material owl:disjointWith bp:Equipment says nothing can be both the material and the vessel that holds it — the subtler "the batch is the bioreactor" conflation that a naive multi-source load produces. Assert both types on one IRI and the model is contradictory; a DL reasoner (HermiT, ELK) reports it, and — because the lightweight OWL-RL reasoner most pipelines run does not act on owl:disjointWith — the release-gate SHACL shapes carry a closed-world guard that catches it in the runnable validator too. Disjointness is how the continuant/occurrent and material/equipment disciplines stop being advice and become enforced rules.

bp:ProductionBioreactor rdfs:subClassOf bp:Bioreactor .
bp:derivedFrom rdfs:domain bp:Material ; rdfs:range bp:Material ;
    a owl:TransitiveProperty .
bp:Batch owl:disjointWith bp:CellCultureProcess . # material continuant ≠ occurrent
bp:Material owl:disjointWith bp:Equipment . # the batch ≠ the vessel
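The disjointness check itself is simple enough to sketch in plain Python: expand each instance's types through the subclass chain, then look for any disjoint pair held at once. This mimics the clash a DL reasoner reports; it is not a reasoner. The three `bp:BATCH-A/B/C` IRIs are hypothetical, and placing `bp:Bioreactor` under `bp:Equipment` is assumed from the prose.

```python
# Disjointness axioms from the listing above, as unordered class pairs.
DISJOINT = [
    frozenset({"bp:Batch", "bp:CellCultureProcess"}),
    frozenset({"bp:Material", "bp:Equipment"}),
]

# Subclass links; bp:Bioreactor under bp:Equipment is an assumption.
SUB_CLASS_OF = {
    "bp:Batch": {"bp:Material"},
    "bp:Bioreactor": {"bp:Equipment"},
}

def expand(types):
    """Add every superclass reachable through SUB_CLASS_OF."""
    result, stack = set(types), list(types)
    while stack:
        for parent in SUB_CLASS_OF.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

def clashes(asserted_types):
    """Return (instance, class_a, class_b) for every violated axiom."""
    found = []
    for instance, types in asserted_types.items():
        full = expand(types)
        for pair in DISJOINT:
            if pair <= full:  # both disjoint classes hold at once
                found.append((instance, *sorted(pair)))
    return found

# Two planted errors on hypothetical IRIs, plus one clean record.
planted = {
    "bp:BATCH-A": {"bp:Batch", "bp:CellCultureProcess"},  # batch is the run
    "bp:BATCH-B": {"bp:Batch", "bp:Bioreactor"},          # batch is the vessel
    "bp:BATCH-C": {"bp:Batch"},
}

for clash in clashes(planted):
    print("inconsistent:", clash)
```

The second clash is only visible after the subclass expansion: nobody asserted that BATCH-B is equipment, but being a bioreactor implies it.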

A small fact network with axiom annotations: instances DS-001, POLpool-001, and VFpool-001 along the lineage chain linked left to right by asserted immediate derivedFrom edges, with a dashed inferred derivedFrom edge (DS-001 derivedFrom VFpool-001) arcing across them labelled by the transitive-property axiom; the DS-001 node carries a monomerPct 98.611 typed-literal value; a separate CellCultureProcess node sits in a different color band with a disjointWith bar between it and Batch showing the contradiction a reasoner would catch; each edge is tagged with the axiom that governs it (subClassOf, domain/range, transitive, disjointWith).

Axioms at work: asserted immediate derivedFrom edges (solid) let a reasoner infer the long-range link (dashed) by transitivity, domain and range type the ends, and a disjointness bar between Batch and CellCultureProcess is what turns "a batch is not a run" into an enforced rule. Original diagram by the authors, created with AI assistance.

What the reasoning buys: inference and contradiction

Axioms are not documentation; they are executable. Run a reasoner (a classification engine such as HermiT or ELK, both available in Protégé) over the vocabulary plus the facts, and two things happen [5]. First, inference: the reasoner derives facts no one stated, such as the transitive lineage above, or the conclusion that an instance asserted to be a ProductionBioreactor is therefore a Bioreactor, a piece of equipment, and a BFO material entity (an independent continuant). Note that equipment is a BFO material entity but not a bp:Material; the two are disjoint here. Second, contradiction detection: if the asserted facts violate an axiom (a batch that is also a process, a derivedFrom whose object cannot be a material), the reasoner reports the graph as inconsistent and points at the clash. A relational database cannot do either from its schema alone; an axiomatized ontology does both for free, which is the whole reason to pay the authoring cost. Run a reasoner over the running example and both happen at once, as new facts appear and two planted contradictions are caught:

# Real output from validate.py (owlrl OWL-RL closure + SHACL over the running example).
[1] parsed 2100 triples (bioproc + align + instances)
[2] reasoned: 2100 -> 7089 triples after OWL-RL closure
transitive derivedFrom inferred DS-001 -> WCB-CHO-001: True
transitive derivedFrom inferred DS-001 -> RCB-CHO-001: True
planted Batch-is-a-Process caught (conforms False): True
planted Batch-also-Bioreactor caught (conforms False): True

The reasoner manufactured the long-range derivedFrom links that no raw triple stated: DS-001 reaches both the working and the research cell bank across the full eleven-material chain. The closed-world guards, in turn, refused both the "batch is a run" and the "batch is the vessel" conflations. (The conforms False verdict is SHACL's: since OWL-RL does not act on owl:disjointWith, the runnable catch is the shape, while the axiom is what a Protégé DL reasoner enforces directly.)

OWL reasons; SHACL validates

There is a subtlety that trips up everyone once, and it decides which tool you reach for. OWL is open-world: it assumes that what is not stated is merely unknown, not false. That is exactly right for inference — you would not want a reasoner to conclude a batch has no parent just because the parent edge has not loaded yet. But it is exactly wrong for data validation, where a missing required field genuinely is an error. "This released lot has no release status" should fail a release check, not be charitably treated as "status unknown."

That validation job belongs to SHACL, the Shapes Constraint Language, which checks a graph against shapes under a closed-world reading — "every DrugSubstance or DrugProduct lot must have exactly one releaseStatus drawn from this list" — and reports violations [3]. The division of labor is clean and worth memorizing: OWL says what things mean and infers consequences; SHACL says what a valid record must contain and gates it. The transitive lineage is an OWL job; the release gate we build later is a SHACL job. Authoring both against the same vocabulary is what makes the graph simultaneously smart and trustworthy.
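The closed-world reading can be sketched in plain Python as a check that counts what is actually present. This is only an imitation of the "exactly one releaseStatus from a list" shape, not SHACL itself. The allowed status values, the prefixed class names, and the lots DS-002 and DP-001 are all hypothetical placeholders for this sketch.

```python
# Hypothetical value list; the real list lives in the release-gate shapes.
ALLOWED_STATUS = {"Released", "Quarantined", "Rejected"}
# Class names assumed for the two lot types the shape targets.
TARGET_CLASSES = {"bp:DrugSubstance", "bp:DrugProduct"}

def validate_release_status(lots):
    """Closed-world check: a missing or extra status is a violation."""
    report = []
    for lot_id, record in sorted(lots.items()):
        if record["type"] not in TARGET_CLASSES:
            continue  # the shape does not target this class
        statuses = record.get("releaseStatus", [])
        if len(statuses) != 1:
            report.append(
                (lot_id, f"expected exactly 1 releaseStatus, found {len(statuses)}")
            )
        elif statuses[0] not in ALLOWED_STATUS:
            report.append((lot_id, f"releaseStatus {statuses[0]!r} not in allowed list"))
    return report

lots = {
    "DS-001": {"type": "bp:DrugSubstance", "releaseStatus": ["Released"]},
    "DS-002": {"type": "bp:DrugSubstance"},                    # status missing
    "DP-001": {"type": "bp:DrugProduct", "releaseStatus": ["Shipped"]},
    "BATCH-2026-001": {"type": "bp:Batch"},                    # not targeted
}

report = validate_release_status(lots)
conforms = not report
print("conforms:", conforms)
for lot_id, message in report:
    print(" ", lot_id, message)
```

An open-world reasoner looking at DS-002 would conclude nothing, since the status might simply be unknown. The closed-world check treats the same absence as a failure, which is what a release gate needs.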

A split-panel flow: on the left an OWL/reasoner lane takes asserted triples (derivedFrom, subClassOf, domain/range) through a HermiT/ELK reasoner box and emits inferred facts (the transitive lineage, subclass inheritance) plus a consistency check that flags both the Batch-equals-Process and the Batch-equals-vessel contradictions, labelled open-world; on the right a SHACL lane takes the graph plus shapes through a shape-check box (required, single, in-range), emitting a conforms-true or a validation report and noting a missing required field is an error not unknown, labelled closed-world; a caption bar underneath reads OWL infers meaning, SHACL gates records.

Two engines, one vocabulary: OWL's open-world reasoner infers new facts and catches contradictions, while SHACL's closed-world validator enforces that a record contains what a valid record must. Original diagram by the authors, created with AI assistance.

Anatomy of one class definition

To see authoring as a craft rather than a list, take the bp:Batch class apart field by field — every line a real declaration, every line buying something. The rdfs:label ("Bioreactor batch") is the human-readable name, distinct from the IRI a machine uses. The rdfs:subClassOf places it under bp:Material and, through that, under IOF Core and BFO — the alignment that makes it interoperable without a bespoke adapter. The owl:disjointWith bp:CellCultureProcess encodes the continuant/occurrent boundary as an enforced rule. The properties the class bears — a bp:derivedFrom edge to its parent, and the release attributes bp:releaseStatus and bp:monomerPct — are declared with their domain set to bp:Material (the batch's superclass), so asserting any of them types the subject as a bp:Material, not specifically a bp:Batch. That width is deliberate: the released lots that actually carry a release status and a monomer result are the drug-substance and drug-product lots, also bp:Material, so scoping these properties to bp:Batch would have wrongly typed every lot as a batch. And an annotation (rdfs:comment, skos:definition, a provenance note) records why the class exists, which is what a future maintainer and an auditor both need.

Identity card dissecting the bp:Batch class definition into labelled rows: the IRI bp:Batch with its rdfs:label "Bioreactor batch"; an rdfs:subClassOf row pointing up through bp:Material to IOF Core and BFO; an owl:disjointWith row pointing to bp:CellCultureProcess, marked as the continuant-versus-occurrent guard; a properties-borne block listing derivedFrom (object property to a parent Material) and, inherited from bp:Material, the release attributes releaseStatus and monomerPct that the released drug-substance and drug-product lots carry; and an annotation row holding the definition and provenance, with each row tagged as either a declaration or an axiom.

One class, fully unpacked: a label for humans, a subclass chain that aligns it upward, a disjointness axiom that guards a category error, the properties it bears, and the annotation that records why it exists. Original diagram by the authors, created with AI assistance.

This is the same identity-card discipline applied to the schema that the rest of the series applies to a data point: just as a reading travels with its unit and quality, a class travels with its alignment, its constraints, and its definition — so the meaning is in the model, not in a modeler's memory.

Quality by Design as an authored relation

The preface argued that QbD is secretly an ontology. Here is where that becomes literal. The link a development team works so hard to establish — this critical process parameter affects that critical quality attribute — is just a relation waiting to be declared: bp:affectsQuality, an object property from a bp:ProcessParameter to a bp:QualityAttribute. Declare it, and a fact like bp:FeedRate bp:affectsQuality bp:MonomerPct-CQA stops living in a prose report and becomes a queryable edge — so an investigator can later ask the graph "which parameters affect monomer purity?" and get an answer rather than a reading assignment. We build that out in the process-development chapter; the point here is that the most prized knowledge in process development is, structurally, one more authored relation.
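As a small illustration of "a queryable edge," here is a plain-Python sketch that answers the investigator's question over a handful of triples. Only the FeedRate fact comes from the text above; the `bp:pH` and `bp:Temperature` parameters and the `bp:Titer-CQA` attribute are hypothetical, added so the query has something to filter out.

```python
facts = {
    ("bp:FeedRate", "bp:affectsQuality", "bp:MonomerPct-CQA"),
    ("bp:pH", "bp:affectsQuality", "bp:MonomerPct-CQA"),      # hypothetical
    ("bp:Temperature", "bp:affectsQuality", "bp:Titer-CQA"),  # hypothetical
}

def parameters_affecting(attribute, graph):
    """Which process parameters affect the given quality attribute?"""
    return sorted(
        s for (s, p, o) in graph
        if p == "bp:affectsQuality" and o == attribute
    )

print(parameters_affecting("bp:MonomerPct-CQA", facts))
```

In a real triple store the same question would be a one-pattern graph query; the point of the sketch is only that, once the relation is declared, the answer is a lookup rather than a document search.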

The unsolved part: consistent is not the same as correct

A reasoner can prove your ontology is consistent — free of internal contradiction. It cannot prove it is correct — a faithful model of the plant. Those are different guarantees, and conflating them is the quiet failure of axiomatized models. You can author a perfectly consistent ontology that says the wrong thing: a derivedFrom whose direction is reversed, a disjointness you forgot to declare so the "batch is the run" error slips through silently, a cardinality that should be "exactly one" left as "at least one." The logic checks coherence, not truth.
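The reversed-direction case can be shown with a toy consistency check in plain Python. It applies the domain, range, and Material/Equipment disjointness rules from earlier in this chapter and nothing else; the vessel `bp:BR-01` is a hypothetical instance included to show that the check does reject something.

```python
DISJOINT = [frozenset({"bp:Material", "bp:Equipment"})]

# Asserted types; bp:BR-01 is a hypothetical vessel.
ASSERTED_TYPES = {
    "bp:DS-001": {"bp:Material"},
    "bp:POLpool-001": {"bp:Material"},
    "bp:BR-01": {"bp:Equipment"},
}

def consistent(derived_from_edges):
    """True if typing both ends of each edge as Material causes no clash."""
    types = {node: set(t) for node, t in ASSERTED_TYPES.items()}
    for child, parent in derived_from_edges:
        types.setdefault(child, set()).add("bp:Material")   # domain rule
        types.setdefault(parent, set()).add("bp:Material")  # range rule
    return not any(pair <= t for t in types.values() for pair in DISJOINT)

right = {("bp:DS-001", "bp:POLpool-001")}     # DS-001 derivedFrom POLpool-001
flipped = {("bp:POLpool-001", "bp:DS-001")}   # direction reversed by mistake
vessel = {("bp:DS-001", "bp:BR-01")}          # derived from a bioreactor

print(consistent(right), consistent(flipped), consistent(vessel))
```

The flipped edge passes exactly as the correct one does, because both ends are materials either way. Only a reviewer who knows the process can tell which direction is true.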

Two further traps are specific to this craft. The first is the open-world surprise: because OWL treats missing facts as unknown, an OWL reasoner will not tell you a required field is absent. Newcomers expect it to, and ship gaps that only SHACL would have caught. The second is over-axiomatization: every axiom you add is a commitment a reasoner must honor, and a richly constrained ontology can become computationally expensive or even step outside OWL 2 DL, the decidable fragment, into OWL 2 Full, where reasoning is undecidable and no engine is guaranteed to terminate [5]. The art is to axiomatize exactly enough to catch the errors that matter and no more. So the honest standard for this chapter is not "does it reason?" but "does it reason about the right world, and stop there?" That is a judgment no engine makes for you, and the reason ontology authoring remains a reviewed, governed human practice rather than a generated artifact.

Why it matters

Every later chapter declares new classes and relations, and the quality of those declarations decides whether the graph is an asset or a liability. Axioms are the difference between a vocabulary that merely labels data and one that checks and extends it: transitivity makes lineage walkable, disjointness makes category errors loud, domain and range make edges typed. Get the axioms right and the model defends its own integrity as it grows; get them wrong or omit them and you have an expensive spreadsheet with IRIs. The leverage of the whole approach lives in this chapter.

In the real world

In practice, you rarely author this by hand in a text editor. The de facto standard tool is Protégé, the free, open-source ontology editor from Stanford, which lets domain experts define classes and properties visually and run a reasoner to check consistency and view inferences before anything ships [4]. The RDF/OWL/SHACL underneath is the interchange format, the way HTML is the format under a styled page. In a regulated setting the authored ontology is itself a controlled artifact (versioned, reviewed, and change-controlled), which is the subject of model governance in Part VI. An axiom is a decision about how the plant is modeled, and decisions about regulated systems do not get made in an unversioned file on one laptop.

Key terms

  • Class / instance / relation — a category of thing; a concrete member; a connection between things.
  • OWL (Web Ontology Language) — the W3C language that adds formal logic to RDF so reasoners can infer facts and detect contradictions.
  • Object property / datatype property — a relation to another thing (an edge to walk) versus a relation to a literal value (a measurement to read).
  • Axiom — a logical statement constraining the model: subclass, domain/range, transitivity, disjointness, cardinality.
  • Transitive property — an OWL relation (like derivedFrom) where A→B and B→C imply A→C, making lineage inferable to any depth.
  • Disjointness — an axiom declaring two classes share no members, so asserting both about one thing is a flagged contradiction (the continuant/occurrent guard).
  • Reasoner — an engine (HermiT, ELK) that computes inferred facts and checks consistency over an OWL ontology.
  • Open-world assumption — OWL's stance that unstated facts are unknown, not false; correct for inference, wrong for validation.
  • SHACL — the closed-world Shapes Constraint Language that validates whether a graph contains what a valid record must.
  • Protégé — the standard free, open-source editor for authoring and reasoning over OWL ontologies.

Where this leads

We can now declare classes, wire them with relations, and constrain them with axioms a reasoner enforces. But every subject, predicate, and value we have written assumes one thing we have not yet examined: that a name means the same thing everywhere, and that a number never travels without its unit. The next chapter, Identifiers and Units: IRIs, QUDT, and the Typed Value, makes those guarantees concrete — the globally unique identifier that stops two systems' BATCH-2026-001 from colliding, and the unit-and-datatype discipline that stops 98.611 from ever again meaning a fraction in one system and a percent in another.