Modeling the Target and the Product Concept

📍 Where we are: Part II · Discovery and Development, modeled — Chapter 4. Part I built the kit. Now we point it at the process, beginning where the medicine itself begins: not in a tank, but in an idea about a disease.

A biologic program starts long before any cell is engineered. Someone identifies a target — a molecule in the body, usually a protein, whose behavior drives a disease — and forms a product concept: an antibody that will bind that target and change its behavior. These are the first entities of the whole story, and the temptation is to record them in a slide deck and a spreadsheet. This chapter argues for doing something else from the very first day: naming them with ontologies the entire biomedical world already shares, so that the knowledge accumulated about this program is interoperable from its first entry rather than its last.

The simple version

When you start a research project, you can invent your own filing codes for "the protein we're going after" and "the disease we're treating" — or you can use the catalog numbers the rest of science already agreed on. Invent your own, and on the day you want to connect your data to a public database, a partner's data, or a regulator's, you are back to translating. Use the shared catalog numbers — the ones for this exact gene, this exact protein, this exact disease — and the connection is already made. This chapter is about reaching for the shared catalog at the very start, when it costs almost nothing.

What this chapter covers

We name the target, the mechanism of action, and the product concept as ontology individuals, and we deliberately do not invent the classes for proteins, molecular functions, diseases, indications, and the mechanism of action — we borrow them from OBI, the Gene Ontology, the Protein Ontology, a disease ontology, the NCI Thesaurus, and the Relation Ontology. We dissect one target definition to see those external IRIs at work, model the product concept as an information artifact that exists before the product does, and close on the honest seam between the research ontologies (the OBO world) and the manufacturing ontologies (the IOF world) this program will eventually cross.

The target: borrow the class, do not build it

The single most valuable habit in modeling discovery is restraint: when an established ontology already defines a thing, use its IRI instead of minting your own. Biomedicine has spent two decades building exactly the ontologies a program needs. The Gene Ontology (GO) describes what gene products do — their molecular functions and the biological processes they participate in — and is the most widely used ontology in biology [2]. The Protein Ontology (PRO) gives every protein, down to specific forms and modifications, a precise IRI [3]. A disease ontology such as the Human Disease Ontology supplies a stable identifier for the disease [4]. And OBI, the Ontology for Biomedical Investigations, supplies the assays, study designs, and roles that surround them — all built on the same BFO spine from Part I, all part of the OBO Foundry's coordinated suite [1][5].

So the target of our program is not a string "the antigen." It is an individual that is about a PRO protein class, which the Gene Ontology says has a particular molecular function, expressed in a disease named by a stable disease-ontology IRI. Each of those is a globally unique identifier from Part I's identifier discipline, and each connects this program's private graph to the entire public web of biomedical knowledge. That is leverage no in-house vocabulary can buy.

The mechanism of action is a relation, not a label

A program does not just name a target; it has a hypothesis about it — the mechanism of action (MoA): this antibody will bind the target and, say, block its signaling. In a slide that hypothesis is a sentence; in the model it is a set of relations. The antibody bindsTo the target (an object property to the PRO individual); the binding inhibits a Gene Ontology molecular function; the inhibition is expected to modulate a biological process implicated in the disease. Modeling the MoA as relations rather than prose means a later question — which programs target this pathway? — becomes a query across every program's graph, exactly the QbD-as-relations move the axioms chapter introduced, now applied to biology instead of process.
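As a minimal sketch of why relations beat prose here, assume a second program whose antibody happens to inhibit the same molecular function. Every identifier below is a placeholder: bp:mAb-B and bp:TARGET-Y are hypothetical and belong to no real program, and bp:MF-X stands in for whichever Gene Ontology term a real program would use.

```turtle
@prefix bp: <https://example.org/bioproc#> .

# Program A: the antibody binds its target and inhibits a molecular function.
bp:mAb-A bp:bindsTo  bp:TARGET-X ;
         bp:inhibits bp:MF-X .

# Program B (hypothetical, for illustration only): a different antibody and a
# different target, but the same molecular function node.
bp:mAb-B bp:bindsTo  bp:TARGET-Y ;
         bp:inhibits bp:MF-X .
```

Because both programs point at the same bp:MF-X node, the question "which programs inhibit this function?" reduces to a single graph pattern (anything that bp:inhibits bp:MF-X), and it matches both antibodies without any text matching over slide decks.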

Identity card dissecting one target definition into rows that each point at an external ontology IRI: a target row whose value is a Protein Ontology (PRO) individual; a molecular-function row pointing at a Gene Ontology term; a disease (indication) row pointing at a Human Disease Ontology IRI; a mechanism-of-action block showing the antibody bindsTo the PRO target and inhibits the GO function; and an investigation row pointing at OBI assay and role classes — with a side note that every value is a globally unique identifier shared with public biomedical databases, color-coded by source ontology, with the mechanism of action shown as a set of relations. One target, defined by reference: the protein is a PRO IRI, its function a Gene Ontology term, the disease a disease-ontology IRI, and the mechanism of action a set of relations — so the program's private graph is wired into the public web of biomedicine from its first entry. Original diagram by the authors, created with AI assistance.

The product concept exists before the product does

Here is a subtlety Part I prepared us for. On the day a program begins, the antibody does not exist: no cell makes it, no vial holds it. Yet the program reasons about it constantly: its intended target, its desired properties, its planned indication. What exists is a product concept, and BFO has exactly the right category for it: a generically dependent continuant, an information artifact, the same kind of thing as the recipe and the specification [1]. The concept is about a future material entity that does not yet exist; modeling it as information lets the graph hold design intent before there is anything to point a sensor at. When the molecule is later discovered, made, and released, the physical drug-product lot realizes (or conformsTo) the concept, and the thread from the original idea to the filled vial is unbroken. This is the ontology's answer to a problem every program has: how to record what you intend to make in a way that connects, years later, to what you actually made.

And here is the payoff of the whole discipline made concrete: almost nothing this chapter names is a class the program invents. Nearly every entity and relation below is a real, dereferenceable IRI the rest of biomedicine already uses, bound up in the alignment file rather than minted locally, so the day-one discovery graph is wired into the public web almost end to end before a single cell is engineered:

# align.ttl — every entity and relation this chapter names is bound UP to a shared, verified
# IRI rather than minted locally. (Excerpt; the running example carries the full alignment.)
@prefix bp: <https://example.org/bioproc#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . # required for rdfs:subClassOf and rdfs:subPropertyOf below
@prefix obo: <http://purl.obolibrary.org/obo/> . # one shared PURL host; each ontology is named in its own comment below
@prefix iof: <https://spec.industrialontologies.org/ontology/construct/> .

# the nouns of the program
bp:Target rdfs:subClassOf obo:PR_000000001 . # Protein Ontology 'protein' (the target is a leaf PRO IRI)
bp:MolecularFunction rdfs:subClassOf obo:GO_0003674 . # Gene Ontology 'molecular_function'
bp:Disease rdfs:subClassOf obo:DOID_4 . # Human Disease Ontology 'disease'
bp:Indication rdfs:subClassOf obo:NCIT_C41184 . # NCIt 'Indication' (the intended-use class; the example below points hasIndication at the disease itself)
bp:MechanismOfAction rdfs:subClassOf obo:NCIT_C54680 . # NCIt 'Mechanism of Action'
bp:Antibody rdfs:subClassOf obo:GO_0071735 . # GO 'IgG immunoglobulin complex'
bp:ProductConcept rdfs:subClassOf iof:InformationContentEntity . # information artifact, kept in IOF to share a family with the recipe and spec; IAO would serve too (see below)
bp:MonoclonalAntibodyProduct rdfs:subClassOf obo:NCIT_C20401 . # NCIt 'Monoclonal Antibody' (the product concept)

# the relations of the hypothesis
bp:isAbout rdfs:subPropertyOf obo:IAO_0000136 . # IAO 'is about' (the concept is about the future molecule)
bp:bindsTo rdfs:subPropertyOf obo:RO_0002436 . # RO 'molecularly interacts with' (the antibody binds the target)
# inhibits / modulates stay the program's own relations on purpose: RO's regulation relations are
# process-scoped, and here the subjects are a material antibody and an information-artifact MoA.

None of these alignments is a reflex: each is a curated choice the running example documents and verifies against the published ontology, IRI by IRI. Some are barely choices: a protein resolves to the Protein Ontology and a function to the Gene Ontology because those are the IRIs the community overwhelmingly uses. Two are genuine judgment calls. First, indication and mechanism of action are named from the NCI Thesaurus (NCIt), the U.S. National Cancer Institute's reference biomedical vocabulary, rather than from SNOMED CT or MeSH, because NCIt is the production vocabulary the regulated world already speaks, a tradeoff Part VII weighs in full. Second, bp:ProductConcept takes its superclass from the industrial IOF, not from the Information Artifact Ontology (IAO), which is the OBO Foundry's ontology of information entities and the home of the "is about" relation. IAO has its own information content entity, so this is the curated pick, not the obvious one: it keeps the product concept in the same information-artifact family as the recipe and specification modeled against IOF in Part I, so the OBO–IOF seam runs between the research terms and the making terms rather than through the concept itself. Both are legitimate (IAO and IOF descend from one BFO spine), and the example simply picks one and records why.
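For comparison, the alternative set aside in the paragraph above fits in one line. This is a sketch of the road not taken, not part of the running example's alignment file. It assumes IAO's "information content entity" class at the OBO IRI IAO_0000030, which should be checked against the published ontology before use.

```turtle
@prefix bp:   <https://example.org/bioproc#> .
@prefix obo:  <http://purl.obolibrary.org/obo/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Alternative alignment (NOT the one the running example adopts): anchor the
# product concept in IAO, the same ontology that supplies 'is about'.
bp:ProductConcept rdfs:subClassOf obo:IAO_0000030 .   # IAO 'information content entity'
```

Under this choice the concept would sit on the OBO side of the seam, apart from the IOF-aligned recipe and specification. The IOF alignment is chosen to avoid exactly that separation.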

The individuals make the hypothesis itself loadable: the mechanism of action is relations, not prose, and the loop from the day-one concept to the realized lot is one real conformsTo edge:

# instances.ttl: the target, the day-one product concept, and the loop closed at release.
@prefix bp: <https://example.org/bioproc#> .

bp:TARGET-X a bp:Target ;                     # a Target (a PRO protein); the leaf PRO IRI is program-specific
    bp:hasMolecularFunction bp:MF-X .         # a Gene Ontology molecular function

bp:PC-mAb-A a bp:MonoclonalAntibodyProduct ;  # the product concept: an information artifact, on day one
    bp:hasMechanismOfAction bp:MOA-X ;
    bp:hasIndication bp:DISEASE-X ;           # the indication points at the disease itself (a Human Disease Ontology IRI)
    bp:isAbout bp:mAb-A .                     # about a molecule that does not yet physically exist

bp:MOA-X a bp:MechanismOfAction ;
    bp:modulates bp:MF-X .

bp:mAb-A bp:bindsTo bp:TARGET-X ;
    bp:inhibits bp:MF-X .

bp:DP-001 bp:conformsTo bp:PC-mAb-A .         # years later, the realized lot closes the loop to the concept
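One way to put that unbroken thread to work, sketched under assumptions, is an OWL 2 property chain. It lets a reasoner derive which target a released lot ultimately addresses from the edges already asserted above. The property name bp:addressesTarget is hypothetical and does not appear in the running example. The sketch also assumes the three chained relations are declared as object properties.

```turtle
@prefix bp:  <https://example.org/bioproc#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# The three relations the chain walks, declared as object properties.
bp:conformsTo a owl:ObjectProperty .
bp:isAbout    a owl:ObjectProperty .
bp:bindsTo    a owl:ObjectProperty .

# Hypothetical derived relation: lot -> concept -> molecule -> target.
bp:addressesTarget a owl:ObjectProperty ;
    owl:propertyChainAxiom ( bp:conformsTo bp:isAbout bp:bindsTo ) .

# The asserted edges, repeated from the instance data so the sketch stands alone.
bp:DP-001   bp:conformsTo bp:PC-mAb-A .
bp:PC-mAb-A bp:isAbout    bp:mAb-A .
bp:mAb-A    bp:bindsTo    bp:TARGET-X .

# A reasoner that supports property chains would then infer:
#   bp:DP-001 bp:addressesTarget bp:TARGET-X .
```

The design point is that nothing new is asserted about the lot. The link from vial to target is derived entirely from edges recorded at different times by different teams, and it only holds if the concept was modeled on day one.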

A horizontal timeline graph showing the product concept as an information artifact (generically dependent continuant) created at program start on the left, pointing by an isAbout edge to a dashed, not-yet-existing future antibody; arrows trace forward through discovery, cell-line, manufacturing, and release stages until a concrete drug-product lot DP-001 appears on the right and links back to the concept by a conformsTo edge, closing the loop from intended product to realized product; the concept node is colored as information while the realized lot is colored as material. Design intent, modeled before the thing exists: the product concept is an information artifact created on day one that points at a future molecule; years later the realized drug-product lot conforms to it, closing an unbroken thread from idea to vial. The downstream stage IDs (WCB-CHO-001, BATCH-2026-001, DP-001) are this book's running example, met in full in the chapters ahead. Original diagram by the authors, created with AI assistance.

The unsolved part: the OBO–IOF seam

This chapter quietly straddles a fault line the rest of the book lives on, and it is honest to name it now. The ontologies that best describe a target (GO, PRO, the disease ontologies, OBI) grew up in the OBO Foundry, the biomedical world. The ontologies that best describe making the antibody (IOF Core and the IOF biopharma manufacturing ontologies from Part I) grew up in the Industrial Ontologies Foundry, the engineering world. Both are grounded in BFO, which is precisely why they can meet; but "can meet" is not "have met." There is no single, settled, universally adopted bridge that says how an OBI assay relates to an IOF measurement process, or how a PRO protein target relates to the IOF material that is the purified antibody. Teams build those bridges case by case.

So a program that models its target beautifully in OBO terms and its manufacturing beautifully in IOF terms still owns the seam between them — a mapping it must author, review, and maintain, not a feature it imports. This is the same lesson the knowledge-graph chapter reached from the code side: shared upper grounding gives you compatibility of structure, while agreement of content across two large communities is still, in 2026, partly aspirational. The good news is that BFO makes the bridge possible and small; the honest news is that it is yours to build, and the discovery-to-manufacturing handoff is where this book's two ontology worlds first have to shake hands.
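What "owning the seam" can look like in practice is sketched below: a single mapping recorded as a reviewable individual, not asserted silently as an equivalence. Nearly everything here is an assumption made for illustration. The file name, the bp:SeamMapping class, and the bp:mapsFrom, bp:mapsTo, bp:mappingRelation, and bp:reviewStatus properties are all hypothetical. The OBI IRI for "assay" and the IOF local name MeasurementProcess should both be verified against the published ontologies. The choice of skos:closeMatch is one candidate relation, not a settled answer.

```turtle
@prefix bp:   <https://example.org/bioproc#> .
@prefix obo:  <http://purl.obolibrary.org/obo/> .
@prefix iof:  <https://spec.industrialontologies.org/ontology/construct/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# bridge.ttl (hypothetical file): one program-owned mapping across the seam.
bp:MAP-001 a bp:SeamMapping ;                  # hypothetical class for mapping records
    bp:mapsFrom        obo:OBI_0000070 ;       # OBI 'assay' (verify the IRI before use)
    bp:mapsTo          iof:MeasurementProcess ; # assumed IOF term name; confirm against the released ontology
    bp:mappingRelation skos:closeMatch ;       # deliberately weaker than equivalence
    bp:reviewStatus    "proposed" ;
    rdfs:comment "Candidate bridge; pending review by research and manufacturing ontologists." .
```

Keeping each mapping as its own record, with a status and a rationale, is what makes the bridge something a team can author, review, and maintain over time.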

Why it matters

The cost of naming the target with a shared ontology is almost nothing on day one and enormous to retrofit on day one thousand. A program that started with private codes for its gene, protein, and disease must, when it wants to query across studies or submit structured data, reconcile those codes against the public ontologies after the fact — the entity-resolution problem the last chapter said is the unsolved half of identity, now self-inflicted. A program that borrowed the shared IRIs from the start simply is interoperable. The earliest entities are the cheapest place in the entire lifecycle to get identity right, and the most expensive place to get it wrong.

In the real world

This is established practice in research informatics, if not yet universal in manufacturing. GO, PRO, OBI, and the disease ontologies are mature, heavily used, and curated by funded consortia; public databases annotate to them by default [2][3]. The target of a real antibody program — a cytokine, a receptor, a checkpoint protein — already has a PRO IRI and a GO functional annotation waiting to be used. What is still maturing is the manufacturing side reaching back to meet the research side: the IOF biopharma ontologies were released only recently, and the discipline of carrying a target's OBO identity all the way into the cell line and the released drug product is exactly the frontier the rest of this book walks.

Key terms

  • Target — the molecule (usually a protein) whose behavior drives a disease and that the biologic is designed to bind; modeled by reference to a Protein Ontology IRI.
  • Mechanism of action (MoA) — the program's hypothesis about how the antibody changes the target's behavior; modeled as relations (bindsTo, aligned to the Relation Ontology's molecularly interacts with; inhibits, kept local) rather than prose, and named as a class by the NCI Thesaurus term Mechanism of Action.
  • Product concept — the intended product, modeled as a generically dependent continuant (an information artifact) that exists before the physical product and that the realized lot later conforms to.
  • OBI (Ontology for Biomedical Investigations) — the BFO-grounded ontology of assays, study designs, and roles used across biomedical research.
  • Gene Ontology (GO) / Protein Ontology (PRO) — the standard ontologies for gene-product functions and for specific protein entities.
  • Disease ontology — a stable-identifier ontology for the disease the program treats (e.g., the Human Disease Ontology); in the running example the disease also fills the indication slot, while the NCIt Indication class names the intended-use sense.
  • OBO–IOF seam — the still-unbridged boundary between the biomedical (OBO) ontologies that describe the target and the manufacturing (IOF) ontologies that describe making it; both BFO-grounded, but the crosswalk is yours to author.

Where this leads

We have named the target and the intended product. The next chapter, Modeling the Molecule: Sequence, Modality, and Developability, follows the program as it turns that concept into actual candidate molecules — and shows how a sequence becomes an information entity, how "developability" is modeled as a set of dispositions a candidate either has or lacks, and how the discovery campaign that selects a lead is itself an occurrent the graph can record.