FAIR in Practice: Measuring Whether the Graph Actually Delivers

📍 Where we are: Part VI · The Whole Graph — Chapter 23. The model is governed and stays true to the plant. This chapter asks the harder question: is it actually useful to the world — genuinely FAIR — or only FAIR on paper?

The FAIR principles — that data be Findable, Accessible, Interoperable, and Reusable — have been the north star since the preface. We have built toward them with global IRIs, shared ontologies, typed units, and governance. It would be satisfying to declare victory. But FAIR is a set of principles, not a conformance certificate, and a graph can satisfy every standard on the wire while quietly failing the goal those standards serve. This chapter turns FAIR from an aspiration into a measurement, and faces the uncomfortable, well-documented gap between being standards-compliant and being FAIR in fact.

The simple version

A library can own every book and still be useless if nothing is catalogued, the doors are locked at random hours, and half the books are in languages no one labeled. "We have the books" is not "you can find, get, combine, and reuse them." FAIR is the promise that the library actually works — and the only way to know is to test it, not to assume that buying the right shelving delivered it. This chapter tests the graph against FAIR honestly, and admits where the test comes back worse than the brochure.

What this chapter covers

We turn each FAIR letter into a concrete, checkable question about the graph, dissect a FAIR assessment scorecard, and confront the gap the whole series has circled: standards-compliance does not guarantee FAIRness, interoperability is the dimension real data most often fails, and the cause is almost never the triplestore — it is metadata authored without a controlled vocabulary, by people under deadline, on the plant floor.

Each FAIR letter is a question you can actually check

FAIR's power is that it decomposes into concrete checks, which is what lets it be measured rather than merely claimed [1][2]. For our bioprocess graph:

Findable — does every entity carry a globally unique, persistent IRI and rich metadata, indexed so it can be located? A batch with a permanent IRI and product/site/date metadata passes; a number buried in a spreadsheet cell does not.
Accessible — can the data be retrieved by that identifier over a standard protocol, with clear access rules? A SPARQL endpoint with documented authorization passes; FAIR-accessible does not mean open — a tightly access-controlled record is fully FAIR if its access conditions are clear and its retrieval mechanism standard.
Interoperable — do values use shared, formal vocabularies and qualified, unit-bearing references, so they combine with other data? A monomerPct pointing at a shared ontology class with a QUDT unit passes; a bare string "98.6" does not.
Reusable — is the data richly described with provenance, context, and a clear usage license, so others can trust and reuse it? A result carrying its method, sample lineage, and terms of use passes; an orphaned number does not.

These are not abstractions for our graph; each letter lands on a concrete artifact already in the dataset. The drug-substance lot DS-001 is Findable because its CURIE expands to one global, persistent IRI — bp:DS-001 is https://example.org/bioproc#DS-001, the same string everywhere it appears — and it carries indexed metadata (a label, a type, a release status) rather than living as an anonymous cell:

@prefix bp:  <https://example.org/bioproc#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

bp:DS-001 a bp:DrugSubstance ; rdfs:label "DS-001" ;   # Findable: one persistent IRI, typed + labelled
    bp:derivedFrom bp:POLpool-001 ;                      # Reusable: provenance / lineage edge (one tier up)
    bp:releaseStatus "PASS" ;
    bp:monomerPct "98.611"^^xsd:float ;                  # convenience scalar...
    bp:monomerValue bp:DS-001-monomer .                  # ...Interoperable: qualified, unit-bearing value

The Interoperable difference is the bp:monomerValue edge: instead of a bare "98.6", the number resolves to a fully self-describing QUDT QuantityValue whose unit and quantity kind are themselves IRIs, so 98.611 can never be misread as a fraction or a different unit:

@prefix bp:    <https://example.org/bioproc#> .
@prefix qudt:  <http://qudt.org/schema/qudt/> .
@prefix unit:  <http://qudt.org/vocab/unit/> .
@prefix qkind: <http://qudt.org/vocab/quantitykind/> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .

bp:DS-001-monomer a qudt:QuantityValue ;
    qudt:numericValue "98.611"^^xsd:float ;
    qudt:hasUnit unit:PERCENT ;
    qudt:hasQuantityKind qkind:DimensionlessRatio .
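
The difference between the bare scalar and the qualified value can be checked mechanically. The following is a minimal sketch, not the book's tooling: it models triples as plain Python tuples instead of loading Turtle with an RDF library, and the helper names is_unit_bearing and objects are hypothetical. It treats a value as interoperable only when it is a node typed qudt:QuantityValue that carries a numeric value, a unit, and a quantity kind.

```python
# Triples as (subject, predicate, object) tuples; CURIEs are kept as plain strings.
# Assumption: a real pipeline would parse the Turtle above with an RDF library.
TRIPLES = {
    ("bp:DS-001", "bp:monomerPct", '"98.611"^^xsd:float'),
    ("bp:DS-001", "bp:monomerValue", "bp:DS-001-monomer"),
    ("bp:DS-001-monomer", "rdf:type", "qudt:QuantityValue"),
    ("bp:DS-001-monomer", "qudt:numericValue", '"98.611"^^xsd:float'),
    ("bp:DS-001-monomer", "qudt:hasUnit", "unit:PERCENT"),
    ("bp:DS-001-monomer", "qudt:hasQuantityKind", "qkind:DimensionlessRatio"),
}

# Predicates a self-describing quantity value must carry (per the listing above).
REQUIRED = ("qudt:numericValue", "qudt:hasUnit", "qudt:hasQuantityKind")


def objects(subject, predicate, triples):
    """Return all objects o for which (subject, predicate, o) is in triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def is_unit_bearing(node, triples):
    """True if node is typed qudt:QuantityValue and has every REQUIRED predicate."""
    predicates = {p for s, p, _ in triples if s == node}
    typed = (node, "rdf:type", "qudt:QuantityValue") in triples
    return typed and all(p in predicates for p in REQUIRED)


qualified = objects("bp:DS-001", "bp:monomerValue", TRIPLES)[0]
bare = objects("bp:DS-001", "bp:monomerPct", TRIPLES)[0]
print(qualified, is_unit_bearing(qualified, TRIPLES))
print(bare, is_unit_bearing(bare, TRIPLES))
```

The qualified node passes and the bare literal does not, because a literal has no outgoing edges on which a unit could hang.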

That same bp:derivedFrom edge is what makes the lot Reusable: it is a subproperty of the OBO Relation Ontology's 'derives from' and is transitive, so the provenance chain back to the cell bank can be recovered by following the edges rather than merely asserted. Crucially, the FAIR principles target machine-actionability, meaning data usable by computers with minimal human help [1], which is exactly why a human-readable-but-machine-opaque record fails them even when a person can make sense of it.
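
What transitivity buys can be sketched as a walk over the bp:derivedFrom edges. This is an illustration under stated assumptions, not the dataset's reasoner: only the DS-001 to POLpool-001 edge comes from the listing above, the two upstream nodes are hypothetical placeholders invented to make the chain longer than one hop, and the function name ancestors is likewise hypothetical.

```python
# Direct bp:derivedFrom edges, child -> list of parents.
# Only the first edge is from the chapter; the EXAMPLE nodes are hypothetical.
DERIVED_FROM = {
    "bp:DS-001": ["bp:POLpool-001"],
    "bp:POLpool-001": ["bp:EXAMPLE-harvest"],       # hypothetical
    "bp:EXAMPLE-harvest": ["bp:EXAMPLE-cellbank"],  # hypothetical
}


def ancestors(node, edges):
    """Every node reachable through one or more derivedFrom hops, nearest first."""
    seen, order, frontier = set(), [], [node]
    while frontier:
        current = frontier.pop(0)          # breadth-first: nearest tiers first
        for parent in edges.get(current, []):
            if parent not in seen:         # guards against cycles and repeats
                seen.add(parent)
                order.append(parent)
                frontier.append(parent)
    return order


print(ancestors("bp:DS-001", DERIVED_FROM))
```

A reasoner that knows the property is transitive derives the same set of ancestors without any of the multi-hop edges being stored explicitly.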

FAIR as a scorecard, not a slogan: each letter becomes a concrete check, machine-actionability is the target, and Interoperability is flagged as the dimension a real graph most often fails — usually because of hand-authored metadata, not the engine. Original diagram by the authors, created with AI assistance.

This scorecard is not a slide — it is a loadable individual in the dataset. The graph carries its own FAIR self-assessment of the DS-001 record as four bp:FAIRCheck nodes hung off a bp:FAIRAssessment, each with a verdict, plus the usage license that the Reusable check depends on:

@prefix bp:   <https://example.org/bioproc#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

bp:FAIR-DS-001 a bp:FAIRAssessment ; rdfs:label "FAIR assessment of the DS-001 record" ;
    bp:assesses bp:DS-001 ;
    bp:hasCheck bp:FC-F , bp:FC-A , bp:FC-I , bp:FC-R .
bp:FC-F a bp:FAIRCheck ; rdfs:label "Findable (global IRI)" ; bp:fairVerdict "PASS" .
bp:FC-A a bp:FAIRCheck ; rdfs:label "Accessible (resolves, with access conditions)" ; bp:fairVerdict "PASS" .
bp:FC-I a bp:FAIRCheck ; rdfs:label "Interoperable (shared vocab + QUDT units)" ; bp:fairVerdict "PARTIAL" .
bp:FC-R a bp:FAIRCheck ; rdfs:label "Reusable (method, lineage, licence)" ; bp:fairVerdict "PASS" .
bp:LICENSE-CC-BY a bp:UsageLicense ; rdfs:label "CC BY 4.0" .
bp:DS-001 bp:hasLicense bp:LICENSE-CC-BY .

The honesty of the model is in the one verdict that is not "PASS": Interoperable is recorded as "PARTIAL", not because the QUDT units or the shared-class edges are missing, but because — as the next section measures — even a graph this carefully aligned still maps only a fraction of its local terms up to verified external IRIs, and that is the dimension a real graph most often fails.
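
Because the verdicts are data, rolling them up is a few lines of code. The sketch below copies the four verdicts from the bp:FAIRCheck nodes above. The weakest-link rule in overall is an assumption made for illustration, not a published FAIR metric, and the "FAIL" level does not appear in the chapter's data; it is included only so the ordering is complete.

```python
# Verdicts copied from the four bp:FAIRCheck nodes in the listing above.
CHECKS = {
    "Findable": "PASS",
    "Accessible": "PASS",
    "Interoperable": "PARTIAL",
    "Reusable": "PASS",
}

# Assumption: verdicts ordered worst to best. "FAIL" is hypothetical here.
RANK = {"FAIL": 0, "PARTIAL": 1, "PASS": 2}


def overall(checks):
    """Overall verdict is the weakest single verdict (a weakest-link rollup)."""
    return min(checks.values(), key=RANK.__getitem__)


def shortfalls(checks):
    """Names of the checks that did not reach PASS, sorted alphabetically."""
    return sorted(name for name, verdict in checks.items() if verdict != "PASS")


print(overall(CHECKS))
print(shortfalls(CHECKS))
```

A weakest-link rollup is deliberately unforgiving: three passes do not average away one partial, which matches the chapter's point that the single non-PASS verdict is the one worth reading.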

The unsolved part: compliant on the wire, hollow in fact

Here is the gap the series has named since the data book and that this chapter measures head-on. When researchers assess real datasets against FAIR, a consistent pattern emerges: nearly everything is Findable — identifiers and search are easy — while Interoperability is the dimension that most often scores poorly, frequently the lowest of the four [2][3]. The reason is the one this whole book has circled: metadata is authored by hand, in free text, without a controlled vocabulary, so the fields exist but point at no shared ontology and the units are bare strings. The data is findable and downloadable and still un-combinable — the semantic swamp dressed in the language of compliance. "We use FAIR" can be true at the syntactic layer and hollow at the semantic one.

Biomanufacturing inherits this directly, and the knowledge-graph chapter named the failure modes: a plant can stand up a conformant triplestore, emit perfect RDF, and still produce a graph no downstream system can merge, because the predicates were never aligned to a shared ontology and the units were never pinned. What closes that gap is not the triplestore but a separate alignment file that maps every local term up to a verified external IRI — the line that turns bp:DrugSubstance from a private label into something a BFO-grounded system recognizes:

@prefix bp:   <https://example.org/bioproc#> .
@prefix obo:  <http://purl.obolibrary.org/obo/> .
@prefix iof:  <https://spec.industrialontologies.org/ontology/construct/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

bp:Material      rdfs:subClassOf obo:BFO_0000040 .       # BFO 2020 'material entity'
bp:DrugSubstance rdfs:subClassOf iof:MaterialProduct .   # IOF Core 'material product'
bp:derivedFrom   rdfs:subPropertyOf obo:RO_0001000 .     # RO 'derives from'

Without these three lines the earlier bp:DS-001 triples are findable and downloadable yet un-combinable; with them, the predicate and the type carry meaning a stranger's reasoner already holds.

But the alignment file, align.ttl, is also where the "PARTIAL" verdict earns itself honestly, and it is worth seeing why a careful alignment is still partial. Some terms map cleanly to a verified external IRI: bp:Material to BFO 2020's material entity obo:BFO_0000040, bp:HostOrganism to NCBI Taxonomy's Cricetulus griseus obo:NCBITaxon_10029, bp:HCPAssay to the OBI ELISA class obo:OBI_0000661, bp:derivedFrom to RO's obo:RO_0001000. Others have no verified one-to-one external leaf and stay deliberately marked ILLUSTRATIVE: a local QbD relation like affectsQuality has no canonical standard counterpart, so it is honestly kept local rather than mapped to a plausible-but-wrong IRI. A second, subtler crack: the IOF terms (iof:MaterialProduct, iof:Bioreactor) were verified against the published IOF release, not the EBI Ontology Lookup Service, because OLS does not host IOF, so a downstream tool that resolves IRIs only through OLS will recognize the BFO/OBI side and silently miss the IOF side. That split, plus the absence of any owl:imports in the validated file, is exactly why a self-honest scorecard records Interoperability as "PARTIAL" rather than "PASS".

A SHACL gate catches the structural cases (a missing field, a wrong datatype), but it cannot catch a human who confidently mislabels a field with a plausible-looking term, the same completeness-is-not-correctness limit the release gate has. So FAIRness is not delivered by adopting RDF; it is delivered by the discipline and tooling that ensure metadata authored on the floor is controlled-vocabulary, machine-checkable, and interoperable in fact. That discipline is an organizational and change-management problem as much as a technical one, and it is, honestly, where the field still genuinely struggles.

The standard this chapter sets is to measure FAIRness rather than assume it: assess the graph against each letter, expect Interoperability to be the weak one, and treat the gap as work to do, not a box already ticked.
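
Measuring the alignment rather than assuming it can start as simply as counting. The sketch below uses only the mappings named in this section; it is a sample of terms, not the full vocabulary, so the fractions it prints illustrate the method and are not the graph's actual score. The bp: prefix on affectsQuality, the coverage function, and the idea of modeling an OLS-only resolver as a set of resolvable prefixes are all assumptions made for the example.

```python
# Local term -> external term (as a CURIE), or None where the chapter keeps
# the term local and marked ILLUSTRATIVE. Mappings are those named in the text.
ALIGNMENT = {
    "bp:Material": "obo:BFO_0000040",
    "bp:HostOrganism": "obo:NCBITaxon_10029",
    "bp:HCPAssay": "obo:OBI_0000661",
    "bp:derivedFrom": "obo:RO_0001000",
    "bp:DrugSubstance": "iof:MaterialProduct",
    "bp:affectsQuality": None,  # no canonical counterpart; prefix is assumed
}


def coverage(alignment, resolvable_prefixes=None):
    """Fraction of local terms that have a usable external mapping.

    With resolvable_prefixes set, a mapping counts only if its prefix is one
    the downstream resolver can look up (a stand-in for an OLS-only tool).
    """
    def usable(target):
        if target is None:
            return False
        if resolvable_prefixes is None:
            return True
        return target.split(":", 1)[0] in resolvable_prefixes

    mapped = sum(1 for target in alignment.values() if usable(target))
    return mapped / len(alignment)


print(f"all mappings:      {coverage(ALIGNMENT):.2f}")
print(f"OLS-only resolver: {coverage(ALIGNMENT, {'obo'}):.2f}")
```

The second figure is lower than the first for exactly the reason given above: the IOF mapping exists, but a tool that resolves only through OLS never sees it.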

Why it matters

FAIR is the entire justification for the modeling effort — the reason to pay the cost of ontologies, IRIs, and units rather than just storing numbers. But the justification only holds if the FAIRness is real, and the consistent finding that real data is findable-but-not-interoperable means the justification is routinely claimed and not delivered. Measuring FAIRness — turning each letter into a check and scoring the graph honestly — is what keeps a project from the comfortable lie that adopting standards equals achieving the goal. A graph that looks FAIR and is not is arguably worse than an honest spreadsheet, because it invites a trust the data has not earned. This chapter is the book's insistence that the north star be checked, not assumed.

In the real world

FAIR assessment has matured from principle to practice: there are published metrics, maturity indicators, and interpretation frameworks that let an organization score its data rather than assert compliance [2][3]. The consistent, sobering result across domains is that Interoperability and rich Reusability lag far behind Findability, because they demand the controlled vocabularies and provenance that hand-authored metadata lacks. In biopharma, the standards that would deliver interoperability — IOF/BMIC, Allotrope, QUDT — exist and are converging, so the bottleneck is not missing technology but the discipline of actually authoring to them on the floor. That is precisely the governance commitment of the last chapter, viewed through the lens of the goal it serves, and it is the honest setup for the book's final reckoning.

Key terms

FAIR principles — that data be Findable, Accessible, Interoperable, and Reusable, with machine-actionability as the explicit target; principles, not a conformance test.
Machine-actionability — usability by computers with minimal human help; the property a human-readable but machine-opaque record fails.
FAIR is not open — Accessible means clear access conditions and a standard retrieval mechanism, not that everyone may read everything; restricted regulated data can be fully FAIR.
FAIR assessment — turning each letter into a concrete, scorable check rather than a claim; the practice that exposes the gap between compliance and FAIRness.
The interoperability gap — the consistent finding that real data is Findable but most often fails Interoperability, because metadata is hand-authored without a controlled vocabulary.
Compliant versus FAIR-in-fact — the difference between emitting valid RDF on the wire and producing data that is genuinely combinable; the gap a SHACL gate cannot close because it cannot catch a plausible mislabel.

Where this leads

We can measure whether the graph delivers, and we have admitted where it falls short. We have also now built the whole model — the spine, the values, the process, the thread, its governance, and its FAIRness. Before the book delivers its verdict, it steps out of the running example to ask the empirical question the last six parts assumed an answer to: does the real industry actually do any of this, and how mature is it? Part VII opens with The Standards Bodies: Who Actually Builds Biopharma's Shared Vocabulary, surveying the pre-competitive consortia — Allotrope, the Pistoia Alliance, ISA-88/95, OPC UA and MTP, ISPE Pharma 4.0, BioPhorum, GS1, and the OAGi/NIIMBL biomanufacturing-ontology effort — that produce the shared vocabularies every earlier chapter quietly leaned on.
