
Modeling the Design Space: CPPs, CQAs, and QbD as a Graph

📍 Where we are: Part II · Discovery and Development, modeled — Chapter 7. The molecule and its living factory are modeled. Now we model the knowledge that turns a molecule into a controllable process — and make good on the preface's claim that QbD is secretly an ontology.

The preface promised that Quality by Design (QbD) — the framework that treats recorded process understanding as part of the product — is an ontology in everything but name. This is the chapter that cashes that promise. Process development is where a team learns which knobs matter: which critical process parameters (CPPs) drive which critical quality attributes (CQAs), and over what ranges. That knowledge is the most valuable thing a program produces besides the molecule itself, and it usually lives in a development report no machine can read. We model it instead as a graph a machine can query.

The simple version

A great cook knows that oven temperature affects crust, that resting time affects texture, and the safe range for each. Write that as prose in a notebook and only a human who reads the whole notebook can use it. Write it as a structured map — this knob affects that outcome, within these limits, and here's the experiment that proved it — and anyone, or any computer, can ask "what affects the crust?" and get an answer. QbD is that structured map for making medicine. This chapter turns it from prose into a graph you can query.

What this chapter covers

We model the QbD vocabulary — process parameter, quality attribute, criticality, the design space, and the control strategy — as entities and relations, building out the affectsQuality relation the axioms chapter introduced. We attach the evidence (the studies, the ranges, the risk level) that makes a relationship trustworthy, dissect one CPP-to-CQA link, and confront the honest limit: a graph can index which parameters matter, but it cannot hold the continuous response surface that says exactly how.

Criticality is a judgment the graph should record, not hide

A process parameter is any setting you can choose — feed rate, temperature, pH, dissolved oxygen. A quality attribute is any measurable trait of the product — monomer purity, glycosylation, charge variants. What makes either one critical is a judgment: a parameter is a CPP if varying it within plausible limits meaningfully affects a CQA, and an attribute is a CQA if it affects safety or efficacy [1]. The naive model bakes criticality into a class name and moves on. The better model treats criticality as an assessed quality with provenance: this parameter hasCriticality high, as determined by a specific risk assessment, because it affects this CQA. Modeling the judgment — not just its conclusion — is what lets the graph answer "why is feed rate a CPP?" and survive the day someone re-assesses it. Criticality is the output of quality risk management, an ICH Q9 discipline, and the assessment that produced it is itself an entity worth keeping [2].

affectsQuality, with its evidence attached

The heart of QbD is one relation: a CPP affects a CQA. Declared as the object property bp:affectsQuality from a ProcessParameter to a QualityAttribute, the fact bp:FeedRate bp:affectsQuality bp:MonomerPct-CQA stops being a sentence in a report and becomes a queryable edge, so "which parameters affect monomer purity?" is a one-line query across every program that models this way. But an edge alone is a claim without backing. What makes process knowledge trustworthy is the evidence, so the model hangs it on the relation: the design-of-experiments (DoE) study that established the link, the proven acceptable range (PAR) and tighter normal operating range (NOR) within which the parameter is controlled, and the strength or direction of the effect. The relationship plus its evidence is the difference between "we believe feed rate matters" and "feed rate affects monomer purity per study DOE-07, with PAR 0.30–0.50 and NOR 0.35–0.45": a fact an investigator, an auditor, or a soft-sensor model can stand on. Declared and asserted, that single edge is all it takes:

# bioproc.ttl + instances.ttl: QbD's core relation, declared and then asserted.
@prefix bp:   <https://example.org/bioproc#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

bp:affectsQuality a owl:ObjectProperty ;
    rdfs:domain bp:ProcessParameter ;
    rdfs:range  bp:QualityAttribute .
bp:FeedRate a bp:ProcessParameter .
bp:MonomerPct-CQA a bp:QualityAttribute .
bp:FeedRate bp:affectsQuality bp:MonomerPct-CQA .   # the prose claim, now a queryable edge

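And the question "which parameters affect monomer purity?" becomes a one-line query over the loadable graph. As written, the query returns every affectsQuality edge that ends at a quality attribute; in this dataset that is two rows, both pointing at monomer purity: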

# affects-quality.rq
PREFIX bp: <https://example.org/bioproc#>
SELECT ?parameter ?attribute WHERE {
    ?parameter bp:affectsQuality ?attribute .
    ?attribute a bp:QualityAttribute .
}

Output:
affectsQuality edges: [('FeedRate', 'MonomerPct-CQA'), ('Temperature', 'MonomerPct-CQA')]
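For readers without a triple store to hand, the same pattern match can be sketched in plain Python. This is a minimal illustration, not the book's tooling: the tuple encoding and the function name `parameters_affecting` are assumptions, and the Temperature triples are inferred from the printed result rather than from the Turtle excerpt.

```python
RDF_TYPE = "rdf:type"

# Triples as (subject, predicate, object) tuples, mirroring the Turtle above.
TRIPLES = {
    ("bp:FeedRate", RDF_TYPE, "bp:ProcessParameter"),
    ("bp:MonomerPct-CQA", RDF_TYPE, "bp:QualityAttribute"),
    ("bp:FeedRate", "bp:affectsQuality", "bp:MonomerPct-CQA"),
    # Assumed: these two triples are inferred from the printed result,
    # they do not appear in the Turtle excerpt.
    ("bp:Temperature", RDF_TYPE, "bp:ProcessParameter"),
    ("bp:Temperature", "bp:affectsQuality", "bp:MonomerPct-CQA"),
}


def parameters_affecting(attribute, triples=TRIPLES):
    """Return the sorted subjects that have an affectsQuality edge to
    `attribute`, provided `attribute` is typed as a QualityAttribute."""
    if (attribute, RDF_TYPE, "bp:QualityAttribute") not in triples:
        return []
    return sorted(
        s for (s, p, o) in triples
        if p == "bp:affectsQuality" and o == attribute
    )


print(parameters_affecting("bp:MonomerPct-CQA"))
```

Fixing the attribute in the function argument plays the role that replacing ?attribute with a concrete IRI would play in the SPARQL query.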

That bare edge is exactly as trustworthy as the evidence hung on it, and in the dataset the evidence is real — the criticality assessment, the two ranges, the study that established the link, and the pointer to the surface the graph does not store:

# instances.ttl: the same parameter, now with its evidence, ranges, criticality, and model pointer.
bp:FeedRate
    bp:hasCriticality bp:CRIT-FeedRate ;            # assessed "high" by risk assessment RA-01 (see below)
    bp:hasNormalOperatingRange bp:NOR-FeedRate ;    # NOR 0.35-0.45
    bp:hasProvenAcceptableRange bp:PAR-FeedRate ;   # PAR 0.30-0.50 (wider)
    bp:establishedBy bp:DOE-07 .                    # link established by design-of-experiments study DOE-07
bp:CRIT-FeedRate a bp:Criticality ; bp:criticalityLevel "high" ; bp:establishedBy bp:RA-01 .
bp:NOR-FeedRate a bp:NormalOperatingRange ; bp:norLow 0.35 ; bp:norHigh 0.45 .
bp:PAR-FeedRate a bp:ProvenAcceptableRange ; bp:parLow 0.30 ; bp:parHigh 0.50 .
bp:DESIGNSPACE-mAb-A a bp:DesignSpace ; bp:referencesModel bp:RSM-feedrate-monomer .   # the surface lives in the model file
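With the evidence attached, "why is feed rate a CPP?" becomes a two-hop traversal, and the claim that the NOR sits inside the PAR becomes a mechanical check. The sketch below assumes the instance data is held as plain Python tuples; the helper names `value`, `explain_criticality`, and `nor_within_par` are hypothetical, and only the identifiers and numbers come from the listing above.

```python
# The instance data above, as (subject, predicate, object) tuples.
GRAPH = {
    ("bp:FeedRate", "bp:affectsQuality", "bp:MonomerPct-CQA"),
    ("bp:FeedRate", "bp:hasCriticality", "bp:CRIT-FeedRate"),
    ("bp:FeedRate", "bp:hasNormalOperatingRange", "bp:NOR-FeedRate"),
    ("bp:FeedRate", "bp:hasProvenAcceptableRange", "bp:PAR-FeedRate"),
    ("bp:FeedRate", "bp:establishedBy", "bp:DOE-07"),
    ("bp:CRIT-FeedRate", "bp:criticalityLevel", "high"),
    ("bp:CRIT-FeedRate", "bp:establishedBy", "bp:RA-01"),
    ("bp:NOR-FeedRate", "bp:norLow", 0.35),
    ("bp:NOR-FeedRate", "bp:norHigh", 0.45),
    ("bp:PAR-FeedRate", "bp:parLow", 0.30),
    ("bp:PAR-FeedRate", "bp:parHigh", 0.50),
}


def value(subject, predicate, graph=GRAPH):
    """Return one object for (subject, predicate), or None if absent."""
    matches = [o for (s, p, o) in graph if s == subject and p == predicate]
    return matches[0] if matches else None


def explain_criticality(parameter):
    """Follow hasCriticality, then read the level and the assessment behind it."""
    crit = value(parameter, "bp:hasCriticality")
    return {
        "level": value(crit, "bp:criticalityLevel"),
        "assessed_by": value(crit, "bp:establishedBy"),
        "affects": value(parameter, "bp:affectsQuality"),
        "study": value(parameter, "bp:establishedBy"),
    }


def nor_within_par(parameter):
    """True when the normal operating range lies inside the proven acceptable range."""
    nor = value(parameter, "bp:hasNormalOperatingRange")
    par = value(parameter, "bp:hasProvenAcceptableRange")
    return (value(par, "bp:parLow") <= value(nor, "bp:norLow")
            <= value(nor, "bp:norHigh") <= value(par, "bp:parHigh"))


print(explain_criticality("bp:FeedRate"))
print(nor_within_par("bp:FeedRate"))
```

Because the assessment is its own node, a re-assessment only has to swap the criticality node; the question and the traversal stay the same.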

Figure: identity card dissecting one CPP-to-CQA relationship. It shows a subject row for the process parameter; a predicate row; an object row for the quality attribute; an evidence block attaching the design-of-experiments study DOE-07 that established the link, the proven acceptable range and the tighter normal operating range as QUDT-typed intervals, and the effect direction and magnitude; a criticality row marking the parameter a CPP as assessed by a named risk assessment; and a pointer row noting the full response surface lives in a referenced model file, not in the graph.

One unit of process knowledge: the affectsQuality edge carries the study that proved it, the ranges that bound it, and the risk assessment that made the parameter critical, so the most valuable knowledge in development is a fact, not a paragraph. Original diagram by the authors, created with AI assistance.

The design space is a region; the graph is its index

Put many affectsQuality relationships together, with their ranges, and you have approached the design space: the multidimensional region of parameter settings proven to yield acceptable product [1]. Stay inside it and quality is assured; that is the whole QbD bargain. It is tempting to think the graph should contain the design space. It should not, and seeing why sharpens everything Part I taught. The design space is a continuous surface over many interacting parameters — a response surface fitted from data. Flattening a continuous surface into subject-predicate-object triples would balloon the graph into millions of meaningless rows and still lose the surface's shape, exactly the boundary the knowledge-graph chapter drew for spectra and chromatograms.

So the division of labor is the same one that runs through the whole series: the graph holds the structure and intent — which parameters affect which attributes, their ranges, their evidence, their criticality — and references the response surface as a model artifact by IRI. The graph says that feed rate and temperature jointly affect monomer purity, names the study, bounds the ranges, and points at the fitted model that says exactly how. The graph is the navigable index of process knowledge; the model file is the warehouse of the surface. This keeps the graph small, queryable, and honest, and it is why the design space modeled here meets the hybrid and ML models of the companion book at a clean seam: the graph indexes the relationship, the model computes the value.
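The seam between index and warehouse can be sketched in a few lines of Python. Everything numeric here is a placeholder: the quadratic surface, its coefficients, the coded temperature variable, and the `MODEL_STORE` registry are invented for illustration and do not come from any study. Only the two IRIs and the bp:referencesModel pointer are taken from the instance data.

```python
# The graph holds only the pointer, never the surface itself.
GRAPH = {
    ("bp:DESIGNSPACE-mAb-A", "bp:referencesModel", "bp:RSM-feedrate-monomer"),
}


def rsm_feedrate_monomer(feed_rate, temp_coded):
    """Placeholder response surface for predicted monomer purity (%).
    The coefficients are made up; a real surface would be fitted from DoE data."""
    d = feed_rate - 0.40
    return 98.0 - 40.0 * d ** 2 - 0.5 * temp_coded ** 2 - 2.0 * d * temp_coded


# Hypothetical model store: stands in for the model file the IRI points at.
MODEL_STORE = {"bp:RSM-feedrate-monomer": rsm_feedrate_monomer}


def referenced_model(design_space, graph=GRAPH):
    """Ask the graph which model artifact a design space points at."""
    for s, p, o in graph:
        if s == design_space and p == "bp:referencesModel":
            return o
    raise KeyError(f"{design_space} references no model")


def predict(design_space, **settings):
    """Graph lookup first (which model?), then model evaluation (what value?)."""
    model = MODEL_STORE[referenced_model(design_space)]
    return model(**settings)


print(predict("bp:DESIGNSPACE-mAb-A", feed_rate=0.40, temp_coded=0.0))
```

The graph answers "which model?" with a single triple, and the model answers "how much?" with arithmetic the graph never sees. The cross term in the placeholder surface is the kind of joint effect that could not be expressed as one edge per parameter.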

Figure: a two-layer diagram. The upper layer is a knowledge-graph view showing process-parameter nodes (feed rate, temperature, pH, dissolved oxygen) linked by affectsQuality edges to quality-attribute nodes (monomer purity, glycosylation, charge variants), each edge tagged with a DoE study and a range, and a control-strategy node tying them together. The lower layer is a referenced model artifact shown as a multidimensional response surface labelled "design space", linked up to the graph by a single IRI pointer rather than being flattened into triples. A caption reads "the graph indexes which parameters matter, the model file holds how much."

Two layers, one seam: the graph holds the affectsQuality relationships, ranges, and control strategy as queryable facts, and points by IRI at the fitted response surface, indexing the design space rather than trying to store it. Original diagram by the authors, created with AI assistance.

The control strategy is where knowledge becomes action

Knowing what matters is not the same as controlling it. The control strategy is the planned set of controls — parameter ranges, in-process tests, the analytical methods that check CQAs — that together assure quality [3]. Modeled, it ties the whole graph together: each control controls a CPP, verifies a CQA, and usesMethod an analytical method. This is the artifact that ships to the plant in the recipe, and modeling it as relations rather than a document means the release gate downstream can check, mechanically, that every CQA in the control strategy actually has a result — closing the loop from development intent to manufacturing evidence:

# instances.ttl: the control strategy as relations, not a document.
bp:CS-mAb-A a bp:ControlStrategy ; bp:hasControl bp:CTRL-monomer .
bp:CTRL-monomer a bp:Control ;
    bp:controls bp:FeedRate ;          # holds the CPP within its range...
    bp:verifies bp:MonomerPct-CQA ;    # ...to assure the CQA...
    bp:usesMethod bp:SEC-Method .      # ...checked by the validated SEC method
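The mechanical check a release gate could run over these relations can be sketched as follows. It is an illustration under assumptions: batch results are passed in as a plain dictionary keyed by CQA, the 98.7 value is invented, and the function names are hypothetical. How results are actually modeled is the subject of the next chapter.

```python
# The control strategy above, as (subject, predicate, object) tuples.
GRAPH = {
    ("bp:CS-mAb-A", "bp:hasControl", "bp:CTRL-monomer"),
    ("bp:CTRL-monomer", "bp:controls", "bp:FeedRate"),
    ("bp:CTRL-monomer", "bp:verifies", "bp:MonomerPct-CQA"),
    ("bp:CTRL-monomer", "bp:usesMethod", "bp:SEC-Method"),
}


def objects(subject, predicate, graph=GRAPH):
    """All objects reachable from `subject` via `predicate`."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def cqas_missing_results(strategy, batch_results):
    """CQAs verified by the strategy's controls that have no result in the batch."""
    required = set()
    for control in objects(strategy, "bp:hasControl"):
        required |= objects(control, "bp:verifies")
    return sorted(required - set(batch_results))


# Hypothetical batches: one with no results yet, one with a made-up monomer result.
print(cqas_missing_results("bp:CS-mAb-A", {}))
print(cqas_missing_results("bp:CS-mAb-A", {"bp:MonomerPct-CQA": 98.7}))
```

An empty list means every CQA named in the control strategy has a result; anything else is a named gap the gate can report rather than a document someone has to reread.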

The unsolved part: interactions, and the limits of a structural model

The honest difficulties are two, and both are about what the structural model cannot capture. The first is interaction. Parameters rarely act alone — feed rate and temperature may matter only in combination, and the number of possible interactions explodes combinatorially as parameters multiply. The graph can assert that an interaction exists and point at the study, but it cannot represent the interaction's form; that, again, lives in the response surface the graph only references. A model that lists affectsQuality edges one parameter at a time can quietly imply a process is simpler than it is, and the discipline of also modeling joint effects — and flagging where they are unknown — is easy to skip.
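The combinatorial growth is easy to make concrete. With n parameters there are C(n, 2) possible pairwise interactions and 2^n − n − 1 possible interactions of any order. The sketch below counts them and lists which pairs have no recorded joint effect. It uses the four example parameters named earlier in the chapter and assumes, as the chapter's running example suggests, that only the feed rate and temperature pair has been studied; the function names are hypothetical.

```python
from itertools import combinations
from math import comb

PARAMETERS = ["FeedRate", "Temperature", "pH", "DissolvedOxygen"]

# Assumed for illustration: the only joint effect on record is feed rate x temperature.
STUDIED_PAIRS = {frozenset({"FeedRate", "Temperature"})}


def interaction_counts(n_parameters):
    """Return (pairwise, all_orders): the number of two-way interactions and
    the number of interactions among two or more parameters."""
    pairwise = comb(n_parameters, 2)
    all_orders = 2 ** n_parameters - n_parameters - 1
    return pairwise, all_orders


def unassessed_pairs(parameters, studied):
    """Parameter pairs with no recorded joint-effect study."""
    return [pair for pair in combinations(parameters, 2)
            if frozenset(pair) not in studied]


print(interaction_counts(len(PARAMETERS)))   # (6, 11)
print(interaction_counts(10))                # (45, 1013)
print(unassessed_pairs(PARAMETERS, STUDIED_PAIRS))
```

Listing the unassessed pairs explicitly is one way to flag where joint effects are unknown, so that an absent edge is not read as an absent interaction.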

The second is the gap between a proven design space and a modeled one. Regulatory design-space verification is hard, evolving, and data-hungry; a beautifully modeled graph of affectsQuality relationships is only as true as the studies behind each edge, and early process knowledge is incomplete by definition. The graph makes process understanding legible and queryable, which is a real advance over a buried report — but legible is not the same as complete or correct, the same distinction the axioms chapter drew between consistent and true. The model is an honest map of what the program believes and why; it does not certify that the belief is the whole truth, and pretending otherwise is how a confident graph hides an immature process.

Why it matters

Process knowledge is the asset that justifies the entire data enterprise, and a graph is where it stops being trapped. Modeled as affectsQuality relationships with evidence and ranges, QbD knowledge becomes queryable ("what affects this CQA?"), checkable (does every CQA have a control and a result?), and portable (it transfers to a new site as facts, not a re-read of a report). It also becomes the substrate the ML and soft-sensor models plug into, since a hybrid model that respects the design space needs the design space represented somewhere a machine can read. This chapter is where the preface's "process understanding" becomes something a computer can hold.

In the real world

QbD, the design space, CPPs and CQAs, and the control strategy are not this book's coinages — they are the vocabulary of the ICH quality guidelines that regulators and industry share, which is exactly why modeling them as a shared ontology is natural rather than forced [1][2][3]. The relationships a development team establishes already are a graph in their heads and their reports; the move this chapter argues for is to write that graph down in a form a machine can traverse. Knowledge-management expectations in the pharmaceutical quality system point the same way — toward process understanding captured as a reusable asset across the lifecycle rather than re-derived each time — and a queryable CPP-to-CQA graph is the most direct realization of that idea [3].

Key terms

  • Critical process parameter (CPP) — a process setting whose variation meaningfully affects a quality attribute; modeled with its criticality as an assessed, evidenced quality.
  • Critical quality attribute (CQA) — a measurable product trait affecting safety or efficacy that must stay within limits.
  • affectsQuality — the core QbD relation from a parameter to a quality attribute, carrying its DoE evidence, ranges, and effect as attached facts.
  • Design space — the multidimensional region of settings proven to yield acceptable product; indexed by the graph and stored as a referenced response-surface model, not flattened into triples.
  • Proven acceptable range (PAR) / normal operating range (NOR) — the wider proven and tighter routine operating intervals for a parameter, modeled as QUDT-typed ranges.
  • Control strategy — the planned set of controls, tests, and methods assuring quality; modeled as relations tying parameters, attributes, methods, and steps together.
  • Quality risk management — the ICH Q9 discipline that assesses criticality; the assessment is itself a modeled entity, not just its conclusion.

Where this leads

We can model what the process must achieve and how it is controlled. But every CQA in the control strategy is checked by a measurement, and measurements have their own rich structure. The next chapter, Modeling Analytical Methods and Results: Allotrope and OBI, models the methods that verify quality — the assay as an occurrent, the method as a plan, the result as a typed, vendor-neutral fact — and faces the boundary where a release number belongs in the graph but the spectrum behind it does not.