Generative AI and LLMs: Copilots, CAPA, and the Limits of Agents
📍 Where we are: Part VI · The Whole System — Chapter 24. The previous chapter, Manufacturing Operations, put learning into the plant's nervous system — predictive maintenance, yield analytics, and scheduling. This one turns to the layer that learns language: the deviation reports, batch records, SOPs, and CMC documents that wrap every batch. It is the loudest part of the 2023-2026 AI wave, and the part where the gap between demo and deployment is widest.
Every chapter so far has learned a number — a titer, a log-reduction value, an anomaly score, a maintenance horizon. This chapter learns words. Biomanufacturing runs on an enormous, mostly unstructured paper trail: a deviation is a free-text narrative; a CAPA (corrective and preventive action) is an investigation written in prose; a batch record is hundreds of pages of steps, entries, and signatures; a CMC (chemistry, manufacturing, and controls) section is a regulatory document drafted by hand over weeks. Large language models (LLMs) are, for the first time, genuinely good at reading and drafting exactly this kind of text — and the result has been a flood of "copilots," "agents," and "assistants" pitched at quality and manufacturing since GPT-4 landed in 2023 [1][2].
The honest reading is the one this whole book keeps arriving at, sharpened here to its hardest edge. Generative AI in GMP is real, useful, and almost entirely advisory: it triages, retrieves, summarizes, and drafts, and a qualified human reviews and signs. Where a firm let it cross that line — generating specifications and master production records without quality-unit review — the FDA issued its first AI-citing cGMP warning letter (Purolea, 2 April 2026) [3][4]. And the draft EU GMP Annex 22 draws the boundary in regulation: it excludes generative and continuously-learning AI from critical GMP decisions outright [5][6]. This chapter is the map of what works, what does not, and exactly where the line falls.
Think of a brilliant, fast new analyst who has read every deviation report your site ever filed but has no signing authority. You hand them a new problem; in seconds they say "this looks like the three temperature excursions we had last spring — here are the investigations and the CAPAs that closed them, and here is a first draft of your write-up." That is genuinely valuable: it saves hours and stops you reinventing the wheel. But the analyst sometimes states things with total confidence that are simply wrong (a "hallucination"), and they are not allowed to decide anything — to release a batch, close a CAPA, or approve a record. A qualified person reads what they produced, checks it against the real evidence, and signs. Generative AI in a GMP plant is exactly that analyst: it accelerates the human, it does not replace the signature.
What this chapter covers
- Where the GenAI wave is actually real in manufacturing and quality: deviation/CAPA triage and drafting, NLP investigation assistants, MES/SOP/batch-record copilots, CMC and regulatory drafting, and knowledge management.
- Retrieval-Augmented Generation (RAG): why grounding an LLM in your own validated documents is the architecture that makes it usable in GMP, and how a retriever actually works.
- The two NLP tasks underneath every copilot: classification/triage and similarity retrieval — shown as runnable, transparent code, not a black box.
- The hard limits: hallucination, GxP validation of a non-deterministic model, and data leakage/confidentiality.
- The regulatory boundary: the Purolea warning letter as enforcement, and draft Annex 22's exclusion of generative AI from critical decisions — and exactly where agentic AI is therefore confined.
The GenAI wave, sorted into what is real
The marketing makes everything sound equivalent. It is not. Sort the deployments by what they actually do with a regulated decision, and a clear hierarchy appears — from genuinely useful and low-risk to overhyped and forbidden.
Tier 1 — retrieve and summarize (real, widely piloted). The safest and most valuable use is turning a mountain of existing text into a fast answer. A deviation investigator asks "have we seen this failure before?" and the system retrieves the most similar historical deviations and their closed CAPAs. McKinsey reports a life-sciences manufacturer synthesizing roughly 70% of deviations and producing a first-draft CAPA for over 80% of cases with generative AI — a vendor/consultancy-reported figure, not peer-reviewed, and explicitly a drafting aid with human review [7]. MSD (Merck & Co.) has publicly described a RAG-based deviation assistant built on AWS Bedrock as exploratory, retrieval-not-predictive work [8]. This tier is mostly (pilot), occasionally edging into production for non-critical documentation.
Tier 2 — extract and classify (real, the NLP under the copilot). Beneath the chat interface sits old-fashioned, defensible NLP: classify a deviation into a category and route it; extract entities (equipment ID, batch, date, failure mode) from a narrative; flag a record as likely-recurring or likely-critical for prioritization. The peer-reviewed anchor here is a Merck & Co. study evaluating GPT-4 and Claude-2 on manufacturing deviation text: highly accurate entity extraction, but with an explicit, named tension the authors call the "interplay between apparent reasoning and hallucination" — and a hedge that human review "might be necessary, especially in high-risk tasks" [1]. This is the most rigorously studied tier, and it is where this chapter's runnable example lives.
Tier 3 — draft regulated content (real but tightly governed). CMC drafting, batch-record review assistance, product-quality-review (PQR) generation, and SOP authoring. Sanofi has reported generating product quality reviews roughly eight times faster, targeting around 5,000 reports per year — a self-reported figure that must be labeled as such, not stated as established fact [9]. Vendors are racing here: Aizon (GxP manufacturing intelligence; intelligent batch record), ValGenesis (validation lifecycle and tech-transfer drafting), and Veeva (Vault Quality AI agents, on a 2026 roadmap, running on Anthropic and Amazon models via Bedrock) all sell into this tier, alongside horizontal copilots from Microsoft (Copilot in quality systems) and integrators like Accenture and Capgemini [10][11][12][13]. Every output here is a draft that a qualified person reviews, edits, and signs.
Tier 4 — agentic / autonomous (overhyped, and where the line is drawn). "Agentic AI" — systems that plan and take multi-step actions with minimal human input — is the loudest 2025-2026 pitch (Aizon pre-announced agentic capabilities; Veeva's agents are a roadmap item) [14][12]. In a GMP plant this tier is confined, by both enforcement and draft regulation, to non-critical, human-in-the-loop tasks. Cross the line into critical GMP — generating the records that govern how a batch is made or released, without quality-unit review — and you get the Purolea warning letter [3][15].
The pattern across all four tiers is the ISPE Pharma 4.0 reality stated throughout this book: AI/ML has the most pilots and the fewest scaled implementations, and production clusters in monitoring, vision, and human-in-the-loop documentation — not autonomous control of quality [2].
Retrieval-Augmented Generation: why grounding is the whole game
A raw LLM answers from its training data — a frozen, opaque snapshot that contains none of your SOPs, none of your deviation history, and no guarantee of currency. Ask it about your process and it will either decline or, worse, confidently invent a plausible answer. Retrieval-Augmented Generation (RAG) is the architecture that fixes this, and it is the single most important idea for making LLMs usable in a regulated plant.
RAG separates two jobs. A retriever searches your own validated corpus — SOPs, batch records, prior deviations, CMC sections — for the passages most relevant to the question, using vector similarity over text embeddings. A generator (the LLM) is then prompted with the question plus those retrieved passages and instructed to answer only from them, citing which document each claim came from. The benefits are exactly the ones GMP demands:
- Grounding cuts hallucination. The model is told to answer from supplied, real documents rather than from memory, and to say "not found" when the passages do not support an answer.
- Traceability. Every statement can cite its source document and section — the audit trail a reviewer needs.
- Currency and access control. The corpus is your current, controlled documents; updating an SOP updates the system's knowledge without retraining, and document-level permissions can keep a model from surfacing text a user is not cleared to see.
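The generator side of RAG can be sketched in a few lines. This is a minimal illustration, not a production design. It assumes the retriever has already returned passages, each tagged with a controlled-document ID and a clearance level. `Passage`, `build_grounded_prompt`, the clearance scheme, and the prompt wording are all hypothetical, and no LLM is called. What the sketch shows is the order of operations: filter by permission, cite by ID, and give the model an explicit way to abstain.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Passage:
    doc_id: str      # hypothetical controlled-document ID, e.g. "SOP-0412"
    text: str        # retrieved passage text
    clearance: int   # hypothetical minimum clearance needed to read this document


def build_grounded_prompt(question: str, passages: List[Passage], user_clearance: int) -> str:
    """Assemble a RAG prompt from permitted passages only, with citations and an abstain rule."""
    # Permission filter runs before generation: the model never sees text the user cannot.
    permitted = [p for p in passages if p.clearance <= user_clearance]
    if not permitted:
        # Nothing the user may see: abstain without calling a generator at all.
        return "NOT FOUND"
    context = "\n".join(f"[{p.doc_id}] {p.text}" for p in permitted)
    return (
        "Answer ONLY from the passages below and cite the [document ID] after every claim.\n"
        "If the passages do not support an answer, reply exactly: NOT FOUND\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}\n"
    )
```

The string this function returns is what a generator would receive. Because the permission check runs before the prompt is built, restricted text cannot leak through the model.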
RAG does not eliminate hallucination — a model can still misread or over-generalize a retrieved passage — but it converts an open-ended generation problem into a grounded, citable one, which is the difference between "interesting" and "auditable." This is why nearly every defensible GenAI deployment in the wave above is RAG-shaped: MSD's deviation assistant, Microsoft Copilot over quality systems, and the deviation/CAPA copilots all stand on a retriever over the firm's own documents [8][11][7].
The retriever is also the part you can build, test, and validate with classical, transparent tools — no giant model required. That is what the example below shows: the retriever and the triage classifier are ordinary, inspectable machine learning, and the generative LLM sits on top of them as the drafting layer that a human always reviews.
The two NLP tasks under every copilot
Strip the chat interface away and a GMP "copilot" reduces to two well-understood NLP problems, both of which predate LLMs and both of which you can validate.
Triage (classification). A new deviation arrives as free text. Before any drafting, the system should route it: which category (temperature excursion, contamination, out-of-specification assay, equipment fault, documentation), which severity, which owning function. This is text classification — vectorize the narrative, predict a label — and it is the step that lets a quality unit prioritize hundreds of deviations a month. The Merck & Co. study is, at its core, a rigorous evaluation of exactly this extract-and-classify capability on real deviation text [1].
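Routing is where a triage prediction becomes an action, and it can stay advisory. The sketch below assumes the classifier exposes per-category probabilities, as scikit-learn's `predict_proba` does. The category-to-owner map and the 0.6 confidence floor are hypothetical placeholders that a site would set during validation. Below the floor, the sketch does not guess. It hands the deviation to a person.

```python
from typing import Dict

# Hypothetical category-to-owner map; a real site defines its own routing table.
OWNER = {
    "temperature": "Upstream Manufacturing",
    "contamination": "Microbiology / QC",
    "oos_assay": "QC Laboratory",
    "equipment": "Engineering / Maintenance",
    "documentation": "Quality Assurance",
}


def route(probabilities: Dict[str, float], min_confidence: float = 0.6) -> dict:
    """Turn per-category probabilities into an advisory routing suggestion."""
    category, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    if confidence < min_confidence:
        # Too uncertain to suggest a category: hand the deviation to a person.
        return {"category": None, "owner": "QA manual triage",
                "confidence": confidence, "status": "advisory"}
    return {"category": category, "owner": OWNER[category],
            "confidence": confidence, "status": "advisory"}
```

With a fitted scikit-learn classifier, the input could be built as `dict(zip(clf.classes_, clf.predict_proba(X_new)[0]))`.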
Retrieval (similarity search). Given the new deviation, find the most similar historical ones and their CAPAs. This is the retriever step of RAG, and on its own it is enormously useful: an investigator grounded in three real prior cases writes a faster, better investigation than one starting from a blank page. Crucially, you can do this with classical text similarity (TF-IDF and cosine similarity) or with neural embeddings; the shape is the same, and the classical version runs with no model download, which makes it the honest teaching version.
The reason to name these two tasks explicitly is governance: they are inspectable. A triage classifier's accuracy can be measured against a labeled test set; a retriever's neighbors can be eye-checked for relevance. The generative LLM that drafts the final summary is far harder to validate — which is precisely why, in a defensible architecture, it is constrained by these two validatable steps rather than running free.
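"Eye-checked for relevance" can be turned into a measurable number with leave-one-out precision@k. Each historical deviation acts in turn as a query, its k nearest neighbours are retrieved, and the score counts how many of them share its category. Two caveats apply. Using the category label as a stand-in for relevance is an assumption; a real validation would use investigator-judged relevance. And because the held-out record also contributes to the IDF weights, the score carries a small optimistic bias. `precision_at_k` is a hypothetical helper, not part of the chapter's module.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def precision_at_k(texts, labels, k=2):
    """Leave-one-out precision@k, using 'same category' as a proxy for 'relevant'."""
    X = TfidfVectorizer(stop_words="english").fit_transform(texts)
    sims = cosine_similarity(X)          # dense pairwise similarity between all narratives
    np.fill_diagonal(sims, -1.0)         # a record must never retrieve itself
    per_query = []
    for i in range(len(texts)):
        top = np.argsort(sims[i])[::-1][:k]             # k nearest neighbours, best first
        per_query.append(np.mean([labels[j] == labels[i] for j in top]))
    return float(np.mean(per_query))
```

A score near 1.0 means neighbours usually share the query's category. A markedly lower score on a validation set is a reason to question the retriever before putting any generator on top of it.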
The GenAI wave sorted by risk: retrieval and summarization at the base (grounded by RAG), the inspectable triage and extraction NLP beneath the copilot, governed drafting of regulated content in the middle — all human-reviewed-and-signed — and agentic autonomy held below a hard regulatory boundary that draft Annex 22 draws and the Purolea warning letter enforces.
Original diagram by the authors, created with AI assistance.
A runnable model: deviation_triage.py
The example module examples/platform/ml/deviation_triage.py builds both NLP tasks transparently, with no network and no model download, over a small synthetic corpus of deviation narratives written in the running example's vocabulary — the day-7 fed-batch temperature excursion, the BATCH-2026-004 host-cell-protein out-of-specification from hplc_results.csv, environmental-monitoring excursions, equipment faults, and documentation deviations. The corpus is explicitly synthetic (a real deviation log is confidential), so the classifier metrics are illustrative; the shape of the pipeline is the lesson.
The triage half vectorizes each narrative with TF-IDF and fits a logistic-regression classifier:
```python
# examples/platform/ml/deviation_triage.py (excerpt)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, f1_score

# build_corpus() is defined elsewhere in the module and returns (texts, labels).

def train_triage(seed: int = 2026):
    texts, labels = build_corpus()
    # Word unigrams + bigrams, log-scaled term frequency, English stop words removed.
    vec = TfidfVectorizer(ngram_range=(1, 2), min_df=1, sublinear_tf=True, stop_words="english")
    X = vec.fit_transform(texts)
    # Stratified split keeps every category in both train and test; the index
    # arrays for the third input are also returned but unused here, hence *_.
    Xtr, Xte, ytr, yte, *_ = train_test_split(
        X, labels, range(len(labels)), test_size=0.33, random_state=seed, stratify=labels)
    clf = LogisticRegression(max_iter=1000, C=4.0)
    clf.fit(Xtr, ytr)
    pred = clf.predict(Xte)
    return {"macro_f1": round(float(f1_score(yte, pred, average="macro")), 3),
            "report": classification_report(yte, pred, zero_division=0)}
```
The retrieval half is the RAG retriever step, shown without an LLM — cosine similarity over the same TF-IDF space returns the most similar prior deviations with their scores, so a weak match can be flagged rather than blindly trusted:
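```python
# examples/platform/ml/deviation_triage.py (excerpt)
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# TfidfVectorizer and build_corpus() come from the same module (see the triage excerpt).

def retrieve_similar(query: str, k: int = 3):
    """The retrieval in Retrieval-Augmented Generation, shown without an LLM."""
    texts, labels = build_corpus()
    # Same TF-IDF settings as the triage half, so both tasks share one vector space.
    vec = TfidfVectorizer(ngram_range=(1, 2), min_df=1, sublinear_tf=True, stop_words="english")
    corpus_vecs = vec.fit_transform(texts)
    # Cosine similarity of the query against every prior narrative, flattened to 1-D.
    sims = cosine_similarity(vec.transform([query]), corpus_vecs).ravel()
    # Indices of the k highest-scoring narratives, best first.
    order = np.argsort(sims)[::-1][:k]
    # Return (text, category, score) so a weak top match stays visible.
    return [(texts[i], labels[i], round(float(sims[i]), 3)) for i in order]
```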
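Flagging a weak match can be a thin wrapper over the retriever's output. The following is a minimal sketch. It assumes results in the `(text, label, similarity)` shape that `retrieve_similar` returns. `grade_matches` is a hypothetical helper, and its 0.2 floor is also hypothetical: raw TF-IDF cosine scores are not calibrated, so a real cut-off would have to come from validation data.

```python
def grade_matches(results, min_sim=0.2):
    """Tag each retrieved (text, label, sim) tuple and report whether any match is usable."""
    graded = [(text, label, sim, "ok" if sim >= min_sim else "WEAK: verify manually")
              for text, label, sim in results]
    # If nothing clears the floor, the system should abstain rather than ground on noise.
    usable = any(sim >= min_sim for _, _, sim in results)
    return graded, usable
```

Applied to the three scores in the run shown below (0.363, 0.201, 0.163), this floor would pass the first two and flag the third for manual checking.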
Running python platform/ml/deviation_triage.py prints the following. The retrieval block is the part that matters, and it is verbatim: a new "temperature fell below setpoint" deviation correctly surfaces the day-7 BATCH-2026-001 excursion as its top match. The triage macro-F1 is deliberately modest — a teaching result, not a tuned one — because a 34-narrative corpus split into 22 train / 12 test is exactly the small-data regime that breaks classifiers, and the per-class breakdown shows it honestly (the oos_assay class scores 0.00 with only two test examples):
```text
deviation triage + CAPA retrieval (the NLP under the GenAI copilot)
corpus: 34 SYNTHETIC deviation narratives, 5 categories (illustrative)
-- TRIAGE: TF-IDF (759 features) + LogisticRegression --
split: 22 train / 12 test macro-F1 = 0.542 (illustrative)
               precision    recall  f1-score   support
contamination       1.00      0.33      0.50         3
documentation       1.00      1.00      1.00         2
    equipment       0.38      1.00      0.55         3
    oos_assay       0.00      0.00      0.00         2
  temperature       1.00      0.50      0.67         2
     accuracy                           0.58        12
-- RETRIEVAL: new deviation, find similar prior cases (RAG retriever step) --
query: Production bioreactor temperature fell below the 36.5 C setpoint for several h...
sim=0.363 [temperature ] Bioreactor BR101 temperature excursion on day 7 of BATCH-2026-001; jacke...
sim=0.201 [temperature ] Temperature deviation in production bioreactor: setpoint 36.5 C, PV dipp...
sim=0.163 [temperature ] Production bioreactor jacket temperature spiked above the upper NOR duri...
```
Read this the way a quality lead would. The retrieval is the immediately useful part: the new temperature deviation pulls back the three most similar historical cases, all genuinely temperature-related and all in the running example's plant, with similarity scores attached so a low-confidence match is visible. That is the RAG retriever doing its one job — grounding the investigator in real prior cases instead of a blank page. The triage classifier, by contrast, is a cautionary tale by design: a macro-F1 of 0.54 on a 34-row corpus is what small data actually buys you, and the oos_assay row collapsing to zero is the model failing loudly on the class it has the fewest examples of. The lesson is not "build a better classifier"; it is that this number is advisory at best, the LLM drafting on top of it is even less validatable, and a human must read everything. That gap between a confident demo and a thin, honest result is the entire subject of the limits section below.
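The single 22/12 split is one draw from a very noisy distribution. Repeated stratified cross-validation shows that noise directly, as a mean and a spread. The sketch assumes the same `(texts, labels)` that `build_corpus()` produces and mirrors the excerpt's TF-IDF and logistic-regression settings. `cv_macro_f1` is a hypothetical helper, and the fold and repeat counts are arbitrary choices for illustration.

Wrapping both steps in a scikit-learn pipeline also refits the vectorizer inside each training fold. The excerpt, by contrast, fits TF-IDF on all 34 narratives before splitting, so test-set vocabulary shapes the IDF weights.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def cv_macro_f1(texts, labels, n_splits=3, n_repeats=5, seed=2026):
    """Mean and spread of macro-F1 over repeated stratified k-fold splits."""
    # Vectorizer inside the pipeline: TF-IDF is refit on each training fold only.
    pipe = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True, stop_words="english"),
        LogisticRegression(max_iter=1000, C=4.0),
    )
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    scores = cross_val_score(pipe, texts, labels, cv=cv, scoring="f1_macro")
    return float(scores.mean()), float(scores.std())
```

Reporting the spread alongside the mean is the honest companion to a single illustrative figure like 0.542.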
Anatomy of one deviation-investigation record
A deviation under GenAI assistance, like every artifact in this series, is not a bare LLM answer — it is a structured investigation record where the model's contributions are labeled as such, sit beside their grounding evidence, and are gated behind a human signature. Dissect one the way a quality reviewer would, because the structure is the governance.
One AI-assisted deviation record, fully unpacked: the narrative and the AI-suggested triage (advisory), the RAG-retrieved prior cases that ground the draft (each citing a real document), the LLM-drafted summary and CAPA (stamped DRAFT, human-must-review), and the deterministic governance core — the quality-unit reviewer, the e-signature, the audit trail, and the model version and prompt hash that produced the draft. The AI accelerates every field above the line; the signature below it is the critical decision the model is forbidden to make.
Original diagram by the authors, created with AI assistance.
Read the card top to bottom and the chapter's argument is laid out as fields. The core holds the human-authored narrative and the AI-suggested category and severity, each tagged advisory — the triage output, never a final classification. The retrieval block is the RAG grounding: the three most similar prior deviations with their similarity scores and the CAPA references that closed them, each carrying a source citation so a reviewer can open the original. The generation block holds the LLM's first-draft investigation summary and draft CAPA, stamped DRAFT and human-must-review, with a hallucination-check note — these are the highest-value and lowest-trust fields on the card. The governance block is the deterministic, GxP-controlled core: the quality-unit reviewer, the e-signature (21 CFR Part 11 / Annex 11), the review decision, the audit-trail entry, and — the field that makes the whole thing auditable — the exact model_version and prompt_hash that produced the draft, so "which model, with which prompt, generated this text?" is answerable forever. The violet relationships panel records lineage: the record derivedFrom the narrative, is grounded-in prior CAPAs, was drafted-by a pinned model version, reviewed-by the quality unit, and closed-by a human signature. Everything above the signature is acceleration; the signature itself is the critical decision the model may not make.
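The card's governance fields map naturally onto a small data structure. The sketch below is a toy. Its field names are hypothetical and chosen to mirror the card. `prompt_hash` is a SHA-256 of the exact prompt text. `sign()` only stands in for a real e-signature: under 21 CFR Part 11 / Annex 11 an e-signature requires identity verification and controls far beyond a method call.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class DeviationRecord:
    narrative: str                      # human-authored description of the event
    suggested_category: str             # triage output, advisory only
    grounding: List[Tuple[str, float]]  # (source document ID, similarity) from the retriever
    draft_summary: str                  # LLM first draft, never final on its own
    model_version: str                  # pinned identifier of the generator that drafted it
    prompt: str                         # exact prompt sent to the generator
    status: str = "DRAFT - human must review"
    reviewer: Optional[str] = None
    signed_at: Optional[str] = None
    audit_trail: list = field(default_factory=list)

    @property
    def prompt_hash(self) -> str:
        """SHA-256 of the exact prompt, so the draft's provenance stays answerable."""
        return hashlib.sha256(self.prompt.encode("utf-8")).hexdigest()

    def sign(self, reviewer: str, decision: str) -> None:
        """Only a named human reviewer can move the record out of DRAFT."""
        if not reviewer:
            raise PermissionError("a qualified reviewer must sign; the model cannot")
        self.reviewer = reviewer
        self.signed_at = datetime.now(timezone.utc).isoformat()
        self.status = decision
        # Trail entry linking the human decision to the exact model and prompt.
        self.audit_trail.append(
            (self.signed_at, reviewer, decision, self.model_version, self.prompt_hash))
```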
The hard limits: hallucination, validation, and leakage
Three limits separate the demo from the deployment, and all three are structural — they do not go away with a bigger model.
Hallucination. An LLM generates fluent, confident text whether or not it is correct; it has no internal sense of "I do not know." In a deviation summary this is dangerous precisely because the output is plausible — a fabricated root cause reads exactly like a real one. The Merck & Co. study names this directly as the "interplay between apparent reasoning and hallucination," and it is why every defensible deployment is RAG-grounded (answer only from retrieved real documents) and human-reviewed [1]. RAG reduces the rate; it does not reach zero, because a model can still misread or over-extend a real passage.
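A citation check on the draft is a cheap, mechanical companion to human review. Every sentence should cite something, and every cited ID should be one the retriever actually returned. The sketch below assumes a hypothetical `[DEV-2026-001]`-style citation format, and `check_citations` is a hypothetical helper. It catches invented or missing references. It cannot tell whether a correctly cited passage actually supports the claim, so a reviewer still has to read the draft.

```python
import re
from typing import Dict, List, Set

# Hypothetical citation format, e.g. "[DEV-2026-001]" or "[SOP-0412]".
CITATION = re.compile(r"\[([A-Z]+-[\w.-]+)\]")


def check_citations(draft: str, retrieved_ids: Set[str]) -> Dict[str, List[str]]:
    """Flag citations the retriever never returned and sentences that cite nothing."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", draft) if s.strip()]
    uncited = [s for s in sentences if not CITATION.search(s)]
    cited = set(CITATION.findall(draft))
    return {"unknown_citations": sorted(cited - retrieved_ids),
            "uncited_sentences": uncited}
```

For example, if a draft cites `[DEV-9999-999]` but the retriever only returned `DEV-2026-001`, the check reports `DEV-9999-999` as unknown: a likely fabricated reference.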
Validating a non-deterministic model under GxP. Classical computerized-system validation assumes the same input yields the same output, which an LLM does not guarantee (temperature, sampling, model updates all move the output). The draft Annex 22, the ISPE GAMP AI Guide (July 2025, with its "seven control layers for LLMs"), and the FDA's risk-based credibility framework all converge on the same posture: a model used for a GMP-relevant purpose must be locked at validation, governed by a predetermined change-control plan, monitored for drift, and held to a level of scrutiny proportional to how much it influences a decision [5][16][17]. A continuously-updating cloud LLM is the hardest possible case for this, which is one reason the highest-risk uses are the ones the regulators carve out entirely.
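Locking a model starts with two steps: record exactly what was validated, then test whether the model behaves the same way twice. `LOCKED_CONFIG` and its values are hypothetical placeholders, and `generate` is any callable that wraps the model. The check simply counts distinct outputs across repeated identical calls.

Setting temperature to 0 narrows sampling, but a hosted model may still not be bit-for-bit deterministic. The check therefore measures reproducibility rather than assuming it. The config fingerprint gives change control a single value to compare against the validated state.

```python
import hashlib
import json

# Hypothetical locked configuration captured at validation time; values are placeholders.
LOCKED_CONFIG = {"model_version": "generator-2026-01-15", "temperature": 0.0, "max_tokens": 512}


def config_fingerprint(config: dict) -> str:
    """Stable SHA-256 of the configuration, independent of key order."""
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode("utf-8")).hexdigest()


def reproducibility_check(generate, prompt: str, config: dict, runs: int = 5) -> dict:
    """Call the generator repeatedly with an identical prompt and config; count distinct outputs."""
    outputs = [generate(prompt, **config) for _ in range(runs)]
    distinct = {hashlib.sha256(o.encode("utf-8")).hexdigest() for o in outputs}
    return {
        "config_fingerprint": config_fingerprint(config),
        "distinct_outputs": len(distinct),
        "reproducible": len(distinct) == 1,
    }
```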
Data leakage and confidentiality. Batch records, deviations, and CMC documents are among a manufacturer's most sensitive data. Sending them to a third-party LLM API risks exposure; training or fine-tuning a shared model on them risks the data resurfacing in another customer's output. This is why GMP-credible deployments run on tenant-isolated or on-premises models (e.g., Veeva's agents on Bedrock with customer-isolated models, private deployments behind the firm's own access controls) and why RAG — which keeps the corpus in the firm's control and merely retrieves into a prompt — is preferred over fine-tuning on proprietary text [12][8].
The unsolved part: validating a system that does not give the same answer twice
The deepest open problem is not hallucination — RAG and human review are a workable, if imperfect, mitigation for that. It is the collision between the definition of validation and the nature of a generative model.
GMP validation rests on reproducibility: you demonstrate that a system, given a defined input, reliably produces a correct, defined output, and you keep it under change control so that property holds for its whole life. A generative LLM violates this at the root. The same prompt can yield different text on two runs; a silent provider-side model update can shift behavior overnight; and even a "locked" model is locked only until the next version, with no guarantee the new one fails in the same places. The traditional answer — re-validate on change — runs into the validation paradox this book named in the MLOps chapter: the model that most needs continuous updating is the one hardest to keep validated, and freezing it to stay compliant forfeits the improvement that justified using it.
The field has no settled resolution, only a direction of travel. The emerging consensus is to validate the system and its guardrails rather than the model's every output: validate that the retriever returns relevant, permissioned documents; validate that the generator is constrained to answer from them and to abstain when they do not support an answer; validate the human-review gate and the audit trail; and treat the model itself as a controlled component with a predetermined change-control plan and ongoing monitoring [16][6]. It is an honest, partial answer. It keeps the human in the loop precisely because no one yet knows how to make a non-deterministic generator deterministic enough to trust unattended — and the draft regulation, by excluding generative AI from critical decisions, is essentially codifying that uncertainty into law.
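Validating the guardrails rather than every output can be written as an ordinary test suite run against the whole system. The sketch assumes the system under test is a callable that takes a question and the user's clearance and returns either an answer or the exact string `NOT FOUND`. `GuardrailCase` and `run_guardrail_suite` are hypothetical names. Each case states two things: whether the system should abstain, and which document IDs must never appear in an answer for that user.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class GuardrailCase:
    name: str
    question: str
    user_clearance: int
    must_abstain: bool                   # True if the correct behaviour is exactly "NOT FOUND"
    forbidden_ids: Tuple[str, ...] = ()  # document IDs this user must never see in an answer


def run_guardrail_suite(system: Callable[[str, int], str], cases: List[GuardrailCase]):
    """Run each case against the system; return (name, passed, leaked_ids) per case."""
    results = []
    for case in cases:
        answer = system(case.question, case.user_clearance)
        abstained = answer.strip() == "NOT FOUND"
        leaked = [doc_id for doc_id in case.forbidden_ids if doc_id in answer]
        passed = abstained == case.must_abstain and not leaked
        results.append((case.name, passed, leaked))
    return results
```

The suite is re-run, unchanged, whenever the pinned model or its configuration changes. That makes it a natural artifact for a predetermined change-control plan.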
What this chapter adds to the model suite
This chapter contributes examples/platform/ml/deviation_triage.py to the Book 5 example suite: a standalone, network-free module that demonstrates the two inspectable NLP tasks underneath every GMP copilot. It builds a small synthetic corpus of deviation narratives in the running example's vocabulary, then (1) trains a TF-IDF + logistic-regression triage classifier that routes a deviation into one of five categories, and (2) implements the RAG retriever step as cosine similarity over the same TF-IDF space, returning the most similar prior deviations with scores. It coordinates with — and deliberately does not duplicate — the QC-and-release OOS work (which predicts the numeric OOS) and the packaging anomaly module (serialization_anomaly.py, which reasons over structured events); this module is the text layer. The retrieval output is verbatim and genuinely useful (the temperature query surfaces the real day-7 excursion); the triage metrics are clearly labeled illustrative, and their modesty on a 34-row corpus is the intended lesson about small-data text classification — and about why the generative layer on top must stay advisory.
Why it matters
The paper trail is the part of biomanufacturing that scales worst and costs most in human time: deviations pile up, investigations drag, CMC documents take weeks, and every one is read and signed by a person. Generative AI is the first tool that can genuinely compress that work — drafting, retrieving, triaging — and the time savings reported across the wave, even discounting the self-reported headline numbers, are real enough to be transformative for quality operations. But the same fluency that makes it useful makes it dangerous: a confident, wrong investigation summary is worse than a slow, right one, and a model that decides rather than drafts is the failure mode the FDA has now sanctioned in writing. Getting this layer right means embracing the acceleration while holding the line that the regulators have drawn — the human reviews, the human signs, the model never makes the critical call. That discipline is not a brake on the technology; it is the only thing that lets a manufacturer deploy it at all.
In the real world
The deviation/CAPA and knowledge-management use is the most active corner of the entire GenAI wave, and it is mostly (pilot) edging toward production for non-critical documentation. The peer-reviewed anchor is the Merck & Co. study of GPT-4 and Claude-2 on manufacturing deviations — accurate extraction, an explicit hallucination caveat, and a human-in-the-loop hedge [1]. MSD's AWS-Bedrock deviation assistant is publicly described as exploratory RAG [8]; Microsoft Copilot is deployed in pharma quality systems for deviation drafting [11]; McKinsey reports a manufacturer synthesizing ~70% of deviations with first-draft CAPAs for >80% of cases (consultancy-reported) [7]. On the vendor side, Aizon (intelligent batch record, review-by-exception, pre-announced agentic capabilities), ValGenesis (validation and tech-transfer drafting), Veeva (Vault Quality AI agents on Bedrock, 2026 roadmap), Mareana (batch-release copilot), and the Siemens/Sanofi/Capgemini GenAI-in-MES collaboration are all selling into this space — every customer outcome vendor/self-reported [10][18][12][19][13]. Sanofi's ~8x-faster product-quality-review figure is self-reported and should be read as a target, not a verified result [9].
The governance frame is now concrete, not speculative. The ISPE GAMP AI Guide (July 2025) and its "seven control layers for LLMs" give a validation playbook; the FDA's risk-based credibility framework scales scrutiny to model influence; and the draft EU/PIC/S GMP Annex 22 (consultation July-October 2025) is the first manufacturing-specific AI rule, permitting only static, deterministic models for critical GMP and excluding generative/continuously-learning AI from critical use [16][17][5][6]. The annex is a draft — finalization is expected mid-2026 and the exclusion is provisional — but its direction is unambiguous. And the enforcement anchor is the Purolea cGMP warning letter (2 April 2026), the FDA's first to cite AI: a firm used AI agents to generate specifications, SOPs, and master production records without quality-unit review, the exact opposite of the human-in-the-loop discipline this chapter argues for [3][4][15]. Read together, the message from regulators, the peer-reviewed literature, and the one enforcement action so far is identical: generative AI drafts and retrieves; a qualified human decides and signs.
Key terms
- Large language model (LLM) — a neural model trained on vast text that reads and generates fluent natural language; the engine of the 2023-2026 GenAI wave.
- Generative AI — models that produce new content (text, here) rather than only classifying or scoring; excluded from critical GMP decisions under draft Annex 22.
- Retrieval-Augmented Generation (RAG) — pairing a retriever over the firm's own validated documents with a generator instructed to answer only from those documents, with citations; the architecture that makes LLMs usable in GMP.
- Retriever — the component that finds the most relevant passages in a corpus by vector/text similarity; shown here as TF-IDF + cosine similarity, an inspectable, validatable step.
- Hallucination — confident, fluent output that is factually wrong; the central risk of generative AI in a regulated setting, reduced (not eliminated) by RAG grounding and human review.
- Deviation / CAPA — a recorded departure from a procedure (deviation) and the corrective and preventive action that resolves it; the free-text records LLMs are most used to triage and draft.
- Triage (text classification) — routing a free-text deviation into a category and severity to prioritize it; an inspectable NLP task beneath the copilot.
- CMC drafting — using GenAI to draft chemistry, manufacturing, and controls regulatory content; a governed, draft-and-review use.
- Human-in-the-loop — the discipline (and now regulatory expectation) that a qualified person reviews and signs AI output; the model never makes the critical decision.
- Agentic AI — systems that plan and take multi-step actions with minimal human input; confined to non-critical, human-in-the-loop tasks in GMP.
- Draft Annex 22 — the draft EU/PIC/S GMP annex on AI that permits only static, deterministic models for critical GMP and excludes generative/continuously-learning AI from critical use.
- Purolea warning letter — the FDA's first AI-citing cGMP warning letter (2 April 2026), against a firm that used AI to generate GMP records without quality-unit review; the enforcement anchor.
Where this leads
The manufacturing spine is complete — every step from discovery to distribution learned, the whole-system layers (hybrid twins, MLOps, operations, and now language) mapped. Part VII turns from how ML works in biomanufacturing to who sells it and what is actually real. The next chapter, The Vendor Landscape: Who Sells What, and What Is Real, takes the names that have recurred throughout this book — Sartorius, AspenTech, Aizon, DataHow, Insilico/Yokogawa, Cytiva, Körber, ValGenesis, Veeva — and sorts their claims into production, pilot, and press release, so the buyer can tell the validated capability from the marketing.