Agentic AI for Connectivity: Tool-Using Agents, the Config They Generate, and When a Rule Engine Wins
📍 Where we are: Part VII · ML/AI in Industry Today — Chapter 29. The previous chapter, The Frontier, surveyed four headline capabilities and scored them all short of routine GMP use. This chapter takes the one with the most concrete near-term application — agentic AI — and examines it not as a way to decide, but as a way to connect: an agent that drafts the plumbing between a new instrument and the plant's data model. It is the corner where the agentic hype meets a real, unglamorous engineering problem, and where the honest answer is the most surprising.
The frontier chapter left agentic AI in a specific place: real, demonstrated, and confined by enforcement and draft regulation to non-critical, human-in-the-loop tasks. That confinement is usually read as a disappointment — the autonomous agent that closes a CAPA is forbidden, so what is left? The answer this chapter develops is that the most valuable near-term use of an agent is not deciding anything about a batch at all. It is connectivity: the slow, manual, error-prone work of wiring a new device into the plant — reading an instrument's raw address space, figuring out which register or node carries which signal, mapping each one into the plant's canonical tag model with the right units and quality, and writing the connector config that makes the data flow. That work is non-critical by nature (it produces configuration, not product-quality decisions), it is drowning in tedious detail (a single analyzer can advertise hundreds of nodes), and it is exactly the kind of read-understand-map-draft task large language models are good at. An agent that turns a two-day integration into a twenty-minute draft-and-review is a genuine, governable win — if you build the one thing that makes it trustworthy.
That one thing is the subject of this chapter, and stating it up front spoils the ending in the most useful way: an agentic connectivity tool is only as trustworthy as the deterministic verifier and the human gate wrapped around it. The agent's proposal is advisory; a Chain-of-Logic verifier decomposes every proposed mapping into checkable sub-conditions and scores it; anything less than perfect is escalated to a human; and the whole thing produces the traceable, audit-ready artifacts a later GMP validation will need. The model accelerates; the harness — not the model — is what a quality unit signs. This is the same discipline the generative-AI chapter drew for copilots, applied to the layer underneath every copilot: the wire itself.
Imagine hiring a brilliant temp who can read any instrument's manual in seconds and tell you "node ns=4;s=Analyzer.GLC is the glucose reading, in grams per litre — wire it to your BR101.Glucose.PV tag." Most of the time they are right and they save you a day of squinting at datasheets. But sometimes they confidently point at a node that does not exist, or quote the pressure in millibar when your plant works in millimetres of mercury, or hand you a value no real sensor could produce. You would never let that temp's notes go straight into the control system. You would have a checklist — does the node actually exist? do the units match? is the number physically possible? — and a qualified engineer signs off on anything the checklist does not pass clean. The agent is the temp. This chapter is about building the checklist and keeping the engineer's signature, because that is the entire difference between a useful accelerator and a silent corruptor of your plant's data.
What this chapter covers
- Why connectivity is the agent's best non-critical job — the integration bottleneck the NIIMBL Big-Data interoperability program names, and why "draft the connector" is a safe, high-value use that "decide the batch" is not.
- What "agentic" actually means — the plan-act loop, tool-use / function-calling, and the difference between an LLM that drafts text and an agent that reads an address space and proposes a mapping.
- The Chain-of-Logic verifier — decomposing each proposed mapping into independent, checkable sub-conditions and collapsing them into a weighted Correctness Rate, shown as runnable code in examples/platform/ml/agentic_connectivity.py.
- The human-in-the-loop gate — auto-accept only at a perfect Correctness Rate, escalate everything else as review-by-exception, and the guarantee the gate must make: no defective mapping is ever auto-accepted.
- The agentic-versus-deterministic benchmark — when an agent earns its risk (the long tail of unfamiliar devices) and when a hand-written rule engine wins outright (the stable, high-volume families).
- The sandbox and the future-state GxP path — protocol emulators and the cloud sandbox that keep a proof-of-concept hardware-free, the audit-ready artifacts the agent must emit, and what it would take to graduate a PoC into a validated system — bounded throughout by draft Annex 22 and the Purolea warning letter.
The bottleneck the agent is actually pointed at
Before the agent, name the problem honestly, because the size of the prize is the whole case for taking on the risk. A modern biomanufacturing suite is a fleet of instruments from dozens of vendors, each speaking its own dialect: a cell-culture analyzer over OPC UA, a feed pump over legacy Modbus, a pH transmitter over a 4-20 mA loop into a gateway, a Raman probe over a vendor SDK, a balance over RS-232. Book 3's connectivity and legacy-skids chapters walk that protocol zoo in detail; the point here is what it costs. Every one of those devices has to be mapped into the plant's shared model — the canonical tag, the unit, the datatype, the quality semantics, the normal operating range — before its data is usable, and that mapping is done by hand, by a scarce integration engineer reading a vendor's address-space dump against the plant's tag dictionary, one signal at a time. It is slow, it does not scale, and it is the explicit motivation behind the NIIMBL Big-Data program's interoperability work: most implemented solutions are proprietary point-to-point integrations, and the industry wants platform-agnostic, reusable connectivity instead [1].
This is the job an agent is genuinely suited to, for three reasons that map exactly onto the regulatory line. It is read-and-draft, the LLM's strength — the agent ingests the device's advertised nodes and the canonical model and proposes a correspondence, the same shape of task as drafting a deviation summary. It is non-critical — the output is a connector configuration, an engineering artifact reviewed and tested before it ever carries a GMP record, not a disposition of product quality. And it is verifiable — a proposed mapping makes concrete, checkable claims (this node exists; its unit is g/L; its value sits in range) that a deterministic checker can confirm or refute without trusting the model at all. Those three properties are why connectivity sits safely inside the boundary the frontier chapter drew, while autonomous batch decisions sit outside it. The agent is not being asked to decide; it is being asked to propose plumbing, and plumbing can be inspected.
What "agentic" means here: the plan-act loop and tool-use
An LLM that drafts text is a tool; an agent is an LLM placed in a loop where it can act — it decomposes a goal into steps, calls external tools to gather information or make changes, observes the result, and decides the next step [2]. The capability that turns a chat model into an agent is tool-use (also called function-calling): the model is given a set of callable functions with typed signatures, and instead of only emitting prose it can emit a structured call — read_address_space(device_id), lookup_canonical_tag(name), propose_mapping(target, source) — whose result is fed back into its context for the next step [3]. Standards are emerging to make those tool interfaces portable across models and systems — the Model Context Protocol is one such substrate, a published convention for exposing a system's tools and data to an LLM agent over a uniform interface, so the same agent can drive a historian, an OPC UA server, and a tag dictionary without bespoke glue for each [4].
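A minimal sketch of that mechanism, using only the standard library: a registry of typed functions and a dispatcher that executes one structured call and returns a JSON result for the model's context. The `{"name": ..., "arguments": ...}` call shape, the device id `analyzer-01`, and the in-memory data are assumptions made for illustration; real function-calling APIs and MCP servers define their own envelopes.

```python
import json
from typing import Any, Callable

# Hypothetical in-memory stand-ins for the systems the tools would query.
ADDRESS_SPACES = {
    "analyzer-01": {"ns=4;s=Analyzer.GLC": {"datatype": "Double", "unit": "g/L"}},
}
TAG_DICTIONARY = {
    "BR101.Glucose.PV": {"datatype": "Double", "unit": "g/L"},
}


def read_address_space(device_id: str) -> dict[str, Any]:
    """Return the nodes a device advertises (read-only)."""
    return ADDRESS_SPACES[device_id]


def lookup_canonical_tag(name: str) -> dict[str, Any]:
    """Return the canonical spec for one plant tag (read-only)."""
    return TAG_DICTIONARY[name]


# The registry is the complete set of functions the model may call.
TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    "read_address_space": read_address_space,
    "lookup_canonical_tag": lookup_canonical_tag,
}


def dispatch(call_json: str) -> str:
    """Execute one structured call and return a JSON reply for the model's context."""
    call = json.loads(call_json)
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        return json.dumps({"result": TOOLS[name](**args)})
    except (KeyError, TypeError) as exc:
        # Bad arguments or an unknown device/tag: report it, do not crash the loop.
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})


reply = dispatch('{"name": "lookup_canonical_tag", "arguments": {"name": "BR101.Glucose.PV"}}')
print(reply)
```

The point of the registry is that the model can only reach what is registered: a call to any other name comes back as an error object rather than an action.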
For connectivity, the loop is concrete and short. The agent's goal is "produce a mapping for every canonical target this device should feed." It calls a tool to read the device's advertised address space (the raw nodes, their datatypes, their advertised units); it calls a tool to read the canonical model (the plant's tag dictionary, with expected datatypes, UCUM units, and normal ranges); it reasons over the two and emits, for each target, a proposed (target ← source) mapping with a stated rationale; and — in a system that crosses from draft into action — it would call a tool to write that connector config. The crucial design decision is where the loop is allowed to close. An agent that only reads and proposes is bounded; an agent that also writes the config into a running gateway is taking an action on infrastructure, and the further it is trusted to close that loop unattended, the larger the surface of unreviewed change it creates — which is precisely the failure mode the Purolea warning letter named when an agent generated GMP-controlling records with no human in the decision [5]. So the loop closes at propose, and a verifier and a human stand between the proposal and the wire.
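One way to make "the loop closes at propose" concrete is an allow-list that simply does not contain a config-writing tool. The sketch below runs a scripted (stubbed) sequence of calls through such a loop; the tool name `write_connector_config`, the device id, and the `LoopResult` shape are hypothetical, chosen only to show the boundary.

```python
from dataclasses import dataclass, field

# Tools the loop may execute. A config-writing tool is deliberately absent.
ALLOWED_TOOLS = {"read_address_space", "read_canonical_model", "propose_mapping"}


@dataclass
class LoopResult:
    proposals: list[dict] = field(default_factory=list)  # advisory output only
    refused: list[str] = field(default_factory=list)     # calls outside the boundary
    trace: list[str] = field(default_factory=list)       # ordered record of tool names


def run_loop(planned_calls: list[tuple[str, dict]], max_steps: int = 20) -> LoopResult:
    """Run a bounded plan-act loop over a scripted sequence of tool calls."""
    result = LoopResult()
    for tool, args in planned_calls[:max_steps]:
        result.trace.append(tool)
        if tool not in ALLOWED_TOOLS:
            result.refused.append(tool)    # recorded, never executed
            continue
        if tool == "propose_mapping":
            result.proposals.append(args)  # handed to the verifier, not to the gateway
    return result


# A stubbed agent that tries to close the loop by writing the config itself.
script = [
    ("read_address_space", {"device_id": "analyzer-01"}),
    ("read_canonical_model", {}),
    ("propose_mapping", {"target": "BR101.Glucose.PV", "source": "ns=4;s=Analyzer.GLC"}),
    ("write_connector_config", {"target": "BR101.Glucose.PV"}),
]
outcome = run_loop(script)
print(outcome.proposals)
print(outcome.refused)
```

The refused call is kept in the result rather than silently dropped, so an attempted write shows up in the trace a reviewer reads.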
A second design decision is multiplicity. The connectivity task naturally decomposes into roles — a reader that summarizes an unfamiliar address space, a mapper that proposes correspondences, a critic that checks them — and the multi-agent framing assigns each to a separate agent that hands off to the next. It reads well on an architecture slide, but it is worth a caution the rest of this chapter earns: more agents do not add trust, they add surface. Each agent is another generative component whose output must be verified, and a critic agent that is itself an LLM inherits the same fluent-and-wrong failure mode it is meant to catch — exactly the limitation the generative-AI chapter flagged for the LLM-as-judge. The trustworthy critic in this chapter is therefore not another agent; it is a deterministic verifier, which is the next section.
The Chain-of-Logic verifier: turning a fluent proposal into a checkable one
The single idea that makes an agentic mapping safe is to refuse to trust the proposal as a whole and instead decompose it into independent sub-conditions, each checked against ground truth rather than against the model's say-so. This is the Chain-of-Logic (CoL) pattern demonstrated for LLM-driven industrial control, where each model output is broken into verifiable logical sub-conditions — register correctness, value correctness, parameter correctness, write success — and scored with a weighted Correctness Rate before anything reaches the controller [6]. Applied to a connector mapping, the proposal "wire ns=4;s=Analyzer.GLC to BR101.Glucose.PV as a Double in g/L" makes five separable claims, and each is a boolean a deterministic checker can settle:
- source_exists — does the device actually advertise the node the agent named, or did the model hallucinate a plausible-sounding one? Checked against the real address space the agent read, not against its memory. This is the highest-weighted check, because a hallucinated source is the error that most silently corrupts a record.
- datatype_ok — does the source's advertised datatype match what the canonical target expects (a Double where a Double is required, not a Float)?
- unit_present — did the agent carry an engineering unit at all, or drop it? A unitless number entering the plant model is the bare-number failure the ontology book refuses.
- unit_correct — is the unit the right one? mbar where the canonical target expects mmHg is a number that will read 1.3× wrong forever.
- value_in_range — does a sample value sit within the target's declared normal operating range, or is it physically impossible (a dissolved-oxygen reading of 1500%)?
Each check carries a weight reflecting how much damage its failure does, and they sum to a Correctness Rate from 0 to 1. The weighting is a deliberate governance choice: source identity and unit correctness — the silent corruptors — weigh most, so a proposal can be fluent and well-formatted and still score low if it fails the checks that matter. The verifier is ordinary, inspectable code; there is no model inside it, which is exactly why it is the part a quality unit can validate. The agent is the part you cannot fully validate (it is generative and probabilistic); the CoL verifier is the part you can, and the architecture works by constraining the former with the latter — the same move the generative-AI chapter made when it gated an LLM draft behind a classical, validatable retriever.
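Written out, the score is a weighted sum of pass/fail indicators. The notation is introduced here for illustration and is not taken from the cited work; the weights in the worked line are the ones the example module's excerpt lists later in this chapter.

```latex
\mathrm{CR} \;=\; \sum_{i=1}^{5} w_i \, c_i ,
\qquad c_i \in \{0, 1\},
\qquad \sum_{i=1}^{5} w_i = 1

% Worked case: a mapping that fails only unit_correct (w = 0.25)
\mathrm{CR} \;=\; 0.30 + 0.15 + 0.15 + 0 \cdot 0.25 + 0.15 \;=\; 0.75
```

Because the weights sum to 1, a score of exactly 1 is reachable only when every sub-condition passes, which is what lets a single threshold stand in for "all checks passed."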
The architecture in one view: the LLM agent reads a device's address space and the canonical model and proposes mappings (advisory, indigo); a deterministic Chain-of-Logic verifier decomposes each into five checkable sub-conditions and scores a weighted Correctness Rate (cyan); a human-in-the-loop gate auto-accepts only a perfect score and escalates the rest as review-by-exception (amber); and a hard regulatory boundary keeps the generative proposal from ever auto-dispositioning a GMP record. The model accelerates the left half; the harness governs the right.
Original diagram by the authors, created with AI assistance.
A runnable harness: agentic_connectivity.py
The example module examples/platform/ml/agentic_connectivity.py builds this architecture transparently, with no network, no model download, and a deliberately stubbed agent — because the chapter's argument is precisely that the LLM is not the artifact, the harness is. The scenario is the running example's plant: three devices advertise their address spaces — a Mettler-Toledo-style pH probe and a Watson-Marlow feed pump over Modbus (two known families an engineer has integrated before) plus a brand-new NovaFlex-9 cell-culture analyzer (a novel device no rule has been written for) — and the agent must map their raw nodes into seven canonical targets (BR101.pH.PV, BR101.Glucose.PV, and so on). The stub returns proposals that carry the exact five failure modes a real connectivity agent exhibits — a hallucinated node, a datatype mismatch, a missing unit, a wrong unit, and a physically impossible value — alongside two correct ones, so the harness has something real to catch.
The verifier is the heart of the module, and it is a dozen lines because it has no model in it — just the five checks against the device's real advertised nodes and the canonical spec:
```python
# examples/platform/ml/agentic_connectivity.py (excerpt)
WEIGHTS = {"source_exists": 0.30, "datatype_ok": 0.15, "unit_present": 0.15,
           "unit_correct": 0.25, "value_in_range": 0.15}


def verify(m: Mapping) -> Mapping:
    spec = CANONICAL[m.target]
    adv = _advertised()              # the device's REAL address space
    node = adv.get(m.source_node)    # None => the agent hallucinated it
    checks = {
        "source_exists": node is not None,
        "datatype_ok": node is not None and m.source_datatype == spec.datatype,
        "unit_present": m.source_unit != "" or spec.unit == "",
        "unit_correct": m.source_unit == spec.unit,
        "value_in_range": spec.lo <= m.example_value <= spec.hi,
    }
    m.col = checks
    m.correctness = round(sum(WEIGHTS[k] for k, ok in checks.items() if ok), 3)
    return m
```
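The excerpt relies on names defined elsewhere in the module (`Mapping`, `CANONICAL`, `_advertised`). To run it in isolation, the sketch below supplies hypothetical stand-ins for them: the dataclass shapes, the pressure range, the sample values, and the `Analyzer.Ghost` node are assumptions for illustration, not the module's actual definitions. The `verify` body is repeated unchanged so the block is self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:                     # hypothetical shape of one canonical-model entry
    datatype: str
    unit: str
    lo: float
    hi: float


@dataclass
class Mapping:                  # hypothetical shape of one agent proposal
    target: str
    source_node: str
    source_datatype: str
    source_unit: str
    example_value: float
    col: dict = field(default_factory=dict)
    correctness: float = 0.0


# Illustrative values only; the range is an assumption, not the module's.
CANONICAL = {"BR101.Pressure.PV": Spec("Double", "mmHg", 0.0, 2000.0)}


def _advertised() -> dict:
    """Stand-in for the device's real address space."""
    return {"ns=4;s=Analyzer.Press": {"datatype": "Double", "unit": "mbar"}}


WEIGHTS = {"source_exists": 0.30, "datatype_ok": 0.15, "unit_present": 0.15,
           "unit_correct": 0.25, "value_in_range": 0.15}


def verify(m: Mapping) -> Mapping:
    spec = CANONICAL[m.target]
    node = _advertised().get(m.source_node)   # None => node was never advertised
    checks = {
        "source_exists": node is not None,
        "datatype_ok": node is not None and m.source_datatype == spec.datatype,
        "unit_present": m.source_unit != "" or spec.unit == "",
        "unit_correct": m.source_unit == spec.unit,
        "value_in_range": spec.lo <= m.example_value <= spec.hi,
    }
    m.col = checks
    m.correctness = round(sum(WEIGHTS[k] for k, ok in checks.items() if ok), 3)
    return m


wrong_unit = verify(Mapping("BR101.Pressure.PV", "ns=4;s=Analyzer.Press", "Double", "mbar", 1013.0))
ghost = verify(Mapping("BR101.Pressure.PV", "ns=4;s=Analyzer.Ghost", "Double", "mmHg", 760.0))
print(wrong_unit.correctness, ghost.correctness)
```

The second proposal shows why one defect can cost two checks: a node that does not exist fails `source_exists` and, because the datatype comparison requires a real node, `datatype_ok` as well.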
The gate is one line — auto-accept only at a perfect Correctness Rate, escalate everything else — and the benchmark pits the agent (after the gate) against a deterministic_mapper(): a hand-written lookup that maps the families it was coded for perfectly and the novel device not at all. Running python agentic_connectivity.py prints the following, verbatim:
```text
agentic connectivity: the Chain-of-Logic verifier + HITL gate (the LLM is stubbed)
canonical targets: 7 devices: 3 (2 known + 1 novel) advertised nodes: 7
-- CHAIN-OF-LOGIC VERIFICATION (each proposal decomposed, checked, weighted) --
BR101.pH.PV CR=1.0 [AUTO-ACCEPT] all sub-conditions pass
BR101.Temp.PV CR=0.55 [ ESCALATE ] failed: source_exists,datatype_ok
BR101.Glucose.PV CR=1.0 [AUTO-ACCEPT] all sub-conditions pass
BR101.Lactate.PV CR=0.6 [ ESCALATE ] failed: unit_present,unit_correct
BR101.Pressure.PV CR=0.75 [ ESCALATE ] failed: unit_correct
BR101.FeedRate.SP CR=0.85 [ ESCALATE ] failed: datatype_ok
BR101.DO.PV CR=0.85 [ ESCALATE ] failed: value_in_range
-- HITL GATE -- auto-accepted: 2/7 escalated to human (review-by-exception): 5/7
defective mappings auto-accepted: 0 (must be 0)
-- ONE TRACEABLE ARTIFACT (the record a reviewer signs) --
target : BR101.Pressure.PV
source_node : ns=4;s=Analyzer.Press
unit : mbar
correctness_rate: 0.75
disposition : ESCALATE-TO-HUMAN
model_version : agent-v0.3
prompt_hash : 6d06e8a4d80b
col_checks : {'source_exists': True, 'datatype_ok': True, 'unit_present': True, 'unit_correct': False, 'value_in_range': True}
-- AGENTIC vs DETERMINISTIC (zero-touch correct maps over a mixed fleet) --
fleet points (2 known devices + 1 NOVEL): 7
deterministic mapper, zero-touch correct : 2 (perfect on coded families, 0 on the novel device)
agent + CoL gate, zero-touch correct : 2 (reaches the novel device; defects caught, not shipped)
NOTE: the LLM proposal is ADVISORY. The CoL verifier + HITL gate are what a
quality unit validates and signs; under draft Annex 22 the generative proposal
never auto-dispositions a GMP record. A rule engine wins on the stable, high-
volume families; the agent earns its risk only on the long tail -- behind the gate.
ASSERT ok: no defective mapping auto-accepted; the harness, not the model, is the control.
```
Read the verification block first, because it is the chapter's argument as data. Two proposals score a perfect 1.0 and auto-accept — the clean pH mapping on a known device and, more interestingly, a clean glucose mapping on the novel analyzer the rule engine cannot touch. The other five each fail the sub-condition that names their defect: the hallucinated JacketTemp node fails source_exists (and datatype_ok with it, because that check requires the node to exist); the lactate mapping that dropped its unit fails unit_present (and unit_correct as well, since an empty unit cannot equal the expected one); the mbar-where-mmHg-was-meant pressure fails unit_correct; the Float-where-Double feed-rate fails datatype_ok; and the impossible 1500% dissolved-oxygen value fails value_in_range. Crucially, none of the five is auto-accepted — the defective mappings auto-accepted: 0 line is the validatable guarantee the module asserts, and it is what a quality unit actually signs off on: not "the agent is correct," which is unprovable, but "the gate never lets an unverified mapping through," which is.
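The guarantee can be replayed over the numbers in the printed run. The sketch below is an assumption about the gate's form (the module's own one-line gate is not shown in the excerpt); the targets, scores, and failed checks are copied from the verification block above.

```python
# (target, Correctness Rate, failed checks) copied from the printed run above.
RESULTS = [
    ("BR101.pH.PV", 1.0, []),
    ("BR101.Temp.PV", 0.55, ["source_exists", "datatype_ok"]),
    ("BR101.Glucose.PV", 1.0, []),
    ("BR101.Lactate.PV", 0.6, ["unit_present", "unit_correct"]),
    ("BR101.Pressure.PV", 0.75, ["unit_correct"]),
    ("BR101.FeedRate.SP", 0.85, ["datatype_ok"]),
    ("BR101.DO.PV", 0.85, ["value_in_range"]),
]


def gate(correctness: float) -> str:
    """Auto-accept only a perfect score; everything else goes to a human."""
    return "AUTO-ACCEPT" if correctness == 1.0 else "ESCALATE-TO-HUMAN"


accepted = [t for t, cr, _ in RESULTS if gate(cr) == "AUTO-ACCEPT"]
escalated = [t for t, cr, _ in RESULTS if gate(cr) == "ESCALATE-TO-HUMAN"]
defective_accepted = [t for t, cr, failed in RESULTS
                      if failed and gate(cr) == "AUTO-ACCEPT"]

# The executable form of the guarantee: a violated guarantee raises, so the run fails.
assert defective_accepted == [], "a defective mapping was auto-accepted"
print(len(accepted), len(escalated), len(defective_accepted))  # prints: 2 5 0
```

The exact comparison with 1.0 is safe here only because the score is rounded to three decimals before it reaches the gate; a gate fed unrounded floating-point sums would need a tolerance or integer weights.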
When the agent earns its risk, and when a rule engine wins
The benchmark is the part that resists the marketing, and it is worth reading carefully because the honest result is not "the agent wins." Over the mixed fleet, the deterministic mapper scores two zero-touch-correct mappings and the agent also scores two — a tie on raw count. But the composition is the lesson. The deterministic mapper's two wins are both on the known families it was hand-coded for, where it is perfect, needs no verifier, and needs no human; it scores zero on the novel NovaFlex-9 because no one wrote a rule for it. The agent's two wins include the glucose node on that novel device — the one the rule engine could not reach at all — made safe by the gate that escalated its five defective siblings rather than shipping them.
That is the whole decision rule for agentic connectivity, and it is unglamorous: a hand-written rule engine wins outright on the stable, high-volume device families — the handful of vendors and models that make up most of a plant's instruments, where the integration is written once, deterministic, and cheap to maintain, and where a probabilistic proposal that still needs a verifier and a human is strictly worse. The agent earns its risk only on the long tail — the novel, one-off, or rarely-seen devices where no rule exists and writing one by hand is not worth it, and where the agent's ability to read an unfamiliar address space and propose a starting point is genuine value. Even there, its proposals are mostly drafts a human finishes, not zero-touch wins, as the 2-of-7 auto-accept rate shows. The mature architecture is therefore a hybrid: deterministic mappers for the families you know, an agent for the tail you do not, and the same CoL-verifier-plus-human gate over both, so the output is uniform regardless of which path produced it. An organization that buys an "agentic connectivity platform" expecting it to replace its integration rules has misread the benchmark; one that uses it to extend reach into the long tail, behind a gate, has read it right.
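The hybrid can be sketched as a router: try the hand-written rule first, fall back to the agent only when no rule exists, and send both outputs through the same disposition function. Everything named here is hypothetical (the rule table, the Modbus-style addresses, the family labels, and the stubbed agent and scorer); it shows the shape of the routing, not the module's `deterministic_mapper()`.

```python
from typing import Callable, Optional

# Hypothetical rule table for the known families: (family, raw address) -> canonical tag.
RULES = {
    ("ph-probe", "40001"): "BR101.pH.PV",
    ("feed-pump", "40010"): "BR101.FeedRate.SP",
}


def propose(family: str, address: str,
            agent: Callable[[str, str], Optional[str]]) -> dict:
    """Prefer a hand-written rule; fall back to the agent only when none exists."""
    target = RULES.get((family, address))
    path = "rule"
    if target is None:
        target = agent(family, address)
        path = "agent"
    return {"family": family, "address": address, "target": target, "path": path}


def disposition(candidate: dict, score: Callable[[dict], float]) -> str:
    """One gate for both paths: the route taken does not change the bar."""
    if candidate["target"] is None:
        return "ESCALATE-TO-HUMAN"
    return "AUTO-ACCEPT" if score(candidate) == 1.0 else "ESCALATE-TO-HUMAN"


def stub_agent(family: str, address: str) -> Optional[str]:
    # Stand-in for an LLM proposal on an unfamiliar device.
    return "BR101.Glucose.PV" if address == "ns=4;s=Analyzer.GLC" else None


def stub_score(candidate: dict) -> float:
    # Stand-in for the Chain-of-Logic verifier's Correctness Rate.
    return 1.0


known = propose("ph-probe", "40001", stub_agent)
novel = propose("novaflex-9", "ns=4;s=Analyzer.GLC", stub_agent)
print(known["path"], novel["path"])
print(disposition(known, stub_score), disposition(novel, stub_score))
```

Recording `path` on every candidate keeps the provenance visible downstream, so a reviewer can tell a rule-derived mapping from an agent-drafted one even though both met the same bar.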
The traceable artifact: what a later validation will need
Every accepted or escalated mapping emits a structured record, and that record is not an afterthought — it is the deliverable that lets a proof-of-concept ever become a validated system. The module's printed artifact shows the shape: the proposed config (target, source node, datatype, unit), the agent's stated rationale (advisory), the full col_checks dictionary with each sub-condition's pass/fail, the Correctness Rate, the disposition (AUTO-ACCEPT or ESCALATE-TO-HUMAN), and — the fields that make it auditable forever — the model_version and prompt_hash that produced the proposal. This is the same provenance discipline the deviation record carried: pin which model with which prompt generated the draft, so the artifact is reproducible and the system can be re-validated on change instead of drifting silently. The NIIMBL interoperability program names exactly these as required outputs — traceable connector configurations, mapping rationale, and audit and change logs suitable for downstream validation workflows [1] — and they are the bridge from the next section's sandbox to a future GMP deployment.
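A minimal sketch of assembling such a record with the standard library. The field names follow the printed artifact; the hashing scheme (SHA-256 of the prompt text, truncated to 12 hex characters) and the prompt string are assumptions, since the excerpt does not show how the module derives `prompt_hash`, so the value produced here will not match the one in the printed run.

```python
import hashlib
import json


def prompt_hash(prompt: str, length: int = 12) -> str:
    """Short, stable fingerprint of the exact prompt text (assumed scheme)."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:length]


def build_artifact(target: str, source_node: str, unit: str, correctness: float,
                   col_checks: dict, model_version: str, prompt: str) -> dict:
    """Collect everything a reviewer needs to reproduce and sign one mapping."""
    return {
        "target": target,
        "source_node": source_node,
        "unit": unit,
        "correctness_rate": correctness,
        "disposition": "AUTO-ACCEPT" if correctness == 1.0 else "ESCALATE-TO-HUMAN",
        "model_version": model_version,
        "prompt_hash": prompt_hash(prompt),
        "col_checks": col_checks,
    }


record = build_artifact(
    target="BR101.Pressure.PV",
    source_node="ns=4;s=Analyzer.Press",
    unit="mbar",
    correctness=0.75,
    col_checks={"source_exists": True, "datatype_ok": True, "unit_present": True,
                "unit_correct": False, "value_in_range": True},
    model_version="agent-v0.3",
    prompt="Map the advertised nodes to the canonical targets.",  # hypothetical prompt
)

# Sorted keys give a stable serialization, so identical inputs yield identical bytes.
line = json.dumps(record, sort_keys=True)
print(line)
```

Hashing the prompt rather than storing it keeps the record small while still detecting any change to the wording; a real system would also retain the full prompt under version control so the hash can be resolved back to text.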
Mapped onto the layers the rest of the series builds, the artifact has a natural home. The accepted mapping is a row in the plant's canonical model — the Unified Namespace tag dictionary Book 3 designs — and lifted into the knowledge graph Book 4 builds, the same mapping becomes typed triples whose units are QUDT/UCUM IRIs and whose completeness a SHACL shape can gate before the mapping is trusted — the admission-gate move the generative chapter already named. The agent, in other words, is not inventing a new artifact; it is drafting a faster first pass at an artifact the series already specifies, and the verifier checks the draft against that specification.
One connector-mapping proposal, fully unpacked: the agent-drafted config and rationale (advisory), the Chain-of-Logic checklist that caught the mbar-versus-mmHg unit error and scored it 0.75, the escalation disposition (human-must-review), and the deterministic governance core — the pinned model version and prompt hash, the reviewer, the e-signature, and the audit-trail entry. The agent fills the top; the verifier scores the middle; the human signs the bottom — and the line between advisory and GxP-controlled is exactly where the field tags switch.
Original diagram by the authors, created with AI assistance.
The sandbox: proving it without touching the plant
A proof-of-concept for agentic connectivity must never run against a production system — both because an agent writing configs into a live GMP gateway is exactly the unreviewed-action risk the regulators target, and because you cannot responsibly let an experimental, probabilistic system near instruments that make a medicine. The discipline that resolves this is the cloud sandbox: the entire PoC runs in an isolated environment with no connection to any production, GxP-regulated, or proprietary system, and the devices are not real but simulated — protocol emulators and virtual endpoints standing in for physical hardware [1]. This is not a compromise forced by caution; it is established Industry 4.0 practice. Virtual commissioning — validating control software, communication protocols, and integration logic against simulated devices before physical deployment — is a mature discipline, and LLM-driven control work has explicitly adopted it, validating an LLM-plus-PLC system inside a process emulator and a Modbus simulator precisely to demonstrate safe, hardware-free testing of AI-driven logic before any real device is touched [6][7].
The mechanics are within reach of the open-source stack this series favors: a simulated OPC UA server (built with the asyncua library Book 3 uses) advertises a device's address space for the agent to read; a Modbus slave emulator stands in for the legacy pump; and the agent's proposed configs are exercised against those emulated endpoints, so a hallucinated node simply fails to resolve against the simulator rather than corrupting a real tag [8][9]. Book 3 devotes a whole chapter to building exactly this hardware-free testbed; the relevant point for the agent is that the sandbox is what lets you generate the benchmark numbers above honestly — every defective mapping the gate catches is caught against a simulated device, where the cost of a miss is a failed test, not a misreported batch.
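The "fails to resolve" behavior can be shown without any protocol library. The class below is an in-memory stand-in, not an OPC UA or Modbus emulator, and its API is invented for illustration; the sample values and the full node id of the hallucinated JacketTemp node are hypothetical. A real sandbox would put a simulated server behind the same kind of check.

```python
class EmulatedDevice:
    """In-memory stand-in for a simulated endpoint (not a real protocol emulator)."""

    def __init__(self, nodes: dict[str, float]) -> None:
        self._nodes = dict(nodes)

    def browse(self) -> list[str]:
        """List the node ids the device advertises."""
        return sorted(self._nodes)

    def read(self, node_id: str) -> float:
        """Return a node's value, or raise if the node was never advertised."""
        if node_id not in self._nodes:
            raise LookupError(f"node does not resolve: {node_id}")
        return self._nodes[node_id]


def exercise(device: EmulatedDevice, proposed_nodes: list[str]) -> dict[str, bool]:
    """Try each proposed source node against the emulated device."""
    outcome = {}
    for node_id in proposed_nodes:
        try:
            device.read(node_id)
            outcome[node_id] = True
        except LookupError:
            outcome[node_id] = False   # a failed test, not a corrupted tag
    return outcome


analyzer = EmulatedDevice({"ns=4;s=Analyzer.GLC": 4.2, "ns=4;s=Analyzer.Press": 1013.0})
report = exercise(analyzer, ["ns=4;s=Analyzer.GLC", "ns=4;s=Analyzer.JacketTemp"])
print(report)
```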
The future-state path: from a sandbox PoC to a validated system
The honest framing of an agentic connectivity PoC is that it is not a validated system and does not pretend to be — it is a demonstration of feasibility, benefits, and limits, explicitly outside the GMP envelope [1]. The valuable question it must answer is the future-state one: what would it take to graduate this into something a plant could validate and run? The answer follows the series' validation spine, and it leans on the artifacts the harness already produces. Under the CSA risk-based posture and the GAMP 5 lifecycle, the path has a recognizable shape. The deterministic verifier and the human gate are the validated core — they are static, deterministic software (a GAMP category that can be validated conventionally), so the validation effort concentrates there, on the harness, not on the unvalidatable generative model. The agent is a controlled component, pinned to a model version under a predetermined change-control plan, monitored, and never permitted to auto-disposition — the same locked-and-governed posture the regulatory chapter requires of any model touching GMP. The audit-ready artifacts — the config, the CoL results, the rationale, the model version and prompt hash, the reviewer signature — are the ALCOA+ records a Part 11 system needs, designed in from the PoC rather than retrofitted. And the boundary holds throughout: draft Annex 22 permits only static, deterministic models for critical use and excludes generative AI from it, so the agent stays on the non-critical, human-in-the-loop side of the line by construction, and the Purolea warning letter stands as the worked example of what happens when it does not [5][10]. The PoC's real deliverable, then, is not the agent — it is a validated harness with a clearly-scoped, governable place for an agent inside it.
The unsolved part: the agent's reach is exactly where verification is hardest
The deepest tension in agentic connectivity is that the agent is most valuable precisely where it is hardest to verify. On a known device family, a CoL verifier has a rich canonical spec to check against — expected datatypes, units, ranges, even expected node names — so a defective proposal fails a concrete check. But the agent's whole reason to exist is the novel device, the one with no prior integration and often a sparse or idiosyncratic spec, and there the verifier has less to check against: if the canonical model does not yet declare a target's expected unit, the unit_correct check cannot fire, and the gate's guarantee weakens from "catches every defect" to "catches every defect we had a rule to check." The verifier is only as strong as the canonical model behind it, and the long tail is exactly where that model is thinnest. This is not a flaw to be engineered away with a bigger model; it is the same structural point the frontier chapter and the ontology book keep arriving at — the value of a machine proposal is bounded by the quality of the typed, unit-bearing, governed model it is checked against. The practical consequence is sober: agentic connectivity does not eliminate the integration engineer, it relocates them — from hand-mapping every device to curating the canonical model and reviewing the tail the agent drafts. The work that remains is the work that was always load-bearing, and the agent makes its absence more visible, not less.
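One way to make that weakening visible is to report, per target, how much of the total check weight the canonical spec can actually support. The sketch below is an illustration, not part of the example module: "coverage" is a name introduced here, the spec dictionaries are hypothetical, and the weights are the ones from the module excerpt.

```python
WEIGHTS = {"source_exists": 0.30, "datatype_ok": 0.15, "unit_present": 0.15,
           "unit_correct": 0.25, "value_in_range": 0.15}


def checkable(spec: dict) -> dict[str, bool]:
    """Which sub-conditions have ground truth to compare against for this target."""
    has_range = spec.get("lo") is not None and spec.get("hi") is not None
    return {
        "source_exists": True,                   # checked against the device, not the spec
        "datatype_ok": bool(spec.get("datatype")),
        "unit_present": True,                    # only asks whether a unit was carried
        "unit_correct": bool(spec.get("unit")),  # needs a declared expected unit
        "value_in_range": has_range,             # needs a declared operating range
    }


def coverage(spec: dict) -> float:
    """Share of the total check weight that can actually fire for this spec."""
    return round(sum(WEIGHTS[k] for k, ok in checkable(spec).items() if ok), 3)


rich = {"datatype": "Double", "unit": "g/L", "lo": 0.0, "hi": 20.0}  # known family
sparse = {"datatype": "Double"}                                      # novel device
print(coverage(rich), coverage(sparse))
```

For the sparse spec only 0.6 of the weight is backed by ground truth, so a score on that target says less than the same score on a fully specified one. Surfacing the number next to the Correctness Rate is one option for telling a reviewer how much the score was able to check.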
What this chapter adds to the model suite
This chapter contributes examples/platform/ml/agentic_connectivity.py to the Book 5 example suite: a standalone, network-free module that builds the agentic-connectivity architecture with the LLM deliberately stubbed, because the verifier and the gate — not the model — are the artifact. It models a mixed fleet of three devices (two known families plus a novel analyzer), a stubbed agent that proposes mappings carrying the five canonical failure modes (hallucinated node, datatype mismatch, missing unit, wrong unit, impossible value), a Chain-of-Logic verifier that decomposes each proposal into five checked sub-conditions and scores a weighted Correctness Rate, a human-in-the-loop gate that auto-accepts only a perfect score, and an agentic-versus-deterministic benchmark over the fleet. It coordinates with — and does not duplicate — integration_opcua.py (which writes a prediction through the OPC UA / historian / MES contract this module's mappings would feed) and the frontier_scorecard.py survey (which scores the agentic capability this module shows in operation). Its assertion is the chapter's guarantee made executable: no defective mapping is ever auto-accepted, so the script's exit status — not a human reading the output — certifies that the harness, not the model, is the control. Like the rest of the suite it is stdlib-only and deterministic, so the audit trail is byte-for-byte reproducible and the run_all.py credibility ledger re-checks it on every change.
Why it matters
Connectivity is the unglamorous tax on every digital ambition in biomanufacturing: the digital twin, the soft sensor, the release model, the data-pooling frontiers all assume the data is already flowing, and getting it to flow is a manual, vendor-by-vendor slog that scales with neither the instrument count nor the engineer's patience. An agent that drafts the plumbing is the rare AI application that is simultaneously high-value, genuinely suited to the technology, and safely inside the regulatory line — because it proposes configuration rather than deciding quality, and because its proposals are concretely verifiable in a way a batch disposition is not. But the same fluency that lets it read an unfamiliar analyzer also lets it confidently invent a node or quote the wrong unit, and a connector that is silently wrong corrupts every record that flows through it for the life of the line. Getting this right means inverting the instinct the marketing sells: the value is not in trusting a smarter agent, it is in building a verifier and a gate good enough that the agent never has to be trusted at all. That is not a limitation on the technology — it is the only architecture that lets a plant deploy it, and it is the same lesson, in a new domain, that this entire book has been earning: the machine accelerates, the human signs, and the harness in between is what makes the acceleration safe.
In the real world
Agentic AI for connectivity is, in 2026, a pilot/PoC capability with vendor-self-reported and early-research evidence — exactly the tier the frontier scorecard assigns the broader agentic category. The concrete anchors are real but bounded: peer-reviewed work demonstrates an LLM integrated with a rule-based PLC controller over Modbus TCP, with a Chain-of-Logic validation framework and a weighted Correctness Rate gating every output, validated entirely inside a process emulator and a Modbus simulator before any hardware — the clearest published template for the verify-then-act discipline this chapter builds on [6]. Virtual commissioning is mature industrial practice, not a novelty, which is what makes the hardware-free sandbox credible rather than a dodge [7]. Open-source OPC UA implementations and the asyncua library make the simulated-endpoint testbed buildable today [8][9], and the NIIMBL Big-Data interoperability program is actively soliciting exactly this — a sandbox PoC demonstrating whether an LLM-enabled agent can accelerate and standardize device connectivity, with traceable artifacts and a future-state GxP guidance report as named deliverables [1]. What does not yet exist is a validated, in-production agentic connectivity system on a GMP line; the governance frame that would bound one is, however, already concrete — the ISPE GAMP AI Guide's control layers, draft Annex 22's static-and-deterministic-only rule for critical use, and the Purolea letter's enforcement of human review — so the honest reading is that the engineering is demonstrable in a sandbox today and the validated deployment is a governed build still ahead [5][10][11].
Key terms
- Agentic AI — an LLM placed in a loop where it plans, calls tools, observes results, and acts toward a goal, rather than only drafting text; here pointed at connectivity, where it proposes device-to-plant mappings and a verifier and human gate stand between the proposal and the wire.
- Tool-use / function-calling — the capability that turns a chat model into an agent: the model is given typed callable functions (read an address space, look up a canonical tag) and emits structured calls whose results feed its next step.
- Model Context Protocol (MCP) — a published convention for exposing a system's tools and data to an LLM agent over a uniform interface, so one agent can drive a historian, an OPC UA server, and a tag dictionary without bespoke glue per system.
- Connector mapping — the correspondence between a device's raw source node or register and a canonical plant tag, carrying datatype, unit, and quality; the artifact the agent drafts and the verifier checks.
- Canonical model / tag dictionary — the plant's shared definition of each signal (canonical tag, expected datatype, UCUM unit, normal operating range); it is the ground truth the verifier checks a proposal against, and it is the Unified Namespace that Book 3 designs.
- Chain-of-Logic (CoL) verification — decomposing an LLM's output into independent, checkable sub-conditions (source exists, datatype matches, unit present and correct, value in range), each checked against ground truth, not the model's say-so.
- Correctness Rate — the weighted sum of passed CoL sub-conditions, 0 to 1, with the silent-corruptor checks (source identity, unit correctness) weighted most; the single auditable number per mapping.
- Human-in-the-loop (HITL) gate — the rule that auto-accepts a mapping only at Correctness Rate 1.0 and escalates everything else as a review-by-exception event; its guarantee is that no mapping that fails any verifier check is ever auto-accepted.
- Agentic-versus-deterministic benchmark — the comparison that shows a hand-written rule engine wins on stable, high-volume device families and the agent earns its risk only on the novel long tail; the mature architecture is a hybrid of both behind one gate.
- Virtual commissioning — validating control software, communication protocols, and integration logic against simulated devices before physical deployment; mature Industry 4.0 practice and the basis of the hardware-free sandbox.
- Cloud sandbox / protocol emulator — an isolated environment with no production or GxP connection, where simulated OPC UA servers and Modbus slaves stand in for real hardware, so an agent's defective proposal fails a test rather than corrupting a record.
- Audit-ready artifact — the traceable record each mapping emits (config, rationale, CoL results, Correctness Rate, disposition, pinned model version and prompt hash, reviewer signature); the ALCOA+ evidence a future validation needs, designed in from the PoC.
- Future-state GxP guidance — the framing of a PoC as a feasibility demonstration outside the GMP envelope, paired with a documented path to a validated system: validate the deterministic harness, control the agent under a change-control plan, and keep it non-critical and human-in-the-loop.
- Draft Annex 22 / Purolea warning letter — the regulatory boundary: Annex 22 permits only static, deterministic models for critical GMP and excludes generative AI from it; the Purolea letter is the FDA's first enforcement against an agent generating GMP records without human review.
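Three of the terms above (Chain-of-Logic verification, Correctness Rate, and the HITL gate) fit together mechanically, and a minimal sketch may make that concrete. Everything named here is an assumption for illustration rather than the chapter's or any cited framework's implementation: the `CanonicalTag` and `Proposal` shapes, the node IDs, the tag name, and especially the integer weights, which only encode the idea that source identity and unit correctness weigh most.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalTag:
    """One hypothetical tag-dictionary entry: the verifier's ground truth."""
    datatype: str
    unit: str    # UCUM code, e.g. "Cel" for degrees Celsius
    low: float   # normal operating range, inclusive
    high: float


@dataclass(frozen=True)
class Proposal:
    """One mapping drafted by the agent (advisory only)."""
    source_node: str
    canonical_tag: str
    datatype: str
    unit: str
    sample_value: float


# Hypothetical weights: the silent-corruptor checks count three times as much.
WEIGHTS = {"source_exists": 3, "datatype_matches": 1,
           "unit_correct": 3, "value_in_range": 1}


def chain_of_logic(p: Proposal, address_space: set[str],
                   dictionary: dict[str, CanonicalTag]) -> dict[str, bool]:
    """Check each sub-condition against ground truth, never the model's say-so."""
    tag = dictionary.get(p.canonical_tag)
    return {
        "source_exists": p.source_node in address_space,
        "datatype_matches": tag is not None and p.datatype == tag.datatype,
        "unit_correct": tag is not None and p.unit == tag.unit,
        "value_in_range": tag is not None and tag.low <= p.sample_value <= tag.high,
    }


def evaluate(p: Proposal, address_space: set[str],
             dictionary: dict[str, CanonicalTag]) -> dict:
    """Score a proposal and apply the gate; returns a small auditable record."""
    checks = chain_of_logic(p, address_space, dictionary)
    passed = sum(WEIGHTS[name] for name, ok in checks.items() if ok)
    rate = passed / sum(WEIGHTS.values())
    # With positive weights, "all checks pass" is the same as rate == 1.0,
    # and it avoids comparing floats for equality.
    disposition = "auto-accept" if all(checks.values()) else "escalate"
    return {"checks": checks, "correctness_rate": rate, "disposition": disposition}


# Ground truth: nodes actually browsed from the (simulated) device, plus the dictionary.
ADDRESS_SPACE = {"ns=2;s=Analyzer.Temp", "ns=2;s=Analyzer.pH"}
DICTIONARY = {"bioreactor.temperature": CanonicalTag("Double", "Cel", 0.0, 60.0)}

good = evaluate(Proposal("ns=2;s=Analyzer.Temp", "bioreactor.temperature",
                         "Double", "Cel", 37.0), ADDRESS_SPACE, DICTIONARY)
wrong_unit = evaluate(Proposal("ns=2;s=Analyzer.Temp", "bioreactor.temperature",
                               "Double", "[degF]", 98.6), ADDRESS_SPACE, DICTIONARY)
invented = evaluate(Proposal("ns=2;s=Analyzer.DO", "bioreactor.temperature",
                             "Double", "Cel", 37.0), ADDRESS_SPACE, DICTIONARY)

for name, result in [("good", good), ("wrong_unit", wrong_unit), ("invented", invented)]:
    print(name, result["correctness_rate"], result["disposition"])
```

Under these assumed weights the clean mapping scores 1.0 and is auto-accepted, the Fahrenheit proposal scores 0.5 (it fails both the unit and range checks), and the invented node scores 0.625; both defective proposals are escalated. In a fuller sketch the returned record would also carry the rationale, pinned model version, prompt hash, and reviewer signature listed under the audit-ready artifact.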
Where this leads
The frontier is mapped, its one near-term application is examined, and the verdict the whole book has been earning is finally due. The closing chapter, The Honest Verdict: Where ML/AI in Biomanufacturing Really Stands, totals the ledger across all twenty-nine chapters — what is genuinely production-grade today, what is pilot, what is hype, and what a clear-eyed team should build, buy, and ignore — with the discipline of this chapter as one more data point: the most valuable agentic AI in 2026 is the one that knows it is plumbing, not a decision-maker, and is verified accordingly.