The Vendor Landscape: Who Sells What, and What Is Real

📍 Where we are: Part VII · ML/AI in Industry Today — Chapter 25. The previous chapter weighed generative AI and LLMs and found copilots useful and agents oversold. This chapter zooms all the way out to the market that sells every technique in this book, and asks the only question a buyer should ask: of all this, what is actually real?

Twenty-four chapters have walked the bioprocess spine through the learning lens, naming algorithms, datasets, and a few named deployments along the way. But a process scientist does not buy an algorithm; they buy a product — a license, a support contract, a validation package, a vendor on the other end of an audit. This chapter is the buyer's map. It lists who sells what across the biomanufacturing software market, sorts each offering by the same honest maturity and evidence tiers this book has used throughout, and draws the single most important line on the whole map: the line between what is demonstrated in routine GMP production and what is agentic marketing running ahead of the demo.

The market is loud right now. Every vendor has an "AI ecosystem," every press release has a percentage, and "agentic" has become the word "cloud" was a decade ago. Underneath the noise, the production-grade reality is narrower and older than the marketing: multivariate monitoring, spectroscopic soft sensing, computer-vision inspection, and human-in-the-loop documentation — most of it built on math that predates the deep-learning era. The corrections this book has repeated matter most here, where they are easiest to get wrong on a slide.

The simple version

Buying biomanufacturing AI is like buying a self-driving car in 2026. The brochure shows a car that drives itself anywhere; the fine print says "driver-assist features, hands on the wheel." The honest buyer learns to read past the headline to the operational design domain — where, exactly, does this thing actually work unattended, and where is a human still steering? This chapter teaches that reading for the bioprocess software market: which products genuinely run in a GMP plant making release decisions, and which are a very good demo with a roadmap attached.

What this chapter covers

  • The four evidence tiers and three maturity markers, applied to a market instead of a paper
  • The MVDA/PAT incumbents (Sartorius, AspenTech/Emerson) — the genuinely production-grade core
  • The mechanistic and hybrid specialists (Cytiva GoSilico, DataHow, Yokogawa/Insilico) — and the two attributions everyone gets wrong
  • The GxP-AI SaaS and data-layer players (Aizon, TetraScience, Ganymede)
  • The automation and MES incumbents (Korber, Siemens, Rockwell, Emerson, AVEVA) onto which ML is layered, not native
  • The consumables-and-intelligence houses (MilliporeSigma/Merck, Thermo Fisher), the CDS/QC tools (Waters), and the validation/quality stack (ValGenesis, Veeva)
  • The consolidation map — who bought whom — and why it changes the attribution on a slide
  • How "agentic" marketing outran demonstrated production, and what the Purolea warning letter did to that gap
  • The open-source example suite as the reproducible counterpoint to every proprietary box

How to read a vendor claim: tiers, maturity, and the headline number

Every claim in this chapter carries two labels, the same two the rest of the book uses. The evidence tier says how the claim was established: peer-reviewed-independent (a journal, authors with no commercial stake), peer-reviewed-self-authored (a journal, but the vendor or customer is a co-author), vendor-self-reported (a product page, white paper, or conference slide), or press-release-only (an announcement with no methods behind it). The maturity marker says how far the thing actually got: (production) deployed in GMP or commercial manufacturing, (pilot) demonstrated at scale but not in routine release, (research) academic or early-stage.

The two labels are independent, and the gap between them is where buyers lose money. A vendor can have a genuinely (production) product whose flagship number is vendor-self-reported: the deployment is real, the percentage is marketing. The discipline this chapter enforces is to never let a maturity marker borrow credibility from an evidence tier, or vice versa. When a vendor page says a model "cut experiments by 80%," that is a vendor-self-reported figure regardless of how production-grade the platform is [1]. When a peer-reviewed paper co-authored by the vendor and its customer reports a 33% accuracy gain, that is peer-reviewed-self-authored: stronger than a product page, weaker than an independent replication, and the figure you should quote in preference to the vendor page's own, differently framed number [2].

Evidence

There is, as of mid-2026, almost no peer-reviewed-independent evidence for any commercial bioprocess-AI efficiency headline. The strongest independent signal is structural, not numerical: the 7th ISPE Pharma 4.0 survey found AI/ML to be the technology with the most pilot projects and the fewest scaled implementations — it trails big-data analytics, advanced analytics, robotic process automation, GxP cloud, and IIoT, all of which are already more mature in pharma [3]. Read every vendor headline in this chapter against that backdrop: the market is selling the end-state of a journey most plants have not finished.

The MVDA and PAT incumbents: the production-grade core

If you want to know what biomanufacturing AI looks like when it actually ships, look at the oldest products in the market. Multivariate statistical process monitoring — PCA and PLS over batch trajectories, scored by Hotelling's T-squared and squared prediction error, diagnosed by contribution plots — is the one technique that is unambiguously (production) across the industry, and it has been for two decades. The book has already built its open-source core in Book 3's analytics chapter; the commercial incumbents are productized, validated wrappers around the same math.
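For readers who want the two monitoring statistics in symbols, the standard textbook definitions can be sketched as follows. The notation is an assumption of this sketch and is not taken from any vendor's documentation: x is one mean-centered, scaled observation with J variables, P is the J-by-A loading matrix of a PCA model fitted on good reference batches, and λ_a is the variance of the a-th score in that reference set.

```latex
\begin{aligned}
t &= P^{\top} x, \qquad \hat{x} = P\,t, \\
T^{2} &= \sum_{a=1}^{A} \frac{t_a^{2}}{\lambda_a}, \\
\mathrm{SPE} &= \lVert x - \hat{x} \rVert^{2}
             = \sum_{j=1}^{J} \left( x_j - \hat{x}_j \right)^{2}.
\end{aligned}
```

T-squared flags an observation that is unusual inside the model plane; SPE flags one that breaks the correlation structure the model learned. In the simplest form of a contribution plot, the per-variable terms of the SPE sum are drawn as bars to show which variables drive an alarm.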

Sartorius, through its Umetrics line, is the market standard: SIMCA for offline multivariate modeling and golden-batch fingerprinting, SIMCA-online for real-time process monitoring, and MODDE for design of experiments [4]. These are genuinely (production) tools running in commercial GMP plants for Continued Process Verification and fault detection. Sartorius's broader "Digital Twin AI Ecosystem" and "Biobrain" positioning is newer and vendor-self-reported — the MVDA core is proven; the AI-ecosystem branding is a layer of marketing on top of it. The cleanest production anchor in the whole market sits here: Amgen's Juncos site in Puerto Rico runs SIMCA OPLS harvest-titer and in-process models in commercial GMP, reporting the elimination of roughly six hours of harvest idle time and ten hours of column idle per batch — but that is a first-party, vendor-self-reported figure (Amgen engineers with Sartorius as a case-study sponsor), and the hour-savings are not externally verifiable [5].

The other production-grade MVDA house is AspenTech ProMV (formerly ProSensus/MacGregor), now inside Emerson after Emerson's roughly fifteen-billion-dollar move on AspenTech [6]. ProMV does the same fault-detection and contribution-plot diagnosis SIMCA does, and its "fallacy of the golden batch" critique is a useful corrective to the naive single-reference-trajectory idea. Emerson's own DeltaV PredictPro brings model-predictive and analytics features into the DCS layer. All of it is (production) for monitoring; none of it is autonomous control of a critical quality attribute.

The mechanistic and hybrid specialists: where attribution goes wrong

This is the part of the map where slides get the ownership wrong, and getting it right is the whole point of an honest landscape.

Cytiva (a Danaher company) sells GoSilico, the ChromX/DSPX mechanistic chromatography modeling suite it acquired in 2021 [7]. GoSilico is (production) in CMC downstream development, and it is the single most important attribution correction in the chapter: GoSilico is mechanistic, not machine learning. It solves the physics of chromatography (a transport-dispersive column mass balance coupled to a steric-mass-action adsorption equilibrium) to predict elution behavior from a few calibration runs. It is the hybrid-models chapter's "white box," not a learned model, and shelving it under "AI" on a vendor slide is a category error this book refuses to make. The time savings GoSilico claims for in-silico process development are real in kind but vendor-self-reported in magnitude.
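To make "mechanistic, not learned" concrete, here is one common textbook form of the two equation families named above. It is a sketch under assumptions: the exact formulation GoSilico solves is not stated in this chapter, and the symbols are introduced here for illustration (c_i and q_i are the mobile-phase and bound concentrations of component i, u the interstitial velocity, D_app an apparent dispersion coefficient, ε the porosity, c_s the salt concentration, Λ the ionic capacity, ν_i the characteristic charge, σ_i the steric shielding factor).

```latex
\begin{aligned}
\frac{\partial c_i}{\partial t}
  &= -u\,\frac{\partial c_i}{\partial z}
     + D_{\mathrm{app}}\,\frac{\partial^{2} c_i}{\partial z^{2}}
     - \frac{1-\varepsilon}{\varepsilon}\,\frac{\partial q_i}{\partial t}, \\
q_i &= K_{\mathrm{eq},i}\; c_i
     \left( \frac{\Lambda - \sum_{j} (\nu_j + \sigma_j)\, q_j}{c_s} \right)^{\nu_i}.
\end{aligned}
```

Every symbol is a physical quantity that a handful of calibration runs can estimate; none of them is a trained weight, which is why the "AI" label does not fit.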

DataHow is the pure-play hybrid-modeling specialist, and the second attribution everyone gets wrong: DataHow is an independent company. It is an ETH Zurich spin-off, with a Series A led by Momenta and including Rockwell Automation and Zürcher Kantonalbank, and an Eppendorf collaboration announced in late 2024. It is not owned by Sartorius, a confusion that recurs because both sell bioprocess analytics [1]. Its DataHowLab and SpectraHow products do hybrid (mechanistic-plus-data) modeling and transfer learning, claiming 30 to 60 percent (up to 80 percent) fewer experiments; every one of those figures is vendor-self-reported [1]. The honest strong anchor for DataHow is not the product page but the peer-reviewed-self-authored companion to its flagship Bristol Myers Squibb case: a Biotechnology Journal (2024) paper, co-authored by DataHow and BMS, on 48 experiments at 5 L with 12 critical process parameters and 18 critical quality attributes, headlining roughly 33% better prediction accuracy with about half the data versus a black-box model [2]. The vendor's own page uses a different "22% / 3x" framing (a smaller accuracy gain paired with a larger data-reduction claim); prefer the peer-reviewed numbers, and note the maturity is process-development (pilot), not GMP production.

Yokogawa owns Insilico Biotechnology — the third attribution correction. Yokogawa acquired Insilico in November 2021; it was not Cytiva [8]. Insilico's approach is a genome-scale metabolic model coupled to an artificial neural network — a hybrid digital twin for soft-sensing and model-predictive control — and it sits at (production/pilot). (Full genome-scale metabolic models are rarely used directly inside a real-time twin; the deployed form is a reduced hybrid.) The two consolidation facts to keep straight, because they flip the company name on a slide: GoSilico went to Cytiva, Insilico went to Yokogawa — similar names, different acquirers, different model classes (mechanistic versus hybrid).
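The "hybrid" model class discussed in this section can be illustrated with a toy: a mechanistic Monod growth law plus a small fitted correction for what the mechanism misses. This is a minimal sketch, not any vendor's method. All parameter values and function names are made up for illustration, the "plant" data are synthetic, and a one-parameter linear residual stands in for the neural network a real hybrid model would use.

```python
"""Toy hybrid growth model: Monod kinetics plus a fitted residual.

Illustrative only: the constants are assumed, and the linear residual
is a stand-in for the neural network of a real hybrid model.
"""

MU_MAX = 0.05    # 1/h, assumed maximum specific growth rate
K_S = 0.5        # g/L, assumed half-saturation constant
YIELD_XS = 0.5   # g biomass per g substrate, assumed


def monod_rate(s):
    """Mechanistic part: specific growth rate from substrate concentration."""
    return MU_MAX * s / (K_S + s)


def simulate(x0, s0, hours, dt=0.1, residual=lambda x, s: 0.0):
    """Explicit-Euler run of dX/dt = (mu(S) + r(X, S)) * X with substrate uptake."""
    x, s = x0, s0
    trajectory = [(x, s)]
    for _ in range(round(hours / dt)):
        dx = (monod_rate(s) + residual(x, s)) * x * dt
        x += dx
        s = max(s - dx / YIELD_XS, 0.0)  # substrate cannot go negative
        trajectory.append((x, s))
    return trajectory


def fit_linear_residual(trajectory, dt=0.1):
    """Least-squares fit of r(X) = theta * X to the growth Monod cannot explain."""
    num = den = 0.0
    for (x, s), (x_next, _) in zip(trajectory, trajectory[1:]):
        mu_observed = (x_next - x) / (x * dt)
        unexplained = mu_observed - monod_rate(s)
        num += x * unexplained
        den += x * x
    return num / den


def final_biomass(trajectory):
    """Biomass concentration at the last time step."""
    return trajectory[-1][0]


# Synthetic "plant" data with a density-dependent slowdown Monod does not model.
observed = simulate(0.5, 20.0, 48, residual=lambda x, s: -0.002 * x)
theta = fit_linear_residual(observed)
mechanistic = simulate(0.5, 20.0, 48)
hybrid = simulate(0.5, 20.0, 48, residual=lambda x, s: theta * x)

if __name__ == "__main__":
    runs = [("observed", observed), ("Monod only", mechanistic), ("hybrid", hybrid)]
    for label, run in runs:
        print(f"{label:12s} final biomass = {final_biomass(run):.3f} g/L")
```

The design point is the division of labor: the mechanistic term carries the known physics, and the learned term is only asked to explain the remainder, which is why hybrids tend to need less data than a black-box model.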

The GxP-AI SaaS and data-layer players

A newer cohort sells AI as a regulated service, built for GxP from the start rather than retrofitted.

Aizon is the clearest example: a (production) GxP AI SaaS (Execute, Unify, Predict, plus an "Agentic Studio") whose flagship is a multi-site Grifols deployment. Aizon is also the rare vendor with a genuine peer-reviewed-self-authored anchor that is about the validation question itself — its study in the PDA Journal of Pharmaceutical Science and Technology on qualifying AI algorithms for regulated manufacturing is the notable peer-reviewed exception in a market of product pages [9]. Its customer-outcome numbers (the generic "around 30% deviation reduction") are vendor-self-reported marketing, and its "Agentic AI" was pre-announced for early 2026 — announced, not demonstrated, which is exactly the gap this chapter is about.

TetraScience (Tetra OS / Tetra AI) and Ganymede (Lab-as-Code, since absorbed into Apprentice.io) sell the data layer that AI needs rather than the models themselves — the "AI-ready" replatforming of siloed lab and process data into FAIR, queryable form. This matters because, as the data-the-fuel chapter argued, data readiness is the field's number-one barrier, not algorithms. TetraScience is (production) with vendor-self-reported deployment counts (claims of a dozen of the top-25 pharma; Takeda, Bayer, and others named) and outcome figures (QC turnaround "weeks to days"); Ganymede is (pilot), pre-general-availability. These products are real infrastructure; their headline numbers are marketing.

The automation and MES incumbents: ML layered on, not native

The biggest installed base in the plant belongs to companies that have been there for decades, and their AI story is almost always a layer added to an existing control or execution backbone — which is both its strength (it is where the data and the GMP discipline already live) and its limit (the ML is rarely the product's core).

Korber (Werum PAS-X MES) is the canonical case: a review-by-exception execution layer that is unambiguously (production) in commercial biomanufacturing, onto which ML and "Agentic AI" (K.AI, B.R.A.I.N.) are now being layered. PAS-X's "up to 98% right first time" and review-by-exception language is an exact quote of the vendor page — and it is vendor-self-reported, with "up to 98%" a best-case ceiling, not a typical result; the install-base figures (1000-plus installations, a large share of top-20 biotech) are real Korber claims but appear on different pages and are easy to conflate [10]. The honest reading is that PAS-X is production-grade as an MES; the ML on top is early.

Siemens (PCS 7/neo and SIPAT for PAT, Opcenter for MES, gPROMS for modeling), Rockwell (FactoryTalk PharmaSuite), Emerson (DeltaV with Syncade, plus AspenTech analytics), and AVEVA (PI data infrastructure and predictive analytics) round out the automation incumbents. All are (production) as control, historian, and execution platforms — AVEVA's PI System is the very historian the data-management book builds its data shadow on — and all have a thinner, more (pilot) or aspirational ML story bolted on. Rockwell's ML in PharmaSuite is the thinnest of the group; Siemens and Emerson have the most credible analytics layers. The pattern holds: the backbone is real and validated; the intelligence on top is being layered in.

Consumables, CDS, and the validation stack

Three more clusters complete the map.

The consumables-and-intelligence houses sell manufacturing intelligence alongside their hardware. MilliporeSigma/Merck KGaA offers BioContinuum and Bio4C ProcessPad — manufacturing-intelligence and CPV platforms that are (production) and built mostly on multivariate and statistical methods, with limited genuine deep-learning content under the branding. Thermo Fisher is more ecosystem than product: OSDPredict plus a web of OpenAI, NVIDIA, and TetraScience partnerships, mostly at the (pilot) / announcement stage.

The chromatography-data-system and QC corner belongs to Waters Empower, the dominant CDS, which has added ML-flavored anomaly detection on top of its deterministic ApexTrack peak integration — (production) as a CDS, with the ML as an additive feature rather than a reinvention.

The validation and regulated-content stack is where AI meets the paperwork. ValGenesis (VLMS validation lifecycle management, plus a "Smart GxP" / VAL AI platform) is (production) for digital validation, with its "80% faster" figures vendor-self-reported [11]. Veeva (Vault Quality/QMS, with quality AI agents on a 2026 roadmap) is (production) for regulated content management and (pilot) for its AI agents. This stack is where the most consequential near-term AI actually lands — drafting, review, and change control of GxP documents — and also where the regulator drew its first hard line, as the next section describes.

Hero diagram: a two-axis map of the biomanufacturing ML/AI vendor landscape. The horizontal axis runs from research on the left through pilot to production on the right; the vertical axis is the product category band, stacked from top to bottom: MVDA/PAT monitoring, mechanistic and hybrid modeling, GxP-AI SaaS and data layer, MES and automation backbone, and the validation and quality stack. Each vendor is a labeled pill placed by its honest maturity: Sartorius SIMCA, AspenTech ProMV and Emerson sit far right in the green production zone of the MVDA band; Cytiva GoSilico (marked mechanistic, not ML) and Yokogawa-owned Insilico sit production-to-pilot in the hybrid band, with independent DataHow placed at pilot and explicitly tagged not Sartorius; Aizon and TetraScience sit production with vendor-self-reported badges in the SaaS band, Ganymede at pilot; Korber PAS-X, Siemens, Rockwell, Emerson and AVEVA anchor the production end of the MES band with thin ML layers shown as dashed add-ons; MilliporeSigma Bio4C, Thermo Fisher, Waters Empower, ValGenesis and Veeva populate the lower bands. A vertical dashed line labeled the agentic frontier separates demonstrated production on the left from announced-but-not-demonstrated agentic marketing on the right, with most agentic offerings shown as hollow pills crossing to the right of the line.

The vendor map by honest maturity, not marketing: a production-grade core of multivariate monitoring, mechanistic and hybrid modeling, and computer vision, with the agentic-AI offerings drawn as hollow pills crossing the dashed frontier into territory that is announced but not yet demonstrated in routine GMP. Original diagram by the authors, created with AI assistance.

Agentic marketing versus demonstrated production

The defining tension of the 2026 market is the word agentic. Aizon's Agentic Studio, Korber's K.AI and B.R.A.I.N., Veeva's quality agents, Ganymede/Apprentice's scientific agents — nearly every vendor now sells an autonomous-AI story. The reality, consistent across the ISPE survey, the BioPhorum maturity model, and the regulatory record, is that demonstrated production AI clusters in four narrow places: multivariate monitoring, predictive maintenance, computer-vision inspection, and human-in-the-loop documentation. Autonomous control of a critical quality attribute is not on that list, and neither is an unsupervised agent generating GMP records [3].

Two anchors keep that distinction from being merely an opinion. First, the draft EU GMP Annex 22 (joint EU/PIC/S consultation, July to October 2025), the first manufacturing-specific AI rule, draws the line in regulation: for critical GMP applications it permits only static, deterministic models and excludes dynamic, continuously-learning, probabilistic, and generative AI/LLM models from critical use, requiring locked models with a predetermined change-control plan and human oversight [12]. It is a draft, expected to finalize around mid-2026, and the exclusion is provisional, but it tells a buyer exactly which "agentic" claims cannot legally touch a critical decision today. Second, the Purolea cGMP warning letter (2 April 2026), the FDA's first AI-citing warning letter, turned the gap into concrete enforcement: a firm used AI agents to generate specifications, SOPs, and master production records without quality-unit review, and the FDA cited it [13][14]. The lesson for reading a vendor slide is blunt: an "agentic" platform that promises to author GMP records autonomously is selling something the regulator has already named non-compliant. An agent that drafts while a human approves is a product; an agent that decides is a liability.
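The Annex 22 criteria as summarized above can be turned into a screening checklist for a vendor offering. The sketch below encodes this chapter's reading of the draft, not the regulatory text itself; the class, field, and function names are hypothetical, and the two example profiles are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical checklist record; field names are illustrative only."""
    name: str
    static: bool               # parameters locked, no learning while in use
    deterministic: bool        # identical input always yields identical output
    generative: bool           # generative AI or LLM
    change_control_plan: bool  # a predetermined change-control plan exists
    human_oversight: bool      # a qualified person reviews the outcome


def critical_use_blockers(model):
    """List the reasons a model fails the criteria summarized in the text."""
    checks = [
        (not model.static, "dynamic or continuously learning"),
        (not model.deterministic, "probabilistic output"),
        (model.generative, "generative AI / LLM"),
        (not model.change_control_plan, "no predetermined change-control plan"),
        (not model.human_oversight, "no human oversight"),
    ]
    return [reason for failed, reason in checks if failed]


# Two invented examples: a locked chemometric model and an autonomous agent.
locked_pls = ModelProfile("locked PLS soft sensor", True, True, False, True, True)
record_agent = ModelProfile("LLM record-authoring agent", False, False, True, False, False)

if __name__ == "__main__":
    for profile in (locked_pls, record_agent):
        blockers = critical_use_blockers(profile)
        summary = "; ".join(blockers) if blockers else "no blockers found"
        print(f"{profile.name}: {summary}")
```

An empty list means only that none of these five screens tripped; it is not a compliance determination, which remains the quality unit's call.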

The FDA's posture is best read from two documents: its 2023 discussion paper Artificial Intelligence in Drug Manufacturing under the CDER FRAME initiative, and its January 2025 draft framework for AI model credibility, a risk-based, seven-step "context of use" approach in which required scrutiny scales with how much the model influences a decision and how serious the consequence is [15][16]. A buyer can use that same risk lens on a vendor: the more a product touches a critical decision, the more it must show locked models, validation evidence, and a human in the loop, and the less a press release should be allowed to substitute for any of it.
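The two-factor risk lens can be written down as a small lookup. This is a sketch under assumptions: the three-level scale and the scoring rule are invented here for illustration and are not prescribed by the FDA draft, which describes the two factors but not a numeric matrix.

```python
LEVELS = ("low", "medium", "high")  # assumed ordinal scale for both factors


def model_risk(influence, consequence):
    """Combine model influence and decision consequence into one risk tier.

    Raises ValueError if either rating is not one of LEVELS.
    """
    score = LEVELS.index(influence) + LEVELS.index(consequence)  # 0 to 4
    if score >= 3:
        return "high"
    if score == 2:
        return "medium"
    return "low"


if __name__ == "__main__":
    # Invented examples of how a buyer might rate three kinds of product.
    examples = {
        "trend dashboard for engineers": ("low", "low"),
        "drafting assistant, human approves": ("low", "high"),
        "autonomous control of a CQA": ("high", "high"),
    }
    for product, (influence, consequence) in examples.items():
        print(f"{product}: {model_risk(influence, consequence)} risk")
```

The higher the tier, the more of the evidence listed above (locked model, validation package, human review) a buyer should ask to see before signing.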

```python
# examples/platform/ml/run_suite.py (illustrative wrapper over the Book-5 suite)
# The open-source counterpoint: every capability the vendors sell, runnable from
# the committed datasets, with no license and no service. Each row maps a market
# category to the suite module that implements its core method in the open.
from importlib import import_module

CAPABILITY_MAP = {
    # market category (what the vendors sell) -> (open suite module, method)
    "MVDA / golden-batch monitoring (SIMCA, ProMV)": ("mspc", "PCA + T2/SPE, contribution plots"),
    "Raman soft sensing (SIMCA, BioPAT, Insilico)": ("soft_sensor_pls", "PLS chemometrics, batch-split honest"),
    "deep soft sensing (research-tier)": ("soft_sensor_deep", "1D-CNN, beaten by PLS in small data"),
    "hybrid digital twin (DataHow, Insilico)": ("hybrid_model", "Monod kinetics + NN residual"),
    "computer-vision AVI (Stevanato, Syntegon)": ("vision_avi", "CNN reject classifier on fill events"),
    "predictive maintenance (Korber, Siemens)": ("pdm", "anomaly score on equipment signals"),
    "release / OOS prediction (Bio4C, iCPV)": ("release_predict", "classifier over the 6-batch release set"),
    "drift / MLOps (the validation gap)": ("drift", "PSI + residual drift on the soft sensor"),
}


def survey():
    """Import each mapped module and print the category-to-module table."""
    print(f"{'market category':48s} {'module':18s} open method")
    print("-" * 96)
    for category, (module, method) in CAPABILITY_MAP.items():
        # Raises ModuleNotFoundError if a mapped module is not importable,
        # so the table only prints for modules that actually load.
        import_module(module)
        print(f"{category:48s} {module:18s} {method}")
    print(f"\n{len(CAPABILITY_MAP)} vendor categories matched to open modules; "
          f"20 runnable modules total, 0 licenses, 0 services.")


if __name__ == "__main__":
    survey()
```
```
market category                                  module             open method
------------------------------------------------------------------------------------------------
MVDA / golden-batch monitoring (SIMCA, ProMV)    mspc               PCA + T2/SPE, contribution plots
Raman soft sensing (SIMCA, BioPAT, Insilico)     soft_sensor_pls    PLS chemometrics, batch-split honest
deep soft sensing (research-tier)                soft_sensor_deep   1D-CNN, beaten by PLS in small data
hybrid digital twin (DataHow, Insilico)          hybrid_model       Monod kinetics + NN residual
computer-vision AVI (Stevanato, Syntegon)        vision_avi         CNN reject classifier on fill events
predictive maintenance (Korber, Siemens)         pdm                anomaly score on equipment signals
release / OOS prediction (Bio4C, iCPV)           release_predict    classifier over the 6-batch release set
drift / MLOps (the validation gap)               drift              PSI + residual drift on the soft sensor

8 vendor categories matched to open modules; 20 runnable modules total, 0 licenses, 0 services.
```

The point of the mapping is not that the open suite replaces a validated commercial platform — it emphatically does not, as the MLOps chapter and every governance section in this book insist. The point is that the methods the vendors sell are, in almost every case, open and reproducible: PCA, PLS, gradient-boosted trees, a small CNN, a Monod-plus-residual hybrid. What you buy from a vendor is not usually a secret algorithm; it is the validated wrapper — the IQ/OQ/PQ package, the audit trail, the change control, the accountable support contract, the locked model with its predetermined change-control plan — around methods you can otherwise run for free. Knowing that is how you tell a fair price from a marketing premium.

Anatomy of one vendor claim, fact-checked

The series signature is to unpack one record field by field. Here the record is a vendor claim — the unit of currency in this whole market — and the dissection is the fact-check that turns a slide bullet into something a quality unit can actually rely on.

Anatomy figure: a labeled identity card titled vendor claim, fact-checked, dissecting the DataHow Bristol Myers Squibb hybrid-model claim. Rows from top to bottom: claimant DataHow (with a correction flag reading independent ETH Zurich spin-off, NOT Sartorius-owned); the customer Bristol Myers Squibb; the headline figure with two competing values shown side by side, a red vendor-page value of 22 percent accuracy and 3x fewer experiments, and a green peer-reviewed value of 33 percent accuracy and about half the data, with an arrow indicating prefer the peer-reviewed figure; an evidence-tier row reading peer-reviewed-self-authored, Biotechnology Journal 2024, DataHow plus BMS co-authored; a maturity row reading pilot, process-development scale 5 L, 12 CPPs, 18 CQAs, NOT GMP production; a scope row noting prediction accuracy versus a black-box baseline, not closed-loop control; and a verdict footer reading real method, self-authored evidence, pilot maturity, quote the journal not the product page.

One vendor claim, fully fact-checked: the claimant with its ownership corrected, the headline number shown in both its vendor-page and peer-reviewed forms, the evidence tier and maturity marker that bound how far the number can travel, and the verdict that turns a marketing figure into a defensible one. Original diagram by the authors, created with AI assistance.

Read the card top to bottom and the chapter's whole method is laid out as fields. The claimant carries an ownership correction, because half of misattributed claims start with the wrong company name — DataHow is independent; GoSilico is Cytiva; Insilico is Yokogawa. The headline figure is shown twice, in its vendor-page form and its peer-reviewed form, because the gap between them is the single most actionable thing on the card: prefer the journal's 33% and "about half the data" over the product page's 22% and "3x" [2]. The evidence tier and maturity marker bound how far the number is allowed to travel — peer-reviewed-self-authored and pilot mean "real, but at development scale and with the vendor as a co-author, not an independent replication at GMP scale." The scope row stops the most common over-read: a prediction-accuracy gain is not a yield gain and is not closed-loop control. The verdict footer is the deliverable: a sentence a quality unit can act on, with the number, its limits, and its source all attached. That is the difference between citing a market and being sold one.
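The card's fields map naturally onto a small record type. The sketch below is one hypothetical way to encode it: the class names, field names, and verdict wording are invented for illustration, while the DataHow/BMS values are the ones quoted in this chapter. The tier ordering follows the chapter's ranking, with press-release-only assumed weakest.

```python
from dataclasses import dataclass
from enum import IntEnum


class EvidenceTier(IntEnum):
    """Higher value means stronger evidence."""
    PRESS_RELEASE_ONLY = 1
    VENDOR_SELF_REPORTED = 2
    PEER_REVIEWED_SELF_AUTHORED = 3
    PEER_REVIEWED_INDEPENDENT = 4


class Maturity(IntEnum):
    RESEARCH = 1
    PILOT = 2
    PRODUCTION = 3


@dataclass(frozen=True)
class Figure:
    text: str
    tier: EvidenceTier
    source: str


@dataclass(frozen=True)
class VendorClaim:
    claimant: str
    ownership_note: str
    customer: str
    figures: tuple  # one Figure per published version of the headline
    maturity: Maturity
    scope: str

    def preferred_figure(self):
        """Pick the version of the headline with the strongest evidence tier."""
        return max(self.figures, key=lambda figure: figure.tier)

    def verdict(self):
        """One sentence carrying the number, its source, and its limits."""
        best = self.preferred_figure()
        tier = best.tier.name.lower().replace("_", "-")
        return (f"{self.claimant} / {self.customer}: quote \"{best.text}\" "
                f"({best.source}, {tier}); maturity: {self.maturity.name.lower()}; "
                f"scope: {self.scope}.")


card = VendorClaim(
    claimant="DataHow",
    ownership_note="independent ETH Zurich spin-off, not Sartorius-owned",
    customer="Bristol Myers Squibb",
    figures=(
        Figure("22% accuracy, 3x fewer experiments",
               EvidenceTier.VENDOR_SELF_REPORTED, "vendor page"),
        Figure("about 33% better accuracy with about half the data",
               EvidenceTier.PEER_REVIEWED_SELF_AUTHORED, "Biotechnology Journal 2024"),
    ),
    maturity=Maturity.PILOT,
    scope="prediction accuracy vs a black-box baseline, not closed-loop control",
)

if __name__ == "__main__":
    print(card.verdict())
```

Keeping the evidence tier and the maturity marker as separate fields mirrors the chapter's rule that neither label may borrow credibility from the other.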

The unsolved part: there is almost no independent evidence

The honest open problem of this whole chapter is that the market grades its own homework. Survey the evidence tiers across every vendor here and the pattern is stark: a thick band of vendor-self-reported product-page figures, a thin band of peer-reviewed-self-authored papers where the vendor or customer is a co-author, and, for commercial efficiency headlines, essentially no peer-reviewed-independent replication at all. The numbers that drive purchasing decisions (Korber's "up to 98%" right-first-time, DataHow's 80% fewer experiments, ValGenesis's 80% faster validation, Amgen's 95% auto-released, Aizon's 30% fewer deviations) are, every one of them, generated by a party with a stake in the result, on its own process, without an outside replication [1][5][10][11].

This is not (mostly) dishonesty; it is structural. Biomanufacturing processes are confidential, expensive, and idiosyncratic, so a head-to-head independent benchmark — the kind that grounds claims in computer vision or NLP — is almost impossible to run on a real GMP line. The consequence is that a buyer cannot resolve "is this 30% real?" by reading the literature; they can only resolve "is this party credible, is this figure peer-reviewed even if self-authored, and does the maturity marker match the deployment I need?" The field's missing institution is an independent, neutral benchmark for bioprocess models — a shared, realistic, public dataset and an agreed scoring protocol against which any vendor's method could be measured by someone with no stake in the answer. Until that exists, the most rigorous thing a buyer can do is exactly the anatomy above: demand the peer-reviewed figure over the product page, match maturity to use, and treat every headline percentage as a hypothesis, not a fact.

What this chapter adds to the model suite

This is a survey chapter, so it contributes no new modeling method; instead it reframes the entire examples/platform/ml/ suite as the open-source counterpoint to the proprietary market it maps. The run_suite.py wrapper above is the chapter's artifact: a capability map that pins each of eight vendor categories (MVDA monitoring, Raman soft sensing, deep soft sensing, hybrid twins, vision AVI, predictive maintenance, release prediction, drift/MLOps) to the open module that implements its core method (mspc.py, soft_sensor_pls.py, soft_sensor_deep.py, hybrid_model.py, vision_avi.py, pdm.py, release_predict.py, drift.py). All twenty runnable modules read the same committed datasets the rest of the book uses (raman_spectra.parquet, hplc_results.csv with the real BATCH-2026-004 OOS, fill_events.csv, and the simulator-backed cohort), run standalone with no services, and demonstrate, module by module, that the methods behind the vendor catalog are open. What the suite deliberately does not provide, and what the vendors genuinely sell, is the validated GxP wrapper around those methods: the locked model, the IQ/OQ/PQ package, the audit trail, the predetermined change-control plan, and the accountable support contract. The suite is the method made transparent; the validated wrapper is the work, and the MLOps chapter is where that work is specified.

Why it matters

A buyer who cannot read this market overpays for marketing and underinvests in the boring infrastructure — data readiness, validation, change control — that actually determines whether any of it works. The map in this chapter is a defense against three specific, expensive mistakes. The first is attribution error: shelving GoSilico under "AI" (it is mechanistic), crediting Sartorius with DataHow's hybrid models (DataHow is independent), or naming the wrong acquirer for Insilico (Yokogawa, not Cytiva) — each of which quietly distorts a build-versus-buy decision. The second is tier confusion: letting a production-grade platform's marketing percentage masquerade as established fact, when it is a self-reported figure on the vendor's own process. The third is the agentic over-buy: paying for autonomous-AI promises that the draft Annex 22 forbids in critical use and that the Purolea warning letter has already shown the regulator will cite. Get the map right and the spend follows the value: a proven MVDA or vision platform where the production evidence is real, an honest pilot where the technology is genuinely early, and human-in-the-loop guardrails everywhere a model touches a critical decision.

In the real world

The market is consolidating fast, and the consolidation map is itself a buyer's tool, because it changes the name on the contract and the support behind the product. The moves to keep straight: GoSilico went to Cytiva (2021); Insilico went to Yokogawa (2021); 908 Devices' bioprocessing analytics portfolio went to Repligen (2025); Ganymede folded into Apprentice.io; and Emerson absorbed AspenTech (a roughly fifteen-billion-dollar move) [6][7][8]. The structural gap underneath all of it is the one the open suite exists to illustrate: vendor platforms are proprietary, and reproducible open-source GMP-grade bioprocess ML tooling remains immature, with DeepChem and academic libraries dominating the open side. The strongest single production case in the whole market — deep-learning automated visual inspection, with Amgen reporting roughly 95% of syringes and vials auto-released — took years and direct conversations with the FDA to qualify, and even that headline is trade-press and vendor-self-reported [17]. That is the market in one sentence: the genuinely production-grade wins are narrow, hard-won, and older than the hype; the agentic future is real as a direction and oversold as a product; and the buyer's edge is the discipline to tell the two apart.

Key terms

  • Evidence tier — how a claim was established: peer-reviewed-independent, peer-reviewed-self-authored, vendor-self-reported, or press-release-only. The market is dominated by the last two.
  • Maturity marker — how far a product actually got: (production) in GMP/commercial, (pilot) demonstrated at scale, (research) academic/early. Independent of the evidence tier.
  • MVDA / MSPC incumbents — Sartorius SIMCA/Umetrics and AspenTech ProMV (Emerson): productized PCA/PLS monitoring, the genuinely production-grade core of the market.
  • Mechanistic vs hybrid vs ML — GoSilico (Cytiva) is mechanistic (physics, not learning); DataHow and Insilico (Yokogawa) are hybrid (physics plus a learned residual); pure ML is the rarest in production.
  • Attribution corrections — DataHow is independent (not Sartorius); GoSilico is Cytiva and mechanistic; Insilico is Yokogawa (not Cytiva).
  • GxP-AI SaaS / data layer — Aizon (regulated AI service), TetraScience and Ganymede (the AI-ready data layer that addresses the field's number-one barrier).
  • MES/automation backbone — Korber PAS-X, Siemens, Rockwell, Emerson, AVEVA: production-grade execution, control, and historian platforms onto which ML is layered, not native.
  • Agentic frontier — the line between demonstrated production AI (monitoring, maintenance, vision, human-in-the-loop docs) and announced-but-undemonstrated autonomous agents; drawn in regulation by draft Annex 22 and in enforcement by the Purolea warning letter.
  • Consolidation map — who bought whom (GoSilico to Cytiva, Insilico to Yokogawa, 908 Devices' bioprocessing analytics portfolio to Repligen, Ganymede to Apprentice, AspenTech to Emerson); changes the name on the contract.
  • Validated wrapper — the IQ/OQ/PQ, audit trail, change control, and locked model with a predetermined change-control plan that a vendor sells around methods that are otherwise open. The premium is the wrapper, not the algorithm.

Where this leads

The map tells you who sells what; it does not tell you whether the named deployments behind the brochures actually held up. The next chapter, Case Studies: Named Deployments and Their Evidence, takes the specific companies and numbers — Amgen Juncos, the DataHow/BMS hybrid model, WuXi's autonomous lab, Sanofi's yield analytics, the visual-inspection deployments — and grades each against its actual evidence, applying the very anatomy this chapter built to the headline stories the market tells about itself.