
Preface: Making the Same Medicine Twice

📍 Where we are: Right at the door. Before we take a single step in making a medicine, let us agree on what this book is really about, and why a biologic is, in effect, made twice.

Welcome. This is a book about something invisible. When a modern medicine is manufactured, two products come off the line. One you can hold: a clear liquid in a vial, a protein grown by living cells. The other you cannot hold at all: it is the complete record of how that liquid came to be, every measurement, signature, and test result that proves the vial contains exactly what the label says. This second product is data, and managing it well is the difference between a medicine that reaches a patient and one that does not.

You need no background to read this. If you have never set foot in a factory, never read a regulation, never thought about databases, you are exactly the reader we wrote for. We will define every specialized word the first time it appears, and again in a Key terms box at the end of each chapter.

The simple version

Imagine baking a cake for a contest where the judges do two things: they taste the cake and they read your notebook, with every ingredient weighed, every oven temperature, every minute timed, signed and dated as you went. A delicious cake is not enough on its own; if a page of the notebook is missing, you are disqualified no matter how perfect the cake. And a flawless notebook cannot rescue a cake that fails the taste test. In regulated medicine-making, the notebook is as much the product as the cake is, and both must pass. This book is about keeping that notebook.

What this chapter covers

A quick map of the preface: who the book is for and the promise we make to you; the central idea that data is a manufactured product in its own right; how the five parts of the book fit together; how this book relates to its companion process guide, From Cell to Cure; and the small set of conventions (citations, glossaries, bilingual text) that you will see on every page.

The promise: every claim is traceable

This is a popular book in tone and a textbook in rigor. Those two goals can fight each other, so we made a rule. Every claim that is not obvious (every number, every regulatory fact, every "studies show") carries a small bracketed marker like this [1]. Click it and you land on a single References page listing the exact peer-reviewed paper or regulatory document behind the statement, so you can always follow a non-obvious claim to its source and check it.

Data is the product's twin

Here is the idea the whole book turns on.

A biologic, a medicine made by living cells rather than by pure chemistry, is hard to make the same way twice. A typical example is a monoclonal antibody: one identical antibody protein, grown by the billions in engineered cells, that locks onto a single target molecule (an antigen) in the body. Because a biologic is grown rather than assembled, the way you make it helps define what it actually is. The regulatory answer to that challenge, called Quality by Design (QbD), treats a deep, recorded understanding of the process as just as essential to the product as the molecule itself: you identify which process settings are critical process parameters (CPPs) and which measurable product traits are critical quality attributes (CQAs), and you capture the data linking them [1]. In other words, modern medicine-making does not just produce a molecule; it produces knowledge about the molecule, written down.

That knowledge is enormous. A single batch leaves behind a continuous trail of sensor readings (temperature, oxygen, acidity, and more), captured in real time so that quality can be judged during manufacture rather than only after, an approach regulators call Process Analytical Technology (PAT) [2]. Each reading is a small, structured fact. A single temperature point from a bioreactor might be stored as tag=BR101.Temp.PV, value=37.2, unit=°C, timestamp=2026-06-14T14:32:07Z, status=OK: a named tag, a value, a unit, the exact instant, and a quality flag. The tag itself is hierarchical. BR101 names the bioreactor, Temp the measurement, and PV marks the live process value, a naming style loosely modeled on the ISA-88 / ISA-95 equipment hierarchy and common in control systems and process historians. Millions of such facts accumulate. Add the laboratory test results, the step-by-step batch records, and the human signatures, and you have a second artifact running in parallel with the medicine itself. We call it the molecule's data shadow.
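To make the structure of one such fact concrete, here is a minimal Python sketch that parses the example reading above into a typed record. The `ProcessValue` class, the `parse_reading` helper, the comma-separated `key=value` text format, and the three-part tag split are illustrative assumptions, not the schema of any particular historian, MES, or standard.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)  # frozen: a reading cannot be modified after it is created
class ProcessValue:
    """One sensor reading. Hypothetical layout, not any vendor's schema."""
    tag: str             # hierarchical name, e.g. "BR101.Temp.PV"
    value: float         # the measured number
    unit: str            # engineering unit, e.g. "°C"
    timestamp: datetime  # the exact instant, timezone-aware (UTC here)
    status: str          # quality flag, e.g. "OK"

    def tag_parts(self) -> dict:
        # Assumes a three-level convention: equipment.measurement.attribute
        equipment, measurement, attribute = self.tag.split(".")
        return {"equipment": equipment, "measurement": measurement, "attribute": attribute}


def parse_reading(text: str) -> ProcessValue:
    # Hypothetical flat text format: comma-separated key=value pairs
    fields = dict(part.strip().split("=", 1) for part in text.split(","))
    return ProcessValue(
        tag=fields["tag"],
        value=float(fields["value"]),
        unit=fields["unit"],
        # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
        # so it is rewritten as an explicit UTC offset first
        timestamp=datetime.fromisoformat(fields["timestamp"].replace("Z", "+00:00")),
        status=fields["status"],
    )


reading = parse_reading(
    "tag=BR101.Temp.PV, value=37.2, unit=°C, "
    "timestamp=2026-06-14T14:32:07Z, status=OK"
)
print(reading.tag_parts())
# {'equipment': 'BR101', 'measurement': 'Temp', 'attribute': 'PV'}
print(reading.value, reading.unit, reading.status)
# 37.2 °C OK
```

The record is frozen so that a reading cannot be changed once created, a small nod to the expectation that captured data stays exactly as it was recorded.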

Figure: One process, two products. Living cells in a bioreactor produce both the molecule in the vial and its data shadow, which are made together, judged together, and released together; both must pass before a batch is released. Original diagram by the authors, created with AI assistance.

The hard consequence is this: in a regulated process, a batch that was not recorded is treated as if it never happened. If the data proving a batch is good is missing, incomplete, or untrustworthy, the medicine cannot be released, even if the molecule in the tank is perfect. The data is not paperwork about the product. It is a product.
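The release rule in this paragraph can be sketched as a simple gate, assuming a batch can be summarised by whether its quality tests passed, which records it needed, which were actually captured, and whether those records are trusted. `BatchRecord`, `can_release`, the record names, and the lot IDs are all hypothetical; this illustrates the logic only, not how any real quality system is built.

```python
from dataclasses import dataclass, field


@dataclass
class BatchRecord:
    """Hypothetical summary of one batch: its test results plus its data shadow."""
    batch_id: str
    qc_tests_passed: bool                               # did the molecule pass its tests?
    required_records: set = field(default_factory=set)  # records the process demands
    present_records: set = field(default_factory=set)   # records actually captured
    data_trusted: bool = False                          # e.g. signatures and audit checks OK


def missing_records(batch: BatchRecord) -> set:
    # Anything required but not captured is a gap in the data shadow
    return batch.required_records - batch.present_records


def can_release(batch: BatchRecord) -> bool:
    # Both products must pass: a complete, trusted record AND a passing molecule
    data_ok = not missing_records(batch) and batch.data_trusted
    return batch.qc_tests_passed and data_ok


required = {"sensor_traces", "batch_record", "lab_results", "signatures"}

# A perfect molecule whose sensor traces were lost
lot_a = BatchRecord("LOT-A", qc_tests_passed=True, required_records=required,
                    present_records=required - {"sensor_traces"}, data_trusted=True)
print(can_release(lot_a), missing_records(lot_a))
# False {'sensor_traces'}
```

The gate is symmetric: a lot with a complete, trusted record but failing quality tests is rejected just the same, matching the contest analogy where both the cake and the notebook must pass.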

How to read this book: the five parts

The book is one continuous argument, told in five parts. You can read straight through, or jump to the part you need.

  1. Part I: Why data is the product's twin. The idea you just met, unpacked: what a batch's data shadow contains, and why an undocumented batch did not happen.
  2. Part II: Sources, systems and architecture. Where the data is born (sensors, instruments, people) and the systems that capture and store it across a plant.
  3. Part III: Integrity, compliance and validation. The rules that make data trustworthy enough to bet a patient's safety on, and how that trust is proven.
  4. Part IV: Semantics and the digital thread. Making data not just stored but meaningful: shared vocabularies and connections so that a number means the same thing to every system and person who reads it.
  5. Part V: Analytics and the future. What becomes possible once the data is clean, connected, and trustworthy, from process insight to the factories of tomorrow.

A thread runs through all five: the FAIR principles, a widely adopted set of guiding principles holding that good scientific data should be Findable, Accessible, Interoperable, and Reusable, rather than locked in a drawer or in a format only one machine can read [3]. Keep FAIR in mind; we return to it often.

If you want the biology first

This book has a companion: From Cell to Cure, a beginner's guide to the actual physical process of making a biologic. It covers choosing a target, building the cells, growing them in bioreactors (the warm tanks where living cells produce the medicine), purifying the result, and filling it into vials. That guide answers how the medicine is made. This book answers how the data is made and managed.

You do not need it to follow along; we will reintroduce each process step as it becomes relevant. But if you would rather meet the biology and the equipment before the data, start with From Cell to Cure: the two volumes are designed to be read side by side.

Note

Throughout, we use the standard commercial way of making biologics, one batch at a time, to teach the basics, and point out where the modern, more continuous approach differs. In the United States, NIIMBL (the National Institute for Innovation in Manufacturing Biopharmaceuticals) and its SABRE pilot facility, now being built at the University of Delaware, are intended to advance that modern path; you will meet them in the real-world sections.

A few conventions

These appear on every page, so it is worth knowing them once.

  • Citations. Inline markers like [2] link to the References page. The visible number is local to each chapter and restarts at [1] in every chapter.
  • Key terms. Each chapter ends with a short glossary of the terms it introduced, so you never have to scroll back.
  • Admonitions. Coloured boxes flag the helpful asides: a tip for the plain-English analogy near the top, a note for useful context, and a caution where a misunderstanding could genuinely cause harm.
  • Bilingual. The book is published in English and Korean (한국어), so a reader can follow it in either language.
  • Trademarks. Product and company names mentioned in this book (including but not limited to Siemens, gPROMS, AspenTech, Aspen Hybrid Models, DataHow, DataHowLab, OPC UA, GAMP, PI System, Sartorius, Thermo Fisher, Cytiva, Waters, Agilent, AVEVA, OSIsoft, InfoPlus.21) may be trademarks or registered trademarks of their respective owners and are used for identification and editorial purposes only, with no claim of endorsement.

Caution

This book teaches how to think about data management in medicine-making. It is not regulatory advice, and it is not a validated procedure. Real manufacturing decisions must follow current official guidance and your organization's approved processes.

Why it matters

If you remember one thing, make it this: managing the data is not the clerical tail of manufacturing; it is half the manufacturing. The molecule and its data shadow are made together, judged together, and released together. Treat the data as an afterthought and you risk a batch that is physically fine but legally and scientifically unreleasable. Treat it as a product (designed, built, and quality-checked with the same care as the molecule) and everything downstream, from regulatory approval to patient trust, rests on solid ground. Every chapter that follows is, at bottom, about earning that trust.

In the real world​

This is not a theoretical concern. The Quality by Design framework that anchors Part I was set out for medicine-making in the ICH guidelines Q8(R2), Q9, and Q10, which reframe process data as core to the product rather than a by-product [1]. The push toward real-time, in-process measurement (PAT) likewise came from regulators seeking to build quality in during manufacture instead of testing for it at the end, and was codified in the FDA's 2004 PAT guidance [2]. And the FAIR principles that shape how we store and connect data originated in the broader scientific community and have since spread across industry and government [3].

These ideas are realized on real equipment and real software. The bioreactors that grow the cells come from vendors such as Sartorius, Thermo Fisher, and Cytiva; the chromatography systems that purify the product come from Cytiva, Waters, and Agilent. Each instrument's continuous measurements (temperature, pH, and dissolved oxygen in a bioreactor; UV absorbance, conductivity, and pressure in a chromatography system) become the dense sensor streams that form part of the data shadow. Those streams do not vanish into the ether: they are captured by process historians (such as the AVEVA/OSIsoft PI System or AspenTech InfoPlus.21), tied to the production steps orchestrated by a Manufacturing Execution System (MES), and joined by lab results held in a Laboratory Information Management System (LIMS). And because patients depend on it, that whole data shadow is governed by binding rules: electronic records and electronic signatures must satisfy the FDA's 21 CFR Part 11 and Annex 11 of the EU GMP guidelines, while computer systems are validated following guidance such as ISPE's GAMP 5. These are the regulations and frameworks this book returns to throughout.

Public-private efforts such as NIIMBL (the U.S. National Institute for Innovation in Manufacturing Biopharmaceuticals) carry these ideas into modern biomanufacturing. Its SABRE pilot facility, now being built at the University of Delaware to support that work, is a cGMP plant for scaling up and de-risking manufacturing innovations and for training the workforce; cGMP (current Good Manufacturing Practice) is the binding set of quality rules for making medicine. The themes of this book are the live, daily concerns of the people who make medicine.

Key terms

  • Biologic: a medicine made by living cells rather than by pure chemistry.
  • Monoclonal antibody (mAb): one identical antibody protein, grown in vast numbers by engineered cells, that binds to a single target molecule (an antigen) in the body.
  • Bioreactor: the warm tank in which living cells are grown to produce the medicine.
  • Batch / lot: one complete manufacturing run that produces a defined amount of product.
  • Data shadow: the complete body of recorded data (sensor traces, test results, batch records, signatures) generated alongside a batch.
  • Quality by Design (QbD): a framework that treats recorded process understanding as essential to the product itself.
  • Critical process parameter (CPP): a process setting that must be controlled because it affects product quality.
  • Critical quality attribute (CQA): a measurable product trait that must stay within limits for the medicine to be safe and effective.
  • Process Analytical Technology (PAT): measuring critical attributes in real time, during manufacture, rather than only after.
  • FAIR principles: the guiding principles that data should be Findable, Accessible, Interoperable, and Reusable.
  • NIIMBL / SABRE: a U.S. public-private institute and its cGMP pilot facility, now being built at the University of Delaware, advancing modern biomanufacturing.
  • cGMP (current Good Manufacturing Practice): the legally binding quality rules a facility must follow to manufacture medicine.
  • References page: the single page where every inline citation marker resolves to its source.

Where this leads

We have claimed that every gram of medicine casts a data shadow as essential as the molecule itself. The next chapter, The Biologic and Its Data Shadow, makes that concrete: it walks through the scale and the many types of data a single batch generates (the sensor traces, the batch records, the test results, the signatures) and introduces the documentation imperative at the heart of regulated manufacturing, the rule that has guided this industry for decades: if it isn't documented, it didn't happen.