Data Integrity and ALCOA+

📍 Where we are: We have learned to move data between systems with structure and meaning; now we ask the harder question: can we trust the record once it arrives?

The previous chapter, Connectivity and Interoperability Standards, showed how biomanufacturing data flows between machines and partners (OPC UA, MTP, SiLA 2, AnIML and Allotrope, B2MML/ISA-95) and drew a sharp line between moving bytes and preserving meaning. But there is a deeper line still. A record can arrive perfectly formatted, semantically rich, and completely untrustworthy. If a sensor reading was quietly edited, if a failing test was repeated until it passed, if two operators shared one login so nobody knows who did what, then the data is a polished lie. This chapter is about making data true, not just transmittable.

The simple version

Think of data integrity like the chain of custody for evidence in a courtroom. It is not enough that the evidence exists: you have to prove who handled it, when, that it was never tampered with, and that what the jury sees is the genuine article. A batch record is the evidence that a medicine was made correctly. Data integrity is the chain of custody that makes that evidence believable.

What this chapter covers

We will define data integrity and why regulators treat it as the foundation of product quality. We will unpack ALCOA+, the nine attributes of trustworthy data, with concrete bioprocess examples. We will tour the common ways data integrity fails, the global wave of guidance that converged to address it, and the idea of building integrity in by design rather than policing it after the fact.

What data integrity means, and why it is the foundation

Data integrity is the degree to which data is complete, consistent, and accurate throughout its entire life, from the moment it is created, through processing and storage, to eventual archiving and retrieval [2]. Regulators do not treat this as a clerical nicety. They treat it as the bedrock of product quality, because the evidence that a medicine was made safely is, in the end, the data generated while making it. A patient cannot inspect an antibody. A regulator cannot re-run a year of manufacturing. Both must rely on the record. If the record cannot be trusted, then no claim about safety or efficacy can be trusted either [1].

The World Health Organization frames it plainly: assuring data integrity requires appropriate quality and risk management systems, and good documentation practices, applied across the whole data life cycle [4]. The European Medicines Agency goes further, calling data integrity a fundamental requirement of the pharmaceutical quality system that applies equally to paper and electronic records, and a responsibility of senior management and the organization's quality culture [5].

ALCOA and ALCOA+: the anatomy of trustworthy data

The acronym ALCOA was coined inside the U.S. FDA in the 1990s by Stan Woollen, an official working on Good Laboratory Practice (the rules, under 21 CFR Part 58, for non-clinical safety studies), as a personal mnemonic to organize his own presentations on what good data looks like. It began as a data-quality aid, and later guidance adopted it as the backbone of data integrity [8]. It stands for Attributable, Legible, Contemporaneous, Original, and Accurate. Later guidance added four more attributes (Complete, Consistent, Enduring, and Available), giving ALCOA+ [2]. Here is each one, grounded in a bioreactor.

  • Attributable: you can tell who did or recorded something, and when. When an operator adjusts the feed rate, the record must name that person. This is why shared logins are forbidden, and why an electronic signature (legally equivalent to a handwritten one when the required controls are met) has to show, in plain readable form, the signer's printed name, the date and time, and the meaning of the signature (such as reviewed, approved, or authored) [3].
  • Legible: the record is readable and permanent, and stays so for its whole retention period. A pencil entry that can be erased, or a file format no one can open in ten years, fails this test [7].
  • Contemporaneous: the record is made at the time the work happens, not reconstructed from memory hours later. A pH reading is logged when it is taken, not back-filled at end of shift [2].
  • Original: the data is the first capture (or a verified "true copy" of it), not a transcription. The chromatography instrument's own raw data file is original; a number hand-copied into a notebook is not [1].
  • Accurate: the data is correct, truthful, and free of errors, reflecting what actually happened [4].

The "plus" attributes round out the picture: Complete (nothing deleted, including failed runs and repeats), Consistent (events in true chronological order with synchronized clocks), Enduring (recorded on durable media, not a sticky note), and Available (retrievable for review throughout its lifetime) [2].
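
A few of these attributes can be checked mechanically from a record's metadata. The following is a minimal sketch, not a validated implementation: the `Reading` fields, the `alcoa_gaps` function, the example file path, and the five-minute tolerance are all hypothetical choices made for illustration (no guidance cited here sets a numeric limit). Attributes such as Accurate and Legible cannot be verified this way and are deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Reading:
    """One process reading plus the metadata ALCOA+ asks for (hypothetical schema)."""
    tag: str
    value: float
    unit: str
    observed_at: datetime        # when the measurement happened
    recorded_at: datetime        # when it was written to the record
    recorded_by: Optional[str]   # individual user id, never a shared account
    source_file: Optional[str]   # pointer to the raw (original) data


def alcoa_gaps(reading: Reading,
               max_delay: timedelta = timedelta(minutes=5)) -> List[str]:
    """Return the ALCOA gaps that metadata alone can reveal."""
    gaps = []
    if not reading.recorded_by:
        gaps.append("Attributable: no individual user recorded")
    if reading.recorded_at - reading.observed_at > max_delay:
        gaps.append("Contemporaneous: recorded long after the observation")
    if not reading.source_file:
        gaps.append("Original: no link to the raw data")
    return gaps


# A pH value back-filled at end of shift, with no user and no raw file.
backfilled = Reading(
    tag="BR101.pH", value=7.02, unit="pH",
    observed_at=datetime(2026, 6, 13, 14, 0, 0),
    recorded_at=datetime(2026, 6, 13, 22, 0, 0),
    recorded_by=None, source_file=None,
)
for gap in alcoa_gaps(backfilled):
    print(gap)
```

A check like this only finds missing or implausible metadata; it says nothing about whether the value itself is true, which is why procedural controls still matter.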

The mechanism that ties Attributable and Contemporaneous together, and protects the Original by logging every change to it, is the audit trail: a secure, computer-generated, time-stamped record of who did what, when, and (where relevant) why. It is protected so it cannot be switched off or quietly altered, and any change to it is itself recorded [3]. Concretely, a single audit-trail entry records the field changed, its old and new values, who made the change, the exact timestamp, and the reason. For example: tag BR101.Temp.SP changed from 37.0 °C to 36.5 °C by j.okoye on 2026-06-13 14:07:22 UTC; reason: deviation DEV-2206 correction. An audit trail turns "trust me" into "check the log."
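
The audit-trail entry described above can be sketched as a small append-only log. This is an illustration under stated assumptions rather than a definitive design: the names `AuditEntry`, `AuditTrail`, and `prev_hash` are hypothetical, and chaining each entry to the hash of the previous one is a common tamper-evidence technique chosen for this sketch, not something the cited guidance prescribes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One immutable audit-trail entry (hypothetical field names)."""
    field: str
    old_value: str
    new_value: str
    user: str
    timestamp: str   # ISO 8601, UTC
    reason: str
    prev_hash: str   # digest of the previous entry, chaining the log

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class AuditTrail:
    GENESIS = "0" * 64  # placeholder hash for the first entry

    def __init__(self) -> None:
        self._entries = []

    def record(self, field, old_value, new_value, user, reason, timestamp=None):
        """Append an entry; there is no method to edit or delete one."""
        if not user or not reason:
            raise ValueError("an audit entry needs a named user and a reason")
        prev_hash = self._entries[-1].digest() if self._entries else self.GENESIS
        stamp = timestamp or datetime.now(timezone.utc).isoformat()
        entry = AuditEntry(field, str(old_value), str(new_value),
                           user, stamp, reason, prev_hash)
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; a rewritten entry breaks the link that follows it."""
        expected = self.GENESIS
        for entry in self._entries:
            if entry.prev_hash != expected:
                return False
            expected = entry.digest()
        return True


trail = AuditTrail()
trail.record("BR101.Temp.SP", 37.0, 36.5, "j.okoye",
             "deviation DEV-2206 correction", "2026-06-13T14:07:22+00:00")
trail.record("BR101.Temp.SP", 36.5, 37.0, "j.okoye",
             "return to nominal setpoint", "2026-06-13T16:30:00+00:00")
print(trail.verify())  # True: the chain is intact

# Simulate someone quietly rewriting the first entry behind the API.
trail._entries[0] = replace(trail._entries[0], new_value="36.0")
print(trail.verify())  # False: the second entry no longer matches
```

One limitation of this sketch is that a change to the most recent entry goes unnoticed until another entry is appended. A production design would also need to anchor the latest digest somewhere the application itself cannot rewrite.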

The ALCOA+ life cycle: every event is captured, attributed, and reviewable, so the final record can be trusted. Original diagram by the authors, created with AI assistance.

A controlled cleanroom. Just as the physical environment is controlled to protect the product, data controls protect the integrity of the record that proves it. Image: "Cleanroom" by UCL Mathematical and Physical Sciences, CC BY 2.0, via Wikimedia Commons.

How data integrity fails

Knowing the rules makes the failure modes easier to spot. Regulatory guidance and years of inspection findings point to a recurring cast of problems [1][3]:

  • Shared logins. When several people use one account, attribution collapses: you cannot tell who made an entry, defeating Attributable [1].
  • Disabled or unreviewed audit trails. If the audit trail is turned off, or switched on but never examined, changes go unseen. Guidance now expects audit trails to be reviewed with the same rigor as the data itself [1].
  • "Testing into compliance." Repeating a test until a passing result appears, then reporting only that result and discarding the failures. FDA explicitly prohibits this; deleting the failing runs violates Complete and Original [1].
  • Orphaned data. Results recorded under no batch, no sample, or no clear owner: data with no home that cannot be reconciled with the official record [3].
  • Clock manipulation. Changing a system clock to back-date an entry, or running unsynchronized clocks so the true order of events is unknowable, attacks Contemporaneous and Consistent [3].

Caution: Many of these failures are not dramatic fraud. They start as shortcuts under deadline pressure: one shared password "just for tonight," one re-run "because the column was equilibrating." A healthy data-integrity culture makes the honest path the easy path, so the shortcut is never tempting [7].
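
Several of the failure modes listed above leave traces that a routine review script can surface. The sketch below is illustrative only: the record layout (`result_id`, `batch_id`, `sample_id`, `timestamp`) and the function names are assumptions, and a flag from any of these checks is a prompt for investigation, not proof of wrongdoing.

```python
from collections import defaultdict
from datetime import datetime


def find_orphans(results):
    """Results with no batch or no sample link (orphaned data)."""
    return [r["result_id"] for r in results
            if not r.get("batch_id") or not r.get("sample_id")]


def find_unreported_repeats(results, reported_ids):
    """Samples run more than once where some runs never reached the report."""
    runs_by_sample = defaultdict(list)
    for r in results:
        if r.get("sample_id"):
            runs_by_sample[r["sample_id"]].append(r["result_id"])
    flagged = {}
    for sample, runs in runs_by_sample.items():
        hidden = [rid for rid in runs if rid not in reported_ids]
        if len(runs) > 1 and hidden:
            flagged[sample] = hidden
    return flagged


def find_time_reversals(results):
    """Entries stamped earlier than something already stored before them.

    Assumes `results` is in storage (sequence) order and that timestamps
    are ISO 8601 strings with an explicit UTC offset.
    """
    flagged = []
    latest = None
    for r in results:
        stamp = datetime.fromisoformat(r["timestamp"])
        if latest is not None and stamp < latest:
            flagged.append(r["result_id"])
        latest = stamp if latest is None else max(latest, stamp)
    return flagged


results = [
    {"result_id": "R1", "batch_id": "B-001", "sample_id": "S-10",
     "timestamp": "2026-06-13T09:00:00+00:00"},
    {"result_id": "R2", "batch_id": "B-001", "sample_id": "S-10",
     "timestamp": "2026-06-13T09:40:00+00:00"},
    {"result_id": "R3", "batch_id": None, "sample_id": "S-11",
     "timestamp": "2026-06-13T09:20:00+00:00"},
]
reported = {"R2"}

print(find_orphans(results))                       # ['R3']
print(find_unreported_repeats(results, reported))  # {'S-10': ['R1']}
print(find_time_reversals(results))                # ['R3']
```

Checks like these only work if failed runs and their timestamps are still in the system, which is one practical reason Complete and Consistent matter.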

The guidance wave that reshaped the industry

In the 2010s, a string of high-profile data-integrity findings prompted regulators worldwide to publish convergent guidance. The result is a remarkably consistent global expectation [6]:

| Guidance | Body | Year |
| --- | --- | --- |
| Questions and Answers: GMP Data Integrity [5] | EMA (EU) | 2016 |
| Data Integrity and Compliance with Drug CGMP: Q&A [1] | FDA (US) | 2018 |
| 'GXP' Data Integrity Guidance and Definitions [2] | MHRA (UK) | 2018 |
| Guideline on Data Integrity (TRS 1033, Annex 4) [4] | WHO | 2021 |
| Good Practices for Data Management and Integrity (PI 041-1) [3] | PIC/S | 2021 |

These guidance documents do not stand alone: they interpret and reinforce the binding regulations beneath them. In the United States, electronic records and electronic signatures are governed by 21 CFR Part 11; in the European Union, computerized systems used in manufacturing fall under EU GMP Annex 11. The data-integrity guidance explains how to satisfy those rules in practice, which is why the two are usually read together (the next chapter takes up Part 11 and Annex 11 directly).

Note: Although these documents come from different agencies, they say strikingly similar things: define ALCOA+, govern data across its full life cycle, apply controls commensurate with risk (more rigor for data that affects patient safety), and treat integrity as a quality-culture responsibility led from the top [3][4]. A company that satisfies one is largely aligned with all.

Integrity by design

The most important shift in modern thinking is that integrity should be built in, not inspected in [6]. The guidance distinguishes two layers of control. Technical (system) controls are enforced by the computer itself: unique user accounts, role-based access, synchronized system clocks (time sync), audit trails that cannot be switched off, and routine audit-trail review [2]. Behavioral (procedural) controls are enforced by people: training, good documentation practices, and an open culture where staff feel safe reporting errors rather than hiding them [7]. Technical controls are stronger because they do not depend on willpower, but neither layer alone is enough [3].
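
A technical control, in this sense, is a rule the software enforces on every call rather than a rule people are asked to remember. The sketch below shows the idea under hypothetical names: `ROLE_PERMISSIONS`, `User`, `authorize`, and `change_setpoint` are illustrative and do not come from any particular system. The point is structural: the only path to a setpoint change passes through the access check and the audit write.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical role-to-permission map; a real site would define its own.
ROLE_PERMISSIONS = {
    "operator": {"record_value"},
    "supervisor": {"record_value", "change_setpoint"},
    "qa_reviewer": {"review_audit_trail"},
}


class AccessDenied(Exception):
    """Raised when a technical control blocks an action."""


@dataclass(frozen=True)
class User:
    username: str
    role: str
    shared_account: bool = False


def authorize(user: User, action: str) -> None:
    """Unique accounts and role-based access, enforced in code."""
    if user.shared_account:
        raise AccessDenied(f"{user.username}: shared accounts may not act on records")
    if action not in ROLE_PERMISSIONS.get(user.role, set()):
        raise AccessDenied(f"{user.username} ({user.role}) may not {action}")


def change_setpoint(user, tag, old, new, reason, audit_log, now=None):
    """Apply a change only if it is authorized, justified, and logged."""
    authorize(user, "change_setpoint")
    if not reason.strip():
        raise ValueError("a reason for the change is required")
    stamp = now or datetime.now(timezone.utc)  # stands in for a synchronized clock
    audit_log.append({
        "tag": tag, "old": old, "new": new,
        "user": user.username, "timestamp": stamp.isoformat(), "reason": reason,
    })
    return new


log = []
supervisor = User("j.okoye", "supervisor")
change_setpoint(supervisor, "BR101.Temp.SP", 37.0, 36.5,
                "deviation DEV-2206 correction", log)

try:
    change_setpoint(User("nightshift", "supervisor", shared_account=True),
                    "BR101.Temp.SP", 36.5, 37.0, "restore", log)
except AccessDenied as err:
    print(err)
```

Because the audit write sits inside `change_setpoint`, there is no flag to switch it off. Training and culture still decide whether the reason typed into the log is honest, which is why the procedural layer remains necessary.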

Why it matters

For data management, ALCOA+ is not abstract philosophy; it is a concrete specification for how systems must be built and operated. Every database, historian (the time-series database that archives process data), and laboratory system in the regulated world inherits these requirements: it must attribute actions to individuals, time-stamp them against a trusted clock, preserve originals, and keep an immutable audit trail [6]. Designing a data architecture without ALCOA+ in mind guarantees expensive rework, or a record that cannot be defended when a regulator asks, "How do you know this is true?"

In the real world

This is why interoperability and integrity must be designed together. A standard like AnIML or Allotrope can carry an instrument's raw data and its full context, helping satisfy Original and Complete, but only if the pipeline never silently transforms or drops fields along the way [6]. The U.S. NIIMBL institute's real-time lab-data integration work (including the NIIMBL–NIST proof of concept that streams live instrument data against a shared ontology) faces exactly this tension: pulling data live from many instruments and partners while preserving attribution, time order, and the unbroken audit trail that ALCOA+ demands. Real-time data is only an asset if it is real-time trustworthy data.
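
One way to catch a pipeline that silently transforms or drops fields is to compare what left the instrument with what arrived. The sketch below makes two simplifying assumptions: that a matching SHA-256 checksum is an acceptable test for a bit-for-bit copy of a raw file (guidance definitions of a "true copy" are broader than byte identity), and that records can be compared as nested dictionaries. The function names and the sample record are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Checksum of a raw file's bytes."""
    return hashlib.sha256(data).hexdigest()


def is_identical_copy(original: bytes, copy: bytes) -> bool:
    """True when the copy is bit-for-bit the same as the original."""
    return sha256_hex(original) == sha256_hex(copy)


def compare_records(source: dict, delivered: dict, prefix: str = "") -> dict:
    """List dotted paths that the pipeline dropped or changed.

    Fields that exist only in `delivered` are ignored; the concern here
    is loss or alteration of what the instrument produced.
    """
    report = {"missing": [], "changed": []}
    for key, value in source.items():
        path = f"{prefix}{key}"
        if key not in delivered:
            report["missing"].append(path)
        elif isinstance(value, dict) and isinstance(delivered[key], dict):
            nested = compare_records(value, delivered[key], prefix=f"{path}.")
            report["missing"] += nested["missing"]
            report["changed"] += nested["changed"]
        elif value != delivered[key]:
            report["changed"].append(path)
    return report


at_instrument = {
    "sample": {"id": "S-10", "operator": "j.okoye"},
    "value": 7.02,
    "unit": "pH",
}
after_pipeline = {
    "sample": {"id": "S-10"},   # operator was dropped: attribution lost
    "value": 7.0,               # value was rounded: no longer the original
    "unit": "pH",
}

print(compare_records(at_instrument, after_pipeline))
# {'missing': ['sample.operator'], 'changed': ['value']}
```

Running a comparison like this at each hand-off turns "the pipeline preserves the data" from an assumption into something that can be evidenced.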

Key terms

  • Data integrity: the degree to which data is complete, consistent, and accurate throughout its entire life cycle.
  • ALCOA: Attributable, Legible, Contemporaneous, Original, Accurate; the original five attributes of good data.
  • ALCOA+: ALCOA plus Complete, Consistent, Enduring, and Available.
  • Audit trail: a secure, time-stamped, computer-generated log of who did what, when, and why, protected so it cannot be switched off or quietly altered.
  • Electronic signature: a computer entry, legally equivalent to a handwritten signature, that shows the signer's printed name, the date and time, and the meaning of the signing (reviewed, approved, or authored).
  • Good Documentation Practices: the behavioral rules for recording data correctly and contemporaneously.
  • Technical controls: integrity safeguards enforced by the system (unique logins, time sync, audit trails).
  • Behavioral controls: integrity safeguards enforced by people (training, culture, procedures).
  • Testing into compliance: the prohibited practice of repeating a test until it passes and reporting only the pass.
  • Risk-commensurate controls: applying more rigorous controls to data of greater impact on patient safety.

Where this leads

Audit trails, originals, and attributed signatures only carry weight if the law treats an electronic record as equal to a signed piece of paper. The next chapter, Records, Signatures, and the Law: 21 CFR Part 11 and EU Annex 11, turns ALCOA+ from good practice into legal obligation, explaining what makes an electronic record and an electronic signature trustworthy and legally equivalent to ink on paper, and how closed and open systems, audit trails, copies, retention, and validation make that equivalence hold.