Skip to main content

Data Across Jurisdictions: FDA, EU, PIC/S, NMPA, PMDA, MFDS

๐Ÿ“ Where we are: Part VI ยท Operating at Scale โ€” Chapter 23. The platform is built, trustworthy, and validated. Now it has to satisfy more than one regulator at the same time: a single CHO mAb line whose data is reviewed by the FDA, the EU, China's NMPA, Japan's PMDA, and Korea's MFDS. This chapter encodes residency, retention, and cross-border-transfer rules as policy-as-data and enforces the hard ones โ€” China's data localization especially โ€” directly in PostgreSQL with row-level security.

The simple version

Think of one warehouse storing parcels for five different countries. Most rules are shared โ€” every parcel must be labeled, sealed, and kept for years โ€” so you can run one warehouse, not five. But a few rules are non-negotiable and local: China says "parcels addressed to China physically stay in China, and you may not hand them to a foreign official without permission." So you paint a line on the floor, give each loading dock a country stamp, and wire the doors so a worker badged for the EU dock literally cannot open the China cage. That floor line, those door locks, and the "keep for N years, then shred" schedule are exactly what we build in this chapter โ€” in a database, where the locks are row-level security and the schedule is a retention clock.

What this chapter coversโ€‹

For twenty-two chapters we built one platform. This chapter asks the question that breaks naive designs: what happens when one platform must answer to six regulators with overlapping-but-not-identical rules? We will:

  • separate what is shared across regulators (an ALCOA+ data-integrity baseline, audit-trail review, long retention) from what is per-region (exact retention spans, residency, cross-border transfer);
  • encode the per-region rules as policy-as-data โ€” a gov.jurisdiction_policy table the application, the retention clock, and the policy engine all read from one source of truth;
  • enforce data residency with PostgreSQL row-level security (RLS), so a session badged for one region cannot see or write another region's rows;
  • run a retention clock as a queryable view that surfaces exactly which records have aged past their region's window;
  • and face the architectural wall honestly: China's NMPA-plus-PIPL/DSL localization regime is the one place where "one platform, many regions" stops being a software trick and becomes a deployment decision.

Every snippet here comes from two real, tested files: the governance schema in examples/platform/db/40-gov.sql, which the PostgreSQL container applies automatically when it first initializes (the ../db directory is mounted into /docker-entrypoint-initdb.d, where Postgres runs the 00โ€“60 schema files in order), and the residency/retention logic plus demonstration in examples/chapters/23-multi-jurisdiction/residency.sql. The policy rows, the RLS objects, and every output you will see are produced by running residency.sql against the live PostgreSQL 17 service on a laptop:

docker exec -i -e PGPASSWORD=bioproc sensor-to-submission-postgres-1 \
psql -U bioproc -d bioproc -q < chapters/23-multi-jurisdiction/residency.sql

That one command seeds the jurisdiction policy, creates the regulated-record table and its row-level-security policy, inserts the per-region demo records, and runs the session queries shown below โ€” so you can reproduce every row.

The landscape: converged, but not identicalโ€‹

A CHO + Protein A mAb made for a global market is inspected by a crowd. The good news is that the crowd mostly agrees. Most of the world's drug-GMP inspectorates โ€” the US FDA, the EU member-state authorities, Japan's PMDA, and Korea's MFDS among them โ€” are PIC/S Participating Authorities, which means they have aligned their data-management expectations on a shared baseline [1]. That baseline is PIC/S PI 041, the guide that turns ALCOA+ data integrity, audit-trail review, access control, and retention into a common inspection language [2]. The MHRA's data-integrity guidance โ€” explicitly harmonized with PIC/S, WHO, OECD, and EMA โ€” says the same thing in British English [3]. And above all of it sits ICH, whose lifecycle-management framework (Q12 and its established-conditions concept) is adopted across FDA, EU, PMDA, and MFDS, so the scientific model of the product is genuinely shared [4].

That convergence is why a single platform is even thinkable. The audit trail, hash chain, and e-signature work from Chapters 20 and 21 satisfy the shared ALCOA+ core for every PIC/S member at once. We do not build five audit systems.

The bad news is in the gaps. Three things refuse to converge, and they are exactly the three this chapter must encode:

Rule that divergesUS (FDA)EUChina (NMPA)Japan (PMDA)Korea (MFDS)
Retention spanโ‰ฅ1 yr past expiry; ~10 yr typical [5]until 1 yr past expiry or 5 yr past QP certification, whichever is longer [6]long (NMPA GMP)long (PMDA)long (MFDS)
Residencyglobal_okin_region (GDPR-shaped)in_region (mandatory) [7]global_okin_region
Cross-border transferretrievable copy OK [5]adequacy / SCCsassessment + consent; no foreign-authority handover [8]low frictionconsent-based

The retention difference is not academic. The EU rule โ€” one year past expiry or five years past the Qualified Person's certification, whichever is longer โ€” can be strictly longer than the US "one year past expiry," because QP certification happens at release [6]. Hard-coding "ten years" would quietly under-retain an EU batch. So retention has to be data, not a constant in a script.

Policy-as-data: the jurisdiction tableโ€‹

The cleanest way to serve many regulators is to stop scattering their rules through code and put them in one table that everything reads. That table is defined once, in examples/platform/db/40-gov.sql:

-- Per-region residency/retention policy as data (Ch 23); OPA reads this too.
CREATE TABLE gov.jurisdiction_policy (
region text NOT NULL, -- US | EU | CN | JP | KR
data_class text NOT NULL, -- gmp_record | personal | telemetry
residency text NOT NULL, -- in_region | global_ok
retention_days int NOT NULL,
PRIMARY KEY (region, data_class)
);

Four columns carry the whole multi-jurisdiction story. region and data_class form the key, because a country can rule batch records and personal data differently (China's PIPL is about personal information; its DSL graded-data rules are about "important data" โ€” the same region, two classes). residency decides whether data may leave: in_region or global_ok. retention_days is the shred-clock. Because this is a table, a regulatory change is a one-row UPDATE under change control (Chapter 24), not a code release โ€” and the same row is read by the application and the retention clock, and is designed to be read by an external policy engine (the OPA hook discussed at the end of the chapter), so the components that share it cannot drift apart.

The chapter file then seeds it with real spans, in examples/chapters/23-multi-jurisdiction/residency.sql:

-- retention policy as data (per region + data class), seeded with real spans
INSERT INTO gov.jurisdiction_policy (region, data_class, residency, retention_days) VALUES
('US', 'gmp_record', 'global_ok', 3650), -- 21 CFR 211: >=1 yr past expiry; ~10 yr typical
('EU', 'gmp_record', 'in_region', 3650), -- Annex 11 / GMP retention
('CN', 'gmp_record', 'in_region', 3650), -- NMPA + data-localization (PIPL/DSL)
('JP', 'gmp_record', 'global_ok', 1825), -- PMDA
('KR', 'gmp_record', 'in_region', 1825) -- MFDS
ON CONFLICT (region, data_class) DO UPDATE
SET residency = EXCLUDED.residency, retention_days = EXCLUDED.retention_days;

The ON CONFLICT ... DO UPDATE (an upsert) makes the seed idempotent: re-running it updates a region's span rather than erroring, which is exactly what you want when a rule changes. Querying the table after running residency.sql (the command above) gives the policy the rest of the system obeys:

region | data_class | residency | retention_days
--------+------------+-----------+----------------
CN | gmp_record | in_region | 3650
EU | gmp_record | in_region | 3650
JP | gmp_record | global_ok | 1825
KR | gmp_record | in_region | 1825
US | gmp_record | global_ok | 3650
(5 rows)

These spans are deliberately conservative round numbers (3650 days โ‰ˆ 10 years; 1825 โ‰ˆ 5 years) chosen to be defensible, not to litigate every clause โ€” the point the chapter teaches is the mechanism, and the real numbers live in your validated SOP. What matters is that they are looked up, not assumed.

Residency by construction: row-level securityโ€‹

Retention is a clock. Residency is a wall โ€” and a wall is only real if a session physically cannot step over it. PostgreSQL's row-level security gives us that wall inside the database: a policy attached to a table filters every query so a session sees only the rows a USING expression admits, with the database enforcing it on SELECT, INSERT, UPDATE, and DELETE rather than trusting the application to remember [9]. Here is the regulated-record table and its policy, from examples/chapters/23-multi-jurisdiction/residency.sql:

-- regulated records carry the region that owns them
CREATE TABLE IF NOT EXISTS gov.regulated_record (
record_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
region text NOT NULL, -- US | EU | CN | JP | KR
batch_id text,
data_class text NOT NULL DEFAULT 'gmp_record',
created_ts timestamptz NOT NULL DEFAULT now(),
payload jsonb NOT NULL DEFAULT '{}'
);

ALTER TABLE gov.regulated_record ENABLE ROW LEVEL SECURITY;
ALTER TABLE gov.regulated_record FORCE ROW LEVEL SECURITY;

DROP POLICY IF EXISTS region_isolation ON gov.regulated_record;
CREATE POLICY region_isolation ON gov.regulated_record
USING (region = current_setting('app.region', true)
OR current_setting('app.region', true) = 'GLOBAL');

Two design choices carry real weight. FORCE ROW LEVEL SECURITY makes the policy apply even to the table owner โ€” without it, the user who owns the table is exempt, which would punch a hole straight through the wall. And the USING expression keys off current_setting('app.region', true), a per-session variable the application sets at connection time with SET app.region = 'EU'. The true argument means "return NULL, don't error, if it's unset" โ€” so a connection that forgets to declare its region sees nothing, which is the safe default. The 'GLOBAL' escape hatch is for a privileged audit/QA role that legitimately needs to review across regions.

There is one honest subtlety the file's comments call out: a PostgreSQL superuser, or any role created with BYPASSRLS, ignores row security entirely. RLS only protects you if your application connects as a least-privilege role. So the demo creates exactly that โ€” app_rls, a NOSUPERUSER NOBYPASSRLS role โ€” and runs as it. Same honesty as the audit chapter: the control binds the application and the ordinary user; it does not defeat the platform's own administrator, and that gap is closed by operational segregation of duties, not SQL.

A single PostgreSQL regulated_record table with five colored region bands (US, EU, CN, JP, KR); three application sessions badged EU, CN, and GLOBAL connect through a row-level-security gate that filters each session to only its region&#39;s rows, while a retention-clock arrow sweeps the table and flags an over-age CN record for purge; the CN band is drawn inside a heavier locked boundary labelled in-region / no foreign handover.

One table, many regulators: row-level security filters each session to the region it is badged for (GLOBAL is the cross-region audit role), the retention clock sweeps for over-age records per region, and China's in_region band sits behind a hard residency boundary that software alone cannot relax. Original diagram by the authors, created with AI assistance.

Watching the wall holdโ€‹

The mechanism is only convincing if you can see it work. After seeding one record per region (plus one deliberately ancient CN record, BATCH-2015-099, given an explicit created_ts of 2015-07-02 so its retention math is checkable), we connect as app_rls, declare a region, and look. An EU session:

SET ROLE app_rls;
SET app.region = 'EU';
SELECT record_id, region, batch_id FROM gov.regulated_record ORDER BY record_id;
record_id | region | batch_id
-----------+--------+----------------
2 | EU | BATCH-2026-002
(1 row)

One row โ€” the EU's. The US, CN, JP, and KR rows are not "hidden by the UI"; they do not exist as far as this session's SQL is concerned. Switch the same role to a China session and the view changes completely:

SET app.region = 'CN';
SELECT record_id, region, batch_id FROM gov.regulated_record ORDER BY record_id;
record_id | region | batch_id
-----------+--------+----------------
3 | CN | BATCH-2026-003
6 | CN | BATCH-2015-099
(2 rows)

Now the real test โ€” can a session smuggle data across the wall? A China-badged session tries to write an EU record:

SET app.region = 'CN';
INSERT INTO gov.regulated_record (region, batch_id) VALUES ('EU','SNEAKY-001');
ERROR: new row violates row-level security policy for table "regulated_record"

The database refuses. Under FORCE ROW LEVEL SECURITY, the USING expression is applied as the implicit write check, so a session can only insert rows that match its own region โ€” residency is enforced on the way in, not just on the way out. This is the difference between a policy you document and a policy the system cannot violate.

The retention clockโ€‹

Residency decides where data lives; retention decides when it dies. Because the spans are data, the clock is just a join against gov.jurisdiction_policy. The chapter file defines it as a view, in examples/chapters/23-multi-jurisdiction/residency.sql:

-- find records past their region's retention window (the clock job acts on these)
CREATE OR REPLACE VIEW gov.v_retention_due AS
SELECT r.record_id, r.region, r.batch_id, r.data_class, r.created_ts, p.retention_days,
(r.created_ts + (p.retention_days || ' days')::interval)::date AS purge_after
FROM gov.regulated_record r
JOIN gov.jurisdiction_policy p
ON p.region = r.region AND p.data_class = r.data_class
WHERE now() > r.created_ts + (p.retention_days || ' days')::interval;

The view computes each record's purge_after date โ€” its creation timestamp plus the region's retention_days rendered as an interval, then cast to a plain date so the output reads as a calendar day โ€” and returns only the records whose date is already in the past. Querying it surfaces precisely the records a scheduled job (a cron-driven psql call, or a service) should act on:

record_id | region | batch_id | retention_days | purge_after
-----------+--------+----------------+----------------+-------------
6 | CN | BATCH-2015-099 | 3650 | 2025-06-29
(1 row)

The 2015 CN record has aged past its 10-year window (created_ts 2015-07-02 + 3650 days = purge-after 2025-06-29, today being mid-2026) and is flagged; every other record is still within its region's span and is correctly left alone. Crucially, a view names the work but does not do it โ€” and that is deliberate. GMP retention is "keep until," not "auto-delete the instant the clock strikes." A real deletion of a regulated record runs under change control with QA sign-off, and the act of deletion is itself an audited, hash-chained event (Chapter 20). The clock's job is to make the candidate set explicit and reviewable, not to silently destroy records.

At the volumes a high-rate historian produces, the efficient way to execute a purge is not row-by-row DELETE but PostgreSQL declarative range partitioning: partition the underlying time-series by time (and, where residency demands, by region), so retiring a window of expired data is a near-instant DETACH/DROP of a whole partition rather than a costly scan-and-delete [10]. The view tells you what is due; partitioning is how you drop it cheaply.

Why it mattersโ€‹

Get multi-jurisdiction wrong and the failure modes are expensive in opposite directions. Under-retain โ€” shred an EU batch record at the US ten-year mark when the QP-certification clock demanded longer โ€” and you have destroyed a GMP record an inspector can still ask for [6]. Over-share โ€” let a China-resident record sync to a US cloud region, or hand it to a foreign authority during a routine request โ€” and you have breached China's PIPL and DSL, which carry penalties an order of magnitude beyond a GMP observation [8][7].

The architectural lesson is that the shared 80% and the divergent 20% want different treatment. The shared ALCOA+ baseline is best solved once, deeply, with the audit and signature machinery already built โ€” there is no value in five copies. The divergent rules are best solved as data plus a wall: a policy table any component can read, and an RLS boundary the database enforces regardless of application bugs. Encoding the rules this way turns "we have a procedure for that" into "the system cannot do otherwise," which is the difference between a binder and a control.

In the real worldโ€‹

Here is the honest reckoning. Most of this chapter genuinely works in pure open source, and works well. PostgreSQL RLS is a mature, battle-tested control โ€” the same mechanism multi-tenant SaaS products lean on โ€” and using it for data residency is squarely within its design [9]. Policy-as-data plus a retention view is simple, auditable, and version-controlled. For the FDA, EU, PMDA, and MFDS โ€” converged PIC/S members sharing an ALCOA+ baseline โ€” one validated platform with per-region policy rows is a defensible architecture [2][1].

China is where "one platform" hits a wall that no SQL policy can climb. PIPL requires a security assessment, certification, or standard contract before personal information leaves China, plus separate consent โ€” and, pointedly, bars handing over China-stored data to a foreign judicial or law-enforcement authority without PRC approval [8]. The DSL layers a graded "important data" regime with its own outbound-management controls on top [7]. RLS can stop a query from reading CN rows, but it cannot stop your backups, replication, and disaster-recovery from copying the bytes to another country โ€” and that copy is the violation. The realistic answer is architectural: a separate in-China deployment of the stack (its own PostgreSQL, object store, and backups, all physically inside China), with only de-identified or aggregated, transfer-approved data crossing the border. The residency = 'in_region' flag is the trigger for that decision; the decision itself is a second cluster. Pure OSS does not make the requirement disappear โ€” it just makes the boundary explicit and the same software portable to both sides.

This is also where the cross-border decision outgrows a single SQL USING clause. "May this record move from CN to a US analytics workspace?" depends on region, data class, consent, and transfer mechanism together โ€” a policy richer than a row filter. The natural next step is to externalize that decision into a dedicated policy engine such as Open Policy Agent (OPA), reading the very same gov.jurisdiction_policy table so the database and the engine share one source of truth; OPA ships under Apache-2.0, so there is no license trap in adopting it. The schema is deliberately designed for this โ€” the comment in 40-gov.sql notes the policy table is meant to be read by OPA "too" โ€” but, to be clear, this chapter's code does not ship an OPA service or a Rego policy: there is no opa container in the companion compose.yaml and no .rego file in the repo. The table is the honest, present-tense foundation; the Rego policy that would consume it is left as an illustrative extension, not a running part of the stack.

The grounding case is NIIMBL's SABRE facility โ€” the NIIMBL / University of Delaware pilot-scale cGMP (current Good Manufacturing Practice) plant that broke ground in April 2024. SABRE is a facility, not a data standard or a jurisdiction program; but it is exactly the kind of multi-partner, technology-transfer-heavy line whose process data may be consumed by collaborators and regulators across borders, which is precisely the setting where residency-by-construction and policy-as-data earn their keep. The takeaway for a builder: the open-source stack gets you a genuinely strong, inspection-aligned residency-and-retention layer for the converged regulators โ€” and an honest, software-portable boundary for the one (China) that legally cannot be served from a single global deployment.

Key termsโ€‹

  • Jurisdiction / regulator โ€” a legal authority (FDA, EU member states, NMPA, PMDA, MFDS) whose GMP and data rules a record must satisfy.
  • PIC/S โ€” the Pharmaceutical Inspection Co-operation Scheme; its PI 041 guide harmonizes data-integrity expectations across participating inspectorates, which is why one ALCOA+ baseline serves many of them.
  • Policy-as-data โ€” encoding regulatory rules (residency, retention) as rows in gov.jurisdiction_policy that the app and the retention clock read (and that an external policy engine such as OPA is designed to read), rather than hard-coding them.
  • Data residency โ€” the requirement that certain data physically remain within a region; enforced here by row-level security and, for China, by a separate in-region deployment.
  • Row-level security (RLS) โ€” a PostgreSQL feature where a CREATE POLICY USING expression filters every query by session context (app.region); FORCE ROW LEVEL SECURITY extends it to the table owner.
  • BYPASSRLS / superuser โ€” roles that ignore RLS entirely; the reason the application must connect as a least-privilege role (app_rls) and segregation of duties is still required.
  • Retention period โ€” how long a record must be kept (retention_days), differing per region; the EU's "1 yr past expiry or 5 yr past QP certification, whichever is longer" can exceed the US span.
  • Retention clock โ€” the gov.v_retention_due view that surfaces records past their window for reviewed deletion, not automatic destruction.
  • PIPL / DSL โ€” China's Personal Information Protection Law and Data Security Law; together they impose data localization and cross-border-transfer controls that force a separate in-China deployment.
  • Cross-border transfer โ€” moving data between jurisdictions; permitted freely for some regions, gated by assessment/consent/approval for China.

Where this leadsโ€‹

Residency and retention assume the platform itself holds still โ€” but in a working plant, recipes get revised, skids get swapped, and data formats evolve, and every one of those changes must happen without breaking the audit trail, the genealogy, or the region tagging we just built. In Chapter 24 โ€” Managing Change: Process Changes, Equipment Swaps & Schema Evolution, we treat change as a first-class data problem: effective-dated recipe versions, swapping a bioreactor or instrument while preserving lot genealogy, and migrating a changed data format with reversible, verified migrations under change control โ€” so the platform can move forward without losing a single link in the chain of trust.