Data Across Jurisdictions: FDA, EU, PIC/S, NMPA, PMDA, MFDS

📍 Where we are: Part VI · Operating at Scale, Chapter 26. The platform is built, trustworthy, and validated. Now it has to satisfy more than one regulator at the same time: a single CHO (Chinese Hamster Ovary cell) mAb (monoclonal antibody) line, meaning one engineered, master-banked cell population making one product, whose data is reviewed by five drug regulators (each spelled out under Key terms): the US FDA, the EU authorities, China's NMPA, Japan's PMDA, and Korea's MFDS. This chapter encodes residency, retention, and cross-border-transfer rules as policy-as-data and enforces the hard ones directly in PostgreSQL with row-level security. Chief among the hard ones is China's data localization, the requirement that certain data physically stay inside the country.

The simple version

Think of one warehouse storing parcels for five different countries. Most rules are shared — every parcel must be labeled, sealed, and kept for years — so you can run one warehouse, not five. But a few rules are non-negotiable and local: China says "parcels addressed to China physically stay in China, and you may not hand them to a foreign official without permission." So you paint a line on the floor, give each loading dock a country stamp, and wire the doors so a worker badged for the EU dock literally cannot open the China cage. That floor line, those door locks, and the "keep for N years, then shred" schedule are exactly what we build in this chapter — in a database, where the locks are row-level security and the schedule is a retention clock.

What this chapter covers

Up to this point we have built one platform. This chapter asks the question that breaks naive designs: what happens when one platform must answer to five regulators with overlapping-but-not-identical rules? We will:

separate what is shared across regulators from what is per-region. Shared: an ALCOA+ data-integrity baseline (the regulator expectation that records be Attributable, Legible, Contemporaneous, Original, and Accurate, plus Complete, Consistent, Enduring, and Available), audit-trail review, and long retention. Per-region: exact retention spans, residency, and cross-border transfer;
encode the per-region rules as policy-as-data — a gov.jurisdiction_policy table the application, the retention clock, and the policy engine all read from one source of truth;
enforce data residency with PostgreSQL row-level security (RLS), so a session badged for one region cannot see or write another region's rows;
run a retention clock as a queryable view that surfaces exactly which records have aged past their region's window;
and face the architectural wall honestly: China's NMPA-plus-PIPL/DSL localization regime (PIPL, the Personal Information Protection Law; DSL, the Data Security Law) is the one place where "one platform, many regions" stops being a software trick and becomes a deployment decision — where deployment means where and how the software actually runs (one shared cluster, or a second one physically inside China), as opposed to a change you make in the code alone.

Every snippet here comes from two real, tested files. The first is the governance schema in examples/platform/db/40-gov.sql, which the PostgreSQL container (the isolated, self-contained PostgreSQL instance the companion stack runs) applies automatically when it first initializes. The ../db directory of schema files is mounted (made visible) inside the container at the special path /docker-entrypoint-initdb.d, the folder where Postgres, on first startup, runs every .sql file it finds in filename order, which is why the files are numbered 00–60. The second is the residency/retention logic plus demonstration in examples/chapters/23-multi-jurisdiction/residency.sql. The policy rows, the RLS objects, and every output you will see are produced by running residency.sql against the live PostgreSQL 17 service on a laptop:

docker exec -i -e PGPASSWORD=bioproc bioprocess-data-stack-postgres-1 \
  psql -U bioproc -d bioproc -q < chapters/23-multi-jurisdiction/residency.sql

That one command runs inside the database container (docker exec), opens PostgreSQL's command-line client (psql) as the bioproc user against the bioproc database, and feeds it the residency.sql file (the < redirect). It seeds the jurisdiction policy, creates the regulated-record table and its row-level-security policy, inserts the per-region demo records, and runs the session queries shown below — so you can reproduce every row.

The landscape: converged, but not identical

A CHO + Protein A mAb made for a global market is inspected by a crowd. (The product is an antibody grown in CHO cells and captured on a Protein A purification column, the biologic whose manufacture Book 1 walks through.) The good news is that the crowd mostly agrees. Most of the world's drug-GMP inspectorates, the US FDA, the EU member-state authorities, Japan's PMDA, and Korea's MFDS among them, are PIC/S (the Pharmaceutical Inspection Co-operation Scheme) Participating Authorities, which means they have aligned their data-management expectations on a shared baseline [7]. That baseline is PIC/S PI 041, a published inspection guide identified by that document code, which turns ALCOA+ data integrity, audit-trail review, access control, and retention into a common inspection language [3]. The UK MHRA's data-integrity guidance, explicitly harmonized with PIC/S, WHO, OECD, and EMA, says the same thing in British English [4]. And above all of it sits ICH (the International Council for Harmonisation), whose lifecycle-management framework (its Q12 guideline and the "established conditions" it defines) is adopted across FDA, EU, PMDA, and MFDS, so the scientific model of the product is genuinely shared [8].

That convergence is why a single platform is even thinkable. The audit trail, hash chain, and e-signature work from Chapters 23 and 24 satisfy the shared ALCOA+ core for every PIC/S member at once. We do not build five audit systems.

The bad news is in the gaps. Three things refuse to converge, and they are exactly the three this chapter must encode:

Rule that diverges	US (FDA)	EU	China (NMPA)	Japan (PMDA)	Korea (MFDS)
Retention span	≥1 yr past expiry; ~10 yr typical [1]	until 1 yr past expiry or 5 yr past QP certification, whichever is longer [2]	~10 yr (NMPA GMP)	~5 yr (PMDA)	~5 yr (MFDS)
Residency	global_ok	in_region (GDPR-shaped)	in_region (mandatory) [6]	global_ok	in_region
Cross-border transfer	retrievable copy OK [1]	adequacy / SCCs	assessment + consent; no foreign-authority handover [5]	low friction	consent-based

Residency values are the literal policy-table flags: global_ok = the data may leave its region; in_region = it must physically stay. The retention spans are the deliberately conservative round numbers seeded as data below (about 10 years = 3650 days; about 5 years = 1825 days); the exact statutory number lives in your validated SOP. Japan and Korea are seeded at the shorter five-year span here — concrete, not vague — so the table agrees with the policy rows the rest of the system reads.

The retention difference is not academic. The EU rule (one year past expiry or five years past the Qualified Person's certification, whichever is longer) can be strictly longer than the US "one year past expiry." QP certification, the EU sign-off by the legally accountable Qualified Person that releases a finished batch for sale, happens at release, so for a product whose shelf life is shorter than about four years the five-year certification clock is the one that ends later [2]. Hard-coding one region's rule, or one flat span such as "ten years," for every region would quietly under-retain any batch whose own region demands longer. So retention has to be data, not a constant in a script. (The residency column's "GDPR-shaped" note refers to the EU's General Data Protection Regulation, the law that shapes where personal data may reside; "adequacy / SCCs" are its two routes for moving that data abroad: an adequacy decision for trusted countries, or Standard Contractual Clauses where none exists.)
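To make the "whichever is longer" arithmetic concrete, here is a minimal Python sketch. The batch dates are hypothetical, and the two functions encode only the rules as summarized in the table above, not the statutory text; the helper names are illustrative and do not come from the companion code.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole calendar years (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def us_keep_until(expiry: date) -> date:
    # US rule as summarized above: at least one year past expiry.
    return add_years(expiry, 1)


def eu_keep_until(expiry: date, qp_certified: date) -> date:
    # EU rule as summarized above: the later of the two clocks wins.
    return max(add_years(expiry, 1), add_years(qp_certified, 5))


# Hypothetical batch: QP-certified at release, two-year shelf life.
qp_certified = date(2026, 1, 10)
expiry = date(2028, 1, 10)

print(us_keep_until(expiry))                # 2029-01-10
print(eu_keep_until(expiry, qp_certified))  # 2031-01-10
```

For this hypothetical batch the EU clock runs two years past the US one, which is the gap a single hard-coded rule would miss.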

Policy-as-data: the jurisdiction table

The cleanest way to serve many regulators is to stop scattering their rules through code and put them in one table that everything reads. That table is defined once, in examples/platform/db/40-gov.sql:

-- Per-region residency/retention policy as data (Ch 26); OPA reads this too.
CREATE TABLE gov.jurisdiction_policy (
    region        text NOT NULL,            -- US | EU | CN | JP | KR
    data_class    text NOT NULL,            -- gmp_record | personal | telemetry
    residency     text NOT NULL,            -- in_region | global_ok
    retention_days int  NOT NULL,
    PRIMARY KEY (region, data_class)
);

Four columns carry the whole multi-jurisdiction story. region and data_class form the key, because a country can rule batch records and personal data differently (China's PIPL is about personal information; its DSL graded-data rules are about "important data" — the same region, two classes). residency decides whether data may leave: in_region or global_ok. retention_days is the shred-clock. Because this is a table, a regulatory change is a one-row UPDATE under change control (Chapter 27), not a code release — and the same row is read by the application and the retention clock, and is designed to be read by an external policy engine (the OPA hook discussed at the end of the chapter), so the components that share it cannot drift apart.

The chapter file then seeds it with real spans, in examples/chapters/23-multi-jurisdiction/residency.sql:

-- retention policy as data (per region + data class), seeded with real spans
INSERT INTO gov.jurisdiction_policy (region, data_class, residency, retention_days) VALUES
    ('US', 'gmp_record', 'global_ok',  3650),   -- 21 CFR 211: >=1 yr past expiry; ~10 yr typical
    ('EU', 'gmp_record', 'in_region',  3650),   -- Annex 11 / GMP retention
    ('CN', 'gmp_record', 'in_region',  3650),   -- NMPA + data-localization (PIPL/DSL)
    ('JP', 'gmp_record', 'global_ok',  1825),   -- PMDA
    ('KR', 'gmp_record', 'in_region',  1825)    -- MFDS
    ON CONFLICT (region, data_class) DO UPDATE
        SET residency = EXCLUDED.residency, retention_days = EXCLUDED.retention_days;

The ON CONFLICT ... DO UPDATE (an upsert) makes the seed idempotent: re-running it updates a region's span rather than erroring, which is exactly what you want when a rule changes. Querying the table after running residency.sql (the command above) gives the policy the rest of the system obeys:

 region | data_class | residency | retention_days
--------+------------+-----------+----------------
 CN     | gmp_record | in_region |           3650
 EU     | gmp_record | in_region |           3650
 JP     | gmp_record | global_ok |           1825
 KR     | gmp_record | in_region |           1825
 US     | gmp_record | global_ok |           3650
(5 rows)

These spans are deliberately conservative round numbers (3650 days ≈ 10 years; 1825 ≈ 5 years) chosen to be defensible, not to litigate every clause — the point the chapter teaches is the mechanism, and the real numbers live in your validated SOP. What matters is that they are looked up, not assumed.
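One way a component might consume these rows is sketched below in Python. The in-memory POLICY dict stands in for the result of querying gov.jurisdiction_policy, and Policy and lookup_policy are hypothetical names, not part of the companion code. The sketch only illustrates the "looked up, not assumed" habit.

```python
from typing import NamedTuple


class Policy(NamedTuple):
    residency: str        # 'in_region' or 'global_ok'
    retention_days: int


# Stand-in for the rows of gov.jurisdiction_policy (values copied from the seed).
POLICY = {
    ("US", "gmp_record"): Policy("global_ok", 3650),
    ("EU", "gmp_record"): Policy("in_region", 3650),
    ("CN", "gmp_record"): Policy("in_region", 3650),
    ("JP", "gmp_record"): Policy("global_ok", 1825),
    ("KR", "gmp_record"): Policy("in_region", 1825),
}


def lookup_policy(region: str, data_class: str) -> Policy:
    """Return the policy row, refusing to guess when no row exists."""
    try:
        return POLICY[(region, data_class)]
    except KeyError:
        raise LookupError(f"no policy for {region}/{data_class}") from None


print(lookup_policy("JP", "gmp_record").retention_days)  # 1825
print(lookup_policy("CN", "gmp_record").residency)       # in_region
```

A missing (region, data_class) pair raises instead of falling back to a default, so an unseeded data class surfaces as an error rather than as a silently wrong retention span.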

Residency by construction: row-level security

Retention is a clock. Residency is a wall — and a wall is only real if a session physically cannot step over it. PostgreSQL's row-level security gives us that wall inside the database: a policy attached to a table filters every query so a session (one authenticated database connection) sees only the rows a USING expression admits, with the database enforcing it on SELECT, INSERT, UPDATE, and DELETE rather than trusting the application to remember [9]. Here is the regulated-record table and its policy, from examples/chapters/23-multi-jurisdiction/residency.sql:

-- regulated records carry the region that owns them
CREATE TABLE IF NOT EXISTS gov.regulated_record (
    record_id  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    region     text NOT NULL,                 -- US | EU | CN | JP | KR
    batch_id   text,
    data_class text NOT NULL DEFAULT 'gmp_record',
    created_ts timestamptz NOT NULL DEFAULT now(),
    payload    jsonb NOT NULL DEFAULT '{}'
);

ALTER TABLE gov.regulated_record ENABLE ROW LEVEL SECURITY;
ALTER TABLE gov.regulated_record FORCE ROW LEVEL SECURITY;

DROP POLICY IF EXISTS region_isolation ON gov.regulated_record;
CREATE POLICY region_isolation ON gov.regulated_record
    USING (region = current_setting('app.region', true)
           OR current_setting('app.region', true) = 'GLOBAL');

Two design choices carry real weight. FORCE ROW LEVEL SECURITY makes the policy apply even to the table owner — without it, the user who owns the table is exempt, which would punch a hole straight through the wall. And the USING expression keys off current_setting('app.region', true), a per-session variable the application sets at connection time with SET app.region = 'EU'. The true argument means "return NULL, don't error, if it's unset" — so a connection that forgets to declare its region sees nothing, which is the safe default. The 'GLOBAL' escape hatch is for a privileged audit/QA role that legitimately needs to review across regions.
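The "unset means no rows" behavior follows from SQL's three-valued logic, which is easy to misread. The Python sketch below mimics how the USING expression evaluates, with None playing the role of NULL; sql_eq, sql_or, and row_visible are illustrative names, and the sketch models only the predicate, not PostgreSQL itself.

```python
from typing import Optional


def sql_eq(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """SQL '=': comparing with NULL (None) yields NULL, not False."""
    if a is None or b is None:
        return None
    return a == b


def sql_or(x: Optional[bool], y: Optional[bool]) -> Optional[bool]:
    """SQL OR under three-valued logic."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


def row_visible(row_region: str, app_region: Optional[str]) -> bool:
    """Mirror of the USING expression; only a TRUE result admits the row."""
    verdict = sql_or(sql_eq(row_region, app_region), sql_eq(app_region, "GLOBAL"))
    return verdict is True


print(row_visible("EU", "EU"))      # True
print(row_visible("CN", "EU"))      # False
print(row_visible("CN", "GLOBAL"))  # True
print(row_visible("CN", None))      # False
```

With app.region unset, both comparisons evaluate to NULL, the OR is NULL, and a NULL verdict admits nothing.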

There is one honest subtlety the file's comments call out: a PostgreSQL superuser, or any role created with BYPASSRLS, ignores row security entirely. RLS only protects you if your application connects as a least-privilege role. So the demo creates exactly that — app_rls, a NOSUPERUSER NOBYPASSRLS role — and runs as it. Same honesty as the audit chapter: the control binds the application and the ordinary user; it does not defeat the platform's own administrator, and that gap is closed by operational segregation of duties, not SQL.

Anatomy of a regulated record: the fields the wall reads

Everything in this chapter (residency, retention, the join, the purge date) is read off one row of gov.regulated_record. It is worth slowing down and naming each field, because the row is the central artifact: the region it carries is simultaneously the residency key the RLS wall filters on and the join key the retention clock follows. Here is the deliberately ancient CN record (record_id 6, BATCH-2015-099) dissected field by field, alongside the gov.jurisdiction_policy row it matches and the purge_after date that join derives.

One row of gov.regulated_record: region is the residency key the wall reads, data_class plus region join to the matching gov.jurisdiction_policy row (in_region, 3650 days), and purge_after is the date the retention clock derives — here 2025-06-29, already in the past, which is why this row is flagged. Original diagram by the authors, created with AI assistance.

Read the row top to bottom. record_id is GENERATED ALWAYS AS IDENTITY — the database, never the application, mints it, so the primary key cannot be spoofed or reused. region is the load-bearing field: it is the value current_setting('app.region', true) is compared against, so a row's region is literally what decides whether a session can see it. batch_id is the human-facing tie to the GMP batch record; data_class defaults to gmp_record and, together with region, forms the two-part key into the policy table — which is why a country can rule batch records and personal data on different clocks. created_ts is the retention anchor (timestamptz, defaulting to now()), and payload is the jsonb body that carries whatever the record actually is. Not one of these fields is decorative: drop region and the wall has nothing to filter on; drop created_ts and the clock has nothing to count from.

Anatomy of the region_isolation policy: a wall in one USING clause

The wall that reads those fields is itself a single, dissectible object. region_isolation is a CREATE POLICY whose entire behavior lives in one USING expression and two ALTER TABLE flags — small enough to read in full, consequential enough that each clause maps to a real failure mode if you get it wrong.

The whole residency wall is one policy: ENABLE switches it on, FORCE extends it to the owner, the USING expression is checked on both reads and writes, GLOBAL is the audit hatch, and the rose panel is the honest boundary — RLS binds the application, not the DBA, the backups, or the replication stream. Original diagram by the authors, created with AI assistance.

The green block is the policy's heart. The same USING expression does double duty: on a SELECT it is the read filter (return only rows whose region matches the session), and because the policy is declared without a separate WITH CHECK clause, PostgreSQL also applies it as the write check (an INSERT or UPDATE whose new row fails the expression is rejected). FORCE ROW LEVEL SECURITY plays a different part: it extends both checks to the table owner. That single reuse is why residency holds on the way in as well as out; the SNEAKY-001 insert below fails for exactly this reason. The true argument to current_setting is the quiet safety: it means "unset returns NULL, do not raise," so a connection that forgets SET app.region matches no row rather than failing open. And the rose panel states the boundary plainly: this is a policy the application and ordinary users cannot violate, but a BYPASSRLS role, a superuser, or an uncontrolled backup stream still can, which is the whole reason China forces a deployment answer, not just a SQL one.

One table, many regulators: row-level security filters each session to the region it is badged for (GLOBAL is the cross-region audit role), the retention clock sweeps for over-age records per region, and China's in_region band sits behind a hard residency boundary that software alone cannot relax. Original diagram by the authors, created with AI assistance.

The session lifecycle: badging in and being refused

The mechanism is only convincing if you can see it work, and the cleanest way to see it is to follow one session through its whole life: (1) become the least-privilege role, (2) badge in by declaring a region, (3) read — and see only your region, (4) switch regions and watch the view change, (5) try to smuggle a row across the wall — and be refused. After seeding one record per region (plus one deliberately ancient CN record, BATCH-2015-099, given an explicit created_ts of 2015-07-02 so its retention math is checkable), we connect as app_rls, declare a region, and look. An EU session:

SET ROLE app_rls;
SET app.region = 'EU';
SELECT record_id, region, batch_id FROM gov.regulated_record ORDER BY record_id;

 record_id | region |    batch_id
-----------+--------+----------------
         2 | EU     | BATCH-2026-002
(1 row)

One row — the EU's. The US, CN, JP, and KR rows are not "hidden by the UI"; they do not exist as far as this session's SQL is concerned. Switch the same role to a China session and the view changes completely:

SET app.region = 'CN';
SELECT record_id, region, batch_id FROM gov.regulated_record ORDER BY record_id;

 record_id | region |    batch_id
-----------+--------+----------------
         3 | CN     | BATCH-2026-003
         6 | CN     | BATCH-2015-099
(2 rows)

Now the real test — can a session smuggle data across the wall? A China-badged session tries to write an EU record:

SET app.region = 'CN';
INSERT INTO gov.regulated_record (region, batch_id) VALUES ('EU','SNEAKY-001');

ERROR:  new row violates row-level security policy for table "regulated_record"

The database refuses. Because region_isolation defines no separate WITH CHECK clause, PostgreSQL applies the USING expression to new rows as well, so a session can only insert rows that match its own region: residency is enforced on the way in, not just on the way out. (FORCE ROW LEVEL SECURITY is what keeps the table owner under the same rule.) This is the difference between a policy you document and a policy the system cannot violate.
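The lifecycle just walked through can be replayed as a toy model in Python. RegionSession is a hypothetical class that imitates only the policy's observable effect (one predicate guarding reads and writes), not PostgreSQL's implementation. The batch ids for the US, JP, and KR rows are placeholders, because the outputs above show only the EU and CN ones.

```python
class RegionSession:
    """Toy model of one least-privilege session over a shared list of rows."""

    def __init__(self, table: list, region: str):
        self.table = table
        self.region = region          # plays the role of SET app.region

    def _admits(self, row_region: str) -> bool:
        return self.region == "GLOBAL" or row_region == self.region

    def select(self) -> list:
        # Read path: the predicate filters what the session can see.
        return [row for row in self.table if self._admits(row[0])]

    def insert(self, row_region: str, batch_id: str) -> None:
        # Write path: the same predicate vets the new row.
        if not self._admits(row_region):
            raise PermissionError("new row violates row-level security policy")
        self.table.append((row_region, batch_id))


rows = [
    ("US", "US-PLACEHOLDER"), ("EU", "BATCH-2026-002"), ("CN", "BATCH-2026-003"),
    ("JP", "JP-PLACEHOLDER"), ("KR", "KR-PLACEHOLDER"), ("CN", "BATCH-2015-099"),
]

eu = RegionSession(rows, "EU")
print(eu.select())            # [('EU', 'BATCH-2026-002')]

cn = RegionSession(rows, "CN")
print(len(cn.select()))       # 2

try:
    cn.insert("EU", "SNEAKY-001")
except PermissionError as err:
    print("refused:", err)
```

The rejected insert leaves the table at six rows, mirroring the error the real session receives.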

The retention clock

Residency decides where data lives; retention decides when it dies. Because the spans are data, the clock is just a join (matching each record to its region's policy row on a shared key, so one query reads both tables together) against gov.jurisdiction_policy. The chapter file defines it as a view, in examples/chapters/23-multi-jurisdiction/residency.sql:

-- find records past their region's retention window (the clock job acts on these)
CREATE OR REPLACE VIEW gov.v_retention_due AS
SELECT r.record_id, r.region, r.batch_id, r.data_class, r.created_ts, p.retention_days,
       (r.created_ts + (p.retention_days || ' days')::interval)::date AS purge_after
FROM gov.regulated_record r
JOIN gov.jurisdiction_policy p
  ON p.region = r.region AND p.data_class = r.data_class
WHERE now() > r.created_ts + (p.retention_days || ' days')::interval;

The view computes each record's purge_after date — its creation timestamp plus the region's retention_days rendered as an interval, then cast to a plain date so the output reads as a calendar day — and returns only the records whose date is already in the past. Querying it surfaces precisely the records a scheduled job (a cron-driven psql call, or a service) should act on:

 record_id | region |    batch_id    | retention_days | purge_after
-----------+--------+----------------+----------------+-------------
         6 | CN     | BATCH-2015-099 |           3650 | 2025-06-29
(1 row)

The 2015 CN record has aged past its 10-year window (created_ts 2015-07-02 + 3650 days = purge-after 2025-06-29, today being mid-2026) and is flagged; every other record is still within its region's span and is correctly left alone. Crucially, a view names the work but does not do it — and that is deliberate. GMP retention is "keep until," not "auto-delete the instant the clock strikes." A real deletion of a regulated record runs under change control with QA sign-off, and the act of deletion is itself an audited, hash-chained event (Chapter 23). The clock's job is to make the candidate set explicit and reviewable, not to silently destroy records.
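The date arithmetic in that paragraph can be checked by hand with a few lines of Python. This is a sketch of the view's logic rather than the view itself: it compares calendar days where the view compares timestamps, and the "today" value and the second record's creation date are assumptions standing in for "mid-2026" and a recent batch.

```python
from datetime import date, timedelta


def purge_after(created: date, retention_days: int) -> date:
    """Creation date plus the region's retention span, as a calendar day."""
    return created + timedelta(days=retention_days)


def is_due(created: date, retention_days: int, today: date) -> bool:
    # Simplification: the view compares timestamps; this compares calendar days.
    return today > purge_after(created, retention_days)


today = date(2026, 6, 15)          # assumed stand-in for "mid-2026"
old_cn = date(2015, 7, 2)          # BATCH-2015-099

print(purge_after(old_cn, 3650))               # 2025-06-29
print(is_due(old_cn, 3650, today))             # True
print(is_due(date(2026, 1, 10), 3650, today))  # False
```

The purge date lands three days before the tenth anniversary because the span crosses three leap days (2016, 2020, 2024), one more reason to treat 3650 as a round number rather than an exact "ten years."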

At the volumes a high-rate historian produces, the efficient way to execute a purge is not row-by-row DELETE but PostgreSQL declarative range partitioning: partition the underlying time-series by time (and, where residency demands, by region), so retiring a window of expired data is a near-instant DETACH/DROP of a whole partition rather than a costly scan-and-delete [10]. The view tells you what is due; partitioning is how you drop it cheaply.
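Dropping a whole partition is only safe once every row it could contain has aged out, so the test has to use the partition's upper bound, not its lower one. A minimal Python sketch of that check follows; the partition names, yearly bounds, and "today" are hypothetical, and the function is an illustration rather than part of the companion code.

```python
from datetime import date, timedelta


def droppable(partitions, retention_days: int, today: date):
    """Return names of partitions whose every possible row is past retention.

    Each partition is (name, start, end) with `end` exclusive, so the newest
    row it can hold was created just before `end`.
    """
    span = timedelta(days=retention_days)
    return [name for name, _start, end in partitions if end + span <= today]


# Hypothetical yearly partitions for CN records.
parts = [
    ("rec_cn_2015", date(2015, 1, 1), date(2016, 1, 1)),
    ("rec_cn_2016", date(2016, 1, 1), date(2017, 1, 1)),
]

print(droppable(parts, 3650, date(2026, 6, 15)))  # ['rec_cn_2015']
```

Coarse partitions trade precision for cheapness: a row created in mid-2015 becomes individually due months before its yearly partition can be dropped as a unit.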

Why it matters

Get multi-jurisdiction wrong and the failure modes are expensive in opposite directions. Under-retain — shred an EU batch record at the US ten-year mark when the QP-certification clock demanded longer — and you have destroyed a GMP record an inspector can still ask for [2]. Over-share — let a China-resident record sync to a US cloud region, or hand it to a foreign authority during a routine request — and you have breached China's PIPL and DSL, which carry penalties an order of magnitude beyond a GMP observation (a deficiency an inspector writes up for the firm to correct) [5][6].

The architectural lesson is that the shared 80% and the divergent 20% want different treatment. The shared ALCOA+ baseline is best solved once, deeply, with the audit and signature machinery already built — there is no value in five copies. The divergent rules are best solved as data plus a wall: a policy table any component can read, and an RLS boundary the database enforces regardless of application bugs. Encoding the rules this way turns "we have a procedure for that" into "the system cannot do otherwise," which is the difference between a binder and a control.

In the real world

Here is the honest reckoning. Most of this chapter genuinely works in pure open source, and works well. PostgreSQL RLS is a mature, battle-tested control — the same mechanism multi-tenant SaaS products lean on — and using it for data residency is squarely within its design [9]. Policy-as-data plus a retention view is simple, auditable, and version-controlled. For the FDA, EU, PMDA, and MFDS — converged PIC/S members sharing an ALCOA+ baseline — one validated platform with per-region policy rows is a defensible architecture [3][7].

China is where "one platform" hits a wall that no SQL policy can climb. PIPL requires a security assessment, certification, or standard contract before personal information leaves China, plus separate consent — and, pointedly, bars handing over China-stored data to a foreign judicial or law-enforcement authority without PRC approval [5]. The DSL layers a graded "important data" regime with its own outbound-management controls on top [6]. RLS can stop a query from reading CN rows, but it cannot stop your backups, replication, and disaster-recovery — the routine machinery that copies the database to a second server or site so nothing is lost if the first one fails — from copying the bytes to another country, and that copy is the violation. The realistic answer is architectural: a separate in-China deployment of the stack (its own PostgreSQL, object store, and backups, all physically inside China), with only de-identified or aggregated, transfer-approved data crossing the border. The residency = 'in_region' flag is the trigger for that decision; the decision itself is a second cluster. Pure OSS does not make the requirement disappear — it just makes the boundary explicit and the same software portable to both sides.

When the wall is not enough: the named transfer regime behind in_region

It is fair to ask whether "separate in-China deployment" is an over-cautious reading or a real legal requirement. The honest answer is that China's outbound-transfer regime is concrete, named, and tiered — not a vibe. The Cyberspace Administration of China's Provisions on Promoting and Regulating Cross-Border Data Flows, effective 22 March 2024, set the thresholds that decide which of the three statutory mechanisms a given transfer must clear before any byte may leave the country [11]:

Volume of personal information transferred in a calendar year	Mechanism the transfer must clear
Under 100,000 individuals (non-sensitive)	exempt — no mechanism required
100,000 to 1,000,000 individuals (non-sensitive)	CAC Standard Contract or certification
Over 1,000,000 individuals, or any "important data"	mandatory CAC Security Assessment

The 2024 Provisions actually relaxed the earlier 2022/2023 measures — raising the exemption ceiling and shortening the cumulative-volume window from two years to one — yet even after that relaxation, two facts make a separate cluster the pragmatic default for a GMP platform. First, "important data" has no volume floor: a single record classified as important data triggers a full CAC Security Assessment regardless of count [11], and a multi-year batch history is exactly the kind of dataset a regulator may later designate. Second, the assessment is not a one-time form: it expires (now after three years) and must be renewed, and it sits on top of PIPL's separate-consent and no-foreign-handover rules [5]. A control whose cost is "pass and re-pass a state security assessment, indefinitely, for every cross-border sync" is one you design out of the hot path — which is precisely what residency = 'in_region' plus a second cluster does. The policy row is not the law; it is the flag that says this row is governed by that law, so the architecture routes around the transfer entirely.
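The tiering in the table above reduces to a short decision function. The Python sketch below encodes only those three rows; cac_mechanism and its return labels are hypothetical names, and sensitive personal information and other cases outside the table are deliberately not modeled.

```python
def cac_mechanism(individuals: int, important_data: bool = False) -> str:
    """Map a year's non-sensitive personal-information transfer to a tier.

    Encodes only the three rows of the table above; anything the table does
    not cover is out of scope for this sketch.
    """
    if important_data or individuals > 1_000_000:
        return "security_assessment"
    if individuals >= 100_000:
        return "standard_contract_or_certification"
    return "exempt"


print(cac_mechanism(50_000))                  # exempt
print(cac_mechanism(250_000))                 # standard_contract_or_certification
print(cac_mechanism(1, important_data=True))  # security_assessment
```

The last call is the one that matters for a GMP platform: the important-data branch is checked first, so volume never rescues a record once it carries that classification.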

This is also where the cross-border decision outgrows a single SQL USING clause. "May this record move from CN to a US analytics workspace?" depends on region, data class, consent, and transfer mechanism together — a policy richer than a row filter. The natural next step is to externalize that decision into a dedicated policy engine such as Open Policy Agent (OPA), reading the very same gov.jurisdiction_policy table so the database and the engine share one source of truth; OPA ships under Apache-2.0, so there is no license trap in adopting it. The schema is deliberately designed for this — the comment in 40-gov.sql notes the policy table is meant to be read by OPA "too" — but, to be clear, this chapter's code does not ship an OPA service or a Rego policy: there is no opa container in the companion compose.yaml and no .rego file in the repo. The table is the honest, present-tense foundation; the Rego policy that would consume it is left as an illustrative extension, not a running part of the stack.
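To show why this decision is richer than a row filter, here is a small Python sketch of the kind of rule such an engine would evaluate. It is not Rego and not an OPA API; TransferRequest, may_transfer, and the rule set itself are hypothetical, chosen only to illustrate several inputs combining into one verdict with a reason attached.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TransferRequest:
    residency: str           # from gov.jurisdiction_policy: in_region | global_ok
    separate_consent: bool   # consent to this specific transfer is on record
    mechanism_cleared: bool  # the required transfer mechanism has been cleared


def may_transfer(req: TransferRequest) -> Tuple[bool, str]:
    """Return (allowed, reason). Hypothetical rules, for illustration only."""
    if req.residency == "global_ok":
        return True, "policy row permits export"
    if not req.mechanism_cleared:
        return False, "in_region: transfer mechanism not cleared"
    if not req.separate_consent:
        return False, "in_region: separate consent missing"
    return True, "in_region: mechanism cleared and consent recorded"


print(may_transfer(TransferRequest("in_region", True, False)))
# (False, 'in_region: transfer mechanism not cleared')
```

Returning a reason alongside the verdict is a design choice worth keeping in any real policy: a refused transfer should say which condition failed, so the refusal is reviewable.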

The same policy, as a graph: residency as an RDF shape and a competency question

The relational wall is one way to enforce residency; it is worth seeing how the same artifact reads in the semantic stack this book builds in Semantics & the Digital Thread and Book 4 formalizes around the lifecycle, because the two views are complementary, not rival. A gov.regulated_record row is already a node-of-triples: bp:BATCH-2015-099 bp:region "CN", bp:BATCH-2015-099 a bp:RegulatedRecord, bp:BATCH-2015-099 bp:dataClass "gmp_record". The residency rule the RLS USING clause encodes is, in graph terms, a SHACL (Shapes Constraint Language) shape — the same closed-world gate Book 4 uses for the release decision: a sh:NodeShape targeting bp:RegulatedRecord that requires exactly one bp:region, drawn from the controlled set, and — for an in_region class — forbids a bp:storedAt location outside that region.

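# illustrative residency shape: the RLS USING clause, expressed closed-world
# assumes bp:storedAt holds a region code literal (e.g. "CN") comparable to bp:region;
# a production shape would also scope the storedAt check to in_region data classes
bp:ResidencyShape a sh:NodeShape ;
    sh:targetClass bp:RegulatedRecord ;
    sh:property [ sh:path bp:region ;   sh:minCount 1 ; sh:maxCount 1 ;
                  sh:in ( "US" "EU" "CN" "JP" "KR" ) ] ;
    sh:property [ sh:path bp:storedAt ; sh:minCount 1 ;
                  sh:equals bp:region ;   # storage location must match the owning region
                  sh:message "in_region record stored outside its jurisdiction" ] .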

The retention clock has a graph twin too: the gov.v_retention_due join is a SPARQL competency question, exactly the requirements-as-tests discipline Book 4 runs over 23 questions. The question is "which bp:RegulatedRecord individuals have a created_ts plus their region's retention_days already in the past?", an ASK/SELECT whose expected answer (here, the lone 2015 CN lot) is fixed so a model that stops flagging it fails visibly. And the audited deletion the view leads to is precisely a PROV-O event: prov:wasInvalidatedBy ties the purged record to a prov:Activity, prov:wasAssociatedWith names the prov:Agent (the QA approver), and prov:endedAtTime stamps when it happened. The W3C-standard provenance vocabulary thus turns "we deleted it under change control" into a queryable fact rather than a line in a binder. The region field doing triple duty (residency key, retention join key, and now an RDF property) is why the relational and graph views never disagree: they read the same attribute, one as a column, one as a predicate.

What residency does to a model: the training set has a border

Residency is not only a storage constraint; it silently redraws the boundary of every dataset a model may learn from, which is where this chapter touches the ML pipeline of Book 5. A model trained to predict a release CQA or recover titer from Raman spectra wants the largest, most representative training set it can get — but residency = 'in_region' means CN batch records legally cannot be pooled into a US or EU training corpus without clearing the CAC mechanisms above. The practical consequence is that the legal partition and the statistical partition must be honored together: the same leave-one-batch-out grouping that prevents leakage (never letting rows from one batch sit in both train and test) becomes a leave-one-region-out discipline when a model is meant to generalize across sites it was never allowed to co-train on. A model fit only on EU and US lots and then scored on a CN process is, in the language of drift and the applicability domain, extrapolating outside its applicability domain — the input region it was calibrated on — and the residency flag is, usefully, an a-priori marker of that boundary: a record carrying a region the model never saw is out-of-domain by construction, not just by a post-hoc distance check. Model lineage closes the loop — the region and data_class that govern a record must travel with any feature derived from it, so an audit can prove a deployed model was never trained on data it had no legal right to see, the same change-controlled, locked-then-relearn governance Book 5 applies to the model itself.
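A leave-one-region-out split is simple to express. The Python sketch below is a minimal, dependency-free illustration; the lot identifiers are made up, and leave_one_region_out is a hypothetical helper, not a function from Book 5 or the companion code.

```python
from collections import defaultdict


def leave_one_region_out(records):
    """Yield (held_out_region, train, test) splits.

    `records` is a list of (region, batch_id) pairs. Every record from the
    held-out region goes to test, so no batch can straddle train and test.
    """
    by_region = defaultdict(list)
    for region, batch_id in records:
        by_region[region].append((region, batch_id))
    for held_out in sorted(by_region):
        train = [rec for reg in sorted(by_region) if reg != held_out
                 for rec in by_region[reg]]
        yield held_out, train, by_region[held_out]


# Made-up lots for illustration.
lots = [("EU", "B1"), ("EU", "B2"), ("US", "B3"), ("CN", "B4"), ("CN", "B5")]
for region, train, test in leave_one_region_out(lots):
    print(region, len(train), len(test))
# CN 3 2
# EU 3 2
# US 4 1
```

The split handles only the statistical half of the problem. Whether the regions that remain in a training fold may legally be pooled is still decided by the residency flag, so an in_region fold may have to be dropped from training altogether.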

The second cluster is a tech transfer, not a copy-paste

One last honest note for the manufacturing reader: standing up the "separate in-China deployment" is not a deployment chore; it is a technology transfer in the GMP sense. A second physical site running the same stack must be independently qualified before a single CN batch record may live on it, because a regulator inspects the site, not the source tree. That means carrying its PostgreSQL, object store, and backups through IQ/OQ/PQ (Installation, Operational, and Performance Qualification: the documented proof that a system is installed right, operates to spec, and performs reproducibly in use) under GAMP 5 and the CSV-to-CSA model (the shift from Computer System Validation to Computer Software Assurance). The policy-as-data design pays off here precisely because the schema is portable: the identical 40-gov.sql and residency.sql initialize both clusters, so the China cluster differs from the global one only in its seeded region rows and its physical location, not in its code. That is exactly the property that keeps a two-site, multi-jurisdiction validation tractable rather than a fork to maintain.

Key terms

Jurisdiction / regulator — a legal authority (FDA, EU member states, NMPA, PMDA, MFDS) whose GMP and data rules a record must satisfy.
PIC/S — the Pharmaceutical Inspection Co-operation Scheme; its PI 041 guide harmonizes data-integrity expectations across participating inspectorates, which is why one ALCOA+ baseline serves many of them.
ALCOA+ — the data-integrity attributes (Attributable, Legible, Contemporaneous, Original, Accurate, plus Complete, Consistent, Enduring, Available) that define a trustworthy GMP record; the shared baseline PIC/S PI 041 turns into a common inspection language.
Policy-as-data — encoding regulatory rules (residency, retention) as rows in gov.jurisdiction_policy that the app and the retention clock read (and that an external policy engine such as OPA is designed to read), rather than hard-coding them.
Data residency — the requirement that certain data physically remain within a region; enforced here by row-level security and, for China, by a separate in-region deployment.
Row-level security (RLS) — a PostgreSQL feature where a CREATE POLICY USING expression filters every query by session context (app.region); FORCE ROW LEVEL SECURITY extends it to the table owner.
BYPASSRLS / superuser — roles that ignore RLS entirely; the reason the application must connect as a least-privilege role (app_rls), and the reason segregation of duties is still required even with the policy in place.
Retention period — how long a record must be kept (retention_days), differing per region; the EU's "1 yr past expiry or 5 yr past QP certification, whichever is longer" can exceed the US span.
Retention clock — the gov.v_retention_due view that surfaces records past their window for reviewed deletion, not automatic destruction.
PIPL / DSL — China's Personal Information Protection Law and Data Security Law; together they impose data localization and cross-border-transfer controls that force a separate in-China deployment.
Cross-border transfer — moving data between jurisdictions; permitted freely for some regions, gated by assessment/consent/approval for China.
gov.regulated_record — the chapter's central artifact: one row per region-owned GMP record, carrying region (the residency key), data_class, created_ts (the retention anchor), and a jsonb payload; its region + data_class join to the policy row that governs it.
USING expression / implicit write check — the single predicate in region_isolation that PostgreSQL applies as the read filter and, because the policy declares no separate WITH CHECK clause, also as the check on every INSERT/UPDATE, so residency is enforced on the way in as well as out; FORCE ROW LEVEL SECURITY is what additionally subjects the table owner to it.
CAC Security Assessment / Standard Contract — the tiered mechanisms (under the 2024 Cross-Border Data Flow Provisions) a transfer of Chinese personal information or "important data" must clear before leaving China; the cost that makes a separate in-China deployment the default.
SHACL residency shape — the closed-world graph twin of the RLS USING clause: a sh:NodeShape over bp:RegulatedRecord requiring exactly one in-set bp:region and forbidding an in_region record from being stored outside its jurisdiction, the same gate Book 4 uses for the release decision.
Competency question / PROV-O — the retention-clock join read as a SPARQL question with a fixed expected answer (requirements-as-tests), and the audited deletion modeled as a PROV-O prov:Activity with its prov:Agent and prov:atTime, so provenance is a queryable fact, not a binder line.
Leave-one-region-out / applicability domain — residency turns leave-one-batch-out cross-validation into a regional partition: a model never legally co-trained across sites is extrapolating outside its applicability domain when scored on a region it never saw, and the region flag marks that out-of-domain boundary a priori.
IQ/OQ/PQ (second-cluster qualification) — Installation, Operational, and Performance Qualification; the GAMP 5 / CSA evidence that the separate in-China deployment is a real GMP technology transfer to a qualified site, not a copy-paste, made tractable because the policy-as-data schema is portable.

Where this leads

Residency and retention assume the platform itself holds still — but in a working plant, recipes get revised, skids get swapped, and data formats evolve, and every one of those changes must happen without breaking the audit trail, the genealogy, or the region tagging we just built. In Chapter 27 — Managing Change: Process Changes, Equipment Swaps & Schema Evolution, we treat change as a first-class data problem: effective-dated recipe versions, swapping a bioreactor or instrument while preserving lot genealogy, and migrating a changed data format with reversible, verified migrations under change control — so the platform can move forward without losing a single link in the chain of trust.
