Contextualization: Joining Time-Series to the Batch

📍 Where we are: Part III · Storing & Connecting — Chapter 17. The historian now holds millions of readings; this chapter gives every reading a meaning by joining it to the batch, equipment, and phase it came from.

The simple version

A raw historian is a shoebox of receipts with the dates torn off. Each slip says BR101.Temp.PV = 37.00 °C at some instant — the measured (PV = process value) temperature on bioreactor BR101, true but mute. Contextualization staples each receipt to the right page of the right batch record: this reading happened during the Growth phase of BATCH-2026-001, running on bioreactor BR101, under recipe CHO-MAB-001 (a recipe is the batch-manufacturing procedure for the product — here a monoclonal antibody, MAB, made in Chinese-hamster-ovary, CHO, cells). Suddenly you can ask human questions — "what was dissolved oxygen during the production phase of last week's lot?" — and the database can answer.

What this chapter covers

In Chapter 16 we built a TimescaleDB historian that swallows every sensor tag. In Chapter 4 we modeled the ISA-88/95 batch and equipment world (the two industry standards defined just below: ISA-88 for batch control — recipes, operations, phases — and ISA-95 for enterprise-control integration — the equipment hierarchy) in PostgreSQL. They live side by side and, so far, ignore each other. This chapter marries them.

We will:

join the historian stream to ISA-88 phases and the ISA-95 equipment hierarchy with a single SQL view (a saved query that behaves like a virtual table — you query it as if it were a table, but it computes its rows on demand);
run batch-aware queries — "DO during the production phase of batch X" — that would be impossible against raw tags;
build a per-phase, per-tag summary that becomes the foundation of a golden-batch overlay — a reference trace built from known-good batches that new runs are compared against;
and push the phase boundaries from Postgres into a Grafana dashboard so an analyst sees the trend with the recipe drawn on top of it.

The two views at the heart of this chapter live in examples/platform/db/60-views.sql and are created when the database first initializes. They join the historian rows loaded by make load (from examples/platform/db/20-historian.sql) to the batch model seeded by make seed (from examples/platform/db/seed/seed_cho_line.sql). Here make up, make load, and make seed are the companion repo's setup commands: they start the database, load the historian rows, and seed the batch model, respectively. The later materialized-view and Grafana snippets are illustrative: they show where the model goes next, are built out in Chapter 18, and are labelled as such where they appear.

The two worlds we are joining

The historian table is deliberately dumb. Here is its definition, from examples/platform/db/20-historian.sql:

CREATE TABLE ts.sensor_reading (
    ts       timestamptz      NOT NULL,
    tag      text             NOT NULL,
    value    double precision,
    unit     text,
    quality  smallint         NOT NULL DEFAULT 192,  -- legacy OPC DA: 192 Good, 64 Uncertain, 0 Bad
    batch_id text
);

Six columns, no opinions. A typical slice looks like this:

ts                      | tag           |  value  | unit | quality | batch_id
------------------------+---------------+---------+------+---------+----------------
2026-01-13T08:00:00Z    | BR101.Temp.PV | 36.9993 | degC |     192 | BATCH-2026-001
2026-01-13T08:00:00Z    | BR101.DO.PV   | 36.4576 | %sat |     192 | BATCH-2026-001
2026-01-13T08:00:00Z    | BR101.pH.PV   |  6.9922 | pH   |     192 | BATCH-2026-001

(Those three tags are the temperature, the dissolved oxygen — DO, reported in %sat, percent of saturation — and the pH of the culture in BR101.) Notice the table already carries a batch_id. That is the single most important design decision of the whole capture layer — the collector stamps the active batch onto each reading as it writes (we set that up in Chapter 7). Without it, contextualization becomes a fragile guessing game of "which run was on BR101 at 08:00 that Tuesday?". With it, the join is exact.

But a batch_id is only half the story. It tells you which run; it does not tell you which step of that run. Was 08:00 still the inoculation/lag phase, the brief settling-in window right after the seed train is transferred into BR101 and before the cells begin multiplying in earnest? Or had the culture crossed, by way of the Growth phase, into the production phase where titer accumulates? (The seed train is the expanding series of smaller cultures grown to seed BR101, from Book 1; titer is the accumulating concentration of antibody product, built up in Book 1's Production Bioreactor.) That answer lives in the relational world we built in Chapter 4.

The ISA-88 batch-control standard gives us the procedural vocabulary: a recipe is made of operations, each made of phases — the smallest meaningful procedural element [1]. The ISA-95 enterprise-control standard gives us the physical side: enterprise → site → area → unit, so every tag can be tied to the equipment it ran on and, through the batch, to its lot [2]. Both models ship as royalty-free, machine-readable XML schemas (B2MML/BatchML), which is what let us turn them into concrete relational tables rather than prose [3].

In examples/platform/db/seed/seed_cho_line.sql the fed-batch recipe (fed-batch is a culture mode where nutrient feeds are added during the run rather than all at the start — see Book 1's Production Bioreactor) is broken into four operations and five phases:

INSERT INTO s88.phase VALUES
    ('PH1', 'OP1', 1, 'Inoculate'),
    ('PH2', 'OP2', 1, 'Growth'),
    ('PH3', 'OP2', 2, 'Production'),
    ('PH4', 'OP3', 1, 'Harvest'),
    ('PH5', 'OP4', 1, 'Capture') ON CONFLICT DO NOTHING;

A recipe phase like Growth is an abstract template. The thing that actually anchors a trace in time is the batch phase — the record of when that phase really ran for one specific batch. In production these start_ts/end_ts boundaries are not typed by hand: the batch execution engine (the automation layer running the recipe on BR101) emits a phase-transition event each time it advances from Inoculate to Growth to Production, and that event timestamp is what lands in batch_phase — the same control layer that stamps batch_id onto each reading. Here we seed equivalent windows by hand so the join has something to bracket against. Those are the windows the join hinges on, also in the seed file:

-- phase windows for the golden batch (drives the contextualization view)
INSERT INTO s88.batch_phase (batch_id, phase_id, unit_id, start_ts, end_ts) VALUES
    ('BATCH-2026-001', 'PH1', 'BR101', '2026-01-05T00:00:00Z', '2026-01-05T12:00:00Z'),
    ('BATCH-2026-001', 'PH2', 'BR101', '2026-01-05T12:00:00Z', '2026-01-12T00:00:00Z'),
    ('BATCH-2026-001', 'PH3', 'BR101', '2026-01-12T00:00:00Z', '2026-01-18T00:00:00Z'),
    ('BATCH-2026-001', 'PH4', 'BR101', '2026-01-18T00:00:00Z', '2026-01-19T00:00:00Z')
    ON CONFLICT DO NOTHING;

Read those four rows as a timeline: a half-day inoculation, then a week of growth, then the long production phase where the cells make antibody, then harvest. (PH5 Capture runs later on the Protein A skid PA01 and has no bioreactor window, so it does not appear in this BR101 timeline.) Our 08:00-on-January-13 reading falls inside the third window, so it is a Production-phase reading. We just need SQL to figure that out automatically.
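The timeline reading above can be checked by hand in a few lines. The sketch below is a minimal illustration, not code from the companion repo: it copies the four seeded windows, assumes one reading per tag per minute, and uses hypothetical helper names (parse, phase_at, minutes_in).

```python
from datetime import datetime, timedelta

# Phase windows copied from the seed rows above (BATCH-2026-001 on BR101).
WINDOWS = [
    ("PH1", "Inoculate",  "2026-01-05T00:00:00Z", "2026-01-05T12:00:00Z"),
    ("PH2", "Growth",     "2026-01-05T12:00:00Z", "2026-01-12T00:00:00Z"),
    ("PH3", "Production", "2026-01-12T00:00:00Z", "2026-01-18T00:00:00Z"),
    ("PH4", "Harvest",    "2026-01-18T00:00:00Z", "2026-01-19T00:00:00Z"),
]


def parse(ts: str) -> datetime:
    """Parse the seed file's ISO-8601 'Z' timestamps as timezone-aware UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def phase_at(ts: str):
    """Return the phase name whose half-open window [start, end) contains ts."""
    t = parse(ts)
    for _phase_id, name, start, end in WINDOWS:
        if parse(start) <= t < parse(end):
            return name
    return None  # the reading falls outside every seeded window


def minutes_in(start: str, end: str) -> int:
    """Number of one-minute readings a window holds (assumes 1 reading/minute)."""
    return int((parse(end) - parse(start)) / timedelta(minutes=1))


durations = {name: minutes_in(s, e) for _pid, name, s, e in WINDOWS}
print(durations)
print(phase_at("2026-01-13T08:00:00Z"))
```

The four durations sum to the full 14-day run, and the running-example timestamp resolves to Production, which is the answer the SQL join has to reproduce automatically.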

That PH5 Capture row is worth pausing on, because it is the join's first step downstream. The same batch_phase table that brackets the bioreactor's Inoculate-to-Harvest run also brackets the purification train that follows on different equipment: Protein A capture (PA01), then — in a full process model — the low-pH viral-inactivation hold, polishing chromatography, viral filtration, and UF/DF (ultrafiltration/diafiltration), the downstream steps Book 1 walks in capture chromatography onward. The contextualization view does not care that PA01 is a chromatography skid rather than a stirred tank — it brackets any phase window against any unit's tags, so the same one line of SQL that finds a Production-phase DO reading finds a Capture-phase UV280 reading on the chromatogram, or the conductivity and pH traces of the elution. The unit_id is what keeps the bioreactor's BR101.DO.PV and the skid's PA01.UV280.PV on their own equipment while a shared batch_id still threads them into one lot record — which is precisely how a query can later follow that lot from the culture that grew it to the column that purified it.

One manufacturing nuance the hand-seeded windows gloss over: in a real plant these start_ts/end_ts are not the only — or even the cleanest — source of phase boundaries. The batch execution engine emits a phase-transition event on every advance, but operators also pause, hold, and resume; a phase can be aborted and re-run; and a manual override can move a boundary the recipe did not predict. The contextualization layer should bind to the recorded batch_phase (what the MES/electronic-batch-record says actually happened), not to the recipe's planned durations, because the validated record of as-run phase times — not the design intent — is what a Continued Process Verification trend and a deviation investigation must align against. Our seeded windows stand in for that authoritative event log so the join has something to bracket against on a laptop.

A historian tag flowing left to right gains a batch_id stamp, then is matched against ISA-88 phase windows on a timeline and the ISA-95 equipment tree, emerging on the right as a fully contextualized row that names its batch, phase, unit, and recipe.

From mute tag to meaningful record: the contextualization join staples each historian reading to the phase window it falls inside and to the equipment/recipe it belongs to. Original diagram by the authors, created with AI assistance.

Anatomy of a contextualized reading: one v_batch_sensor row

Before we write the join, it is worth dissecting the thing the join produces: a single row of s88.v_batch_sensor. Take the chapter's running example: the temperature reading on BR101.Temp.PV at 2026-01-13T08:00:00Z. In the bare historian it is six columns. After the view runs it is eleven, and those eleven fall into three bands: raw, batch, and phase. A fourth piece of context hides in the tag name itself.

The raw band is the six historian columns carried through unchanged: ts, tag, value, unit, quality, and batch_id. These are exactly what Chapter 16 stored; the view adds nothing here, it just passes them along. The value of 36.9993 is meaningless without its unit of degC. The quality of 192 is the legacy OPC DA (OPC Data Access, an older industrial data-communication protocol) packed quality byte for Good: 192, 64, and 0 are bit-coded values defined by the OPC standard, not arbitrary numbers (192 Good / 64 Uncertain / 0 Bad), as Chapter 7 (Speaking OT: OPC UA, MQTT) established. That byte is distinct from the newer OPC UA (OPC Unified Architecture) 32-bit StatusCode, where Good is 0x00000000. The quality code travels beside the value so an uncertain point is never silently averaged in, which keeps the contextualized record accurate and complete, as required by ALCOA+ (Attributable, Legible, Contemporaneous, Original, Accurate, plus Complete, Consistent, Enduring, and Available), the data-integrity expectation regulators apply to every GMP (Good Manufacturing Practice) record; Chapter 23, ALCOA+ by Construction, engineers it in code. Finally, batch_id is the stamp the collector wrote at acquisition time (Chapter 7), the key that makes everything below possible.

The batch band is three columns the view joins in from s88.batch on that batch_id: product_id (MAB-001), recipe_id (CHO-MAB-001), and unit_id (BR101). This is the answer to which run, of what, under which recipe, on which equipment. (The s88.batch row also carries status — here released — and a lot of L26001; the view does not project — that is, does not include in its column list — those, but they are one column-list edit away if a consumer needs them.) The seed actually loads a six-batch campaign of this same CHO-MAB-001 recipe on BR101: the golden batch BATCH-2026-001 plus released siblings 002, 003, and 005, the rejected (out-of-specification) lot 004, and a complete 006 (execution finished, awaiting QA disposition). The rest of the chapter centers on 001, but those siblings are why a golden-batch comparison is possible at all.

The phase band is the two columns that did not exist anywhere until the join manufactured them: phase_id (PH3) and phase_name (Production). They are derived, not stored — and the next subsection is entirely about how. This is the field that turns the reading from "a number at a time" into "a Production-phase reading", and it is the single most valuable thing on the row, which is why the card highlights it.

The fourth piece is not a column but a decode: the tag BR101.Temp.PV is itself structured as asset . measurement . role — the ISA-95 unit, the measured quantity, and PV for process value (the evidence) as opposed to .SP, the recipe setpoint. The same grammar lets a query select every PV on BR101 or every temperature across the plant.
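To make the tag grammar concrete, here is a minimal sketch of the decode, assuming every tag has exactly three dot-separated parts. The names TagParts and parse_tag are hypothetical, and BR101.Temp.SP is an illustrative setpoint tag rather than one confirmed to be in the seed data.

```python
from typing import NamedTuple


class TagParts(NamedTuple):
    asset: str        # ISA-95 unit, e.g. BR101
    measurement: str  # measured quantity, e.g. Temp
    role: str         # PV (process value) or SP (setpoint)


def parse_tag(tag: str) -> TagParts:
    """Split 'asset.measurement.role'; reject anything that does not fit."""
    parts = tag.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected asset.measurement.role, got {tag!r}")
    return TagParts(*parts)


# Illustrative tag list (BR101.Temp.SP is an assumed example).
tags = ["BR101.Temp.PV", "BR101.Temp.SP", "BR101.DO.PV", "PA01.UV280.PV"]

# "Every PV on BR101"
pvs_on_br101 = [t for t in tags
                if parse_tag(t).asset == "BR101" and parse_tag(t).role == "PV"]

# "Every temperature across the plant"
temperatures = [t for t in tags if parse_tag(t).measurement == "Temp"]

print(pvs_on_br101)
print(temperatures)
```

In SQL the same selections are usually written as pattern matches on the tag column; the point of the sketch is only that the naming convention is machine-readable and worth validating at ingestion.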

An identity card for one s88.v_batch_sensor row for BR101.Temp.PV at 2026-01-13T08:00:00Z, grouped into a raw band of six historian columns carried through, a batch band of three columns joined from s88.batch, a highlighted green phase band where phase_id PH3 and phase_name Production are derived by the temporal join, and a violet panel decoding the tag name into asset, measurement, and role.

One contextualized row, field by field: six raw historian columns, three joined from the batch, and the two phase columns the temporal join derives on the fly. Original diagram by the authors, created with AI assistance.

Where this row comes from — the trilogy spine

This eleven-column row is the open-source implementation of an idea the other two books set up. The 36.9993 °C value was physically generated in the bioreactor of Book 1's Production Bioreactor chapter — the cell-culture step the sensor was watching. Book 2 then framed it as a data-point: the banded, contextualized reading that ISA-95 demands and the batch_id join across plant systems that stitches the historian to the batch record. What was a modeling aspiration there is, here, two SQL views you can run.

The join that does the work

Here is the heart of the chapter — the whole view, from examples/platform/db/60-views.sql:

-- A reading with its full batch + phase context.
CREATE OR REPLACE VIEW s88.v_batch_sensor AS
SELECT r.ts, r.tag, r.value, r.unit, r.quality, r.batch_id,
       b.product_id, b.recipe_id, b.unit_id,
       bp.phase_id, ph.name AS phase_name
FROM ts.sensor_reading r
JOIN s88.batch b              ON b.batch_id = r.batch_id
LEFT JOIN s88.batch_phase bp  ON bp.batch_id = r.batch_id
     AND r.ts >= bp.start_ts AND (bp.end_ts IS NULL OR r.ts < bp.end_ts)
LEFT JOIN s88.phase ph        ON ph.phase_id = bp.phase_id;

It is short, but every clause is load-bearing.

The first JOIN s88.batch is an inner join on batch_id: a reading with no matching batch is dropped. That is intentional — a sensor blipping while no batch is running is not part of any GMP record and should not silently appear in a batch-aware query.

The temporal-containment join: ts within a half-open interval

The clever part is the LEFT JOIN s88.batch_phase. The match condition is not an equality on a key; it is a temporal containment: r.ts >= bp.start_ts AND (bp.end_ts IS NULL OR r.ts < bp.end_ts). For each reading, Postgres finds the phase window whose start/end brackets the reading's timestamp. The half-open interval (>= start, < end) means a reading exactly on a phase boundary belongs to the new phase, never to both — no double-counting. The bp.end_ts IS NULL branch handles a phase that is still running live, with no end yet recorded. We keep it a LEFT join so a reading that arrives before any phase window is opened still surfaces (with a NULL phase) rather than vanishing.

Walk it for our running example. The four batch_phase rows for BATCH-2026-001 lay four windows on the timeline. Our reading's ts of 2026-01-13T08:00:00Z is past the start of PH3 (2026-01-12T00:00:00Z) and before its end (2026-01-18T00:00:00Z), so it matches exactly one window and inherits phase_name = Production. A reading at the instant 2026-01-12T00:00:00Z belongs to PH3 and not PH2, because PH2's condition uses < on its end — the boundary tips into the new phase. That single <-versus-<= choice is the difference between every boundary reading being counted once and being counted twice.
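The boundary behaviour is easy to see side by side. The sketch below reproduces the containment predicate in an in-memory SQLite database. That is an assumption made for portability: the book's view runs on PostgreSQL with timestamptz columns, whereas here timestamps are ISO-8601 text, which sorts chronologically only because every string shares one format. The table names are simplified, the reading values are synthetic, and the open-ended PH4 row is a hypothetical stand-in for a phase still running.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sensor_reading (ts TEXT, tag TEXT, value REAL, batch_id TEXT);
CREATE TABLE batch_phase (batch_id TEXT, phase_id TEXT, start_ts TEXT, end_ts TEXT);
INSERT INTO batch_phase VALUES
  ('BATCH-2026-001', 'PH2', '2026-01-05T12:00:00Z', '2026-01-12T00:00:00Z'),
  ('BATCH-2026-001', 'PH3', '2026-01-12T00:00:00Z', '2026-01-18T00:00:00Z'),
  ('BATCH-2026-001', 'PH4', '2026-01-18T00:00:00Z', NULL);  -- hypothetical: still running
INSERT INTO sensor_reading VALUES
  ('2026-01-04T23:00:00Z', 'BR101.Temp.PV', 37.0,    'BATCH-2026-001'),  -- before any window
  ('2026-01-12T00:00:00Z', 'BR101.Temp.PV', 37.0,    'BATCH-2026-001'),  -- exactly on a boundary
  ('2026-01-13T08:00:00Z', 'BR101.Temp.PV', 36.9993, 'BATCH-2026-001'),  -- running example
  ('2026-01-18T06:00:00Z', 'BR101.Temp.PV', 37.0,    'BATCH-2026-001');  -- inside the open window
""")

# Same predicate as the view; {op} lets us swap '<' for the buggy '<='.
JOIN = """
SELECT r.ts, bp.phase_id
FROM sensor_reading r
LEFT JOIN batch_phase bp ON bp.batch_id = r.batch_id
     AND r.ts >= bp.start_ts AND (bp.end_ts IS NULL OR r.ts {op} bp.end_ts)
ORDER BY r.ts, bp.phase_id
"""

half_open = con.execute(JOIN.format(op="<")).fetchall()
closed = con.execute(JOIN.format(op="<=")).fetchall()

print(len(half_open), "rows with the half-open predicate")
print(len(closed), "rows with the closed predicate")
```

With the half-open predicate each of the four readings appears once: the early reading surfaces with a NULL phase thanks to the LEFT join, and the boundary reading lands in PH3 only. With <= the boundary reading appears twice, once under PH2 and once under PH3, which is the double-count the text warns about.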

That single view is the contract every downstream consumer queries: dashboards, analytics notebooks, and the knowledge-graph loader (Chapter 19, which folds these rows into an RDF graph of batch, equipment, material, and result). None of them ever touch the raw ts.sensor_reading table again. Now the human question becomes one short query:

-- "Show me dissolved oxygen during the Production phase of the golden batch."
SELECT ts, value
FROM s88.v_batch_sensor
WHERE batch_id   = 'BATCH-2026-001'
  AND tag        = 'BR101.DO.PV'
  AND phase_name = 'Production'
ORDER BY ts;

ts                      |  value
------------------------+---------
2026-01-12T00:00:00Z    | 36.8646
2026-01-12T00:01:00Z    | 35.888
2026-01-12T00:02:00Z    | 36.5891
...
2026-01-17T23:59:00Z    | 34.0293

You could never write that against the bare historian. phase_name = 'Production' is knowledge the time-series table simply does not contain; the join manufactured it on the fly.

From readings to a golden-batch building block

A trace is useful; a summary by phase is where process understanding starts. The second view in examples/platform/db/60-views.sql rolls the contextualized readings up to one row per batch, phase, and tag:

-- Per-batch, per-phase, per-tag summary: the "golden batch" building block.
CREATE OR REPLACE VIEW s88.v_phase_summary AS
SELECT batch_id, phase_name, tag, unit,
       count(*)               AS n,
       round(avg(value)::numeric, 4) AS avg_value,
       round(min(value)::numeric, 4) AS min_value,
       round(max(value)::numeric, 4) AS max_value
FROM s88.v_batch_sensor
WHERE phase_name IS NOT NULL
GROUP BY batch_id, phase_name, tag, unit;

Querying it for temperature across the golden batch gives a compact process fingerprint:

batch_id        | phase_name | tag           | unit | n    | avg_value | min_value | max_value
----------------+------------+---------------+------+------+-----------+-----------+----------
BATCH-2026-001  | Growth     | BR101.Temp.PV | degC | 9360 |   37.0002 |   36.8724 |   37.1211
BATCH-2026-001  | Production | BR101.Temp.PV | degC | 8640 |   36.9893 |   36.3571 |   37.1218
BATCH-2026-001  | Harvest    | BR101.Temp.PV | degC | 1440 |   37.0008 |   36.9132 |   37.0981

Why the day-7 excursion only surfaces sliced by phase

That 36.36 °C minimum in Production is the deliberate day-7 excursion (a temporary deviation out of the normal range) that the companion repo's data simulator seeds into the fed-batch trace — and because the summary is sliced by phase, it shows up exactly where it happened instead of being averaged into invisibility across a 14-day run. During that excursion the DO and temperature probes also flag 64 Uncertain for those ~3 hours, so the same query that finds the temperature dip can exclude the Uncertain points instead of averaging them in — a concrete reminder of why the quality code travels beside the value.

Do the arithmetic on why the slicing matters. Across the whole batch the temperature mean is essentially 37.0 °C; a single dip to 36.36 °C lasting a few hours moves a 14-day average by thousandths of a degree — well inside any rounding you would ever report. Sliced by phase, the same dip is the minimum of the Production row, and the gap between the Production minimum (36.3571) and its mean (36.9893) is more than half a degree — a signal that is impossible to miss. The excursion did not get bigger; the denominator got honest. That is exactly the failure mode of monitoring on whole-batch aggregates: a real, localized deviation is diluted below the reporting threshold by the hours of normal operation around it, and an investigator scrolling batch-level numbers never sees it.
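The dilution can be worked through with round numbers. This is a back-of-envelope sketch under loud assumptions: a perfectly flat 37.0 °C baseline and a flat 3-hour dip sitting at the reported minimum. The simulator's real excursion has a shape, so the figures in the summary table differ slightly.

```python
RUN_MIN = 14 * 24 * 60         # 20,160 one-minute readings in the 14-day run
PRODUCTION_MIN = 6 * 24 * 60   # 8,640 readings inside the Production window
DIP_MIN = 3 * 60               # assumption: the dip lasts a flat 3 hours
BASELINE = 37.0                # assumption: perfectly flat baseline, degC
DIP = 36.36                    # assumption: the whole dip sits at the minimum


def mean_with_dip(n_total: int) -> float:
    """Mean of n_total readings, DIP_MIN of which sit at DIP."""
    return (BASELINE * (n_total - DIP_MIN) + DIP * DIP_MIN) / n_total


# Whole-batch view: how far does the dip pull the 14-day mean?
whole_run_shift = BASELINE - mean_with_dip(RUN_MIN)

# Phase view: how far below the Production mean does the minimum sit?
production_gap = mean_with_dip(PRODUCTION_MIN) - DIP

print(f"whole-run mean shifts by {whole_run_shift:.4f} degC")
print(f"Production mean-to-min gap is {production_gap:.4f} degC")
```

Even in this worst case the whole-run mean moves by under a hundredth of a degree, while the per-phase row exposes a gap of more than half a degree between mean and minimum.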

This per-phase reduction is not a convenience; it is the statistical prerequisite for monitoring batch processes properly. The seminal multiway-PCA work on batch monitoring (multiway principal component analysis — a statistical method that compresses many correlated process variables across time into a few summary trajectories, detailed in Book 5) built its reference trajectories — the "golden batch" — by aligning normal historical batches and comparing new runs against them [4]. And the reason we slice by ISA-88 phase first, rather than by wall-clock time, is that fed-batch trajectories only line up batch-to-batch once they are aligned on process phase and indicator variables; comparing minute-30 to minute-30 across two batches is meaningless if one was still inoculating and the other was already feeding [5]. v_phase_summary is the humble, queryable seed of all of that: every sibling batch reduced to the same phase-keyed shape, ready to stack.

The contextualized view is also the feature contract for a model

Before a v_phase_summary row is a golden-batch envelope, it is a feature, and that makes this view the layer where machine learning either gets honest data or gets fooled. Two things the join already enforces are exactly the discipline an ML pipeline needs and most often skips. First, the batch_id on every row is the grouping key a leak-free split must respect. Each fed-batch run is one genuinely independent observation (the 701 wavenumbers of a spectrum or the 16 tags of a run do not make 701 or 16 independent samples), so a soft sensor must be validated under a batch-grouped / leave-one-batch-out cross-validation: train on whole batches and test on held-out ones, never splitting rows of the same run across the train/test line. Otherwise the reported accuracy is an artifact of the model having already seen the batch it is scored on. Book 5's data chapter and models-and-validation chapter make this the cold-start rule of bioprocess ML; the batch_id here is the column that makes it enforceable. Second, the phase_name is the alignment key: an MSPC (multivariate statistical process control) monitor or a Raman soft sensor compares like-with-like only when the rows are phase-aligned, which is the same reason the golden-batch overlay slices by phase rather than wall-clock time.
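A leave-one-batch-out split needs nothing more than the batch_id column. The sketch below is a minimal standard-library version with a hypothetical function name; the full batch identifiers for the siblings are assumed to follow the BATCH-2026-00N pattern. scikit-learn's LeaveOneGroupOut offers the same idea for real pipelines.

```python
from typing import Iterator, List, Tuple


def leave_one_batch_out(
    batch_ids: List[str],
) -> Iterator[Tuple[List[int], List[int], str]]:
    """Yield (train_idx, test_idx, held_out), holding out one whole batch per fold."""
    for held_out in sorted(set(batch_ids)):
        train = [i for i, b in enumerate(batch_ids) if b != held_out]
        test = [i for i, b in enumerate(batch_ids) if b == held_out]
        yield train, test, held_out


# One entry per feature row; the value is that row's batch_id (assumed naming).
rows = (["BATCH-2026-001"] * 3
        + ["BATCH-2026-002"] * 2
        + ["BATCH-2026-003"] * 2)

folds = list(leave_one_batch_out(rows))
for train, test, held_out in folds:
    train_batches = {rows[i] for i in train}
    # No batch may sit on both sides of the split.
    assert held_out not in train_batches
    print(held_out, "-> train rows:", len(train), "test rows:", len(test))
```

A random row-level split of the same seven rows would almost certainly place rows of one run on both sides, which is exactly the leak the grouping key prevents.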

The quality byte earns a second job here too. A row stamped 64 Uncertain during the day-7 excursion is a row a model should down-weight or gate out, not silently train on — the database-level analogue of an applicability-domain check (the gate that flags an input lying outside the model's training envelope before its prediction is trusted, built in Book 5's validation chapter). And the distinction this chapter draws between a contextualized reading and the relational model behind it is exactly the distinction Book 5's MLOps chapter draws between process drift (the living culture genuinely wandering batch-to-batch — what v_phase_summary is built to measure) and model drift (a soft sensor decaying as its probe fouls — what a residual control chart on the same phase-keyed trend is built to catch). The contextualized view is the shared substrate: the same phase-aligned, batch-grouped, quality-flagged rows that feed the golden-batch overlay are the rows that feed — and, crucially, the rows that validate — every model downstream of it.
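Gating on the quality byte can be sketched in a few lines. The readings below are synthetic values chosen only to show the effect, and the gate and mean helpers are hypothetical names. Whether to drop Uncertain rows outright or merely down-weight them is a modelling decision this sketch does not settle.

```python
GOOD, UNCERTAIN, BAD = 192, 64, 0  # legacy OPC DA quality codes from the historian

# (value, quality) pairs: synthetic, for illustration only.
readings = [
    (37.00, GOOD),
    (36.98, GOOD),
    (36.40, UNCERTAIN),
    (36.36, UNCERTAIN),
    (37.01, GOOD),
]


def gate(rows, accept=frozenset({GOOD})):
    """Keep rows whose quality is accepted; also report how many were rejected."""
    kept = [(v, q) for v, q in rows if q in accept]
    return kept, len(rows) - len(kept)


def mean(rows):
    """Arithmetic mean of the value column."""
    return sum(v for v, _q in rows) / len(rows)


kept, rejected = gate(readings)
naive_mean = mean(readings)   # Uncertain points silently averaged in
gated_mean = mean(kept)       # Good points only

print(f"naive mean {naive_mean:.3f}, gated mean {gated_mean:.3f}, rejected {rejected}")
```

Reporting the rejected count alongside the gated statistic matters: a feature computed from three of five points should say so, otherwise the gate hides the very excursion it was meant to flag.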

Materializing context when the join gets expensive

v_batch_sensor is a plain view: the temporal join re-runs on every query. For the golden batch over a few tags that is instant. Across the seeded six-batch campaign, its sixteen tags, and a dashboard that auto-refreshes, the same containment join executes again and again.

Two OSS mechanisms fix this without changing the model.

For the raw-rate roll-ups, TimescaleDB continuous aggregates are materialized views over a hypertable that refresh incrementally as new data lands, so you never recompute the whole history [6]. We already created them in Chapter 16 (ts.sensor_1m, ts.sensor_1h). The historian stores one reading per tag per minute (60 × 24 × 14 = 20,160 rows per tag over a 14-day run), so a phase-aware dashboard joins the pre-rolled 1-hour continuous-aggregate buckets (ts.sensor_1h) to the phase windows instead of scanning every one-minute reading.
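The saving is simple to quantify. The sketch below mimics hourly bucketing in plain Python (an approximation of what a 1-hour bucket does, not TimescaleDB itself), assuming the run starts at 2026-01-05T00:00:00Z with one reading per minute. It also checks whether any seeded phase boundary falls inside a bucket rather than on its edge.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

RUN_START = datetime(2026, 1, 5, tzinfo=timezone.utc)  # start of the seeded batch
RUN_MINUTES = 14 * 24 * 60                             # one reading per minute, 14 days


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


# Count one-minute readings per hourly bucket for a single tag.
per_bucket = Counter(
    hour_bucket(RUN_START + timedelta(minutes=i)) for i in range(RUN_MINUTES)
)

# Phase boundaries from the seeded batch_phase windows.
PHASE_BOUNDARIES = [
    datetime(2026, 1, 5, 12, tzinfo=timezone.utc),
    datetime(2026, 1, 12, tzinfo=timezone.utc),
    datetime(2026, 1, 18, tzinfo=timezone.utc),
    datetime(2026, 1, 19, tzinfo=timezone.utc),
]
# A boundary that is not on the hour would split one bucket across two phases.
straddled = [b for b in PHASE_BOUNDARIES if hour_bucket(b) != b]

print(RUN_MINUTES, "raw rows ->", len(per_bucket), "hourly buckets")
print("boundaries inside a bucket:", straddled)
```

Every seeded boundary happens to fall on the hour, so no bucket mixes two phases. As-run boundaries from a batch engine rarely will, in which case a bucket that straddles a transition has to be assigned by some rule or re-aggregated from the finer ts.sensor_1m roll-up.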

For the context layer, plain PostgreSQL materialized views snapshot the join result to disk and serve it until you REFRESH [7]. Turning the summary into a materialization is a one-word change. The block below is an illustrative snippet — it is not committed to the repo and is not created by make seed; it shows the pattern you would add when the live join gets expensive:

-- Illustrative — not in the repo; shows how you would materialize v_phase_summary.
CREATE MATERIALIZED VIEW s88.mv_phase_summary AS
SELECT * FROM s88.v_phase_summary;
-- refresh after a batch completes (or on a schedule):
REFRESH MATERIALIZED VIEW s88.mv_phase_summary;

One honest architectural note. In this book the historian and the relational model live in the same PostgreSQL instance (TimescaleDB is a Postgres extension, not a separate database), so s88.batch and ts.sensor_reading join natively. In the real world your historian is often a different server — a separate Postgres, or a commercial system such as AVEVA PI. PostgreSQL solves the same-SQL-different-server case with postgres_fdw, a foreign-data wrapper that exposes a remote table as if it were local so one view can still join across the boundary [8]. We use the single-instance join here because it is what runs on a laptop; the FDW pattern is exactly how you would stretch this view across two servers, and the bridge chapters (17–19) take that idea all the way to PI and SAP.

Drawing the recipe on the trend: the Grafana overlay

A contextualized view earns its keep the moment a human looks at it. Grafana reads PostgreSQL/TimescaleDB natively: it needs a time column, gives you the $__timeFilter macro to restrict a query to the panel's time range, and lets you call TimescaleDB's time_bucket function to bucket data to the panel's width [9]. The two queries below are illustrative: they run inside Grafana, not via make seed, and the provisioned dashboard that wraps them is built in Chapter 18. The trend panel is just our view:

-- Illustrative Grafana panel query (runs in Grafana, not via make seed).
SELECT ts AS "time", value, tag
FROM s88.v_batch_sensor
WHERE batch_id = '$batch'
  AND tag IN ($tags)
  AND $__timeFilter(ts)
ORDER BY ts;

The magic is the second query, which turns phase windows into shaded annotation regions drawn behind the trend:

-- Illustrative Grafana annotation query (runs in Grafana, not via make seed).
SELECT bp.start_ts AS "time", bp.end_ts AS "timeEnd", ph.name AS text
FROM s88.batch_phase bp
JOIN s88.phase ph ON ph.phase_id = bp.phase_id
WHERE bp.batch_id = '$batch'
ORDER BY bp.start_ts;

Now the analyst does not read a wall of wiggling lines. They see dissolved oxygen with the Growth and Production bands painted underneath, and the day-7 temperature dip sitting visibly inside Production — the context is on the picture. To compare runs, you stack v_phase_summary for the three released siblings (002, 003, 005 — 004 is the rejected lot, so it is left out) as a faint envelope and draw the new batch on top: that is the golden-batch overlay, which Chapter 18 builds out.

Three relational sources — the raw ts.sensor_reading historian, the s88.batch_phase phase windows, and the s88.batch unit and recipe ISA-88/95 context — converge into the s88.v_batch_sensor temporal-join view, which fans out to the s88.v_phase_summary golden-batch fingerprint and a Grafana phase-band trend; the summary in turn feeds the golden-batch overlay built out in Chapter 18.

Why it matters

Contextualization is the hinge between data collection and process understanding. Uncontextualized, your historian can only answer "what was the number?" — a question no investigator, no statistician, and no inspector ever actually asks. Contextualized, it answers "what was the number, during which step, of which batch, on which equipment?" — which is every question that matters.

It is also the technical substrate of two regulatory expectations. Continued Process Verification — Stage 3 of the FDA's three-stage process-validation lifecycle (Stage 1 process design, Stage 2 process qualification, Stage 3 the ongoing verification that the validated process stays in control) — requires ongoing, documented assurance that the process stays in a state of control, batch after batch [10]. You cannot do CPV on raw tags; you do it on phase-aligned, batch-keyed trends — precisely what v_phase_summary produces. And ICH Q10 — the International Council for Harmonisation's pharmaceutical-quality-system guideline — makes process-performance and product-quality monitoring a standing objective of the pharmaceutical quality system, enabling review-by-exception and continual improvement [11]. A reviewer who can pull every batch's Production-phase DO in one reproducible query, and see only the batch that deviated, is doing review-by-exception — which is far faster and far less error-prone than scrolling paper.

In the real world

This pattern is everywhere, under many names. Commercial historians sell it as "asset framework" or "batch context" (AVEVA PI AF, for example, layers an equipment/event model over PI tags so you can query by asset and event rather than by point name). MES (Manufacturing Execution System) platforms call it the electronic batch record. What we built in two SQL views is the same idea, expressed in the open relational primitives every engineer already knows.

The OSS-vs-validated-asset-model honest seam

The honest OSS-vs-commercial reckoning: the join is genuinely solved in open source, and solved well. PostgreSQL's temporal joins, TimescaleDB's continuous aggregates (a free, source-available TSL feature — not OSI open source, as Chapter 16 noted), FDWs, and Grafana cover the mechanics completely and at no licensing cost. What pure OSS does not hand you is the surrounding management: a vendor-maintained, validated asset model with point-and-click event-frame configuration, change control on the context model itself, and the supplier accountability a GAMP-5 (Good Automated Manufacturing Practice) audit expects. With our views, you own the context model — which means you also own validating it, version-controlling the DDL (the Data Definition Language — the CREATE statements that define the schema; it lives in Git, which is a real advantage), and proving the join logic is correct under qualification.

Concretely, the seam runs right through the anatomy card. The raw band and the batch band are cheap to trust: they are values stored verbatim and a key equality. The phase band (the two derived columns) is where the validation burden lives, because it is computed by the half-open temporal-containment predicate, and a single boundary-condition bug (<= where you meant <, or a window with a missing end_ts) would silently mis-assign readings to the wrong phase and corrupt every golden-batch comparison built on top. A commercial asset framework asks you to trust the vendor's tested event-frame engine; the OSS path asks you to own a test that proves no reading lands in two phases and none falls through a gap. The view being a few lines of inspectable SQL in Git is what makes that test writable at all, but writing it is on you. That is the recurring shape of this book: open source gets you a clean, inspectable ~80%; the validated-system wrapper around it is yours to build or buy.
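One way such a test could look is sketched below. It checks the windows rather than the readings: if one batch's windows are contiguous under half-open semantics, no timestamp between the first start and the last end can match two windows or none. The function name and messages are hypothetical, and timestamps are compared as ISO-8601 strings, which is valid only because they share one format.

```python
def check_windows(windows):
    """Return a list of problems for one batch's (phase_id, start_ts, end_ts) rows.

    end_ts may be None only for the last window (a phase still running).
    """
    problems = []
    ordered = sorted(windows, key=lambda w: w[1])
    for pid, start, end in ordered:
        if end is not None and end <= start:
            problems.append(f"{pid}: empty or inverted window")
    for (pid_a, _sa, end_a), (pid_b, start_b, _eb) in zip(ordered, ordered[1:]):
        if end_a is None:
            problems.append(f"{pid_a}: open-ended window is not the last one")
        elif end_a > start_b:
            problems.append(f"{pid_a}/{pid_b}: overlap")  # a reading could match both
        elif end_a < start_b:
            problems.append(f"{pid_a}/{pid_b}: gap")      # a reading could match neither
    return problems


# The seeded windows for BATCH-2026-001.
seeded = [
    ("PH1", "2026-01-05T00:00:00Z", "2026-01-05T12:00:00Z"),
    ("PH2", "2026-01-05T12:00:00Z", "2026-01-12T00:00:00Z"),
    ("PH3", "2026-01-12T00:00:00Z", "2026-01-18T00:00:00Z"),
    ("PH4", "2026-01-18T00:00:00Z", "2026-01-19T00:00:00Z"),
]

# Deliberately broken variant: PH3 starts an hour late, leaving a gap.
with_gap = list(seeded)
with_gap[2] = ("PH3", "2026-01-12T01:00:00Z", "2026-01-18T00:00:00Z")

print(check_windows(seeded))
print(check_windows(with_gap))
```

A complementary check on the view's output (no timestamp and tag pair appearing more than once per batch) would catch a predicate bug that a windows-only test cannot see, so a qualification suite would plausibly carry both.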

The same context as a triple, a shape, and a competency question

The relational view is one expression of the contextualized reading; the semantic expression is the next chapter's, and it is worth seeing that the join above is already a small ontology in disguise. Each v_batch_sensor row asserts, in Chapter 19's RDF graph, a handful of triples — subject, predicate, object facts — about one reading: that it wasMeasuredOn an ISA-95 unit, duringPhase an ISA-88 phase, and belongsToBatch a lot. The batch band is an object property (an edge you can walk — BATCH-2026-001 ranOn BR101), exactly the object-vs-datatype-property fork Book 4 draws in its relations chapter; the raw value/unit is a datatype property (a typed literal you read, ideally carrying its unit as a machine-readable QUDT quantity rather than the bare string degC). What the relational batch_id foreign key does locally, a globally-unique IRI (Internationalized Resource Identifier) does across systems — the property that lets the LIMS, the MES, and the historian agree they mean the same batch, covered in Book 4's identifiers chapter.

The boundary-condition test the seam just demanded also has a formal twin. "No reading lands in two phases, and none falls through a gap" is a SHACL shape (the Shapes Constraint Language, a W3C standard for validating that graph data has the required structure), the closed-world gate Book 4 builds in its release-gate chapter: a sh:NodeShape over each contextualized reading with sh:maxCount 1 on its duringPhase edge catches the double-count, and a sh:minCount 1 catches the gap. That is the half-open-interval invariant expressed as a constraint a validator enforces, rather than as a hand-written SQL assertion you must remember to run. And the human question this chapter keeps answering, "DO during the Production phase of batch X", is, in ontology-engineering terms, a competency question (a plain-English question the model must be able to answer, used as a pass/fail acceptance test). Book 4 turns 23 of them into runnable PASS/FAIL checks in its competency-questions chapter; the SQL above is the relational answer to one such question, and the knowledge-graph loader is where the same context becomes a query that walks lot genealogy with PROV-O-style provenance (the W3C vocabulary for what was derived from what, by which activity) all the way back to the cell bank. Relational, RDF, and SHACL are three renderings of one model, and keeping them from drifting apart is exactly the single-source-of-truth discipline (LinkML) that Chapter 19 closes on.
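A competency question becomes a runnable check once you decide what counts as an answer. The sketch below is one plausible shape for such a check, not Book 4's actual harness: it treats the question as passing when the contextualized data returns at least one value. The tag name BR101.DO.PV, the in-memory rows, and the numeric values are invented fixtures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextualizedReading:
    batch_id: str
    phase_name: str
    tag: str
    value: float


# Made-up fixture rows standing in for a slice of v_batch_sensor.
ROWS = [
    ContextualizedReading("BATCH-2026-001", "Growth", "BR101.DO.PV", 45.0),
    ContextualizedReading("BATCH-2026-001", "Production", "BR101.DO.PV", 40.0),
    ContextualizedReading("BATCH-2026-001", "Production", "BR101.DO.PV", 38.0),
    ContextualizedReading("BATCH-2026-001", "Production", "BR101.Temp.PV", 37.0),
]


def answer(rows, batch_id: str, phase_name: str, tag: str) -> list[float]:
    """Values of one tag during one phase of one batch."""
    return [
        r.value
        for r in rows
        if r.batch_id == batch_id and r.phase_name == phase_name and r.tag == tag
    ]


def competency_check(rows, batch_id: str, phase_name: str, tag: str) -> str:
    """PASS if the model can answer the question at all (assumed criterion)."""
    return "PASS" if answer(rows, batch_id, phase_name, tag) else "FAIL"


print(competency_check(ROWS, "BATCH-2026-001", "Production", "BR101.DO.PV"))
print(competency_check(ROWS, "BATCH-2026-999", "Production", "BR101.DO.PV"))
```

A stricter criterion, such as requiring a minimum number of readings or a specific expected value, would fit the same function signature.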

Key terms

Contextualization — joining raw time-series readings to the batch, equipment, phase, and recipe they belong to, so the data becomes queryable as process knowledge.
Batch phase (batch_phase) — the record of when a specific ISA-88 phase actually ran for one batch, given as a start_ts/end_ts window; the timeline the historian join brackets against.
Temporal join — a join whose match condition is a time-containment test (ts >= start AND ts < end) rather than a key equality; here it assigns each reading to its phase.
Half-open interval — a window [start, end) that includes its start instant but excludes its end, so a reading exactly on a phase boundary is counted in the new phase and never in both; the >= start AND < end predicate that makes the temporal join unambiguous.
View — a saved query stored under a name that behaves like a virtual table: you SELECT from it as if it were a table, but it recomputes its rows from the underlying tables on each query. v_batch_sensor and v_phase_summary are views.
v_batch_sensor row — the contextualized reading: eleven columns in four bands — six raw historian fields carried through, three batch columns joined from s88.batch, and two phase columns (phase_id, phase_name) derived on the fly by the temporal join.
Asset framework / asset model — a vendor-maintained equipment-and-event model layered over historian tags (e.g. AVEVA PI AF) so data is queried by asset and event rather than point name; the validated, supplier-accountable counterpart to the two SQL views built here.
Golden batch — a reference trajectory built from normal historical batches that new runs are compared against; v_phase_summary is its phase-keyed building block.
Continuous aggregate — a TimescaleDB materialized view over a hypertable that refreshes incrementally as new data arrives, used for fast pre-rolled summaries.
Materialized view — a PostgreSQL view whose results are stored on disk and served until explicitly REFRESHed; used to cache the contextualization join.
postgres_fdw — PostgreSQL's foreign-data wrapper, which exposes a table on a remote server as if it were local so one view can join across servers.
Continued Process Verification (CPV) — Stage 3 of process validation: ongoing assurance the process stays in a state of control, done over contextualized, phase-aligned batch data.
Review-by-exception — a quality-review practice of inspecting only deviations rather than every value, enabled by contextualized, exception-flagging queries.
Batch-grouped cross-validation — leave-one-batch-out validation that keeps every row of a run on one side of the train/test line, because a fed-batch run is one independent observation; the batch_id is the grouping key that prevents a soft sensor from being scored on a batch it trained on.
Applicability domain — the region of inputs a model was trained on; a reading flagged 64 Uncertain or lying outside that envelope is gated out of the model's trusted range, the data-level analogue of the quality byte travelling beside the value.
Process drift vs. model drift — the living culture genuinely wandering batch-to-batch (what v_phase_summary is built to measure) versus a soft sensor decaying as its probe fouls (what a residual chart on the same phase-keyed trend is built to catch).
SHACL shape — a Shapes Constraint Language rule that validates the RDF rendering of a contextualized reading; sh:maxCount 1 on its phase edge formalizes the half-open invariant that no reading lands in two phases, and sh:minCount 1 that none falls through a gap.
Competency question — a plain-English question the data model must answer, used as a pass/fail acceptance test; "DO during the Production phase of batch X" is one, answered relationally here and as a graph query in Chapter 19.
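Of the terms above, batch-grouped cross-validation is the one most easily gotten wrong in code, so a short sketch may help. This is a standard-library illustration of leave-one-batch-out splitting, assuming each row carries a batch_id; the batch identifiers other than BATCH-2026-001 are hypothetical, and a real pipeline would more likely use a library splitter.

```python
from collections import defaultdict
from typing import Iterator


def leave_one_batch_out(
    batch_ids: list[str],
) -> Iterator[tuple[str, list[int], list[int]]]:
    """Yield (held_out_batch, train_indices, test_indices), one fold per batch."""
    by_batch: dict[str, list[int]] = defaultdict(list)
    for i, batch in enumerate(batch_ids):
        by_batch[batch].append(i)
    for held_out in sorted(by_batch):
        test_idx = by_batch[held_out]
        # Every row of the held-out run stays on the test side of the line.
        train_idx = [i for i, batch in enumerate(batch_ids) if batch != held_out]
        yield held_out, train_idx, test_idx


# One batch_id per row, in row order (illustrative values).
batch_ids = [
    "BATCH-2026-001", "BATCH-2026-001",
    "BATCH-2026-002", "BATCH-2026-002", "BATCH-2026-002",
    "BATCH-2026-003",
]

folds = list(leave_one_batch_out(batch_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Because the split is keyed on batch_id rather than on row position, no run can appear on both sides of a fold, which is the leak a row-wise random split would introduce.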

Where this leads

We have given every reading a meaning and proven the join with real queries against the seeded golden batch. But SQL output is a wall of numbers, and process control is a visual discipline. In Chapter 18, Visualization & Trending with Grafana, we take the v_batch_sensor trend and the batch_phase annotation query from this chapter and turn them into provisioned dashboards-as-code: a batch-overlay dashboard where the recipe phases are painted behind the trend and yesterday's golden batch is drawn faintly behind today's run.
