Skip to main content

Contextualization: Joining Time-Series to the Batch

๐Ÿ“ Where we are: Part III ยท Storing & Connecting โ€” Chapter 14. The historian now holds millions of readings; this chapter gives every reading a meaning by joining it to the batch, equipment, and phase it came from.

The simple version

A raw historian is a shoebox of receipts with the dates torn off. Each slip says BR101.Temp.PV = 37.00 ยฐC at some instant โ€” true, but mute. Contextualization staples each receipt to the right page of the right batch record: this reading happened during the Growth phase of BATCH-2026-001, running on bioreactor BR101, under recipe CHO-MAB-001. Suddenly you can ask human questions โ€” "what was dissolved oxygen during the production phase of last week's lot?" โ€” and the database can answer.

What this chapter coversโ€‹

In Chapter 13 we built a TimescaleDB historian that swallows every sensor tag. In Chapter 3 we modeled the ISA-88/95 batch and equipment world in PostgreSQL. They live side by side and, so far, ignore each other. This chapter marries them.

We will:

  • join the historian stream to ISA-88 phases and the ISA-95 equipment hierarchy with a single SQL view;
  • run batch-aware queries โ€” "DO during the production phase of batch X" โ€” that would be impossible against raw tags;
  • build a per-phase, per-tag summary that becomes the foundation of a golden-batch overlay;
  • and push the phase boundaries from Postgres into a Grafana dashboard so an analyst sees the trend with the recipe drawn on top of it.

The two views at the heart of this chapter live in examples/platform/db/60-views.sql and are created by make seed, joining the historian table from examples/platform/db/20-historian.sql to the batch model seeded by examples/platform/db/seed/seed_cho_line.sql. The later materialized-view and Grafana snippets are illustrative โ€” they show where the model goes next and are built out in Chapter 15; they are labelled as such where they appear.

The two worlds we are joiningโ€‹

The historian table is deliberately dumb. Here is its definition, from examples/platform/db/20-historian.sql:

CREATE TABLE ts.sensor_reading (
ts timestamptz NOT NULL,
tag text NOT NULL,
value double precision,
unit text,
quality smallint NOT NULL DEFAULT 192, -- OPC UA: 192 Good, 64 Uncertain, 0 Bad
batch_id text
);

Six columns, no opinions. A typical slice looks like this:

ts | tag | value | unit | quality | batch_id
------------------------+---------------+---------+------+---------+----------------
2026-01-13T08:00:00Z | BR101.Temp.PV | 36.9993 | degC | 192 | BATCH-2026-001
2026-01-13T08:00:00Z | BR101.DO.PV | 36.4576 | %sat | 192 | BATCH-2026-001
2026-01-13T08:00:00Z | BR101.pH.PV | 6.9922 | pH | 192 | BATCH-2026-001

Notice the table already carries a batch_id. That is the single most important design decision of the whole capture layer โ€” the collector stamps the active batch onto each reading as it writes (we set that up in Chapter 5). Without it, contextualization becomes a fragile guessing game of "which run was on BR101 at 08:00 that Tuesday?". With it, the join is exact.

But a batch_id is only half the story. It tells you which run; it does not tell you which step of that run. Was 08:00 still seed-train inoculation, or had the culture crossed into the production phase where titer accumulates? That answer lives in the relational world we built in Chapter 3.

The ISA-88 batch-control standard gives us the procedural vocabulary: a recipe is made of operations, each made of phases โ€” the smallest meaningful procedural element [1]. The ISA-95 enterprise-control standard gives us the physical side: enterprise โ†’ site โ†’ area โ†’ unit, so every tag can be tied to the equipment it ran on and, through the batch, to its lot [2]. Both models ship as royalty-free, machine-readable XML schemas (B2MML/BatchML), which is what let us turn them into concrete relational tables rather than prose [3].

In examples/platform/db/seed/seed_cho_line.sql the fed-batch recipe is broken into four operations and five phases:

INSERT INTO s88.phase VALUES
('PH1', 'OP1', 1, 'Inoculate'),
('PH2', 'OP2', 1, 'Growth'),
('PH3', 'OP2', 2, 'Production'),
('PH4', 'OP3', 1, 'Harvest'),
('PH5', 'OP4', 1, 'Capture') ON CONFLICT DO NOTHING;

A recipe phase like Growth is an abstract template. The thing that actually anchors a trace in time is the batch phase โ€” the record of when that phase really ran for one specific batch. Those are the windows the join hinges on, also in the seed file:

-- phase windows for the golden batch (drives the contextualization view)
INSERT INTO s88.batch_phase (batch_id, phase_id, unit_id, start_ts, end_ts) VALUES
('BATCH-2026-001', 'PH1', 'BR101', '2026-01-05T00:00:00Z', '2026-01-05T12:00:00Z'),
('BATCH-2026-001', 'PH2', 'BR101', '2026-01-05T12:00:00Z', '2026-01-12T00:00:00Z'),
('BATCH-2026-001', 'PH3', 'BR101', '2026-01-12T00:00:00Z', '2026-01-18T00:00:00Z'),
('BATCH-2026-001', 'PH4', 'BR101', '2026-01-18T00:00:00Z', '2026-01-19T00:00:00Z')
ON CONFLICT DO NOTHING;

Read those four rows as a timeline: a half-day inoculation, then a week of growth, then the long production phase where the cells make antibody, then harvest. Our 08:00-on-January-13 reading falls inside the third window, so it is a Production-phase reading. We just need SQL to figure that out automatically.

A historian tag flowing left to right gains a batch_id stamp, then is matched against ISA-88 phase windows on a timeline and the ISA-95 equipment tree, emerging on the right as a fully contextualized row that names its batch, phase, unit, and recipe.

From mute tag to meaningful record: the contextualization join staples each historian reading to the phase window it falls inside and to the equipment/recipe it belongs to. Original diagram by the authors, created with AI assistance.

The join that does the workโ€‹

Here is the heart of the chapter โ€” the whole view, from examples/platform/db/60-views.sql:

-- A reading with its full batch + phase context.
CREATE OR REPLACE VIEW s88.v_batch_sensor AS
SELECT r.ts, r.tag, r.value, r.unit, r.quality, r.batch_id,
b.product_id, b.recipe_id, b.unit_id,
bp.phase_id, ph.name AS phase_name
FROM ts.sensor_reading r
JOIN s88.batch b ON b.batch_id = r.batch_id
LEFT JOIN s88.batch_phase bp ON bp.batch_id = r.batch_id
AND r.ts >= bp.start_ts AND (bp.end_ts IS NULL OR r.ts < bp.end_ts)
LEFT JOIN s88.phase ph ON ph.phase_id = bp.phase_id;

It is short, but every clause is load-bearing.

The first JOIN s88.batch is an inner join on batch_id: a reading with no matching batch is dropped. That is intentional โ€” a sensor blipping while no batch is running is not part of any GMP record and should not silently appear in a batch-aware query.

The clever part is the LEFT JOIN s88.batch_phase. The match condition is not an equality on a key; it is a temporal containment: r.ts >= bp.start_ts AND (bp.end_ts IS NULL OR r.ts < bp.end_ts). For each reading, Postgres finds the phase window whose start/end brackets the reading's timestamp. The half-open interval (>= start, < end) means a reading exactly on a phase boundary belongs to the new phase, never to both โ€” no double-counting. The bp.end_ts IS NULL branch handles a phase that is still running live, with no end yet recorded. We keep it a LEFT join so a reading that arrives before any phase window is opened still surfaces (with a NULL phase) rather than vanishing.

That single view is the contract every downstream consumer queries โ€” dashboards, analytics notebooks, the knowledge-graph loader. None of them ever touch the raw ts.sensor_reading table again. Now the human question becomes one line of SQL:

-- "Show me dissolved oxygen during the Production phase of the golden batch."
SELECT ts, value
FROM s88.v_batch_sensor
WHERE batch_id = 'BATCH-2026-001'
AND tag = 'BR101.DO.PV'
AND phase_name = 'Production'
ORDER BY ts;
ts | value
------------------------+---------
2026-01-12T00:00:00Z | 36.8646
2026-01-12T00:01:00Z | 35.888
2026-01-12T00:02:00Z | 36.5891
...
2026-01-17T23:59:00Z | 34.0293

You could never write that against the bare historian. phase_name = 'Production' is knowledge the time-series table simply does not contain; the join manufactured it on the fly.

From readings to a golden-batch building blockโ€‹

A trace is useful; a summary by phase is where process understanding starts. The second view in examples/platform/db/60-views.sql rolls the contextualized readings up to one row per batch, phase, and tag:

-- Per-batch, per-phase, per-tag summary: the "golden batch" building block.
CREATE OR REPLACE VIEW s88.v_phase_summary AS
SELECT batch_id, phase_name, tag, unit,
count(*) AS n,
round(avg(value)::numeric, 4) AS avg_value,
round(min(value)::numeric, 4) AS min_value,
round(max(value)::numeric, 4) AS max_value
FROM s88.v_batch_sensor
WHERE phase_name IS NOT NULL
GROUP BY batch_id, phase_name, tag, unit;

Querying it for temperature across the golden batch gives a compact process fingerprint:

batch_id | phase_name | tag | unit | n | avg_value | min_value | max_value
----------------+------------+---------------+------+------+-----------+-----------+----------
BATCH-2026-001 | Growth | BR101.Temp.PV | degC | 9360 | 37.0002 | 36.8724 | 37.1211
BATCH-2026-001 | Production | BR101.Temp.PV | degC | 8640 | 36.9893 | 36.3571 | 37.1218
BATCH-2026-001 | Harvest | BR101.Temp.PV | degC | 1440 | 37.0008 | 36.9132 | 37.0981

That 36.36 ยฐC minimum in Production is the deliberate day-7 excursion the simulator seeds into the fed-batch trace โ€” and because the summary is sliced by phase, it shows up exactly where it happened instead of being averaged into invisibility across a 14-day run.

This per-phase reduction is not a convenience; it is the statistical prerequisite for monitoring batch processes properly. The seminal multiway-PCA work on batch monitoring built its reference trajectories โ€” the "golden batch" โ€” by aligning normal historical batches and comparing new runs against them [4]. And the reason we slice by ISA-88 phase first, rather than by wall-clock time, is that fed-batch trajectories only line up batch-to-batch once they are aligned on process phase and indicator variables; comparing minute-30 to minute-30 across two batches is meaningless if one was still inoculating and the other was already feeding [5]. v_phase_summary is the humble, queryable seed of all of that: every sibling batch reduced to the same phase-keyed shape, ready to stack.

Materializing context when the join gets expensiveโ€‹

v_batch_sensor is a plain view: the temporal join re-runs on every query. For the golden batch over a few tags that is instant. Across a six-batch campaign, sixteen tags, and a dashboard that refreshes every ten seconds, the same containment join executes again and again.

Two OSS mechanisms fix this without changing the model.

For the raw-rate roll-ups, TimescaleDB continuous aggregates are materialized views over a hypertable that refresh incrementally as new data lands, so you never recompute the whole history [6]. We already created them in Chapter 13 (ts.sensor_1m, ts.sensor_1h). The historian stores the bioreactor's native 5-second acquisition downsampled to one row per minute (about 20,160 rows per tag over a 14-day run), so a phase-aware dashboard joins those pre-rolled buckets to the phase windows instead of scanning every one-minute reading.

For the context layer, plain PostgreSQL materialized views snapshot the join result to disk and serve it until you REFRESH [7]. Turning the summary into a materialization is a one-word change. The block below is an illustrative snippet โ€” it is not committed to the repo and is not created by make seed; it shows the pattern you would add when the live join gets expensive:

-- Illustrative โ€” not in the repo; shows how you would materialize v_phase_summary.
CREATE MATERIALIZED VIEW s88.mv_phase_summary AS
SELECT * FROM s88.v_phase_summary;
-- refresh after a batch completes (or on a schedule):
REFRESH MATERIALIZED VIEW s88.mv_phase_summary;

One honest architectural note. In this book the historian and the relational model live in the same PostgreSQL instance (TimescaleDB is a Postgres extension, not a separate database), so s88.batch and ts.sensor_reading join natively. In the real world your historian is often a different server โ€” a separate Postgres, or a commercial system such as AVEVA PI. PostgreSQL solves the same-SQL-different-server case with postgres_fdw, a foreign-data wrapper that exposes a remote table as if it were local so one view can still join across the boundary [8]. We use the single-instance join here because it is what runs on a laptop; the FDW pattern is exactly how you would stretch this view across two servers, and the bridge chapters (17โ€“19) take that idea all the way to PI and SAP.

Drawing the recipe on the trend: the Grafana overlayโ€‹

A contextualized view earns its keep the moment a human looks at it. Grafana reads PostgreSQL/TimescaleDB natively โ€” it needs a time column and gives you the $__timeFilter and time_bucket helpers to bucket data to the panel's width [9]. The two queries below are illustrative โ€” they run inside Grafana, not via make seed, and the provisioned dashboard that wraps them is built in Chapter 15. The trend panel is just our view:

-- Illustrative Grafana panel query (runs in Grafana, not via make seed).
SELECT ts AS "time", value, tag
FROM s88.v_batch_sensor
WHERE batch_id = '$batch'
AND tag IN ($tags)
AND $__timeFilter(ts)
ORDER BY ts;

The magic is the second query, which turns phase windows into shaded annotation regions drawn behind the trend:

-- Illustrative Grafana annotation query (runs in Grafana, not via make seed).
SELECT bp.start_ts AS "time", bp.end_ts AS "timeEnd", ph.name AS text
FROM s88.batch_phase bp
JOIN s88.phase ph ON ph.phase_id = bp.phase_id
WHERE bp.batch_id = '$batch'
ORDER BY bp.start_ts;

Now the analyst does not read a wall of wiggling lines. They see dissolved oxygen with the Growth and Production bands painted underneath, and the day-7 temperature dip sitting visibly inside Production โ€” the context is on the picture. To compare runs, you stack v_phase_summary for the five released siblings as a faint envelope and draw the new batch on top: that is the golden-batch overlay, which Chapter 15 builds out.

Why it mattersโ€‹

Contextualization is the hinge between data collection and process understanding. Uncontextualized, your historian can only answer "what was the number?" โ€” a question no investigator, no statistician, and no inspector ever actually asks. Contextualized, it answers "what was the number, during which step, of which batch, on which equipment?" โ€” which is every question that matters.

It is also the technical substrate of two regulatory expectations. Continued Process Verification โ€” Stage 3 of the FDA's process-validation lifecycle โ€” requires ongoing, documented assurance that the process stays in a state of control, batch after batch [10]. You cannot do CPV on raw tags; you do it on phase-aligned, batch-keyed trends โ€” precisely what v_phase_summary produces. And ICH Q10 makes process-performance and product-quality monitoring a standing objective of the pharmaceutical quality system, enabling review-by-exception and continual improvement [11]. A reviewer who can pull every batch's Production-phase DO in one reproducible query, and see only the batch that deviated, is doing review-by-exception โ€” which is far faster and far less error-prone than scrolling paper.

In the real worldโ€‹

This pattern is everywhere, under many names. Commercial historians sell it as "asset framework" or "batch context" (AVEVA PI AF, for example, layers an equipment/event model over PI tags so you can query by asset and event rather than by point name). MES platforms call it the electronic batch record. What we built in two SQL views is the same idea, expressed in the open relational primitives every engineer already knows.

The honest OSS-vs-commercial reckoning: the join is genuinely solved in open source, and solved well. PostgreSQL's temporal joins, TimescaleDB's continuous aggregates, FDWs, and Grafana cover the mechanics completely and at no licensing cost. What pure OSS does not hand you is the surrounding management: a vendor-maintained, validated asset model with point-and-click event-frame configuration, change control on the context model itself, and the supplier accountability a GAMP-5 audit expects. With our views, you own the context model โ€” which means you also own validating it, version-controlling the DDL (it lives in Git, which is a real advantage), and proving the join logic is correct under qualification. That is the recurring shape of this book: open source gets you a clean, inspectable ~80%; the validated-system wrapper around it is yours to build or buy.

A grounding note for U.S. readers: NIIMBL's SABRE facility (the NIIMBL / University of Delaware pilot-scale cGMP plant โ€” cGMP being current Good Manufacturing Practice โ€” that broke ground in April 2024) is exactly the kind of multi-vendor, sensor-rich line where contextualization is unavoidable: data from different skids, all needing to be tied back to one batch and one phase before anyone can reason about it. SABRE is a facility, not a data standard, but it is the physical setting this entire stack is built to serve.

Key termsโ€‹

  • Contextualization โ€” joining raw time-series readings to the batch, equipment, phase, and recipe they belong to, so the data becomes queryable as process knowledge.
  • Batch phase (batch_phase) โ€” the record of when a specific ISA-88 phase actually ran for one batch, given as a start_ts/end_ts window; the timeline the historian join brackets against.
  • Temporal join โ€” a join whose match condition is a time-containment test (ts >= start AND ts < end) rather than a key equality; here it assigns each reading to its phase.
  • Golden batch โ€” a reference trajectory built from normal historical batches that new runs are compared against; v_phase_summary is its phase-keyed building block.
  • Continuous aggregate โ€” a TimescaleDB materialized view over a hypertable that refreshes incrementally as new data arrives, used for fast pre-rolled summaries.
  • Materialized view โ€” a PostgreSQL view whose results are stored on disk and served until explicitly REFRESHed; used to cache the contextualization join.
  • postgres_fdw โ€” PostgreSQL's foreign-data wrapper, which exposes a table on a remote server as if it were local so one view can join across servers.
  • Continued Process Verification (CPV) โ€” Stage 3 of process validation: ongoing assurance the process stays in a state of control, done over contextualized, phase-aligned batch data.
  • Review-by-exception โ€” a quality-review practice of inspecting only deviations rather than every value, enabled by contextualized, exception-flagging queries.

Where this leadsโ€‹

We have given every reading a meaning and proven the join with real queries against the seeded golden batch. But SQL output is a wall of numbers, and process control is a visual discipline. In Chapter 15 โ€” Visualization & Trending with Grafana, we take the v_batch_sensor trend and the batch_phase annotation query from this chapter and turn them into provisioned, dashboards-as-code: a batch-overlay dashboard where the recipe phases are painted behind the trend and yesterday's golden batch is drawn faintly behind today's run.