Visualization & Trending with Grafana
Where we are: Part III, the layer where the contextualized batch data we have been storing finally becomes a picture an operator watches and an engineer trusts.
Think of your historian and batch database as a huge, perfectly organized library of numbers. Grafana is the reading room. It does not own a single book; it walks to the right shelf, pulls the rows you asked for, and lays them out as a trend you can read at a glance. The important part is the one most people forget: every chart is just a saved question. Change the data and the chart changes. That is exactly why a Grafana trend can be evidence and a screenshot of it cannot: the screenshot is a photograph of the reading room, not the book.
What this chapter covers
We already have data worth looking at: a 14-day fed-batch CHO trace in the TimescaleDB historian, an ISA-88/95 batch model in PostgreSQL, and contextualization views that join the two. This chapter stands up the open-source dashboard layer on top of all of it.
We will (1) bring Grafana up from the one pinned line in the shared compose.yaml; (2) provision its data source and dashboards as code instead of clicking them together; (3) build an operator view and a denser engineer view over the same contextualized data; (4) wire one alert rule; and (5) confront the two things that make Grafana a grown-up choice for a regulated plant: its AGPLv3 license clause, and the hard rule that a trend is only evidence if it is reproducible from validated data. Grafana is the open-source analogue to a commercial process-visualization product like AVEVA PI Vision, and operator/engineer dashboards over contextualized, real-time data are central to actually running a monoclonal-antibody (mAb) line [1].
Grafana is already in the stack
You do not install Grafana in this book. It was declared once, in the shared platform stack, and it comes up with the core profile alongside Postgres, the broker, and the simulator. Here is the real service, from examples/platform/compose/compose.yaml:
# examples/platform/compose/compose.yaml
grafana:
  image: grafana/grafana-oss:11.4.0
  profiles: ["core"]
  <<: *restart
  ports: ["3000:3000"]
  environment:
    GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD:-admin}
    GF_USERS_ALLOW_SIGN_UP: "false"
  volumes:
    - ../dashboards/provisioning:/etc/grafana/provisioning:ro
    - grafana:/var/lib/grafana
  depends_on:
    postgres:
      condition: service_healthy
Read that block slowly, because almost every line is a deliberate, GxP-flavored choice:
- grafana/grafana-oss:11.4.0: pinned by tag, with pin-by-digest as the intended next step. The compose.yaml header notes that images are meant to be locked by digest in a versions.lock; that lock file is the planned mechanism and is not yet committed in this repo. This is the open-source distribution, not Grafana Enterprise or Grafana Cloud. An exact tag keeps a docker compose pull from silently moving to a newer release under a validated dashboard. Only a digest also guarantees that the image behind the tag cannot be replaced.
- profiles: ["core"]: Grafana is part of the always-on foundation. Per the compose header, the core profile covers the store/visualize chapters (Ch 1–4, 13–15); the capture, semantics, and analytics chapters all read through it as well.
- GF_USERS_ALLOW_SIGN_UP: "false": user self-registration is off. In a regulated context you want every action attributable to a named, provisioned identity, never a walk-up account.
- ../dashboards/provisioning:/etc/grafana/provisioning:ro: the entire provisioning tree is mounted read-only. Grafana reads its data sources, dashboards, and alert rules from version-controlled files at startup; it cannot write back to them.
- depends_on: postgres … service_healthy: Grafana does not even start until the database that holds both the historian and the batch model passes its healthcheck. Dashboards therefore never come up against a database that is still starting.
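Whether a reference is pinned by tag or by digest is easy to check mechanically, which becomes useful once a versions.lock exists. The sketch below assumes a CI step that sees image references as plain strings. `pin_kind` is a hypothetical helper, not part of Docker or Compose.

```python
import re

# A digest pin ends in "@sha256:" followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def pin_kind(image_ref: str) -> str:
    """Classify an image reference as 'digest', 'tag', or 'floating'."""
    if DIGEST_RE.search(image_ref):
        return "digest"  # content-addressed: the image bytes cannot change
    # Look for ':' only after the last '/', so a registry port such as
    # 'registry.local:5000/grafana/grafana-oss' is not mistaken for a tag.
    name = image_ref.rsplit("/", 1)[-1]
    if ":" in name:
        tag = name.split(":", 1)[1]
        # ':latest' is a tag in name only; it moves with every release.
        return "floating" if tag == "latest" else "tag"
    return "floating"  # no tag at all means an implicit ':latest'
```

With the compose file as printed, pin_kind("grafana/grafana-oss:11.4.0") returns "tag". Once the lock file lands, a CI gate could require "digest" for every service.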
Bring it up and reach it on http://localhost:3000:
docker compose --profile core up -d
A note the book owes you about that tag. The spec narrative for the platform aims at a newer Grafana, but the tested compose file pins 11.4.0, so that is what the repo runs and what this chapter prints. The number matters less than the discipline: the goal of a digest-level lock (the versions.lock the compose header points toward) is that whatever it records is exactly what CI pulls, what the license inventory records, and what the supplier register validates. The running stack and its paperwork should not be allowed to drift apart.
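One cheap guard against that drift is to ask the running server which version it is and compare the answer with the pin. The sketch below assumes Grafana is reachable on localhost:3000. It calls Grafana's unauthenticated /api/health endpoint, whose JSON body includes database and version fields. The helper names are ours.

```python
import json
import urllib.request

PINNED_VERSION = "11.4.0"  # must match the image tag in compose.yaml


def version_drift(health: dict, pinned: str = PINNED_VERSION) -> list:
    """Return the problems found in an /api/health payload; empty means OK."""
    problems = []
    if health.get("database") != "ok":
        problems.append(f"database status is {health.get('database')!r}")
    if health.get("version") != pinned:
        problems.append(f"running {health.get('version')!r}, pinned {pinned!r}")
    return problems


def fetch_health(base_url: str = "http://localhost:3000") -> dict:
    """GET Grafana's /api/health (no login needed) and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/health", timeout=5) as resp:
        return json.load(resp)
```

Call version_drift(fetch_health()) after docker compose up. An empty list means the running server and the pin agree; in CI, a non-empty list would fail the job.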
Dashboards as code, not clicks
The fastest way to lose a regulated dashboard is to build it beautifully by hand in the browser and store it nowhere. Grafana's answer is provisioning: data sources and dashboards are defined in version-controlled YAML/JSON files loaded at startup, rather than clicked together in the UI [2]. The provisioning/ directory, committed in the repo and mounted read-only by the compose service above, has a fixed shape:
examples/platform/dashboards/provisioning/
├── datasources/
│   └── timescaledb.yaml              # how Grafana reaches Postgres/TimescaleDB
├── dashboards/
│   ├── dashboards.yaml               # tells Grafana where to find dashboard JSON
│   └── json/
│       └── br101-batch-overlay.json  # the engineer/operator dashboard
└── alerting/
    └── batch-alerts.yaml             # alert rules + contact points as code
The data source is one file, datasources/timescaledb.yaml (the historian and the batch model are the same PostgreSQL instance, so one data source serves both):
# examples/platform/dashboards/provisioning/datasources/timescaledb.yaml
apiVersion: 1
datasources:
  - name: TimescaleDB
    uid: timescaledb          # stable uid so dashboard JSON references survive
    type: postgres
    url: postgres:5432        # the compose service hostname, not localhost
    user: ${POSTGRES_USER}
    jsonData:
      database: ${POSTGRES_DB}
      sslmode: disable        # fine inside the compose network; TLS is Ch 25
      timescaledb: true       # enables Grafana's TimescaleDB-aware $__timeGroupAlias / time_bucket handling
    secureJsonData:
      password: ${POSTGRES_PASSWORD}
    editable: false           # provisioned, read-only: changes go through Git
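Two details earn their keep. First, uid: timescaledb is a stable identifier. Dashboard JSON references this uid rather than an auto-generated one, so a dashboard exported on a laptop loads unchanged on a server. Second, editable: false means an operator cannot quietly repoint the data source at a different database; the change has to go through the file, which goes through Git review. One caveat applies. Grafana resolves the ${...} references from its own container environment, so POSTGRES_USER, POSTGRES_DB, and POSTGRES_PASSWORD must reach the grafana service as well. The service block as printed sets only GF_SECURITY_ADMIN_PASSWORD and GF_USERS_ALLOW_SIGN_UP.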
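A small pre-flight check can catch a ${...} reference that would reach Grafana empty. The sketch below is minimal. `missing_env_refs` is a hypothetical helper, and it checks only the braced ${NAME} form that this file uses; Grafana also accepts the bare $NAME form.

```python
import os
import re
from typing import List, Mapping, Optional

# Braced references only, e.g. ${POSTGRES_USER}; bare $NAME is not checked here.
ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_refs(text: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return ${NAME} references in `text` that are unset or empty in `env`.

    Defaults to this process's environment, so run it inside the Grafana
    container (or with the same variables) for a faithful answer.
    """
    env = os.environ if env is None else env
    missing: List[str] = []
    for name in ENV_REF.findall(text):
        if not env.get(name) and name not in missing:
            missing.append(name)  # keep first-seen order, no duplicates
    return missing
```

Suppose you feed it the data-source file with an environment that defines only POSTGRES_USER. It reports POSTGRES_DB and POSTGRES_PASSWORD.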
A dashboard provider file simply points Grafana at a folder of dashboard JSON:
# examples/platform/dashboards/provisioning/dashboards/dashboards.yaml
apiVersion: 1
providers:
  - name: bioproc-dashboards
    type: file
    allowUiUpdates: false     # the UI cannot overwrite the file-of-record
    options:
      path: /etc/grafana/provisioning/dashboards/json
      foldersFromFilesStructure: true
allowUiUpdates: false is the regulated posture. Someone can explore and tweak a panel in the browser, but cannot save the change over the provisioned dashboard. The file on disk, the one in version control, stays the record of truth. For real plants, Grafana's recommended path goes further: manage these files through CI/CD and Git Sync, so every dashboard change is reviewed, versioned, and reproducibly deployed, exactly like application code [3].
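A CI job can also enforce the stable-uid rule before a dashboard ever reaches Grafana. The sketch below walks every dashboard JSON file and reports data-source uids that are not provisioned. The helper names and the allow-list are assumptions. Real dashboards may also reference Grafana built-ins, such as the annotation data source, which you would add to the list.

```python
import json
from pathlib import Path

# uids provisioned under provisioning/datasources/; extend for built-ins you allow.
ALLOWED_UIDS = {"timescaledb"}


def datasource_uids(node) -> set:
    """Collect the uid of every {"datasource": {"uid": ...}} object, at any depth."""
    found = set()
    if isinstance(node, dict):
        ds = node.get("datasource")
        if isinstance(ds, dict) and "uid" in ds:
            found.add(ds["uid"])
        for value in node.values():
            found |= datasource_uids(value)
    elif isinstance(node, list):
        for item in node:
            found |= datasource_uids(item)
    return found


def lint_dashboards(json_dir: str) -> dict:
    """Map each dashboard file under json_dir to its unknown data-source uids."""
    report = {}
    for path in sorted(Path(json_dir).rglob("*.json")):
        unknown = datasource_uids(json.loads(path.read_text())) - ALLOWED_UIDS
        if unknown:
            report[str(path)] = sorted(unknown)
    return report
```

Run against provisioning/dashboards/json, an empty report means every panel and target points at the provisioned uid.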
A dashboard is a JSON document, and a panel inside it is a saved SQL query against our historian. Here is the heart of one panel in the committed dashboards/json/br101-batch-overlay.json. It holds the query that draws the bioreactor temperature trend, time-bucketed for smooth rendering:
{
  "title": "BR101 Temperature (PV vs setpoint)",
  "type": "timeseries",
  "datasource": { "type": "postgres", "uid": "timescaledb" },
  "fieldConfig": { "defaults": { "unit": "celsius" } },
  "targets": [
    {
      "rawSql": "SELECT time_bucket('1 minute', ts) AS time, avg(value) AS \"Temp PV\" FROM ts.sensor_reading WHERE tag = 'BR101.Temp.PV' AND batch_id = '$batch' AND $__timeFilter(ts) GROUP BY 1 ORDER BY 1",
      "format": "time_series"
    }
  ]
}
The panel is nothing but a question: for the selected batch, bucket the temperature tag by minute and average it. $batch is a dashboard variable an operator picks from a dropdown; $__timeFilter(ts) is Grafana's macro that injects the visible time range. Point this at the same historian a week later and you get the same line, because the line is computed from the data, not pasted on top of it.
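To make "the line is computed from the data" concrete, here is the same computation in plain Python. It mirrors time_bucket('1 minute', ts) with avg(value) for one tag, assuming naive timestamps in a single timezone. It is not how TimescaleDB executes the query; it only shows what the result means.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable, List, Tuple


def minute_buckets(readings: Iterable[Tuple[datetime, float]]) -> List[Tuple[datetime, float]]:
    """Average (ts, value) pairs per whole minute, ordered by bucket start.

    Equivalent in meaning to:
      SELECT time_bucket('1 minute', ts), avg(value) ... GROUP BY 1 ORDER BY 1
    """
    acc = defaultdict(lambda: [0.0, 0])  # bucket start -> [running sum, count]
    for ts, value in readings:
        bucket = ts.replace(second=0, microsecond=0)  # floor to the minute
        acc[bucket][0] += value
        acc[bucket][1] += 1
    return [(start, total / count) for start, (total, count) in sorted(acc.items())]
```

Two samples at 07:00:10 and 07:00:40 collapse into one 07:00 point, and a 07:01:05 sample becomes the 07:01 point. Given the same rows, every run returns the same points, which is the property the panel relies on.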
Two audiences, one set of data
The same contextualized data serves two very different readers, and good practice is to give them two dashboards, not one crowded compromise.
The operator view is calm and decision-oriented: a handful of big stat panels (current titer, viable-cell density, DO, pH), the temperature trend against its setpoint, and an unmistakable red/green state for "in band / out of band." Every historian row carries its batch_id, so the operator's $batch dropdown filters everything to the batch on the floor right now.
The engineer view is dense and investigative. It shows every tag overlaid, the day-7 0.5 °C excursion that the simulator deliberately seeds, the bolus-feed events from the batch model drawn as annotations, and Protein A chromatogram phases pulled from events.operation_event. Both dashboards read the contextualized layer (the historian joined to the ISA-88/95 batch model), so an engineer can overlay "titer" and "feed event" and see cause next to effect.
Here is the kind of row our panels actually plot, straight from the historian hypertable (long format, one row per tag per timestamp):
ts               | tag            | value | unit | quality | batch_id
-----------------+----------------+-------+------+---------+---------------
2026-05-08 07:00 | BR101.Temp.PV  | 37.01 | degC |     192 | BATCH-2026-001
2026-05-08 07:00 | BR101.pH.PV    |  7.05 | pH   |     192 | BATCH-2026-001
2026-05-08 07:00 | BR101.DO.PV    |  48.2 | %sat |     192 | BATCH-2026-001
2026-05-08 07:00 | BR101.Titer.PV |  2.41 | g/L  |     192 | BATCH-2026-001
2026-05-08 07:01 | BR101.DO.PV    |  47.9 | %sat |      64 | BATCH-2026-001
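That quality column is not decoration. It records how trustworthy each sample was when it was captured at the edge. The historian DDL (examples/platform/db/20-historian.sql) pins the encoding in a comment: 192 = Good, 64 = Uncertain, 0 = Bad. These are the classic OPC quality-byte values rather than raw OPC UA status codes. A native OPC UA StatusCode is a 32-bit value in which Good is 0, so the status read from the server has to be mapped onto this byte before it is stored. Four of the rows above arrived Good (192); the last DO sample came back Uncertain (64). A panel can therefore grey out or flag any value that did not arrive Good, and the dashboard never quietly trends an unreliable point as if it were sound.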
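Storing OPC UA statuses as 192/64/0 needs a small mapping step, because the top two bits of a 32-bit OPC UA StatusCode carry its severity. The sketch below covers that mapping and the rule that only Good points are trended as sound. Both function names are hypothetical, and the code reads only the severity bits, not the sub-codes.

```python
# Quality values pinned in the historian DDL comment.
GOOD, UNCERTAIN, BAD = 192, 64, 0


def ua_status_to_quality(status_code: int) -> int:
    """Map a 32-bit OPC UA StatusCode onto the historian's quality value.

    Bits 31-30 hold the severity: 00 Good, 01 Uncertain, 10 Bad.
    11 is reserved, and this sketch treats it as Bad.
    """
    severity = (status_code >> 30) & 0b11
    if severity == 0b00:
        return GOOD
    if severity == 0b01:
        return UNCERTAIN
    return BAD


def trend_style(quality: int) -> str:
    """How a panel might draw a point: normally only when it arrived Good."""
    return "normal" if quality == GOOD else "flagged"
```

For example, 0x40000000 (Uncertain) maps to 64, and 0x80340000 (BadNodeIdUnknown) maps to 0.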
Grafana owns no data. Provisioned-as-code dashboards and alerts query the contextualized historian on demand, so every trend is a reproducible question rather than a stored picture.
Original diagram by the authors, created with AI assistance.
Alerting: the dashboard that pages you
A trend nobody is watching at 3 a.m. is not much use. Grafana's unified alerting evaluates rules across one or more data sources and routes notifications through contact points and notification policies, which makes it the open-source analogue to AVEVA PI Notifications [4]. Rules, like dashboards, are provisioned as code. The committed alerting/batch-alerts.yaml rule for the seeded day-7 temperature excursion looks like this:
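# examples/platform/dashboards/provisioning/alerting/batch-alerts.yaml
apiVersion: 1
groups:
  - orgId: 1
    name: bioreactor
    folder: BR101
    interval: 1m
    rules:
      - uid: br101-temp-out-of-band   # illustrative uid; file provisioning requires one
        title: BR101 temperature out of band
        condition: C
        data:
          - refId: A                  # query the historian
            relativeTimeRange: { from: 600, to: 0 }   # last 10 min feeds $__timeFilter
            datasourceUid: timescaledb
            model:
              refId: A
              format: time_series
              rawSql: >
                SELECT $__timeGroupAlias(ts,'1m'), avg(value) AS temp
                FROM ts.sensor_reading
                WHERE tag='BR101.Temp.PV' AND $__timeFilter(ts)
                GROUP BY 1 ORDER BY 1
          - refId: B                  # reduce the series to its latest 1-min average
            datasourceUid: __expr__
            model: { refId: B, type: reduce, expression: A, reducer: last }
          - refId: C                  # threshold: outside 36.5-37.5 degC
            datasourceUid: __expr__
            model:
              refId: C
              type: threshold
              expression: B
              conditions:
                - evaluator: { type: outside_range, params: [36.5, 37.5] }
        for: 5m                       # must persist 5 min to avoid flapping
        labels: { severity: warning }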
The for: 5m clause is the difference between a useful alert and an annoying one: a single noisy sample will not page anyone, but a genuine five-minute drift will. Note what the alert is not: it is not a stored judgment. It is a query plus a threshold, both in Git, both re-evaluable. If a deviation investigation later asks "what fired and why," the answer is the rule definition and the historian rows it ran against, not a recollection.
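The effect of for: 5m can be sketched as a small state machine over the one-minute evaluations. This is a simplified model. It assumes strict outside_range comparisons and ignores Grafana's NoData, Error, and recovery handling. It illustrates the pending period, not Grafana's actual scheduler.

```python
from typing import List, Sequence

LOW, HIGH = 36.5, 37.5  # the rule's outside_range band, degC
FOR_MINUTES = 5         # the rule's `for: 5m`


def out_of_band(value: float) -> bool:
    """Strict outside-range check (assumed to match the threshold evaluator)."""
    return value < LOW or value > HIGH


def alert_states(per_minute: Sequence[float], for_minutes: int = FOR_MINUTES) -> List[str]:
    """State after each 1-minute evaluation: Normal, Pending, or Alerting."""
    states: List[str] = []
    breached_since = None  # minute index of the first breaching evaluation
    for minute, value in enumerate(per_minute):
        if not out_of_band(value):
            breached_since = None  # any in-band evaluation resets the clock
            states.append("Normal")
        else:
            if breached_since is None:
                breached_since = minute
            held = minute - breached_since  # minutes the breach has persisted
            states.append("Alerting" if held >= for_minutes else "Pending")
    return states
```

A single 38.0 °C spike yields Normal, Pending, Normal and pages nobody. Six consecutive out-of-band minutes end in Alerting.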
Why it matters
The deepest idea in this chapter is one line: a trend is evidence only if it is reproducible from validated data; a screenshot is not a record. Part 11 expects systems to generate accurate and complete copies of records and to keep time-stamped audit trails (specifically 11.10(b) and 11.10(e)) [5]. EU Annex 11 requires that printouts and trends from a computerised system be clear and that the data underlying any report be traceable back to the system [6]. FDA's data-integrity Q&A and the MHRA's ALCOA+ guidance both make the same point from the integrity side: GMP decisions must rest on attributable, original-or-certified-copy, reconstructable records, not on transient displays [7][8].
Grafana fits this beautifully if you use it correctly. Every panel is a stored query against the historian rather than a baked-in image. Anyone with access can therefore re-run it and regenerate the identical trend for the same batch and time range on demand. That is what "reproducible from data" means in practice. The failure mode is paste-a-PNG-into-a-report culture, where the picture outlives any link to the rows that produced it. Dashboards-as-code reinforces the good path: the exact query that drew a release trend is a versioned artifact you can diff, review, and reproduce.
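One way to make "reproducible" checkable rather than rhetorical is to fingerprint what a trend was drawn from: the query text, its resolved parameters, and the rows returned. The sketch below is minimal. `trend_fingerprint` is a hypothetical helper, not a Grafana feature. It assumes the rows come back in a fixed order, which the panel's ORDER BY 1 provides.

```python
import hashlib
import json
from typing import Any, Mapping, Sequence


def trend_fingerprint(sql: str, params: Mapping[str, Any], rows: Sequence[Sequence[Any]]) -> str:
    """SHA-256 over the query, its parameters, and the returned rows.

    Re-running the stored query later and getting the same digest shows the
    trend was regenerated from the same data rather than copied from an image.
    """
    payload = json.dumps(
        {"sql": sql, "params": dict(params), "rows": [list(r) for r in rows]},
        sort_keys=True,         # parameter order must not change the digest
        separators=(",", ":"),  # neither may incidental whitespace
        default=str,            # datetimes, Decimals, etc. become strings
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Recording the digest alongside a release trend lets a reviewer regenerate the trend from the historian and compare digests instead of comparing pictures.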
In the real world
Operator and engineer dashboards over contextualized, real-time process data are not a nice-to-have on a modern mAb line; they are how the process is run and controlled, increasingly alongside soft sensors and multivariate analytics [1]. The commercial reference point most plants know is AVEVA PI Vision sitting on a PI System historian. Grafana is the credible open-source analogue. At pilot scale, an open dashboard layer over an open historian is an entirely reasonable build. NIIMBL's SABRE facility is one such pilot-scale site: a current good manufacturing practice (cGMP) facility that NIIMBL and the University of Delaware broke ground on in April 2024.
Now the honest part. Two limits separate "runs great on a laptop" from "carries a GMP release decision."
The AGPLv3 clause. In 2021 Grafana Labs relicensed the core of Grafana (and Loki and Tempo) from Apache 2.0 to AGPLv3 [9]. AGPLv3 adds Section 13, the remote network interaction clause: if you modify Grafana's source and then offer that modified version to users over a network, you must make your corresponding modified source available to those remote users [10]. For the overwhelming majority of plants this changes nothing: running the unmodified official image internally for your own operators imposes no source-disclosure obligation. The trap is narrower. It lies in forking and patching Grafana itself and then exposing that fork as a service, or in bundling and redistributing it. The book ships the unmodified grafana-oss image precisely to stay on the safe side. It also flags AGPLv3 in the license inventory, so the obligation is a documented decision, not a surprise during an audit.
The validation last mile. Grafana is not 21 CFR Part 11 / Annex 11 compliant out of the box. No OSS tool is, and that is true of the whole stack. Compliance is a property of a validated system plus procedures, not a download. Grafana OSS gives you dashboards-as-code, provisioned identities, and queries that are reproducible from data, which is genuinely the ~80% that pure open source can deliver here. The GxP last mile is where the hybrid reality bites. Grafana OSS has no native Part 11 e-signatures. Its richer access controls (fine-grained RBAC, SAML single sign-on, reporting, data-source permissions) live in Grafana Enterprise / Cloud, not the OSS build. A regulated deployment closes the gap with the validated system around Grafana:
- the identity provider;
- the audit trail in the database;
- change control over the provisioning repository;
- an SOP that says release trends are regenerated from the historian, never trusted as saved images.

The dashboard is open source; the trust comes from the system you build around it.
Key terms
- Dashboard-as-code: defining Grafana data sources, dashboards, and alerts in version-controlled YAML/JSON files loaded at startup, rather than clicking them together in the UI.
- Provisioning: the Grafana mechanism that reads those files at boot from a mounted directory; here the directory is mounted read-only so the UI cannot overwrite the record of truth.
- Data source: a configured connection (here uid: timescaledb) that Grafana queries on demand; Grafana stores none of the data itself.
- Panel / target: a single visualization and the SQL query that feeds it; the reason a trend is a reproducible question, not a stored picture.
- Contact point / notification policy: where and how Grafana sends an alert when a rule fires.
- AGPLv3 Section 13: the remote network interaction clause requiring source disclosure when you offer a modified Grafana to users over a network.
- PI Vision: AVEVA's commercial process-visualization product; Grafana is the open-source analogue this chapter builds with.
- Reproducible-from-data: the regulated rule that a trend used as evidence must be regenerable from validated records, which a screenshot is not.
Where this leads
We can now see our data and trust the picture. But Grafana, the historian, and the batch model still speak slightly different dialects of "the same thing." The next chapter, Semantics & the Digital Thread: Ontologies and a Knowledge Graph, gives the whole plant one shared meaning. It models equipment, material, recipe, and result as an ontology and weaves them into an RDF/SPARQL knowledge graph, so a single critical quality attribute can be traced across upstream, downstream, and QC in one query.