Visualization & Trending with Grafana

📍 Where we are: Part III, the layer where the contextualized batch data we have been storing finally becomes a picture an operator watches and an engineer trusts.

The simple version

Think of your historian and batch database as a huge, perfectly organized library of numbers. Grafana is the reading room. It does not own a single book; it walks to the right shelf, pulls the rows you asked for, and lays them out as a trend you can read at a glance. The important part is the one most people forget: every chart is just a saved question. Change the data and the chart changes. That is exactly why a Grafana trend can be evidence and a screenshot of it cannot — the screenshot is a photograph of the reading room, not the book.

What this chapter covers

We already have data worth looking at: a 14-day fed-batch CHO trace in the TimescaleDB historian, an ISA-88/95 batch model in PostgreSQL, and contextualization views that join the two. In plain terms: the historian is the time-series database holding every sensor reading; the batch model is the table of what each batch was doing; the contextualization views join the raw numbers to that batch context. (CHO is the Chinese-hamster-ovary cell line that makes the antibody — see Books 1-2; ISA-88/95 are the standards that model a batch and a plant.) This chapter stands up the open-source dashboard layer on top of all of it.

We will (1) bring Grafana up from the one pinned line in the shared compose.yaml; (2) provision its data source and dashboards as code instead of clicking them together; (3) build an operator view and a denser engineer view over the same contextualized data; (4) wire one alert rule; and (5) confront the two things that make Grafana a grown-up choice for a regulated plant — its AGPLv3 license clause, and the hard rule that a trend is only evidence if it is reproducible from validated data. Grafana is, plainly, a tool that draws charts from a database — the open-source analogue to a commercial process-visualization product like AVEVA PI Vision, and operator/engineer dashboards over contextualized, real-time data are central to actually running a monoclonal-antibody (mAb) line [1]. ("Regulated plant" and "GxP" throughout this chapter mean a pharma facility bound by the Good-x-Practice rules — GMP, GLP, and the like — that govern how medicines are made and recorded.)

Grafana is already in the stack

You do not install Grafana in this book. It was declared once, in the shared platform stack, and it comes up with the core profile alongside Postgres and the MQTT message broker (Mosquitto — the message bus the capture layer publishes sensor readings to); the CHO simulator is a separate Python package run via make data, not a core service. Here is the real service, from examples/platform/compose/compose.yaml:

# examples/platform/compose/compose.yaml
  grafana:
    image: grafana/grafana-oss:11.4.0
    profiles: ["core"]
    <<: *restart
    ports: ["3000:3000"]
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD:-admin}
      GF_USERS_ALLOW_SIGN_UP: "false"
    volumes:
      - ../dashboards/provisioning:/etc/grafana/provisioning:ro
      - grafana:/var/lib/grafana
    depends_on:
      postgres:
        condition: service_healthy

Read that block slowly, because almost every line is a deliberate, GxP-flavored choice:

grafana/grafana-oss:11.4.0 — pinned by tag, with the matching manifest digest recorded in examples/platform/versions.lock (regenerated by make lock), where the grafana-oss:11.4.0 line carries its sha256:. That digest is a content fingerprint of the exact image bytes: a different image computes a different fingerprint, so a pull that returned anything else would be rejected, and a docker compose pull can never silently swap the image under a validated dashboard. This is the open-source distribution, not Grafana Enterprise or Grafana Cloud, and the pinned 11.4.0 is what the repo runs.
profiles: ["core"] — Grafana is always-on foundation. Per the compose header, the core profile covers the store/visualize chapters (Ch 1-2, 4-6, 16–18); the capture, semantics, and analytics chapters all read through it as well.
GF_USERS_ALLOW_SIGN_UP: "false" — anonymous self-registration is off. In a regulated context you want every action attributable to a named, provisioned identity, never a walk-up account.
../dashboards/provisioning:/etc/grafana/provisioning:ro — the entire provisioning tree is mounted read-only. Grafana reads its data sources and dashboards from version-controlled files at startup; it cannot write back to them.
depends_on: postgres … service_healthy — Grafana does not even start until the database that holds both the historian and the batch model passes its healthcheck. No dashboards pointing at a database that is not ready yet.

Bring it up and reach it on http://localhost:3000:

docker compose --profile core up -d

A note the book owes you about that tag. The spec narrative for the platform aims at a newer Grafana, but the tested compose file pins 11.4.0, so that is what the repo runs and what this chapter prints. The number matters less than the discipline: the point of the digest-level lock (the committed versions.lock, regenerated by make lock) is that whatever it records is exactly what CI pulls, what the license inventory records, and what the supplier register validates. The running stack and its paperwork should not be allowed to drift apart.
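The "running stack and paperwork must not drift" discipline can be sketched as a small CI check. This is only an illustration under a stated assumption: the layout of versions.lock is not shown in this chapter, so the one-entry-per-line `image:tag@sha256:<hex>` format, the function names, and the placeholder digests below are all hypothetical.

```python
import re

# ASSUMPTION: one "image:tag@sha256:<64 hex chars>" entry per line. The real
# versions.lock layout is not shown in this chapter and may differ.
LOCK_LINE = re.compile(r"^(?P<ref>\S+:\S+)@(?P<digest>sha256:[0-9a-f]{64})$")


def parse_lock(text):
    """Map each pinned 'image:tag' reference to its recorded digest."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        match = LOCK_LINE.match(line)
        if match is None:
            raise ValueError(f"unparseable lock line: {line!r}")
        pins[match["ref"]] = match["digest"]
    return pins


def digest_matches(pins, ref, observed_digest):
    """True only when the pulled image's digest equals the locked one."""
    return pins.get(ref) == observed_digest


# Placeholder digests for illustration only, not the real Grafana image digest.
LOCKED = "sha256:" + "a" * 64
OTHER = "sha256:" + "b" * 64
pins = parse_lock(f"# hypothetical lock file\ngrafana/grafana-oss:11.4.0@{LOCKED}\n")

print(digest_matches(pins, "grafana/grafana-oss:11.4.0", LOCKED))
print(digest_matches(pins, "grafana/grafana-oss:11.4.0", OTHER))
```

A check of this shape would let a pipeline fail as soon as the digest it pulled differs from the one the license inventory and supplier register were written against.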

Dashboards as code, not clicks

The fastest way to lose a regulated dashboard is to build it beautifully by hand in the browser and store it nowhere. Grafana's answer is provisioning: data sources and dashboards are defined in version-controlled YAML/JSON files loaded at startup rather than clicked together in the UI [2].

The provisioning tree, file by file

That mounted provisioning/ directory — committed in the repo and mounted read-only by the compose service above — has a fixed shape:

examples/platform/dashboards/provisioning/
├── datasources/
│   └── timescaledb.yaml          # how Grafana reaches Postgres/TimescaleDB
├── dashboards/
│   ├── dashboards.yaml           # tells Grafana where to find dashboard JSON
│   └── json/
│       └── br101-batch-overlay.json   # the engineer/operator dashboard
└── alerting/
    └── batch-alerts.yaml          # alert rules + contact points as code

The data source is one file, datasources/timescaledb.yaml (the historian and the batch model are the same PostgreSQL instance, so one data source serves both):

# examples/platform/dashboards/provisioning/datasources/timescaledb.yaml
apiVersion: 1
datasources:
  - name: TimescaleDB
    uid: timescaledb            # stable uid so dashboard JSON references survive
    type: postgres
    url: postgres:5432          # the compose service hostname, not localhost
    user: ${POSTGRES_USER}
    jsonData:
      database: ${POSTGRES_DB}
      sslmode: disable          # fine inside the compose network; TLS is Ch 28
      timescaledb: true         # enables Grafana's TimescaleDB-aware $__timeGroupAlias / time_bucket handling
    secureJsonData:
      password: ${POSTGRES_PASSWORD}
    editable: false             # provisioned, read-only: changes go through Git

Two details earn their keep. uid: timescaledb is a stable identifier: dashboard JSON references this uid, not an auto-generated one, so a dashboard exported on a laptop loads unchanged on a server. And editable: false means an operator cannot quietly repoint the data source at a different database — the change has to go through the file, which goes through Git review.
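Because dashboards reference the data source only by uid, a pre-merge lint can confirm that every panel points at a provisioned uid. The sketch below is one possible check, not part of the repo: the function name and the second panel (with its made-up uid) are hypothetical, and the dashboard is trimmed to the fields the check needs.

```python
import json

# uid taken from datasources/timescaledb.yaml
PROVISIONED_UIDS = {"timescaledb"}


def unresolved_uids(dashboard, provisioned):
    """Return (panel title, uid) pairs whose data source uid is not provisioned."""
    problems = []
    for panel in dashboard.get("panels", []):
        source = panel.get("datasource")
        uid = source.get("uid") if isinstance(source, dict) else None
        if uid not in provisioned:
            problems.append((panel.get("title", "<untitled>"), uid))
    return problems


# Trimmed-down dashboard: the first panel mirrors the chapter's temperature
# panel; the second is a HYPOTHETICAL panel saved against an auto-generated uid.
dashboard = json.loads("""
{
  "panels": [
    {"title": "BR101 Temperature (PV vs setpoint)",
     "datasource": {"type": "postgres", "uid": "timescaledb"}},
    {"title": "Hypothetical hand-built panel",
     "datasource": {"type": "postgres", "uid": "P1A2B3C4D5"}}
  ]
}
""")

for title, uid in unresolved_uids(dashboard, PROVISIONED_UIDS):
    print(f"unresolved data source uid {uid!r} in panel {title!r}")
```

Run against the committed JSON in CI, a lint like this would catch a dashboard exported from a hand-configured laptop before it reaches a server where that uid does not exist.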

A dashboard provider file simply points Grafana at a folder of dashboard JSON:

# examples/platform/dashboards/provisioning/dashboards/dashboards.yaml
apiVersion: 1
providers:
  - name: bioproc-dashboards
    type: file
    allowUiUpdates: false       # the UI cannot overwrite the file-of-record
    options:
      path: /etc/grafana/provisioning/dashboards/json
      foldersFromFilesStructure: true

allowUiUpdates: false is the regulated posture: someone can explore and tweak a panel in the browser, but the file on disk — the one in version control — stays the record of truth. For real plants, Grafana's recommended path goes further: manage these files through CI/CD and Git Sync so every dashboard change is reviewed, versioned, and reproducibly deployed, exactly like application code [3].

Anatomy of a Grafana panel: a saved question, field by field

A dashboard is a JSON document, and a panel inside it is a saved SQL query against our historian. The committed dashboards/json/br101-batch-overlay.json carries a uid, a title, tags, a templating list of dashboard variables, and a panels array. Here is the heart of that file — the panel that draws the bioreactor temperature trend (its measured process value, PV, versus the recipe setpoint), time-bucketed: the SQL averages each tag's readings into 1-minute groups so the chart renders fewer, smoother points instead of one dot per raw sample. Each tag follows the <asset>.<measurement>.<role> convention from Upstream Bioreactor — here BR101 is the bioreactor unit, Temp the measurement, and .PV the role (process value, the measured reading, as opposed to .SP, the setpoint):

{
  "id": 1,
  "title": "BR101 Temperature (PV vs setpoint)",
  "type": "timeseries",
  "datasource": { "type": "postgres", "uid": "timescaledb" },
  "fieldConfig": { "defaults": { "unit": "celsius" } },
  "gridPos": { "h": 9, "w": 24, "x": 0, "y": 0 },
  "targets": [
    {
      "refId": "A",
      "rawSql": "SELECT time_bucket('1 minute', ts) AS time, avg(value) AS \"Temp PV\" FROM ts.sensor_reading WHERE tag = 'BR101.Temp.PV' AND batch_id = '$batch' AND $__timeFilter(ts) GROUP BY 1 ORDER BY 1",
      "format": "time_series"
    }
  ]
}

Walk the panel field by field and the abstract claim — a panel is a saved question — becomes concrete. The type is timeseries; the datasource is named only by its stable uid (timescaledb), so the panel travels unchanged between machines; fieldConfig.defaults.unit is celsius, so the y-axis carries a real engineering unit rather than a bare number. The templating list holds the $batch variable — itself a query (SELECT DISTINCT batch_id FROM ts.sensor_reading …) — which is what populates the operator's dropdown. And the load-bearing part is targets[0]: a refId, a format of time_series, and a rawSql string. That string is the whole point.
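What the rawSql does with time_bucket('1 minute', ts) and avg(value) can be mimicked in a few lines of plain Python. This is a sketch of the idea only; the database does the real work, and the sample readings and the function name are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def bucket_average(readings):
    """Average (timestamp, value) pairs into 1-minute buckets, ordered by time."""
    buckets = defaultdict(list)
    for ts, value in readings:
        # Truncating to the minute plays the role of time_bucket('1 minute', ts).
        buckets[ts.replace(second=0, microsecond=0)].append(value)
    # sorted(...) mirrors ORDER BY 1; fmean mirrors avg(value) per GROUP BY 1.
    return [(minute, fmean(values)) for minute, values in sorted(buckets.items())]


# Invented raw samples for BR101.Temp.PV, several per minute.
readings = [
    (datetime(2026, 5, 8, 7, 0, 10), 37.0),
    (datetime(2026, 5, 8, 7, 0, 40), 37.2),
    (datetime(2026, 5, 8, 7, 1, 5), 36.9),
]

for minute, average in bucket_average(readings):
    print(minute.isoformat(), round(average, 2))
```

Three raw samples collapse to two plotted points, one per minute, which is why the rendered trend is smoother than the raw stream.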

Identity card dissecting one panel of br101-batch-overlay.json field by field: the dashboard uid, the panel title, tags, the timescaledb datasource, the celsius unit, the templating $batch query variable, and the highlighted target holding the rawSql with time_bucket, the $batch and timeFilter macros, refId A, and format time_series. Every field of a saved panel is metadata about a question; the only payload is the SQL in the target, which is re-run against live data each time the panel renders. Original diagram by the authors, created with AI assistance.

The panel is nothing but a question: for the selected batch, bucket the temperature tag by minute and average it. $batch is the dashboard variable an operator picks from a dropdown; $__timeFilter(ts) is Grafana's macro that injects the visible time range. (For a GxP posture, prefer Grafana's value-escaping ${batch:sqlstring} over the raw '$batch' interpolation, so the variable can never reshape the SQL — the blast radius is small here because $batch comes from a constrained SELECT DISTINCT, but escaping it is the disciplined default.) Point this at the same historian a week later and you get the same line, because the line is computed from the data, not pasted on top of it. Nothing in the JSON is a pixel of a trend — the answer is never stored, only the question is.
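To see why the escaped form is the safer default, here is a small sketch of the transformation that Grafana's documentation describes for the sqlstring format on a single value: double any embedded single quote, then wrap the value in single quotes. The helper below imitates that behaviour and is not Grafana's own code; the hostile value is invented. Note that because the format adds the quotes itself, the query would use batch_id = ${batch:sqlstring} without quotes of its own.

```python
def sqlstring(value):
    """Imitate the documented :sqlstring effect for one value:
    double embedded single quotes, then wrap the result in single quotes."""
    return "'" + value.replace("'", "''") + "'"


# Raw interpolation: the template supplies the quotes, the value is pasted in.
RAW = "SELECT avg(value) FROM ts.sensor_reading WHERE batch_id = '{batch}'"
# Escaped interpolation: the formatted value carries its own quotes.
ESCAPED = "SELECT avg(value) FROM ts.sensor_reading WHERE batch_id = {batch}"

hostile = "x' OR '1'='1"  # invented value that tries to reshape the WHERE clause

raw_sql = RAW.format(batch=hostile)
escaped_sql = ESCAPED.format(batch=sqlstring(hostile))

print(raw_sql)      # the OR clause becomes part of the SQL
print(escaped_sql)  # the whole value stays inside one string literal
```

With raw interpolation the value escapes its quotes and adds a condition; with the escaped form it remains a single, odd-looking literal that matches no batch.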

Flow diagram: three version-controlled Git files (datasources, dashboards, alerting) are mounted read-only into Grafana 11.4.0, which queries a PostgreSQL plus TimescaleDB historian over uid timescaledb and gets rows back, then renders an operator view, an engineer view, and an alert rule routed to a contact point.

Two audiences, one set of data

The same contextualized data serves two very different readers, and good practice is to give them two dashboards, not one crowded compromise.

The operator view vs the engineer view

The operator view is calm and decision-oriented. It holds a handful of big stat panels over genuinely online tags, the ones measured continuously by an in-line probe: current titer (the concentration of antibody product, in g/L), DO (dissolved oxygen, the oxygen available to the cells), pH, and online glucose. The .PV suffix on every tag marks the process value, the live measured reading, as opposed to .SP, the setpoint the recipe aimed for. Alongside the stats sit the temperature trend and an unmistakable red/green state for "in band / out of band." Viable-cell density is deliberately absent: it is an offline bench result, measured intermittently in the QC lab and joined twice-daily from lab.result, not an online historian tag, so it belongs on the engineer view rather than on a live operator stat. (The online/in-line vs offline distinction is built in Seed Train & Cell-Culture Offline Analytics, and the bioreactor tags themselves in Upstream Bioreactor.) Every row the panels read already carries its batch_id, so the operator's dropdown (the $batch variable from the panel anatomy above) filters everything to the batch on the floor right now. An operator should be able to glance and leave; the dashboard's job is to make "is this batch in control?" answerable in a second.

The engineer view is dense and investigative: every tag overlaid, the day-7 0.5 °C excursion that the simulator deliberately seeds, the bolus-feed events from the batch model drawn as annotations, and Protein A chromatogram phases (built in Downstream Chromatography) pulled from events.operation_event — for instance, overlaying the capture elution phase (a sharp UV280 peak as pH drops to ~3.3) against the upstream titer trend. Because both dashboards read the contextualized layer — the historian joined to the ISA-88/95 batch model — an engineer can overlay "titer" and "feed event" and see cause next to effect. The committed br101-batch-overlay.json carries the engineer tag for exactly this reader.

Two dashboards beat one because the two readers ask different questions at different tempos, and a single crowded board serves neither: the operator drowns in overlaid traces, and the engineer loses the cross-tag context to a layout tuned for at-a-glance status. The data underneath is identical — the same contextualized historian — so splitting the views costs nothing but a second JSON file under the same provisioned provider.

The quality column: greying out a sample that did not arrive Good

Here is the kind of row our panels actually plot, straight from the historian hypertable (long format, one row per tag per timestamp):

        ts         |       tag       | value | unit | quality |   batch_id
-------------------+-----------------+-------+------+---------+---------------
 2026-05-08 07:00  | BR101.Temp.PV   | 37.01 | degC |     192 | BATCH-2026-001
 2026-05-08 07:00  | BR101.pH.PV     |  7.05 | pH   |     192 | BATCH-2026-001
 2026-05-08 07:00  | BR101.DO.PV     | 39.4  | %sat |     192 | BATCH-2026-001
 2026-05-08 07:00  | BR101.Titer.PV  |  2.41 | g/L  |     192 | BATCH-2026-001
 2026-05-08 07:01  | BR101.DO.PV     | 38.7  | %sat |      64 | BATCH-2026-001

That quality column is not decoration. It carries the quality code captured at the edge, and the historian DDL, the Data Definition Language file that creates the table (examples/platform/db/20-historian.sql), pins the encoding in a comment: 192 = Good, 64 = Uncertain, 0 = Bad. As built in Connectivity: OPC UA & MQTT, those 192/64/0 values are the legacy OPC DA (Classic) codes a Sparkplug bridge often passes through, not OPC UA-native quality, which is an entirely separate encoding scheme (there Good is a StatusCode of 0, with badness in the high bits). The same number 0 therefore means opposite things in the two schemes: Bad in the OPC DA codes this column uses, Good in OPC UA. That is exactly why the historian fixes one convention, and only the OPC DA meaning (0 = Bad) applies here.

Four of the rows above arrived Good (192); the last DO sample came back Uncertain (64), so a panel can grey out or flag any value that did not arrive Good, and the dashboard never quietly trends an unreliable point as if it were sound. In the seeded dataset these Uncertain DO readings are not scattered at random: the simulator sets the flag only during the day-7 cooling excursion window, when the probe readings become untrustworthy during the upset. That gives a teachable link between a quality flag and a real process event.
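The grey-out decision reduces to a small classification over the quality code. The sketch below assumes the OPC DA layout in which the top two bits of the quality byte carry the major status (0xC0 Good, 0x40 Uncertain, 0x00 Bad); the function names and the "greyed"/"solid" style labels are hypothetical, and anything that is not clearly Good or Uncertain is conservatively treated as Bad.

```python
def classify_quality(code):
    """Map an OPC DA style quality code to Good / Uncertain / Bad."""
    major = code & 0xC0  # keep only the top two bits of the quality byte
    if major == 0xC0:
        return "Good"       # 192 in the historian's convention
    if major == 0x40:
        return "Uncertain"  # 64
    return "Bad"            # 0, and (conservatively) any other pattern


def plot_style(code):
    """Hypothetical rendering rule: only Good samples are drawn as sound."""
    return "solid" if classify_quality(code) == "Good" else "greyed"


# (tag, value, quality) taken from the sample rows above.
rows = [
    ("BR101.Temp.PV", 37.01, 192),
    ("BR101.pH.PV", 7.05, 192),
    ("BR101.DO.PV", 39.4, 192),
    ("BR101.Titer.PV", 2.41, 192),
    ("BR101.DO.PV", 38.7, 64),
]

for tag, value, quality in rows:
    print(tag, value, classify_quality(quality), plot_style(quality))
```

In a real panel the same rule would more likely live in the SQL or in a Grafana value mapping; the point is only that the flag is data the dashboard can act on.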

A layered diagram showing version-controlled YAML and JSON provisioning files mounted read-only into Grafana, Grafana issuing SQL against a combined PostgreSQL and TimescaleDB historian, and the same contextualized data rendered into a calm operator dashboard, a dense engineer dashboard, and an alert routed to a contact point. Grafana owns no data. Provisioned-as-code dashboards and alerts query the contextualized historian on demand, so every trend is a reproducible question rather than a stored picture. Original diagram by the authors, created with AI assistance.

Alerting: the dashboard that pages you

A trend nobody is watching at 3 a.m. is not much use. Grafana's unified alerting evaluates rules across one or more data sources and routes notifications through contact points and notification policies — the open-source analogue to PI Vision/Notifications [4]. Rules, like dashboards, are provisioned as code.

Anatomy of a provisioned alert rule: a query plus a threshold

The committed alerting/batch-alerts.yaml rule for the seeded day-7 temperature excursion looks like this:

# examples/platform/dashboards/provisioning/alerting/batch-alerts.yaml
apiVersion: 1
groups:
  - orgId: 1
    name: bioreactor
    folder: BR101
    interval: 1m
    rules:
      - title: BR101 temperature out of band
        condition: C
        data:
          - refId: A                 # query the historian
            datasourceUid: timescaledb
            model:
              rawSql: >
                SELECT $__timeGroupAlias(ts,'1m'), avg(value) AS temp
                FROM ts.sensor_reading
                WHERE tag='BR101.Temp.PV' AND $__timeFilter(ts)
                GROUP BY 1 ORDER BY 1
          - refId: C                 # threshold: outside 36.5-37.5 degC
            type: threshold
            model:
              conditions:
                - evaluator: { type: outside_range, params: [36.5, 37.5] }
        for: 5m                      # must persist 5 min to avoid flapping
        labels: { severity: warning }

The rule reads as two stages, and the field-by-field dissection makes the mechanism unmistakable. The group fixes the where and how often: orgId: 1, name: bioreactor, folder: BR101, interval: 1m. The rule itself sets condition: C, which names the data node that decides fire-or-not. Then data carries the two stages: refId: A is the historian query (a rawSql over ts.sensor_reading, time-grouped to 1-minute buckets), and refId: C is a threshold whose evaluator is outside_range with params: [36.5, 37.5]. A feeds C; C decides. The for: 5m clause is the anti-flap valve — it stops the rule from "flapping," firing and clearing repeatedly on momentary noise — and labels: { severity: warning } is the key the notification policy routes on.

Identity card dissecting one rule of batch-alerts.yaml field by field: apiVersion, orgId, group name bioreactor, folder BR101, interval 1m, condition C, then a two-stage data model where refId A queries the historian and feeds refId C, a threshold evaluator of outside_range with params 36.5 to 37.5, plus the for 5m anti-flap clause and the severity warning label. An alert rule is a historian query (refId A) wired into a threshold test (refId C); both halves live in Git, so a deviation investigation can re-evaluate exactly what fired and why. Original diagram by the authors, created with AI assistance.

The for: 5m clause is the difference between a useful alert and an annoying one: a single noisy sample will not page anyone, but a genuine five-minute drift will. One honest caveat about the demo: the seeded excursion is engineered to sit right at the lower bound (36.5 °C), so this rule is deliberately a near-boundary case that can flap; in a real plant the alarm band sits comfortably inside the control band (e.g. a low-only lt 36.8) so a half-degree dip clears it decisively. Note what the alert is not — it is not a stored judgment. It is a query plus a threshold, both in Git, both re-evaluable. If a deviation investigation later asks "what fired and why," the answer is the rule definition and the historian rows it ran against, not a recollection.
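The interplay of outside_range and for: 5m can be walked through with a toy evaluator. This is a simplified sketch, not Grafana's scheduler: it assumes one evaluation per minute, treats the range bounds as exclusive (so a value exactly on 36.5 is not "outside"), and uses invented state names and temperature values.

```python
def outside_range(value, low, high):
    """True when the value lies strictly below low or strictly above high."""
    return value < low or value > high


def evaluate(series, low=36.5, high=37.5, hold_minutes=5):
    """series: (minute, averaged temperature) pairs, one per evaluation.
    Returns (minute, state) pairs with states Normal, Pending, or Firing."""
    states = []
    pending_since = None
    for minute, value in series:
        if not outside_range(value, low, high):
            pending_since = None  # back in band: the hold timer resets
            states.append((minute, "Normal"))
            continue
        if pending_since is None:
            pending_since = minute  # first breach starts the hold timer
        held = minute - pending_since
        states.append((minute, "Firing" if held >= hold_minutes else "Pending"))
    return states


spike = [(0, 37.0), (1, 36.3), (2, 37.0)]   # one noisy sample
drift = [(m, 36.3) for m in range(7)]       # sustained excursion
boundary = [(m, 36.5) for m in range(7)]    # sitting exactly on the lower bound

print(evaluate(spike))
print(evaluate(drift))
print(evaluate(boundary))
```

The single spike only ever reaches Pending, the sustained drift fires once the breach has held for five minutes, and the series parked on 36.5 never leaves Normal, which is the near-boundary behaviour the caveat above warns about.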

Why it matters

The deepest idea in this chapter is one line: a trend is evidence only if it is reproducible from validated data; a screenshot is not a record. Three regulators say the same thing in their own words: the US FDA (21 CFR Part 11 plus its data-integrity Q&A), the EU (Annex 11), and the UK MHRA (its GxP data-integrity guidance, a leading articulation of the ALCOA+ principles). ALCOA+ asks that records be Attributable, Legible, Contemporaneous, Original and Accurate (the base ALCOA), plus Complete, Consistent, Enduring and Available (the "+"); the full nine are unpacked in The ALCOA+ Integrity Code. Part 11 expects systems to generate accurate and complete copies of records (11.10(b)) and to keep time-stamped audit trails (11.10(e)) [5]. EU Annex 11 requires that printouts and trends from a computerised system be clear and that the data underlying any report be traceable back to the system [6]. FDA's data-integrity Q&A and the MHRA's ALCOA+ guidance both make the same point from the integrity side: GMP decisions must rest on attributable, original-or-certified-copy, reconstructable records, not on transient displays [7][8].

Grafana fits this beautifully if you use it correctly. Because every panel is a stored query against the historian — not a baked-in image — anyone with access can re-run it and regenerate the identical trend on demand. That is what "reproducible from data" means in practice. The failure mode is paste-a-PNG-into-a-report culture, where the picture outlives any link to the rows that produced it. Dashboards-as-code reinforces the good path: the exact query that drew a release trend is a versioned artifact you can diff, review, and reproduce.

Why a screenshot fails an inspection

This is not a stylistic preference; it is where real data-integrity findings land. The MHRA's GXP data-integrity guidance is explicit that GMP decisions must rest on original records (or true, verifiable copies), and that a static printout or display of dynamic electronic data is not by itself an acceptable record where the underlying data and metadata can no longer be reconstructed [8]. The FDA's data-integrity Q&A makes the same point from the CGMP side: electronic records and their audit trails must be retained and reviewable, and a transient display or a pasted image is not a substitute for the attributable, original, reconstructable record behind it [7]. Translate that into the Grafana world and the rule is concrete: a PNG of a release trend dropped into a report is a non-reproducible printout — the rows that produced it, their quality flags, and the exact query are all severed from the picture. The remedy the architecture already gives you is the panel anatomy above: the trend is regenerated from ts.sensor_reading by a versioned query, every time, so the record is the data and the question, never the image.

The alert threshold is the crudest member of a family

The outside_range test in refId C is a fixed limit: 36.5–37.5 °C, the same band for every batch, every campaign, forever. That is exactly right for a regulated control limit — a CPP (critical process parameter — a process input whose setting bears on a quality outcome) band is a validated, fixed number, not a thing a model gets to renegotiate. But the same panel that draws the temperature trend is the natural carrier for the statistical limits the analytics chapter builds, and it is worth being precise about how they differ from this one. An SPC (statistical process control) chart replaces the hand-set band with control limits derived from the data — center ± 3·sigma, where the spread is estimated the moving-range way so a slow drift cannot widen the very limits meant to catch it; an MSPC (multivariate SPC) monitor replaces the single-tag test with a Hotelling's T² / SPE pair over many correlated tags at once. Both are still, structurally, a Grafana panel: a rawSql query (refId A) wired into a threshold (refId C). The whole machinery is built in Process Analytics: SPC, MVDA & Soft Sensors; the point here is only that a dashboard's threshold is the simplest rung of that ladder, not a different thing.
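The "moving-range way" of estimating spread is short enough to show directly. The sketch below computes individuals-chart limits as center ± 3·sigma with sigma estimated as the mean moving range divided by 1.128 (the standard d2 constant for ranges of two consecutive points). The data are invented, and the full treatment belongs to the analytics chapter.

```python
from statistics import fmean, stdev

D2 = 1.128  # standard d2 constant for moving ranges of two consecutive points


def mr_sigma(values):
    """Estimate short-term sigma from the average moving range."""
    if len(values) < 2:
        raise ValueError("need at least two points")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return fmean(moving_ranges) / D2


def imr_limits(values):
    """Return (lower limit, center, upper limit) for an individuals chart."""
    center = fmean(values)
    spread = 3 * mr_sigma(values)
    return center - spread, center, center + spread


# Invented series: a slow upward drift of 0.05 degC per sample.
drift = [37.0 + 0.05 * i for i in range(11)]

lcl, center, ucl = imr_limits(drift)
print("moving-range sigma:", round(mr_sigma(drift), 4))
print("overall sample stdev:", round(stdev(drift), 4))
print("limits:", round(lcl, 3), round(center, 3), round(ucl, 3))
```

On a drifting series the moving-range estimate stays small because it only sees point-to-point steps, while the ordinary standard deviation swells with the drift. That is the sense in which a slow drift cannot widen the limits meant to catch it.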

This matters because the same I-MR (individuals and moving-range) control chart that watches a process is exactly the instrument that watches a model. A soft sensor's prediction residual — predicted titer minus the slow offline assay — is itself a stream to be controlled, and a residual drifting off-center is the model going stale, the lagging "concept-drift" detector of MLOps and Lifecycle. Two cautions travel with that idea, and both land on this dashboard. First, a soft-sensor trend is only trustworthy inside its applicability domain — the region of inputs the model was calibrated on; a Raman calibration is bound to its exact probe, so the day-7 cooling excursion (when the quality flag flips to Uncertain) is precisely when a model trained on in-band data is extrapolating and least reliable, which is why a derived trend should inherit the quality flag of the raw rows beneath it rather than draw a confident line over them. Second, the leading, label-free companion detector — a Population Stability Index on the input distribution — fires the moment a new lot or a fouling probe moves the inputs, before the offline assay can confirm any error; a dashboard that overlays PSI next to the residual chart reads process drift and model drift on one screen. The deeper modeling, and the validation paradox of holding a learning model under GMP change control, is the ML & AI book's subject; the dashboard layer is where its outputs become something an engineer actually watches.
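A Population Stability Index is simple to compute once the inputs are binned: for each bin, take the difference between the current and baseline shares and weight it by the log of their ratio, then sum. The sketch below is a minimal version under stated assumptions: the bin edges and DO-like sample values are invented, and empty bins are floored at a small epsilon (a common practical choice, not something the source specifies) to avoid taking the log of zero.

```python
import math
from bisect import bisect_right


def bin_shares(sample, edges, eps):
    """Fraction of the sample in each bin defined by sorted interior edges."""
    counts = [0] * (len(edges) + 1)
    for x in sample:
        counts[bisect_right(edges, x)] += 1  # index = number of edges <= x
    return [max(c / len(sample), eps) for c in counts]


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between a baseline and a current sample."""
    e = bin_shares(expected, edges, eps)
    a = bin_shares(actual, edges, eps)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Invented input values: a calibration-era baseline and a shifted current window.
baseline = [38.0, 39.0, 40.0, 41.0, 42.0, 39.5, 40.5, 40.0]
shifted = [x + 2.0 for x in baseline]
edges = [39.0, 40.0, 41.0]

print("PSI, baseline vs itself:", psi(baseline, baseline, edges))
print("PSI, baseline vs shifted:", round(psi(baseline, shifted, edges), 3))
```

Because it needs only the inputs and no offline assay result, a value like this can be recomputed per window and trended on the same dashboard as the residual chart.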

In the real world

Operator and engineer dashboards over contextualized, real-time process data are not a nice-to-have on a modern mAb line — they are how the process is run and controlled, increasingly alongside soft sensors and multivariate analytics (built end-to-end in Process Analytics: SPC, MVDA & Soft Sensors, with the deeper modeling in the ML & AI book) [1]. The commercial reference point most plants know is AVEVA PI Vision sitting on a PI System historian. For the heavier investigative analytics — golden-batch overlays, multi-batch comparison, root-cause trending — the tool many process engineers reach for is Seeq, with general-purpose business-intelligence suites like TIBCO Spotfire and Microsoft Power BI covering the reporting layer above it. Grafana is the credible open-source analogue for the operator and engineer dashboards and the alerting we build here; it does not try to match Seeq's batch-analytics depth, and on a real line you will often see Grafana running alongside one of those commercial layers rather than instead of it. At the pilot scale, an open dashboard layer over an open historian is an entirely reasonable build.

Now the honest part. Two limits separate "runs great on a laptop" from "carries a GMP release decision."

The AGPLv3 clause. In 2021 Grafana Labs relicensed the core of Grafana (and Loki and Tempo) from Apache 2.0 to AGPLv3 [9]. AGPLv3 adds Section 13, the remote network interaction clause: if you modify Grafana's source and then offer that modified version to users over a network, you must make your corresponding modified source available to those remote users [10]. For the overwhelming majority of plants this changes nothing: running the unmodified official image internally for your own operators imposes no source-disclosure obligation. The trap is narrower — forking and patching Grafana itself and then exposing that fork as a service, or bundling-and-redistributing it. The book ships the unmodified grafana-oss image precisely to stay on the safe side, and flags AGPLv3 in the license inventory so the obligation is a documented decision, not a surprise during an audit.

The validation last mile. Grafana is not 21 CFR Part 11 / Annex 11 compliant out of the box. No OSS tool is, and that is true of the whole stack: compliance is a property of a validated system plus procedures, not a download. Grafana OSS gives you dashboards-as-code, provisioned identities, and queries that are reproducible from data, which is genuinely the ~80% that pure open source can deliver here. The GxP last mile is where the hybrid reality bites: Grafana OSS has no native Part 11 e-signatures, and its richer access controls (fine-grained role-based access control (RBAC), SAML single sign-on (SSO), reporting, and data-source permissions) live in Grafana Enterprise / Cloud, not the OSS build. A regulated deployment closes the gap with the validated system around Grafana: the identity provider, the audit trail in the database, change control over the provisioning repository, and a standard operating procedure (SOP) that says release trends are regenerated from the historian, never trusted as saved images. The dashboard is open source; the trust comes from the system you build around it.

Key terms

Dashboard-as-code — defining Grafana data sources, dashboards, and alerts in version-controlled YAML/JSON files loaded at startup, rather than clicking them in the UI.
Provisioning — the Grafana mechanism that reads those files at boot from a mounted directory; here mounted read-only so the UI cannot overwrite the record of truth.
Data source — a configured connection (here uid: timescaledb) Grafana queries on demand; Grafana stores none of the data itself.
Panel / target — a single visualization and the SQL query that feeds it; the reason a trend is a reproducible question, not a stored picture.
Templating variable ($batch) — a dashboard variable defined by its own query, surfaced as a dropdown and substituted into every panel's SQL so one dashboard serves any batch.
Threshold expression (refId C) — the second stage of an alert rule: a threshold node whose evaluator (here outside_range, params [36.5, 37.5]) tests the rows returned by the query node (refId A) and decides fire or no-fire.
Contact point / notification policy — where and how Grafana sends an alert when a rule fires.
AGPLv3 Section 13 — the network/remote-interaction clause requiring source disclosure when you offer a modified Grafana over a network.
PI Vision — AVEVA's commercial process-visualization product; Grafana is the open-source analogue this chapter builds.
Reproducible-from-data — the regulated rule that a trend used as evidence must be regenerable from validated records, which a screenshot is not.
Applicability domain — the region of inputs a model was calibrated on; a soft-sensor trend is trustworthy only inside it, so an excursion that flips the quality flag is also where a derived line is least reliable.
Drift detector (PSI / residual chart) — the leading, label-free input-distribution monitor (Population Stability Index) and the lagging residual control chart that, read together, separate model drift from process drift on the same dashboard.
SHACL gate — the closed-world graph constraint (sh:minInclusive/sh:maxInclusive) that is the semantic twin of an outside_range alert: it guards a record's completeness and conformance rather than a live signal's behavior.

The same trend, said semantically

This page leans hard on stable identifiers — the timescaledb data-source uid, the BR101.Temp.PV tag, the BATCH-2026-001 batch id — without ever calling them what the next chapter will: the start of a graph. It is worth closing the loop, because the panel and the alert each have an exact semantic twin. The tag string BR101.Temp.PV is a flat key; the same fact stated as an RDF triple (subject–predicate–object, the atom of a knowledge graph) becomes bp:BR101 bp:hasMeasurement bp:BR101.Temp.PV, and a single reading becomes bp:reading-1 bp:value 37.01 ; bp:hasQuality bp:Good ; bp:ofBatch bp:BATCH-2026-001 — where bp:Good is now a typed thing the graph reasons over, not the integer 192 whose meaning lives only in a DDL comment. That move from "192 in a column" to a named quality class is the difference between a value and a vocabulary, and it is exactly the identifiers-and-units and classes-and-taxonomy work the ontology book does; the historian-to-graph load itself is the runnable subject of this book's own Semantics & the Digital Thread chapter.

Two of this page's mechanisms map especially cleanly. The operator's $batch dropdown (list the batches the historian knows about) is a competency question, meaning a question the data model must be able to answer, and its SQL SELECT DISTINCT batch_id is the relational sibling of a SPARQL SELECT DISTINCT ?batch WHERE { ?r bp:ofBatch ?batch }. And the alert rule is, almost literally, a constraint: "every released lot's temperature record must sit within its band" is a closed-world gate, the job of SHACL (the Shapes Constraint Language, which validates that graph data has the required structure, where a missing or out-of-band result is a failure now, not an open question). A SHACL sh:property with an sh:minInclusive 36.5 ; sh:maxInclusive 37.5 says in the graph exactly what outside_range params: [36.5, 37.5] says in Grafana. The difference is that SHACL guards the record's completeness and conformance at the genealogy seam, while the Grafana rule guards the live signal's behavior. The release-gate version of that idea, validated against the running campaign, is The Release Gate and SHACL; the lineage edges (derivedFrom) that let one query walk batch → pool → substance are relations-and-genealogy. The dashboard is the picture; the graph is the meaning behind every label on it.
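The correspondence can be made tangible with a toy triple list in plain Python, with no RDF library involved. Everything here is illustrative: triples are bare tuples, bp:Uncertain and bp:Bad are assumed class names alongside the bp:Good the text mentions, and the bounds check only imitates what sh:minInclusive / sh:maxInclusive express. The real load and validation are the subject of the next chapter.

```python
# ASSUMPTION: bp:Uncertain and bp:Bad are hypothetical siblings of bp:Good.
QUALITY_CLASS = {192: "bp:Good", 64: "bp:Uncertain", 0: "bp:Bad"}


def row_to_triples(reading_id, row):
    """Restate one historian row as (subject, predicate, object) tuples."""
    subject = f"bp:{reading_id}"
    return [
        (subject, "bp:value", row["value"]),
        (subject, "bp:hasQuality", QUALITY_CLASS[row["quality"]]),
        (subject, "bp:ofBatch", f"bp:{row['batch_id']}"),
    ]


def distinct_batches(triples):
    """Toy answer to the competency question: which batches are known?"""
    return sorted({obj for _, pred, obj in triples if pred == "bp:ofBatch"})


def within_inclusive(value, low=36.5, high=37.5):
    """Imitates an inclusive min/max bound: the endpoints themselves conform."""
    return low <= value <= high


rows = [
    {"value": 37.01, "quality": 192, "batch_id": "BATCH-2026-001"},
    {"value": 36.40, "quality": 64, "batch_id": "BATCH-2026-001"},
]
triples = [t for i, row in enumerate(rows, start=1)
           for t in row_to_triples(f"reading-{i}", row)]

print(distinct_batches(triples))
print([within_inclusive(row["value"]) for row in rows])
```

The distinct-batch lookup plays the role of the dropdown query, and the inclusive bound check flags the same reading that the alert's range test would treat as out of band.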

Where this leads

We can now see our data and trust the picture. But Grafana, the historian, and the batch model still speak slightly different dialects of "the same thing." The next chapter, Semantics & the Digital Thread: Ontologies and a Knowledge Graph, gives the whole plant one shared meaning — modeling equipment, material, recipe, and result as an ontology and weaving them into an RDF/SPARQL knowledge graph, so a single critical quality attribute can be traced across upstream, downstream, and QC in one query.
