The Edge Gateway: Routing Floor Data with Node-RED, Telegraf & NiFi
Where we are: Part II, "Capturing the Process." The signal already leaves the bioreactor as OPC UA and lands on the MQTT broker (previous chapter). Now we build the gateway that collects, transforms, and routes it onward, and we decide which of three open-source tools does the job.
Think of the edge gateway as the mailroom of the factory. Sensors all over the floor keep dropping envelopes (measurements). The mailroom sorts them, re-labels anything that arrived in a weird format, decides which ones go to which department, and, crucially, keeps a logbook of every envelope it touched. A sloppy mailroom loses mail. A regulated mailroom can tell you, months later, exactly which envelope it received, when, where it came from, and where it sent it. This chapter builds three kinds of mailroom and picks the right one for each job.
The edge gateway sits on the fault line of the whole platform: the seam between OT (operational technology: the controllers, skids, and sensors running the process) and IT (the databases, dashboards, and analytics that make sense of it). On the OT side, data speaks OPC UA and Modbus and lives on isolated control networks; on the IT side it speaks MQTT, SQL, and HTTP. Something has to stand in the middle, translate, buffer, and route, without ever touching the validated control loop. That something is the gateway, and getting it right is the difference between data you can submit to a regulator and data you merely hope is complete.
What this chapter covers
- Why the OT/IT bridge exists and what a gateway must do at that seam.
- The three open-source tools we ship, Node-RED (low-code flows), Telegraf (declarative collection), and Apache NiFi (guaranteed delivery + replayable provenance), and what each is genuinely good at.
- How they route data from the broker into the historian (ts.sensor_reading) and the batch model, with the real long-format rows you'll see.
- The honest part: delivery semantics (at-least-once vs. exactly-once), the audit-trail gap, and where the OSS gateway hands off to the validated record-of-truth.
The seam: what a gateway actually does
A gateway is not a database and not a dashboard. Its job, distilled, is a southbound → transform → northbound pipeline. Southbound, it speaks the field protocols: it subscribes to the OPC UA address space the bioreactor publishes [1] and, for older skids, polls Modbus registers. In the middle it normalizes: a raw register value of 3725 becomes 37.25 °C, a vendor's tag becomes our canonical BR101.Temp.PV, and a missing unit is filled in. Northbound, it routes the cleaned record to wherever it needs to live: onto the MQTT broker as a Unified Namespace topic, or straight into TimescaleDB.
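The normalize step in the middle can be sketched in a few lines of Python. This is an illustration only, not code from the companion repo: the vendor tag TIC101_PV, the TAG_MAP table, and the 0.01 scale factor are assumptions chosen to reproduce the 3725 to 37.25 °C example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping: vendor tag -> (canonical tag, scale factor, default unit).
TAG_MAP = {
    "TIC101_PV": ("BR101.Temp.PV", 0.01, "°C"),
}


@dataclass(frozen=True)
class Reading:
    tag: str
    value: float
    unit: str


def normalize(vendor_tag: str, raw: int, unit: Optional[str] = None) -> Reading:
    """Scale a raw register value, rename the tag, and fill in a missing unit."""
    # An unmapped tag raises KeyError instead of passing through unlabelled.
    canonical, scale, default_unit = TAG_MAP[vendor_tag]
    return Reading(tag=canonical, value=round(raw * scale, 4), unit=unit or default_unit)


reading = normalize("TIC101_PV", 3725)
print(reading.tag, reading.value, reading.unit)
```

Failing loudly on an unknown tag is a deliberate choice in this sketch: a reading that cannot be labelled is better rejected and investigated than stored under a name nobody can trace.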
A peer-reviewed reference design for exactly this shape (a modular edge gateway with a southbound protocol-translation layer for Modbus/MQTT/OPC UA and a cache that decouples acquisition from transmission so a network hiccup never drops a sample) was published in 2026 [2]. That decoupling is the whole game. The floor never stops producing data; the network sometimes stops carrying it. A gateway that buffers locally and forwards when the link returns is the only kind that preserves a complete record. A second analysis of an OPC UA gateway on embedded hardware quantifies the other tension: OPC UA is a rich, heavy, self-describing stack, while MQTT is a light pub/sub transport, so a gateway routinely reads heavy OPC UA southbound and emits light MQTT northbound [3].
The edge gateway as the OT/IT seam: it reads field protocols southbound, normalizes and buffers in the middle, and routes northbound to the broker and historian. Node-RED, Telegraf, and NiFi each occupy a different point on the same pipeline. Original diagram by the authors, created with AI assistance.
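To make the buffering idea concrete, here is a minimal store-and-forward sketch in Python. It is not the reference design from [2]; the StoreAndForward class, the send callback, and the link dictionary are hypothetical names, and the queue lives in memory only, so a real gateway would presumably persist it to disk to survive a restart.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class StoreAndForward:
    """Queue readings locally and forward them in order when the link is up."""

    def __init__(self, send: Callable[[Dict], bool]) -> None:
        self._send = send                      # returns True if the northbound write succeeded
        self._pending: Deque[Dict] = deque()

    def acquire(self, reading: Dict) -> None:
        """Acquisition never waits on the network: it only appends."""
        self._pending.append(reading)

    def flush(self) -> int:
        """Forward queued readings oldest-first and stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break                          # link is down: keep the reading for the next flush
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)


# Simulated northbound link that starts out down.
link = {"up": False}
delivered: List[Dict] = []


def send(reading: Dict) -> bool:
    if link["up"]:
        delivered.append(reading)
    return link["up"]


gateway = StoreAndForward(send)
gateway.acquire({"tag": "BR101.Temp.PV", "value": 37.25})
gateway.acquire({"tag": "BR101.DO.PV", "value": 40.8})
sent_while_down = gateway.flush()      # nothing leaves, nothing is lost
link["up"] = True
sent_after_recovery = gateway.flush()  # backlog drains in arrival order
print(sent_while_down, sent_after_recovery, len(gateway))
```

A reading is removed from the queue only after the send succeeds, which is what keeps an outage from turning into a gap in the record.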
Three tools, three jobs
We ship all three because no single tool wins on every axis. The trick is matching the tool to the job.
Node-RED: the low-code flow editor
Node-RED is a browser-based, low-code editor where you wire small functional nodes into a flow; the flow itself is stored as JSON and runs on Node.js [4]. It is the fastest way to get an idea onto the floor: drag an mqtt in node, a function node to reshape the payload, and a postgres node, connect them, and you are ingesting. Process engineers who would never write a daemon will happily build a Node-RED flow.
Because flows are JSON, they live in Git like any other config-as-code artifact. A minimal flow that takes a Sparkplug payload off the broker, picks out one metric, and inserts a row into the historian looks like this. It is a realistic Node-RED flow export, not a tested artifact in the companion repo's core profile; the gateway runs behind the opt-in capture profile:
// edge/node-red/flows.json (realistic config, capture profile; not in the core compose)
[
{ "id": "mqtt-in", "type": "mqtt in", "topic": "spBv1.0/Newark/DDATA/BR101/+",
"qos": "1", "broker": "mosquitto", "wires": [["to-row"]] },
{ "id": "to-row", "type": "function", "name": "sparkplug -> sensor_reading",
"func": "const m = msg.payload.metrics[0];\nmsg.payload = {\n ts: new Date(m.timestamp).toISOString(),\n tag: m.name, value: m.value, unit: m.properties.unit,\n quality: m.is_null ? 0 : 192, batch_id: flow.get('batch_id')\n};\nreturn msg;",
"wires": [["to-pg"]] },
{ "id": "to-pg", "type": "postgresql", "name": "ts.sensor_reading",
"query": "INSERT INTO ts.sensor_reading (ts,tag,value,unit,quality,batch_id) VALUES ($1,$2,$3,$4,$5,$6)" }
]
Node-RED's strength is also its limit. It ships only basic authentication and a thin permission model, so in a GxP setting you cannot prove who changed a flow or grant fine-grained roles without an enterprise add-on. We treat it as the prototyping and light-glue layer, and we say so out loud.
Telegraf: declarative collection
Where Node-RED is interactive, Telegraf is the opposite: a single Go binary configured entirely by a TOML file, with a plugin model (inputs, processors, aggregators, outputs) that you compose declaratively [5]. There is no canvas and no clicking. You write the config, version it, and the agent does exactly what the file says, every time. That determinism is precisely what you want for steady, high-rate metric collection.
A Telegraf config that consumes the broker's UNS topics and writes straight to PostgreSQL/TimescaleDB is short (again, a realistic capture-profile artifact, labelled as such):
# edge/telegraf/telegraf.conf (realistic config, capture profile)
[agent]
interval = "5s"
flush_interval = "5s"
omit_hostname = true
[[inputs.mqtt_consumer]]
servers = ["tcp://mosquitto:1883"]
topics = ["bioproc/Newark/+/+/+"] # UNS: enterprise/site/area/line/unit
data_format = "json"
json_time_key = "ts"
json_time_format = "2006-01-02T15:04:05Z07:00"
[[outputs.postgresql]]
connection = "host=postgres user=bioproc password=bioproc dbname=bioproc"
table_template = "INSERT INTO ts.sensor_reading (ts,tag,value,unit,quality,batch_id) VALUES (...)"
The cost of that simplicity is that Telegraf collects and forwards; it does not give you a per-message audit trail. It will faithfully drop a message it cannot parse and move on. For monitoring stack health (we reuse it for exactly that in the operations chapters) it is ideal. For a regulated batch record it is a collector, not a system of record.
Apache NiFi: guaranteed delivery and replayable provenance
The third tool is the one that earns its keep when the data is regulated. Apache NiFi routes data as FlowFiles through a directed graph of processors, and for every FlowFile it creates, forks, clones, modifies, or sends, it writes a provenance event into a repository you can query and replay [6]. That is the closest any open-source edge tool comes to an end-to-end data-flow audit trail. You can ask NiFi, after the fact, "show me the lineage of this record" and it will reconstruct who/what/when/from-what, the same shape the W3C PROV ontology defines as entities, activities, and agents [7]. When the record must be defensible months later, that chain of custody is the feature.
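The lineage question can be illustrated with a toy provenance log in Python. This is not NiFi's API or its repository format: the ProvEvent record, its field names, and the event-type strings are hypothetical, and the sketch assumes one event per record with a single parent link.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class ProvEvent:
    record_id: str                    # the entity
    event_type: str                   # the activity, e.g. "RECEIVE", "MODIFY", "SEND"
    agent: str                        # the component that acted
    timestamp: str
    parent_id: Optional[str] = None   # the record this one was derived from


def lineage(events: List[ProvEvent], record_id: str) -> List[ProvEvent]:
    """Walk parent links back to the source and return the chain oldest-first."""
    by_id: Dict[str, ProvEvent] = {e.record_id: e for e in events}
    chain: List[ProvEvent] = []
    seen: Set[str] = set()
    current: Optional[str] = record_id
    while current is not None and current in by_id and current not in seen:
        seen.add(current)             # guard against a malformed, cyclic log
        event = by_id[current]
        chain.append(event)
        current = event.parent_id
    return list(reversed(chain))


log = [
    ProvEvent("a", "RECEIVE", "mqtt-consumer", "2026-01-05T00:00:00Z"),
    ProvEvent("b", "MODIFY", "unit-normalizer", "2026-01-05T00:00:01Z", parent_id="a"),
    ProvEvent("c", "SEND", "historian-writer", "2026-01-05T00:00:02Z", parent_id="b"),
]
for event in lineage(log, "c"):
    print(event.timestamp, event.event_type, event.agent)
```

The point of the sketch is the shape of the question: given the record that landed in the historian, the log can answer where it came from and which component touched it at each step.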
NiFi's lighter sibling, MiNiFi, pushes the same idea down to the source: a small-footprint agent designed explicitly for "generation of data provenance with full chain of custody of information" right at the device [8]. On a constrained edge box next to the skid, MiNiFi collects and stamps provenance, then hands off to a central NiFi.
The price is weight. NiFi is a JVM application (Java 21, ~2 GB of RAM) with its own provenance and content repositories, which is why it sits behind the opt-in capture profile rather than the always-on core stack. A reader brings it up only for this chapter. Its provenance is configured in nifi.properties, and the relevant lines (realistic configuration, not a core-profile artifact) are:
# edge/nifi/nifi.properties (realistic config, capture profile, provenance enabled)
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
nifi.provenance.repository.directory.default=./provenance_repository
nifi.provenance.repository.max.storage.time=180 days
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, filename, ProcessorID
Set max.storage.time to your retention requirement and NiFi keeps a replayable record of every FlowFile it handled for that window.
Where the data lands
Whichever tool routes it, the data converges on the same target: the historian hypertable ts.sensor_reading, defined exactly once in the shared platform schema and joined to the batch model. The destination database itself comes up with the always-on core profile. From examples/platform/compose/compose.yaml:
# examples/platform/compose/compose.yaml
services:
  postgres:
    # timescale/timescaledb IS PostgreSQL + TimescaleDB, so the historian
    # hypertable and the ISA-88/95 batch model live in one joinable database.
    image: timescale/timescaledb:2.17.2-pg17
    profiles: ["core"]
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-bioproc}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-bioproc}
      POSTGRES_DB: ${POSTGRES_DB:-bioproc}
    ports: ["5432:5432"]
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ../db:/docker-entrypoint-initdb.d:ro   # 00-60 schema files run on first init
  mosquitto:
    image: eclipse-mosquitto:2.0.22
    profiles: ["core"]
    ports: ["1883:1883"]
Note the deliberate choice in the comment: TimescaleDB is PostgreSQL, so the high-rate sensor history and the ISA-88/95 batch context live in one database you can join. The gateway's only job is to land a clean row; the meaning comes from the join, which later chapters build.
The rows the gateway produces are dead simple: long format, one measurement per row. Here is the real shape, taken from examples/datasets/fedbatch_timeseries_10min.sample.csv:
ts,tag,value,unit,quality,batch_id
2026-01-05 00:00:00+00:00,BR101.Agitation.PV,81.4323,rpm,192,BATCH-2026-001
2026-01-05 00:00:00+00:00,BR101.Agitation.SP,81.6008,rpm,192,BATCH-2026-001
2026-01-05 00:00:00+00:00,BR101.DO.PV,40.8224,%sat,192,BATCH-2026-001
That quality column is not decoration. 192 (0xC0) is the Good quality code from OPC Classic (DA); OPC UA itself encodes Good as StatusCode 0x00000000, which many gateways and historians map onto the familiar 192. The gateway carries the quality flag through untouched so that a dashboard or a reviewer can later distinguish a real reading from an uncertain or bad one. Throwing that field away at the edge is a classic data-integrity own-goal: you have silently made every value look equally trustworthy.
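As a sketch of how a consumer might use the flag, the Python below classifies a quality value by its major status. It assumes the 8-bit OPC Classic (DA) encoding, in which the top two bits carry the major quality and 192 (0xC0) means Good; the function name classify_quality is ours, not something from the companion repo.

```python
def classify_quality(quality: int) -> str:
    """Map an OPC Classic quality byte to its major status via the top two bits."""
    major = (quality >> 6) & 0b11
    # 0b11 = Good, 0b01 = Uncertain, 0b00 = Bad; 0b10 is reserved.
    return {0b11: "good", 0b01: "uncertain", 0b00: "bad"}.get(major, "unknown")


for q in (192, 64, 0):
    print(q, classify_quality(q))
```

A dashboard or review query can then filter or shade readings by this label rather than treating every stored value as equally trustworthy.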
In the companion repo, the chapters from 5 through 13 build this ingest path up piece by piece. So that the later contextualization and ALCOA+ chapters have something to query right away, the repo also ships one script that does the whole load at once: examples/tools/load_datasets.py. Its time-series loader is a textbook bulk ingest, exactly what a production gateway batches up northbound:
# examples/tools/load_datasets.py
def load_timeseries(conn) -> int:
    df = pd.read_parquet(DATA / "fedbatch_timeseries.parquet")
    buf = io.StringIO()
    df[["ts", "tag", "value", "unit", "quality", "batch_id"]].to_csv(buf, index=False, header=False)
    buf.seek(0)
    with conn.cursor() as cur:
        cur.execute("TRUNCATE ts.sensor_reading")
        with cur.copy("COPY ts.sensor_reading (ts, tag, value, unit, quality, batch_id) "
                      "FROM STDIN WITH (FORMAT csv)") as copy:
            copy.write(buf.read())
    return len(df)
The same script also routes the offline lab data into a different schema, and that detail matters for the gateway story. Notice it stamps an attributable actor before it writes, set_config('app.user', 'loader', ...), so the database's audit trigger records who introduced the row:
# examples/tools/load_datasets.py
def load_offline(conn) -> int:
    df = pd.read_csv(DATA / "offline_assays.csv", parse_dates=["sample_time"])
    n = 0
    with conn.cursor() as cur:
        cur.execute("SELECT set_config('app.user', 'loader', false)")
        for _, r in df.iterrows():
            cur.execute(
                "INSERT INTO lab.sample (sample_id, batch_id, sample_time, sample_point, sample_type) "
                "VALUES (%s,%s,%s,%s,'in_process') ON CONFLICT (sample_id) DO NOTHING",
                (r.sample_id, r.batch_id, r.sample_time.to_pydatetime(), r.sample_point))
            ...
    return n
You run the whole thing with one make target the book prints verbatim:
make load # load the datasets into the running stack (historian + lab + genealogy)
# -> loaded: 322560 sensor readings, 1344 offline results, 66 release results, 30 genealogy edges
The honest part: delivery semantics and the audit gap
Here is where we stop selling and start confessing. A gateway's most important promise is completeness: that every measurement the floor produced actually arrived. That is the "C" (Complete) in ALCOA+, and the MHRA's data-integrity guidance names it explicitly: data must be complete, with nothing silently dropped [9]. The FDA's CGMP data-integrity Q&A makes the same demand from the other direction: all CGMP data must be complete, reliable, and accurate [10].
Completeness at the edge comes down to MQTT Quality of Service. The MQTT specification defines three levels: QoS 0 (at-most-once: fire and forget, messages can be lost), QoS 1 (at-least-once: guaranteed delivery but possible duplicates), and QoS 2 (exactly-once: guaranteed and de-duplicated, at the cost of a four-step handshake) [11]. The Node-RED flow above sets "qos": "1" on purpose: in bioprocess data you would rather receive a measurement twice (and de-duplicate on (ts, tag)) than lose it once. Choosing QoS 0 to save bandwidth is, in a regulated context, choosing to make your record incomplete.
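De-duplicating on (ts, tag) is simple enough to sketch in Python. This is an illustration, not the companion repo's code: the dedupe helper is a hypothetical name, and it keeps every seen key in memory, so a long-running gateway would need a bounded window or a unique constraint in the database (the ON CONFLICT ... DO NOTHING pattern the loader uses for lab samples) instead.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def dedupe(readings: Iterable[Dict]) -> Iterator[Dict]:
    """Yield each reading once, keyed on (ts, tag); later duplicates are skipped."""
    seen: Set[Tuple[str, str]] = set()
    for reading in readings:
        key = (reading["ts"], reading["tag"])
        if key in seen:
            continue              # QoS 1 redelivery: same timestamp, same tag
        seen.add(key)
        yield reading


stream = [
    {"ts": "2026-01-05 00:00:00+00:00", "tag": "BR101.DO.PV", "value": 40.8224},
    {"ts": "2026-01-05 00:00:00+00:00", "tag": "BR101.DO.PV", "value": 40.8224},  # redelivered
    {"ts": "2026-01-05 00:00:00+00:00", "tag": "BR101.Agitation.PV", "value": 81.4323},
]
unique = list(dedupe(stream))
print(len(stream), len(unique))
```

The first copy wins and order is preserved, so the duplicate that at-least-once delivery allows costs a set lookup rather than a corrupted record.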
But QoS only protects the transport. It says nothing about what happens inside the gateway when a transform throws, when the disk fills, or when the process restarts mid-flow. This is exactly where Node-RED and Telegraf fall short and NiFi shines: NiFi's FlowFile model is transactional and its provenance repository records the fate of every message, so a dropped or rerouted record is visible and replayable rather than silently gone [6]. For a record that may be audited, "we think it all got through" is not an answer; "here is the provenance event for every FlowFile and here are zero failures in the dead-letter relationship" is.
And even NiFi's provenance is not a Part 11 audit trail. It tells you what the data flow did; it does not, by itself, give you immutable, signed, who-and-why-with-reason records of a regulated value being changed. That belongs to the database (the system-versioned history and hash chain we build in the trust chapters), not to the gateway. The honest division of labor is: the gateway guarantees the data arrives and records how it flowed; the historian and batch model become the record of truth that is validated, audit-trailed, and signable. No edge tool is the compliant record on its own.
Why it matters
Everything downstream (dashboards, contextualization, the knowledge graph, the soft-sensor, the audit-trail review) assumes the data arrived complete and correctly labelled. The edge gateway is where that assumption is either earned or quietly broken. Drop the quality flag, pick QoS 0, or use a collector with no chain of custody, and you can build a beautiful platform on top of data you cannot defend. Choose the right tool for each job (Node-RED to prototype, Telegraf to collect at scale, NiFi when the flow must be provable) and the rest of the book has a foundation worth building on.
In the real world
On a real mAb line, the gateway rarely talks to a simulator. It talks to a DCS like Emerson DeltaV or a Siemens controller over OPC UA, to standalone skids over Modbus or S7, and to a commercial historian like AVEVA PI on the IT side. Those systems do not run on a laptop and are license-locked, which is why our companion repo mocks the OPC UA bioreactor and ships the heavy gateway behind an opt-in profile: the integration code is real, but the vendor-specific quirks are not exercised here. Pilot-scale cGMP facilities being built for exactly this kind of digital integration, such as NIIMBL's SABRE facility at the University of Delaware (a public-private cGMP pilot facility that broke ground in April 2024; cGMP = current Good Manufacturing Practice), are where the real-device edge testing happens.
The honest OSS-vs-commercial verdict for this layer: the open-source gateways genuinely do the bridging job well. Node-RED, Telegraf, and NiFi are all production-grade, and NiFi's provenance is a real differentiator. What they do not give you out of the box is the validated-system wrapper: vendor accountability, a turnkey Part 11 audit trail, qualified high availability, and the IQ/OQ/PQ paperwork. Commercial edge platforms (Ignition, HighByte, the historian vendors' own connectors) sell that wrapper. You can reach roughly the same data outcome with OSS for far less license cost, but the GxP last mile is yours to validate, and that work is real.
Key terms
- Edge gateway: the device/software at the OT/IT seam that reads field protocols, transforms data, buffers it, and routes it onward without altering the control loop.
- OT / IT: Operational Technology (controllers, skids, sensors on isolated control networks) vs. Information Technology (databases, dashboards, analytics).
- Southbound / northbound: gateway-speak for the field-protocol side (down toward devices) vs. the data-platform side (up toward IT).
- Node-RED: browser-based low-code flow editor; flows stored as JSON, runs on Node.js; great for prototyping, weak RBAC.
- Telegraf: single-binary, plugin-driven, TOML-configured collection agent; deterministic, no per-message audit trail.
- Apache NiFi / MiNiFi: JVM dataflow tool with replayable, FlowFile-level provenance (chain of custody); MiNiFi is its small-footprint edge agent.
- Provenance / data lineage: the recorded who/what/when/from-what history of a data record; W3C PROV-O is the standard vocabulary (entities, activities, agents).
- QoS (MQTT): delivery guarantee: QoS 0 at-most-once (can lose), QoS 1 at-least-once (can duplicate), QoS 2 exactly-once.
- Quality flag: the per-value quality code derived from the OPC UA StatusCode (e.g. 192 = Good in the OPC Classic encoding) and carried with each value so consumers can tell good readings from uncertain/bad.
- ALCOA+ "Complete": the data-integrity attribute requiring that no data be silently lost; the gateway's delivery guarantees protect it.
Where this leads
The gateway now reliably routes a clean, quality-flagged stream into the historian. But routing is only as good as the source. In the next chapter, Upstream Capture: The Production Bioreactor, we point the pipeline at the heart of the process, the fed-batch CHO bioreactor itself, and capture its setpoints, process values, OPC UA quality codes, and ISA-88 phase context as the 14-day batch unfolds, turning a stream of rows into a story about a living culture.