Is your AI model drifting silently in production? Are latency spikes going unnoticed? Most ML engineers deploy models with zero observability — and then wonder why accuracy degrades or users complain about slow predictions. This tutorial shows you how to build a complete real-time monitoring stack using InfluxDB, Telegraf, and Grafana (the TIG stack), going from zero to live dashboards in under 30 minutes.
No paid tools, no Kubernetes required — just Docker, Python, and three open-source projects that thousands of teams use in production.
Why AI Model Monitoring Matters
Deploying a model is only the beginning. Without monitoring, you are flying blind. Here are the four problems that hit every production AI system:
- Model drift — Your model was 95% accurate on launch day. Three months later, real-world data has shifted and accuracy dropped to 78%. Nobody noticed because there are no metrics.
- Latency spikes — Inference normally takes 20ms, but under load it spikes to 500ms. Users feel it. You do not find out until someone files a support ticket.
- Resource waste — Memory leaks and CPU overuse are invisible without tooling. You are burning money on infrastructure you cannot optimize.
- Silent failures — The worst one. Models can return wrong predictions without raising any errors. No exception, no crash log — just quietly incorrect results going to real users.
Production AI needs observability. Just like you would never run a web server without logs and metrics, you should not run an AI model without monitoring.
Meet the TIG Stack
Three tools, each with one job, working together:
| Tool | Role | Port |
|---|---|---|
| InfluxDB | Time-series database — stores metrics with nanosecond timestamps | 8086 |
| Telegraf | Data collection agent — receives metrics from your app and forwards to InfluxDB | 8080 |
| Grafana | Visualization and alerting — queries InfluxDB and renders live dashboards | 3000 |
Together they form the TIG Stack — battle-tested, used by thousands of teams worldwide, and completely free.
Architecture — How Data Flows
The data pipeline is simple and linear:
- Your AI model (Python) generates metrics — latency, accuracy, batch size, memory usage.
- Telegraf listens on an HTTP endpoint, collects the metrics, and forwards them to InfluxDB on a regular flush interval.
- InfluxDB stores every data point with a nanosecond-precision timestamp.
- Grafana queries InfluxDB using the Flux query language, renders live dashboards, and evaluates alert conditions continuously.
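For a concrete picture of what moves through this pipeline, here is a single prediction's metrics as one line of InfluxDB line protocol — measurement name, tags, fields, and a nanosecond timestamp (the names match the instrumentation in Step 5; the timestamp is an illustrative value):

```
ai_model_metrics,model=my_model_v1 latency_ms=18.7,accuracy=0.94 1718000000000000000
```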
Prerequisites
- Docker Desktop installed and running — everything runs in containers for a clean, reproducible setup.
- Python 3.8+ — for the model instrumentation code.
- A terminal — PowerShell, Bash, or CMD.
- An AI model — scikit-learn, TensorFlow, PyTorch, or any model with a `predict()` method. If you do not have one, the included `monitor.py` has a demo model for testing.
Step 1 — Install InfluxDB
Run InfluxDB 2.7 in a Docker container with persistent data volumes:
```bash
docker run -d -p 8086:8086 \
  --name influxdb \
  -v influxdb-data:/var/lib/influxdb2 \
  -v influxdb-config:/etc/influxdb2 \
  influxdb:2.7
```
Verify it is running:
```bash
docker ps
```
Open http://localhost:8086 in your browser. You should see the InfluxDB welcome screen.
Step 2 — Configure InfluxDB
- Click Get Started.
- Create admin credentials — save these.
- Set Organization to `myorg`.
- Set Bucket to `ai_metrics`.
- Click Continue, then Configure Later.
- Go to Data → API Tokens → Generate Token → All Access Token.
- Copy and save your API token immediately — you will not see it again.
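Before moving on, you can sanity-check the token and org from Python using the same client library we install in Step 5 (a quick sketch — substitute the token you just saved):

```python
from influxdb_client import InfluxDBClient

# credentials from Step 2
client = InfluxDBClient(url="http://localhost:8086", token="YOUR_API_TOKEN", org="myorg")
print(client.ping())  # True if the server is up (ping does not check the token)
# listing buckets requires a valid token; the output should include "ai_metrics"
print([b.name for b in client.buckets_api().find_buckets().buckets])
```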
Step 3 — Set Up the Docker Network
Containers need to communicate by name. Create a shared Docker network and connect InfluxDB to it:
```bash
mkdir ai-monitoring && cd ai-monitoring
docker network create monitoring
docker network connect monitoring influxdb
```
Now Telegraf can reach InfluxDB at http://influxdb:8086 instead of needing an IP address.
Step 4 — Configure and Run Telegraf
Create a `telegraf.conf` file in your project folder with this minimal configuration:

```toml
[agent]
  interval = "10s"
  flush_interval = "10s"

[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "YOUR_API_TOKEN_HERE"
  organization = "myorg"
  bucket = "ai_metrics"

[[inputs.http_listener_v2]]
  service_address = ":8080"
  paths = ["/metrics"]
  data_format = "influx"
```

Replace `YOUR_API_TOKEN_HERE` with the token from Step 2.
What this config does:
- `[agent]` — Collects every 10 seconds and flushes to InfluxDB every 10 seconds.
- `[[outputs.influxdb_v2]]` — Sends data to InfluxDB using the container network name.
- `[[inputs.http_listener_v2]]` — Listens on port 8080 at `/metrics` for incoming data in InfluxDB line protocol format.
Start the Telegraf container:
```bash
# Linux/macOS
docker run -d --name telegraf --network monitoring \
  -v $(pwd)/telegraf.conf:/etc/telegraf/telegraf.conf:ro \
  telegraf
```

```powershell
# Windows PowerShell
docker run -d --name telegraf --network monitoring `
  -v ${PWD}/telegraf.conf:/etc/telegraf/telegraf.conf:ro `
  telegraf
```
Verify with `docker ps` — you should see both `influxdb` and `telegraf` running.
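To confirm the listener accepts data before wiring up your model, you can post one hand-written test point in line protocol and then look for it in InfluxDB's Data Explorer (a quick sketch using the `requests` library; port and path match the telegraf.conf above):

```python
import time
import requests

# one test point: measurement,tags fields timestamp(ns)
line = f"ai_model_metrics,model=test latency_ms=12.5 {time.time_ns()}"
resp = requests.post("http://localhost:8080/metrics", data=line, timeout=5)
print(resp.status_code)  # expect a 2xx response (204 by default)
```

Remember that Telegraf only flushes every 10 seconds, so allow a moment before checking the Data Explorer.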
Step 5 — Instrument Your AI Model with Python
This is where the magic happens. Install the InfluxDB Python client:
```bash
pip install influxdb-client psutil
```
Here is the full monitoring wrapper from the companion repository. It works with any model that has a `predict()` method:

```python
import os
import time

import psutil
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# connection settings — use the org, bucket, and API token from Step 2
INFLUXDB_URL = "http://localhost:8086"
INFLUXDB_TOKEN = "YOUR_API_TOKEN"
INFLUXDB_ORG = "myorg"
INFLUXDB_BUCKET = "ai_metrics"
MODEL_NAME = "my_model_v1"

client = InfluxDBClient(url=INFLUXDB_URL, token=INFLUXDB_TOKEN, org=INFLUXDB_ORG)
write_api = client.write_api(write_options=SYNCHRONOUS)  # blocking write, one point per call


def predict(model, input_data, ground_truth=None):
    """Run model.predict(input_data) and emit metrics to InfluxDB."""
    start = time.time()
    error_occurred = False
    try:
        result = model.predict(input_data)
    except Exception as e:
        # the model failed — record the error instead of crashing the caller
        error_occurred = True
        result = None
        print(f"[monitor] Prediction error: {e}")

    latency_ms = (time.time() - start) * 1000  # wall-clock inference time
    memory_mb = psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024  # process RSS
    batch_size = len(input_data) if hasattr(input_data, "__len__") else 1

    point = (
        Point("ai_model_metrics")
        .tag("model", MODEL_NAME)
        .field("latency_ms", latency_ms)
        .field("batch_size", batch_size)
        .field("memory_mb", memory_mb)
        .field("error_rate", 1.0 if error_occurred else 0.0)
    )
    # accuracy is only computable when the caller supplies labels
    if ground_truth is not None and result is not None:
        correct = sum(p == t for p, t in zip(result, ground_truth))
        accuracy = correct / len(ground_truth)
        point = point.field("accuracy", accuracy)

    write_api.write(bucket=INFLUXDB_BUCKET, record=point)
    return result
```
How to Use It
Replace your existing `model.predict(data)` call with the wrapped version:

```python
from monitor import predict
import joblib

model = joblib.load("my_model.pkl")
result = predict(model, X, ground_truth=y_true)
```
That is it. Your existing code barely changes, and every prediction now emits real-time metrics to InfluxDB.
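One detail worth noting: `monitor.py` writes straight to InfluxDB with the client library, so in this setup Telegraf's HTTP listener is an alternative ingestion path rather than a required hop. If you would rather route metrics through Telegraf as in the architecture diagram, here is a minimal sketch of a hypothetical helper that posts the same fields as line protocol to the listener from Step 4:

```python
import time
import requests


def send_via_telegraf(latency_ms, batch_size, memory_mb, error_occurred, model="my_model_v1"):
    # line protocol: measurement,tags fields timestamp(ns); the "i" suffix marks an integer field
    line = (
        f"ai_model_metrics,model={model} "
        f"latency_ms={latency_ms},batch_size={batch_size}i,"
        f"memory_mb={memory_mb},error_rate={1.0 if error_occurred else 0.0} "
        f"{time.time_ns()}"
    )
    requests.post("http://localhost:8080/metrics", data=line, timeout=2)
```

Either path lands in the same `ai_metrics` bucket, so the Grafana queries below work unchanged.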
Metrics Tracked Automatically
| Metric | Field Name | What It Measures |
|---|---|---|
| Inference latency | `latency_ms` | Time per prediction in milliseconds |
| Model accuracy | `accuracy` | Correct predictions / total (requires ground truth) |
| Batch size | `batch_size` | Number of inputs per call |
| Memory usage | `memory_mb` | Process RSS memory — detects leaks |
| Error rate | `error_rate` | 1.0 on failure, 0.0 on success |
Step 6 — Install Grafana
```bash
docker run -d -p 3000:3000 \
  --name grafana --network monitoring \
  -v grafana-data:/var/lib/grafana \
  grafana/grafana:latest
```
Open http://localhost:3000. Default credentials: admin / admin. You will be prompted to change the password.
Step 7 — Connect Grafana to InfluxDB
- Go to Connections → Data Sources → Add data source.
- Select InfluxDB.
- Set URL to `http://influxdb:8086` (use the container name, not localhost).
- Set Query Language to Flux.
- Enter Organization: `myorg`.
- Paste your API Token.
- Set Default Bucket: `ai_metrics`.
- Click Save & Test — you should see a green success banner.
Step 8 — Build Your Dashboard
Create a new dashboard and add panels with Flux queries. Here are the queries for the key metrics:
Model Accuracy Over Time
```flux
from(bucket: "ai_metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "ai_model_metrics")
  |> filter(fn: (r) => r._field == "accuracy")
  |> aggregateWindow(every: 1m, fn: mean)
```
Inference Latency
```flux
from(bucket: "ai_metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "ai_model_metrics")
  |> filter(fn: (r) => r._field == "latency_ms")
  |> aggregateWindow(every: 1m, fn: mean)
```
Memory Usage
```flux
from(bucket: "ai_metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "ai_model_metrics")
  |> filter(fn: (r) => r._field == "memory_mb")
  |> aggregateWindow(every: 1m, fn: mean)
```
Add more panels for `batch_size` and `error_rate` by changing the field filter — for the error rate, see the example below. Arrange the panels in a grid, resize, and rename. Save the dashboard as AI Model Monitoring.
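For example, because `error_rate` is written as 1.0 on failure and 0.0 on success, averaging it per window turns it into the fraction of failed predictions in that window:

```flux
from(bucket: "ai_metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "ai_model_metrics")
  |> filter(fn: (r) => r._field == "error_rate")
  |> aggregateWindow(every: 1m, fn: mean)
```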
Step 9 — Configure Alerts
Catch model degradation before it impacts users:
- Click Alerting → Alert rules → New alert rule.
- Set the condition: `accuracy` IS BELOW `0.80`.
- Set evaluation: every 1 minute for 5 minutes — the condition must persist for 5 consecutive minutes before firing, which prevents false alarms.
- Add a notification channel: Email, Slack, PagerDuty, Discord, or Webhooks.
- Click Save rule.
Your model is now protected. The moment accuracy drops below 80% for more than 5 minutes, you get notified automatically.
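Under the hood, a Grafana alert rule is a query plus a threshold evaluated on a schedule. Here is a sketch of a Flux query that reduces the last five minutes of accuracy to a single value for the rule to compare against 0.80 (bucket and field names match the dashboard queries above):

```flux
from(bucket: "ai_metrics")
  |> range(start: -5m)
  |> filter(fn: (r) => r._measurement == "ai_model_metrics")
  |> filter(fn: (r) => r._field == "accuracy")
  |> mean()
```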
Docker Commands Cheat Sheet
| Command | What It Does |
|---|---|
| `docker start influxdb telegraf grafana` | Start all containers |
| `docker stop influxdb telegraf grafana` | Stop all containers |
| `docker logs telegraf` | View Telegraf logs |
| `docker logs influxdb` | View InfluxDB logs |
| `docker network inspect monitoring` | Verify containers are on the same network |
| `docker volume rm influxdb-data influxdb-config grafana-data` | Full cleanup (removes all stored data) |
Troubleshooting Common Issues
“Connection refused” in Grafana or Telegraf
- Check containers are running: `docker ps`
- Verify the network: `docker network inspect monitoring`
- Use container names (`http://influxdb:8086`), not `localhost`, when connecting from within the Docker network
No data appearing in Grafana
- Run your Python script: `python monitor.py`
- Check InfluxDB Data Explorer at `http://localhost:8086` — verify data is arriving
- Check Telegraf logs: `docker logs telegraf`
- Ensure your Flux query uses the correct bucket name and measurement name
TOML syntax error in telegraf.conf
- Ensure the file is UTF-8 encoded without BOM
- Make sure the API token is on a single line with no extra quotes
- Verify indentation uses spaces, not tabs
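Since `telegraf.conf` is TOML, one quick way to catch syntax errors locally before restarting the container is Python's built-in parser (Python 3.11+; this checks TOML syntax only, not Telegraf's plugin options):

```python
import tomllib  # standard library in Python 3.11+

# raises tomllib.TOMLDecodeError with a line number on bad syntax
with open("telegraf.conf", "rb") as f:
    tomllib.load(f)
print("telegraf.conf parses as valid TOML")
```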
Resources
- Full source code and config files: github.com/shazforiot/ai-monitoring-with-TIG-stack
- InfluxDB Documentation: docs.influxdata.com/influxdb/v2
- Telegraf Plugins: docs.influxdata.com/telegraf/latest/plugins
- Grafana Dashboards: grafana.com/grafana/dashboards
- Flux Query Language: docs.influxdata.com/flux/latest
Frequently Asked Questions
What is the TIG stack and why use it for AI monitoring?
The TIG stack stands for Telegraf + InfluxDB + Grafana. Telegraf collects metrics from your AI models, InfluxDB stores them as time-series data with nanosecond precision, and Grafana visualizes the data in real-time dashboards with alerting. It is fully open-source, runs in Docker, and is battle-tested by thousands of engineering teams for production monitoring.
Can I monitor any AI model with this setup?
Yes. The Python instrumentation wrapper (monitor.py) works with any model that has a predict() method — scikit-learn, TensorFlow, PyTorch, XGBoost, HuggingFace transformers, and more. You simply replace model.predict(data) with predict(model, data) and metrics are automatically sent to InfluxDB.
What metrics should I track for AI model monitoring?
The five most critical metrics are: latency_ms (inference time per prediction), accuracy (model performance if ground truth is available), memory_mb (process memory to detect leaks), batch_size (throughput volume), and error_rate (failed prediction rate). The included monitor.py tracks all five out of the box.
Do I need Kubernetes to run this monitoring stack?
No. The entire stack runs in Docker containers on your local machine or any single server. You do not need Kubernetes, cloud services, or paid tools. Just Docker Desktop and Python 3.8+. The setup works identically on Windows, macOS, and Linux.
How do Grafana alerts detect model drift?
You configure an alert rule in Grafana that evaluates a condition like “accuracy IS BELOW 0.80 for 5 consecutive minutes.” When your model’s accuracy degrades below the threshold — a sign of data drift or concept drift — Grafana fires the alert and sends a notification to Slack, email, PagerDuty, or any configured channel. This catches silent degradation before it impacts users.
Video Chapters — Quick Navigation
- 00:00 — Intro & Demo
- 01:24 — Why AI Model Monitoring Matters
- 04:38 — Architecture Overview (TIG Stack)
- 06:20 — Prerequisites
- 08:09 — Install & Configure InfluxDB
- 13:05 — Docker Network & Telegraf Setup
- 14:54 — Configure & Run Telegraf Container
- 17:11 — Instrument Your AI Model (Python)
- 19:57 — Install & Connect Grafana
- 21:35 — Connect Grafana to InfluxDB
- 23:30 — Build the Dashboard & View Metrics
- 27:06 — Configure Alerts
- 28:20 — Final Demo & Recap