Top Videos
May 14, 2026
Every time you push code, someone has to build it, test it, and deploy it. Do that manually and you waste hours or, worse, introduce human error. GitHub Actions automates all of it. It is a CI/CD platform built directly into GitHub, so there is no separate CI server to install or maintain. You write a YAML file, push it, and GitHub runs your pipeline on every commit.
This tutorial takes you from zero to a production-grade CI/CD pipeline with caching for 3x faster builds, matrix builds for multi-version testing, deployment pipelines with approval gates, and secrets management — all using the companion project at shazforiot/github-actions-demo.
What Is GitHub Actions?
GitHub Actions is a CI/CD platform built into GitHub. It automates running tests on every push, building and deploying your application, scheduled tasks like backups, and virtually any workflow you can define. With over 20,000 actions in the GitHub Marketplace, you can assemble pipelines without writing custom scripts.
Core Concepts
Understand these four terms and everything else clicks:
| Concept | What It Is | Example |
| --- | --- | --- |
| Workflow | The entire automation, defined in a .yml file | ci.yml, deploy.yml |
| Trigger | What starts the workflow | Push, pull request, schedule, manual |
| Job | A group of steps running on the same machine | Test job, build job, deploy job |
| Step | An individual command or pre-built action | npm install, actions/checkout@v4 |
Data flows like this: Trigger → Workflow → Jobs → Steps. Each job runs on a fresh virtual machine. Jobs run in parallel by default; add needs: to make one job wait for another.
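To make the sequencing rule concrete, here is a minimal Python sketch of how a set of needs: declarations turns into "waves" of jobs that can run side by side. It is only a model of the idea, not how GitHub schedules jobs internally, and the job names (including lint) are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job graph: each job maps to the jobs it lists under `needs:`.
jobs = {
    "test": [],
    "lint": [],
    "build": ["test", "lint"],
    "deploy": ["build"],
}


def execution_waves(needs):
    """Group jobs into waves; jobs in the same wave have no pending dependencies."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose needs are satisfied
        waves.append(ready)
        sorter.done(*ready)                 # mark the wave as finished
    return waves


for number, wave in enumerate(execution_waves(jobs), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Jobs without needs: land in the first wave together, which mirrors the parallel-by-default behavior; every needs: entry pushes a job into a later wave.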
Step 1 — Your First CI Pipeline
Create a file at .github/workflows/ci.yml. This is the most common workflow — run tests and build on every push:
name: CI Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    name: Run Tests
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm install
      - name: Run linter
        run: npm run lint
      - name: Run tests
        run: npm test

  build:
    name: Build Project
    runs-on: ubuntu-latest
    needs: test
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm install
      - name: Build
        run: npm run build
      - name: Upload build artifacts
        uses: actions/upload-artifact@v4
        with:
          name: build-files
          path: dist/
Line-by-Line Breakdown
name — Display name in the GitHub Actions UI.
on — Triggers. This runs on pushes and pull requests to main.
runs-on — The operating system. ubuntu-latest is the most common choice.
uses: actions/checkout@v4 - A pre-built action that clones your repository. Any job that reads files from the repo needs this step before the steps that use them.
uses: actions/setup-node@v4 — Installs Node.js. The with: block passes parameters.
run: — Executes a shell command directly.
needs: test — The build job only runs after the test job passes.
actions/upload-artifact@v4 — Saves build output so downstream jobs (like deployment) can use it.
Step 2 — Add Caching for 3x Faster Builds
Without caching, every workflow run downloads all dependencies from scratch. For a Node.js project, that can take 30–60 seconds per run. The step below caches npm's download directory (~/.npm) between runs, so on a cache hit npm install reads packages from disk instead of fetching them over the network.
- name: Cache node modules
  uses: actions/cache@v4
  id: cache-npm
  with:
    path: ~/.npm
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-npm-
- name: Cache status
  run: |
    if [[ "${{ steps.cache-npm.outputs.cache-hit }}" == "true" ]]; then
      echo "Cache HIT - using cached dependencies"
    else
      echo "Cache MISS - will cache after install"
    fi
- name: Install dependencies
  run: npm install
How the Cache Key Works
The key includes a hash of package-lock.json. When dependencies change, the hash changes, the exact key no longer matches, and a new cache is saved at the end of the job. The restore-keys fallback then restores the most recent cache whose key starts with the same OS-specific npm prefix, so most packages are still served from the cache even after a dependency change.
Result: The first run has a cache miss (normal). The second run gets a cache hit, and npm install finishes noticeably faster because packages come from the restored cache.
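The key and fallback logic can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: it hashes a single lockfile's bytes with SHA-256 rather than reproducing GitHub's hashFiles exactly, and lookup() with its oldest-to-newest list of saved keys is a hypothetical stand-in for the cache service.

```python
import hashlib


def cache_key(runner_os, lockfile_bytes):
    """Build a key shaped like '<os>-npm-<hash of lockfile>'."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    return f"{runner_os}-npm-{digest}"


def lookup(key, restore_prefixes, saved_keys):
    """Return (matched_key, exact_hit); saved_keys is ordered oldest to newest."""
    if key in saved_keys:
        return key, True                      # exact match: a true cache hit
    for prefix in restore_prefixes:
        matches = [k for k in saved_keys if k.startswith(prefix)]
        if matches:
            return matches[-1], False         # newest partial match, not a "hit"
    return None, False                        # nothing to restore


old_key = cache_key("Linux", b'{"lockfileVersion": 3}')
new_key = cache_key("Linux", b'{"lockfileVersion": 3, "changed": true}')
print(lookup(new_key, ["Linux-npm-"], [old_key]))
```

The design point is that a prefix fallback restores something useful without counting as an exact hit, so a fresh cache is still saved under the new key.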
Step 3 — Matrix Builds (Multi-Version Testing)
Does your code work on Node 18? Node 20? Node 22? Windows? Linux? Instead of writing separate workflows, use a matrix strategy to test all combinations in parallel:
jobs:
  test-matrix:
    name: Node ${{ matrix.node-version }} on ${{ matrix.os }}
    strategy:
      matrix:
        node-version: [18, 20, 22]
        os: [ubuntu-latest, windows-latest]
      fail-fast: false
    runs-on: ${{ matrix.os }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Setup Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - name: Install dependencies
        run: npm install
      - name: Run tests
        run: npm test
This configuration creates 6 parallel jobs (3 Node versions × 2 operating systems). The fail-fast: false setting ensures all jobs complete even if one fails — useful for seeing the full compatibility picture.
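The job count is simply the Cartesian product of the matrix axes. A short Python sketch shows the expansion, assuming the three Node versions and two operating systems mentioned above; the runner labels ubuntu-latest and windows-latest are assumptions for illustration.

```python
from itertools import product

# Assumed matrix: three Node versions on two operating systems.
matrix = {
    "node-version": [18, 20, 22],
    "os": ["ubuntu-latest", "windows-latest"],
}


def expand(matrix):
    """Return one dict per job: every combination of the matrix axes."""
    keys = list(matrix)
    return [dict(zip(keys, values)) for values in product(*(matrix[k] for k in keys))]


combinations = expand(matrix)
print(len(combinations), "jobs")
for combo in combinations:
    print(f"Node {combo['node-version']} on {combo['os']}")
```

Adding a third axis multiplies the count again, which is worth keeping in mind when every job consumes runner minutes.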
Step 4 — Deployment Pipeline with Staging and Production
A real deployment pipeline has stages: Test → Build → Staging → Production. Each stage depends on the previous one, and production can require manual approval.
name: Deploy

on:
  push:
    branches: [main]

concurrency:
  group: deployment
  cancel-in-progress: false

jobs:
  test:
    name: Run Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm install
      - run: npm test

  build:
    name: Build
    runs-on: ubuntu-latest
    needs: test
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm install
      - run: npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: production-build
          path: dist/

  deploy-staging:
    name: Deploy to Staging
    runs-on: ubuntu-latest
    needs: build
    environment: staging
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: production-build
          path: dist/
      - run: echo "Deploying to staging..."

  deploy-production:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: deploy-staging
    environment: production
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: production-build
          path: dist/
      - run: echo "Deploying to production..."
Key Features
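concurrency - Prevents multiple deployments from running simultaneously. With cancel-in-progress: false, a run triggered while a deployment is in progress waits for it to finish instead of interrupting it. Only one run can wait per group, so a newer push replaces an older run that is still pending.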
needs: — Enforces the stage order. Staging only runs after build passes. Production only runs after staging.
environment: staging / environment: production — GitHub Environments let you require manual approval. Go to Settings → Environments → production → Required reviewers to add approval gates.
upload-artifact / download-artifact — Pass build output between jobs since each job runs on a separate machine.
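The gating behavior described by these features can be sketched as a tiny pipeline runner in Python. This is an illustrative model only: run_pipeline, the stage tuples, and the approved set are hypothetical names, and real approvals happen in the GitHub UI rather than in code.

```python
def run_pipeline(stages, approved):
    """Run stages in order; stop on the first failure or missing approval.

    stages:   list of (name, action, needs_approval) tuples, where action
              is a zero-argument callable returning True on success.
    approved: set of stage names a reviewer has signed off on.
    """
    log = []
    for name, action, needs_approval in stages:
        if needs_approval and name not in approved:
            log.append((name, "waiting for approval"))
            break                      # later stages never start
        succeeded = action()
        log.append((name, "success" if succeeded else "failed"))
        if not succeeded:
            break                      # mirrors `needs:` blocking downstream jobs
    return log


stages = [
    ("test", lambda: True, False),
    ("build", lambda: True, False),
    ("deploy-staging", lambda: True, False),
    ("deploy-production", lambda: True, True),   # gated stage
]

for name, status in run_pipeline(stages, approved=set()):
    print(f"{name}: {status}")
```

Calling run_pipeline(stages, approved={"deploy-production"}) lets the final stage run, which corresponds to a reviewer approving the production environment.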
Step 5 — Secrets and Environment Variables
Never hardcode sensitive data in your workflows. Use GitHub Secrets for API keys, database URLs, and deployment credentials.
Adding Secrets
Go to your repository Settings → Secrets and variables → Actions.
Click New repository secret.
Add the name (e.g., API_KEY) and value.
Reference it in workflows as ${{ secrets.API_KEY }}.
Using Secrets and Variables in Workflows
name: Secrets Demo

on:
  workflow_dispatch:

env:
  APP_NAME: github-actions-demo
  NODE_ENV: production

jobs:
  demo:
    runs-on: ubuntu-latest
    env:
      LOG_LEVEL: info
    steps:
      - uses: actions/checkout@v4
      - name: Show GitHub context
        run: |
          echo "Repository: ${{ github.repository }}"
          echo "Branch: ${{ github.ref_name }}"
          echo "Commit: ${{ github.sha }}"
          echo "Actor: ${{ github.actor }}"
      - name: Use secrets safely
        env:
          API_KEY: ${{ secrets.API_KEY }}
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
        run: |
          echo "API Key is set: $([ -n "$API_KEY" ] && echo 'Yes' || echo 'No')"
      - name: Generate output
        id: generate
        run: echo "random_id=$(date +%s)" >> "$GITHUB_OUTPUT"
      - name: Use generated value
        run: echo "Generated ID: ${{ steps.generate.outputs.random_id }}"
Secrets are automatically masked in logs — you will see *** instead of the actual value. The GITHUB_OUTPUT environment variable lets you pass data between steps within the same job.
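When a step runs a script instead of inline shell, the same mechanism applies: the script appends name=value lines to the file whose path is in GITHUB_OUTPUT. A minimal Python sketch follows; set_output and read_outputs are hypothetical helper names, and read_outputs is only a simplified model of how the values are read back (it ignores multi-line values).

```python
import os


def set_output(name, value, path=None):
    """Append one name=value line to the step output file."""
    path = path or os.environ["GITHUB_OUTPUT"]   # set by the runner inside a job
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{name}={value}\n")


def read_outputs(path):
    """Simplified reader: parse single-line name=value pairs from the file."""
    outputs = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            name, _, value = line.rstrip("\n").partition("=")
            if name:
                outputs[name] = value
    return outputs
```

Inside a workflow, a step with an id that calls set_output("random_id", "123") would expose the value to later steps in the same job through that step's outputs, just like the echo version above.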
Step 6 — Scheduled Jobs (Cron Workflows)
Run workflows on a schedule for health checks, dependency updates, or weekly reports:
name: Scheduled Tasks

on:
  schedule:
    - cron: '0 9 * * 1' # Every Monday at 9 AM UTC
  workflow_dispatch:
    inputs:
      task:
        description: 'Which task to run'
        type: choice
        options:
          - health-check
          - dependency-update
          - report

jobs:
  health-check:
    if: github.event.inputs.task == 'health-check' || github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - run: echo "Running health checks..."

  dependency-check:
    if: github.event.inputs.task == 'dependency-update' || github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm outdated || true

  weekly-report:
    if: github.event.inputs.task == 'report' || github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - run: |
          echo "=== Weekly Report ==="
          git log --oneline --since="7 days ago" | head -20
          git shortlog -sn --since="7 days ago"
Cron Syntax Reference
| Expression | Meaning |
| --- | --- |
| 0 0 * * * | Every day at midnight UTC |
| 0 */6 * * * | Every 6 hours |
| 0 9 * * 1-5 | Weekdays at 9 AM UTC |
| 0 9 * * 1 | Every Monday at 9 AM UTC |
Use crontab.guru to generate and verify cron expressions.
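To see how the five fields (minute, hour, day of month, month, day of week) are read, here is a small Python matcher. It is a deliberately limited sketch: it supports only *, */n, a-b, a-b/n, single numbers, and comma lists, treats 0 as Sunday, and requires every field to match, which differs from full cron when both day fields are restricted. The function names are hypothetical.

```python
from datetime import datetime


def _field_matches(expr, value, minimum, maximum):
    """Check one cron field against a value."""
    for part in expr.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = minimum, maximum
        elif "-" in part:
            start_text, end_text = part.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expression, moment):
    """Return True if the datetime (interpreted as UTC) matches the expression."""
    minute, hour, day, month, weekday = expression.split()
    cron_weekday = (moment.weekday() + 1) % 7   # cron: 0 = Sunday; Python: 0 = Monday
    return (
        _field_matches(minute, moment.minute, 0, 59)
        and _field_matches(hour, moment.hour, 0, 23)
        and _field_matches(day, moment.day, 1, 31)
        and _field_matches(month, moment.month, 1, 12)
        and _field_matches(weekday, cron_weekday, 0, 6)
    )


monday_9am = datetime(2024, 1, 1, 9, 0)          # 1 January 2024 was a Monday
print(cron_matches("0 9 * * 1", monday_9am))
```

A matcher like this is handy for unit-testing a schedule before committing it; for anything beyond the subset above, a dedicated cron library is the safer choice.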
The Demo Project — Calculator API
The companion repository uses a simple Node.js Calculator API with endpoints for add, subtract, multiply, and divide. It includes unit tests using Node’s built-in test runner:
// src/index.js (simplified)
function add(a, b) { return a + b; }
function subtract(a, b) { return a - b; }
function multiply(a, b) { return a * b; }
function divide(a, b) {
  if (b === 0) throw new Error('Cannot divide by zero');
  return a / b;
}
module.exports = { add, subtract, multiply, divide };

// src/index.test.js (simplified)
const { test, describe } = require('node:test');
const assert = require('node:assert');
const { add, divide } = require('./index.js');

describe('Calculator', () => {
  test('adds two numbers', () => {
    assert.strictEqual(add(2, 3), 5);
  });
  test('throws on division by zero', () => {
    assert.throws(() => divide(10, 0), { message: 'Cannot divide by zero' });
  });
});
Run it locally: npm install && npm test.
Try It Yourself
Fork or clone the repository: git clone https://github.com/shazforiot/github-actions-demo
Push to your own GitHub account.
Go to the Actions tab in your repository and watch the workflows run.
Make a change — edit a file, commit, and push. See the CI pipeline trigger automatically.
Create a pull request — see the PR checks run.
Set up a secret in Settings → Secrets and trigger the secrets-demo workflow manually.
Common Issues and Fixes
Workflow not running?
Check that the on: triggers match your action (pushing to main? creating a PR?).
Ensure the .yml file is in .github/workflows/.
Check for YAML indentation errors — spaces only, no tabs.
Tests failing in CI but passing locally?
Add - run: npm test -- --verbose for more detailed output.
Check the logs in the Actions tab for the exact error.
Ensure your Node version matches between local and CI.
Cache not working?
Verify the cache key matches — it depends on package-lock.json existing and being committed.
Check the “Cache status” step output for HIT or MISS.
Pro Tips
Use workflow_dispatch to manually trigger workflows for testing without pushing code.
Add status badges to your README: ![CI](https://github.com/user/repo/actions/workflows/ci.yml/badge.svg)
Use environments for deployment approvals — staging can be automatic, production requires sign-off.
Set up branch protection to require passing CI checks before merging PRs.
Monitor your usage in Settings → Billing → Actions to stay within free limits.
Resources
Full source code and all workflows: github.com/shazforiot/github-actions-demo
GitHub Actions Documentation: docs.github.com/en/actions
Workflow Syntax Reference: Workflow Syntax
Actions Marketplace: github.com/marketplace?type=actions
Cron Expression Generator: crontab.guru
Frequently Asked Questions
Is GitHub Actions free?
Yes, within limits. On the GitHub Free plan, private repositories get 2,000 free minutes per month, and public repositories can use standard GitHub-hosted runners at no charge. For most small projects and learning, the free tier is more than enough. Paid plans offer higher minute allowances.
What is the difference between a workflow, a job, and a step?
A workflow is the entire automation defined in a .yml file inside .github/workflows/. A job is a group of steps that run on the same virtual machine. A step is an individual command or action within a job. The hierarchy is: Workflow → Job → Step.
How do I speed up my GitHub Actions builds?
The most effective method is dependency caching. Use actions/cache@v4 to store npm's download cache, pip packages, or other dependencies between runs. A cache hit replaces most network downloads with a local restore, which is often the largest single saving in a build. Other strategies include using smaller base images, running jobs in parallel instead of sequentially, and only running workflows on relevant file changes.
Can I require manual approval before deploying to production?
Yes. Use GitHub Environments with required reviewers. In your deploy.yml, set environment: production on the production job. Then go to repo Settings → Environments → production → Required reviewers and add the team members who must approve before the deployment proceeds.
How do I use secrets in GitHub Actions?
Go to your repository Settings → Secrets and variables → Actions → New repository secret. Add the name (e.g., API_KEY) and value. In your workflow, reference it as ${{ secrets.API_KEY }}. Secrets are automatically masked in logs so they never appear in plain text.
Video Chapters — Quick Navigation
0:00 — Introduction
0:30 — What is GitHub Actions?
2:00 — Creating your first CI pipeline
6:40 — Adding caching for faster builds
9:40 — Matrix builds (multi-version testing)
12:15 — Deployment pipelines
16:00 — Try it yourself
19:45 — Recap [...]
May 12, 2026
Is your AI model drifting silently in production? Are latency spikes going unnoticed? Many ML engineers deploy models with zero observability and then wonder why accuracy degrades or users complain about slow predictions. This tutorial shows you how to build a complete real-time monitoring stack using InfluxDB, Telegraf, and Grafana (the TIG stack), going from zero to live dashboards in under 30 minutes.
No paid tools, no Kubernetes required — just Docker, Python, and three open-source projects that thousands of teams use in production.
Why AI Model Monitoring Matters
Deploying a model is only the beginning. Without monitoring, you are flying blind. Here are the four problems that hit every production AI system:
Model drift — Your model was 95% accurate on launch day. Three months later, real-world data has shifted and accuracy dropped to 78%. Nobody noticed because there are no metrics.
Latency spikes — Inference normally takes 20ms, but under load it spikes to 500ms. Users feel it. You do not find out until someone files a support ticket.
Resource waste — Memory leaks and CPU overuse are invisible without tooling. You are burning money on infrastructure you cannot optimize.
Silent failures — The worst one. Models can return wrong predictions without raising any errors. No exception, no crash log — just quietly incorrect results going to real users.
Production AI needs observability. Just like you would never run a web server without logs and metrics, you should not run an AI model without monitoring.
Meet the TIG Stack
Three tools, each with one job, working together:
| Tool | Role | Port |
| --- | --- | --- |
| InfluxDB | Time-series database: stores metrics with nanosecond timestamps | 8086 |
| Telegraf | Data collection agent: receives metrics from your app and forwards to InfluxDB | 8080 |
| Grafana | Visualization and alerting: queries InfluxDB and renders live dashboards | 3000 |
Together they form the TIG Stack — battle-tested, used by thousands of teams worldwide, and completely free.
Architecture — How Data Flows
The data pipeline is simple and linear:
Your AI model (Python) generates metrics — latency, accuracy, batch size, memory usage.
Telegraf listens on an HTTP endpoint, collects the metrics, and forwards them to InfluxDB on a regular flush interval.
InfluxDB stores every data point with a nanosecond-precision timestamp.
Grafana queries InfluxDB using the Flux query language, renders live dashboards, and evaluates alert conditions continuously.
Prerequisites
Docker Desktop installed and running — everything runs in containers for a clean, reproducible setup.
Python 3.8+ — for the model instrumentation code.
A terminal — PowerShell, Bash, or CMD.
An AI model — scikit-learn, TensorFlow, PyTorch, or any model with a predict() method. If you do not have one, the included monitor.py has a demo model for testing.
Step 1 — Install InfluxDB
Run InfluxDB 2.7 in a Docker container with persistent data volumes:
docker run -d -p 8086:8086 \
--name influxdb \
-v influxdb-data:/var/lib/influxdb2 \
-v influxdb-config:/etc/influxdb2 \
influxdb:2.7
Verify it is running:
docker ps
Open http://localhost:8086 in your browser. You should see the InfluxDB welcome screen.
Step 2 — Configure InfluxDB
Click Get Started.
Create admin credentials — save these.
Set Organization to myorg.
Set Bucket to ai_metrics.
Click Continue, then Configure Later.
Go to Data → API Tokens → Generate Token → All Access Token.
Copy and save your API token immediately — you will not see it again.
Step 3 — Set Up the Docker Network
Containers need to communicate by name. Create a shared Docker network and connect InfluxDB to it:
mkdir ai-monitoring && cd ai-monitoring
docker network create monitoring
docker network connect monitoring influxdb
Now Telegraf can reach InfluxDB at http://influxdb:8086 instead of needing an IP address.
Step 4 — Configure and Run Telegraf
Create a telegraf.conf file in your project folder with this minimal configuration:
[agent]
  interval = "10s"
  flush_interval = "10s"

[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "YOUR_API_TOKEN_HERE"
  organization = "myorg"
  bucket = "ai_metrics"

[[inputs.http_listener_v2]]
  service_address = ":8080"
  paths = ["/metrics"]
  data_format = "influx"

Replace YOUR_API_TOKEN_HERE with the token from Step 2.
What this config does:
[agent] - Collects every 10 seconds and flushes to InfluxDB every 10 seconds.
[[outputs.influxdb_v2]] - Sends data to InfluxDB using the container network name.
[[inputs.http_listener_v2]] - Listens on port 8080 at /metrics for incoming data in InfluxDB line protocol format.
Start the Telegraf container:
# Linux/macOS
docker run -d --name telegraf --network monitoring \
-v $(pwd)/telegraf.conf:/etc/telegraf/telegraf.conf:ro \
telegraf
# Windows PowerShell
docker run -d --name telegraf --network monitoring `
-v ${PWD}/telegraf.conf:/etc/telegraf/telegraf.conf:ro `
telegraf
Verify with docker ps — you should see both influxdb and telegraf running.
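The listener on port 8080 at /metrics expects InfluxDB line protocol, so it helps to see what one line looks like. The Python sketch below formats a line and optionally posts it. Assumptions: to_line and send are hypothetical helper names, the escaping covers only the common cases, and send() presumes the port is reachable from where the script runs (for example, a container started with an added -p 8080:8080 mapping, which the docker run command above does not include).

```python
import urllib.request


def _escape(text, chars):
    """Backslash-escape the given characters, as line protocol requires."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_value(value):
    """Render a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):              # check bool first: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Build 'measurement,tag=value field=value [timestamp]'."""
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    body = ",".join(f"{_escape(k, ',= ')}={_format_value(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


def send(line, url="http://localhost:8080/metrics"):
    """POST one line to the Telegraf HTTP listener; returns the HTTP status."""
    request = urllib.request.Request(url, data=line.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


print(to_line("ai_model_metrics", {"model": "my_model_v1"},
              {"latency_ms": 12.5, "batch_size": 32}))
```

Omitting the timestamp lets the receiving side assign one, which is usually fine for live metrics.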
Step 5 — Instrument Your AI Model with Python
This is where the magic happens. Install the InfluxDB Python client:
pip install influxdb-client psutil
Here is the full monitoring wrapper from the companion repository. It works with any model that has a predict() method:
import os
import time

import psutil
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Connection settings: use the token, organization, and bucket from Step 2.
INFLUXDB_URL = "http://localhost:8086"
INFLUXDB_TOKEN = "YOUR_API_TOKEN"
INFLUXDB_ORG = "myorg"
INFLUXDB_BUCKET = "ai_metrics"
MODEL_NAME = "my_model_v1"

client = InfluxDBClient(url=INFLUXDB_URL, token=INFLUXDB_TOKEN, org=INFLUXDB_ORG)
write_api = client.write_api(write_options=SYNCHRONOUS)


def predict(model, input_data, ground_truth=None):
    """Call model.predict() and write latency, memory, error, and accuracy metrics."""
    start = time.time()
    error_occurred = False
    try:
        result = model.predict(input_data)
    except Exception as e:
        # Record the failure as a metric instead of crashing the caller.
        error_occurred = True
        result = None
        print(f" Prediction error: {e}")

    latency_ms = (time.time() - start) * 1000
    memory_mb = psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024
    batch_size = len(input_data) if hasattr(input_data, "__len__") else 1

    point = (
        Point("ai_model_metrics")
        .tag("model", MODEL_NAME)
        .field("latency_ms", latency_ms)
        .field("batch_size", batch_size)
        .field("memory_mb", memory_mb)
        .field("error_rate", 1.0 if error_occurred else 0.0)
    )

    # Accuracy is only recorded when labels are available and non-empty.
    if ground_truth is not None and result is not None and len(ground_truth) > 0:
        correct = sum(p == t for p, t in zip(result, ground_truth))
        accuracy = float(correct) / len(ground_truth)
        point = point.field("accuracy", accuracy)

    write_api.write(bucket=INFLUXDB_BUCKET, record=point)
    return result
How to Use It
Replace your existing model.predict(data) call with the wrapped version:
from monitor import predict
import joblib
model = joblib.load("my_model.pkl")
result = predict(model, X, ground_truth=y_true)
That is it. Your existing code barely changes, and every prediction now emits real-time metrics to InfluxDB.
Metrics Tracked Automatically
| Metric | Field Name | What It Measures |
| --- | --- | --- |
| Inference latency | latency_ms | Time per prediction in milliseconds |
| Model accuracy | accuracy | Correct predictions / total (requires ground truth) |
| Batch size | batch_size | Number of inputs per call |
| Memory usage | memory_mb | Process RSS memory; detects leaks |
| Error rate | error_rate | 1.0 on failure, 0.0 on success |
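To check how these fields are derived without a running InfluxDB, the measurement logic can be separated from the write. The sketch below is a standalone variant under stated assumptions: collect_metrics and EchoModel are hypothetical names, memory_mb is left out so that only the standard library is needed, and time.perf_counter is swapped in for timing.

```python
import time


class EchoModel:
    """Hypothetical stand-in model: 'predicts' by returning its inputs."""

    def predict(self, inputs):
        return list(inputs)


def collect_metrics(model, inputs, ground_truth=None):
    """Run one prediction and return (result, fields) without writing anywhere."""
    start = time.perf_counter()
    failed = False
    try:
        result = model.predict(inputs)
    except Exception:
        failed = True
        result = None

    fields = {
        "latency_ms": (time.perf_counter() - start) * 1000,
        "batch_size": len(inputs) if hasattr(inputs, "__len__") else 1,
        "error_rate": 1.0 if failed else 0.0,
    }
    # Accuracy needs labels and a successful prediction.
    if ground_truth is not None and result is not None and len(ground_truth) > 0:
        correct = sum(p == t for p, t in zip(result, ground_truth))
        fields["accuracy"] = correct / len(ground_truth)
    return result, fields


_, fields = collect_metrics(EchoModel(), [1, 0, 1, 1], ground_truth=[1, 0, 0, 1])
print(fields["accuracy"], fields["batch_size"], fields["error_rate"])
```

Keeping measurement separate from transport makes the numbers easy to unit-test and leaves the choice of where to send them open.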
Step 6 — Install Grafana
docker run -d -p 3000:3000 \
--name grafana --network monitoring \
-v grafana-data:/var/lib/grafana \
grafana/grafana:latest
Open http://localhost:3000. Default credentials: admin / admin. You will be prompted to change the password.
Step 7 — Connect Grafana to InfluxDB
Go to Connections → Data Sources → Add data source.
Select InfluxDB.
Set URL to http://influxdb:8086 (use the container name, not localhost).
Set Query Language to Flux.
Enter Organization: myorg.
Paste your API Token.
Set Default Bucket: ai_metrics.
Click Save & Test — you should see a green success banner.
Step 8 — Build Your Dashboard
Create a new dashboard and add panels with Flux queries. Here are the queries for the key metrics:
Model Accuracy Over Time
from(bucket: "ai_metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "ai_model_metrics")
  |> filter(fn: (r) => r._field == "accuracy")
  |> aggregateWindow(every: 1m, fn: mean)
Inference Latency
from(bucket: "ai_metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "ai_model_metrics")
  |> filter(fn: (r) => r._field == "latency_ms")
  |> aggregateWindow(every: 1m, fn: mean)
Memory Usage
from(bucket: "ai_metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "ai_model_metrics")
  |> filter(fn: (r) => r._field == "memory_mb")
  |> aggregateWindow(every: 1m, fn: mean)
Add more panels for batch_size and error_rate by changing the field filter. Arrange them in a grid, resize, and rename. Save the dashboard as AI Model Monitoring.
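Since the panel queries differ only in the field name, a small helper can generate them for pasting into Grafana. This is a convenience sketch, and flux_query is a hypothetical function name; the defaults mirror the bucket, measurement, time range, and window used in the queries above.

```python
def flux_query(field, bucket="ai_metrics", measurement="ai_model_metrics",
               start="-1h", every="1m", fn="mean"):
    """Return the Flux query text for one dashboard panel."""
    return "\n".join([
        f'from(bucket: "{bucket}")',
        f"  |> range(start: {start})",
        f'  |> filter(fn: (r) => r._measurement == "{measurement}")',
        f'  |> filter(fn: (r) => r._field == "{field}")',
        f"  |> aggregateWindow(every: {every}, fn: {fn})",
    ])


for field in ("batch_size", "error_rate"):
    print(flux_query(field))
    print()
```

For error_rate, the windowed mean of the 0.0 and 1.0 values reads directly as the fraction of failed calls in each minute.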
Step 9 — Configure Alerts
Catch model degradation before it impacts users:
Click Alerting → Alert rules → New alert rule.
Set the condition: accuracy IS BELOW 0.80.
Set evaluation: every 1 minute for 5 minutes — the condition must persist for 5 consecutive minutes before firing, which prevents false alarms.
Add a notification channel: Email, Slack, PagerDuty, Discord, or Webhooks.
Click Save rule.
Once accuracy stays below 80% for 5 minutes, Grafana sends a notification automatically, so you hear about degradation from the alert rather than from your users.
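The reason the 5-minute window prevents false alarms is easier to see in code. The following Python sketch is a simplified model of a "must persist" rule, not Grafana's actual implementation: alert_states is a hypothetical function, and it assumes one evaluation per minute and fires after five consecutive breaching evaluations.

```python
def alert_states(values, threshold=0.80, required=5):
    """Return 'normal', 'pending', or 'firing' for each evaluation in order."""
    states = []
    consecutive = 0
    for value in values:
        if value < threshold:
            consecutive += 1          # breach continues
        else:
            consecutive = 0           # any healthy reading resets the counter
        if consecutive == 0:
            states.append("normal")
        elif consecutive < required:
            states.append("pending")  # breaching, but not for long enough yet
        else:
            states.append("firing")
    return states


# A one-minute dip never fires; a sustained drop does.
print(alert_states([0.90, 0.70, 0.90]))
print(alert_states([0.90, 0.78, 0.79, 0.77, 0.76, 0.75, 0.85]))
```

The pending state is what absorbs short dips: a single noisy batch resets before it can page anyone.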
Docker Commands Cheat Sheet
| Command | What It Does |
| --- | --- |
| docker start influxdb telegraf grafana | Start all containers |
| docker stop influxdb telegraf grafana | Stop all containers |
| docker logs telegraf | View Telegraf logs |
| docker logs influxdb | View InfluxDB logs |
| docker network inspect monitoring | Verify containers are on the same network |
| docker volume rm influxdb-data influxdb-config grafana-data | Full cleanup (removes all stored data) |
Troubleshooting Common Issues
“Connection refused” in Grafana or Telegraf
Check containers are running: docker ps
Verify the network: docker network inspect monitoring
Use container names (http://influxdb:8086), not localhost, when connecting from within the Docker network
No data appearing in Grafana
Run your Python script: python monitor.py
Check InfluxDB Data Explorer at http://localhost:8086 — verify data is arriving
Check Telegraf logs: docker logs telegraf
Ensure your Flux query uses the correct bucket name and measurement name
TOML syntax error in telegraf.conf
Ensure the file is UTF-8 encoded without BOM
Make sure the API token is on a single line with no extra quotes
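Check that every section header keeps its brackets, single for tables such as [agent] and double for plugin sections such as [[outputs.influxdb_v2]]; indentation itself is not significant in TOML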
Resources
Full source code and config files: github.com/shazforiot/ai-monitoring-with-TIG-stack
InfluxDB Documentation: docs.influxdata.com/influxdb/v2
Telegraf Plugins: docs.influxdata.com/telegraf/latest/plugins
Grafana Dashboards: grafana.com/grafana/dashboards
Flux Query Language: docs.influxdata.com/flux/latest
Frequently Asked Questions
What is the TIG stack and why use it for AI monitoring?
The TIG stack stands for Telegraf + InfluxDB + Grafana. Telegraf collects metrics from your AI models, InfluxDB stores them as time-series data with nanosecond precision, and Grafana visualizes the data in real-time dashboards with alerting. It is fully open-source, runs in Docker, and is battle-tested by thousands of engineering teams for production monitoring.
Can I monitor any AI model with this setup?
Yes. The Python instrumentation wrapper (monitor.py) works with any model that has a predict() method — scikit-learn, TensorFlow, PyTorch, XGBoost, HuggingFace transformers, and more. You simply replace model.predict(data) with predict(model, data) and metrics are automatically sent to InfluxDB.
What metrics should I track for AI model monitoring?
The five most critical metrics are: latency_ms (inference time per prediction), accuracy (model performance if ground truth is available), memory_mb (process memory to detect leaks), batch_size (throughput volume), and error_rate (failed prediction rate). The included monitor.py tracks all five out of the box.
Do I need Kubernetes to run this monitoring stack?
No. The entire stack runs in Docker containers on your local machine or any single server. You do not need Kubernetes, cloud services, or paid tools. Just Docker Desktop and Python 3.8+. The setup works identically on Windows, macOS, and Linux.
How do Grafana alerts detect model drift?
You configure an alert rule in Grafana that evaluates a condition like “accuracy IS BELOW 0.80 for 5 consecutive minutes.” When your model’s accuracy degrades below the threshold — a sign of data drift or concept drift — Grafana fires the alert and sends a notification to Slack, email, PagerDuty, or any configured channel. This catches silent degradation before it impacts users.
Video Chapters — Quick Navigation
00:00 — Intro & Demo
01:24 — Why AI Model Monitoring Matters
04:38 — Architecture Overview (TIG Stack)
06:20 — Prerequisites
08:09 — Install & Configure InfluxDB
13:05 — Docker Network & Telegraf Setup
14:54 — Configure & Run Telegraf Container
17:11 — Instrument Your AI Model (Python)
19:57 — Install & Connect Grafana
21:35 — Connect Grafana to InfluxDB
23:30 — Build the Dashboard & View Metrics
27:06 — Configure Alerts
28:20 — Final Demo & Recap [...]
August 30, 2020
#Raspbian, #Virtualbox, #IoT
In this video we show how to install Raspbian on VirtualBox. Go to https://www.raspberrypi.org/downloads, where several versions of the operating system are listed, and download Raspberry Pi Desktop. We then create a new VirtualBox VM and follow the steps shown in the video to set up the Raspbian desktop. [...]
August 30, 2020
#jenkins, #raspberrypi, #DevOps
This video covers Jenkins, a continuous integration and continuous delivery tool that is widely used in DevOps. We set up Jenkins on a Raspberry Pi. [...]
