How to Build a Server Resource Monitoring System
Monitor CPU, memory, and disk usage with automated alerts.
Jay Banlasan
The AI Systems Guy
Server resource monitoring for CPU, memory, and disk is the baseline for any production system. I run a lightweight Python script on every VPS I manage. It checks resources every minute, logs to SQLite, and pings me on Slack when something crosses a threshold.
No Datadog subscription needed. Just Python and cron.
What You Need Before Starting
- A Linux VPS or server
- Python 3.8+ with psutil installed
- A Slack webhook URL for alerts
- SSH access to the server
Step 1: Install psutil
```bash
pip install psutil
```
Step 2: Build the Resource Collector
```python
import psutil
import sqlite3
from datetime import datetime

def get_server_stats():
    # interval=1 blocks for one second to take an accurate CPU sample
    cpu_percent = psutil.cpu_percent(interval=1)
    memory = psutil.virtual_memory()
    disk = psutil.disk_usage("/")
    return {
        "cpu_percent": cpu_percent,
        "memory_percent": memory.percent,
        "memory_used_gb": round(memory.used / (1024**3), 2),
        "memory_total_gb": round(memory.total / (1024**3), 2),
        "disk_percent": disk.percent,
        "disk_used_gb": round(disk.used / (1024**3), 2),
        "disk_total_gb": round(disk.total / (1024**3), 2),
        "timestamp": datetime.utcnow().isoformat()
    }
```
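The collector above only watches the root filesystem. If your server has separate mounts (a data volume, for example), a per-mount variant is a small extension. A sketch, where `get_disk_stats_per_mount` is my own helper name, not part of psutil:

```python
import psutil

def get_disk_stats_per_mount():
    """Collect usage for every real mounted filesystem, not just '/'."""
    stats = {}
    for part in psutil.disk_partitions(all=False):
        try:
            usage = psutil.disk_usage(part.mountpoint)
        except (PermissionError, OSError):
            # Skip mounts the monitoring user can't stat
            continue
        stats[part.mountpoint] = {
            "percent": usage.percent,
            "used_gb": round(usage.used / (1024**3), 2),
            "total_gb": round(usage.total / (1024**3), 2),
        }
    return stats
```

You can log these the same way, one row per mountpoint, if a single root-disk number isn't enough for your setup.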
Step 3: Store Metrics in SQLite
```python
def init_resource_db(db_path="server_metrics.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS resource_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            cpu_percent REAL,
            memory_percent REAL,
            memory_used_gb REAL,
            disk_percent REAL,
            disk_used_gb REAL,
            timestamp TEXT
        )
    """)
    conn.commit()
    conn.close()

def log_stats(stats, db_path="server_metrics.db"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "INSERT INTO resource_snapshots "
        "(cpu_percent, memory_percent, memory_used_gb, disk_percent, disk_used_gb, timestamp) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (stats["cpu_percent"], stats["memory_percent"], stats["memory_used_gb"],
         stats["disk_percent"], stats["disk_used_gb"], stats["timestamp"])
    )
    conn.commit()
    conn.close()
```
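Once snapshots accumulate, the same database answers questions like "what was the average load over the last hour?" A minimal query helper, assuming the schema above; `recent_averages` is my own name, and the `strftime` call builds the cutoff in the same ISO-8601 `T`-separated format the collector stores, so the string comparison lines up:

```python
import sqlite3

def recent_averages(hours=1, db_path="server_metrics.db"):
    """Average CPU/memory/disk percent over the last N hours."""
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        """
        SELECT AVG(cpu_percent), AVG(memory_percent), AVG(disk_percent)
        FROM resource_snapshots
        -- cutoff rendered with a 'T' separator to match isoformat() rows
        WHERE timestamp >= strftime('%Y-%m-%dT%H:%M:%S', 'now', ?)
        """,
        (f"-{hours} hours",),
    ).fetchone()
    conn.close()
    return {"cpu": row[0], "memory": row[1], "disk": row[2]}
```

Handy for a quick sanity check over SSH, or as the input to the trend detection mentioned at the end.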
Step 4: Set Up Threshold Alerts
```python
import requests

THRESHOLDS = {
    "cpu_percent": 90,
    "memory_percent": 85,
    "disk_percent": 80
}

def check_thresholds(stats):
    alerts = []
    if stats["cpu_percent"] > THRESHOLDS["cpu_percent"]:
        alerts.append(f"CPU at {stats['cpu_percent']}%")
    if stats["memory_percent"] > THRESHOLDS["memory_percent"]:
        alerts.append(f"Memory at {stats['memory_percent']}% ({stats['memory_used_gb']}GB)")
    if stats["disk_percent"] > THRESHOLDS["disk_percent"]:
        alerts.append(f"Disk at {stats['disk_percent']}% ({stats['disk_used_gb']}GB)")
    if alerts:
        message = "Server Resource Alert:\n" + "\n".join(alerts)
        # timeout keeps a slow Slack endpoint from hanging the cron run
        requests.post("YOUR_SLACK_WEBHOOK", json={"text": message}, timeout=10)
    return alerts
```
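One thing to watch: the script runs every minute, so a sustained spike means a Slack message every minute until it clears. A simple cooldown sketch using a JSON state file; the file path, cooldown value, and `should_send_alert` helper are my own additions, and you would call it before the `requests.post` line:

```python
import json
import os
import time

COOLDOWN_SECONDS = 1800  # suppress repeat alerts for 30 minutes
STATE_FILE = "/tmp/server_monitor_alert_state.json"  # assumed location

def should_send_alert(alert_key, state_file=STATE_FILE, cooldown=COOLDOWN_SECONDS):
    """Return True if this alert hasn't fired within the cooldown window."""
    state = {}
    if os.path.exists(state_file):
        try:
            with open(state_file) as f:
                state = json.load(f)
        except (ValueError, OSError):
            state = {}  # corrupt or unreadable state file: start fresh
    now = time.time()
    if now - state.get(alert_key, 0) < cooldown:
        return False
    state[alert_key] = now
    with open(state_file, "w") as f:
        json.dump(state, f)
    return True
```

Using one key per resource (`"cpu"`, `"memory"`, `"disk"`) means a new disk alert still gets through while a CPU alert is cooling down.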
Step 5: Run on a Schedule
Create the main script:
```python
if __name__ == "__main__":
    init_resource_db()
    stats = get_server_stats()
    log_stats(stats)
    check_thresholds(stats)
```
Add to crontab to run every minute:
```bash
* * * * * cd /root/monitoring && python3 server_monitor.py
```

The `cd` matters: cron jobs start in your home directory, so without it the relative `server_metrics.db` path would put the database wherever cron happens to run.
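If a run ever stalls (a slow Slack call, for example), overlapping cron invocations can pile up. One option is to wrap the job with `flock`, which skips a run while the previous one still holds the lock; the lock file path here is arbitrary:

```shell
* * * * * flock -n /tmp/server_monitor.lock python3 /root/monitoring/server_monitor.py
```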
Step 6: Add a Cleanup Job
Keep 30 days of data and purge the rest:
```python
def cleanup_old_data(days=30, db_path="server_metrics.db"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "DELETE FROM resource_snapshots WHERE timestamp < datetime('now', ?)",
        (f"-{days} days",)
    )
    conn.commit()
    conn.close()
```
Run cleanup daily:
```bash
0 3 * * * cd /root/monitoring && python3 -c "from server_monitor import cleanup_old_data; cleanup_old_data()"
```

Without the `cd`, the import fails: `server_monitor.py` is not on the module path from cron's default working directory.
What to Build Next
Add trend detection so you get warned when disk usage is growing 2%+ per day. That gives you time to clean up or expand before you hit 100%.
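The 2%-per-day check can be a least-squares slope over recent snapshots. A sketch, assuming you feed it `(timestamp_seconds, disk_percent)` pairs pulled from the `resource_snapshots` table; `disk_growth_per_day` and `disk_trend_alert` are illustrative names:

```python
def disk_growth_per_day(samples):
    """Estimate disk-usage growth in percentage points per day.

    `samples` is a list of (timestamp_seconds, disk_percent) tuples.
    Returns the ordinary least-squares slope, or 0.0 when there is
    too little data to fit a line.
    """
    if len(samples) < 2:
        return 0.0
    xs = [t / 86400 for t, _ in samples]  # seconds -> days, so slope is %/day
    ys = [p for _, p in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    denom = sum((x - mean_x) ** 2 for x in xs)
    if denom == 0:
        return 0.0  # all samples at the same instant
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / denom

def disk_trend_alert(samples, threshold=2.0):
    """Return an alert string when growth meets or exceeds threshold %/day."""
    growth = disk_growth_per_day(samples)
    if growth >= threshold:
        return f"Disk growing at {growth:.1f}%/day"
    return None
```

A day or two of minute-level snapshots is plenty of input; run it from the same daily cron that does cleanup.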
Related Reading
- Why Monitoring Is Not Optional - the monitoring foundation every system needs
- How to Think About System Performance - performance mental models
- The Monitoring Stack - how monitoring layers fit together
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.