
How to Build a Server Resource Monitoring System

Monitor CPU, memory, and disk usage with automated alerts.

Jay Banlasan

The AI Systems Guy

Server resource monitoring for CPU, memory, and disk is the baseline for any production system. I run a lightweight Python script on every VPS I manage. It checks resources every minute, logs to SQLite, and pings me on Slack when something crosses a threshold.

No Datadog subscription needed. Just Python and cron.

What You Need Before Starting

A Linux server with Python 3, pip, and cron, plus a Slack incoming webhook URL for alerts.

Step 1: Install the Dependencies

pip install psutil requests

Step 2: Build the Resource Collector

import psutil
import sqlite3
from datetime import datetime, timezone

def get_server_stats():
    # interval=1 samples CPU over one second for a meaningful reading
    cpu_percent = psutil.cpu_percent(interval=1)
    memory = psutil.virtual_memory()
    disk = psutil.disk_usage("/")

    return {
        "cpu_percent": cpu_percent,
        "memory_percent": memory.percent,
        "memory_used_gb": round(memory.used / (1024**3), 2),
        "memory_total_gb": round(memory.total / (1024**3), 2),
        "disk_percent": disk.percent,
        "disk_used_gb": round(disk.used / (1024**3), 2),
        "disk_total_gb": round(disk.total / (1024**3), 2),
        # UTC, formatted to match SQLite's datetime('now') output so the
        # cleanup query in Step 6 can compare timestamps directly
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    }

Step 3: Store Metrics in SQLite

def init_resource_db(db_path="server_metrics.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS resource_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            cpu_percent REAL,
            memory_percent REAL,
            memory_used_gb REAL,
            disk_percent REAL,
            disk_used_gb REAL,
            timestamp TEXT
        )
    """)
    conn.commit()
    conn.close()

def log_stats(stats, db_path="server_metrics.db"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "INSERT INTO resource_snapshots (cpu_percent, memory_percent, memory_used_gb, disk_percent, disk_used_gb, timestamp) VALUES (?,?,?,?,?,?)",
        (stats["cpu_percent"], stats["memory_percent"], stats["memory_used_gb"],
         stats["disk_percent"], stats["disk_used_gb"], stats["timestamp"])
    )
    conn.commit()
    conn.close()
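Once snapshots start accumulating, you can summarize them straight from SQLite. Here is a minimal sketch of a reporting helper; the function name and the one-hour default window are my choices, not part of the monitoring script itself, and it assumes timestamps are stored in UTC:

```python
import sqlite3

def recent_averages(hours=1, db_path="server_metrics.db"):
    """Average CPU, memory, and disk usage over the last N hours."""
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        """
        SELECT AVG(cpu_percent), AVG(memory_percent), AVG(disk_percent)
        FROM resource_snapshots
        WHERE timestamp >= datetime('now', ?)
        """,
        (f"-{hours} hours",),
    ).fetchone()
    conn.close()
    return {"cpu": row[0], "memory": row[1], "disk": row[2]}
```

Handy for a quick `python3 -c "..."` check over SSH before digging into the raw rows.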

Step 4: Set Up Threshold Alerts

import requests

THRESHOLDS = {
    "cpu_percent": 90,
    "memory_percent": 85,
    "disk_percent": 80
}

def check_thresholds(stats):
    alerts = []
    if stats["cpu_percent"] > THRESHOLDS["cpu_percent"]:
        alerts.append(f"CPU at {stats['cpu_percent']}%")
    if stats["memory_percent"] > THRESHOLDS["memory_percent"]:
        alerts.append(f"Memory at {stats['memory_percent']}% ({stats['memory_used_gb']}GB)")
    if stats["disk_percent"] > THRESHOLDS["disk_percent"]:
        alerts.append(f"Disk at {stats['disk_percent']}% ({stats['disk_used_gb']}GB)")
    
    if alerts:
        message = "Server Resource Alert:\n" + "\n".join(alerts)
        # timeout keeps a slow Slack API from hanging the cron job
        requests.post("YOUR_SLACK_WEBHOOK", json={"text": message}, timeout=10)

    return alerts
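One refinement worth considering: a job that runs every minute will re-send the same alert every minute until the condition clears. A simple cooldown file suppresses the repeats. This is a sketch, not part of the original script; the file path and 30-minute window are assumptions you can tune:

```python
import time

COOLDOWN_SECONDS = 30 * 60  # suppress repeat alerts for 30 minutes
COOLDOWN_FILE = "/tmp/server_monitor_last_alert"

def should_alert(now=None):
    """Return True if enough time has passed since the last alert fired."""
    now = now if now is not None else time.time()
    try:
        with open(COOLDOWN_FILE) as f:
            last = float(f.read().strip())
    except (FileNotFoundError, ValueError):
        last = 0.0  # no prior alert recorded
    if now - last < COOLDOWN_SECONDS:
        return False
    # Record this alert's time only when we actually fire
    with open(COOLDOWN_FILE, "w") as f:
        f.write(str(now))
    return True
```

Call `should_alert()` just before the `requests.post(...)` line in `check_thresholds` so the Slack message only goes out once per window.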

Step 5: Run on a Schedule

Create the main script:

if __name__ == "__main__":
    init_resource_db()
    stats = get_server_stats()
    log_stats(stats)
    check_thresholds(stats)

Add to crontab to run every minute. The cd matters: the script uses a relative database path, so it has to run from its own directory or the database ends up wherever cron happens to start it:

* * * * * cd /root/monitoring && python3 server_monitor.py

Step 6: Add a Cleanup Job

Keep 30 days of data and purge the rest:

def cleanup_old_data(days=30, db_path="server_metrics.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("DELETE FROM resource_snapshots WHERE timestamp < datetime('now', ?)", (f"-{days} days",))
    conn.commit()
    conn.close()

Run cleanup daily:

0 3 * * * cd /root/monitoring && python3 -c "from server_monitor import cleanup_old_data; cleanup_old_data()"

What to Build Next

Add trend detection so you get warned when disk usage is growing 2%+ per day. That gives you time to clean up or expand before you hit 100%.
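A sketch of that trend check, built on the snapshots already sitting in SQLite. The 2-points-per-day threshold matches the figure above; the function name, the seven-day lookback, and the query shape are my assumptions:

```python
import sqlite3

def disk_growth_per_day(db_path="server_metrics.db", days=7):
    """Average daily change in disk_percent over the last N days of snapshots."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        """
        SELECT date(timestamp) AS day, AVG(disk_percent)
        FROM resource_snapshots
        WHERE timestamp >= datetime('now', ?)
        GROUP BY day
        ORDER BY day
        """,
        (f"-{days} days",),
    ).fetchall()
    conn.close()
    if len(rows) < 2:
        return 0.0  # not enough history to measure a trend
    first_avg = rows[0][1]
    last_avg = rows[-1][1]
    # Assumes roughly consecutive daily buckets; good enough for an early warning
    return (last_avg - first_avg) / (len(rows) - 1)
```

Run it from the same cron entry and post to Slack when the result crosses 2.0, and you hear about a filling disk days before it becomes an outage.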


Want this system built for your business?

Get a free assessment. We will map every system your business needs and show you the ROI.

Get Your Free Assessment
