How to Build a Server Resource Monitoring System
Monitor CPU, memory, and disk usage with automated alerts.
Jay Banlasan
The AI Systems Guy
Server resource monitoring for CPU, memory, and disk is the baseline for any production system. I run a lightweight Python script on every VPS I manage. It checks resources every minute, logs to SQLite, and pings me on Slack when something crosses a threshold.
No Datadog subscription needed. Just Python and cron.
What You Need Before Starting
- A Linux VPS or server
- Python 3.8+ with psutil installed
- A Slack webhook URL for alerts
- SSH access to the server
Step 1: Install psutil
```bash
pip install psutil
```
Step 2: Build the Resource Collector
```python
import psutil
import sqlite3
from datetime import datetime

def get_server_stats():
    # interval=1 blocks for one second to take an accurate CPU sample
    cpu_percent = psutil.cpu_percent(interval=1)
    memory = psutil.virtual_memory()
    disk = psutil.disk_usage("/")
    return {
        "cpu_percent": cpu_percent,
        "memory_percent": memory.percent,
        "memory_used_gb": round(memory.used / (1024**3), 2),
        "memory_total_gb": round(memory.total / (1024**3), 2),
        "disk_percent": disk.percent,
        "disk_used_gb": round(disk.used / (1024**3), 2),
        "disk_total_gb": round(disk.total / (1024**3), 2),
        "timestamp": datetime.utcnow().isoformat()
    }
```
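The collector above only watches the root filesystem. If your server has separate mounts (a data volume, for example), a per-mount variant is a small extension. A sketch, where `get_disk_stats_per_mount` is my own helper name, not part of psutil:

```python
import psutil

def get_disk_stats_per_mount():
    """Collect usage for every real mounted filesystem, not just '/'."""
    stats = {}
    for part in psutil.disk_partitions(all=False):
        try:
            usage = psutil.disk_usage(part.mountpoint)
        except (PermissionError, OSError):
            # Skip mounts the monitoring user can't stat
            continue
        stats[part.mountpoint] = {
            "percent": usage.percent,
            "used_gb": round(usage.used / (1024**3), 2),
            "total_gb": round(usage.total / (1024**3), 2),
        }
    return stats
```

You can log these the same way, one row per mountpoint, if a single root-disk number isn't enough for your setup.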
Step 3: Store Metrics in SQLite
```python
def init_resource_db(db_path="server_metrics.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS resource_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            cpu_percent REAL,
            memory_percent REAL,
            memory_used_gb REAL,
            disk_percent REAL,
            disk_used_gb REAL,
            timestamp TEXT
        )
    """)
    conn.commit()
    conn.close()

def log_stats(stats, db_path="server_metrics.db"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "INSERT INTO resource_snapshots "
        "(cpu_percent, memory_percent, memory_used_gb, disk_percent, disk_used_gb, timestamp) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (stats["cpu_percent"], stats["memory_percent"], stats["memory_used_gb"],
         stats["disk_percent"], stats["disk_used_gb"], stats["timestamp"])
    )
    conn.commit()
    conn.close()
```
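Once snapshots accumulate, the same database answers questions like "what was the average load over the last hour?" A minimal query helper, assuming the schema above; `recent_averages` is my own name, and the `strftime` call builds the cutoff in the same ISO-8601 `T`-separated format the collector stores, so the string comparison lines up:

```python
import sqlite3

def recent_averages(hours=1, db_path="server_metrics.db"):
    """Average CPU/memory/disk percent over the last N hours."""
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        """
        SELECT AVG(cpu_percent), AVG(memory_percent), AVG(disk_percent)
        FROM resource_snapshots
        -- cutoff rendered with a 'T' separator to match isoformat() rows
        WHERE timestamp >= strftime('%Y-%m-%dT%H:%M:%S', 'now', ?)
        """,
        (f"-{hours} hours",),
    ).fetchone()
    conn.close()
    return {"cpu": row[0], "memory": row[1], "disk": row[2]}
```

Handy for a quick sanity check over SSH, or as the input to the trend detection mentioned at the end.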
Step 4: Set Up Threshold Alerts
```python
import requests

THRESHOLDS = {
    "cpu_percent": 90,
    "memory_percent": 85,
    "disk_percent": 80
}

def check_thresholds(stats):
    alerts = []
    if stats["cpu_percent"] > THRESHOLDS["cpu_percent"]:
        alerts.append(f"CPU at {stats['cpu_percent']}%")
    if stats["memory_percent"] > THRESHOLDS["memory_percent"]:
        alerts.append(f"Memory at {stats['memory_percent']}% ({stats['memory_used_gb']}GB)")
    if stats["disk_percent"] > THRESHOLDS["disk_percent"]:
        alerts.append(f"Disk at {stats['disk_percent']}% ({stats['disk_used_gb']}GB)")
    if alerts:
        message = "Server Resource Alert:\n" + "\n".join(alerts)
        # timeout keeps a slow Slack endpoint from hanging the cron run
        requests.post("YOUR_SLACK_WEBHOOK", json={"text": message}, timeout=10)
    return alerts
```
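One thing to watch: the script runs every minute, so a sustained spike means a Slack message every minute until it clears. A simple cooldown sketch using a JSON state file; the file path, cooldown value, and `should_send_alert` helper are my own additions, and you would call it before the `requests.post` line:

```python
import json
import os
import time

COOLDOWN_SECONDS = 1800  # suppress repeat alerts for 30 minutes
STATE_FILE = "/tmp/server_monitor_alert_state.json"  # assumed location

def should_send_alert(alert_key, state_file=STATE_FILE, cooldown=COOLDOWN_SECONDS):
    """Return True if this alert hasn't fired within the cooldown window."""
    state = {}
    if os.path.exists(state_file):
        try:
            with open(state_file) as f:
                state = json.load(f)
        except (ValueError, OSError):
            state = {}  # corrupt or unreadable state file: start fresh
    now = time.time()
    if now - state.get(alert_key, 0) < cooldown:
        return False
    state[alert_key] = now
    with open(state_file, "w") as f:
        json.dump(state, f)
    return True
```

Using one key per resource (`"cpu"`, `"memory"`, `"disk"`) means a new disk alert still gets through while a CPU alert is cooling down.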
Step 5: Run on a Schedule
Create the main script:
```python
if __name__ == "__main__":
    init_resource_db()
    stats = get_server_stats()
    log_stats(stats)
    check_thresholds(stats)
```
Add to crontab to run every minute:
```bash
* * * * * cd /root/monitoring && python3 server_monitor.py
```

The `cd` matters: cron jobs start in your home directory, so without it the relative `server_metrics.db` path would put the database wherever cron happens to run.
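If a run ever stalls (a slow Slack call, for example), overlapping cron invocations can pile up. One option is to wrap the job with `flock`, which skips a run while the previous one still holds the lock; the lock file path here is arbitrary:

```shell
* * * * * flock -n /tmp/server_monitor.lock python3 /root/monitoring/server_monitor.py
```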
Step 6: Add a Cleanup Job
Keep 30 days of data and purge the rest:
```python
def cleanup_old_data(days=30, db_path="server_metrics.db"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "DELETE FROM resource_snapshots WHERE timestamp < datetime('now', ?)",
        (f"-{days} days",)
    )
    conn.commit()
    conn.close()
```
Run cleanup daily:
```bash
0 3 * * * cd /root/monitoring && python3 -c "from server_monitor import cleanup_old_data; cleanup_old_data()"
```

Without the `cd`, the import fails: `server_monitor.py` is not on the module path from cron's default working directory.
What to Build Next
Add trend detection so you get warned when disk usage is growing 2%+ per day. That gives you time to clean up or expand before you hit 100%.
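The 2%-per-day check can be a least-squares slope over recent snapshots. A sketch, assuming you feed it `(timestamp_seconds, disk_percent)` pairs pulled from the `resource_snapshots` table; `disk_growth_per_day` and `disk_trend_alert` are illustrative names:

```python
def disk_growth_per_day(samples):
    """Estimate disk-usage growth in percentage points per day.

    `samples` is a list of (timestamp_seconds, disk_percent) tuples.
    Returns the ordinary least-squares slope, or 0.0 when there is
    too little data to fit a line.
    """
    if len(samples) < 2:
        return 0.0
    xs = [t / 86400 for t, _ in samples]  # seconds -> days, so slope is %/day
    ys = [p for _, p in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    denom = sum((x - mean_x) ** 2 for x in xs)
    if denom == 0:
        return 0.0  # all samples at the same instant
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / denom

def disk_trend_alert(samples, threshold=2.0):
    """Return an alert string when growth meets or exceeds threshold %/day."""
    growth = disk_growth_per_day(samples)
    if growth >= threshold:
        return f"Disk growing at {growth:.1f}%/day"
    return None
```

A day or two of minute-level snapshots is plenty of input; run it from the same daily cron that does cleanup.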
Related Reading
- Why Monitoring Is Not Optional - the monitoring foundation every system needs
- How to Think About System Performance - performance mental models
- The Monitoring Stack - how monitoring layers fit together
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.