How to Create Automated Health Check Systems
Run automated health checks on all endpoints and services.
Jay Banlasan
The AI Systems Guy
An automated health check system for endpoint monitoring tells you which services are up, which are degraded, and which are down. I run health checks across every API, database, and external service my systems depend on. When something breaks at 3am, I find out from my alert, not my client.
What You Need Before Starting
- Python 3.8+
- A list of endpoints and services to monitor
- A notification channel (Slack webhook, email, or SMS)
- Cron or a task scheduler
Step 1: Define Your Health Check Registry
HEALTH_CHECKS = [
    {
        "name": "Main API",
        "url": "https://api.yoursite.com/health",
        "method": "GET",
        "expected_status": 200,
        "timeout_seconds": 10
    },
    {
        "name": "Database",
        "type": "db",
        "connection_string": "sqlite:///production.db"
    },
    {
        "name": "Claude API",
        "url": "https://api.anthropic.com/v1/messages",
        "method": "HEAD",
        "expected_status": 401,
        "timeout_seconds": 5
    },
    {
        "name": "Webhook Endpoint",
        "url": "https://yoursite.com/webhook/intake",
        "method": "GET",
        "expected_status": 200,
        "timeout_seconds": 8
    }
]
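A typo in the registry only surfaces when a check runs, so it can help to validate the entries once at startup. The following is a minimal sketch, assuming the field names used in the registry above. `validate_registry` is a hypothetical helper and is not part of the original script.

```python
def validate_registry(checks):
    """Return a list of human-readable problems; an empty list means the registry looks usable."""
    problems = []
    seen_names = set()
    for i, check in enumerate(checks):
        name = check.get("name")
        if not name:
            problems.append(f"entry {i}: missing 'name'")
            name = f"entry {i}"
        elif name in seen_names:
            problems.append(f"{name}: duplicate name")
        seen_names.add(name)

        if check.get("type") == "db":
            # Database checks only need a connection string
            if "connection_string" not in check:
                problems.append(f"{name}: db check needs 'connection_string'")
        else:
            # Everything else is treated as an HTTP check
            url = str(check.get("url") or "")
            if not url.startswith(("http://", "https://")):
                problems.append(f"{name}: 'url' must start with http:// or https://")
            timeout = check.get("timeout_seconds", 10)
            if not isinstance(timeout, (int, float)) or timeout <= 0:
                problems.append(f"{name}: 'timeout_seconds' must be a positive number")
    return problems
```

Calling `validate_registry(HEALTH_CHECKS)` before the first run and exiting if the list is non-empty keeps a misconfigured entry from being mistaken for an outage.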
Step 2: Build the Health Check Runner
import requests
import sqlite3
from datetime import datetime


def check_http(check):
    """Run one HTTP check and report the status code and latency."""
    try:
        response = requests.request(
            method=check.get("method", "GET"),
            url=check["url"],
            timeout=check.get("timeout_seconds", 10)
        )
        is_healthy = response.status_code == check.get("expected_status", 200)
        return {
            "name": check["name"],
            "healthy": is_healthy,
            "status_code": response.status_code,
            "response_ms": round(response.elapsed.total_seconds() * 1000, 2)
        }
    except requests.exceptions.Timeout:
        return {"name": check["name"], "healthy": False, "error": "timeout"}
    except requests.exceptions.ConnectionError:
        return {"name": check["name"], "healthy": False, "error": "connection_failed"}
    except requests.exceptions.RequestException as e:
        # Any other requests error (bad URL, too many redirects) still counts as a failure
        return {"name": check["name"], "healthy": False, "error": str(e)}


def check_database(check):
    """Confirm the SQLite file opens and answers a trivial query."""
    path = check["connection_string"].replace("sqlite:///", "")
    try:
        # mode=ro stops sqlite3 from silently creating an empty file when the database is missing
        conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
        try:
            conn.execute("SELECT 1")
        finally:
            conn.close()
        return {"name": check["name"], "healthy": True}
    except Exception as e:
        return {"name": check["name"], "healthy": False, "error": str(e)}


def run_all_checks():
    """Dispatch each registry entry to the matching check function."""
    results = []
    for check in HEALTH_CHECKS:
        if check.get("type") == "db":
            results.append(check_database(check))
        else:
            results.append(check_http(check))
    return results
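The introduction distinguishes services that are up, degraded, and down, while the runner above reports only a boolean `healthy` flag. One way to bridge the gap is to treat a healthy but slow response as degraded. This is a sketch under assumptions: `classify_status` is a hypothetical helper, and the 2000 ms threshold is an arbitrary placeholder to tune per service.

```python
def classify_status(result, slow_ms=2000):
    """Map a check result dict to 'up', 'degraded', or 'down'.

    slow_ms is an assumed threshold, not a value from the article.
    """
    if not result.get("healthy"):
        return "down"
    response_ms = result.get("response_ms")
    # Database checks carry no response_ms, so they can only be up or down here
    if response_ms is not None and response_ms > slow_ms:
        return "degraded"
    return "up"
```

For example, `classify_status({"name": "Main API", "healthy": True, "response_ms": 3500.0})` returns `"degraded"`, which could be logged or alerted at a lower severity than a hard failure.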
Step 3: Log Results to a Database
from datetime import datetime, timezone


def init_health_db(db_path="health_checks.db"):
    """Create the results table on first run; safe to call every time."""
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS check_results (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            service_name TEXT,
            healthy INTEGER,
            response_ms REAL,
            error TEXT,
            checked_at TEXT
        )
    """)
    conn.commit()
    conn.close()


def log_results(results, db_path="health_checks.db"):
    """Store one row per check, stamped with a single UTC time for the whole run."""
    # datetime.utcnow() is deprecated since Python 3.12; use a timezone-aware timestamp
    checked_at = datetime.now(timezone.utc).isoformat()
    conn = sqlite3.connect(db_path)
    for r in results:
        conn.execute(
            "INSERT INTO check_results (service_name, healthy, response_ms, error, checked_at) VALUES (?,?,?,?,?)",
            (r["name"], 1 if r["healthy"] else 0, r.get("response_ms"), r.get("error"), checked_at)
        )
    conn.commit()
    conn.close()
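Once results accumulate in `check_results`, the same table can answer how reliable each service has been. The sketch below assumes the schema created by `init_health_db`. `uptime_summary` is a hypothetical helper, and it counts every logged row equally rather than weighting by time.

```python
import sqlite3


def uptime_summary(db_path="health_checks.db"):
    """Return {service_name: percentage of logged checks that were healthy}."""
    conn = sqlite3.connect(db_path)
    try:
        # healthy is stored as 0 or 1, so its average is the healthy fraction
        rows = conn.execute(
            """
            SELECT service_name, AVG(healthy) * 100.0
            FROM check_results
            GROUP BY service_name
            """
        ).fetchall()
    finally:
        conn.close()
    return {name: round(pct, 2) for name, pct in rows}
```

Adding a `WHERE checked_at >= ?` clause would restrict the summary to a recent window, assuming the timestamps are stored as ISO 8601 strings in a consistent format so they compare correctly as text.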
Step 4: Add Alerting for Failures
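def alert_on_failures(results):
    """Post a single Slack message listing every failed check."""
    failures = [r for r in results if not r["healthy"]]
    if not failures:
        return
    message = "Health Check Failures:\n"
    for f in failures:
        # Prefer the error string; fall back to the unexpected status code
        error_detail = f.get("error", f"status {f.get('status_code', 'unknown')}")
        message += f"  {f['name']}: {error_detail}\n"
    # Replace YOUR_SLACK_WEBHOOK with your webhook URL; the timeout keeps a slow Slack call from hanging the run
    requests.post("YOUR_SLACK_WEBHOOK", json={"text": message}, timeout=10)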
Step 5: Schedule It
*/2 * * * * python3 /path/to/health_check.py
The main script ties it all together:
if __name__ == "__main__":
    init_health_db()
    results = run_all_checks()
    log_results(results)
    alert_on_failures(results)
What to Build Next
Add consecutive failure tracking so you only get alerted after two or three failures in a row. Single blips happen. Sustained outages need action.
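One possible starting point is to read the most recent rows for each service from the `check_results` table and alert only when all of them failed. This is a sketch, not a finished design: `has_consecutive_failures` and `sustained_failures` are hypothetical helpers, the threshold of 3 is an assumption, and the code relies on the schema from Step 3.

```python
import sqlite3


def has_consecutive_failures(service_name, threshold=3, db_path="health_checks.db"):
    """True if the latest `threshold` logged checks for the service all failed."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT healthy FROM check_results "
            "WHERE service_name = ? ORDER BY id DESC LIMIT ?",
            (service_name, threshold),
        ).fetchall()
    finally:
        conn.close()
    # Require a full window so a newly added service is not flagged early
    return len(rows) == threshold and all(healthy == 0 for (healthy,) in rows)


def sustained_failures(results, threshold=3, db_path="health_checks.db"):
    """Filter the current run's failures down to those that are sustained."""
    return [
        r for r in results
        if not r["healthy"]
        and has_consecutive_failures(r["name"], threshold, db_path)
    ]
```

In the main script this would run after `log_results(results)`, so the current run is part of the window, with `alert_on_failures(sustained_failures(results))` replacing the direct call. Note that this version keeps alerting on every run while the outage lasts; sending only on the first sustained failure would need extra state.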