How to Build an AI Document Filing System
Automatically classify and file documents into the right folders using AI.
Jay Banlasan
The AI Systems Guy
This AI document filing system classifies incoming documents and routes each one to the right folder automatically. I use it to keep 10,000+ files organized without manual sorting.
What You Need Before Starting
- Python 3.8+
- Claude or GPT API key (the examples below use the anthropic Python SDK)
- python-docx and PyPDF2 installed (for .docx and .pdf files)
- Storage system (local or S3)
Step 1: Set Up Document Processing
Create the inbox, processed, and archive folders, plus a SQLite table that logs every filed document.
import os
import json
import sqlite3
from datetime import datetime

def init_doc_system(base_dir):
    # Create the working folders; exist_ok makes this safe to re-run
    os.makedirs(os.path.join(base_dir, "inbox"), exist_ok=True)
    os.makedirs(os.path.join(base_dir, "processed"), exist_ok=True)
    os.makedirs(os.path.join(base_dir, "archive"), exist_ok=True)
    # One SQLite file holds the log of every document that gets filed
    conn = sqlite3.connect(os.path.join(base_dir, "documents.db"))
    conn.execute("""CREATE TABLE IF NOT EXISTS documents (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        filename TEXT, doc_type TEXT, status TEXT,
        metadata TEXT, processed_at TEXT
    )""")
    conn.commit()
    return conn
Step 2: Process Documents
Extract content and metadata from incoming documents.
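def process_document(file_path):
    ext = os.path.splitext(file_path)[1].lower()
    if ext == ".pdf":
        import PyPDF2
        with open(file_path, "rb") as f:
            reader = PyPDF2.PdfReader(f)
            # Guard against pages that yield no extractable text
            text = "\n".join(page.extract_text() or "" for page in reader.pages)
    elif ext == ".docx":
        import docx
        doc = docx.Document(file_path)
        text = "\n".join(p.text for p in doc.paragraphs)
    else:
        # Treat anything else as plain text; replace undecodable bytes
        with open(file_path, encoding="utf-8", errors="replace") as f:
            text = f.read()
    # original_path is needed later by route_document to move the file
    return {"filename": os.path.basename(file_path), "original_path": file_path,
            "text": text, "ext": ext}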
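The step description mentions metadata, but the function above returns only the text, filename, and extension. Here is a minimal sketch of collecting basic filesystem metadata alongside it. The helper name file_metadata and the choice of fields are my assumptions, not part of the original system.

```python
import hashlib
import os
from datetime import datetime


def file_metadata(file_path):
    """Collect basic filesystem metadata for one file (hypothetical helper)."""
    stat = os.stat(file_path)
    digest = hashlib.sha256()
    with open(file_path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return {
        "size_bytes": stat.st_size,
        "modified_at": datetime.fromtimestamp(stat.st_mtime).isoformat(),
        "sha256": digest.hexdigest(),
    }
```

The SHA-256 digest gives you a cheap way to spot exact duplicates before paying for an API call on the same file twice.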
Step 3: Analyze with AI
Use Claude to classify, tag, or review the document.
import anthropic

def analyze_document(doc_data):
    # The client reads the API key from the ANTHROPIC_API_KEY environment variable
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=1500,
        messages=[{"role": "user",
                   # Only the first 3000 characters are sent to keep the request small
                   "content": f"Analyze this document and provide: 1) Document type 2) Key data points 3) Summary\n\n{doc_data['text'][:3000]}"}])
    return message.content[0].text
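A free-form "document type" answer can vary from run to run, which makes folder names inconsistent. One option is to restrict the model to a fixed list of categories. The sketch below only builds the prompt string; the function name build_classification_prompt and the category list are placeholders I made up for illustration.

```python
def build_classification_prompt(text, categories, max_chars=3000):
    """Build a prompt that asks for exactly one category from a fixed list."""
    options = ", ".join(categories)
    return (
        "Classify the document below.\n"
        "Reply with a first line of the form 'Document type: <category>', "
        f"where <category> is exactly one of: {options}.\n"
        "Then list key data points and a short summary.\n\n"
        f"{text[:max_chars]}"
    )


# Example category list; these names are placeholders, not from the original.
CATEGORIES = ["invoice", "contract", "receipt", "report", "other"]
prompt = build_classification_prompt("Invoice #123 ...", CATEGORIES)
```

The resulting string would go in the "content" field of the user message. Keeping the category list in one place also tells you in advance which folders can appear under processed.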
Step 4: Route and Store
Move processed documents to the right location.
import shutil

def route_document(doc_data, analysis, base_dir, conn):
    # extract_type must turn the analysis text into a short, folder-safe
    # document type such as "invoice"; you need to supply this helper
    doc_type = extract_type(analysis)
    dest_dir = os.path.join(base_dir, "processed", doc_type)
    os.makedirs(dest_dir, exist_ok=True)
    dest_path = os.path.join(dest_dir, doc_data["filename"])
    # Requires doc_data["original_path"], the full path of the source file
    shutil.move(doc_data["original_path"], dest_path)
    # Log the filing so the daily report can count it
    conn.execute(
        "INSERT INTO documents (filename, doc_type, status, metadata, processed_at) VALUES (?, ?, ?, ?, ?)",
        (doc_data["filename"], doc_type, "processed", analysis, datetime.now().isoformat()))
    conn.commit()
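The routing code calls extract_type, which is not shown. Here is one possible sketch, assuming the model's reply contains a line such as "Document type: Invoice". That reply format is an assumption; adjust the pattern to whatever your prompt actually produces. The "unsorted" fallback folder is also my choice, not the original's.

```python
import re


def extract_type(analysis, default="unsorted"):
    """Pull a folder-safe document type out of the model's free-text reply."""
    # Look for a line such as "1) Document type: Invoice" (case-insensitive).
    match = re.search(r"document type\W*(.+)", analysis, flags=re.IGNORECASE)
    if not match:
        return default
    raw = match.group(1).strip().lower()
    # Collapse everything except letters and digits into hyphens so the
    # value is safe to use as a folder name, then cap its length.
    slug = re.sub(r"[^a-z0-9]+", "-", raw)[:40].strip("-")
    return slug or default
```

Because slashes and dots are replaced along with all other punctuation, a reply cannot push a file outside the processed folder. Anything the pattern cannot parse lands in one fallback folder that you can review by hand.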
Step 5: Monitor and Report
Track processing stats and flag issues.
def daily_report(conn):
    # processed_at is stored in local time, so compare against the local date.
    # The window covers everything since the start of yesterday.
    stats = conn.execute("""
        SELECT doc_type, COUNT(*), MAX(processed_at)
        FROM documents WHERE processed_at >= date('now', 'localtime', '-1 day')
        GROUP BY doc_type
    """).fetchall()
    report = "Daily Document Processing:\n"
    for doc_type, count, last in stats:
        report += f"  {doc_type}: {count} documents (last: {last})\n"
    return report
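The report above covers the stats half of this step. For the "flag issues" half, one simple signal is a file that has sat in the inbox too long, which usually means processing failed or never ran. This is a rough sketch: the function name, the 24-hour default, and the use of modification time as a stand-in for arrival time are all assumptions.

```python
import os
import time


def find_stale_inbox_files(inbox_dir, max_age_hours=24, now=None):
    """Return names of files that have sat in the inbox longer than the limit."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = []
    with os.scandir(inbox_dir) as entries:
        for entry in entries:
            # Use the modification time as a stand-in for arrival time.
            if entry.is_file() and entry.stat().st_mtime < cutoff:
                stale.append(entry.name)
    return sorted(stale)
```

Appending the returned names to the daily report text is enough to make stuck files visible.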
What to Build Next
Add retention policies. Auto-archive old documents and flag any that are approaching their retention limit.
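As a starting point, auto-archiving could look something like the sketch below. It assumes the processed and archive folders from Step 1 and uses file modification time as the age signal. The function name and the 365-day default are placeholders, and it does not update the status column in the database.

```python
import os
import shutil
import time


def archive_old_documents(processed_dir, archive_dir, max_age_days=365, now=None):
    """Move files older than max_age_days from processed/ into archive/."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    moved = []
    for root, _dirs, files in os.walk(processed_dir):
        for name in files:
            src = os.path.join(root, name)
            if os.path.getmtime(src) >= cutoff:
                continue  # Still inside the retention window.
            # Preserve the doc-type subfolder under the archive directory.
            rel = os.path.relpath(src, processed_dir)
            dest = os.path.join(archive_dir, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.move(src, dest)
            moved.append(rel)
    return sorted(moved)
```

Running this from a daily scheduled job and logging the returned list gives you an audit trail of what was archived and when.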