How to Build a Shared Document Library with AI Tagging

Organize shared documents with AI-generated tags and categories.

Jay Banlasan

The AI Systems Guy

This guide builds a shared document library that uses AI to auto-tag and categorize files and keeps them searchable.

What You Need Before Starting

Python 3.8+
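Claude or GPT API key (the code below uses the anthropic SDK, which reads ANTHROPIC_API_KEY from the environment)
python-docx and PyPDF2 installed (each is only needed for the format it reads)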
Storage system (local or S3)

Step 1: Set Up Document Processing

Build the foundation for the document library: the working folders and a SQLite index.

import os
import json
import sqlite3
from datetime import datetime

def init_doc_system(base_dir):
    # Create the working folders used by the later steps.
    for folder in ("inbox", "processed", "archive"):
        os.makedirs(os.path.join(base_dir, folder), exist_ok=True)

    # SQLite index with one row per processed document.
    conn = sqlite3.connect(os.path.join(base_dir, "documents.db"))
    conn.execute("""CREATE TABLE IF NOT EXISTS documents (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        filename TEXT, doc_type TEXT, status TEXT,
        metadata TEXT, processed_at TEXT
    )""")
    conn.commit()
    return conn
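
The documents table above keeps the AI output in a single metadata column, which is awkward to search by tag. One option is a separate tag table. This is a minimal sketch, not part of the original schema: the document_tags table and the add_tag_table, tag_document, and find_by_tag helpers are all hypothetical names.

```python
import sqlite3

def add_tag_table(conn):
    """Create a hypothetical document_tags table next to documents."""
    conn.execute("""CREATE TABLE IF NOT EXISTS document_tags (
        document_id INTEGER NOT NULL,
        tag TEXT NOT NULL,
        PRIMARY KEY (document_id, tag)
    )""")
    conn.commit()

def tag_document(conn, document_id, tags):
    """Attach tags to a document, ignoring blanks and duplicates."""
    # Normalize so "Finance" and " finance " collapse into one tag.
    rows = {(document_id, t.strip().lower()) for t in tags if t.strip()}
    conn.executemany(
        "INSERT OR IGNORE INTO document_tags (document_id, tag) VALUES (?, ?)",
        sorted(rows))
    conn.commit()

def find_by_tag(conn, tag):
    """Return the ids of all documents carrying the given tag."""
    rows = conn.execute(
        "SELECT document_id FROM document_tags WHERE tag = ? ORDER BY document_id",
        (tag.strip().lower(),)).fetchall()
    return [row[0] for row in rows]
```

Call add_tag_table(conn) once after init_doc_system, then tag_document after each analysis. Lowercasing tags on both write and read keeps lookups case-insensitive without extra indexes.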

Step 2: Process Documents

Extract content and metadata from incoming documents.

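def process_document(file_path):
    ext = os.path.splitext(file_path)[1].lower()
    if ext == ".pdf":
        import PyPDF2
        with open(file_path, "rb") as f:
            reader = PyPDF2.PdfReader(f)
            # Guard against pages that yield no text (for example, scanned images).
            text = "\n".join(page.extract_text() or "" for page in reader.pages)
    elif ext == ".docx":
        import docx
        doc = docx.Document(file_path)
        text = "\n".join(p.text for p in doc.paragraphs)
    else:
        # Treat anything else as plain text; replace undecodable bytes instead of failing.
        with open(file_path, encoding="utf-8", errors="replace") as f:
            text = f.read()
    # original_path is required by Step 4, which moves the file out of the inbox.
    return {"filename": os.path.basename(file_path), "original_path": file_path,
            "text": text, "ext": ext}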
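
To feed process_document, something has to pick files out of the inbox folder from Step 1. A minimal sketch, assuming a hypothetical list_inbox helper and an assumed set of supported extensions (adjust SUPPORTED to the formats you actually ingest):

```python
import os

# Assumed list of extensions worth processing; not from the original guide.
SUPPORTED = {".pdf", ".docx", ".txt", ".md"}

def list_inbox(base_dir):
    """Return sorted paths of supported files waiting in base_dir/inbox."""
    inbox = os.path.join(base_dir, "inbox")
    paths = []
    for name in sorted(os.listdir(inbox)):
        path = os.path.join(inbox, name)
        ext = os.path.splitext(name)[1].lower()
        # Skip subfolders and file types the extractor cannot read.
        if os.path.isfile(path) and ext in SUPPORTED:
            paths.append(path)
    return paths
```

Each returned path can be passed straight to process_document.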

Step 3: Analyze with AI

Use Claude to classify and summarize the document. Only the first 3,000 characters are sent, which keeps each request small.

import anthropic

def analyze_document(doc_data):
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=1500,
        messages=[{"role": "user",
            "content": f"Analyze this document and provide: 1) Document type 2) Key data points 3) Summary\n\n{doc_data['text'][:3000]}"}])
    return message.content[0].text
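
The function above returns free text, which is hard to turn into tags reliably. One approach is to ask the model for JSON and parse the reply defensively. This is a sketch under assumptions: TAG_PROMPT, the three JSON keys, and parse_tag_response are hypothetical and not part of the original prompt.

```python
import json

# Hypothetical prompt prefix: append the document text after it.
TAG_PROMPT = (
    "Return only a JSON object with keys doc_type (string), "
    "tags (list of 3 to 8 short strings), and summary (string) "
    "for this document:\n\n"
)

def _fallback():
    """Result used when the reply contains no usable JSON."""
    return {"doc_type": "unknown", "tags": [], "summary": ""}

def parse_tag_response(raw):
    """Extract doc_type, tags, and summary from a model reply."""
    # Models sometimes wrap JSON in prose, so cut from the first { to the last }.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return _fallback()
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return _fallback()
    tags = data.get("tags", [])
    if not isinstance(tags, list):
        tags = []
    return {
        "doc_type": str(data.get("doc_type") or "unknown").strip().lower() or "unknown",
        "tags": [str(t).strip().lower() for t in tags if str(t).strip()],
        "summary": str(data.get("summary") or "").strip(),
    }
```

To use it, send TAG_PROMPT plus the truncated document text as the message content and pass the reply text to parse_tag_response. The fallback means a malformed reply produces an "unknown" document rather than a crash.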

Step 4: Route and Store

Move each processed document into a folder named after its type and record it in the database.

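import shutil

def route_document(doc_data, analysis, base_dir, conn):
    # extract_type is a helper you supply: it must return a short,
    # folder-safe label (for example "invoice") parsed from the analysis text.
    doc_type = extract_type(analysis)
    dest_dir = os.path.join(base_dir, "processed", doc_type)
    os.makedirs(dest_dir, exist_ok=True)

    # doc_data must include "original_path", the file's current location.
    dest_path = os.path.join(dest_dir, doc_data["filename"])
    shutil.move(doc_data["original_path"], dest_path)

    # Record the routed document; the raw analysis text is stored as metadata.
    conn.execute(
        "INSERT INTO documents (filename, doc_type, status, metadata, processed_at) VALUES (?, ?, ?, ?, ?)",
        (doc_data["filename"], doc_type, "processed", analysis, datetime.now().isoformat()))
    conn.commit()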
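
route_document calls extract_type, which the steps above never define. Here is one possible sketch, assuming the analysis text contains a line such as "Document type: Invoice" (the model's wording is not guaranteed, so the default label matters). The regex and the "uncategorized" default are assumptions.

```python
import re

def extract_type(analysis, default="uncategorized"):
    """Pull a folder-safe document type label out of free-text analysis."""
    # Matches "Document type: X", "**Document Type:** X", "Document type - X".
    match = re.search(r"document\s+type\**\s*[:\-]\s*\**\s*([^\n]+)",
                      analysis, re.IGNORECASE)
    if not match:
        return default
    # Keep only lowercase letters and digits, joined by hyphens, so the
    # label cannot contain path separators or "..".
    slug = re.sub(r"[^a-z0-9]+", "-", match.group(1).lower()).strip("-")
    slug = slug[:40].rstrip("-")
    return slug or default
```

Because the label becomes a directory name, the slug step doubles as a safety check: a reply like "../../etc/passwd" ends up as the harmless folder name "etc-passwd".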

Step 5: Monitor and Report

Track processing stats and flag issues.

def daily_report(conn):
    stats = conn.execute("""
        SELECT doc_type, COUNT(*), MAX(processed_at)
        FROM documents WHERE processed_at >= date('now', '-1 day')
        GROUP BY doc_type
    """).fetchall()

    report = "Daily Document Processing:\n"
    for doc_type, count, last in stats:
        report += f"  {doc_type}: {count} documents (last: {last})\n"
    return report
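
The report above covers stats but not the "flag issues" part. A simple signal is a file that has sat in the inbox too long, which usually means processing failed. This is a minimal sketch: find_stuck_files and the 24-hour default are assumptions, and it uses file modification time as a stand-in for arrival time.

```python
import os
import time

def find_stuck_files(base_dir, max_age_hours=24, now=None):
    """Return names of inbox files older than max_age_hours."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    inbox = os.path.join(base_dir, "inbox")
    stuck = []
    for name in sorted(os.listdir(inbox)):
        path = os.path.join(inbox, name)
        # Modification time approximates when the file arrived.
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            stuck.append(name)
    return stuck
```

Appending the result to the daily report gives one place to see both throughput and failures.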

What to Build Next

Add usage analytics. See which documents get accessed most and which are stale.
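
As a starting point, usage analytics can be a second table that logs each access. The sketch below assumes a hypothetical document_access table keyed to documents.id; the table name, the helper names, and the 90-day staleness threshold are all assumptions.

```python
import sqlite3
from datetime import datetime, timedelta

def init_access_log(conn):
    """Create a hypothetical access log table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS document_access (
        document_id INTEGER NOT NULL,
        accessed_at TEXT NOT NULL
    )""")
    conn.commit()

def log_access(conn, document_id, when=None):
    """Record one access; timestamps are ISO strings so they sort as text."""
    when = when or datetime.now()
    conn.execute(
        "INSERT INTO document_access (document_id, accessed_at) VALUES (?, ?)",
        (document_id, when.isoformat(timespec="seconds")))
    conn.commit()

def most_accessed(conn, limit=5):
    """Return (document_id, hits) pairs, busiest first."""
    return conn.execute("""
        SELECT document_id, COUNT(*) AS hits FROM document_access
        GROUP BY document_id ORDER BY hits DESC, document_id LIMIT ?
    """, (limit,)).fetchall()

def stale_documents(conn, days=90, now=None):
    """Return ids of documents never accessed, or not accessed within days."""
    now = now or datetime.now()
    cutoff = (now - timedelta(days=days)).isoformat(timespec="seconds")
    rows = conn.execute("""
        SELECT d.id FROM documents d
        LEFT JOIN document_access a ON a.document_id = d.id
        GROUP BY d.id
        HAVING MAX(a.accessed_at) IS NULL OR MAX(a.accessed_at) < ?
        ORDER BY d.id
    """, (cutoff,)).fetchall()
    return [row[0] for row in rows]
```

Call log_access wherever a document is opened or downloaded. Stale documents are candidates for the archive folder created in Step 1.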

Related Systems

How to Build an AI Document Filing System

How to Build an AI Document Summarizer

How to Build an AI-Powered OCR Document Processor