Systems Library / Finance Automation / How to Build an AI Bookkeeping Categorizer
Finance Automation accounting reporting

How to Build an AI Bookkeeping Categorizer

Categorize bank transactions automatically using AI pattern recognition.

Jay Banlasan

Jay Banlasan

The AI Systems Guy

Categorizing bank transactions by hand is mind-numbing work. I built an ai bookkeeping transaction categorizer that reads each transaction, recognizes the vendor, and assigns the correct accounting category automatically. Months of bookkeeping done in minutes.

The AI learns your categorization patterns and gets more accurate over time.

What You Need Before Starting

Step 1: Define Your Chart of Accounts

CHART_OF_ACCOUNTS = {
    "software": {"code": "6100", "name": "Software & Subscriptions"},
    "advertising": {"code": "6200", "name": "Advertising & Marketing"},
    "office": {"code": "6300", "name": "Office Supplies"},
    "travel": {"code": "6400", "name": "Travel & Transportation"},
    "meals": {"code": "6500", "name": "Meals & Entertainment"},
    "professional": {"code": "6600", "name": "Professional Services"},
    "utilities": {"code": "6700", "name": "Utilities & Internet"},
    "insurance": {"code": "6800", "name": "Insurance"},
    "payroll": {"code": "7000", "name": "Payroll & Benefits"},
    "revenue": {"code": "4000", "name": "Revenue"},
    "other": {"code": "6900", "name": "Miscellaneous"}
}

Step 2: Load and Parse Transactions

import csv

def load_transactions(csv_path):
    transactions = []
    with open(csv_path) as f:
        reader = csv.DictReader(f)
        for row in reader:
            transactions.append({
                "date": row.get("Date", ""),
                "description": row.get("Description", ""),
                "amount": float(row.get("Amount", 0)),
                "type": "expense" if float(row.get("Amount", 0)) < 0 else "income"
            })
    return transactions

Step 3: Categorize with AI

import anthropic
import json

client = anthropic.Anthropic()

def categorize_transactions(transactions, categories):
    cat_list = "\n".join(f"- {k}: {v['name']}" for k, v in categories.items())

    prompt = f"""Categorize each bank transaction into one of these categories:
{cat_list}

Transactions:
{json.dumps(transactions[:50], indent=2)}

Return a JSON array where each item has: description, amount, category (key from list above), confidence (high/medium/low)."""

    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=3000,
        messages=[{"role": "user", "content": prompt}]
    )
    return json.loads(message.content[0].text)

Step 4: Learn from Corrections

import sqlite3

def init_learning_db(db_path="categorization.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS vendor_categories (
            vendor_pattern TEXT PRIMARY KEY,
            category TEXT,
            confidence REAL DEFAULT 1.0
        )
    """)
    conn.commit()
    return conn

def learn_from_correction(conn, vendor_pattern, correct_category):
    conn.execute(
        "INSERT OR REPLACE INTO vendor_categories (vendor_pattern, category) VALUES (?,?)",
        (vendor_pattern.lower(), correct_category)
    )
    conn.commit()

def check_known_vendors(conn, description):
    desc_lower = description.lower()
    rows = conn.execute("SELECT vendor_pattern, category FROM vendor_categories").fetchall()
    for pattern, category in rows:
        if pattern in desc_lower:
            return category
    return None

Step 5: Export for Accounting Software

def export_categorized(categorized, output_path="categorized_transactions.csv"):
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Date", "Description", "Amount", "Category", "Account Code", "Confidence"])
        for txn in categorized:
            cat = CHART_OF_ACCOUNTS.get(txn["category"], CHART_OF_ACCOUNTS["other"])
            writer.writerow([txn.get("date", ""), txn["description"], txn["amount"],
                           cat["name"], cat["code"], txn.get("confidence", "medium")])
    return output_path

What to Build Next

Add split transaction handling for purchases that span multiple categories. A single Amazon order might include office supplies and software. The system should suggest splits based on the transaction amount and vendor history.

Related Reading

Want this system built for your business?

Get a free assessment. We will map every system your business needs and show you the ROI.

Get Your Free Assessment

Related Systems