How to Build an AI-Powered Headline Testing System

Test and optimize content headlines using AI scoring and A/B testing.

Jay Banlasan

The AI Systems Guy

Most content teams pick headlines by gut feel or by whoever speaks loudest in the meeting. This AI headline testing system gives you a scoring framework that runs before you publish, plus a structure for tracking which headline types actually win. I use it to score every headline against seven criteria before it goes live, and I log the results so the scores can be checked against real performance over time.

The ROI is straightforward. A headline that pulls 2x the clicks at the same ranking position doubles the traffic to that page without touching your SEO spend. This system makes headline optimization a process, not a guess.

What You Need Before Starting

Python 3.10 or higher
Anthropic API key
SQLite (built into Python, no install needed)
pip install anthropic python-dotenv

Step 1: Set Up the Scoring Database

You need somewhere to store headlines and their scores so you can track patterns over time:

import sqlite3
import os

def init_db(db_path: str = "headlines.db"):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS headlines (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            original_headline TEXT NOT NULL,
            variant TEXT NOT NULL,
            topic TEXT,
            audience TEXT,
            clarity_score INTEGER,
            curiosity_score INTEGER,
            specificity_score INTEGER,
            benefit_score INTEGER,
            urgency_score INTEGER,
            emotion_score INTEGER,
            keyword_score INTEGER,
            total_score INTEGER,
            ai_notes TEXT,
            actual_ctr REAL,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    
    conn.commit()
    conn.close()
    print(f"Database ready at {db_path}")
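
Once a few runs are stored, the table can be queried for patterns. Here is a minimal sketch of one such query, averaging total_score per topic. The function name average_score_by_topic is hypothetical, and the demo uses an in-memory database with a cut-down copy of the schema rather than headlines.db, so treat it as a starting point:

```python
import sqlite3

def average_score_by_topic(conn: sqlite3.Connection) -> list:
    """Return (topic, average total_score, row count), best topic first."""
    rows = conn.execute(
        """
        SELECT topic, AVG(total_score) AS avg_total, COUNT(*) AS n
        FROM headlines
        WHERE total_score IS NOT NULL
        GROUP BY topic
        ORDER BY avg_total DESC
        """
    ).fetchall()
    return [(topic, round(avg_total, 1), n) for topic, avg_total, n in rows]

# Demo data only: an in-memory table with three of the columns from Step 1.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE headlines (variant TEXT, topic TEXT, total_score INTEGER)")
demo.executemany(
    "INSERT INTO headlines VALUES (?, ?, ?)",
    [("A", "seo", 50), ("B", "seo", 60), ("C", "email", 40)],
)
topic_averages = average_score_by_topic(demo)
print(topic_averages)
```

The same GROUP BY pattern should work for audience or any other column you want to compare.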

Step 2: Build the AI Scorer

This function sends a headline to Claude and gets back structured scores across seven dimensions:

import anthropic
import json
from dotenv import load_dotenv

load_dotenv()
client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

def score_headline(headline: str, topic: str, audience: str) -> dict:
    prompt = f"""Score this headline across 7 dimensions. Return JSON only, no other text.

HEADLINE: {headline}
TOPIC: {topic}
TARGET AUDIENCE: {audience}

Score each dimension from 1-10:

1. clarity: Does the reader instantly know what the article is about?
2. curiosity: Does it make you want to click without being clickbait?
3. specificity: Does it contain numbers, names, timeframes, or specific claims?
4. benefit: Does it communicate what the reader will gain?
5. urgency: Does it imply timeliness or importance?
6. emotion: Does it trigger a feeling (fear, hope, frustration, excitement)?
7. keyword: Does it naturally contain a searchable phrase?

Also include:
- total: sum of all scores (max 70)
- grade: "A" (60-70), "B" (50-59), "C" (40-49), "D" (below 40)
- top_strength: one sentence on its best quality
- top_weakness: one sentence on its biggest problem
- suggested_fix: rewrite of the headline that scores higher

Return format:
{{
  "clarity": 0,
  "curiosity": 0,
  "specificity": 0,
  "benefit": 0,
  "urgency": 0,
  "emotion": 0,
  "keyword": 0,
  "total": 0,
  "grade": "",
  "top_strength": "",
  "top_weakness": "",
  "suggested_fix": ""
}}"""

    message = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=600,
        messages=[{"role": "user", "content": prompt}]
    )
    
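    raw = message.content[0].text.strip()
    # Keep only the outermost JSON object in case the reply includes a code fence or stray text
    raw = raw[raw.find("{"): raw.rfind("}") + 1]
    return json.loads(raw)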
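
The prompt asks the model to add up its own scores and assign a grade, and that arithmetic is not guaranteed to be right. One option is to recompute both locally before storing anything. This is a minimal sketch: normalize_scores is a hypothetical helper, and clamping out-of-range values to 1-10 is an assumption about how you want to treat bad output (rejecting the response is equally valid):

```python
DIMENSIONS = ("clarity", "curiosity", "specificity", "benefit",
              "urgency", "emotion", "keyword")

def normalize_scores(scores: dict) -> dict:
    """Clamp each dimension to 1-10, then recompute total and grade locally."""
    cleaned = dict(scores)
    for name in DIMENSIONS:
        # int() raises if the value is not numeric; a missing key raises KeyError
        cleaned[name] = max(1, min(10, int(scores[name])))

    total = sum(cleaned[name] for name in DIMENSIONS)
    cleaned["total"] = total

    # Same grade bands as the scoring prompt
    if total >= 60:
        cleaned["grade"] = "A"
    elif total >= 50:
        cleaned["grade"] = "B"
    elif total >= 40:
        cleaned["grade"] = "C"
    else:
        cleaned["grade"] = "D"
    return cleaned

# Made-up response with two out-of-range scores and a wrong total
sample = {"clarity": 8, "curiosity": 7, "specificity": 9, "benefit": 6,
          "urgency": 5, "emotion": 12, "keyword": 0, "total": 99, "grade": "A"}
checked = normalize_scores(sample)
print(checked["total"], checked["grade"])
```

Calling this on the dict returned by score_headline keeps the total and grade consistent with the seven stored scores.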

Step 3: Generate Headline Variants

Instead of scoring just one headline, generate a batch of variants and score them all:

def generate_variants(topic: str, audience: str, keyword: str, count: int = 8) -> list:
    prompt = f"""Generate {count} headline variants for this content piece.

TOPIC: {topic}
TARGET AUDIENCE: {audience}
PRIMARY KEYWORD: {keyword}

Use these formats, one each:
1. How to [achieve outcome] in [timeframe]
2. [Number] ways to [solve problem]
3. Why [common belief] is wrong (and what to do instead)
4. The [adjective] guide to [topic]
5. [Specific result]: how [audience] can [replicate it]
6. What [authority] knows about [topic] that you don't
7. Stop [bad habit]. Do this instead.
8. [Number]-step system for [outcome]

Return as a JSON array of strings only."""

    message = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=800,
        messages=[{"role": "user", "content": prompt}]
    )
    
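    raw = message.content[0].text.strip()
    # Keep only the outermost JSON array in case the reply includes a code fence or stray text
    raw = raw[raw.find("["): raw.rfind("]") + 1]
    return json.loads(raw)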
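
Generated variants can include blanks, non-string items, or near-copies of the original headline, and each one costs a scoring call. A small cleanup pass before scoring avoids that. This is a sketch only: clean_variants is a hypothetical helper, and treating headlines as duplicates when they match case-insensitively is an assumption:

```python
def clean_variants(variants: list, original: str, count: int = 8) -> list:
    """Drop blanks, non-strings, and case-insensitive duplicates; cap at count."""
    seen = {original.strip().lower()}  # the original gets scored separately
    cleaned = []
    for item in variants:
        if not isinstance(item, str):
            continue
        text = item.strip()
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned[:count]

# Made-up model output with the usual problems
raw_variants = ["  How to X ", "how to x", "", 5, "Original", "7 ways to X"]
usable = clean_variants(raw_variants, original="original")
print(usable)
```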

Step 4: Score All Variants and Store Results

This function ties the earlier steps together: it generates variants, scores each one along with the original, stores every result, and returns them ranked by total score:

def score_and_store(original: str, topic: str, audience: str, keyword: str, db_path: str = "headlines.db"):
    variants = generate_variants(topic, audience, keyword)
    variants.insert(0, original)  # Always include the original
    
    results = []
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    
    for variant in variants:
        print(f"Scoring: {variant[:60]}...")
        scores = score_headline(variant, topic, audience)
        
        cursor.execute("""
            INSERT INTO headlines 
            (original_headline, variant, topic, audience, clarity_score, curiosity_score,
             specificity_score, benefit_score, urgency_score, emotion_score, keyword_score,
             total_score, ai_notes)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            original, variant, topic, audience,
            scores["clarity"], scores["curiosity"], scores["specificity"],
            scores["benefit"], scores["urgency"], scores["emotion"], scores["keyword"],
            scores["total"],
            f"{scores['top_strength']} | Fix: {scores['suggested_fix']}"
        ))
        
        results.append({"headline": variant, "scores": scores})
    
    conn.commit()
    conn.close()
    
    results.sort(key=lambda x: x["scores"]["total"], reverse=True)
    return results
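
The loop makes one API call per headline and commits only at the end, so a single malformed response can lose the whole batch. One way to soften that is a retry wrapper around the scoring call. This is a sketch under assumptions: score_with_retry is a hypothetical helper, it only catches JSON parsing errors and missing keys, and the flaky scorer below stands in for score_headline so the example runs without an API key:

```python
import json
import time
from typing import Callable, Optional

def score_with_retry(scorer: Callable[[], dict], attempts: int = 3,
                     delay: float = 0.0) -> Optional[dict]:
    """Call scorer() until it returns scores; return None if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return scorer()
        except (json.JSONDecodeError, KeyError) as exc:
            print(f"Attempt {attempt} failed: {exc!r}")
            if attempt < attempts:
                time.sleep(delay)
    return None

# Stand-in scorer: fails once with bad JSON, then succeeds
calls = {"n": 0}

def flaky_scorer() -> dict:
    calls["n"] += 1
    if calls["n"] < 2:
        return json.loads("not json")  # raises json.JSONDecodeError
    return {"total": 42}

retry_result = score_with_retry(flaky_scorer)
print(retry_result, calls["n"])
```

Inside the loop, the call would look something like score_with_retry(lambda: score_headline(variant, topic, audience)), skipping the variant when the result is None.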

Step 5: Print a Ranked Report

The report prints every scored headline from best to worst, and the main block runs the whole pipeline end to end:

def print_report(results: list):
    print("\n" + "="*60)
    print("HEADLINE SCORING REPORT")
    print("="*60)
    
    for i, item in enumerate(results, 1):
        s = item["scores"]
        print(f"\n#{i} [{s['grade']}] Score: {s['total']}/70")
        print(f"   {item['headline']}")
        print(f"   Strength: {s['top_strength']}")
        print(f"   Weakness: {s['top_weakness']}")

if __name__ == "__main__":
    init_db()
    
    results = score_and_store(
        original="Content Brief Generator Tool",
        topic="Building an AI-powered content brief generator for SEO teams",
        audience="Content managers and SEO specialists at agencies",
        keyword="ai content brief generator"
    )
    
    print_report(results)
    print(f"\nWinner: {results[0]['headline']}")
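
If the ranking needs to go to people who will not open a terminal, the same results list can be written out as CSV. A minimal sketch, assuming the list shape returned by score_and_store; results_to_csv is a hypothetical helper and the sample rows are made up:

```python
import csv
import io

def results_to_csv(results: list) -> str:
    """Render ranked results as CSV text with one row per headline."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["rank", "headline", "total", "grade"])
    for rank, item in enumerate(results, 1):
        s = item["scores"]
        writer.writerow([rank, item["headline"], s["total"], s["grade"]])
    return buffer.getvalue()

# Made-up results in the same shape score_and_store returns
sample_results = [
    {"headline": "Stop guessing headlines", "scores": {"total": 61, "grade": "A"}},
    {"headline": "Content Brief Generator Tool", "scores": {"total": 38, "grade": "D"}},
]
csv_text = results_to_csv(sample_results)
print(csv_text)
```

Writing csv_text to a file gives you something that opens directly in a spreadsheet.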

What to Build Next

Add a feedback loop that records actual CTR from your CMS in the actual_ctr field, then uses that real data to calibrate future scoring
Build a Slack slash command that lets any team member score a headline on demand without touching the terminal
Export weekly reports showing which headline formats perform best for your specific audience
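
As a starting point for the feedback loop, here is a minimal sketch of writing a measured CTR back to a stored headline and listing scores next to real results. The helper names record_ctr and score_vs_ctr are hypothetical, the CTR value is made up, and the demo uses an in-memory table with only the columns it needs:

```python
import sqlite3

def record_ctr(conn: sqlite3.Connection, variant: str, ctr: float) -> int:
    """Store a measured CTR for a headline; return how many rows were updated."""
    cursor = conn.execute(
        "UPDATE headlines SET actual_ctr = ? WHERE variant = ?",
        (ctr, variant),
    )
    conn.commit()
    return cursor.rowcount

def score_vs_ctr(conn: sqlite3.Connection) -> list:
    """List (variant, total_score, actual_ctr) for headlines with real data."""
    return conn.execute(
        """
        SELECT variant, total_score, actual_ctr
        FROM headlines
        WHERE actual_ctr IS NOT NULL
        ORDER BY actual_ctr DESC
        """
    ).fetchall()

# Demo data only
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE headlines (variant TEXT, total_score INTEGER, actual_ctr REAL)")
demo.executemany(
    "INSERT INTO headlines (variant, total_score) VALUES (?, ?)",
    [("How to X in 30 days", 58), ("7 ways to X", 52)],
)
updated = record_ctr(demo, "7 ways to X", 0.041)
missing = record_ctr(demo, "Not stored", 0.01)
comparison = score_vs_ctr(demo)
print(updated, missing, comparison)
```

Matching on the headline text updates every row that stores the same variant, so matching on id is safer once the same headline has been scored more than once.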

