How to Automate Duplicate Contact Detection in CRM

Find and merge duplicate contacts in your CRM automatically.

Jay Banlasan

The AI Systems Guy

Duplicate contacts in your CRM wreck reporting and confuse reps. This system detects duplicate contacts automatically and merges them after review, so your data stays clean. I run it weekly for every CRM I manage.

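What You Need Before Starting

Python 3 with pandas and fuzzywuzzy installed, API access to your CRM (the crm_client object used in every step), and a way to deliver the weekly report (the send_report helper in Step 4).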

Step 1: Extract Contact Records

Pull all contacts from your CRM for comparison.

import pandas as pd
from fuzzywuzzy import fuzz

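def fetch_all_contacts(crm_client):
    # crm_client stands in for your CRM's API wrapper, and get_contacts for its
    # list endpoint. Each record must also carry an "id", which Step 2 uses to
    # identify matched pairs.
    contacts = crm_client.get_contacts(
        properties=["email", "firstname", "lastname", "phone", "company"],
        limit=10000,  # raise or paginate if you have more contacts than this
    )
    return pd.DataFrame(contacts)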
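Raw exports often differ only in casing, stray whitespace, or phone punctuation, and a character-level similarity score counts those as real differences. As a minimal sketch, assuming each contact is a plain dict with the property names from Step 1, the records could be normalized before they are compared. The helper name normalize_contact is hypothetical.

```python
import re

def normalize_contact(contact: dict) -> dict:
    """Return a copy of contact with comparison-friendly field values."""
    cleaned = dict(contact)
    for field in ("email", "firstname", "lastname", "company"):
        value = contact.get(field) or ""
        # Collapse runs of whitespace and trim the ends.
        cleaned[field] = re.sub(r"\s+", " ", str(value)).strip()
    # Emails are lowercased for comparison.
    cleaned["email"] = cleaned["email"].lower()
    # Keep digits only so "(555) 010-2000" and "555.010.2000" compare equal.
    cleaned["phone"] = re.sub(r"\D", "", str(contact.get("phone") or ""))
    return cleaned

raw = {
    "id": 1,
    "email": " Jon.Smith@Acme.com ",
    "firstname": "Jon ",
    "lastname": "Smith",
    "phone": "(555) 010-2000",
    "company": None,
}
print(normalize_contact(raw))
```

Missing values become empty strings here, which keeps later string comparisons from failing on None.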

Step 2: Build Match Scores

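Compare every pair of contacts using fuzzy matching on name and email, then combine the two scores with email weighted more heavily. Pairs at or above the threshold are flagged as potential duplicates.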

def find_duplicates(contacts_df, threshold=85):
    # Convert rows to dicts once (repeated .iloc lookups are slow in a nested
    # loop) and replace missing values with "" so fuzz.ratio gets strings.
    records = contacts_df.fillna("").to_dict("records")
    duplicates = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            name_score = fuzz.ratio(
                f"{a['firstname']} {a['lastname']}".lower(),
                f"{b['firstname']} {b['lastname']}".lower(),
            )
            # Two blank emails are not evidence of a match, so score them 0.
            email_a, email_b = str(a["email"]).lower(), str(b["email"]).lower()
            email_score = fuzz.ratio(email_a, email_b) if email_a and email_b else 0
            # Email is weighted more heavily than name.
            combined = (name_score * 0.4) + (email_score * 0.6)
            if combined >= threshold:
                duplicates.append({
                    "contact_a": a["id"],
                    "contact_b": b["id"],
                    "score": combined,
                })
    return duplicates
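The nested loop compares every pair, so 10,000 contacts means roughly 50 million comparisons. One common way to cut that down is blocking: only compare contacts that share a cheap key. The sketch below is an optional variation, not part of the system above. It assumes contacts are plain dicts, uses the email domain as the blocking key, and swaps in the standard library's difflib for fuzz.ratio so it runs without extra packages. The names similarity, block_by_domain, full_name, and find_duplicates_blocked are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: str, b: str) -> float:
    """0-100 similarity; a stdlib stand-in for fuzz.ratio (scores may differ slightly)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def block_by_domain(contacts):
    """Group contacts by email domain; contacts without a usable email are left out."""
    blocks = defaultdict(list)
    for contact in contacts:
        email = (contact.get("email") or "").strip().lower()
        if "@" in email:
            blocks[email.rsplit("@", 1)[1]].append(contact)
    return blocks

def full_name(contact):
    return f"{contact.get('firstname') or ''} {contact.get('lastname') or ''}"

def find_duplicates_blocked(contacts, threshold=85):
    duplicates = []
    for block in block_by_domain(contacts).values():
        # Pairs are only formed inside a block, never across blocks.
        for a, b in combinations(block, 2):
            name_score = similarity(full_name(a), full_name(b))
            email_score = similarity(a["email"], b["email"])
            combined = name_score * 0.4 + email_score * 0.6  # same 40/60 weighting
            if combined >= threshold:
                duplicates.append(
                    {"contact_a": a["id"], "contact_b": b["id"], "score": combined}
                )
    return duplicates

contacts = [
    {"id": 1, "firstname": "Jon", "lastname": "Smith", "email": "jon.smith@acme.com"},
    {"id": 2, "firstname": "John", "lastname": "Smith", "email": "john.smith@acme.com"},
    {"id": 3, "firstname": "Alice", "lastname": "Wong", "email": "alice@other.io"},
]
pairs = find_duplicates_blocked(contacts)
print([(d["contact_a"], d["contact_b"]) for d in pairs])  # [(1, 2)]
```

The trade-off is coverage: duplicates that use different email domains, or have no email at all, are never compared, so a full scan is still worth running occasionally.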

Step 3: Define Merge Strategy

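The record with more activity data wins. Pass it as primary_id and merge the other record into it: any field the primary is missing is filled from the secondary, the secondary's activities are moved over, and the secondary is archived.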

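def merge_contacts(crm_client, primary_id, secondary_id):
    primary = crm_client.get_contact(primary_id)
    secondary = crm_client.get_contact(secondary_id)
    merged = {}
    for field in ["phone", "company", "title"]:
        # Keep the primary's value; fall back to the secondary's if it is empty.
        value = primary.get(field) or secondary.get(field)
        if value:
            merged[field] = value  # skip fields that are empty on both records
    if merged:
        crm_client.update_contact(primary_id, merged)
    # Move activity history across before archiving the secondary record.
    crm_client.merge_activities(from_id=secondary_id, to_id=primary_id)
    crm_client.archive_contact(secondary_id)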
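merge_contacts leaves it to the caller to decide which record is primary. Here is a minimal sketch of the "more activity wins" rule, assuming each contact dict carries a hypothetical activity_count field. Your CRM may expose this differently or require a separate query. Ties keep the first record, which is an arbitrary choice.

```python
def pick_primary(contact_a: dict, contact_b: dict) -> tuple:
    """Return (primary, secondary): the record with more activity is primary."""
    # activity_count is a hypothetical field; missing counts are treated as 0.
    count_a = contact_a.get("activity_count") or 0
    count_b = contact_b.get("activity_count") or 0
    if count_b > count_a:
        return contact_b, contact_a
    return contact_a, contact_b  # ties keep contact_a as primary

a = {"id": 101, "activity_count": 3}
b = {"id": 202, "activity_count": 12}
primary, secondary = pick_primary(a, b)
print(primary["id"], secondary["id"])  # 202 101
```

The two ids can then be passed to merge_contacts as primary_id and secondary_id.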

Step 4: Run Weekly Scans

Run the scan once a week and send the results to a person for review before any merging happens.

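def weekly_duplicate_report(crm_client, recipient):
    # send_report is a placeholder for your email or chat notifier;
    # recipient is the address of whoever reviews the matches.
    contacts = fetch_all_contacts(crm_client)
    dupes = find_duplicates(contacts)
    report = f"Found {len(dupes)} potential duplicates:\n"
    for d in dupes[:20]:  # list only the first 20 to keep the message short
        report += f"- Score {d['score']:.0f}: {d['contact_a']} <-> {d['contact_b']}\n"
    send_report(report, recipient)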
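Since a person reads this report, it may help to put the strongest matches first and say how many were left out. The sketch below shows one way to do that. build_report and its limit parameter are hypothetical names, and the input is assumed to be the list of dicts that find_duplicates returns.

```python
def build_report(dupes: list, limit: int = 20) -> str:
    """Format duplicate pairs as text, highest score first."""
    ranked = sorted(dupes, key=lambda d: d["score"], reverse=True)
    lines = [f"Found {len(ranked)} potential duplicates:"]
    for d in ranked[:limit]:
        lines.append(f"- Score {d['score']:.0f}: {d['contact_a']} <-> {d['contact_b']}")
    hidden = len(ranked) - limit
    if hidden > 0:
        # Tell the reviewer the list was cut off rather than silently truncating.
        lines.append(f"...and {hidden} more not shown.")
    return "\n".join(lines)

sample = [
    {"contact_a": 1, "contact_b": 2, "score": 88.0},
    {"contact_a": 3, "contact_b": 4, "score": 97.0},
    {"contact_a": 5, "contact_b": 6, "score": 91.0},
]
print(build_report(sample, limit=2))
# Found 3 potential duplicates:
# - Score 97: 3 <-> 4
# - Score 91: 5 <-> 6
# ...and 1 more not shown.
```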

Step 5: Set Prevention Rules

Stop duplicates at the source. Check before creating new contacts.

def check_before_create(crm_client, email, name):
    existing = crm_client.search_contacts(email=email)
    if existing:
        return {"action": "update", "contact": existing[0]}
    name_matches = crm_client.search_contacts(name=name)
    if name_matches:
        return {"action": "review", "matches": name_matches}
    return {"action": "create"}
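To see the three outcomes without touching a real CRM, the function can be exercised against an in-memory stand-in. FakeCRM below is a hypothetical stub that mimics only the search_contacts call used above, with exact case-insensitive lookups. check_before_create is repeated so the snippet runs on its own.

```python
class FakeCRM:
    """In-memory stub: exact, case-insensitive lookups only."""

    def __init__(self, contacts):
        self.contacts = contacts

    def search_contacts(self, email=None, name=None):
        if email is not None:
            return [c for c in self.contacts if c["email"].lower() == email.lower()]
        if name is not None:
            return [c for c in self.contacts if c["name"].lower() == name.lower()]
        return []

def check_before_create(crm_client, email, name):
    existing = crm_client.search_contacts(email=email)
    if existing:
        return {"action": "update", "contact": existing[0]}
    name_matches = crm_client.search_contacts(name=name)
    if name_matches:
        return {"action": "review", "matches": name_matches}
    return {"action": "create"}

crm = FakeCRM([{"id": 1, "email": "jon.smith@acme.com", "name": "Jon Smith"}])
print(check_before_create(crm, "Jon.Smith@acme.com", "J. Smith")["action"])  # update
print(check_before_create(crm, "jsmith@gmail.com", "Jon Smith")["action"])   # review
print(check_before_create(crm, "ana@other.io", "Ana Ruiz")["action"])        # create
```

An email match is treated as certain enough to update, while a name-only match goes to a person, since two different people can share a name.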

What to Build Next

Connect this to your lead capture forms. Every new submission should run through the duplicate check before hitting the CRM.
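As a rough sketch of that wiring, a form handler could act on the decision the duplicate check returns. This assumes the submission arrives as a dict with email and name keys and that the client exposes a create_contact method, which is hypothetical. update_contact is the call already used in Step 3. The check argument is the Step 5 function, passed in so the snippet stands alone, and review_queue is any list-like holding area for a person to work through.

```python
def handle_submission(crm_client, form: dict, check, review_queue: list) -> str:
    """Route one form submission based on the duplicate check's decision."""
    decision = check(crm_client, form["email"], form["name"])
    if decision["action"] == "update":
        # Known email: refresh the existing record instead of adding a new one.
        crm_client.update_contact(decision["contact"]["id"], form)
    elif decision["action"] == "review":
        # Same name, different email: hold it for a person to decide.
        review_queue.append({"form": form, "matches": decision["matches"]})
    else:
        # No match found: safe to create (create_contact is a hypothetical method).
        crm_client.create_contact(form)
    return decision["action"]
```

Called as handle_submission(crm_client, form, check_before_create, review_queue), it returns the action taken, which is handy for logging how many submissions were caught before they became duplicates.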
