How to Automate Duplicate Contact Detection in CRM
Find and merge duplicate contacts in your CRM automatically.
Jay Banlasan
The AI Systems Guy
Duplicate contacts in your CRM wreck reporting and confuse reps. This system automates duplicate contact detection and CRM cleanup so your data stays clean. I run this weekly for every CRM I manage.
What You Need Before Starting
- Python 3.8+
- CRM API access
- pandas and fuzzywuzzy installed (pip install pandas fuzzywuzzy)
- SMTP or Slack for notifications
Step 1: Extract Contact Records
Pull all contacts from your CRM for comparison.
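import pandas as pd
from fuzzywuzzy import fuzz


def fetch_all_contacts(crm_client):
    # crm_client is your CRM's API wrapper. get_contacts is expected to return
    # a list of dicts, one per contact, each including the contact's "id".
    # Raise the limit or paginate if you have more than 10,000 contacts.
    contacts = crm_client.get_contacts(
        properties=["email", "firstname", "lastname", "phone", "company"],
        limit=10000
    )
    return pd.DataFrame(contacts)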
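Fuzzy scores are sensitive to casing, stray whitespace, and phone formatting, so it can help to normalize records before comparing them. The sketch below is one possible approach, not part of the original system. `normalize_contact` is a hypothetical helper that works on plain dicts, such as the rows of `contacts_df.to_dict("records")`, and it assumes the field names requested in `fetch_all_contacts`.

```python
import re


def normalize_contact(contact):
    """Return a copy of a contact dict with comparison-friendly values.

    Hypothetical helper: field names match the properties requested above.
    """
    def clean(value):
        # Treat None and NaN as empty (NaN is the only value not equal to itself).
        if value is None or value != value:
            return ""
        return str(value).strip()

    normalized = dict(contact)
    for field in ("email", "firstname", "lastname", "company"):
        normalized[field] = clean(contact.get(field)).lower()
    # Keep only digits so "(555) 010-1234" and "555.010.1234" compare equal.
    normalized["phone"] = re.sub(r"\D", "", clean(contact.get("phone")))
    return normalized


raw = {
    "id": 1,
    "email": "  Ana.Lopez@Example.com ",
    "firstname": "Ana",
    "lastname": "LOPEZ",
    "phone": "(555) 010-1234",
    "company": None,
}
print(normalize_contact(raw))
```

Normalizing once up front keeps the comparison loop simple and avoids penalizing pairs that differ only in formatting.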
Step 2: Build Match Scores
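Compare contacts pairwise using fuzzy matching on name and email. Email carries 60% of the combined score and name 40%, so with the default threshold of 85, two contacts with identical emails are flagged once their names score 62.5 or higher, while two contacts with identical names need an email score of at least 75.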
def find_duplicates(contacts_df, threshold=85):
    # Replace missing values with "" so string formatting and comparison are safe.
    rows = contacts_df.fillna("").to_dict("records")
    duplicates = []
    # Every pair is compared once, so runtime grows with the square of the list size.
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            a, b = rows[i], rows[j]
            name_score = fuzz.ratio(
                f"{a['firstname']} {a['lastname']}".lower(),
                f"{b['firstname']} {b['lastname']}".lower()
            )
            email_a, email_b = str(a["email"]).lower(), str(b["email"]).lower()
            # Two blank emails are not evidence of a duplicate, so score them 0.
            email_score = fuzz.ratio(email_a, email_b) if email_a and email_b else 0
            # Email is weighted more heavily than name.
            combined = (name_score * 0.4) + (email_score * 0.6)
            if combined >= threshold:
                duplicates.append({
                    "contact_a": a["id"],
                    "contact_b": b["id"],
                    "score": combined,
                })
    return duplicates
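Comparing every pair grows quadratically: 10,000 contacts means roughly 50 million comparisons. A common way to cut this down is blocking, where only contacts that share a cheap key are compared. The sketch below is an illustration under assumptions, not the method used above. The choice of email domain as the key and the names `blocking_key` and `candidate_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(contact):
    """Cheap key: the email domain, lowercased (a hypothetical choice)."""
    email = (contact.get("email") or "").lower()
    return email.rsplit("@", 1)[-1] if "@" in email else ""


def candidate_pairs(contacts):
    """Yield only pairs of contacts that share a blocking key."""
    blocks = defaultdict(list)
    for contact in contacts:
        blocks[blocking_key(contact)].append(contact)
    for key, members in blocks.items():
        if not key:
            # No usable email: skipped in this sketch, handle these separately.
            continue
        yield from combinations(members, 2)


contacts = [
    {"id": 1, "email": "ana@acme.com"},
    {"id": 2, "email": "ana.lopez@acme.com"},
    {"id": 3, "email": "bob@globex.com"},
    {"id": 4, "email": "Bobby@Globex.com"},
    {"id": 5, "email": None},
]
pairs = [(a["id"], b["id"]) for a, b in candidate_pairs(contacts)]
print(pairs)  # 2 candidate pairs instead of the 10 a full pairwise scan checks
```

Each candidate pair would then go through the same name and email scoring as in `find_duplicates`. The trade-off is recall: a person who appears once with a work address and once with a personal address lands in different blocks and is never compared, so a second pass with a different key may be worth considering.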
Step 3: Define Merge Strategy
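The record with more activity data becomes the primary. Fill its empty fields from the secondary record, move the secondary's activities over, then archive the secondary.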
def merge_contacts(crm_client, primary_id, secondary_id):
    primary = crm_client.get_contact(primary_id)
    secondary = crm_client.get_contact(secondary_id)
    # Keep the primary's value; fall back to the secondary's when it is empty.
    merged = {}
    for field in ["phone", "company", "title"]:
        merged[field] = primary.get(field) or secondary.get(field)
    crm_client.update_contact(primary_id, merged)
    # Move activity history to the primary, then retire the secondary record.
    crm_client.merge_activities(from_id=secondary_id, to_id=primary_id)
    crm_client.archive_contact(secondary_id)
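`merge_contacts` expects the caller to already know which record is the primary. One way to apply the "more activity wins" rule is sketched below. The `activity_count` and `created_at` fields are assumptions for illustration, as is the `choose_primary` name; substitute whatever your CRM exposes.

```python
def choose_primary(contact_a, contact_b):
    """Return (primary, secondary) using a "more activity wins" rule.

    Assumed fields: "activity_count" (int) and "created_at" (ISO 8601
    date string). Ties on activity go to the older record.
    """
    def rank(contact):
        # Higher activity sorts first; for ties, the earlier created_at sorts first.
        return (-contact.get("activity_count", 0), contact.get("created_at", ""))

    primary, secondary = sorted([contact_a, contact_b], key=rank)
    return primary, secondary


a = {"id": "101", "activity_count": 3, "created_at": "2023-04-01"}
b = {"id": "202", "activity_count": 12, "created_at": "2024-01-15"}
primary, secondary = choose_primary(a, b)
print(primary["id"], secondary["id"])  # 202 101
```

The result can be passed straight through as `merge_contacts(crm_client, primary["id"], secondary["id"])`.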
Step 4: Run Weekly Scans
Send the results for review before any auto-merging happens.
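def weekly_duplicate_report(crm_client, recipient):
    contacts = fetch_all_contacts(crm_client)
    dupes = find_duplicates(contacts)
    # Highest-scoring pairs first, so the 20 listed are the likeliest duplicates.
    dupes.sort(key=lambda d: d["score"], reverse=True)
    report = f"Found {len(dupes)} potential duplicates:\n"
    for d in dupes[:20]:
        report += f"- Score {d['score']:.0f}: {d['contact_a']} <-> {d['contact_b']}\n"
    # send_report is your notification helper (SMTP or Slack);
    # recipient is the reviewer's address or channel.
    send_report(report, recipient)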
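`send_report` is not defined in this walkthrough. A minimal sketch using Python's standard `smtplib` and `email.message` modules might look like the following. The SMTP host, port, sender address, and the `build_report_email` helper are placeholder assumptions, and a Slack webhook would serve equally well.

```python
import smtplib
from email.message import EmailMessage


def build_report_email(report, recipient, sender="crm-bot@example.com"):
    """Wrap the plain-text report in an email message (sender is a placeholder)."""
    message = EmailMessage()
    message["Subject"] = "Weekly CRM duplicate report"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(report)
    return message


def send_report(report, recipient, host="localhost", port=25):
    """Send the report through an SMTP server (host and port are assumptions)."""
    message = build_report_email(report, recipient)
    with smtplib.SMTP(host, port) as server:
        server.send_message(message)


# Building the message needs no network access, so it is easy to inspect.
preview = build_report_email("Found 2 potential duplicates:\n", "ops@example.com")
print(preview["Subject"])
```

Keeping message construction separate from delivery makes it possible to check the report format without a mail server.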
Step 5: Set Prevention Rules
Stop duplicates at the source. Check before creating new contacts.
def check_before_create(crm_client, email, name):
    # An exact email match is treated as the same person: update, don't create.
    existing = crm_client.search_contacts(email=email)
    if existing:
        return {"action": "update", "contact": existing[0]}
    # A name-only match is ambiguous, so hand it to a human.
    name_matches = crm_client.search_contacts(name=name)
    if name_matches:
        return {"action": "review", "matches": name_matches}
    return {"action": "create"}
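To see the three outcomes without touching a live CRM, the check can be exercised against an in-memory stand-in. `FakeCRM` below is purely hypothetical and only mimics the `search_contacts` call used above. `check_before_create` is repeated so the example runs on its own.

```python
class FakeCRM:
    """In-memory stand-in that mimics search_contacts (hypothetical)."""

    def __init__(self, contacts):
        self.contacts = contacts

    def search_contacts(self, email=None, name=None):
        # Case-insensitive exact matching; a real CRM's search may differ.
        if email is not None:
            return [c for c in self.contacts if c["email"] == email.lower()]
        return [c for c in self.contacts if c["name"].lower() == name.lower()]


def check_before_create(crm_client, email, name):
    existing = crm_client.search_contacts(email=email)
    if existing:
        return {"action": "update", "contact": existing[0]}
    name_matches = crm_client.search_contacts(name=name)
    if name_matches:
        return {"action": "review", "matches": name_matches}
    return {"action": "create"}


crm = FakeCRM([{"id": 1, "email": "ana@acme.com", "name": "Ana Lopez"}])
print(check_before_create(crm, "Ana@Acme.com", "Ana Lopez")["action"])    # update
print(check_before_create(crm, "ana@gmail.com", "ana lopez")["action"])   # review
print(check_before_create(crm, "bob@globex.com", "Bob Stone")["action"])  # create
```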
What to Build Next
Connect this to your lead capture forms. Every new submission should run through the duplicate check before hitting the CRM.
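One way to wire the check into a form handler is sketched below. Everything here is illustrative: `handle_submission`, the `create_contact` call, the `RecordingCRM` stub, and the review queue are assumed names, and `check` stands for any function with the same signature as `check_before_create`.

```python
def handle_submission(crm_client, submission, check, review_queue):
    """Route a form submission based on the duplicate check's decision.

    `check` is any callable with check_before_create's signature.
    `review_queue` is a plain list standing in for a human review step.
    """
    name = f"{submission['firstname']} {submission['lastname']}"
    decision = check(crm_client, submission["email"], name)
    if decision["action"] == "update":
        crm_client.update_contact(decision["contact"]["id"], submission)
    elif decision["action"] == "review":
        review_queue.append({"submission": submission, "matches": decision["matches"]})
    else:
        crm_client.create_contact(submission)  # assumed CRM method
    return decision["action"]


class RecordingCRM:
    """Hypothetical stub that records calls instead of hitting an API."""

    def __init__(self):
        self.calls = []

    def update_contact(self, contact_id, fields):
        self.calls.append(("update", contact_id))

    def create_contact(self, fields):
        self.calls.append(("create", fields["email"]))


crm, queue = RecordingCRM(), []
form = {"firstname": "Ana", "lastname": "Lopez", "email": "ana@acme.com"}
# Stub checks that force each branch, in place of a real CRM lookup.
handle_submission(crm, form, lambda c, e, n: {"action": "create"}, queue)
handle_submission(crm, form, lambda c, e, n: {"action": "update", "contact": {"id": 7}}, queue)
handle_submission(crm, form, lambda c, e, n: {"action": "review", "matches": [{"id": 7}]}, queue)
print(crm.calls, len(queue))
```

Passing the check in as a parameter keeps the handler testable and lets ambiguous name matches wait for a person instead of silently creating or overwriting a record.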
Related Reading
- Creating Automated Data Cleanup Routines
- Cost of Manual vs Cost of Automated
- Competitive Intelligence with AI