AI Matching

Catch what rules can't

AI-powered semantic matching understands context and meaning, finding duplicates that traditional rule-based systems miss.

Professional

Semantic Matching, Pay Once

Embeddings convert each record into a vector that represents its meaning. You pay once per record; the embedding is then cached and reused on every scan.

How it works

1. Record fields are converted to a text representation
2. OpenAI embeddings create a 1536-dimension vector for each record
3. Vector similarity finds records with similar meaning, not just matching text
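
The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the product's implementation: `record_to_text` and `cosine_similarity` are hypothetical helper names, the `Field: value | ...` text format is assumed, and cosine similarity is assumed as the similarity measure (the page does not name one). Short toy vectors stand in for the 1536-dimension embeddings so the sketch runs without calling an embedding API.

```python
import math

def record_to_text(record: dict, fields: list[str]) -> str:
    """Step 1: serialize the selected fields into one string (format is assumed)."""
    parts = [f"{name}: {record[name]}" for name in fields if record.get(name)]
    return " | ".join(parts)

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Step 3: cosine of the angle between two vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as "not similar"
    return dot / (norm_a * norm_b)

# Illustrative record; "Id" is deliberately left out of the text.
record = {"FirstName": "Bill", "LastName": "Johnson", "Title": "VP Sales", "Id": "003xx"}
text = record_to_text(record, ["FirstName", "LastName", "Title"])

# Step 2 would send `text` to an embedding model. Toy 4-dimension vectors
# are used here in place of real 1536-dimension embeddings.
vec_a = [0.12, 0.80, 0.35, 0.45]
vec_b = [0.10, 0.78, 0.40, 0.44]
score = cosine_similarity(vec_a, vec_b)
```

Vectors that point in nearly the same direction score close to 1, which is how two records with different wording but similar meaning can still be flagged as likely duplicates.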

One-time embedding cost per record (~$0.0001)
Catches typos, abbreviations, and variations
Works across languages and formats
Cached for as long as you're actively scanning

Embedding Configuration

Control which fields are used for semantic matching.

Professional

Recommended Fields for AI Matching

INCLUDE

Name fields (FirstName, LastName)
Company/Account Name
Job Title
Email domain
Address (City, State)

EXCLUDE

Record IDs
Timestamps
Auto-generated fields
Binary/boolean fields
Large text areas (notes)

Pro tip: Use descriptive, human-readable fields for best results. The AI understands semantic meaning, so "Vice President of Sales" will match "VP Sales" even without exact text overlap.
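
As a rough sketch of the include/exclude guidance above, the snippet below keeps only descriptive fields and reduces the email address to its domain. The field names, the `INCLUDE_FIELDS` list, and the helper functions are hypothetical and chosen for illustration; the actual configuration is done in the product, not in code like this.

```python
# Hypothetical allow-list mirroring the INCLUDE recommendations.
INCLUDE_FIELDS = ["FirstName", "LastName", "Company", "Title", "Email", "City", "State"]

def email_domain(email: str) -> str:
    """Return the lowercased domain part of an email address, or '' if absent."""
    return email.rsplit("@", 1)[-1].lower() if "@" in email else ""

def select_embedding_fields(record: dict) -> dict:
    """Keep only non-empty text fields on the allow-list.

    IDs, timestamps, booleans, and long notes are dropped simply because
    they are not on the list.
    """
    selected = {}
    for name in INCLUDE_FIELDS:
        value = record.get(name)
        if not isinstance(value, str) or not value.strip():
            continue
        if name == "Email":
            domain = email_domain(value.strip())
            if domain:
                selected["EmailDomain"] = domain
        else:
            selected[name] = value.strip()
    return selected

record = {
    "Id": "001",
    "FirstName": "Ana",
    "LastName": "Silva",
    "Company": "Acme Corp",
    "Title": "VP Sales",
    "Email": "ana.silva@Acme.com",
    "City": "Lisbon",
    "State": "",
    "CreatedDate": "2024-01-05T10:00:00Z",
    "IsActive": True,
    "Notes": "Long free-text notes...",
}
selected = select_embedding_fields(record)
```

An allow-list is used rather than a block-list so that newly added system fields are excluded by default.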

Cross-Language Matching

AI embeddings understand meaning across languages and character sets.

Professional

Example: International Company Match

RECORD A

Company: Société Générale

Country: France

RECORD B

Company: Societe Generale SA

Country: FR

AI Match Confidence: 96%

Handles accented characters (é, ü, ñ) seamlessly
Matches transliterated names (Beijing = Peking)
Understands country code variations (France = FR = FRA)
Works with Japanese, Chinese, Korean, Arabic, and more

Smart Embedding Cache

Pay once, scan forever. Your embeddings stay cached as long as you're actively using the platform.

Professional

One-time cost per record: embeddings are generated once and stay cached while you're actively scanning. Repeat scans reuse the cached embedding at no additional embedding cost.

~$0.0001

one-time per record

for every future scan

30 days

while actively scanning

When are embeddings regenerated?

Embeddings are only regenerated when:

Record data actually changes
You modify your AI field configuration
Account becomes inactive for 30+ days
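
One way to picture these three conditions is a cache key derived from the configured fields and their values, plus an inactivity check. The sketch below is an assumption about how such a cache could work, not a description of the platform's internals; `cache_key`, `needs_regeneration`, and `INACTIVITY_LIMIT` are hypothetical names.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone

INACTIVITY_LIMIT = timedelta(days=30)  # assumed from the "30+ days" condition

def cache_key(record_fields: dict, field_config: list[str]) -> str:
    """Hash the field configuration together with the configured field values.

    The key changes when a configured value changes or when the configuration
    itself changes; fields outside the configuration do not affect it.
    """
    names = sorted(field_config)
    payload = {
        "config": names,
        "values": {name: record_fields.get(name) for name in names},
    }
    encoded = json.dumps(payload, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def needs_regeneration(stored_key: str, current_key: str,
                       last_active: datetime, now: datetime) -> bool:
    """True if the data or config changed, or the account was inactive too long."""
    if stored_key != current_key:
        return True
    return now - last_active >= INACTIVITY_LIMIT

config = ["FirstName", "LastName", "Company"]
rec = {"Id": "001", "FirstName": "Ana", "LastName": "Silva", "Company": "Acme Corp"}
key = cache_key(rec, config)
now = datetime(2025, 1, 31, tzinfo=timezone.utc)
stale = needs_regeneration(key, key, now - timedelta(days=5), now)
```

With this design, editing a field that is not part of the AI field configuration leaves the key unchanged, so the cached embedding would still be reused.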

AI Explanations

Natural language explanations help you understand why two records matched.

Business-Friendly Language

"These records likely refer to the same person because they share matching email addresses and similar company names."

2-3 sentence summaries for each match
Field-by-field similarity breakdown
Powered by Claude AI (Anthropic)
Uses AI credits (top up as needed)

Why Foundation Models Beat Custom ML

Some tools train a custom ML model on your data. Here's why we use pre-trained foundation models instead.

Foundation Model Advantages

Works immediately at full accuracy — no training period required
Catches duplicates it's never seen before (semantic understanding)
Same quality for new customers and small datasets
No risk of learning from dirty data patterns

Risks of Per-Customer ML Training

Garbage In, Garbage Out

If your existing data is dirty, the model learns from flawed patterns and may perpetuate or amplify existing issues.

Cold Start Problem

Custom models need time and data to learn. New customers or small datasets get worse results initially.

Overfitting to History

The model learns what duplicates looked like in the past. New patterns (acquisitions, naming changes) may be missed.

Black Box Decisions

When a customer asks "why did it merge these?", a custom model offers no clear answer. Our approach shows explicit rule and AI scores for each match.

Model Drift

Without continuous retraining, accuracy degrades as data patterns evolve — creating silent failures over time.

Compliance Risk

Regulated industries require explainable decisions. "The ML model decided" isn't an acceptable audit response.

Our embedding-based approach uses a pre-trained foundation model that understands semantic similarity universally, rather than learning customer-specific patterns that may already be flawed.

Traditional Rules vs AI Matching

See what AI matching catches that rules miss.

Feature	Traditional Rules	AI Matching
"John Smith" vs "Johnny Smith"
"Acme Corp" vs "ACME Corporation"	Maybe
Typos and misspellings	Limited	Yes
Different field formats
Confidence percentages
Explainability	No	Yes


RECORD A

Name: Bill Johnson

Company: Tech Solutions Inc

Title: VP Sales

RECORD B

Name: William Johnson

Company: TechSolutions

Title: Vice President of Sales

AI Match Confidence: 94%

"These records likely refer to the same person. Bill is a common nickname for William, and the company names are variations of the same entity. The titles are semantically equivalent."

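To see why rules alone struggle with the Bill Johnson / William Johnson example, the sketch below applies two simple text-based checks to the same fields: a normalized exact match and a character-level similarity ratio from Python's standard `difflib`. These rules are illustrative assumptions, not the product's rule engine, and the helper names are hypothetical.

```python
from difflib import SequenceMatcher

def normalize(value: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return "".join(ch for ch in value.lower() if ch.isalnum())

def exact_rule(a: str, b: str) -> bool:
    """A typical rule: the fields match only if the normalized text is identical."""
    return normalize(a) == normalize(b)

def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; it knows nothing about meaning."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

record_a = {"Name": "Bill Johnson", "Company": "Tech Solutions Inc", "Title": "VP Sales"}
record_b = {"Name": "William Johnson", "Company": "TechSolutions", "Title": "Vice President of Sales"}

results = []
for field in ("Name", "Company", "Title"):
    a, b = record_a[field], record_b[field]
    results.append((field, exact_rule(a, b), char_similarity(a, b)))

for field, is_exact, ratio in results:
    print(f"{field}: exact match={is_exact}, character similarity={ratio:.2f}")
```

None of the three fields passes the exact-match rule, and the character ratio only measures shared letters. Recognizing that Bill is a nickname for William, or that "VP" means "Vice President", requires the semantic comparison described on this page.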