AI Matching

Catch what rules can't

AI-powered semantic matching understands context and meaning, finding duplicates that traditional rule-based systems miss.

Professional

Semantic Matching, Pay Once

Embeddings convert records to meaning-based vectors. You pay once per record; the vector is then cached and reused on every scan.

How it works

  1. Record fields are converted to a text representation
  2. OpenAI embeddings create a 1536-dimensional vector for each record
  3. Vector similarity finds records with similar meaning, not just matching text
  • One-time embedding cost per record (~$0.0001)
  • Catches typos, abbreviations, and variations
  • Works across languages and formats
  • Cached for as long as you're actively scanning
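The three steps above can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: `record_to_text` and `cosine_similarity` are hypothetical helper names, the field names are assumed examples, and the short made-up vectors stand in for the 1536-dimension embeddings an embeddings API would return.

```python
import math

def record_to_text(record: dict, fields: list[str]) -> str:
    """Step 1: flatten the selected, non-empty fields into one text string."""
    parts = [f"{name}: {record[name]}" for name in fields if record.get(name)]
    return " | ".join(parts)

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Step 3: cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical record; the Id field is left out of the text on purpose.
record = {"FirstName": "Bill", "LastName": "Johnson", "Title": "VP Sales", "Id": "003xx"}
text = record_to_text(record, ["FirstName", "LastName", "Title"])

# Step 2 (not shown): send `text` to an embeddings API and store the returned vector.

# Toy 3-dimension vectors standing in for real embeddings (made-up values).
vec_a = [0.9, 0.1, 0.4]
vec_b = [0.8, 0.2, 0.5]
score = cosine_similarity(vec_a, vec_b)
```

A score close to 1.0 means the two vectors point in nearly the same direction, which is how "similar meaning" is measured in this sketch.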

Embedding Configuration

Control which fields are used for semantic matching.

Professional

Recommended Fields for AI Matching

INCLUDE

  • Name fields (FirstName, LastName)
  • Company/Account Name
  • Job Title
  • Email domain
  • Address (City, State)

EXCLUDE

  • Record IDs
  • Timestamps
  • Auto-generated fields
  • Binary/boolean fields
  • Large text areas (notes)

Pro tip: Use descriptive, human-readable fields for best results. The AI understands semantic meaning, so "Vice President of Sales" will match "VP Sales" even without exact text overlap.
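One way to express the include/exclude guidance above is a small field allowlist applied before a record is turned into text. The sketch below is illustrative only: `INCLUDE_FIELDS`, `email_domain`, and `select_fields` are hypothetical names, and the field names are assumed CRM-style examples rather than the product's actual configuration schema.

```python
# Assumed allowlist mirroring the INCLUDE column above.
INCLUDE_FIELDS = ["FirstName", "LastName", "Company", "Title", "Email", "City", "State"]

def email_domain(email: str) -> str:
    """Keep only the part after the last '@', lowercased."""
    return email.rsplit("@", 1)[-1].strip().lower()

def select_fields(record: dict, include: list[str] = INCLUDE_FIELDS) -> dict:
    """Return only allowlisted, non-empty fields; everything else is dropped."""
    selected = {}
    for name in include:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            continue
        if name == "Email":
            # Only the domain is kept, per the recommendation above.
            selected["EmailDomain"] = email_domain(str(value))
        else:
            selected[name] = str(value).strip()
    return selected

record = {
    "Id": "rec-001",                        # record ID: excluded
    "CreatedDate": "2024-01-15T09:30:00Z",  # timestamp: excluded
    "IsActive": True,                       # boolean: excluded
    "FirstName": "Bill",
    "LastName": "Johnson",
    "Company": "Tech Solutions Inc",
    "Title": "VP Sales",
    "Email": "bill.johnson@TechSolutions.example",
}
fields_for_embedding = select_fields(record)
```

An allowlist is used here rather than a blocklist so that newly added system fields stay out of the embedding text by default.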

Cross-Language Matching

AI embeddings understand meaning across languages and character sets.

Professional

Example: International Company Match

RECORD A

Company: Société Générale

Country: France

RECORD B

Company: Societe Generale SA

Country: FR

AI Match Confidence: 96%
  • Handles accented characters (é, ü, ñ) seamlessly
  • Matches transliterated names (Beijing = Peking)
  • Understands country code variations (France = FR = FRA)
  • Works with Japanese, Chinese, Korean, Arabic, and more
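To see why exact-match rules struggle with the Société Générale pair above, here is a small sketch using only Python's standard library. Note that it uses accent folding and token overlap, not embeddings: it is a stand-in for a rule-based comparison, and `fold_accents` and `token_overlap` are hypothetical helper names.

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Strip combining marks (é -> e) and casefold for comparison."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of whitespace-separated tokens after accent folding."""
    tokens_a = set(fold_accents(a).split())
    tokens_b = set(fold_accents(b).split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

name_a = "Société Générale"
name_b = "Societe Generale SA"

exact_match = name_a == name_b                               # False: accents and suffix differ
folded_match = fold_accents(name_a) == fold_accents(name_b)  # False: the "SA" suffix remains
overlap = token_overlap(name_a, name_b)                      # 2 shared tokens out of 3
```

Accent folding alone does not close the gap: an exact rule still fails on the legal-form suffix, which is the kind of variation the semantic comparison is meant to absorb.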

Smart Embedding Cache

Pay once, scan forever. Your embeddings stay cached as long as you're actively using the platform.

Professional

One-time cost per record: embeddings are generated once and stay cached for as long as you keep scanning. Repeat scans of unchanged records incur no re-embedding cost.

~$0.0001 one-time per record

$0 for every future scan

30 days of inactivity before cached embeddings expire
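As a rough worked example of the pricing above, the sketch below estimates embedding spend for a dataset. It takes the ~$0.0001 per-record figure at face value; the record counts and the `embedding_cost` helper are hypothetical.

```python
from decimal import Decimal

# Approximate one-time cost per record quoted above, in USD.
COST_PER_RECORD = Decimal("0.0001")

def embedding_cost(uncached_records: int) -> Decimal:
    """Cost of embedding records that are not already in the cache."""
    if uncached_records < 0:
        raise ValueError("record count cannot be negative")
    return COST_PER_RECORD * uncached_records

first_scan = embedding_cost(100_000)  # every record is new
later_scan = embedding_cost(0)        # nothing changed, all vectors are cached
partial_scan = embedding_cost(2_500)  # only changed records are re-embedded
```

Under these assumptions, 100,000 records cost about $10 to embed once, a later scan with no changes adds nothing, and a scan after 2,500 records change adds about $0.25.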

When are embeddings regenerated?

Embeddings are only regenerated when:

  • Record data actually changes
  • You modify your AI field configuration
  • Account becomes inactive for 30+ days
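The three conditions above could be checked with content hashes and an activity timestamp. The following is a sketch under stated assumptions, not the platform's actual cache: `CacheEntry`, `fingerprint`, and `needs_regeneration` are hypothetical names, and it assumes that only changes to the configured fields count as a record change.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timedelta

INACTIVITY_LIMIT = timedelta(days=30)

def fingerprint(value) -> str:
    """Stable SHA-256 hash of any JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

@dataclass
class CacheEntry:
    record_hash: str  # hash of the field values that were embedded
    config_hash: str  # hash of the AI field configuration used
    vector: list      # the cached embedding

def needs_regeneration(entry, record: dict, fields: list,
                       last_active: datetime, now: datetime) -> bool:
    """True if the record was never embedded or any listed condition applies."""
    if entry is None:
        return True  # never embedded
    embedded_values = {name: record.get(name) for name in fields}
    if entry.record_hash != fingerprint(embedded_values):
        return True  # record data changed
    if entry.config_hash != fingerprint(list(fields)):
        return True  # AI field configuration changed
    if now - last_active >= INACTIVITY_LIMIT:
        return True  # account inactive for 30+ days
    return False

# Usage with made-up data.
fields = ["FirstName", "LastName", "Company"]
record = {"FirstName": "Bill", "LastName": "Johnson", "Company": "Tech Solutions Inc"}
entry = CacheEntry(
    record_hash=fingerprint({name: record.get(name) for name in fields}),
    config_hash=fingerprint(fields),
    vector=[0.1, 0.2],
)
now = datetime(2024, 6, 1)
reuse_cached = not needs_regeneration(entry, record, fields, now - timedelta(days=1), now)
```

Hashing the embedded values rather than the whole record means edits to excluded fields, such as timestamps, do not trigger a new embedding in this sketch.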

AI Explanations

Natural language explanations help you understand why two records matched.

Business-Friendly Language

"These records likely refer to the same person because they share matching email addresses and similar company names."

  • 2-3 sentence summaries for each match
  • Field-by-field similarity breakdown
  • Powered by Claude AI (Anthropic)
  • Uses AI credits (top up as needed)
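A field-by-field breakdown like the one listed above can be approximated with a simple string comparison per field. The sketch below uses `difflib.SequenceMatcher` from Python's standard library as a character-level stand-in; how the product actually scores each field is not specified here, and `field_similarity` and `similarity_breakdown` are hypothetical names.

```python
from difflib import SequenceMatcher

def field_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()

def similarity_breakdown(record_a: dict, record_b: dict) -> dict:
    """Score every field that appears in both records."""
    shared = sorted(set(record_a) & set(record_b))
    return {
        name: round(field_similarity(str(record_a[name]), str(record_b[name])), 2)
        for name in shared
    }

# Made-up records with identical emails and similar company names.
a = {"Email": "bjohnson@techsolutions.example", "Company": "Tech Solutions Inc"}
b = {"Email": "bjohnson@techsolutions.example", "Company": "TechSolutions"}
breakdown = similarity_breakdown(a, b)
```

Per-field scores of this kind are the sort of input a language model could be given when asked to write the 2-3 sentence summary.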

Why Foundation Models Beat Custom ML

Some tools train a custom ML model on your data. Here's why we use pre-trained foundation models instead.

Foundation Model Advantages

  • Works immediately at full accuracy — no training period required
  • Catches duplicates it's never seen before (semantic understanding)
  • Same quality for new customers and small datasets
  • No risk of learning from dirty data patterns

Risks of Per-Customer ML Training

Garbage In, Garbage Out

If your existing data is dirty, the model learns from flawed patterns and may perpetuate or amplify existing issues.

Cold Start Problem

Custom models need time and data to learn. New customers or small datasets get worse results initially.

Overfitting to History

The model learns what duplicates looked like in the past. New patterns (acquisitions, naming changes) may be missed.

Black Box Decisions

When a customer asks "why did it merge these?", there's no clear answer. Our approach shows explicit rule + AI scores.

Model Drift

Without continuous retraining, accuracy degrades as data patterns evolve — creating silent failures over time.

Compliance Risk

Regulated industries require explainable decisions. "The ML model decided" isn't an acceptable audit response.

Our embedding-based approach uses a pre-trained foundation model that understands semantic similarity universally, rather than learning customer-specific patterns that may already be flawed.

Traditional Rules vs AI Matching

See what AI matching catches that rules miss.

Feature                               Traditional Rules   AI Matching
"John Smith" vs "Johnny Smith"
"Acme Corp" vs "ACME Corporation"     Maybe
Typos and misspellings                Limited             Yes
Different field formats
Confidence percentages
Explainability                        No                  Yes


RECORD A

Name: Bill Johnson

Company: Tech Solutions Inc

Title: VP Sales

RECORD B

Name: William Johnson

Company: TechSolutions

Title: Vice President of Sales

AI Match Confidence: 94%

"These records likely refer to the same person. Bill is a common nickname for William, and the company names are variations of the same entity. The titles are semantically equivalent."