Most Data Cloud implementations fail at identity resolution. Not because the feature doesn’t work. Because architects treat it like a configuration exercise instead of an architectural decision that determines whether the entire platform delivers value.
The Data Cloud identity resolution architecture you build in the first 30 days defines whether you get a unified customer view or a fragmented mess that takes six months to fix. The difference isn’t the matching rules. It’s understanding what identity resolution actually does at the data model level and designing for the conflicts that emerge at scale.
The Identity Resolution Problem
Identity resolution solves one problem: determining when records from different systems represent the same person. A customer who buys in-store, browses online, and calls support generates three records across three systems. Identity resolution creates a Unified Individual that ties them together.

In practice, this means:
- Matching rules that compare fields across Data Model Objects (DMOs)
- A reconciliation strategy that decides which field values win when sources conflict
- A Unified Individual record that serves as the single source of truth
- Downstream segments and calculated insights that reference the unified profile
The architecture that works at 10,000 records breaks at 1 million. Not because of performance. Because edge cases that represent 0.1% of your data become 1,000 records that break downstream processes.
Matching Rules: The Core Architecture Decision
Identity resolution uses matching rules to compare records. A rule defines which DMOs to compare, which fields to match on, and how exact the match needs to be.

The naive approach: match on email across all DMOs. That works until you encounter:
- Shared family emails (multiple people, one address)
- Corporate emails (multiple employees at same company)
- Typos in one system but not others
- Email changes over time (maiden name to married name)
Matching rules survive at scale only when they’re layered:
Exact match layer — High-confidence matches on unique identifiers. Loyalty program ID, customer number, mobile phone. These create the core of your Unified Individual graph.
Fuzzy match layer — Probabilistic matches on name + address, name + phone, email domain + last name. These catch records where the exact identifier is missing but the pattern is clear.
Manual review layer — Potential matches flagged for human verification. This is not a failure mode. At enterprise scale, 2-5% of matches require judgment calls. The architecture that pretends otherwise creates silent data corruption.
In orgs with 500,000+ customer records, the typical pattern is:
- 70-80% matched via exact rules
- 15-20% matched via fuzzy rules
- 5-10% requiring manual review or remaining unmatched
The critical architectural decision: do you optimize for recall (matching as many records as possible) or precision (only matching when you’re certain)? Most teams optimize for recall and regret it. A false positive match (merging two different people) corrupts every downstream segment, calculated insight, and activation. A false negative (failing to match the same person) is visible and fixable.
Optimize for precision. Build monitoring to surface unmatched records. Fix them deliberately.
Reconciliation Rules: What Happens When Sources Conflict
Matching rules determine which records belong together. Reconciliation rules determine which field values win when sources disagree.
Customer record from e-commerce system says email is john.smith@gmail.com. Customer record from CRM says john.smith@company.com. Both are the same person. Which email goes into the Unified Individual?
Data Cloud supports four reconciliation strategies:
- Most recent: use the value from the most recently updated source
- Source priority: define a hierarchy of systems (CRM beats e-commerce beats support)
- Most frequent: use the value that appears most often across sources
- Custom: define your own logic, typically as a formula expression on the reconciliation rule (Apex extensions are available for the more complex cases that formulas can’t express)
Most teams default to “most recent” because it sounds reasonable. That works until:
- A batch import from a legacy system overwrites all current values with stale data
- A mobile app with spotty connectivity creates phantom updates
- A data quality issue in one system propagates to the unified profile
What survives in production:
- Source priority for critical fields (email, phone, address)
- Most recent for behavioral fields (last purchase date, last interaction)
- Custom logic for fields where business rules matter (customer tier, lifecycle stage)
Document your reconciliation strategy in a decision matrix. Every field in your Unified Individual should have a defined rule. If you can’t articulate why a field uses a specific strategy, you haven’t thought through the architecture.
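A decision matrix like that can be expressed directly in code. This is an illustrative sketch, not Data Cloud configuration: the field names, source hierarchy, and tier ordering are assumptions, but the shape is the point — every field maps to exactly one declared strategy.

```python
from collections import Counter
from datetime import datetime

# Illustrative source hierarchy: earlier entries win ties
SOURCE_PRIORITY = ["crm", "ecommerce", "support"]

# The decision matrix: every unified-profile field has a declared rule
RULES = {
    "email": "source_priority",
    "phone": "source_priority",
    "last_purchase_date": "most_recent",
    "preferred_store": "most_frequent",
    "customer_tier": "custom",
}

def resolve(field, candidates):
    """candidates: list of (source, value, updated_at) tuples for one field.
    Returns the winning value under the field's declared strategy."""
    strategy = RULES[field]  # KeyError = undocumented field, by design
    if strategy == "source_priority":
        return min(candidates, key=lambda c: SOURCE_PRIORITY.index(c[0]))[1]
    if strategy == "most_recent":
        return max(candidates, key=lambda c: c[2])[1]
    if strategy == "most_frequent":
        return Counter(v for _, v, _ in candidates).most_common(1)[0][0]
    if strategy == "custom":
        # Example business rule: highest tier wins regardless of recency
        order = {"bronze": 0, "silver": 1, "gold": 2}
        return max(candidates, key=lambda c: order[c[1]])[1]
```

Looking up an undeclared field raises an error on purpose: a field without a documented strategy is exactly the gap the decision matrix exists to catch.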
Data Model Objects and the Unified Individual
Identity resolution operates on Data Model Objects, not raw Data Streams. This is the architectural detail most teams miss.
Data Streams ingest raw data from source systems. DMOs normalize that data into a standard schema. Identity resolution compares DMOs to create Unified Individuals.
The critical implication: your DMO design determines what identity resolution can match on. If you map email to different field names across DMOs, identity resolution can’t compare them. If you normalize phone numbers inconsistently (some with country codes, some without), fuzzy matching fails.
The pattern that works:
- Map all source systems to a consistent set of DMOs (Individual, Contact Point Email, Contact Point Phone, Contact Point Address)
- Normalize field formats during the DMO mapping (phone numbers to E.164, emails to lowercase, addresses to USPS standard)
- Build matching rules that reference the normalized DMO fields
- Create the Unified Individual as a materialized view over matched DMOs
At 100,000+ record scale, the typical enterprise has:
- 3-5 core DMOs for identity resolution (Individual, email, phone, address, loyalty ID)
- 10-15 behavioral DMOs that reference the Unified Individual (purchases, interactions, support cases)
- 20-30 calculated insights derived from the unified profile
The Unified Individual becomes the join key for everything downstream. Segments reference it. Calculated insights aggregate over it. Activations target it. If your identity resolution architecture is wrong, every downstream process inherits the error.
The Patterns That Prevent Data Chaos
Three architectural patterns separate implementations that scale from implementations that collapse:
Separate identity resolution from data quality. Do not use identity resolution to fix bad data. If your source systems have inconsistent formats, fix them at the Data Stream or DMO mapping layer. Identity resolution should operate on clean, normalized data. Using fuzzy matching to compensate for data quality issues creates false positives that corrupt your unified profile.
Version your matching rules. When you change a matching rule, you change which records are considered the same person. That retroactively affects every historical segment, calculated insight, and activation. Run the new version in parallel with the old, compare results, cut over deliberately. Data Cloud does not enforce this. You build it into change management.
Monitor match rates and conflicts. Build dashboards that show: percentage of records matched, distribution of match confidence scores, reconciliation conflicts by field, unmatched records by source system. These metrics tell you when the architecture is degrading. A sudden drop in match rate means a source system changed its data format. A spike in reconciliation conflicts means two systems are fighting over the same field.
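The metrics in that dashboard reduce to a few aggregations. A minimal sketch, assuming each record carries a `matched` flag and a match `tier` — the field names are hypothetical, but these are the numbers worth alerting on:

```python
from collections import Counter

def match_health(records):
    """records: list of dicts with 'source', 'matched' (bool), and
    'tier' ('exact' / 'fuzzy' / 'review' / None). Returns the metrics
    a match-rate dashboard would plot."""
    total = len(records)
    matched = sum(1 for r in records if r["matched"])
    return {
        # Sudden drop here = a source system changed its data format
        "match_rate": round(matched / total, 3) if total else 0.0,
        # Shift from exact toward fuzzy = identifier coverage is eroding
        "tier_distribution": Counter(r["tier"] for r in records if r["matched"]),
        # Spike for one source = that system's feed needs attention
        "unmatched_by_source": Counter(r["source"] for r in records if not r["matched"]),
    }
```

The comments map each number back to the failure it detects; the value of the dashboard is the trend, not any single snapshot.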
In practice, at enterprise scale, you should expect:
- 5-10% of records requiring manual review in the first 90 days
- 1-2% ongoing as new edge cases emerge
- Quarterly tuning of matching rules as data patterns evolve
Identity resolution is not a one-time configuration. It’s an architectural layer that requires ongoing monitoring and refinement.
The Single Decision That Determines Whether This Scales
If you take one architectural commitment from this piece, take this one: optimize for precision, not recall. False positive matches corrupt every downstream segment, calculated insight, and activation, and the corruption is invisible until a regulator, a finance team, or a customer notices. False negatives are visible and fixable.
The rest follows from that decision. Layered matching rules exist because exact-match-only is too conservative for real data; manual review exists because no automated tier reaches 100% precision; reconciliation rules exist because two systems will eventually disagree about the same field on the same person. Get the precision-vs-recall posture right and the rest of the architecture has obvious answers. Get it wrong and the rest of the architecture is just damage control.