Overview
util.identity_personid provides a resolved and reusable identity mapping between source-system person records and a canonical PERSON_UID.
It supports the Person domain by combining:
- person identifiers
- email-based linkage
- payroll references
- survivorship logic
into a single trusted identity layer.
This process reduces duplication, improves cross-system linking, and provides a stable foundation for:
person.core_details- related Person domain views
Purpose
- Establish a canonical person key (
PERSON_UID) - Resolve multiple source records representing the same individual
- Link person records using hashed email identity
- Apply survivorship rules to select the most appropriate record
- Provide a reusable identity bridge for the Person domain and future identity-led reporting
How It Works
1. Extract Source Email Identities
Source:
ifs.QDIM_FND_USER_PROPERTY_OL- Filter:
SMTP_MAIL_ADDRESS
Processing:
- Rename to
person_email - Convert to lowercase
- Generate
email_hash:- MD5(lower(email))
- first 8 characters
- uppercased
Outcome:
- A compact, repeatable identity link key
2. Enrich with Person Details
Join to:
ifs.QDIM_PERSON_INFO_OLPERSON_ID,NAME
ifs.QDIM_COMPANY_PERSON_ALL_CF_OLEMPLOYEE_STATUSPAYROLL_ID
Outcome:
- Combined identity + operational context for ranking and selection
3. Clean and Standardise
Rules applied:
PAYROLL_ID = 'n/a'→ null- Email normalised to lowercase
- Data types standardised
- Null / empty values filtered
Outcome:
- Consistent and predictable identity inputs across source variations
4. Apply Survivorship Ranking
A priority score (sortStatus) is calculated to resolve duplicates.
Current precedence:
- Active employees
- Records with
PAYROLL_ID
Outcome:
- Deterministic selection of the most operationally useful record
5. Resolve to Canonical PERSON_UID
Join to:
identity_email_proxy(derived fromidentity_bridge)
Logic:
- If bridge match exists → use mapped
PERSON_UID - Else → fallback to
email_hash
Outcome:
- Supports both:
- bridge-led canonical identity
- safe fallback identity
6. Deduplicate to One Row per Person
Final step:
- Retain one row per
PERSON_UID - Apply survivorship ranking
- Remove technical columns:
sortStatus,IDENTITY,ID,CODE,pipeline_log
Output:
PERSON_UIDPERSON_IDNAMEPAYROLL_IDemail_hashperson_email
Design Principles
Canonical Where Possible, Fallback Where Necessary
- Use bridge-defined
PERSON_UIDwhere available - Fallback to
email_hashensures continuity
Email as the Linking Mechanism
- Email is the most consistent cross-system identifier
- Hashing enables secure, repeatable linkage
Deterministic Survivorship
- Selection is rule-based, not arbitrary
- Ensures consistent outputs across refresh cycles
Privacy-Aware Linkage
- Uses hashed email as primary key
- Limits exposure of raw personal data
Separated Identity Architecture
identity_bridge→ resolves identity relationshipsutil.identity_personid→ materialises identityperson.core_details→ consumes identity
What This Enables
- Stable joins using
PERSON_UID - Consistent person-level reporting across systems
- Controlled linkage between Person and Employee domains
- Foundation for:
person.core_detailsperson.meta_*views
- Future Entra / proxy identity integration
Reduces risk of:
- duplicate person records
- inconsistent joins
- ad hoc identity logic in reports
Known Constraints & Design Notes
1. Bridge Dependency
- Relies on
src_identity_bridgefor canonical mapping - Falls back to
email_hashwhere mapping is unavailable
2. Email-Led Logic
Depends on:
- valid SMTP email population
- stable email ownership
- reliable system linkage
3. Survivorship is Contextual
Current prioritisation:
- Active status
- Payroll presence
This reflects operational preference, not universal identity truth.
4. Hashing Implementation Constraint
- Microsoft Dataflows (Power Query M) does not provide a native hashing function
- A deterministic hash is required to:
- create a stable
email_hash - support cross-system identity linkage
- avoid reliance on raw email values as join keys
- create a stable
Current approach:
- Uses an external MD5 hashing service (
HashToMD5) - Applies:
- lowercase normalisation
- MD5 hash
- first 8 characters (uppercased)
Outcome:
- Produces a compact, repeatable, platform-independent identity key
Design position:
- This is an intentional workaround for platform limitations, not an ad hoc solution
- Ensures consistent hashing behaviour across all Dataflow executions
Future consideration:
- If/when native hashing becomes available in Fabric / Dataflows, this implementation can be internalised without changing the model or downstream dependencies
Summary
util.identity_personid provides the canonical identity resolution layer for the Person domain.
It transforms multiple source-system representations into a single, stable PERSON_UID, enabling consistent, secure, and scalable person-level modelling across the Data Warehouse.