Util.Identity_PersonID

Overview

util.identity_personid provides a resolved and reusable identity mapping between source-system person records and a canonical PERSON_UID.

It supports the Person domain by combining:

person identifiers
email-based linkage
payroll references
survivorship logic

into a single trusted identity layer.

This process reduces duplication, improves cross-system linking, and provides a stable foundation for:

person.core_details
related Person domain views

Purpose

Establish a canonical person key (PERSON_UID)
Resolve multiple source records representing the same individual
Link person records using hashed email identity
Apply survivorship rules to select the most appropriate record
Provide a reusable identity bridge for the Person domain and future identity-led reporting

How It Works

1. Extract Source Email Identities

Source:

ifs.QDIM_FND_USER_PROPERTY_OL
Filter: SMTP_MAIL_ADDRESS

Processing:

Rename to person_email
Convert to lowercase
Generate email_hash:
- MD5(lower(email))
- first 8 characters
- uppercased

Outcome:

A compact, repeatable identity link key

2. Enrich with Person Details

Join to:

ifs.QDIM_PERSON_INFO_OL
- PERSON_ID, NAME
ifs.QDIM_COMPANY_PERSON_ALL_CF_OL
- EMPLOYEE_STATUS
- PAYROLL_ID

Outcome:

Combined identity + operational context for ranking and selection

3. Clean and Standardise

Rules applied:

PAYROLL_ID = 'n/a' → null
Email normalised to lowercase
Data types standardised
Null / empty values filtered

Outcome:

Consistent and predictable identity inputs across source variations

4. Apply Survivorship Ranking

A priority score (sortStatus) is calculated to resolve duplicates.

Current precedence:

Active employees
Records with PAYROLL_ID

Outcome:

Deterministic selection of the most operationally useful record

5. Resolve to Canonical `PERSON_UID`

Join to:

identity_email_proxy (derived from identity_bridge)

Logic:

If bridge match exists → use mapped PERSON_UID
Else → fallback to email_hash

Outcome:

Supports both:
- bridge-led canonical identity
- safe fallback identity

6. Deduplicate to One Row per Person

Final step:

Retain one row per PERSON_UID
Apply survivorship ranking
Remove technical columns:
- sortStatus, IDENTITY, ID, CODE, pipeline_log

Output:

PERSON_UID
PERSON_ID
NAME
PAYROLL_ID
email_hash
person_email

Design Principles

Canonical Where Possible, Fallback Where Necessary

Use bridge-defined PERSON_UID where available
Fallback to email_hash ensures continuity

Email as the Linking Mechanism

Email is the most consistent cross-system identifier
Hashing enables secure, repeatable linkage

Deterministic Survivorship

Selection is rule-based, not arbitrary
Ensures consistent outputs across refresh cycles

Privacy-Aware Linkage

Uses hashed email as primary key
Limits exposure of raw personal data

Separated Identity Architecture

identity_bridge → resolves identity relationships
util.identity_personid → materialises identity
person.core_details → consumes identity

What This Enables

Stable joins using PERSON_UID
Consistent person-level reporting across systems
Controlled linkage between Person and Employee domains
Foundation for:
- person.core_details
- person.meta_* views
Future Entra / proxy identity integration

Reduces risk of:

duplicate person records
inconsistent joins
ad hoc identity logic in reports

Known Constraints & Design Notes

1. Bridge Dependency

Relies on src_identity_bridge for canonical mapping
Falls back to email_hash where mapping is unavailable

2. Email-Led Logic

Depends on:

valid SMTP email population
stable email ownership
reliable system linkage

3. Survivorship is Contextual

Current prioritisation:

Active status
Payroll presence

This reflects operational preference, not universal identity truth.

4. Hashing Implementation Constraint

Microsoft Dataflows (Power Query M) does not provide a native hashing function
A deterministic hash is required to:
- create a stable email_hash
- support cross-system identity linkage
- avoid reliance on raw email values as join keys

Current approach:

Uses an external MD5 hashing service (HashToMD5)
Applies:
- lowercase normalisation
- MD5 hash
- first 8 characters (uppercased)

Outcome:

Produces a compact, repeatable, platform-independent identity key

Design position:

This is an intentional workaround for platform limitations, not an ad hoc solution
Ensures consistent hashing behaviour across all Dataflow executions

Future consideration:

If/when native hashing becomes available in Fabric / Dataflows, this implementation can be internalised without changing the model or downstream dependencies

Summary

util.identity_personid provides the canonical identity resolution layer for the Person domain.

It transforms multiple source-system representations into a single, stable PERSON_UID, enabling consistent, secure, and scalable person-level modelling across the Data Warehouse.

This is based on the table csms.feedbackresponses, there is already a view feedback.requests based on csms.feedbackrequests

This may need to be called "Feedback.Requests" not Responses

Changes to Person

{"Current stable version":"202409","exec":"@version = ‘202409’","Development" : "@version = ‘beta’","Change":"Regional View"}

{"Current stable version":"202409","exec":"@version = ‘202409’","Development" : "@version = ‘beta’","Change":"Stable 09 View"}

Expected Condition The DataMart View BusOpp.Dates is in the style of 202409 As Found Condition The unversioned (beta) edition of…

Overview

Purpose

How It Works

1. Extract Source Email Identities

2. Enrich with Person Details

3. Clean and Standardise

4. Apply Survivorship Ranking

5. Resolve to Canonical PERSON_UID

6. Deduplicate to One Row per Person

Design Principles

Canonical Where Possible, Fallback Where Necessary

Email as the Linking Mechanism

Deterministic Survivorship

Privacy-Aware Linkage

Separated Identity Architecture

What This Enables

Known Constraints & Design Notes

1. Bridge Dependency

2. Email-Led Logic

3. Survivorship is Contextual

4. Hashing Implementation Constraint

Summary

Leave a Comment Cancel reply

5. Resolve to Canonical `PERSON_UID`