π― Purpose
This framework ensures that values in complex metadata structures (e.g. project earned value, invoice accounting, or employee attributes) are verified for presence even when using incremental loads. It protects against stale data persistence or silent value loss, using metadata-driven auditing.
π§© Problem Context
In many CDM views, values are upserted using procedures like cdm.patch_upsert
. These depend on:
- Change detection via
CHECKSUM
- Soft-deletion (expiry) when values change
However, if a value disappears from the source (e.g. employee is unassigned, or project status is removed) and no update occurs, it may remain active unless explicitly expired.
This issue is exacerbated with incremental loads, which may not include all rows every run.
β Framework Components
1. log.value_presence_seen
Table
Tracks the last seen date of each value (by object, meta type, attribute).
CREATE TABLE [log].[value_presence_seen] (
OBJECT_SEQ VARCHAR(256) NOT NULL,
META_TYPE VARCHAR(32) NOT NULL,
ATTRIBUTE VARCHAR(64) NOT NULL,
LAST_SEEN_DATE DATE NOT NULL,
LAST_SEEN_RUN_ID VARCHAR(32) NULL,
SEEN_COUNT INT DEFAULT 1,
CONSTRAINT PK_value_seen PRIMARY KEY (OBJECT_SEQ, META_TYPE, ATTRIBUTE)
);
2. Upsert Logic for Seen Rows
Add this inside your loader or patch routine to record presence:
MERGE log.value_presence_seen AS target
USING (SELECT @object_seq, @meta_type, @attribute) AS source(OBJECT_SEQ, META_TYPE, ATTRIBUTE)
ON (target.OBJECT_SEQ = source.OBJECT_SEQ AND target.META_TYPE = source.META_TYPE AND target.ATTRIBUTE = source.ATTRIBUTE)
WHEN MATCHED THEN
UPDATE SET LAST_SEEN_DATE = GETDATE(),
SEEN_COUNT = target.SEEN_COUNT + 1
WHEN NOT MATCHED THEN
INSERT (OBJECT_SEQ, META_TYPE, ATTRIBUTE, LAST_SEEN_DATE, SEEN_COUNT)
VALUES (source.OBJECT_SEQ, source.META_TYPE, source.ATTRIBUTE, GETDATE(), 1);
3. cdm.reconcile_presence
Procedure
Reconciles values in CDM tables based on absence in log.value_presence_seen
.
UPDATE tgt
SET ACTIVE_FLAG = 0,
EXPIRY_DATE = GETDATE(),
EXPIRY_REASON = 'Not seen in > 7 days'
FROM cdm.project_values AS tgt
LEFT JOIN log.value_presence_seen AS seen
ON tgt.OBJECT_SEQ = seen.OBJECT_SEQ
AND tgt.META_TYPE = seen.META_TYPE
AND tgt.ATTRIBUTE = seen.ATTRIBUTE
WHERE tgt.ACTIVE_FLAG = 1
AND seen.LAST_SEEN_DATE < DATEADD(DAY, -7, GETDATE());
π Integration Into Pipeline
- During incremental load:
- Load source to staging (e.g.
stage.project_values_current
) - Run
cdm.patch_upsert
for new/changed values - Record presence in
log.value_presence_seen
- Load source to staging (e.g.
- Periodically (e.g. weekly):
- Run
cdm.reconcile_presence
to expire stale values - Optional: log expired keys to
log.value_presence_errors
- Run
π§ Best Practices
- Use
SEEN_COUNT
andLAST_SEEN_RUN_ID
for insights - Implement alerting for high volume of expired values
- Run RPV on a scheduled batch, not live/streamed insert
- Keep
log.value_presence_seen
trimmed if needed (e.g. via archival)
π Related Objects
[cdm].[patch_upsert]
[cdm].[post_insert]
[cdm].[put_update]
[log].[value_presence_seen]
[stage].[project_values_current]
[cdm].[reconcile_presence]