Value Presence Verification (RPV) + Last-Seen Tracking Framework

🎯 Purpose

This framework ensures that values in complex metadata structures (e.g. project earned value, invoice accounting, or employee attributes) are verified for presence even when using incremental loads. It protects against stale data persistence or silent value loss, using metadata-driven auditing.


🧩 Problem Context

In many CDM views, values are upserted using procedures like cdm.patch_upsert. These depend on:

  • Change detection via CHECKSUM
  • Soft-deletion (expiry) when values change

However, if a value disappears from the source (e.g. employee is unassigned, or project status is removed) and no update occurs, it may remain active unless explicitly expired.

This issue is exacerbated with incremental loads, which may not include all rows every run.


βœ… Framework Components

1. log.value_presence_seen Table

Tracks the last seen date of each value (by object, meta type, attribute).

CREATE TABLE [log].[value_presence_seen] (
    OBJECT_SEQ       VARCHAR(256) NOT NULL,
    META_TYPE        VARCHAR(32)  NOT NULL,
    ATTRIBUTE        VARCHAR(64)  NOT NULL,

    LAST_SEEN_DATE   DATE         NOT NULL,
    LAST_SEEN_RUN_ID VARCHAR(32)  NULL,
    SEEN_COUNT       INT          DEFAULT 1,

    CONSTRAINT PK_value_seen PRIMARY KEY (OBJECT_SEQ, META_TYPE, ATTRIBUTE)
);

2. Upsert Logic for Seen Rows

Add this inside your loader or patch routine to record presence:

MERGE log.value_presence_seen AS target
USING (SELECT @object_seq, @meta_type, @attribute) AS source(OBJECT_SEQ, META_TYPE, ATTRIBUTE)
ON (target.OBJECT_SEQ = source.OBJECT_SEQ AND target.META_TYPE = source.META_TYPE AND target.ATTRIBUTE = source.ATTRIBUTE)
WHEN MATCHED THEN
    UPDATE SET LAST_SEEN_DATE = GETDATE(),
               SEEN_COUNT = target.SEEN_COUNT + 1
WHEN NOT MATCHED THEN
    INSERT (OBJECT_SEQ, META_TYPE, ATTRIBUTE, LAST_SEEN_DATE, SEEN_COUNT)
    VALUES (source.OBJECT_SEQ, source.META_TYPE, source.ATTRIBUTE, GETDATE(), 1);

3. cdm.reconcile_presence Procedure

Reconciles values in CDM tables based on absence in log.value_presence_seen.

UPDATE tgt
SET ACTIVE_FLAG = 0,
    EXPIRY_DATE = GETDATE(),
    EXPIRY_REASON = 'Not seen in > 7 days'
FROM cdm.project_values AS tgt
LEFT JOIN log.value_presence_seen AS seen
  ON tgt.OBJECT_SEQ = seen.OBJECT_SEQ
 AND tgt.META_TYPE = seen.META_TYPE
 AND tgt.ATTRIBUTE = seen.ATTRIBUTE
WHERE tgt.ACTIVE_FLAG = 1
  AND seen.LAST_SEEN_DATE < DATEADD(DAY, -7, GETDATE());

πŸ” Integration Into Pipeline

  1. During incremental load:
    • Load source to staging (e.g. stage.project_values_current)
    • Run cdm.patch_upsert for new/changed values
    • Record presence in log.value_presence_seen
  2. Periodically (e.g. weekly):
    • Run cdm.reconcile_presence to expire stale values
    • Optional: log expired keys to log.value_presence_errors

🧠 Best Practices

  • Use SEEN_COUNT and LAST_SEEN_RUN_ID for insights
  • Implement alerting for high volume of expired values
  • Run RPV on a scheduled batch, not live/streamed insert
  • Keep log.value_presence_seen trimmed if needed (e.g. via archival)

πŸ“Ž Related Objects

  • [cdm].[patch_upsert]
  • [cdm].[post_insert]
  • [cdm].[put_update]
  • [log].[value_presence_seen]
  • [stage].[project_values_current]
  • [cdm].[reconcile_presence]

Leave a Comment