Data Integration Standard

Issued by: Data & Analytics Team
Applies to: All data integrations, pipelines, and views entering or leaving the Data Warehouse
Status: Approved Draft
Date: [Insert Date]
Version: 1.0


1. Purpose

This standard defines the structure and principles for integrating data from source systems—including ERP (IFS), Dynamics 365, Power Apps, and SharePoint—into the enterprise Data Warehouse.

It ensures that data can be ingested with minimal transformation, remains joinable and stable, and achieves consistent quality, traceability, and trust.

Objectives:

  • Enable low-friction ingestion across diverse systems.
  • Define stable identifiers and consistent relationships.
  • Prevent brittle joins on labels or non-unique values.
  • Establish measurable data quality criteria (target score ≥ 7 / 10).

2. Scope

Applies to:

  • All source-to-target integrations across the medallion architecture (Bronze → Silver → Gold → Platinum).
  • All business domains (Project, Employee, Customer, Supplier, Finance, CRM).
  • All environments (Development, Test, Production).

Excludes:

  • Operational process redesign within source systems.
  • Enforcement of ERP business logic at source.

3. Principles

  1. Clarity of identity – every record has a stable and auditable key.
  2. Consistency of structure – like entities use like field names and types.
  3. Minimal transformation – ingestion logic limited to conformance only.
  4. Integrity of relationships – links between tables are explicit and testable.
  5. History preserved – change is versioned, not overwritten.
  6. Data contract first – structure agreed before load.
  7. Measurable quality – scored against the Data Quality Rubric.
  8. Look in the box first – always inspect and respect the source system structures before designing transformations or new views.
    • Use the fields, relationships, and identifiers as provided by the source system; do not invent new keys or constructs where suitable ones already exist.
    • Validate how data behaves in practice — confirm that joins, lookups, and field values are reliable.
    • Document any anomalies or deviations before creating derived tables or mappings.

Principle: Data integrity starts with understanding and respecting the source.
We do not “make it up” — we observe, confirm, and then design.


4. Identity and Keys

4.1 Company Identity

Each record must carry, or resolve to, the Company_ID using the ERP/IFS standard code.
Company_ID serves as the enterprise partition key.

4.2 Entity Identifiers

Entity   | Business Code               | Example
---------|-----------------------------|-----------------
Project  | Project_ID                  | D012345
Employee | Employee_Number (EMP_NO)    | E1234
Customer | Customer_Code (Customer_ID) | CUST-001
Supplier | Supplier_Code (Supplier_ID) | SUP-009
Person   | Email_Address               | name@uk.bmt.org

4.3 Source Key and Environment

Retain the native system identifier (e.g. GUID) as Source_Key, plus Source_System and Source_Env for lineage.

Key Stability: If a source key changes, treat it as a rotation event—record lineage but do not overwrite history.
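
A minimal sketch of how a rotation event might be detected and logged, assuming both the previous and current loads carry Company_ID, Business_Code and Source_Key; the Previous_Source_Key and Rotation_Detected_At columns are illustrative names, not mandated by this standard.

    import pandas as pd

    def detect_key_rotation(previous: pd.DataFrame, current: pd.DataFrame) -> pd.DataFrame:
        """Report rows whose native Source_Key changed between loads (a rotation event).
        The event is appended to a lineage log; history rows are never updated in place."""
        merged = current.merge(
            previous[["Company_ID", "Business_Code", "Source_Key"]],
            on=["Company_ID", "Business_Code"],
            how="inner",
            suffixes=("", "_prev"),
        )
        rotated = merged[merged["Source_Key"] != merged["Source_Key_prev"]].copy()
        rotated = rotated.rename(columns={"Source_Key_prev": "Previous_Source_Key"})
        rotated["Rotation_Detected_At"] = pd.Timestamp.now(tz="UTC")
        return rotated[["Company_ID", "Business_Code",
                        "Previous_Source_Key", "Source_Key", "Rotation_Detected_At"]]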

4.4 Warehouse Object Sequence

OBJECT_SEQ is generated from a hash of (Company_ID, Business_Code) and is used for joins within the warehouse.
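
A minimal sketch of how OBJECT_SEQ could be derived; SHA-256, upper-casing and the "|" delimiter are illustrative choices, and the agreed hashing scheme should be recorded in the data contract.

    import hashlib

    def object_seq(company_id: str, business_code: str) -> str:
        """Derive a deterministic OBJECT_SEQ from (Company_ID, Business_Code).
        Normalising case and whitespace keeps the key stable across loads."""
        raw = f"{company_id.strip().upper()}|{business_code.strip().upper()}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    # The same inputs always yield the same join key.
    assert object_seq("BMT01", "D012345") == object_seq(" bmt01 ", "d012345")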

4.5 Lookup Triplet Requirement

Every lookup must include:

  • Lookup_Key – system or numeric ID
  • Lookup_Code – coded/abbreviated value
  • Lookup_Description – readable label

Rule: Joins must use Lookup_Key or Lookup_Code, never the description.
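
The example below shows the triplet in practice using pandas; the status values are invented for illustration. The join uses Lookup_Code, so re-wording a description cannot break it.

    import pandas as pd

    # Illustrative lookup triplet for a project-status list (values invented).
    status_lookup = pd.DataFrame({
        "Lookup_Key":         [10, 20, 30],
        "Lookup_Code":        ["ACT", "HLD", "CLS"],
        "Lookup_Description": ["Active", "On Hold", "Closed"],
    })

    projects = pd.DataFrame({
        "Project_ID":  ["D012345", "D012346"],
        "Status_Code": ["ACT", "CLS"],
    })

    # Join on the code, never the description: labels can be re-worded at any
    # time without a change in meaning, which would silently break the join.
    joined = projects.merge(status_lookup, left_on="Status_Code",
                            right_on="Lookup_Code", how="left")
    print(joined[["Project_ID", "Lookup_Key", "Lookup_Description"]])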

4.6 Explicit Relationships

Declare all parent–child relationships in the data contract.
Logical foreign keys must be represented even if not enforced physically.
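
One way to make these relationships explicit and testable is to declare them as data in the contract itself; the fragment below is a hypothetical sketch, not a prescribed contract schema. A validation job can iterate over the list and confirm that every child key resolves to a parent, which feeds the FK-resolvable quality gate later in this standard.

    # Hypothetical contract fragment: each logical foreign key is declared as data
    # so it can be validated automatically, even though it is not enforced physically.
    RELATIONSHIPS = [
        {
            "child_table":  "Silver.Project_Task",
            "child_key":    "Project_ID",
            "parent_table": "Silver.Project",
            "parent_key":   "Project_ID",
            "cardinality":  "many-to-one",
        },
        {
            "child_table":  "Silver.Project",
            "child_key":    "Customer_Code",
            "parent_table": "Silver.Customer",
            "parent_key":   "Customer_Code",
            "cardinality":  "many-to-one",
        },
    ]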

4.7 Orphan Handling

Where a record does not meet the relationship or join rules (for example, a child record with no valid parent key):

  • The record will not be loaded into the conformed layer (Silver or above).
  • The dataset is treated as filtered rather than reconciled — such records are simply excluded from the joined output.
  • If a table within the dataflow is known to have suspect or incomplete joins, a raw, unjoined version of that table will also be submitted to Bronze for inspection and traceability.
  • This ensures visibility of all source data without polluting the conformed dataset.

Note:

Excluded records may later reappear once the parent relationship is corrected at source. These will be treated as new valid entries, not reinstatements of previous orphans.

The raw Bronze copy provides an audit trail for review but is not intended for end-user reporting.
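
A sketch of the split described above, assuming pandas: child rows whose foreign key resolves go forward to the conformed layer, the rest are excluded, and the full unjoined table is still landed in Bronze for inspection. Table and function names are illustrative.

    import pandas as pd

    def split_orphans(children: pd.DataFrame, parents: pd.DataFrame,
                      fk: str, pk: str) -> tuple[pd.DataFrame, pd.DataFrame]:
        """Return (conformed, orphans). Orphans are filtered out, not repaired:
        they do not enter Silver, but remain visible in the raw Bronze copy."""
        resolvable = children[fk].isin(parents[pk])
        return children[resolvable].copy(), children[~resolvable].copy()

    # Usage sketch: land the untouched extract in Bronze first, then conform.
    # bronze_write(children)   # raw, unjoined copy for audit (illustrative call)
    # conformed, orphans = split_orphans(children, parents, "Project_ID", "Project_ID")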


5. History and Change Management

5.1 Slowly Changing Dimensions (SCD2)

Use SCD2 for all descriptive data:

  • Effective_From, Effective_To, Is_Current.
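
A minimal SCD2 sketch, assuming both the dimension and the incoming load carry a Row_Hash column for change detection; real pipelines would typically use the warehouse's own merge tooling.

    import pandas as pd

    HIGH_DATE = pd.Timestamp("9999-12-31")

    def apply_scd2(dim: pd.DataFrame, incoming: pd.DataFrame,
                   key: str, load_ts: pd.Timestamp) -> pd.DataFrame:
        """Close superseded current rows and append new versions (SCD Type 2)."""
        dim = dim.copy()
        current = dim[dim["Is_Current"]]
        merged = incoming.merge(current[[key, "Row_Hash"]], on=key,
                                how="left", suffixes=("", "_dim"))
        changed = merged.loc[merged["Row_Hash"] != merged["Row_Hash_dim"], key]

        # Expire the old versions rather than overwriting them.
        expire = dim[key].isin(changed) & dim["Is_Current"]
        dim.loc[expire, "Effective_To"] = load_ts
        dim.loc[expire, "Is_Current"] = False

        # Append changed and brand-new rows as the current versions.
        new_rows = incoming[incoming[key].isin(changed)].copy()
        new_rows["Effective_From"] = load_ts
        new_rows["Effective_To"] = HIGH_DATE
        new_rows["Is_Current"] = True
        return pd.concat([dim, new_rows], ignore_index=True)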

5.2 Status and Ownership

Volatile fields (e.g. project status, owner) are captured in separate history tables or as change events.
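
A sketch of the change-event approach, assuming full snapshots of the volatile field are compared between loads; column names beyond the key and field are illustrative.

    import pandas as pd

    def field_change_events(previous: pd.DataFrame, current: pd.DataFrame,
                            key: str, field: str) -> pd.DataFrame:
        """Emit one row per change in a volatile field (e.g. project status or owner),
        keeping the descriptive dimension slim while preserving the change history."""
        merged = current[[key, field]].merge(previous[[key, field]], on=key,
                                             how="inner", suffixes=("_new", "_old"))
        events = merged[merged[f"{field}_new"] != merged[f"{field}_old"]].copy()
        events["Changed_At"] = pd.Timestamp.now(tz="UTC")
        return events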

5.3 Soft Deletes & Row Presence Verification

Apply a soft delete when a source row disappears from the feed.
Use RPV (Row Presence Verification) logic to detect missing or stale rows across loads.
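
A possible RPV check, assuming each load provides a complete snapshot of source keys; rows present in the warehouse but absent from the snapshot are flagged, never physically deleted. Is_Deleted and Deleted_At are illustrative column names.

    import pandas as pd

    def soft_delete_missing(dim: pd.DataFrame, snapshot_keys: pd.Series,
                            key: str, load_ts: pd.Timestamp) -> pd.DataFrame:
        """Row Presence Verification: flag current rows whose key no longer
        appears in the latest source snapshot."""
        out = dim.copy()
        missing = out["Is_Current"] & ~out[key].isin(snapshot_keys)
        out.loc[missing, "Is_Deleted"] = True
        out.loc[missing, "Deleted_At"] = load_ts
        return out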


6. Ingestion and Conformance

6.1 Bronze (Raw) Layer

One-to-one with source. Add:

  • Load_Timestamp
  • Source_System
  • Source_Env
  • Checksum / Row_Hash
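
A sketch of stamping these columns onto a raw extract, assuming the row hash is computed over every source column in a fixed order; the hashing scheme itself is an implementation choice.

    import hashlib
    import pandas as pd

    def stamp_bronze(raw: pd.DataFrame, source_system: str, source_env: str) -> pd.DataFrame:
        """Add the Bronze lineage columns without altering the source payload."""
        out = raw.copy()
        out["Load_Timestamp"] = pd.Timestamp.now(tz="UTC")
        out["Source_System"] = source_system
        out["Source_Env"] = source_env
        # Row_Hash over all source columns, in a fixed order, for change detection.
        src_cols = list(raw.columns)
        out["Row_Hash"] = (
            out[src_cols].astype(str).agg("|".join, axis=1)
            .map(lambda s: hashlib.sha256(s.encode("utf-8")).hexdigest())
        )
        return out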

6.2 Silver (Aligned) Layer

Adds:

  • Business codes + surrogate keys
  • Lookup triplets
  • Explicit relationships
  • SCD and RPV scaffolding

No business calculations permitted.

6.3 Gold (Delivery) Layer

Domain-specific marts with metrics and derived logic.

6.4 Platinum (Advanced Analytics)

Curated ML/AI datasets; high-trust aggregates only.


7. Data Quality Gates and Scoring

7.1 Rubric Overview

Score | Descriptor  | Meaning
------|-------------|--------------------------------------------------------------------------------
1     | Broken      | Report/view does not work, fails to run, or gives unusable results.
2     | Unreliable  | Runs but contains serious errors, missing data, or misaligned logic.
3     | Poor        | Works but is inaccurate or very incomplete; unsuitable for business use.
4     | Weak        | Partial data is correct, but structure or reliability prevents confidence.
5     | Adequate    | Meets minimum needs; functional but clunky or limited in scope.
6     | Fair        | Generally correct, but with notable gaps, inefficiencies, or usability issues.
7     | Good        | Produces reliable and accurate results; some improvements desirable.
8     | Strong      | Well-structured, accurate, and trusted; minor refinements could enhance it further.
9     | Excellent   | High-quality, efficient, user-friendly, and widely trusted.
10    | Outstanding | Best-in-class, a model of design and usability; requires no improvement.

7.2 Gate Criteria

Check                   | Target      | Action
------------------------|-------------|---------------------
Company_ID completeness | 100 %       | Fail if missing
Lookup triplets present | 100 %       | Warn if < 100 %
FK resolvable           | ≥ 99.5 %    | Quarantine if below
Key stability           | No mutation | Audit rotation
SCD2 validity           | ≥ 99 %      | Warn if below
RPV freshness           | Within SLA  | Flag stale
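
Two of the gates above, sketched as standalone checks in pandas; the thresholds come from the table, while the outcome labels (pass, fail, quarantine) and function names are illustrative.

    import pandas as pd

    def company_id_gate(table: pd.DataFrame) -> str:
        """Company_ID completeness must be 100 %; anything less fails the load."""
        complete = table["Company_ID"].notna().mean()
        return "pass" if complete == 1.0 else "fail"

    def fk_gate(children: pd.DataFrame, parents: pd.DataFrame,
                fk: str, pk: str, threshold: float = 0.995) -> str:
        """At least 99.5 % of foreign keys must resolve; below that, the
        unresolved rows are quarantined for investigation."""
        if children.empty:
            return "pass"
        resolved = children[fk].isin(parents[pk]).mean()
        return "pass" if resolved >= threshold else "quarantine"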

8. Governance and Change Control

DataMart design and review must follow the Playbook: DataMart & Reporting Development Process to ensure consistency of standards, quality, and approval steps.

  • Design Review – mandatory for new or changed integrations.
  • Versioning – pipelines and contracts carry semantic versions (e.g. 1.2.0).
  • Issue Workflow – Detect → Quarantine → Investigate → Resolve → Re-release.
  • Documentation – All artefacts linked in KnowHow and myBMT with version tag.

Exceptions and Waivers

  • Exceptions must be formally documented and approved by Data Governance prior to implementation.
  • Temporary waivers (e.g. missing lookup codes or unresolved joins) must include a remediation plan and target resolution date.
  • Exceptions and waivers identified during the DataMart view design stage may cause the project to be formally blocked until an acceptable resolution or mitigation is agreed with Data Governance.
  • Waivers older than 90 days will be reviewed for closure or escalation.

9. Responsibilities

Role              | Responsibility
------------------|---------------------------------------------
Data Engineer     | Implement and maintain conformance.
Data Owner        | Ensure stable identifiers and completeness.
App Developer     | Expose required keys and lookup values.
Data Governance   | Review quality scores; approve exceptions.
Head of Reporting | Approve Gold-layer release for publication.

10. References

  • DAMA-DMBOK v2 – Data Management Body of Knowledge
  • ISO 8000 – Data Quality Standard
  • Kimball Group – Dimensional Modelling Techniques
  • Microsoft Dataverse Design Guidance

11. Review and Maintenance

  • Reviewed annually or after major architecture change.
  • Updates submitted via the Data & Analytics Governance Board.
