Objective: Efficiently acquire data from various sources and ensure secure, accurate, and timely transport to the data lakehouse and data warehouse.
Steps:
Data Source Identification and Mapping
- Action: Identify relevant data sources, including databases, applications, and third-party sources.
- Deliverable: Data source inventory, data mapping documents.
- Control Measures: All data sources SHALL be described in myBMT; the EA solution SHALL map enterprise interactions.
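A minimal sketch of how a data source inventory could be checked for completeness before it is registered. The `DataSource` record, the `ALLOWED_KINDS` set, and the `inventory_gaps` function are hypothetical names introduced for illustration; the sketch does not model myBMT or the EA solution, whose interfaces are not described here.

```python
from dataclasses import dataclass, field

# Assumption: the three source types named in the Action line.
ALLOWED_KINDS = {"database", "application", "third_party"}


@dataclass
class DataSource:
    name: str
    kind: str
    description: str = ""
    # Source field name -> target column name in the lakehouse.
    field_mapping: dict = field(default_factory=dict)


def inventory_gaps(sources):
    """Return human-readable problems found in the inventory."""
    gaps = []
    for s in sources:
        if s.kind not in ALLOWED_KINDS:
            gaps.append(f"{s.name}: unknown kind '{s.kind}'")
        if not s.description.strip():
            gaps.append(f"{s.name}: missing description")
        if not s.field_mapping:
            gaps.append(f"{s.name}: no field mapping")
    return gaps


# Example inventory (made-up sources, for illustration only).
inventory = [
    DataSource("crm_db", "database", "Customer master data",
               {"cust_id": "customer_id"}),
    DataSource("weather_api", "third_party"),
]
gaps = inventory_gaps(inventory)
for gap in gaps:
    print(gap)
```

An empty result would indicate that every source carries a description and a mapping, which is one way to evidence the "SHALL be described" control.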
Data Profiling and Classification
- Action: Profile data sources to understand data quality and classify based on sensitivity.
- Deliverable: Data profiling reports, classification tags.
- Control Measures: Sensitivity levels are defined by the DataMart Owner; data is tagged with its classification; metadata is managed in myBMT.
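The profiling and tagging step could look roughly like the sketch below. The sensitivity labels and the column-name rules in `SENSITIVITY_RULES` are placeholders; in practice the levels come from the DataMart Owner. The `profile` and `classify` functions are hypothetical helpers, not part of any existing tool.

```python
def profile(rows):
    """Compute null rate and distinct count per column for a list of dict rows."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing values.
        non_null = [v for v in values if v is not None and v != ""]
        report[col] = {
            "null_rate": 1 - len(non_null) / len(values),
            "distinct": len(set(non_null)),
        }
    return report


# Placeholder rules: column name -> sensitivity level (assumed labels).
SENSITIVITY_RULES = {"email": "confidential", "iban": "restricted"}


def classify(columns, rules=SENSITIVITY_RULES, default="internal"):
    """Tag each column with a sensitivity level, falling back to a default."""
    return {col: rules.get(col, default) for col in columns}


# Made-up sample rows, for illustration only.
sample_rows = [
    {"id": 1, "email": "a@example.org"},
    {"id": 2, "email": None},
]
profile_report = profile(sample_rows)
tags = classify(profile_report.keys())
print(profile_report)
print(tags)
```

Name-based rules are only a starting point; columns that the rules miss still need review by the owner before the tags are recorded as metadata.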
Data Acquisition and Integration
- Action: Use ETL processes to acquire data and transport it to the data lakehouse.
- Deliverable: ETL process documentation, transfer logs.
- Control Measures: Encryption of sensitive data, archiving of source data before processing, transfer logs written at each pipeline stage.
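The archiving and logging controls can be sketched as follows, assuming a file-based source and a directory standing in for the lakehouse. The function names (`archive_source`, `log_stage`, `acquire`) and the JSON Lines output are illustrative assumptions. Encryption is deliberately not shown, since it depends on a vetted cryptography library and key management that this document does not specify.

```python
import hashlib
import json
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def archive_source(source: Path, archive_dir: Path) -> dict:
    """Copy the untouched source file to the archive and record its checksum."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    shutil.copy2(source, target)
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    return {"archived_to": str(target), "sha256": digest}


def log_stage(log: list, stage: str, **details) -> None:
    """Append one transfer-log entry with a UTC timestamp."""
    log.append({"stage": stage,
                "at": datetime.now(timezone.utc).isoformat(),
                **details})


def acquire(source: Path, archive_dir: Path, lakehouse_dir: Path) -> list:
    """Archive, extract, and load one source file, logging every stage."""
    log = []
    # Archive before any processing touches the data.
    log_stage(log, "archive", **archive_source(source, archive_dir))
    lines = source.read_text(encoding="utf-8").splitlines()
    rows = [line for line in lines if line.strip()]
    log_stage(log, "extract", rows=len(rows))
    lakehouse_dir.mkdir(parents=True, exist_ok=True)
    out = lakehouse_dir / (source.stem + ".jsonl")
    out.write_text("\n".join(json.dumps({"raw": r}) for r in rows),
                   encoding="utf-8")
    log_stage(log, "load", rows=len(rows), target=str(out))
    return log


# Demonstration in a temporary directory with made-up data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "orders.csv"
    source.write_text("1,9.50\n2,20.00\n", encoding="utf-8")
    transfer_log = acquire(source, root / "archive", root / "lakehouse")
    archived_ok = (root / "archive" / "orders.csv").exists()

for entry in transfer_log:
    print(entry["stage"], entry.get("rows", ""))
```

Recording the row count at both extract and load makes a silent loss between stages visible in the log itself.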
Transport to Data Warehouse
- Action: Orchestrate the secure and accurate transfer of data from the lakehouse to the data warehouse.
- Deliverable: Data warehouse integration reports.
- Control Measures: Continuous monitoring of transfers; execution logs that identify external agents and calling users.
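As an illustration of an execution log that identifies the external agent and the calling user, the sketch below copies one table between two in-memory SQLite databases that stand in for the lakehouse and the warehouse. The table layout, the `transfer` function, and the agent and user names are assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timezone


def transfer(lake, warehouse, table, *, agent, user, execution_log):
    """Copy one table and append an execution-log entry, even on failure."""
    entry = {"table": table, "agent": agent, "user": user,
             "started_at": datetime.now(timezone.utc).isoformat()}
    try:
        # The table name is interpolated directly; this sketch assumes it
        # comes from trusted configuration, never from user input.
        rows = lake.execute(f"SELECT id, amount FROM {table}").fetchall()
        with warehouse:  # commits on success, rolls back on error
            warehouse.executemany(
                f"INSERT INTO {table} (id, amount) VALUES (?, ?)", rows)
        loaded = warehouse.execute(
            f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        entry.update(status="ok", rows_read=len(rows), rows_in_target=loaded)
    except sqlite3.Error as exc:
        entry.update(status="failed", error=str(exc))
        raise
    finally:
        execution_log.append(entry)
    return entry


# Stand-in databases with made-up data.
lake = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")
for conn in (lake, warehouse):
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
lake.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

execution_log = []
result = transfer(lake, warehouse, "sales",
                  agent="nightly-scheduler", user="etl_service",
                  execution_log=execution_log)
print(result["status"], result["rows_read"], result["rows_in_target"])
```

Writing the log entry in the `finally` clause means failed runs are recorded too, which is what a monitoring process would need to alert on.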
Testing and Validation
- Action: Perform integration testing to ensure the ETL process works end-to-end from source to warehouse.
- Deliverable: Test reports, validation results.
- Control Measures: Data views in DataMarts checked for errors, data accuracy validated.
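One possible shape for the end-to-end validation is a reconciliation between source rows and the rows visible in a DataMart view. The sketch compares row counts, an order-independent fingerprint, and required columns; `fingerprint`, `validate`, and the sample rows are hypothetical and not taken from an existing test suite.

```python
import hashlib


def fingerprint(rows):
    """Order-independent SHA-256 over the textual form of each row."""
    h = hashlib.sha256()
    for text in sorted(repr(tuple(r)) for r in rows):
        h.update(text.encode("utf-8"))
        h.update(b"\n")  # separator so adjacent rows cannot run together
    return h.hexdigest()


def validate(source_rows, warehouse_rows, required_idx=()):
    """Return a list of findings; an empty list means the checks passed."""
    findings = []
    if len(source_rows) != len(warehouse_rows):
        findings.append(
            f"row count mismatch: {len(source_rows)} vs {len(warehouse_rows)}")
    elif fingerprint(source_rows) != fingerprint(warehouse_rows):
        findings.append("content mismatch: fingerprints differ")
    # Check that required columns are populated in the warehouse view.
    for i, row in enumerate(warehouse_rows):
        for idx in required_idx:
            if row[idx] is None:
                findings.append(f"row {i}: required column {idx} is null")
    return findings


# Made-up rows: one faithful copy in a different order, one damaged copy.
source_rows = [(1, "a"), (2, "b")]
good_copy = [(2, "b"), (1, "a")]
bad_copy = [(1, "a"), (2, None)]

good_findings = validate(source_rows, good_copy, required_idx=(1,))
bad_findings = validate(source_rows, bad_copy, required_idx=(1,))
print(good_findings)
print(bad_findings)
```

The findings list can be written directly into the test report, so a failed run documents which check failed and where.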