Identify Source Type and Origin: The first step is to determine the type and origin of the data source. This could be an external source like IFS, where data is automatically loaded into a container by a third party, or an internal source that we can directly access, such as the HubSpot API or a SharePoint list.
Data Ingestion and Storage:
Pipeline Execution: Run a dedicated pipeline for each data source (e.g., IFS, HubSpot, SharePoint) to transfer the data from its source into the bronze container in Parquet format. Parquet is chosen for its storage efficiency and improved data processing performance.
Data Archiving: The data from each source is also archived within the bronze container to ensure a backup is available for any future reference or reprocessing needs.
Data Staging:
Bronze to Staging Pipeline: Trigger the bronze-to-staging pipeline to move data from the bronze container into the staging database. This process is automated to ensure that the staging database is consistently updated with fresh data.
View Creation:
Field Requirement Gathering: Engage in discussions with the customer to understand their specific field requirements. This may involve joining data from multiple tables to ensure all necessary fields are included.
SQL Query Development: Write an SQL query to generate the required view based on the customer’s needs. This query will define how data is extracted and presented in the final view.
Automated Deployment:
DevOps Integration: Once the SQL query is finalized, commit it to the DevOps repository. This action triggers an automated release pipeline that deploys the latest queries to a container, preparing them for use by the snapshot pipeline.
Snapshot Creation and Storage:
Snapshot Pipeline Execution: The snapshot pipeline executes the committed SQL queries to create data snapshots. These snapshots are then transferred to the gold container and ultimately stored in the global Data Warehouse (DWH) under the snp schema.
Power BI Integration: The snapshots are now available for use in Power BI reports through the Exec get.myview command, allowing for real-time data analysis and reporting.