This draft framework provides a robust SLA for DataMart availability, balancing business needs, technical feasibility, and stakeholder communication.
1. Availability Target
Set a clear uptime percentage that aligns with the criticality of the DataMart to your business processes.
- Example:
- The DataMart must maintain 99% availability per month.
- Define the allowed downtime (e.g., ~7.3 hours/month for 99% uptime).
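To show the arithmetic behind the downtime allowance, here is a minimal Python sketch that turns an availability percentage into a downtime budget. The function name and the 730.5-hour average month (365.25 days × 24 h ÷ 12) are illustrative assumptions. A team that measures per calendar month would pass that month's actual hours instead.

```python
from datetime import timedelta

# Assumed average month length: 365.25 days * 24 h / 12 months = 730.5 hours.
AVG_MONTH_HOURS = 365.25 * 24 / 12


def allowed_downtime(availability_pct: float, period_hours: float) -> timedelta:
    """Return the downtime budget implied by an availability target over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    # The unavailable fraction of the period is the downtime budget.
    return timedelta(hours=period_hours * (1 - availability_pct / 100))


budget = allowed_downtime(99.0, AVG_MONTH_HOURS)
print(f"{budget.total_seconds() / 3600:.1f} hours")  # prints "7.3 hours"
```

For a 30-day (720-hour) month, the same 99% target allows 7.2 hours.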
2. Operational Hours
Define the expected operational hours of the DataMart. Specify whether it must be available 24/7 or only during specific business hours.
- Example:
- “The DataMart is expected to be available 24/7, excluding planned maintenance windows.”
- “Critical periods are 07:00-18:00 (GMT) on weekdays.”
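Monitoring and alerting rules often need to know whether an incident falls inside the critical period. The sketch below encodes the example window (07:00 to 18:00 GMT, Monday to Friday). It assumes GMT is treated as UTC with no daylight-saving adjustment and that timestamps are timezone-aware. The function name is hypothetical.

```python
from datetime import datetime, time, timezone

# Assumed encoding of the example: critical period 07:00-18:00 GMT (treated as UTC), weekdays only.
CRITICAL_START = time(7, 0)
CRITICAL_END = time(18, 0)  # exclusive upper bound


def in_critical_period(ts: datetime) -> bool:
    """Return True if a timezone-aware timestamp falls within the weekday critical period."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = ts.astimezone(timezone.utc)
    is_weekday = utc.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and CRITICAL_START <= utc.time() < CRITICAL_END
```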
3. Planned Maintenance
Outline the process and expectations for planned downtime.
- Example:
- “Planned maintenance must be scheduled at least 7 days in advance, and the maintenance window must not exceed 4 hours per occurrence unless approved.”
- “Maintenance should occur during low-traffic periods, typically between 19:00 and 00:00 (GMT).”
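The maintenance rules above can be checked automatically when a change request is raised. This is a minimal sketch under stated assumptions. `MaintenanceRequest` and `check_maintenance` are hypothetical names. All timestamps are assumed to be in GMT and either all naive or all timezone-aware. The 19:00-00:00 window is reported as advisory because the example says "should".

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta

MIN_NOTICE = timedelta(days=7)
MAX_DURATION = timedelta(hours=4)
WINDOW_START = time(19, 0)  # preferred low-traffic window opens at 19:00 GMT


@dataclass
class MaintenanceRequest:
    requested_at: datetime
    start: datetime
    end: datetime
    approved_exception: bool = False  # True when a window longer than 4 hours was approved


def check_maintenance(req: MaintenanceRequest) -> list[str]:
    """Return a list of rule violations (empty if the request complies)."""
    issues = []
    if req.end <= req.start:
        return ["end must be after start"]
    if req.start - req.requested_at < MIN_NOTICE:
        issues.append("scheduled with less than 7 days' notice")
    if req.end - req.start > MAX_DURATION and not req.approved_exception:
        issues.append("exceeds 4 hours without approval")
    # Advisory: the window should start at or after 19:00 and finish by midnight that night.
    midnight = datetime.combine(req.start.date() + timedelta(days=1), time(0, 0), tzinfo=req.start.tzinfo)
    if req.start.time() < WINDOW_START or req.end > midnight:
        issues.append("outside the preferred 19:00-00:00 GMT window")
    return issues
```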
4. Incident Response and Resolution
Define how quickly issues should be addressed and resolved based on their severity.
PRD (Production) Environment
(Aligns with Business Report Availability)
- Critical Incidents: DataMart is completely inaccessible, impacting business-critical reports or processes.
- Response Time: Within 1 hour.
- Resolution Time: Within 8 hours.
- High Incidents: Partial functionality issues (e.g., degraded performance or failure of key components) that impact significant reporting processes but have workarounds available.
- Response Time: Within 4 hours.
- Resolution Time: Within 2 business days.
- Low Incidents: Non-critical errors (e.g., minor data discrepancies, cosmetic issues, or low-priority feature failures) that do not impact core business operations.
- Response Time: Within 2 business days.
- Resolution Time: Within 5 business days.
DEV (Development) Environment
(Aligns with Agile Sprint Planning)
- Critical Incidents: DataMart is inaccessible, blocking development, testing, or deployment of features within the sprint.
- Response Time: Within 4 hours.
- Resolution Time: Within 2 business days.
- High Incidents: Partial functionality issues (e.g., key features are non-functional or workflows are impaired) impacting sprint goals but not blocking progress entirely.
- Response Time: Within 2 business days.
- Resolution Time: Within 5 business days.
- Low Incidents: Non-critical issues or errors (e.g., enhancements, minor bugs, or performance tuning requests) with minimal impact on sprint goals.
- Response Time: Within 5 business days.
- Resolution Time: Within 15 business days.
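A ticketing integration could hold the targets listed above as data and compute due times from them. The sketch below rests on several assumptions. Both response and resolution clocks are assumed to start when the incident is reported. "Hours" are treated as wall-clock hours. "Business days" skip Saturdays and Sundays but ignore public holidays. `Target`, `TARGETS`, and `sla_deadlines` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Target:
    amount: int
    unit: str  # "hours" (wall-clock) or "business_days" (Mon-Fri, holidays ignored)


# (environment, severity) -> (response target, resolution target), transcribed from the lists above.
TARGETS = {
    ("PRD", "critical"): (Target(1, "hours"), Target(8, "hours")),
    ("PRD", "high"): (Target(4, "hours"), Target(2, "business_days")),
    ("PRD", "low"): (Target(2, "business_days"), Target(5, "business_days")),
    ("DEV", "critical"): (Target(4, "hours"), Target(2, "business_days")),
    ("DEV", "high"): (Target(2, "business_days"), Target(5, "business_days")),
    ("DEV", "low"): (Target(5, "business_days"), Target(15, "business_days")),
}


def add_business_days(start: datetime, days: int) -> datetime:
    """Advance day by day, counting only Monday-Friday, keeping the time of day."""
    current, remaining = start, days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:
            remaining -= 1
    return current


def deadline(start: datetime, target: Target) -> datetime:
    if target.unit == "hours":
        return start + timedelta(hours=target.amount)
    if target.unit == "business_days":
        return add_business_days(start, target.amount)
    raise ValueError(f"unknown unit: {target.unit}")


def sla_deadlines(env: str, severity: str, reported_at: datetime) -> tuple[datetime, datetime]:
    """Return (response due, resolution due) for an incident."""
    response, resolution = TARGETS[(env, severity)]
    return deadline(reported_at, response), deadline(reported_at, resolution)
```

For example, a PRD high-severity incident reported on a Friday at 16:00 would need a response by 20:00 that Friday and a resolution by 16:00 the following Tuesday.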
5. Performance Metrics
Set thresholds for DataMart performance to ensure usability.
- Example:
- 95% of queries executed directly against the DataMart should complete within 10 seconds.
- Load/refresh processes should complete within 30 minutes of their scheduled time.
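Both performance thresholds can be evaluated from logged timings. The sketch below tests the query target by counting the share of queries at or under 10 seconds. This avoids ambiguity between percentile interpolation methods. It interprets "within 30 minutes of their scheduled time" as measured from the scheduled start time. That reading is an assumption, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Sequence


def query_target_met(durations_s: Sequence[float], threshold_s: float = 10.0,
                     required_share: float = 0.95) -> bool:
    """True if at least `required_share` of queries finished within `threshold_s` seconds."""
    if not durations_s:
        raise ValueError("no query durations supplied")
    within = sum(1 for d in durations_s if d <= threshold_s)
    return within / len(durations_s) >= required_share


def refresh_on_time(scheduled: datetime, completed: datetime,
                    grace: timedelta = timedelta(minutes=30)) -> bool:
    """True if a load/refresh completed no later than `grace` after its scheduled time."""
    return completed <= scheduled + grace
```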
6. Data Availability
Define how often the DataMart is refreshed and how up-to-date the data must be.
- Example:
- “DataMart data must be refreshed every 24 hours by 06:00 (GMT). Any delays exceeding 1 hour must be communicated to stakeholders.”
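A scheduled freshness check could classify each day's refresh against the 06:00 GMT deadline and flag delays that require stakeholder communication. This minimal sketch assumes timezone-aware timestamps, with GMT treated as UTC. It also assumes "delay" is measured from the 06:00 deadline. `refresh_status` and its return labels are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Optional

REFRESH_DEADLINE = time(6, 0)        # 06:00 GMT, treated as UTC
NOTIFY_AFTER = timedelta(hours=1)    # delays beyond this must be communicated


def refresh_status(day: date, completed_at: Optional[datetime], now: datetime) -> str:
    """Return 'on_time', 'pending', 'late', or 'notify' for the given day's refresh.

    `completed_at` is None if the refresh has not finished yet; timestamps must be tz-aware.
    """
    deadline = datetime.combine(day, REFRESH_DEADLINE, tzinfo=timezone.utc)
    finished = completed_at if completed_at is not None else now
    delay = finished - deadline
    if delay <= timedelta(0):
        return "on_time" if completed_at is not None else "pending"
    return "notify" if delay > NOTIFY_AFTER else "late"
```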
7. Communication and Escalation
Establish protocols for informing stakeholders about issues or changes.
- Example:
- “Notification of unplanned downtime must be sent to stakeholders via email or Teams within 1 hour of detection.”
- “Weekly performance reports will include uptime metrics, incident summaries, and upcoming maintenance schedules.”
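The uptime figure in a weekly report can be computed from recorded outage intervals. The sketch below clips each outage to the reporting period and merges overlapping outages so that no downtime is counted twice. It assumes only unplanned outages are passed in, consistent with the 24/7 target excluding planned maintenance. It also assumes availability is computed as (period − downtime) ÷ period. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def merged_duration(intervals: list[tuple[datetime, datetime]],
                    period_start: datetime, period_end: datetime) -> timedelta:
    """Total time covered by intervals, clipped to the period, with overlaps counted once."""
    clipped = sorted(
        (max(s, period_start), min(e, period_end))
        for s, e in intervals
        if e > period_start and s < period_end
    )
    total = timedelta(0)
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this interval: close the current run and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or touching interval: extend the current run.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def availability_pct(unplanned_outages: list[tuple[datetime, datetime]],
                     period_start: datetime, period_end: datetime) -> float:
    """Availability over the period as a percentage, assuming planned maintenance is excluded."""
    period = period_end - period_start
    down = merged_duration(unplanned_outages, period_start, period_end)
    return 100 * (1 - down / period)
```

Two overlapping outages from 10:00 to 12:00 and from 11:00 to 13:00 therefore count as 3 hours, not 4, which gives roughly 98.2% availability for a 168-hour week.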
8. Service Corrective Actions
Consider corrective actions for unmet SLAs to ensure accountability.
- Example:
- “Repeated SLA breaches over 3 consecutive months may trigger a review or additional corrective actions.”
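The review trigger can be checked mechanically from a monthly compliance history. This sketch assumes "breaches over 3 consecutive months" means the SLA was missed in each of three consecutive months. It also assumes the history is ordered from oldest to newest. `needs_review` is a hypothetical name.

```python
from typing import Sequence


def needs_review(monthly_met: Sequence[bool], run_length: int = 3) -> bool:
    """True if the SLA was missed in `run_length` consecutive months (oldest month first)."""
    streak = 0
    for met in monthly_met:
        # Reset the streak on a compliant month, otherwise extend it.
        streak = 0 if met else streak + 1
        if streak >= run_length:
            return True
    return False
```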
9. Exclusions
Specify scenarios where the SLA does not apply.
- Examples:
- Issues caused by third-party services (e.g., Azure outages).
- User errors or queries exceeding system capacity without prior discussion.
- Force majeure events (natural disasters, widespread power outages, etc.).
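Exclusions take effect only if incidents are labelled with a cause before SLA figures are calculated. The sketch below filters out excluded incidents and sums the remaining downtime. The cause labels, the `Incident` fields, and the function names are all hypothetical. A real incident tool would supply its own taxonomy. The sum also assumes the incidents do not overlap. Overlapping records would need merging first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical cause labels mirroring the exclusions above.
EXCLUDED_CAUSES = {"third_party", "user_error", "force_majeure"}


@dataclass
class Incident:
    incident_id: str
    start: datetime
    end: datetime
    cause: str  # root-cause category assigned during incident review


def countable_incidents(incidents: list[Incident]) -> list[Incident]:
    """Drop incidents whose cause falls under an SLA exclusion."""
    return [i for i in incidents if i.cause not in EXCLUDED_CAUSES]


def countable_downtime(incidents: list[Incident]) -> timedelta:
    """Sum downtime of non-excluded incidents (assumes the incidents do not overlap)."""
    return sum((i.end - i.start for i in countable_incidents(incidents)), timedelta(0))
```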
10. Review and Improvement
Include a provision for periodic SLA reviews to adapt to changing business requirements.
- Example:
- “SLAs will be reviewed quarterly to ensure they remain aligned with business needs and technical capabilities.”