A disaster recovery strategy focuses on restoring critical systems and infrastructure after a disruption. In this article, we’ll explore how to measure the preparedness of specific systems or assets for disaster recovery and calculate overall disaster recovery readiness by aggregating data from individual assets.
Beyond Disaster Recovery for IT
With an average of 4.2 data-related disruptions annually [1], we see a growing trend toward scaling IT disaster recovery and aligning it with the overall governance, risk, and compliance (GRC) function.
At the same time, an increasing number of disruptive events, especially extreme weather conditions, drives organizations to look beyond basic business continuity and consider a wider scope [2] of disaster recovery, including infrastructure recovery, facility recovery, operational recovery, and more.
The strategy implementation approach, based on an aligned strategy and functional scorecards, allows organizations to adapt the principles of an IT disaster recovery scorecard to this wider scope and integrate it into the company’s overall strategy.
Step 1. Business Impact and Risks Analysis
Start designing the disaster recovery scorecard with the following analysis and planning steps [3]:
- Business Impact Analysis to identify critical assets
- Risk Analysis to identify potential effects of uncertainty on the business
- Scenario Planning to explore how identified risks might impact critical assets
Depending on the complexity of the systems and stakeholders involved, the analysis process can be formalized using functional scorecards. For example:
- Business Continuity Scorecard with asset mapping and incident tracking.
- Central Risk Register with risk identification and analysis.
- Scenario Scorecard for analyzing early warning indicators and planning response strategies.
Step 2. Establish Recovery Point and Recovery Time Objectives
In this step, we establish:
- Recovery Point Objective (acceptable loss, such as acceptable data loss), and
- Recovery Time Objective (acceptable operation downtime).
For scorecard calculations, we will first define metrics for a specific system or asset and then combine them into an overall compliance score.
Disaster Recovery KPIs for a Specific System
When focusing on disaster recovery in IT, organizations may track metrics such as reliability, recovery time, and recovery point to evaluate and enhance their strategies.
Reliability Metrics
- Mean Time Between Failures (MTBF): Average operating time between failures of a repairable system.
- Mean Time to Failure (MTTF): Average operating time until a non-repairable system fails, i.e., its expected lifespan.
Recovery Time Metrics
- Mean Time to Recovery (MTTR)
- Recovery Time Objective (RTO): Maximum allowable downtime after a disruption or the target MTTR.
Recovery Point Metrics
- Actual Backup Frequency
- Recovery Point Objective (RPO): Maximum acceptable data loss, expressed as a time window; it serves as the target for backup frequency.
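As a rough illustration of how the actual values for these metrics might be derived, the sketch below computes MTBF and MTTR from a simple incident log. The `Incident` record, the function names, and the sample numbers are assumptions made for this example; they are not part of the template.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """One outage of a repairable system (hypothetical record layout)."""
    downtime_hours: float  # time from failure until service was restored


def mtbf_hours(observation_hours: float, incidents: list[Incident]) -> float:
    """Mean Time Between Failures: total uptime divided by the number of failures."""
    if not incidents:
        raise ValueError("MTBF is undefined without at least one failure")
    uptime = observation_hours - sum(i.downtime_hours for i in incidents)
    return uptime / len(incidents)


def mttr_hours(incidents: list[Incident]) -> float:
    """Mean Time to Recovery: total downtime divided by the number of failures."""
    if not incidents:
        raise ValueError("MTTR is undefined without at least one failure")
    return sum(i.downtime_hours for i in incidents) / len(incidents)


# Hypothetical data: one year of observation (8,760 hours) with three outages.
incidents = [Incident(2.0), Incident(4.0), Incident(6.0)]
print(mtbf_hours(8760.0, incidents))  # (8760 - 12) / 3 = 2916.0
print(mttr_hours(incidents))          # 12 / 3 = 4.0
```

The resulting MTTR can then be compared with the RTO, and the actual backup interval with the RPO.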
Calculating Performance: Linear vs. Binary
There are two popular approaches for calculating the performance of disaster recovery metrics:
- Linear optimization function
- Binary optimization function
For example, reliability metrics in our template are configured as linear optimization functions. This means the performance gradually improves as the metric’s value increases from the baseline toward the target.
Example
The MTBF for a Customer Relationship Management (CRM) system has a target of 10,000 hours, with an actual value of 8,000 hours.
- Using a linear function, performance is calculated as 80% (= 8,000 / 10,000).
- Using a binary function, performance is 0% because the target of 10,000 hours was not reached.
Binary performance functions are often used for MTTR:
- If MTTR is less than or equal to the RTO, performance is 100%.
- If MTTR exceeds the RTO, performance is 0%.
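The two performance functions can be sketched in a few lines of Python. This is a minimal sketch, not the template's actual implementation: the function names are hypothetical, a baseline of 0 is assumed (which is what the 80% figure above implies), and clamping the linear result to the 0–100% range is an assumption of this example.

```python
def linear_performance(value: float, target: float, baseline: float = 0.0) -> float:
    """Share of the distance from baseline to target that has been covered.

    The result is clamped to the range 0.0..1.0 (an assumption of this sketch).
    """
    if target == baseline:
        raise ValueError("target and baseline must differ")
    progress = (value - baseline) / (target - baseline)
    return max(0.0, min(1.0, progress))


def binary_performance(value: float, target: float, lower_is_better: bool = False) -> float:
    """Return 1.0 if the target is met and 0.0 otherwise."""
    met = value <= target if lower_is_better else value >= target
    return 1.0 if met else 0.0


# MTBF example from the text: target 10,000 hours, actual 8,000 hours.
print(linear_performance(8000, 10000))  # 0.8
print(binary_performance(8000, 10000))  # 0.0

# MTTR against RTO (hypothetical values): MTTR of 3 hours, RTO of 4 hours.
print(binary_performance(3.0, 4.0, lower_is_better=True))  # 1.0
```

For MTTR, a lower value is better, which is why the comparison direction is passed as a parameter.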
Recovery Time Objective as a Separate Indicator
MTTR has a current value and a target value. The target value corresponds to the current “Recovery Time Objective (RTO)” indicator.
While it’s possible to remove the RTO indicator and set the target directly for MTTR, compliance and reporting requirements often call for tracking the two separately. Therefore, the RTO is maintained as a distinct indicator.
Risk Definitions
Formulating a disaster recovery strategy begins with a business impact and risk analysis. Some risks are recorded in a central risk register, while more specific risks can be linked to the disaster recovery scorecard for individual assets.
The key is ensuring a clear connection between impact or risk analysis findings and recovery metrics for specific business systems. For example, for the Web Servers asset, the risks of “Exploited Vulnerabilities” and “DDoS Attacks” were defined locally.
Continuous Monitoring of Disaster Recovery Strategy
Disaster recovery metrics evolve over time:
- Targets may be adjusted based on updated risk models.
- Actuals are updated with historical performance data.
Key considerations include:
- The frequency of metric updates or revisions.
- Handling periods without data, e.g., whether to inherit data or display only explicitly entered data.
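The choice between inheriting data and showing only explicitly entered data can be illustrated with a small sketch. The function name `fill_periods` and the sample series are hypothetical; a missing period is represented here as `None`.

```python
from typing import Optional


def fill_periods(values: list[Optional[float]], inherit: bool) -> list[Optional[float]]:
    """Return the series as it would be reported.

    With inherit=True, a period without data takes the most recent known value.
    With inherit=False, only explicitly entered values are kept.
    """
    if not inherit:
        return list(values)
    filled: list[Optional[float]] = []
    last: Optional[float] = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical monthly MTTR readings in hours; None marks a month without data.
monthly_mttr = [3.0, None, None, 5.0, None]
print(fill_periods(monthly_mttr, inherit=True))   # [3.0, 3.0, 3.0, 5.0, 5.0]
print(fill_periods(monthly_mttr, inherit=False))  # [3.0, None, None, 5.0, None]
```

Inheriting values keeps aggregated scores continuous, while the explicit-only view makes reporting gaps visible.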
Overall Compliance Calculation
To assess overall readiness, we combine the performance of individual assets. If necessary, weights can be applied to reflect the relative importance of each asset.
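A weighted combination of asset scores might look like the sketch below. The asset names, performance values, and weights are hypothetical, and the weighted average is only one possible way to aggregate.

```python
def weighted_readiness(performance: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-asset performance scores (each in the range 0.0..1.0)."""
    total_weight = sum(weights[asset] for asset in performance)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    weighted_sum = sum(performance[asset] * weights[asset] for asset in performance)
    return weighted_sum / total_weight


# Hypothetical scores and weights; the first asset counts twice as much as the others.
performance = {"CRM": 0.75, "Web Servers": 1.0, "Inventory Management": 0.0}
weights = {"CRM": 2.0, "Web Servers": 1.0, "Inventory Management": 1.0}
print(weighted_readiness(performance, weights))  # (1.5 + 1.0 + 0.0) / 4 = 0.625
```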
Alternatively, overall compliance can be calculated using the critical path approach, focusing on critical systems’ performance.
For example, in our template:
- RPO Compliance (Critical Path) includes assets with recovery point objectives (RPOs) of 24 and 12 hours. The overall RPO is the minimum of these values, i.e., 12 hours.
- If even one asset fails to meet its RPO (e.g., “RPO for Inventory Management”), the overall RPO is not achieved.
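The critical path logic can be sketched as follows. The RPO values of 24 and 12 hours come from the example above; the actual backup intervals, the asset pairing, and the function name are assumptions made for illustration.

```python
def critical_path_rpo(assets: dict[str, tuple[float, float]]) -> tuple[float, bool, list[str]]:
    """Evaluate RPO compliance on the critical path.

    `assets` maps an asset name to (rpo_hours, actual_backup_interval_hours).
    Returns the overall RPO (the strictest one), whether every asset meets
    its own RPO, and the names of the assets that fail.
    """
    overall_rpo = min(rpo for rpo, _ in assets.values())
    failing = [name for name, (rpo, actual) in assets.items() if actual > rpo]
    return overall_rpo, not failing, failing


# Hypothetical actuals: CRM is backed up every 12 hours against a 24-hour RPO,
# Inventory Management every 24 hours against a 12-hour RPO.
assets = {
    "CRM": (24.0, 12.0),
    "Inventory Management": (12.0, 24.0),
}
overall, achieved, failing = critical_path_rpo(assets)
print(overall)   # 12.0
print(achieved)  # False
print(failing)   # ['Inventory Management']
```

Unlike a weighted average, this approach does not let a well-performing asset compensate for one that misses its objective.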
The aggregated scorecards for recovery time and recovery point metrics can serve as a data source for the compliance scorecard and other GRC-related functional scorecards.
Disaster Recovery Readiness Dashboard
Key metrics from the disaster recovery scorecard can be visualized on a dashboard alongside risk diagrams and improvement initiatives.
A strategy map provides a clear view of specific systems and their aggregated performance, offering a comprehensive overview.
Step 3. Establishing Internal Controls for Disaster Recovery
The definition of disaster recovery metrics (Step 2) allows the organization to establish acceptable loss and recovery levels, as well as quantify its readiness for disruptive events. However, these metrics don’t include specific emergency plans, responsibility mappings [4], or validation and testing procedures. To address this, appropriate internal controls need to be designed.
In previous articles, we discussed the general approach to setting up internal controls, as well as their practical application in the business continuity domain.
In the context of disaster recovery, most of the
Stakeholders and Owners
Involving key stakeholders is critical to the success of a disaster recovery strategy [4]. On a practical level, accountability can be enhanced by assigning owners to disaster recovery metrics and initiatives.
Conclusions
The IT disaster recovery scorecard combines various approaches to performance measurement.
When quantifying specific assets or systems, we rely on:
- Mean Time Between Failures (MTBF) and Mean Time to Failure (MTTF) to estimate reliability.
- Recovery Time Objective (RTO), which establishes the target for the Mean Time to Recovery (MTTR).
- Recovery Point Objective (RPO), which establishes the target for Backup Frequency.
The performance of recovery metrics is typically calculated as a binary function, where performance is 0% until the actual value meets the recovery objective, while reliability metrics in the template use a linear function.
The recovery metrics for individual assets or systems can be combined (for example, using the critical path approach) to calculate the overall readiness or compliance score.
Use Disaster Recovery Template
BSC Designer helps organizations implement their complex strategies:
- Sign up for a free plan on the platform.
- Use the Disaster Recovery template as a starting point. You will find it in New > New Scorecard > More Templates.
- Follow our Strategy Implementation System to align stakeholders and strategic ambitions into a comprehensive strategy.
Get started today and see how BSC Designer can simplify your strategy implementation!
1. The State of Disaster Recovery and Cyber-Recovery, 2024–2025: Factoring in AI, IDC, 2024.
2. Disaster Recovery Framework Guide, World Bank Group, 2020.
3. Design Your Organization to Withstand Future Disasters, M. Reeves, K. Whitaker, Harvard Business Review, 2022.
4. Disaster Resilience Scorecard for Cities, UNDRR, 2024.
Alexis is a Senior Strategy Consultant and CEO at BSC Designer, with over 20 years of experience in strategic planning. Alexis developed the “5 Step Strategy Implementation System” that helps companies with the practical implementation of their strategies. He is a regular speaker at industry conferences and has published over 100 articles on strategy and performance management, including the book “10 Step KPI System”. His work is frequently cited in academic research.