When we talk about Information Security and Disaster Recovery, it’s easy to mix up the different stages of recovery without proper context. A good case in point is RPO, RTO, WRT and MTD, four acronyms that are easily mistaken for each other.
Understanding your requirements for each threshold is important for establishing the recovery infrastructure and processes you need in the event of a disaster.
Here’s what the acronyms RPO, RTO, WRT and MTD mean:
Recovery Point Objective (RPO)
Recovery Time Objective (RTO)
Work Recovery Time (WRT)
Maximum Tolerable Downtime (MTD)
Discussions around RPO, RTO, WRT and MTD can turn technical fast, but the concepts are easy to picture with the following scenario:
Stage 1: Business as usual
Your systems are running and working correctly.
Stage 2: Disaster happens
Disaster occurs and one or more of your systems need to be recovered.
Your Recovery Point Objective (RPO) determines the point in time to which you will recover. This is defined by the maximum acceptable amount of data loss measured in time.
For example, if the most data you can tolerate losing is 20 minutes' worth, your RPO is 20 minutes. This means you can stand to lose 20 minutes of data without an impact on your business. An RPO of 0 means you cannot tolerate any data loss, so recovery must restore everything up to the moment of the disaster.
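As a rough illustration of the 20-minute example, the sketch below compares the gap between the last recovery point and the moment of disaster against an RPO. The function names and timestamps are hypothetical and chosen only for this example.

```python
from datetime import datetime, timedelta


def data_loss_window(last_recovery_point: datetime, disaster_time: datetime) -> timedelta:
    """Return the amount of data, measured in time, that recovery would lose."""
    return disaster_time - last_recovery_point


def meets_rpo(last_recovery_point: datetime, disaster_time: datetime, rpo: timedelta) -> bool:
    """True if the data lost since the last recovery point is within the RPO."""
    return data_loss_window(last_recovery_point, disaster_time) <= rpo


# Hypothetical values: an RPO of 20 minutes and a disaster at 12:00.
rpo = timedelta(minutes=20)
disaster = datetime(2024, 1, 1, 12, 0)

print(meets_rpo(datetime(2024, 1, 1, 11, 45), disaster, rpo))  # 15 minutes lost: True
print(meets_rpo(datetime(2024, 1, 1, 11, 30), disaster, rpo))  # 30 minutes lost: False
```

In practice this means the interval between backups or replication points should be no longer than the RPO.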
Stage 3: Recovery
At this stage, recovery is underway, but your systems are not ready for production yet.
Your Recovery Time Objective (RTO) determines the maximum tolerable amount of time it takes to bring critical systems back online. It is a measure of downtime: the time that elapses between the incident and the point at which critical systems are restored.
Generally speaking, RTO is a technical consideration, limited by the capability of the IT department and/or your backup solution when disaster strikes. It defines how quickly you should be able to recover functionality. Every piece of equipment, and every app, will have its own RTO.
The RTO period ends when systems are back online and data is recovered.
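Because each system has its own RTO, it can help to compare targets against restore times measured in a recovery test. The following is a minimal sketch; the system names, targets and measured times are made up for illustration.

```python
from datetime import timedelta

# Hypothetical per-system RTO targets.
rto_targets = {
    "database": timedelta(hours=1),
    "web_app": timedelta(hours=2),
    "file_server": timedelta(hours=4),
}

# Hypothetical restore times measured during a recovery test.
restore_times = {
    "database": timedelta(minutes=50),
    "web_app": timedelta(hours=3),
    "file_server": timedelta(hours=2),
}


def rto_breaches(targets: dict, actuals: dict) -> list:
    """Return the names of systems whose restore time exceeded their RTO."""
    return sorted(name for name, target in targets.items() if actuals[name] > target)


print(rto_breaches(rto_targets, restore_times))  # ['web_app']
```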
Stage 4: Resume production
At this stage, your systems are fully recovered and ready for production, allowing you to resume normal operations.
Your Work Recovery Time (WRT) determines the maximum tolerable amount of time it takes to verify that systems and data are intact. It is about verification, so it involves checking databases, logs, apps and services to ensure they are available and operating correctly.
Where Maximum Tolerable Downtime (MTD) fits in
Maximum Tolerable Downtime (MTD) is the sum of RTO and WRT. In other words, it is the total amount of time that a system can be disrupted before the organisation’s survival or operational capability is at risk.
Put another way, MTD represents the total amount of downtime a business is willing to accept for a critical process or system.
MTD is normally set by executive management as an operational threshold, rather than being a technical consideration of the IT department. However, the two have a close relationship. Because MTD is the sum of RTO and WRT, it sets an upper limit on the RTO, and the RTO that the IT department can actually achieve determines whether the MTD is attainable at a technical level.
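The relationship between the three time thresholds is simple arithmetic, sketched below. The function names and the hour values are illustrative assumptions, not figures from any standard.

```python
from datetime import timedelta


def total_downtime(rto: timedelta, wrt: timedelta) -> timedelta:
    """MTD expressed as the sum of RTO and WRT."""
    return rto + wrt


def rto_budget(mtd: timedelta, wrt: timedelta) -> timedelta:
    """Largest RTO that still fits inside a management-set MTD."""
    if wrt > mtd:
        raise ValueError("WRT alone already exceeds the MTD")
    return mtd - wrt


# Hypothetical values: 4 hours to restore systems, 2 hours to verify them.
print(total_downtime(timedelta(hours=4), timedelta(hours=2)))  # 6:00:00

# Hypothetical values: management sets an MTD of 8 hours, verification takes 2.
print(rto_budget(timedelta(hours=8), timedelta(hours=2)))  # 6:00:00
```

Working backwards from the MTD in this way gives the IT department a concrete target: the RTO it commits to should not exceed the MTD minus the time needed for verification.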
Hopefully, this explainer has given you a good introduction to what RPO, RTO, WRT and MTD are, and how they are relevant to your organisation.