- What makes a “good backup”?
- Standard out-of-the-box features to be configured
- Backup job notifications / monitoring
- Backup Job and Backup Set verification
- On-site versus Off-site verification
- Protecting backups from Ransomware
- Test restores
- Management Oversight
Good backups are an essential first line of defence against a multitude of issues. Computer hardware can be replaced, but your critical data is unique to your business and is either irreplaceable or, at the very least, difficult and expensive to recreate.
The ransomware ‘business model’ is predicated on the fact that, for the majority of businesses whose data and backups are compromised in a ransomware attack, it is cheaper to pay the ransom (and hopefully then receive a tool that allows them to decrypt their data and their backups) than to try to re-create their critical data from scratch.
So “good backups” should be considered a form of insurance against:
- Hardware faults
- Software faults (e.g., software update that has serious unintended consequences)
- User errors (e.g., inadvertent file deletion or file corruption)
- Physical disasters (e.g., fire, flood)
- Malicious events (virus strike, ransomware attack)
What makes a “good backup”?
A “good backup” is a recent backup that contains all the data (which can include applications as well as information) in a form that can be easily accessed and restored if needed.
This means that:
- Backups need to be performed regularly (typically at least daily). The value of a backup diminishes significantly if it is not current, or near-current, relative to the point in time that you need to restore.
- The contents of the backups need to be “valid”, that is, restorable: either as individual files or folders, or as the entire system if needed (e.g., in the event of a “disaster” like a fire or a ransomware attack).
Conceptually then, at a high level, the requirements for a backup system are fairly straightforward.
However, as we will see, the devil is in the detail, and you need to have processes in place that will work day after day, week after week, month after month.
These are processes that can be relied upon long after the backup system has been implemented and the initial management focus has moved on to other, more recent issues.
They also need to be straightforward enough to be followed when the personnel with primary responsibility for the backups are away and a stand-in has responsibility for the backups and other network administration tasks.
Standard out-of-the-box features to be configured
Off-the-shelf backup systems (e.g., Macrium Reflect, StorageCraft, Veeam, Veritas), whatever your preferred vendor, have many of the required features out-of-the-box; those features just need to be configured.
Complications arise with the exception conditions that are not handled by default. This is the “devil in the detail” that I referred to earlier.
Backup job notifications / monitoring
As the manager responsible for your organisation’s backups, you need to know:
- that all systems are being backed up at least daily.
- that the backup jobs are being started as scheduled.
- and that if there are any failures – someone is being notified so that any failures can be investigated and resolved
So essentially, what you’re really interested in is the “exceptions” (i.e., jobs that are not started for some reason, or backup jobs that complete with an error).
Off-the-shelf backup systems will include e-mail notifications for:
- Backup jobs that completed successfully
- Backup jobs that failed
These notifications should be sent to a “monitor” mailbox e.g., email@example.com
The reason that a “monitor” mailbox is preferred to an individual user’s mailbox is three-fold:
- An individual user’s mailbox would not be readily accessible to another staff member when the nominated individual is away (e.g., on vacation).
- Ideally you want to implement a monitoring system that is not dependent upon any individual person.
  The monitoring solution should be automated: it should check daily that all expected notifications have been received and highlight any ‘out of line’ situations, that is, provide an exception report.
  So, rather than having the backup job reports go to a user mailbox, you configure them to go to a “monitor” mailbox.
- Off-the-shelf backup systems won’t provide a report of backup jobs that were scheduled but were not started. (Where an error is reported, what has actually happened is that the job was started but then failed for some reason, e.g., backup media inaccessible.)
  Where the backup job was not started at all, either due to an error with the backup system application or possibly the Windows Task Scheduler, that condition will typically be beyond the scope of the out-of-the-box backup system monitoring and will need to be detected by an independent / custom monitoring solution.
A typical approach is to deploy a mailbox monitoring tool that is configured to monitor e-mail messages based on the following criteria:
- Message subject
The tool then matches the incoming messages against a list of the notification messages that are expected when ‘all is good’.
Finally, it produces a daily report of the exceptions: those notifications which don’t match the list of expected notifications, and any expected notifications that never arrived.
Any exceptions are then investigated by someone within your IT team.
With this approach, a daily summary report will be produced which identifies:
- The backup jobs that have completed successfully – No action needed
- The backup jobs that have failed or completed with some error condition – To be investigated
- The backup jobs that have not been started – To be investigated
The exception report can then be actioned by the IT person with primary responsibility for your backups. But if that person is away, the exception report can also be easily understood by a person who is acting as their stand-in.
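The matching step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subject-line formats, the server names, and the helper `build_exception_report` are all hypothetical, and a real tool would first read the subjects from the “monitor” mailbox rather than from a hard-coded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedJob:
    name: str             # label used in the report (hypothetical server name)
    success_subject: str  # subject the backup product is assumed to send on success


def build_exception_report(expected, received_subjects):
    """Classify each expected job as ok, failed, or not started.

    A job is 'ok' if its success subject was received, 'failed' if some
    other received subject mentions the job name, and 'not_started' if
    nothing about the job arrived at all.
    """
    received = [subject.strip() for subject in received_subjects]
    report = {"ok": [], "failed": [], "not_started": []}
    for job in expected:
        if job.success_subject in received:
            report["ok"].append(job.name)
        elif any(job.name in subject for subject in received):
            report["failed"].append(job.name)
        else:
            report["not_started"].append(job.name)
    return report


# Assumed subject formats; adjust to whatever your backup product sends.
expected_jobs = [
    ExpectedJob("FILESERVER01", "Backup succeeded: FILESERVER01"),
    ExpectedJob("SQLSERVER01", "Backup succeeded: SQLSERVER01"),
    ExpectedJob("MAILSERVER01", "Backup succeeded: MAILSERVER01"),
]
todays_subjects = [
    "Backup succeeded: FILESERVER01",
    "Backup FAILED: SQLSERVER01",
]

report = build_exception_report(expected_jobs, todays_subjects)
for status, jobs in report.items():
    print(f"{status}: {', '.join(jobs) or 'none'}")
```

The important design point is that the report is driven by the list of expected jobs, not by the messages received. That is what allows a job that never started, and therefore never sent anything, to show up as an exception.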
Backup Job and Backup Set verification
Off-the-shelf backup systems typically include the capability to perform on-site backup job / backup image verification:
Backup Job verification
Typically, the configuration of the backup image verification is included in the creation of the backup job itself, so this is generally quite straightforward.
However, it is important to note that this verification step usually verifies only the individual backup image created by the backup job each time that the backup job is executed.
This can best be illustrated by an example:
A commonly used backup plan is referred to as ‘Incrementals forever’. As the name implies, it commences with a Full backup, followed by Incremental backups thereafter, where each Incremental backup captures all the changes that have occurred since the previous backup.
So, assuming one backup per day, the backup set over the first two weeks consists of a Full backup (F) on Week 1 / Monday, followed by an Incremental backup (I) on each day thereafter.
In this example, the backup job verification will only verify the single backup image created each time that the backup job is executed.
So, on Week 1 / Monday, this would be the Full backup; thereafter, each time the backup job is executed, it would only verify the Incremental image created on that particular day.
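To make the gap concrete, the following Python sketch lays out a two-week ‘Incrementals forever’ backup set and compares what a single job verification covers with what a restore would actually depend on. It assumes a backup runs on all seven days of the week, which the example does not state, and the function name is purely illustrative.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # assumes 7 backups per week


def incrementals_forever(weeks):
    """Return one (week, day, image_type) tuple per daily backup.

    The first image is a Full ("F"); every later image is an Incremental ("I").
    """
    schedule = []
    for week in range(1, weeks + 1):
        for day in DAYS:
            image_type = "F" if not schedule else "I"
            schedule.append((week, day, image_type))
    return schedule


schedule = incrementals_forever(weeks=2)

# Print the backup set as a small table: one row per week.
print("        " + " ".join(f"{day:>3}" for day in DAYS))
for week in (1, 2):
    row = [image_type for (w, _, image_type) in schedule if w == week]
    print(f"Week {week}: " + " ".join(f"{image_type:>3}" for image_type in row))

# Job verification only checks the image created by that particular run.
verified_by_job = schedule[-1]              # the run on Week 2 / Sun
images_needed_for_restore = len(schedule)   # the Full plus every Incremental since
print(f"Job verification on Week 2 / Sun checks 1 image: {verified_by_job}")
print(f"A restore to that day depends on {images_needed_for_restore} images")
```

By the end of the second week a restore relies on every image in the chain, while each day's job verification has only ever looked at one of them.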
Backup Set verification
To verify all the images that have been created by the Backup job over time, the Backup Set needs to be verified.
Again, off-the-shelf backup systems typically include the capability to perform on-site backup set verification. However, this step is not usually included in the creation of the backup job, but rather needs to be configured separately.
Because the size of the Backup Set will grow over time, the usual process is to schedule the Backup Set verification to be performed on a weekly basis.
The Backup Set verification should have e-mail notification configured on Success or Failure, so that it can be included in the notification monitoring that we have previously discussed.
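Conceptually, a backup set verification re-checks every image in the set, not just the newest one. The sketch below illustrates the idea with SHA-256 checksums recorded when each image was created. This is an assumption for illustration only: commercial backup products use their own image formats and built-in verification, and the file names and `verify_backup_set` helper here are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup_set(manifest):
    """Check every image in the set and return the names that fail.

    `manifest` maps image path -> checksum recorded at backup time.
    A missing file counts as a failure.
    """
    failures = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            failures.append(Path(path).name)
    return failures


# Demo with three tiny stand-in "images" in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    manifest = {}
    for name in ["full.img", "inc1.img", "inc2.img"]:
        image = Path(folder) / name
        image.write_bytes(name.encode())
        manifest[str(image)] = sha256_of(image)

    failures_before = verify_backup_set(manifest)
    # Simulate silent corruption of an older incremental.
    (Path(folder) / "inc1.img").write_bytes(b"corrupted")
    failures_after = verify_backup_set(manifest)

print("Before corruption:", failures_before or "all images valid")
print("After corruption: ", failures_after)
```

A per-job verification would never revisit `inc1.img` after the day it was created, which is why the separate, scheduled verification of the whole set matters.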
On-site versus Off-site verification
In the previous section, we established that Off-the-shelf backup systems typically include the capability to perform on-site verification of backup jobs and backup sets.
However, with some exceptions, off-the-shelf backup systems do not include the capability to perform off-site verification of backup jobs and backup sets.
Moreover, in those instances where the capability does exist to perform off-site verification of backup jobs and / or backup sets, that verification is performed via applications that run from the on-site (source) server.
This approach made sense prior to the rise of ransomware. In the ransomware era, however, it leaves your off-site backups vulnerable to attack, because software on the on-site server must be able to reach them, and thus it is no longer a viable approach.
Protecting backups from Ransomware
In today’s IT environment, where ransomware is effectively a business model for the criminal element, it is essential that your backup system keeps multiple copies of your backup (refer to this article for a detailed explanation of the configuration of ransomware-safe off-site backups), and that at least one of these copies is off-site and cannot be easily accessed from your on-premise network.
If you can readily access your off-site backup from your on-site network, then it is possible, and indeed likely, that in the event of a ransomware attack the attackers will also be able to access your off-site backup, encrypt it, and thus render it useless.
The complexity of protecting the off-site backups is the reason that many small to medium organisations are now turning to Managed Service Providers (MSPs), who can deliver cost-effective backup systems that include ransomware-safe off-site backups.
However, if you are considering a do-it-yourself implementation, then you will need to deploy a solution that prevents your off-site backups being accessed from your on-premise network, even in a scenario where your on-premise network security has been breached (i.e., in a ransomware attack scenario).
Some examples of the type of solution that would be needed are:
1. On-site backup agent, which is “locked-down”, and can only be managed via a portal using multi-factor authentication (MFA / 2FA).
Access to the on-site and off-site backups is controlled via the portal.
That way, even if the logon credentials to the portal are compromised, the off-site backups remain secure, provided that the second layer of authentication remains secure.
2. Multiple copies of the backup-set, and in particular multiple copies off-site, where at least one off-site copy is not accessible from the on-premise network.
Conceptually this is the ideal approach.
If there is a copy of the backup-set which is off-site and which is not accessible from your on-premise network, then in the event of a ransomware attack that copy will also be inaccessible to the ransomware attackers.
Further information on this approach can be found here.
Test restores
Test restores are an important aspect of the backup verification procedure.
There are three reasons why test restores are needed:
- A test restore independently validates the backup image verification process.
- Hardware Independent Restore.
  - In many cases, in the event of a “disaster” where a full system restore needs to be performed off-site, the off-site system will not be identical to the on-premise system. If the restore is to a physical system, the hardware will not be identical, and in many cases, for reasons of cost, the off-site environment may be a VM whereas the on-site system may be physical hardware for maximum performance.
  - Where the system platform differs between on-premise and off-site, the restore is referred to as a ‘Hardware Independent Restore’ (HIR).
  - The backup image verification process does not take into account the HIR component of a restore, so this is not tested or verified during the image verification process.
  - The only way to test the HIR component of a restore is to perform a HIR, and then resolve any issues that may arise.
  - By periodically performing a HIR, any issues can be identified and resolved at your leisure, so that if a HIR needs to be performed “for real”, the process should proceed smoothly.
- Identify any licensing issues that may arise on the restored system.
  - Some application licensing models will detect when the underlying system hardware has changed and then change the status of the licensing from ‘activated’ to some other status (e.g., ‘trial’).
  - By performing a test restore, these applications can be identified in advance and, as with HIR above, a process can be established to handle the licensing re-activation at your leisure, before any time-critical ‘live’ restore needs to be undertaken.
Management Oversight
One aspect of backups that has often struck me as strange, and which in my view displays a lack of understanding by management, concerns who is given responsibility for them. Backups provide a mechanism to recover from many serious events, including events that could potentially put a business out of business. Yet responsibility for the day-to-day management of the backup system is frequently given to a relatively ‘junior’ staff member who may not appreciate the importance of the backups to the needs of the business.
Backups need to have senior management focus and should form part of the Key Performance Indicators (KPIs) for a senior manager. Events that should be measured are:
- Backup jobs performed without error
  - In an ideal world, all backup jobs should fall under this classification.
- Backup jobs performed with an error
  - Errors need to be investigated, understood, and resolved.
- Backup jobs not performed as scheduled
  - A rare condition that needs to be monitored and resolved.
  - Left undetected, this can leave the business exposed to not having backups available when they’re needed.
- Periodic test restores
  - Typically, every 6 months.
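As a rough illustration of how these measures might be summarised for a management report, here is a minimal Python sketch. The outcome labels (`ok`, `failed`, `not_started`), the sample figures, the dates, and the 183-day approximation of “every 6 months” are all assumptions made for the example, not part of any particular backup product.

```python
from collections import Counter
from datetime import date, timedelta

TEST_RESTORE_INTERVAL = timedelta(days=183)  # "every 6 months", approximated


def backup_kpis(job_outcomes, last_test_restore, today):
    """Summarise one reporting period into the four measures listed above."""
    counts = Counter(job_outcomes)
    return {
        "without_error": counts["ok"],
        "with_error": counts["failed"],
        "not_performed": counts["not_started"],
        "test_restore_overdue": today - last_test_restore > TEST_RESTORE_INTERVAL,
    }


# Hypothetical month: 30 scheduled jobs, of which 2 failed and 1 never started.
outcomes = ["ok"] * 27 + ["failed"] * 2 + ["not_started"]
kpis = backup_kpis(
    outcomes,
    last_test_restore=date(2024, 1, 15),
    today=date(2024, 9, 1),
)
for measure, value in kpis.items():
    print(f"{measure}: {value}")
```

The job outcomes could come straight from the daily exception reports, so the KPI figures need no additional data collection.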