If you’ve ever had to exercise your Business Continuity Plan, I hope you walked away from it as we did – supremely proud of your staff. Our team had to respond very recently and did a great job. But see if you can spot the one big vulnerability that we have to address.
Recently, we had an incident that took down a number of our systems – albeit for a very short period of time. Within minutes, our staff had mobilized. While one of our experts did everything they could to remotely bring the affected systems back, two of our staff braved the storm and headed to the data centre in case the first efforts failed.
Within a few hours, the situation was under control and we were back up and running. But the staff worked until four a.m. getting everything back to normal. They went home exhausted but triumphant. They handled everything, from technical work to communications, like a well-oiled machine.
The next morning, as tired as they were, they had a debrief. The next day they presented me with a clear “blow by blow” account including lessons learned.
This didn’t come out of “nowhere”. Nor was it luck. We practice this. We test backups and restores. We ask tough questions. We learn from every encounter. We think about and design for resilience. But Murphy’s Law reigns supreme. As good as this team is, they never get complacent. Good thing. Because we missed something. Can you guess what it was?
We realized what we needed to do when we did our assessment for the COVID-19 crisis. Like many of you, we have functioned without a member of the team for many reasons – some mundane, some joyous, some sad. We’ve survived.
But the COVID-19 crisis has caused us to consider much deeper losses. What would happen if we were functioning with most of the team unable to work? What if we faced more than the loss of one or two key people? Where would we break down? We needed to find out.
Here’s what we did. We:
- outlined everyone’s skills and duties in the company. We put these into a table.
- listed the roles they would play in any situation where our systems were compromised.
- looked at their ability to act and their information needs. We detailed the information they had access to, including passwords and configurations. We noted the skills they had.
Then we brainstormed a list of disaster scenarios. We turned these into “use cases”. Like any use case for testing, we made them as real as possible. We looked at everything from a ransomware attack to a backup or network failure and all the way up to a total systems meltdown.
Now comes the interesting part. Remember those puzzles where you pull a piece out until the whole thing collapses? Using our list, we kept removing people until we saw where we would no longer be able to recover. That’s where you stop and ask yourself, “how do we mitigate that?”
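For readers who like to see this sort of thing written down, here is a minimal sketch of the "pull a piece out" exercise in Python. It is not the tool we used; the names, skills, and scenarios are made up for illustration, and it assumes a scenario is recoverable as long as at least one available person holds each capability the scenario needs.

```python
# Hypothetical data: the people, skills, and scenarios are illustrative only.
team = {
    "alice": {"restore_backups", "network_config", "admin_passwords"},
    "bob": {"restore_backups", "client_communications"},
    "carol": {"network_config", "client_communications"},
}

scenarios = {
    "ransomware": {"restore_backups", "admin_passwords", "client_communications"},
    "network_failure": {"network_config", "client_communications"},
}


def can_recover(available, required):
    """True if every required capability is held by at least one available person."""
    covered = set().union(*(team[person] for person in available))
    return required <= covered


def weakest_point(required):
    """Return (capability, holders) for the required capability with the fewest holders.

    Losing every holder of that capability is the cheapest way for the
    scenario to become unrecoverable.
    """
    holders = {
        cap: sorted(p for p, skills in team.items() if cap in skills)
        for cap in required
    }
    # Sort capability names first so ties are broken alphabetically.
    cap = min(sorted(holders), key=lambda c: len(holders[c]))
    return cap, holders[cap]


for name, required in scenarios.items():
    cap, people = weakest_point(required)
    print(f"{name}: recovery fails if we lose {people} (only holders of '{cap}')")
```

Any scenario whose weakest capability has a single holder is where you stop and ask how to mitigate, for example by cross-training a second person or documenting the procedure.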
We serve a lot of small and mid-sized companies. Most have very few IT staff. Some have only one. In some cases, we are the entire IT function of the company.
This exercise started us thinking about what would happen to companies that are not our clients if their entire IT staff were unable to respond. The rapid spread of COVID-19 makes it likely that this will happen to some companies. Murphy’s Law being what it is, it is not inconceivable that a technical or other disruption will occur at the same time.
With so many people working remotely and dependent on IT systems, what would this do to an already vulnerable company? How would these companies respond? Could they?
Our disaster recovery is fairly well documented. What about those companies where all the knowledge is with one or two employees?
In the spirit of identifying likely risks and mitigating them, we’d ask you to consider the same question. What would you do if you had a severe issue and a loss of your key IT resources? How many pieces of the puzzle could you remove before the whole thing collapses?
In managing risk, one looks at the likelihood and the impact. You get these by asking questions.
If your company has few IT resources and those key people are not available when a serious problem occurs – what will you do? How much more vulnerable are you now that so much of your workforce is working at home? The answers to those questions give you the impact. As for the likelihood, you have to ask yourself another question. How much more likely is the loss of key resources?
It’s just our opinion, but in the current circumstances, we think there are only two answers.
(a) Yes, I’d like to know what I should do to address the risk of losing key people and facing a severe system disruption.
(b) No, thanks. We have it covered.
If you answered (b), good for you. If you answered (a), we’d be pleased to talk to you about how to evaluate these and other risks. No obligation.