An Effective Business Continuity Exercise Planning Example
Conducting Business Continuity (BC) exercises is a well-known means of validating BC plans and engaging an organization in assessing its response to disruptive business events. One of the biggest challenges in planning a meaningful exercise, one that engages participants and results in a renewed understanding of recovery roles and improvements to the plan, is constructing a fresh, relevant, and plausible exercise scenario.
A scenario is “fresh” when it is different from what has been considered before. Too many exercises have exhausted simple fire, power outage, and weather event scenarios. These recurring scenarios discourage meaningful participation and may no longer yield new and useful results.
Next, a scenario is “relevant” when it is appropriate to the current context of an organization’s business and operational environment. Obtaining a good understanding of the organization’s environment is essential for constructing a relevant and meaningful scenario. For example, a loss of IT infrastructure for a financial trading operation is vastly more relevant than the loss of water to its operating facility. Once this understanding has been obtained, the scenario’s events must impact key functionality and dependencies to provoke real thinking for improved recoverability.
Finally, a scenario must be “plausible” in that its occurrence is within the realm of possibility. For example, a volcanic eruption in a desert where there is no tectonic activity is not very plausible, whereas a flash flood in a desert could be. When events are too far-fetched, participants struggle to grasp the real ramifications and are not immersed in the scenario.
Planning the Exercise Scenario – Details Matter
A recent pre-pandemic Virtual Corporation (Virtual) engagement was a perfect example of the need for a fresh, relevant, and plausible exercise scenario. Virtual was contracted to conduct exercises at several major manufacturing and warehousing locations for a chemical manufacturing client in the US, Canada, and Europe. The client’s previous exercises yielded limited changes to recovery plans and strategies. This year’s exercises needed a scenario that would be distinct and plausible and would have a plant-wide impact, since all departments within these facilities would be participating.
We knew that proposing a scenario without assessing its likelihood, its location, and the impact of its occurrence might not yield a very meaningful exercise. To gain a holistic understanding of the client’s facility in which the scenario would occur, we reviewed the Business Impact Analyses and past exercises with the client’s BC project and plant managers. The plant manager also gave us a comprehensive tour of the plant, which provided a fundamental understanding of its operations and key dependencies. Armed with this new insight, we were able to construct a plausible exercise scenario.
Our initial thought was to take out the utility building that provided treated water and steam for the plant’s processing and supporting operations. While an explosion caused by a natural gas leak to the steam boilers is a plausible event, the plant manager preferred a more distinctive and challenging scenario. The plant manager suggested an outage to the plant’s local Data Center (DC), located within one of the company’s warehouses. This would result in the loss of access to the network and IT applications.
Such a scenario was “fresh” because it had not been contemplated previously. More importantly, the scenario was “relevant” as it illustrated the increasing dependency of plant operations on critical IT infrastructure and systems. The scenario was also more pervasive and significant as both the plant and nearby regional distribution warehouse utilized the same DC to house their IT systems. While the scenario covered the “fresh” and “relevant” requirements, was serious damage to the DC plausible?
The Virtual Corporation team visited the warehouse again to identify an event that could realistically compromise the DC. This DC followed the trend of using a large POD (portable on-demand) moving/storage-like container to house the IT network and computing infrastructure. Examining the DC’s backup generator and HVAC unit revealed no viable vulnerabilities. Both were located outside the building and well protected, making it difficult to come up with a plausible event that would damage either unit.
When we noticed the proximity of the POD to the warehouse shelving racks, we immediately recalled seeing a YouTube video of a warehouse shelving system collapsing. Because the POD and its exterior power and network cables sat so close to the shelving structure, serious damage to them was plausible should the structure’s integrity be compromised. The video also showed that the scenario was not too far-fetched and allowed participants to visualize it realistically. Our “fresh, relevant and plausible” scenario was born.
To further enhance the quality of the exercise, we invited two visiting corporate IT representatives to join the exercise. This ensured all key recovery roles were represented in the exercise, as would occur in a real-life event. IT’s real-time application impact analysis and insight provided all exercise participants with vital information and answers to questions they had never thought to ask.
What We Learned
Having representatives from all the recovery groups in the same room at the same time, focusing on the same recovery issues, exposed previously unknown critical gaps. It became clear that the impact on the plant would affect and eventually shut down all the surrounding facilities. This was further complicated by IT’s best-case estimate of two to four weeks before critical network services and applications could be restored.
This unexpected revelation raised the need for the warehousing and distribution groups to consider contingency actions and manual workaround procedures to ship, track and manage inventory until a suitable solution was in place. The scenario also gave Corporate IT a clear understanding of the unacceptable impact on the company and the immense need for a suitable facility-wide DR solution.
Overall, the exercise was well received by all participants. It achieved the overarching goal of further enhancing everyone’s understanding of how reliant the plant and surrounding facilities were on IT systems. It also shattered preconceived notions about real-world IT restoration and revealed the vital need for manual workarounds.
As this example demonstrates, careful planning with attention to detail is key to constructing an effective, meaningful exercise. That planning can be summarized in the following three principles:
- Consider fresh, relevant, and plausible scenarios
- Perform the necessary due diligence to understand the environment in which the scenario will take place
- Engage all key stakeholders
Be creative and have fun while doing it!
About the Writers
Bob Farkas, PMP, AMBCI, SCRA
Senior BCM Consultant
Bob has been with Virtual Corporation since 2001, during which time he has led many Business Impact Analysis (BIA), Business Continuity Planning, and Risk Assessment projects across health care, manufacturing, government, technology and other service industries. In addition, he has been instrumental in building and refining Virtual’s processes and toolkit, bringing new approaches and insights to client engagements. His career spans materials engineering, programming, telecom marketing research, IT outsourcing and business continuity. Bob holds PMP, AMBCI and SCRA certifications and has a Master’s in Chemical Engineering from the New Jersey Institute of Technology and a Bachelor’s in Metallurgical Engineering from McMaster University (Hamilton, Ontario).
John T. Hill, MBA, CBCP, ITIL
Senior Resilience Consultant
John is a proven resiliency expert with 20+ years of technical experience developing, leading and maturing enterprise Business Continuity, Disaster Recovery, Crisis Management, Incident Management, and Cyber Security Response for Fortune 500 companies. Mr. Hill is known for implementing redundant technical infrastructure/systems/staff to ensure resilient systems against business interruptions and actual disasters. What sets John apart is his ability to clarify complex technical concepts and partner with IT to build actionable plans for proven recovery capabilities and long-term business sustainability. John is ITIL certified and a Certified Business Continuity Professional.