Contingent operational resiliency risks represent a creeping threat that requires attention sooner rather than later. How can risk management respond rapidly to these contingent risks and prioritize “good enough” mitigations?
The “New Normal” exposes institutions to a different set of operational resiliency risks that may not be addressed in existing business continuity plans. Like the underlying public health crisis, the contingent resiliency risks of the “New Normal” represent a creeping threat. The public health challenge is to address the spread of the virus before it builds into a wave of cases that overwhelms the capacity of the hospital system. The contingent resiliency risk management challenge is not to implement full mitigations for each new potential resiliency event. What is essential is to rapidly deploy “good enough” mitigations that are sufficient to prevent the institution from being overwhelmed if multiple resiliency-threatening events materialize together.
Resiliency Risks (i.e. risks that can lead to an event causing a business interruption that impacts an institution’s overall capability to conduct business) have attracted significant recent attention. Regulators have communicated [See Endnote 1] their intention to focus on resiliency risks, and financial institutions have been establishing dedicated programs to identify and apply governance to them. Many resiliency risks are mitigated by contingency plans such as remote working, the use of alternative infrastructure and alternative business practices and processes.
The need for social distancing during the COVID-19 pandemic has required contingency plans to be invoked. Contingency processes and infrastructure are now being used to conduct business in the “New Normal”.
This “New Normal” exposes institutions to a different set of resiliency risks that have the potential for business interruption and significant loss that institutions may not be fully prepared for:
- Business continuity planning may assume operating in contingency mode for a more limited time rather than extended business-as-usual use [See Endnote 2].
- Contingent infrastructure / processes do not have contingency plans themselves.
This is not necessarily a failure of planning: Accurately anticipating likely future external events is hard. Anticipating all the potential adverse events only encountered in a specific “New Normal” is much harder.
Examples of potential contingent resiliency risks include:
- Inability to execute key processes due to loss of connectivity to the homes of key team members living in remote areas caused by utility provider service interruptions. Utility interruptions may arise from degraded staffing levels caused by COVID-19.
- Inability of vendors / service providers to maintain service levels for critical processes due to stretched resources arising from simultaneous invocation of their contingency plans and the need to support contingency arrangements for multiple customers at the same time.
- Fraud events or data / privacy breaches arising from circumvention of internal controls. Segregation of duties controls may be more easily circumvented through use of unofficial and consequently unmonitored communication channels (e.g. personal Skype or Zoom accounts or phones). Employees in an unfamiliar (remote) work / control environment may also be manipulated by malign actors into unwittingly circumventing controls.
- Social engineering / manipulation of remote employees may also result in implantation of malware.
- Distributed Denial of Service (DDoS) attacks designed to exploit the infrastructure that enables critical processes to be conducted on a distributed basis.
- Compromise of financial reporting due to insufficient secure communications infrastructure capacity (originally sized for only a limited proportion of the workforce to work remotely) to support peak demand needed to complete time-sensitive and communication-intensive critical processes.
- Lack of agility to respond to financial and operational stress events: Crisis management processes that address these events often include “whiteboard” sessions to formulate responses. Untried / untested remote counterparts may not enable responses to be formulated rapidly enough.
The challenge for Risk Management
The first step to mitigate the New Normal’s contingent resiliency events is to identify them. From there, governance can be applied to prioritize them, identify “good enough” mitigations and oversee implementation.
It’s tempting to wait for the New Normal to settle in before undertaking this. However, like the underlying public health crisis, contingent resiliency risks represent a creeping threat. The public health challenge is to address the spread of the virus before it builds into a wave of cases that overwhelms the capacity of the hospital system. The contingent resiliency risk management challenge is not to implement full mitigations for each new potential resiliency event. It is essential to rapidly deploy “good enough” mitigations that are sufficient to prevent the institution from being overwhelmed if multiple resiliency-threatening events materialize together. (Contingent resiliency risks are not necessarily uncorrelated risks. One of their causes is shared: our collective lack of familiarity with doing business in the New Normal.)
The people best positioned to identify vulnerabilities in the New Normal are business line management. However, this is currently a highly stretched resource. Line management’s focus is the transition to business-as-usual in contingency mode, fighting the resulting fires to ensure control is maintained and obligations to customers and regulators are satisfied in an extremely challenging business environment.
The second line of defense, however, can take the lead and work with its partners in the business lines to ensure expeditiously that the institution is sufficiently prepared.
What needs to be done?
Risk Identification
The potential risk events that might cause business interruption in the New Normal need to be identified and understood i.e.
- Risk Cause: What is the new external event or infrastructural / process vulnerability that might cause a business interruption? i.e. what needs to be fixed in order to mitigate the risk?
- Risk Event and Impacts: What is the nature of the potential interruption and what will be its consequences? i.e. how and where will the institution specifically not be able to satisfy its obligations to customers, regulators and other external stakeholders?
Mitigation Governance
Establish governance overseen by a senior business leader who can hold other business leaders accountable to:
- Prioritize the new risks i.e. decide which contingent resiliency risks can be accepted and which require “good enough” mitigation.
- Assign enterprise-wide responsibility and accountability for the implementation of a “good enough” mitigation for each managed risk.
The priority is to mobilize action to put in place “good enough” mitigations for the most critical resiliency risks. Existing initiatives can be identified and re-purposed rather than standing up completely new initiatives to implement ideal solutions. This avoids incurring a lead time to stand up new project teams / organizations. The approach is similar to that being taken to address the underlying public health crisis. Developing new drugs and testing them has a significant lead time. An existing drug that has already been proven to be safe to use can be deployed rapidly if it is found to be effective in at least helping patients get through the disease, even if it is not a perfect cure. The priority is to rapidly deploy enough mitigation of the new risks to prevent a scenario where the consequent interruptions overwhelm overall resiliency.
Ensure Consistency and Transparency
Resource and funding for mitigation are extremely limited due to the challenge of adjusting to the New Normal and the challenges of the business environment. Both should be prioritized in a consistent and transparent way to ensure that selected mitigations maximize the reduction of the institution’s exposure to business interruption. A common prioritization process will avoid duplication of action and enable prioritization of remediation resource / funding based on enterprise-wide priorities.
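One way to make such a common prioritization process concrete is sketched below. It is a minimal illustration only, not a method prescribed here: the `Mitigation` record, the `exposure_reduction` and `cost` scores, the `prioritize` function and the sample figures are all hypothetical, and the greedy "most exposure reduction per unit of cost" rule is an assumed heuristic that does not guarantee the best possible selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mitigation:
    """A candidate "good enough" mitigation (all fields are illustrative)."""
    risk: str
    exposure_reduction: float  # assumed score agreed by the governance forum
    cost: float                # assumed resource / funding estimate, must be > 0


def prioritize(mitigations, budget):
    """Greedy heuristic: fund the highest reduction-per-cost items that fit."""
    ranked = sorted(
        mitigations,
        key=lambda m: m.exposure_reduction / m.cost,
        reverse=True,
    )
    selected, remaining = [], budget
    for m in ranked:
        if m.cost <= remaining:  # skip anything the remaining budget cannot cover
            selected.append(m)
            remaining -= m.cost
    return selected


# Hypothetical enterprise-wide candidate list and budget.
candidates = [
    Mitigation("Remote access capacity", 8.0, 4.0),
    Mitigation("Approved communication channels", 6.0, 2.0),
    Mitigation("Vendor fallback arrangements", 5.0, 5.0),
    Mitigation("Backup home connectivity", 3.0, 1.0),
]
funded = prioritize(candidates, budget=7.0)
for m in funded:
    print(f"{m.risk}: reduction {m.exposure_reduction}, cost {m.cost}")
```

Risks whose mitigations are not funded are not ignored: under the governance described above they would be either explicitly accepted or escalated. Applying one scoring rule to every candidate is what gives the consistency and transparency; the scores themselves remain a matter of judgment.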
How can it get done?
Risk Identification
It’s tempting to use the indicators of the business-as-usual risk program (KRIs, Scenarios and RCSAs) as a starting point. However, the resiliency risks of the New Normal were previously contingent risks, so it is unlikely that these programs would have explicitly addressed them. The second line of defense participants in these programs, especially those aligned to specific businesses, should have a solid understanding of their first line partners’ concerns. Identification can start by asking the question:
“If you were in the first line now what would keep you up at night?”
If managers in the first line are available for a brief conversation (no more than 45 minutes) they may appreciate the opportunity to unburden themselves and answer the question:
“What is keeping you up at night?”
Each situation keeping someone up at night is a story. The goal of the exercise is to identify the key contingent resiliency risks that can be prioritized at the enterprise level, so the individual stories need to be analyzed thematically. This means cataloguing separately and explicitly for each story:
- What is the problem? (Risk Cause i.e. what needs to be fixed)
- Why do we care? (Risk Event i.e. what can happen if we don’t do anything)
These component parts of each story can be categorized taxonomically by Risk Cause Type and Risk Event Type. This allows aggregation by Risk Event Type and Risk Cause Type to find common themes, i.e. common stories that tell of a repeated concern about a type of potential business interruption event that might happen because of a common problem.
The temptation to identify themes by arithmetic aggregation should be resisted. This isn’t a scientific process. It’s a journalistic endeavor to find the most important stories about threats to resiliency in the New Normal that should concern the leadership of the institution. The most that should be done arithmetically is adding up weighted counts of categorized problems and potential events to identify a candidate list of 20 to 30 themes. The individual stories making up these candidate themes should be examined to see if they really do constitute a consistent story. If they don’t, then perhaps some of the individual stories should be categorized differently. The process can be repeated iteratively until a list of 20 to 30 themes emerges that makes sense to the members of the team. This list can then be validated with line management by asking them:
“Are these risks to the resiliency of the institution that should be keeping its leadership up at night?”
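The weighted count described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Story` record, its field names, the weights, the sample category labels and the `candidate_themes` function are all hypothetical, and the output is only a candidate list for the team to read through, not a final ranking.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    """One "what is keeping you up at night" story (fields are illustrative)."""
    source: str          # who told the story
    cause_type: str      # Risk Cause Type: what needs to be fixed
    event_type: str      # Risk Event Type: what can happen if nothing is done
    weight: float = 1.0  # assumed weight, e.g. criticality of the business


def candidate_themes(stories, max_themes=30):
    """Group stories by (cause type, event type), ranked by weighted count."""
    groups = defaultdict(list)
    for story in stories:
        groups[(story.cause_type, story.event_type)].append(story)
    ranked = sorted(
        groups.items(),
        key=lambda item: sum(s.weight for s in item[1]),
        reverse=True,
    )
    return ranked[:max_themes]


# Hypothetical interview results.
stories = [
    Story("Payments operations", "Home connectivity", "Process not executed", 2.0),
    Story("Treasury", "Home connectivity", "Process not executed", 1.0),
    Story("Finance", "Remote access capacity", "Late financial reporting", 2.0),
    Story("Retail operations", "Unmonitored channels", "Data breach", 1.0),
]

for (cause, event), members in candidate_themes(stories):
    score = sum(s.weight for s in members)
    print(f"{score:4.1f}  {cause} -> {event}  ({len(members)} stories)")
```

Each candidate theme keeps its underlying stories, so the team can check whether they really tell one consistent story and re-categorize individual stories before running the grouping again.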
Risk Governance Approach
The same approach can be taken to overall mitigation governance. Action can be mobilized most expeditiously by using an existing governance forum within the business. The forum should be chaired and attended by stakeholders with a significant existing stake, either because they are responsible for business functions that will be directly and significantly impacted or because they are responsible for a significant proportion of the resources required for mitigation.
There will be no existing governance forum that is a perfect fit. For example, the Chief Information Officer will be responsible ultimately for the remediation of many resiliency risks but does not have responsibility for all the resources needed to address even all the cybersecurity risks. For example, preventing data breaches while most staff are working remotely requires business line resources to comply with policy and only use approved mechanisms for sharing sensitive information. The CIO cannot directly impact the behavior of those resources, but the CIO does have significant stature with her or his peers in the management committee to present a plan of action for each business and impress upon them the importance of their follow-through within their business lines.
The attendees at the selected governance forum (e.g. the CIO’s direct reports) similarly will not have direct control over all the necessary resources but they can be assigned responsibility for individual thematic risks and be made responsible for coordination of remediation across organizational boundaries.
Conclusion
It is very tempting to wait until all the contingency plans that are currently being invoked are stable and the gaps have been addressed before diverting attention to creating new continuity plans for the New Normal. This is dangerous because the previously contingent resiliency risks that are now current resiliency risks of the New Normal represent a creeping threat. It is possible to execute a parallel process to identify and prioritize mitigation of these risks by refocusing existing risk management resources and governance structures. An economic downturn is not normally the time to incur new expenses, but the identification of these new risks might be accelerated by leveraging pre-existing content (e.g. rich detailed taxonomies available from vendors).
Endnotes
Endnote 1
Examples of recent regulatory attention to resiliency risk include:
The Basel Committee on Banking Supervision (BCBS) has established a working group to define policy proposals for Operational Resiliency. US regulators have also indicated their interest, including through the publication by the New York Fed of an analysis of the Systemic Risk implications of cyber attacks.
The Bank of England has gone further and has published proposed draft standards for operational resiliency that would apply to UK financial institutions including “Impact tolerances for important business services”.
Endnote 2
The impact of a pandemic event seems to have been generally vastly underestimated at both an institutional and macroeconomic level. For example, the World Bank estimated in 2006 that the next influenza pandemic could well cost the world economy up to two trillion US dollars. The cost of the Coronavirus Aid, Relief, and Economic Security (CARES) Act passed in the US on March 27, 2020 alone is equal to that estimate.
Dan Shalev is an Information Architect and Governance, Risk and Compliance (GRC) professional focused on enabling data-driven business decision-making. His key focus and passion is the enablement of business decision-making using consistently organized data irrespective of the systems, businesses or geography those data have been sourced from. Dan can be contacted at dan.shalev@bifrostanalytics.com.
© 2013 – 2020 Dan J Shalev. All rights reserved.