Process, Risk and Control Instance Records

Why taxonomies alone are not enough

This article was originally published in the August 2013 issue of The Risk Universe.

A common risk language of Process, Risk and Control Taxonomies is a core and indispensable component of any analytical framework for the management of Operational Risks. The aggregation and analysis of exposures and control effectiveness across the enterprise, different business lines, business units and geographies is not possible without a robust taxonomic framework to consistently describe exposures and assessments. Deployment of a common risk language across the institution gives all lines of defense powerful analytical capabilities to identify where to focus attention in order to understand and manage or mitigate the key exposures of the institution.

The referential framework is not however a substitute for complete and robust descriptions of the actual instances of individual processes, risks and controls and associated assessments. Loss arises because a specific condition exists within specific locations within the business that may cause a specific future event to occur, that will result in the business experiencing one or more specific impacts. That specific potential event has a specific likelihood of occurring that can be assessed and those specific impacts will result in loss of a specific severity that also can be assessed. The potential event will take place as the consequence of one or more specific causes. If those specific causes are understood then a specific targeted action plan can be considered and executed if the decision is made not to accept the risk.

Using the analogy of an accounting system the combinations of risk types and process types within a business can be considered to be sub-ledgers. The individual instances of risk can be considered as analogous to the individual balances within the sub-ledger. It would be hard to accurately value a sub-ledger without values for the individual balances that it aggregates.

Management of risks requires decisions to be taken by business stakeholders to accept or mitigate risks and, if they choose the path of mitigation, to fund and deploy resources to implement selected mitigations. These decisions are no different from any of the prioritization decisions made as part of the business governance process where funding and resources are allocated amongst competing priorities for investment. The risk framework operated by the first and second lines of defense provides the analytical capability to focus decisions on how the key risks the business is exposed to are included in these discussions. Before business stakeholders will make decisions to commit resources they need to accept and be bought into the assessments of these exposures. These conversations must be concrete and specific to the circumstances of the business line: How exactly might an event take place that could result in loss and how exactly will that loss manifest itself? Where exactly i.e. within which specific business activities might that event take place? Exposures must be explained this specifically so that the assessment of their extent can be challenged and potentially accepted by business management and the resources for their remediation or mitigation can be prioritized ahead of the other competing demands for investment and the application of resources within the business.

The Referential Framework Revisited

The core of the Operational Risk Framework data model consists of the records of the assessments of risk and controls and the processes where they are located. These assessments are created and updated as part of RCSAs and updated on an ongoing basis as a result of the monitoring and investigation of related diagnostic data elements e.g. KRIs, loss events, scenarios, control test results, audit findings, issues, specialist assessments (e.g. information security, business continuity) etc.

In order to support the aggregation and analytical needs of the risk framework while at the same time facilitating business relevant risk-management decision-making discussions, the data model used must support both a structure for instances of processes, risks and controls and separately a structure of the referential labels needed to apply analysis.
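The separation between instance records and referential labels can be pictured with a minimal sketch. The class and field names below are hypothetical, chosen only for illustration and not taken from any particular GRC system. The example record reuses the account opening illustration given later in this article.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaxonomyLabel:
    """A referential label from the common risk language (hypothetical shape)."""
    taxonomy: str  # e.g. "process", "control", "risk_event"
    name: str


@dataclass(frozen=True)
class ProcessInstance:
    """A specific process as it is actually executed in the business."""
    description: str
    process_type: TaxonomyLabel


@dataclass(frozen=True)
class ControlInstance:
    """A specific control operating within a specific process instance."""
    description: str
    control_type: TaxonomyLabel
    process: ProcessInstance


@dataclass(frozen=True)
class RiskInstance:
    """A specific risk, located in a process and linked to its controls."""
    description: str
    risk_event_type: TaxonomyLabel
    process: ProcessInstance
    controls: Tuple[ControlInstance, ...] = ()


# The label supports aggregation; the instance carries the business specifics.
account_opening = TaxonomyLabel("process", "Account opening")
orchard_st = ProcessInstance(
    description="Account opening for preferred customers at the Orchard St branch in London",
    process_type=account_opening,
)
```

Each instance carries exactly one label per taxonomy, so the same record can be discussed in concrete business terms and also rolled up by type for analysis.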

The referential labels are embodied in taxonomies that taken together comprise a common risk language that serves as a descriptive structure for identification / documentation, analysis / learning, comparison, aggregation and data mapping and mining of risks and controls. The importance of a robust referential framework cannot be over-emphasized. The Basel Committee on Banking Supervision in its “Principles for effective risk data aggregation and risk reporting” has established a mandate for Globally Systemically Important Banks (G-SIBs) to establish integrated data taxonomies [See Endnote 1]. (The BCBS goes further to strongly encourage national regulators to make these requirements mandatory for Domestically Systemically Important Banks).

In order to facilitate the actual management of risks however a bridge is required between the aggregated data described taxonomically and the specific circumstances of the business that are discussed in the routine governance forums where prioritization, funding and resource allocation decisions are made within the business. This bridge is the understanding of the specific instances of processes, risks and controls that give rise to key exposures i.e. the specific locations where risk exposures arise, the specific business circumstances that may potentially give rise to future loss and the individual controls that may require remediation or replacement.

Unfortunately many institutions as they have embraced standardized descriptions of risk have neglected to support within their frameworks the ability to consider the actual conditions within the business and their individual processes that lead to their assessment of risk exposure and control effectiveness. Without an articulation of the specific facts, observations and opinions that lead to a given assessment it is hard to gain confidence in an assessment of the extent of an exposure beyond the individuals who made the original assessment. A basis for a challenge of an assessment can only be obtained by re-review of the source diagnostic records considered as part of the original assessment process as opposed to simply a discussion of the reasonableness of the conclusions of the assessment (i.e. a discussion of the reasonableness of the assessment of the likelihood that a specific potential event may occur and the severity of the impacts that the assessors believe the business may experience).

Such assessments satisfy the requirement to complete a documented assessment of the level of risk exposure and control effectiveness, but they are limited in their usefulness for two reasons. First, it is difficult to share confidence in the measured extent of the exposures beyond the group that prepared the assessment. Second, they provide little useful input for conversations within business governance forums, where a business case must be made in order to prioritize resources and funding for risks that are not accepted.

An example

The problem can be illustrated using a specific example: A risk report presented to the operating committee for a fixed income trading business in a New York based investment bank includes a risk with the following details:

The risk is presented with an estimated funding requirement of $300,000 for a remediation project, which the Operating Committee is being asked to prioritize.

The agenda of the operating committee meeting includes consideration of plans to fund a strategic project to enhance a key settlement system to allow the firm to execute business in an emerging economy where the firm would be able to compete on favorable terms to local participants and potentially secure a significant market share provided that operations commence before other foreign institutions enter this market. Available funds are limited. The COO is asked to prioritize funding. The business case for the strategic project is clearly articulated. The COO is not opposed to taking action to mitigate an unacceptable risk and potentially delay the strategic project but she does not understand why this risk is beyond risk appetite and has not been accepted. In order for her to gain comfort in the need to take action on this risk she needs facts or at least observations articulated to her that explain specifically how and where the business is exposed to loss as a result of the use of obsolete technology. Exactly what event could happen if the use of obsolete technology is not remediated and how exactly would the business experience impact? If the risk is not going to be accepted then she needs to understand what specific actions she is being asked to authorize and fund. The business case for funding the strategic project to support early entry into the new market is compelling but the COO understands fully the importance of taking pre-emptive action to manage Operational Risks before risks are manifested. It is agreed to postpone for a week the decision on the allocation of funding and that the Operating Committee will reconvene after Operational Risk Management returns with the necessary specific information about this instance of a risk.

What specific facts, opinions and observations about individual risks and associated controls and processes can be articulated?

Process

In order to be able to appreciate the impact of a risk it is necessary to understand where the risk is located. Processes are executed in order to achieve business objectives. If a process is ineffective or at least not fully effective then the ability of the enterprise to achieve its objectives is impaired and the results or the assets (either tangible or intangible) will be diminished. Impact (i.e. financial, reputational or regulatory loss) is therefore experienced as a result of a loss event taking place within a process and therefore preventing (or diminishing the ability of) a process to achieve the outcomes it is intended to achieve.

Consequently in order to be able to begin to assess the impact of a specific potential loss event it is necessary to understand exactly within which process instance it occurs. Recently major institutions have devoted considerable effort to process excellence initiatives. As a result institutions have created libraries of process maps that document each of the steps that are executed in each individual process belonging to each individual process owner. Process maps are extremely powerful tools to estimate the impact of a loss event. Events take place because of one or more causes, often a failure or a lack of control over a specific step in a process. If that step in a process can be identified then it is possible to understand how a specific event prevents or impedes the process from achieving its intended specific outcomes. Individual process maps owned by individual process owners also will identify what specific products and services and books of business the process instance supports. For example, a process type such as “account opening” will describe the type of activity that may be impaired. The specific process instance of “Account opening for preferred customers at the Orchard St branch in London” very specifically identifies the book and flow of business that is at risk.

In the example discussed above the process type in the RCSA is “Technology Acquisition and Maintenance”. Although we do know which business this process is supporting there are many technology maintenance activities that support this business so this gives little information about how the impact may be experienced. Within the IT function however there are three separate process owners for this type of process, each with responsibility for supporting a number of discrete groups of applications. As part of the review to report back to the Operating Committee the OR Manager discusses the RCSA with the people involved in preparing it. As part of these discussions he learns that this assessment was raised as a result of considering an audit finding raised with respect to the “technology acquisition and maintenance processes for operating systems and infrastructure for core settlement systems”.

Control

Controls are the measures employed by management to constrain the variance of the outcomes of processes to within an acceptable tolerance (i.e. risk appetite). Controls are therefore designed to prevent or at least control the occurrence of loss events that result in financial impact, diminishment in reputation or regulatory sanction.

The information documented in the RCSA is constrained to stating that controls over “Restrictions on Use of Unsupported or Obsolete Software or Hardware” are unsatisfactory. Even within the process to maintain the operating systems and infrastructure for the core settlement systems there are many measures that are employed to control the use of unsupported or obsolete software. We understand from the report that one or more of these measures are ineffective in a manner that may cause a risk event to expose us to impact. In order to understand how that may happen we need to understand specifically which measure or measures were assessed as ineffective.

A core control to ensure that obsolete technology is not used beyond the point where it cannot be supported is for IT functions to monitor the versions of technologies they use against roadmaps that include sunset dates beyond which given versions of technologies are deemed to be unsupportable. Review of the original audit finding shows that it was raised with respect to the “Adherence to IT component roadmaps for batch control tools”.

Risk Cause

Now that we have identified the instance of both the control and the process we understand that this risk arises from a potential failure in control over batch processes in the core settlement systems. But how exactly may that cause an event to occur that may expose us to loss?

At this point we need to understand the specific gap in control within this control instance. The audit finding documents that the vendor of the job control tool used to control the batches for the core settlement systems was acquired 7 years ago by another IT vendor that markets its own tool. Although the acquiring firm continued to provide support for the legacy tool, it strongly encouraged the customers of the acquired firm to migrate to its own tool and it ceased to provide support for the legacy tool a year ago.

A risk event can therefore be caused as a result of the following control gap:

“Lack of external vendor support to resolve defects in the job control tool that controls the automated scheduling and execution of batch processes in the core settlement system”.

Risk Event

We are now in a position to consider the possible risk events. Almost all actual settlement activities are executed using online capabilities so a failure in batch scheduling will not affect the actual settlement of trades. However the settlement systems generate many files during batch processes that are executed during the night. The batch control tool is used to schedule and automate the execution of the batch processes that create these files and ensure their delivery to other downstream systems so that they can be received by agreed service level targets. Perhaps the most critical of these files is the file sent to the collateral management system. This system needs this file by a given time in order for it to calculate the possible margin calls that must be made to counterparties.

The batch control tool has been used for a long time to support the settlement system so it is considered highly reliable. However changes to other IT components that it was not originally designed to work with may cause it to fail and since it is not being actively supported there would be no advance warning or fix for this. The batch may fail, the files will not be delivered to the collateral system and if the market moves sufficiently one or more counterparties may be significantly under-collateralized. This alone is not enough for the bank to experience impact. However if a counterparty fails when it is under-collateralized then the bank will not be able to recover much of its exposure and as a result will experience loss.

The specific loss event instance is therefore:

“A counterparty to be under-collateralized at the point that it fails”

There are two important consequences of this:

Risk Type Categorization

The Risk Event Type in the RCSA is “Obsolete or Unsupported Applications/Systems”. The Risk Cause Type is definitely IT related i.e. “Application and Infrastructure Components Fail to Comply with Architecture or Information Security Component Version”. However now that we understand exactly how this cause will lead to a risk event, i.e. we have identified the risk event instance, we can see that the original risk event type is inappropriate. It was chosen because the discussion during the RCSA process was at the level of the overall line of business and process type and there was a significant audit finding relating to technology obsolescence. The Risk Event Type however can now be seen to be more accurately:

“Collateral Management Errors”.

Likelihood Assessment

The original likelihood assessment was “high”. Again this was selected because there was a significant audit finding relating to obsolete technology and it was understood there were real issues with obsolete technology. Therefore the likelihood of this risk type was assessed as high at the level of the overall aggregation point.

We now understand that the actual possible risk event relates to failure to be fully collateralized if a counterparty fails. There are many contingencies needed for this to occur i.e.

A previously undiscovered defect with the tool is manifested.
A batch consequently fails.
This occurs at a point when a margin call should have been made.
The now under-collateralized counterparty fails before the error is corrected and the margin call is made.

All of these contingencies must occur together, so their combined likelihood can be seen to be quite low and the likelihood assessment should have been stated as “low” at the very most.
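The effect of chaining contingencies can be illustrated with a short calculation. The probabilities below are purely hypothetical placeholders, not estimates from the RCSA or the audit finding. Each one is read as the chance of that step occurring given that the previous steps have already occurred.

```python
from math import prod

# Hypothetical conditional probabilities for each contingency (illustrative only).
contingencies = {
    "a previously undiscovered defect in the tool is manifested": 0.10,
    "a batch consequently fails": 0.50,
    "the failure occurs when a margin call should have been made": 0.20,
    "the counterparty fails before the margin call is made": 0.01,
}


def combined_likelihood(probabilities):
    """Joint probability of a chain of events, each conditional on the previous ones."""
    return prod(probabilities)


joint = combined_likelihood(contingencies.values())
print(f"Combined likelihood: {joint:.6f}")
```

Whatever values are chosen, the joint probability can never exceed the smallest individual probability in the chain. This is why an assessment made at the level of the specific event instance tends to come out lower than one made at the level of the obsolete technology finding alone.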

Risk Impact(s)

Now that we understand exactly what the risk event is and where it will occur (i.e. in which book of business) we understand that impact will be felt in the form of financial impact if we are not able to fully recover our exposure. Understanding exactly where the event will take place allows us to estimate, given the average position of a counterparty in this business, that the financial impact would be very high.

Action

We also now understand the exact cause of the possible risk event. We cannot directly control risk events. The only way to control their occurrence is by controlling the causes of events. Understanding the specific cause or causes of a possible risk event instance allows us to understand what would be a suitable mitigation of that cause. Given the nature of this instance of the cause we can now understand why the $300,000 has been requested to mitigate this risk in order to do the following:

Migrate to the new supported job control tool offered by the vendor.
Implement enhanced monitoring of batches in the period before the implementation of the new tool.

Summary

As a result of the identification and analysis of the relevant process, control and risk instances the risk instance can be described by the following risk statement:

“Lack of external vendor support to resolve defects in the job control tool that controls the automated scheduling and execution of batch processes in the core settlement system that causes a counterparty to be under-collateralized at the point that it fails resulting in direct financial loss in the form of exposure to the counterparty that cannot be recovered.”
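The risk statement follows a cause, event, impact pattern, which can be sketched as a simple template. The function name and the exact joining phrases are assumptions made for illustration. The three inputs are the cause, event and impact identified in the preceding sections.

```python
def compose_risk_statement(cause: str, event: str, impact: str) -> str:
    """Join a cause, an event and an impact into one risk statement (hypothetical template)."""
    return f"{cause} that causes {event} resulting in {impact}."


cause = (
    "Lack of external vendor support to resolve defects in the job control tool "
    "that controls the automated scheduling and execution of batch processes "
    "in the core settlement system"
)
event = "a counterparty to be under-collateralized at the point that it fails"
impact = (
    "direct financial loss in the form of exposure to the counterparty "
    "that cannot be recovered"
)

statement = compose_risk_statement(cause, event, impact)
print(statement)
```

Holding the three components as separate fields on the risk instance record, rather than only as free text, keeps each one available to be challenged and revised on its own.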

This risk statement allows discussion of this instance of risk amongst the members of the operating committee who have not reviewed the original source records. It gives them the basis to discuss, challenge and ultimately accept the now revised risk assessment of Yellow. Even though the risk assessment has been revised down (because of the lowered likelihood assessment) the stakeholders understand the importance of mitigating this risk and are able to prioritize the funding for the action plan.

Conclusion

Aggregation of risk exposures and control effectiveness assessments by taxonomic types is essential for risk accounting and analytical purposes. However capturing details of the individual instances of risks and then assessing them and subsequently aggregating those assessments (rather than performing assessments at the level of the aggregation point) enables:

More accurate assessments.
More accurate risk type identification.
Increased probability of buy in from business stakeholders into risk assessments.
Identification of appropriate focused mitigation activities and increased buy-in as to their appropriateness.
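Assessing at the instance level and then aggregating by taxonomic type can be sketched as follows. The ordinal scale, the field names and the roll-up rule (report the highest rating seen per type) are all assumptions for illustration. The second and third records are invented placeholders. Only the first reflects the example discussed in this article.

```python
from collections import defaultdict

# Hypothetical ordinal scale used to compare ratings.
SCALE = {"low": 1, "medium": 2, "high": 3, "very high": 4}

# Instance-level assessments, each tagged with its risk event type.
assessments = [
    {"risk_event_type": "Collateral Management Errors",
     "instance": "Counterparty under-collateralized at the point that it fails",
     "likelihood": "low", "impact": "very high"},
    {"risk_event_type": "Collateral Management Errors",
     "instance": "Hypothetical instance B",
     "likelihood": "medium", "impact": "medium"},
    {"risk_event_type": "Obsolete or Unsupported Applications/Systems",
     "instance": "Hypothetical instance C",
     "likelihood": "high", "impact": "low"},
]


def aggregate_by_type(records):
    """Roll instance assessments up to their risk event type.

    For each type, count the instances and keep the highest likelihood
    and highest impact rating seen among them.
    """
    summary = defaultdict(
        lambda: {"count": 0, "worst_likelihood": "low", "worst_impact": "low"}
    )
    for record in records:
        entry = summary[record["risk_event_type"]]
        entry["count"] += 1
        for field in ("likelihood", "impact"):
            key = "worst_" + field
            if SCALE[record[field]] > SCALE[entry[key]]:
                entry[key] = record[field]
    return dict(summary)


summary = aggregate_by_type(assessments)
for risk_type, entry in summary.items():
    print(risk_type, entry)
```

Because the aggregate is derived from the instance records, any figure reported at the level of a risk type can be traced back to the specific instances that produced it.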

Endnotes

Endnote 1

BIS Basel Committee on Banking Supervision: “Principles for effective risk data aggregation and risk reporting” January 2013. Page 7.

Dan Shalev is an Information Architect and Governance, Risk and Compliance (GRC) professional focused on enabling data-driven business decision-making. His key focus and passion is the enablement of business decision-making using consistently organized data irrespective of the systems, businesses or geography those data have been sourced from. Dan can be contacted at dan.shalev@bifrostanalytics.com.