Insights · Report · Risk · Apr 2026
A board-ready narrative on important business services, substitutability, scenario testing, and how supervisors expect evidence to look in audits and exams.

Operational resilience has moved from a regulatory side project to a core strategic discipline for boards and executive committees. Supervisory authorities across multiple jurisdictions now expect firms to demonstrate that critical services can withstand severe but plausible disruptions, recover within stated tolerances, and communicate transparently while doing so. The convergence of operational resilience with ICT risk management and third-party oversight creates a single governance challenge that cannot be delegated to any one function alone.
The foundation of any resilience program is a credible definition of important business services. These are not internal processes or technology systems. They are the outcomes that end users, clients, and counterparties depend on: processing payments, settling trades, issuing policies, or granting access to deposits. Defining them requires collaboration between product owners, risk teams, and operations leaders, because the language must be meaningful to regulators and actionable for engineers simultaneously.
Separating customer harm from internal inconvenience is the sharpest test of a well-scoped resilience program. A delayed internal management report is an inconvenience. A customer unable to access funds, execute a trade, or file an insurance claim represents genuine harm with regulatory and reputational consequences. Boards should insist on a clear taxonomy that distinguishes these categories, ensuring that investment in resilience is directed toward the scenarios that matter most to the people the firm serves.
Impact tolerances define the maximum level of disruption a firm is willing to accept for each important business service. They are not recovery time objectives borrowed from disaster recovery plans. Impact tolerances consider duration, volume of affected customers, data integrity, and financial loss simultaneously. Setting credible tolerances demands quantitative analysis, not aspirational statements. Firms that express tolerances only in time-to-recover terms often discover they have addressed the wrong dimension of customer harm.
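For teams translating this into tooling, the sketch below is one minimal way to express a tolerance across all four dimensions and test a disruption against each of them at once. The field names and thresholds are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class ImpactTolerance:
    """Maximum tolerable disruption for one important business service."""
    service: str
    max_outage_minutes: int        # duration dimension
    max_affected_customers: int    # customer-volume dimension
    max_financial_loss: float      # financial dimension
    data_loss_permitted: bool      # data-integrity dimension

@dataclass
class Disruption:
    """An observed or simulated disruption, measured on the same dimensions."""
    outage_minutes: int
    affected_customers: int
    financial_loss: float
    data_lost: bool

def breached_dimensions(tol: ImpactTolerance, event: Disruption) -> list[str]:
    """Return every tolerance dimension the event breaches, not just time-to-recover."""
    breaches = []
    if event.outage_minutes > tol.max_outage_minutes:
        breaches.append("duration")
    if event.affected_customers > tol.max_affected_customers:
        breaches.append("customer volume")
    if event.financial_loss > tol.max_financial_loss:
        breaches.append("financial loss")
    if event.data_lost and not tol.data_loss_permitted:
        breaches.append("data integrity")
    return breaches

# Example: a recovery inside the time tolerance can still breach on customer volume.
payments = ImpactTolerance("retail payments", 120, 50_000, 250_000.0, False)
print(breached_dimensions(payments, Disruption(90, 80_000, 40_000.0, False)))
# ['customer volume']
```

The point of the multi-dimensional check is precisely the failure mode described above: a firm that recovers within its stated time window can still have harmed more customers than its tolerance allows.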
The layer model presented in this brief assigns clear ownership from board governance through operational execution. At the top, the board risk committee sets appetite and reviews tolerance breaches. Below that, a dedicated resilience function coordinates cross-functional mapping and testing. The operational layer encompasses technology, operations, and supplier management teams responsible for day-to-day controls. Clear boundaries between layers prevent overlapping committees and duplicated vocabularies that slow decision-making during actual incidents.
ICT third-party concentration has become a primary area of supervisory attention. Cloud infrastructure providers, market data vendors, core banking platforms, and identity verification services each represent potential single points of failure. Regulators are not asking firms to avoid concentration entirely, as that would be impractical. Instead, they expect documented evidence that firms understand their concentration exposures, have assessed the systemic implications, and maintain credible contingency arrangements. A register of critical ICT providers, updated quarterly, forms the baseline evidence.
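A baseline register can be held as structured data so that concentration exposure falls out of it directly. The sketch below is illustrative only; the provider names, categories, and fields are assumptions rather than a reference schema.

```python
from collections import defaultdict

# Hypothetical register rows: provider, service category, and the important
# business services that depend on it. All names are invented for illustration.
register = [
    {"provider": "CloudCo",   "category": "cloud infrastructure",  "services": ["payments", "mobile banking"]},
    {"provider": "DataFeedX", "category": "market data",           "services": ["trading"]},
    {"provider": "IDVerifyY", "category": "identity verification", "services": ["onboarding", "payments"]},
]

def concentration_view(rows: list[dict]) -> dict[str, list[str]]:
    """Group important business services by provider to surface single points of failure."""
    exposure: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        exposure[row["provider"]].update(row["services"])
    # Providers supporting more than one important service deserve the closest scrutiny.
    return {p: sorted(s) for p, s in exposure.items() if len(s) > 1}

print(concentration_view(register))
# {'CloudCo': ['mobile banking', 'payments'], 'IDVerifyY': ['onboarding', 'payments']}
```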
Substitutability is the practical question beneath every concentration discussion. How quickly can you move a critical function to an alternate provider, an alternate region, or a controlled manual fallback without breaching your stated impact tolerances? Answering this question honestly requires technical proof, not procurement assurances. Firms that have actually tested failover to a secondary cloud region or manual payment processing under time pressure carry fundamentally different credibility in supervisory conversations than those relying on theoretical plans.
Leading firms maintain a living register of exit triggers for every material ICT third-party relationship. An exit trigger is not simply contract expiry. It includes persistent service degradation, security incidents affecting trust, changes in the supplier's ownership or financial condition, and regulatory actions against the provider. The register documents the trigger, the response playbook, the estimated transition timeline, and the last date the playbook was reviewed. Static registers that gather dust between annual reviews provide limited supervisory comfort.
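One way to keep such a register living rather than static is to hold each entry as a structured record with an explicit review date, so stale entries can be flagged automatically. The sketch below is a minimal illustration; the field names and review cadence are assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class ExitTriggerEntry:
    """One row in a living exit-trigger register; field names are illustrative."""
    supplier: str
    trigger: str                  # e.g. persistent degradation, security incident, ownership change
    response_playbook: str        # reference to the documented exit/transition playbook
    transition_estimate_days: int
    last_reviewed: date

def stale_entries(register: list[ExitTriggerEntry], max_age_days: int = 180) -> list[ExitTriggerEntry]:
    """Flag entries whose playbook review is older than the agreed cadence."""
    cutoff = date.today() - timedelta(days=max_age_days)
    return [entry for entry in register if entry.last_reviewed < cutoff]
```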
Contract clauses deserve more attention than they typically receive during initial procurement. Access to audit logs during incidents, the right to participate in joint resilience testing, transparency on subprocessor chains, and guaranteed cooperation during exit scenarios are not standard provisions in most vendor agreements. Renegotiating these terms after an incident occurs is costly and slow. Procurement and legal teams benefit from a standardized clause library mapped to regulatory expectations for material outsourcing arrangements.
Scenario design is the area where resilience programs most frequently stall. Effective scenarios are specific enough to produce measurable actions and decisions. They name the affected service, the failure mode, the time of occurrence, the escalation path, and the communication obligations. A scenario that merely states "cloud provider outage," without specifying which services are affected, during which business hours, and with what customer communication sequence, provides insufficient basis for meaningful testing or board assurance.
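A scenario specification can be captured as structured data so that each element named above is explicit and testable. The sketch below is illustrative; the service, timings, and obligations are invented for the example.

```python
# Illustrative scenario specification; keys mirror the elements named above
# rather than any regulator-mandated schema.
scenario = {
    "affected_service": "retail payments",
    "failure_mode": "primary cloud region unavailable",
    "time_of_occurrence": "Friday 11:30, peak payroll volume",
    "escalation_path": ["payments duty manager", "CIO", "crisis management team"],
    "communication_obligations": [
        "customer status page update within 30 minutes",
        "regulator notification within 2 hours if a tolerance breach becomes likely",
    ],
    "expected_decisions": ["invoke failover to secondary region", "activate manual payment queue"],
}
```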
Tabletop exercises remain the most accessible form of resilience testing, yet many firms execute them poorly. An effective tabletop presents a scenario with evolving injects that force participants to make real decisions under time pressure. The exercise should involve senior decision-makers, not only operational staff, because the value lies in testing communication chains and authorization pathways. Timestamped evidence of decisions made during the exercise constitutes the primary artifact for supervisory review.
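The decision log itself can be very simple, provided every entry carries a timestamp, a decision-maker, and a rationale. The sketch below is one minimal illustration, not a prescribed format; the exercise and role names are invented.

```python
from datetime import datetime, timezone

decision_log: list[dict] = []

def record_decision(exercise: str, decision: str, decided_by: str, rationale: str) -> None:
    """Append a timestamped decision record; the exported log is the audit artifact."""
    decision_log.append({
        "exercise": exercise,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "decision": decision,
        "decided_by": decided_by,
        "rationale": rationale,
    })

record_decision(
    exercise="Q3 payments outage tabletop",
    decision="Invoke manual processing for the high-value payment queue",
    decided_by="Head of Payments Operations",
    rationale="Projected breach of the two-hour tolerance if failover validation continues",
)
```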
Beyond tabletops, firms with more mature programs conduct simulation exercises that involve actual system failovers, switchover to backup providers, or activation of manual processing procedures. These exercises generate quantitative data on recovery times, error rates during degraded operation, and communication latency that tabletops alone cannot produce. The investment required is significantly higher, but the evidence generated is correspondingly more persuasive to supervisors assessing whether a firm's resilience posture is genuine rather than aspirational.

Technology organizations play a central role in translating engineering telemetry into business-relevant resilience signals. Service health dashboards that measure latency, error rates, and throughput at the infrastructure level must be mapped to important business services so that degradation in a database cluster or message queue translates immediately into an assessment of customer impact. Without this mapping, engineers detect problems quickly but executives learn about customer harm only after social media escalation or regulatory inquiry.
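The mapping itself can be as simple as a lookup from infrastructure components to the important business services they support, so that an alert resolves immediately to customer impact. The component and service names below are invented for illustration.

```python
# Illustrative mapping from infrastructure components to the important business
# services that depend on them; names are assumptions, not a real topology.
component_to_services = {
    "payments-db-cluster": ["retail payments", "standing orders"],
    "auth-message-queue":  ["mobile banking", "retail payments"],
    "pricing-api":         ["trading"],
}

def business_impact(degraded_components: list[str]) -> set[str]:
    """Translate low-level degradation signals into the business services at risk."""
    impacted: set[str] = set()
    for component in degraded_components:
        impacted.update(component_to_services.get(component, []))
    return impacted

# An alert on a single message queue immediately surfaces two customer-facing services.
print(sorted(business_impact(["auth-message-queue"])))
# ['mobile banking', 'retail payments']
```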
Dependency mapping is the connective tissue between technology architecture and resilience governance. A comprehensive map traces each important business service through application components, infrastructure layers, data stores, network paths, and third-party interfaces. Maintaining accuracy requires automated discovery tools supplemented by manual validation, because no single scanning technology captures the full dependency chain. Maps that are only updated annually become unreliable within weeks of their creation as deployments and configuration changes accumulate.
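Conceptually, the map is a directed graph that can be walked from each important business service down to its third-party endpoints. The sketch below uses an invented topology purely to illustrate the traversal; real maps come from automated discovery supplemented by manual validation.

```python
# Minimal dependency graph: each node lists what it directly depends on.
# Every node and edge here is invented for illustration.
dependencies = {
    "retail payments":     ["payments-app", "fraud-screening"],
    "payments-app":        ["payments-db-cluster", "auth-message-queue"],
    "fraud-screening":     ["IDVerifyY (third party)"],
    "payments-db-cluster": ["CloudCo eu-west region (third party)"],
    "auth-message-queue":  ["CloudCo eu-west region (third party)"],
}

def full_dependency_chain(service: str) -> set[str]:
    """Depth-first walk from a business service to every direct and indirect dependency."""
    seen: set[str] = set()
    stack = [service]
    while stack:
        node = stack.pop()
        for dep in dependencies.get(node, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

print(sorted(full_dependency_chain("retail payments")))
```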
Procurement and legal teams receive a dedicated section in this brief outlining a checklist for material contract amendments. The checklist covers audit access provisions, joint testing participation requirements, subprocessor disclosure obligations, incident notification timelines, and exit cooperation commitments. Each item maps to a specific regulatory expectation from recent supervisory guidance. Implementing these amendments proactively, rather than reactively after an incident, avoids negotiating access and cooperation at the very moment when response speed determines the extent of customer harm.
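The checklist lends itself to a simple structured form in which each clause category is paired with the expectation it addresses, making contract gap analysis repeatable. The items and expectation wording below are illustrative paraphrases of common supervisory themes, not quotations from any rulebook.

```python
# Illustrative clause checklist for material ICT contracts.
clause_checklist = [
    {"clause": "audit log access during incidents",      "expectation": "firm can evidence incident timelines"},
    {"clause": "joint resilience testing participation",  "expectation": "scenario testing covers material providers"},
    {"clause": "subprocessor chain disclosure",           "expectation": "concentration risk is visible end to end"},
    {"clause": "incident notification timeline",          "expectation": "firm can meet its own reporting deadlines"},
    {"clause": "exit cooperation commitments",            "expectation": "substitutability plans are executable"},
]

def gaps(contract_clauses: set[str]) -> list[str]:
    """Return checklist items missing from a given contract's clause set."""
    return [item["clause"] for item in clause_checklist if item["clause"] not in contract_clauses]
```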
The brief includes a twelve-month implementation roadmap that is deliberately achievable rather than aspirational. Quarter one focuses on data quality remediation and critical service identification. Quarter two delivers dependency mapping and initial impact tolerance calibration. Quarter three introduces the first round of tabletop scenarios and supplier attestation collection. Quarter four conducts a full program review, updates the board reporting framework, and sets the testing calendar for the following year. The roadmap survives leadership changes because it avoids heroic assumptions.
Supervisory examinations consistently probe three themes: evidence of informed board oversight, proof that lessons from prior incidents produced tangible control improvements, and demonstration that third-party risk integrates with the broader enterprise risk management framework rather than operating in isolation. Boards that receive only aggregated risk scores without underlying narrative context struggle to demonstrate informed challenge. The brief provides a template for board reporting that balances brevity with the specificity supervisors expect to see.
Common examination findings include resilience programs that exist on paper but lack operational evidence, impact tolerances set without supporting analysis, third-party registers that omit critical dependencies, and scenario testing that has not involved senior leadership. Each finding represents an area where the gap between documented policy and operational reality creates supervisory concern. The appendix of this brief maps each common finding to a remediation action, enabling firms to conduct an honest self-assessment before external review.
Operational resilience is a continuous discipline, not a compliance project with a completion date. Firms that treat it as an ongoing capability, embedded in technology decision-making, supplier management, and board governance, build genuine durability. Those that treat it as a periodic regulatory exercise will find themselves repeatedly remediating the same gaps. Use this brief as both a reference framework and a self-assessment tool, and revisit it as supervisory expectations continue to evolve.