Disaster recovery is a subset of business continuity and deals with the technological aspects of the BCP. In accordance with the NRB guidelines, during Disaster Recovery Planning (DRP) the bank should choose suitable data recovery strategies for different business processes to meet the required RPOs and RTOs as specified in the BIAs of those processes.
The bank must put a management-approved DRP in place to prepare for the recovery of critical business functions and of the technology infrastructure that supports them. Such a plan should strictly define the resources, action plan, tasks, procedures and data required to manage the bank’s technology recovery effort.
After completing the BIA, it is a best practice to document a formal, management-approved business continuity strategy covering people, premises, technology, information, and relationships. This strategy guides the course of action taken in developing and implementing the bank’s BCP.
During this process, the BCP Coordinator and the BCP Executive Team (with assistance from technical experts or advisors) should assign proper roles and responsibilities for various other BCP Functional Teams, such as Executive Management Team, Damage Assessment/Salvage Team, IT/Communications Team, Logistics/Transportation Team, Facilities/Security Team, PR/Communication Team, etc. During the disaster recovery process, BCP Functional Teams or Disaster Recovery (DR) Teams have distinct roles to play including but not limited to the following:
Table-2: Roles and Responsibilities of BCP Functional Teams
Depending on the scope and goals of the BCP, banks could form other functional teams, such as a Finance/Accounting Team or a Human Resources Team, to support their disaster recovery needs. These BCP Functional Teams, also known as DR Teams, will be responsible for both the continuity and the recovery aspects of the BCP. They are assigned specific duties to perform both before and after a disaster.
Each team’s critical business information, including call list, task list, customer list, immediate action plans, response procedures, critical equipment, software, supplies, vendors, vital records, etc., must be documented electronically and stored in the Cloud as well as in hard copy.
Training, Testing and Update
Every bank should ensure that BCM is embedded in its organizational culture, so that all relevant personnel and staff are aware of their BCP roles and responsibilities. At the headquarters level, each BCP Functional Team (with the help of the BCP Coordinator, BCP Executive Team and technical advisors) will be responsible for developing training and exercise materials for its members based on the information contained in the BCP, including both the ERP and the DRP.
It is important that the awareness and training activities are followed by frequent drills (including tabletop exercise and departmental or full scale tests) for each BCP Functional Team or DR Team.
The NRB guidelines require that the BCP be tested periodically (at least annually) to ensure its effectiveness. The testing should include all aspects and constituents of the bank, i.e. people, processes and resources, including technology infrastructure. BCP testing should be both planned and unplanned and should be audited by the bank’s internal audit function.
The guidelines further require that the testing and its outcome should be documented and amendments in BCP be made as suggested by the outcome of the test. In addition to regular testing, it is recommended that the team members and managers receive annual refresher training regarding the emergency alert, emergency response, and notification procedures, etc.
The alternate site test procedure sits at the heart of the disaster recovery test. It deals with two major aspects: first, exercising the system recovery procedures and establishing the communication links, and second, testing the recovery of the participating application software.
During the full scale test, the application owners and respective DR Teams are responsible for successfully running their applications at the alternate site. The full scale test provides an opportunity to record where the exercise was successful, where problems were encountered, and where improvements are necessary.
The NRB guidelines suggest that the bank should periodically check transaction and data integrity between the Datacenter (DC) and the Disaster Recovery site. It is recommended to make this check a part of the End of Day (EOD) or Beginning of Day (BOD) process.
The BCP Coordinator, in coordination with the DR Teams, should be responsible for the regular update of the DRP, especially following the full scale test. Afterwards, all participants should be notified of the changes and encouraged to keep hard copies of the updated plan. Since the recovery solutions are primarily based on BIAs, the BCP Coordinator must also update the bank’s BIAs at least annually.
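As a rough illustration of such an EOD check, the sketch below compares the day's transaction batch at the DC with the copy held at the Disaster Recovery site. It is only a minimal sketch under stated assumptions: the record format ("id|amount" strings), the function names batch_digest and reconcile, and the sample data are all hypothetical and are not prescribed by the NRB guidelines.

```python
import hashlib


def batch_digest(records):
    """Return one SHA-256 digest for a batch of records, ignoring their order."""
    h = hashlib.sha256()
    for record in sorted(records):
        h.update(record.encode("utf-8"))
        h.update(b"\n")  # separator between records (assumes no newline inside a record)
    return h.hexdigest()


def reconcile(dc_records, dr_records):
    """Compare the DC and DR batches.

    Returns (in_sync, missing_at_dr, unexpected_at_dr).
    """
    in_sync = batch_digest(dc_records) == batch_digest(dr_records)
    missing = sorted(set(dc_records) - set(dr_records))
    unexpected = sorted(set(dr_records) - set(dc_records))
    return in_sync, missing, unexpected


# Hypothetical end-of-day batches; real systems would read these from each site.
dc_batch = ["TXN001|1500.00", "TXN002|250.75", "TXN003|90.00"]
dr_batch = ["TXN002|250.75", "TXN001|1500.00"]

in_sync, missing_at_dr, unexpected_at_dr = reconcile(dc_batch, dr_batch)
print("In sync:", in_sync)
print("Missing at DR site:", missing_at_dr)
print("Unexpected at DR site:", unexpected_at_dr)
```

Comparing a single digest keeps the routine check cheap, while the set differences point to the specific records that need investigation when the digests disagree.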
The overarching objectives of a BCP testing and exercise program are to create a learning environment for all the participants and to document changes. Testing and exercising the DRP would verify that the recovery procedures work as intended and that the supporting documentation is current, accurate and relevant. Eventually, the program would help determine the state of readiness of the bank’s BCP.
Planning for the Pandemic
In the age of the COVID-19 pandemic, it is highly pertinent for commercial banks and financial institutions to recognize that there are a few notable differences between the conventional Business Continuity Planning (BCP) process and planning for the challenges posed by the pandemic.
Unlike natural, man-made and technological disasters, the impact of a pandemic is highly difficult to determine because of the scale and duration of the crisis. These differences call for banks and financial institutions to review their existing BCPs and prepare to take appropriate actions to respond to the COVID-19 crisis, which has the potential to cause major business disruptions, both internal and external, at multiple levels.
In a report (Anticipate, prepare and respond to crisis, 2021) published for the World Day for Safety and Health at Work, the International Labour Organization (ILO) particularly emphasizes that investing in a sound and resilient Occupational Safety and Health (OSH) system can build capacity to face future emergencies while supporting the survival and business continuity of enterprises.
During the COVID-19 pandemic, it is vital that workplaces adopt adequate policies and develop action plans for the prevention and mitigation of the contagion. These should include emergency response preparedness, as part of their BCP, and be in line with the results of proper risk assessments.
COVID-19 presents an unusual risk scenario in which a conventional BCP measure, such as relocating staff to an alternate site, may not necessarily mitigate the risk. Pandemic events may last longer than a typical BCP risk scenario, so an effective communication strategy is critically important as the pandemic continues to evolve.
In the meantime, banks and financial institutions need to ensure the continuity of their critical services, such as providing continued deposit and lending services, cash management, keeping ATMs and online banking functional, managing financial markets, and maintaining the payment and settlement system, etc.
Other key concerns may include health protection of staff, mitigating panic, strengthening morale, providing current and essential information to staff, and resumption of normal business activities once virus containment measures have been eased.
Banks and financial institutions should, therefore, establish a framework for COVID-19 operational risk management. Under this framework, the bank should form a COVID-19 Committee to conduct a thorough risk assessment and devise a pandemic response plan. Such a plan, eventually a part of the OSH system, would support the bank’s business continuity in its true sense.
Business Impact Analysis (BIA)
BIA is the key element of the BCP process, since it provides the foundation upon which the BCP is developed.
A bank’s critical business functions are time-sensitive and must be restored first in the event of a disaster to avoid unacceptable financial and operational losses. BIA helps identify these time-sensitive critical business functions within the various departments of the bank. The purpose is to identify the impacts of disruptions that may result in denied access to critical banking services, buildings and facilities.
The NRB guidelines specifically require that there should be detailed procedures for prioritizing critical business functions, for incident handling, and for how the bank will manage and control identified risks.
BIA helps analyze the operational, financial and non-financial impacts on various bank activities (within each of the identified critical business functions), when these business functions are not available or the access to normal workspace is denied.
Furthermore, BIA helps identify resource requirements, such as competent staff, office equipment, office technology, computer applications, vital records, office stationery, and third-party services, to support the technology and business recovery process of the bank.
As per the NRB guidelines, the bank should accurately determine and prioritize such mission-critical business activities along with their recovery strategy, alternate site locations, testing, training, etc.
It would be meaningful if the BIAs were conducted before the risk assessment in order to identify urgent business functions upon which risk assessment could be focused.
BIA is often completed in two major steps targeting first functional recovery (activity recovery) and next computer application recovery on a priority basis. The idea is to determine the bank’s functional recovery priorities, identify interdependent activities and establish appropriate recovery objectives so that Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) can be set for those mission critical business functions as well as activities within them.
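The prioritization step described above can be sketched in a few lines of code. This is a minimal illustration, not a prescribed method: the BusinessFunction record, the recovery_order function, and the sample functions with their RTO and RPO values are all hypothetical. The sketch orders functions by RTO while ensuring that a function is never scheduled before the activities it depends on.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    rto_hours: float          # Recovery Time Objective from the BIA
    rpo_hours: float          # Recovery Point Objective from the BIA
    depends_on: tuple = ()    # names of functions that must be recovered first


def recovery_order(functions):
    """Return function names in recovery order: shortest RTO first,
    but never ahead of an unrecovered dependency."""
    pending = sorted(functions, key=lambda f: f.rto_hours)
    recovered, order = set(), []
    while pending:
        for f in pending:
            if all(dep in recovered for dep in f.depends_on):
                order.append(f.name)
                recovered.add(f.name)
                pending.remove(f)
                break  # restart the scan from the most urgent remaining function
        else:
            raise ValueError("circular or unknown dependency in BIA data")
    return order


# Hypothetical BIA results for illustration only.
bia = [
    BusinessFunction("ATM services", rto_hours=4, rpo_hours=1, depends_on=("Core banking",)),
    BusinessFunction("Core banking", rto_hours=8, rpo_hours=1),
    BusinessFunction("Payroll", rto_hours=72, rpo_hours=24),
]

order = recovery_order(bia)
print(order)
```

In this sample data, ATM services carries a 4hrs RTO but depends on core banking, which was given 8hrs. The sketch recovers core banking first, and the mismatch suggests that the RTO of an interdependent activity should be revisited so that it is no longer than that of the functions relying on it.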
RPO and RTO
Recovery Point Objective (RPO) is the point in time to which backup data (such as backup tapes or replicated data) must be restored and synchronized by IT to resume business processing. In practice, it determines the frequency of data backup (e.g. software backup, user data backup, application backup, etc.); in other words, it is the measure of data loss (in hours or days) acceptable to the bank.
For example, if you have an RPO of 24hrs, the data restored from the backup may be up to 24hrs old, and the business function will have to recover the missing data manually.
In the best-case scenario, the RPO is zero, which means that all affected computer systems use mirroring (real-time data/transaction copying) technology to simultaneously copy all incoming data/transactions to another identical system in a remote location.
Determining the RPO may also depend on how frequently the data being backed up is modified. Data that does not change often, such as account information, personal records, employee records, etc., can have longer RPOs. On the other hand, shorter RPOs are advised for frequently updated data, such as credit card data, financial transactions, etc.
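The relationship between backup frequency and RPO can be checked with simple date arithmetic. The following is a minimal sketch: the function names data_loss_window and meets_rpo and the sample timestamps are assumptions made for illustration. It relies on the observation that, in the worst case, an outage strikes just before the next backup, so the potential data loss is roughly one full backup interval.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup, outage_time):
    """Return how much data (measured in time) is lost for a given outage."""
    return outage_time - last_backup


def meets_rpo(backup_interval, rpo):
    """Return True if the worst-case loss (one backup interval) is within the RPO."""
    return backup_interval <= rpo


# Hypothetical nightly backup taken at midnight, outage at 18:30 the same day.
last_backup = datetime(2021, 5, 1, 0, 0)
outage_time = datetime(2021, 5, 1, 18, 30)
loss = data_loss_window(last_backup, outage_time)
print("Data to be recovered manually covers:", loss)

# A daily backup satisfies a 24hrs RPO but not a 4hrs RPO.
daily = timedelta(hours=24)
print("Meets 24hrs RPO:", meets_rpo(daily, timedelta(hours=24)))
print("Meets 4hrs RPO:", meets_rpo(daily, timedelta(hours=4)))
```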
Recovery Time Objective (RTO) is the period of time within which IT systems, applications, or business functions of the bank must be recovered or put back in operation after an outage. That means a 24hrs RTO would indicate that the particular business function could operate using temporary manual workarounds for the first 24hrs following a disaster declaration. During this period the business function can continue to function in an emergency mode without access to the IT systems or applications.
Determining the RTO may also include a “time of year” or “seasonal” component, such as busy festival times, the end of the fiscal year, quarterly reporting periods, etc., when a disruptive event can prove disastrous.
For example, in the middle of the month or quarter your finance team may go days without accessing the finance application, but at the end of the month or quarter, even a few hours without this application can be extremely disruptive.
The NRB guidelines require that the bank’s BCP should specify the RPO and RTO of different business processes. The guidelines, however, allow the bank to choose among Hot, Warm and Cold backup sites to meet the RPO and RTO requirements as specified in the bank’s BIAs.
For disaster recovery backup purposes, the NRB guidelines call for the bank either to maintain its own standby site and system or to outsource them to a disaster recovery provider. Depending on the RPO and RTO requirements, the bank may opt for a high availability system that keeps both system and data replicated at a remote site, or for live replication of data to an offsite location. The bank may also choose to have full system backups, off-site incremental backups, or backups made to electronic media and sent offsite periodically.
Depending on the requirements and criticality of business functions, it is recommended to use a combination of the above strategies utilizing Hot, Warm and Cold backup sites.
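One way to capture this seasonal component is to record two RTO values for a function and select between them by date. The sketch below is illustrative only: the function name effective_rto_hours, the 72hrs and 4hrs values, and the three-day month-end window are assumptions, not figures from the NRB guidelines.

```python
import calendar
from datetime import date


def effective_rto_hours(day, normal_rto=72, peak_rto=4, peak_days=3):
    """Return the tighter RTO during the last `peak_days` days of the month,
    and the normal RTO otherwise."""
    last_day = calendar.monthrange(day.year, day.month)[1]  # number of days in the month
    in_peak = day.day > last_day - peak_days
    return peak_rto if in_peak else normal_rto


# Hypothetical finance application: relaxed mid-month, strict at month end.
mid_month = effective_rto_hours(date(2021, 6, 15))
month_end = effective_rto_hours(date(2021, 6, 29))
print("Mid-month RTO (hrs):", mid_month)
print("Month-end RTO (hrs):", month_end)
```

The same pattern could be extended to quarter ends, fiscal year end, or festival periods by replacing the month-end test with a lookup against a calendar of peak dates.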
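To illustrate how such a combination might be derived from BIA results, the sketch below maps each business function to a site type based on its tighter objective. The thresholds (4hrs and 48hrs), the function name suggest_site, and the sample functions are assumptions chosen purely for illustration; each bank would set its own cut-offs from its BIAs and its comparison of the site types.

```python
def suggest_site(rto_hours, rpo_hours, hot_max_hours=4, warm_max_hours=48):
    """Suggest a backup site type from the more demanding of RTO and RPO.

    The thresholds are illustrative assumptions, not NRB-prescribed values.
    """
    tightest = min(rto_hours, rpo_hours)
    if tightest <= hot_max_hours:
        return "Hot"
    if tightest <= warm_max_hours:
        return "Warm"
    return "Cold"


# Hypothetical (RTO, RPO) pairs in hours for three business functions.
requirements = {
    "Payment and settlement": (2, 0),
    "Loan processing": (24, 12),
    "Archived records": (168, 72),
}

plan = {name: suggest_site(rto, rpo) for name, (rto, rpo) in requirements.items()}
for name, site in plan.items():
    print(f"{name}: {site} site")
```

Driving the choice from the tighter of the two objectives is a deliberately conservative simplification; in practice cost and replication technology would also weigh on the decision.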
Table-1: Comparison of Hot, Warm and Cold Backup Sites
A Datacenter (DC) is a physical location that hosts computer systems and network equipment to facilitate and support day-to-day banking operations. It could be located on the bank’s premises, co-located at an outside facility, or hosted in the Cloud.
Whatever arrangements have been made for the standby site (or disaster recovery site, whether Cold, Warm or Hot), the NRB guidelines dictate that the bank should also adopt disaster mitigation strategies, such as locally mirroring data and systems, arranging a UPS and generator for long-term power failures, using surge protectors to minimize the effect of power fluctuations, and providing adequate physical and environmental controls in the DC.
Moreover, delivery channels such as ATMs, internet banking and mobile banking tend to significantly increase the risk of financial loss and electronic fraud, along with other banking risks such as credit risk, reputation risk, compliance risk, market risk, strategic risk, etc. Therefore, the DC, the disaster recovery solution, the enterprise network and security, and the branch or delivery channels should be designed and configured for high availability with no single point of failure, as prescribed by the NRB guidelines.
The guidelines further require that the location of the building containing the DC and critical equipment rooms be chosen so as to minimize the risk of natural and man-made disasters, such as flood, fire, explosion, riots, environmental hazards, etc. Physical access to the DC and critical equipment rooms must be restricted to authorized individuals only.