A self-inflicted tech debacle

THERE IS A VERY SIMPLE safety rule in sailing: one hand for the boat and one hand for you. In any mission-critical arena, you have rules and procedures, checklists and cross-checks.

No matter how many times you have done something, you follow the agreed procedure, check and cross-check, and never wing it. The same goes for software and hardware upgrades and changes. The checks are there for a very simple reason: to reduce the likelihood of human and machine error, because when things go wrong, they go badly wrong, and they always cost.

So I wouldn’t be surprised if a new line appears in the industry manual: ‘Doing an RBS’. You’d have to be living on a desert island not to have heard about the recent fiasco in which millions of RBS customers were left without access to their money for days. This was a self-inflicted injury measuring 8 on the Richter scale. If the bank had lost all of its customer data (and it came close), it would have been a 9: a mass extinction event.

But it is almost impossible to find out what went wrong, apart from the fact that someone goofed badly. What has been gleaned from employees who have left the company includes:

• Support functions and software production have long been outsourced to India;

• Software upgrades and changes are now routinely conducted and managed out of India;

• In-house technical skills and ability have been denuded by continuous cost-cutting and staff-trimming, with many UK redundancies;

• Batch scheduling and processing are run without human oversight;

• Several variants of the batch processing software are run in parallel;

• The changeover was most likely managed remotely from India, where much of the work is handled by second-tier subcontractors.

All RBS has confirmed is that the problem was confined to a new version of the batch processing software. All other reports are conjecture, rumour and the opinions of ex-employees and reporters outside the company.

What is suspected is that an upgrade was sanctioned and went ahead without all the necessary checks and balances, or that some significant deviation from agreed procedures occurred, or both. Yet when you look at best industry practice, the golden rules are very simple:

1) Keep three backup copies of all data, in different locations and on different systems, giving immediate, fast and slow recovery options;

2) Maintain copies of all variants of the operating system and applications and apply strong version tracking and control;

3) Record and save all upgrade packages, again with strong version tracking and control;

4) Run three parallel systems: one online, one offline hot standby and one ‘cold’ reserve;

5) Test anything new before loading it, and never load anything untested or uncertified;

6) Before the installation day, load it onto the hot standby and do as much testing as possible;

7) When satisfied that all is well, bring the hot standby into front-line service and demote the old online system to cold standby;

8) Promote the cold system to hot status;

9) When the system is proved stable, upgrade the software of the standby systems.

And all of this has to be managed by well-trained and experienced people.
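To make the rotation in rules 4 to 9 concrete, here is a minimal sketch in Python. It is an illustration only, not how RBS or any bank actually runs its systems: the `System` and `Cluster` classes, the `stage_upgrade` and `finish_upgrade` functions and the `passes_tests` callback are all hypothetical names, and reverting the standby after a failed test is an assumption added for the example. The point it shows is that the live system is never touched until the new version has passed its tests on the hot standby, and that the old version survives on two machines until the new one has proved stable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class System:
    name: str
    version: str


@dataclass
class Cluster:
    online: System  # front-line system serving customers
    hot: System     # hot standby, ready to take over
    cold: System    # cold reserve


def stage_upgrade(cluster: Cluster, new_version: str, certified: bool,
                  passes_tests: Callable[[System], bool]) -> bool:
    """Rules 5 to 8: test on the hot standby, then rotate roles.

    Returns True if the new version went live, False otherwise.
    """
    if not certified:
        # Rule 5: never load anything untested or uncertified.
        return False
    previous = cluster.hot.version
    cluster.hot.version = new_version  # Rule 6: load onto the hot standby first
    if not passes_tests(cluster.hot):
        # Assumption for this sketch: revert the standby; the live system was never touched.
        cluster.hot.version = previous
        return False
    # Rule 7: the hot standby goes live and the old online system drops to cold.
    # Rule 8: the old cold reserve is promoted to hot (still on the old version).
    cluster.online, cluster.hot, cluster.cold = cluster.hot, cluster.cold, cluster.online
    return True


def finish_upgrade(cluster: Cluster) -> None:
    """Rule 9: once the new version has proved stable, upgrade both standbys."""
    cluster.hot.version = cluster.online.version
    cluster.cold.version = cluster.online.version
```

For example, starting with machines A (online), B (hot) and C (cold), all on `v1`, a successful `stage_upgrade(cluster, "v2", True, lambda s: True)` leaves B online on `v2`, C as the hot standby on `v1` and A in cold reserve on `v1`. Until `finish_upgrade` is called, there are still two machines that can fall back to the old version.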

The cause of the RBS failure lies somewhere in the list above, but it will be many years before the truth leaks out, if it ever does. One thing I can guarantee, though: every bank, finance house and company will be paying attention and checking its procedures. No one needs the damage of doing an RBS.

I can also almost guarantee that cuts to costs and staff have played a part, and that these were made by people who know nothing about the risks of attempting a take-off without ticking all the boxes, following a procedure, checking and checking again.

Related reading