ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator
SDI and John McCarthy
SDI and Safeguard
SDI and Robert Jastrow
Some financial disaster cases from Software Engineering Notes
>Date: 12 Sep 85 0057 PDT
>From: John McCarthy
John Mashey <mips!mash@glacier > Thu, 12 Sep 85 22:56:02 pdt
SDI and Safeguard
I used to work with many of the people at Bell Labs who worked on the Safeguard ABM; they were competent people who knew how to build complex systems. Maybe there were some who believed that it was actually possible to build a reliable, deployable, maintainable ABM that one could expect to work in real use; if so, I never met any; most folks did not so believe, and said so. [They did believe that you could shoot down missiles in well-controlled tests, because they'd done it; they just didn't believe it would work when it needed to.]
Herb Lin <LIN@MIT-MC.ARPA> Thu, 12 Sep 85 20:08:22 EDT
SDI and Robert Jastrow
To: JMC@SU-AI.ARPA
cc: LIN@MIT-MC.ARPA, risks@SRI-CSL.ARPA
From: John McCarthy
Peter G. Neumann
Some financial disaster cases from Software Engineering Notes
Fri 13 Sep 85 00:22:19-PDT
I hope that the RISKS Forum will not degenerate into only an SDI Forum, so I thought I would counterbalance this issue with a new topic. I have resurrected a contribution from the July 1985 SIGSOFT SEN, and also preview some newer cases that will appear in the October 1985 SEN (which is just about ready to go to press). (The few of you who are ACM SIGSOFT members please pardon me for the duplications.)
[FROM ACM Software Engineering Notes vol 10 no 3, July 1985]
Disasters Anonymous 1: A Rose is Arose is (Three) Z-Rose
Now and then I get a story that I cannot print. (I do have a few, but don't ask. I have of course conveniently forgotten them all.) Here is one that can be printed -- although its author must remain anonymous. Note that the case of the three extra zeroes resulting from two different assumptions about the human interface bears an eerie resemblance in cause to the case of the shuttle laser experiment, which follows after this one. [PGN]
A group within my company had a policy of dealing only in multiples of one thousand dollars, so they left off the last three digits in correspondence to the wire transfer area to make their job easier. Other groups, however, had to write out the full amount since they did not always deal with such nice round numbers. One day, a transaction was processed that had a value of $500,000. The person who entered the transaction thought that it was from the group who dealt in multiples of $1000 and entered it as $500,000,000. Of course, this was not the case, so a $500,000 transaction became a $500,000,000 one.
The only thing that prevented a disaster was that it was sent to a small company that called back to verify the amount, and the error was then caught. However, this was a Federal Reserve transaction and the funds had been transferred, but the timing was good and the transaction was backed out before it became a disaster. My opinion is that such critical software should have caught the error before the wire was sent to the Federal Reserve.
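One way software could catch this kind of error is to make the unit of every entered amount explicit and to hold unusually large wires for a second look. The following is only a minimal sketch under stated assumptions: the function names, the unit labels, and the $100,000,000 review threshold are hypothetical and are not taken from the contributor's system.

```python
from decimal import Decimal

# Hypothetical names and values; nothing here comes from the bank's real system.
UNIT_MULTIPLIERS = {"dollars": Decimal(1), "thousands": Decimal(1000)}
REVIEW_THRESHOLD = Decimal("100000000")  # assumed cutoff for a second sign-off


def normalize_amount(raw: str, unit: str) -> Decimal:
    """Convert an entered figure to whole dollars using an explicitly stated unit."""
    if unit not in UNIT_MULTIPLIERS:
        raise ValueError(f"unknown unit: {unit!r}")
    return Decimal(raw.replace(",", "")) * UNIT_MULTIPLIERS[unit]


def needs_second_review(amount: Decimal) -> bool:
    """Flag unusually large wires for confirmation before release."""
    return amount >= REVIEW_THRESHOLD
```

With the unit carried alongside the figure, "500,000" in dollars and "500" in thousands both normalize to the same amount, and a figure that was wrongly scaled up to $500,000,000 would at least be held for confirmation rather than released.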
Another error in a Federal Reserve transfer had to do with multiple transactions per communications transfer. In this case, the Federal Reserve software put a pair of nulls in the data that should have been translated as blanks. However, they were stripped out and a $200,000,000 incoming wire was lost. To maintain the Fed balance, money was purchased to cover a deficit that didn't exist -- since the money was a credit. This was a substantial monetary loss because of inadequately tested software.
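The report does not describe the message format, so the following is only an illustrative sketch of why stripping nulls can lose data where translating them to blanks would not. It assumes a hypothetical fixed-width record (an 8-byte wire id followed by a 12-byte amount); the layout and function names are invented for the example.

```python
FIELD_WIDTHS = (8, 12)  # hypothetical layout: wire id, then amount in dollars


def split_fields(record: bytes) -> list:
    """Cut a fixed-width record into its fields by position."""
    fields, pos = [], 0
    for width in FIELD_WIDTHS:
        fields.append(record[pos:pos + width])
        pos += width
    return fields


def translate_nulls(record: bytes) -> bytes:
    """Replace each null with a blank, so every field keeps its position."""
    return record.replace(b"\x00", b" ")


def strip_nulls(record: bytes) -> bytes:
    """Delete nulls outright, which shortens the record and shifts later fields."""
    return record.replace(b"\x00", b"")


# A wire id padded with two nulls, followed by a 200,000,000 dollar amount.
record = b"WIRE01\x00\x00" + b"000200000000"
good_id, good_amount = split_fields(translate_nulls(record))
bad_id, bad_amount = split_fields(strip_nulls(record))
```

In this sketch the translated record still parses as a 200,000,000 dollar wire, while the stripped record is two bytes short and its amount field no longer lines up.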
Disasters Anonymous 2: Financial Losses
Our anonymous contributor from SEN 10 3 (July 1985) has come through again.
[FROM ACM Software Engineering Notes vol 10 no 5, October 1985]
Since I sent some disaster reports to you in May, another one has occurred. This one caused some financial loss and acute headaches among managers.
Most large banks subscribe to the Federal Reserve's funds transfer system, frequently referred to as "Fedwire". Our system that connects to Fedwire was being upgraded with a new DDA interface to the host to help protect against overdrafts. During a review, it was determined that the software was not quite ready, but should be okay to put into production two days later. I cautioned them against doing so since not all of the bugs had been resolved, and the software had not been "stress tested" (or whatever phrase you wish to use about testing that ensures that it will work in production).
The first day of production went fine. However, the main file in the new software was an ISAM file that had degraded significantly during the first day. On the second day, that file continued to fragment and started to consume a large amount of the system resources. This slowed response time so much that by the end of the banking day, we still had hundreds of wires to send to the Federal Reserve. We had to request extensions every half hour for hours to try and squeeze the transactions through the system so that the money would get to our customers.
In addition, the response-time problem and other bugs in the software prevented us from knowing our Federal Reserve balance. Since we must maintain some 150 million dollars in our Fed "checking account", this lack of information could cause significant financial loss as 1.5 billion dollars were posted that day and we were off by hundreds of millions of dollars at first.
Another part of this disaster is that the slow response time caused one program to assume that the host was down. When a transaction finally went through, our system would transmit the DDA information, but the host did not acknowledge that they already had the wire. Thus a large number of wires were being "double posted" (money sent twice). At the end of the day, tens of millions had been double posted.
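Double posting after a timeout is commonly guarded against by making the posting step idempotent: each wire carries a unique identifier, and a retransmission of an identifier already on file is acknowledged but not posted again. The sketch below illustrates that general technique only. The class and method names are hypothetical, and nothing in the report says how the bank's host actually tracked wires.

```python
class PostingLedger:
    """Toy ledger that refuses to post the same wire identifier twice."""

    def __init__(self):
        self._posted = {}  # wire_id -> amount already posted

    def post(self, wire_id: str, amount: int) -> bool:
        """Post a wire once; return False if this identifier was already posted."""
        if wire_id in self._posted:
            return False  # duplicate retransmission, money is not sent again
        self._posted[wire_id] = amount
        return True

    def total(self) -> int:
        """Sum of all amounts posted so far."""
        return sum(self._posted.values())


ledger = PostingLedger()
first = ledger.post("W-0001", 1_000_000)   # original transmission
retry = ledger.post("W-0001", 1_000_000)   # sender timed out and resent
```

Here the retry is recognized as a duplicate, so the ledger total reflects the wire once even though it was transmitted twice.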
As of this writing, the Fed balance had been straightened out, but not all of the double postings had been recovered. Note that at current interest rates, a bank loses $350 per day per million dollars of unused money.
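The $350 per day per million figure can be turned into an implied annual rate and a carrying cost with simple arithmetic. The sketch below assumes a 365-day year (money-market conventions often use 360), and the $10,000,000 example amount is purely illustrative, since the report says only "tens of millions".

```python
DAILY_LOSS_PER_MILLION = 350  # dollars per day per million, as quoted in the report


def implied_annual_rate(daily_loss_per_million=DAILY_LOSS_PER_MILLION,
                        days_per_year=365):
    """Annual rate implied by a daily loss per million dollars (365 days assumed)."""
    return daily_loss_per_million * days_per_year / 1_000_000


def idle_cost(amount_dollars, days,
              daily_loss_per_million=DAILY_LOSS_PER_MILLION):
    """Dollars lost while `amount_dollars` sits unused for `days` days."""
    return amount_dollars / 1_000_000 * daily_loss_per_million * days


rate = implied_annual_rate()             # about 0.128, i.e. roughly 12.8% a year
one_day = idle_cost(10_000_000, days=1)  # hypothetical $10 million idle for one day
```

Under these assumptions the quoted figure corresponds to roughly 12.8 percent a year, and each $10 million left unrecovered costs about $3,500 per day.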
Disasters Anonymous 3: Insurance, Reinsurance, and Rereinsurance
Perhaps anonymity is contagious. Re: reinsurance, here is another letter from a different contributor.
I'm newly receiving SEN and found the ``war stories'' quite interesting. Here are three more. I would prefer anonymity should you choose to print these.
This first is hearsay (from a former co-worker). Apparently he and his wife had a joint account with a $300 balance. They needed $200 in cash, but due to miscommunication they both made $200 withdrawals - she at a teller's window (cage?) and he at an ATM (automatic teller machine) - within minutes of each other. When the dust settled they found that their account had a zero balance: the first $200 withdrawal left a $100 balance, the second should have left a negative balance of $100, but the computer generated a $100 credit to offset the shortfall. The icing on the cake was my friend's inability to explain/convince the bank of this situation and have them accept restitution.
I need to be circumspect about this second story -- it might well have involved fraud. While a consultant, I was hired to review a reinsurance agreement. The reinsurance industry is an old-boys, ``handshake is my bond'' industry as insurors frequently offset their risk by selling it (reinsuring) to other insurors. That is, I insure your building for $10,000,000 and re-sell all or part of that risk to another firm. Apparently, late one Monday morning (nearly 11:00 a.m. EST), my client got notice across his computer network from another firm that it was reinsuring (i.e. off-loading risk) to my client to the tune of several million dollars. The message was time-dated Friday evening (6:00 P.M., WST). As ``luck'' would have it the property in question had suffered a catastrophic loss over the weekend. The bottom line was that the message had been sent directly (not through any of the store-and-forward services) and the time-date was thus determined by the clock-calendar on the sender's computer. Need I say more?
Finally, a story told to me ``out of school'' by a friend at one of the nation's largest insurance companies. They apparently are involved in so many reinsurance deals that it turned out that they were reinsuring themselves. I.e., Jones reinsured with Smith who reinsured with Brown who reinsured with White who reinsured with Smith. Smith, it turned out, was paying both Brown and White commissions for accepting his own risk. The computer system was not designed to look beyond the current customer, neglecting the loop.
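Looking beyond the current customer amounts to following the chain of cessions and checking whether it ever returns to the firm it started from. The following is a minimal sketch of that check as a graph search. The dictionary representation and the function name are assumptions made for illustration, and the firm names are the ones used in the letter.

```python
def find_reinsurance_loop(cessions, start):
    """Return a chain of firms leading from `start` back to itself, or None.

    `cessions` maps each firm to the firms it reinsures with (hypothetical format).
    """
    stack = [(start, [start])]
    visited = set()
    while stack:
        firm, path = stack.pop()
        for reinsurer in cessions.get(firm, ()):
            if reinsurer == start:
                return path + [start]  # the risk has come back to its origin
            if reinsurer not in visited:
                visited.add(reinsurer)
                stack.append((reinsurer, path + [reinsurer]))
    return None


cessions = {
    "Jones": ["Smith"],
    "Smith": ["Brown"],
    "Brown": ["White"],
    "White": ["Smith"],
}
loop = find_reinsurance_loop(cessions, "Smith")
```

For the chain in the letter, the search starting at Smith finds the path Smith, Brown, White, Smith, while a search starting at Jones finds no loop, because Jones's risk never returns to Jones.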