Business schools use case studies as a learning instrument. But the case studies used in most business schools are linear, or unidirectional. They are invariably success stories, the cause of success being a new marketing strategy, an innovative product design, a cost-effective manufacturing process or the discovery of a virgin market. The students never get to see the big picture; they only get compartmentalized wisdom in specific functional areas such as Finance, Marketing or HR. It is difficult to see how a young student can appreciate specific issues in a company without an overall understanding of the company's other areas. How can a company with excellent HR but shoddy marketing, or an excellent manufacturing process but unfriendly customer relationships, qualify to be called "great"? Failures can invariably be traced to the interconnections among elements, not to the elements per se. B-school case studies should include more cases of failure. Culturally, the Western world is not ashamed of documenting classic failure cases and enabling the world at large to learn from the mistakes. One reason the US accounts for a large number of innovations is that its society accepts failure. In the Orient, failure is met with stigma and embarrassment, sometimes leading even to suicide. A risk-averse mindset may produce some sort of stability, but great opportunities are missed too. Success only reconfirms what we already know. It is only failure that teaches us new lessons.
Fire at Manchester airport
The following incident is described in Learning from Failure by Joyce Fortune and Geoff Peters, which contains a vivid description of a fire aboard an aircraft that had aborted takeoff at Manchester international airport. On 22 August 1985, Flight KT28M, a Boeing 737 with 131 passengers and 6 crew members, was readying to take off from Manchester to Corfu. A fire was noticed in one of the engines and the takeoff was aborted. Even so, 54 people died in the accident. Of these only 6 were burnt to death; the others died from inhaling smoke and other toxic gases. From an account of what exactly happened, it became clear that the fire by itself had not caused much damage. There could have been more survivors had there been proper communication and control among the interdependent agencies and systems at the airport. The failure was traced to a situation in which the on-board crew, the staff at the flight control tower and the ground support staff each saw only sections of the problem. That the fire hydrants had been taken up for maintenance work was unknown to the other agencies. However, the lessons learnt from that failure have resulted in processes that strengthen formal communication among all the stakeholders responsible for airport and aircraft safety.
The Bhopal gas tragedy claimed at least 3,000 lives and resulted in the permanent disability of more than 250,000 people. To this day a systemic study of the accident has not been done. Instead, precious time is wasted in blame games, lawsuits, extradition treaties and negotiations on compensation amounts. Can we be sure that Bhopal will not be repeated? Obviously we cannot.
ISRO, a learning organization
The first mission of ISRO's Polar Satellite Launch Vehicle (PSLV) failed in 1993. Within minutes of the launch, the reason for the failure had become clear. The navigation system provides the current "state" of the rocket; the guidance system, which knows the final destination, works out where the current state ought to be; and the control system bridges the gap. This is done by firing small thrusters to manipulate the pitch, yaw and roll of the rocket. After the on-board computers perform all the calculations, the command for controlling the rocket is sent as a digital byte which contains the amount of turning required as well as the direction (pitch up or down, yaw left or right, roll clockwise or anti-clockwise). Digital computers store negative numbers in what is known as 2's complement form. This gives rise to an asymmetry in the range of representable numbers: an 8-bit computer, for example, can store numbers from -128 to +127. This should not come as a surprise, because 8 bits can distinguish 256 different states and the range must also include the number 0. While assembling the rocket subsystems, someone had mounted the control thrusters in the wrong planes. Instead of disassembling them and assembling them in the right planes again, the engineer asked his software friend to alter the polarity of the control byte sent to the control electronics. This seemed an innocuous, innocent and reasonable request, and the friend obliged; after all, only one "negate" assembly language instruction had to be inserted in the code. At one moment during the flight, the rocket needed the maximum amount of control, and the computed control byte was -128. Since -128 has no positive counterpart in 8 bits, the computer treated the "negate" instruction with this illegal operand as a "no operation": it simply ignored the instruction. The result was that the rocket received maximum control in the direction opposite to the desired one, and the mission was lost. What went wrong? The software had been tested under thousands of simulated conditions before being loaded into the onboard computer. Briefly put, the software that underwent qualification checks and the software that actually flew aboard the rocket were different, even if the difference lay in the addition of just one instruction.
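The arithmetic trap is easy to reproduce. The sketch below, in C, is purely illustrative and is not ISRO's flight code: it flips the polarity of an 8-bit control byte the way the added "negate" instruction would, and shows that the extreme value -128 comes back unchanged because +128 cannot be represented in 8 bits, so the command direction is never reversed.

```c
#include <stdio.h>
#include <stdint.h>

/* Illustrative sketch only, not ISRO's flight software: flipping the
 * polarity of an 8-bit control byte to compensate for thrusters mounted
 * in the wrong plane. The signed 8-bit range is -128..+127, so -128 has
 * no positive counterpart. */
static int8_t flip_polarity(int8_t command)
{
    /* On common two's-complement machines, -(-128) wraps back to -128. */
    return (int8_t)(-command);
}

int main(void)
{
    int8_t moderate = 64;   /* a typical correction: the sign flips as intended */
    int8_t extreme  = -128; /* maximum correction: the sign flip silently fails  */

    printf("flip(%4d) = %4d\n", moderate, flip_polarity(moderate));
    printf("flip(%4d) = %4d\n", extreme,  flip_polarity(extreme));
    /* Prints:
     *   flip(  64) =  -64
     *   flip(-128) = -128   <- full control in the wrong direction
     */
    return 0;
}
```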
How did ISRO react to this failure? There was no witch-hunting or search for a scapegoat. It was realized that concepts such as quality control, quality assurance, configuration management and version control applied as much to software as to hardware. Accordingly, quality assurance mechanisms were put in place. Processes were evolved so that modifications to flight-certified subsystems, however well intentioned, could be made only under wide visibility. As a result, 18 successive PSLV flights have flown successfully since the failure of the first flight.
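One concrete way to read "wide visibility" is a pre-flight check that the software about to fly is bit-for-bit identical to the software that passed qualification. The C sketch below illustrates that single idea; the file names are hypothetical, and ISRO's actual configuration-management and version-control tooling is of course far more elaborate.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch: compare the qualified software image with the image
 * about to be loaded, byte for byte. File names are hypothetical. */
static int images_identical(const char *qualified_path, const char *flight_path)
{
    FILE *a = fopen(qualified_path, "rb");
    FILE *b = fopen(flight_path, "rb");
    int same = (a != NULL && b != NULL);

    while (same) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca != cb) same = 0;       /* mismatch, or one file is shorter */
        else if (ca == EOF) break;    /* both ended together: identical   */
    }
    if (a) fclose(a);
    if (b) fclose(b);
    return same;
}

int main(void)
{
    if (!images_identical("qualified_build.bin", "flight_build.bin")) {
        fprintf(stderr, "Flight image differs from the qualified image: hold and review.\n");
        return EXIT_FAILURE;
    }
    puts("Flight image matches the qualified image.");
    return EXIT_SUCCESS;
}
```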
Stewart Hamilton and Alicia Micklethwait, in their book Greed and Corporate Failure: The Lessons from Recent Disasters, give the following as the main reasons for corporate meltdowns:
• Inadequate external surveillance: not-so-independent directors on the Board, not-so-serious auditors, ambiguous control mechanisms
• Ineffective internal control: the internal audit and internal quality assurance functions not reporting to the highest levels
• Greed, hunger for power, hubris
• Stubborn CEOs who, with their bloated egos and devious personal agendas, behave like mules
• Mindless organic growth without due application of mind, and inorganic growth without due diligence

Will India learn from the failures to curb corruption, in every possible domain, that have come to light in the last one year?
Process for learning from failure – a classic example

A classic example of investigating and documenting failure is the report of Nobel Laureate Richard Feynman, who inquired into the cause of the explosion of the space shuttle Challenger, which killed seven people, including a school teacher, in 1986. Feynman did not follow the conventional analytical method of breaking the shuttle rocket into subsystems and components and verifying the performance of each. At every stage he would have had to eliminate possible causes, and the sheer fatigue of doing so would have resulted in an impatient state of mind. At every stage he would have encountered defensive mechanisms that deflected the blame elsewhere. Instead, he interviewed several segments of the people involved with the design, manufacture, assembly and operations of the space vehicle. He received bits and pieces of information, each informant looking at the issue from his or her own domain. He spoke to tradesmen on the shop floor, managers in their cabins, the operations crew at the launch pad and the administrators in their plush offices. He gathered as many different perceptions of the problem as he could. He then assimilated them and beautifully synthesized these bits and pieces of information into the big picture – and lo and behold, the cause of the accident was staring him in the face.
Briefly stated, the cause of failure turned out to be a simple oversight: rubber loses its elastic property at low temperatures. The circumstances leading to the oversight revealed a lack of proper understanding between core technicians and technical managers, and gaps in the governance structure of NASA and its subcontractors. The workers knew that the accident was bound to happen, because the rubber "O-ring" would not function properly if the temperature fell below a stated limit, and the temperature at the time of launch was indeed below the limit at which the component was certified to operate. However, their voices were overridden by a managerial decision that the launch could not be postponed, even though postponing a launch for a variety of reasons (unacceptable weather conditions being one of them) is not an unusual phenomenon.
Feynman could conduct the exercise in the way he did because in a holistic pursuit there are no blame games, whereas the linear thinking model has a built-in corollary of finding a scapegoat on whom blame can be thrust.