
Understanding Human "Error": Fault Tolerance, Practical Drift And Traps

Marc Green


"An error is an out of tolerance action, where the limits of tolerable performance are defined by the system." (Swain and Guttman, 1983)

Imagine for a moment that authorities designed an interstate highway with lanes that were seven feet wide. Since cars are about six feet wide, this would leave only half a foot of lane on either side of a car traveling straight down the middle. Theoretically, a driver could stay in his lane if he steered almost perfectly, moving no more than half a foot to either side. If he varied even one foot from perfect center, he would enter a neighboring lane, which would presumably be an "error". If he collided with another car in the neighboring lane, authorities would say that human error caused the accident.

Now, imagine the same driver on a road with the more typical twelve-foot lane width. If he varied lane position by one foot from the center, he would not enter a neighboring lane and would not have made an "error." The identical behavior that would be called an error in the seven-foot lane would not be called an error in the twelve-foot lane.

This is the point of the Swain and Guttman quote above.1 Human "error" is a joint function of human behavior and of system tolerances. The implication is clear: in accident analysis, it is important to consider both 1) the range of possible, "normal," likely behaviors and 2) the range of system tolerance. If a system is designed with proper tolerances, humans make few "errors." A system designed with tolerances below foreseeable, normal behavioral variability is defective and dangerous. In fact, the presence of frequent error is evidence that the system tolerance, and not the user, is inadequate.
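The point can be made concrete with a small simulation. The sketch below (Python; the one-foot standard deviation of lateral position is an assumption chosen purely for illustration, not a figure from any study) applies the identical steering variability to both lane widths and counts how often the car crosses a lane line.

```python
import random

def departure_rate(lane_width_ft, car_width_ft=6.0, sd_ft=1.0, n=100_000):
    """Fraction of sampled moments at which a nominally centered car
    drifts far enough sideways to cross a lane line. The tolerance is
    the clearance on either side of a perfectly centered car."""
    tolerance_ft = (lane_width_ft - car_width_ft) / 2.0
    departures = sum(abs(random.gauss(0.0, sd_ft)) > tolerance_ft
                     for _ in range(n))
    return departures / n

random.seed(1)
for width in (7.0, 12.0):
    print(f"{width:4.1f} ft lane: out of lane on "
          f"{departure_rate(width):.1%} of samples")
```

The steering behavior is identical in both cases; only the tolerance differs, and with it the "error" rate, which is the article's point in miniature.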

The Swain and Guttman error definition also has strong implications for fixing blame. Authorities who go by the letter of the law, and lawyers wishing to sue the miscreant, would say that the driver on the seven-foot-wide road made an "error," which presumably means that he is at fault and should be punished for colliding with a vehicle in the next lane. In reality, the road authority that approved the seven-foot-wide road caused the accident, not the driver. The system tolerance was too narrow relative to the foreseeable, normal variability of human steering behavior. It might be very convenient for road authorities to assume that drivers will steer perfectly down the lane center, allowing the design of narrower, cheaper highways. This would be a deficient design or, to use the more technical term, a "latent condition" or "latent error." Latent means that the effects are not immediately felt, but the design is an accident waiting to happen. The first 10, 100, or 1,000 drivers may escape having an accident, but that is due to luck as much as virtue. The creation of cheaper, low-tolerance systems is a common form of wishful thinking. The designer/authority/manager assumes an impossible level of perfect, or near-perfect, human performance in order to save money or effort. For example, it is generally cheaper to put up a warning sign and make unrealistic assumptions about compliance than to design a hazard out of a product or environment.

Fault Tolerance And Impossible Behavior

The notion of an impossible level of performance is more complex than it might seem at first glance because there are two distinct meanings:

1. Behavior that is literally impossible, i.e., a driver would have to respond in 0.1 second, faster than is humanly possible, or see an object that is below the human contrast detection threshold, etc. This is relatively straightforward and readily understood by authorities, courts and laymen. However, I can say from experience that such psychophysical (basic sensory) and motor limitations cause only a minority of accidents. Far more often, accidents occur because a viewer fails to "see" some object that is theoretically visible.

2. Behavior that is theoretically achievable by an individual on any given occasion but is unlikely to be exhibited by every normal individual on every occasion. For example, a driver could sometimes steer straight down the seven-foot lane without crossing into another lane. Such behavior is humanly possible. However, no one is going to do it 100% of the time, although some in the normal population may be able to do it more often than others. Steering, like all other human behavior, is inherently variable both across individuals and within the same individual at different times. Ask 100 drivers to travel down the seven-foot lane and stay in it for 100 yards, and some will accomplish the task while others won't. Rerun the test, and again some will stay in their lane and others won't, but the successful drivers on the second test will likely differ from those on the first (see the sketch below). No driver steers straight down the middle of the lane (even if he has the requisite ability), just as no driver spends much time on the interstate staring exactly 12-14 seconds ahead (as truck driver manuals recommend) with eyes glued to the road for stopped vehicles. No one can maintain such a level of performance for any length of time, and such rules are futile. Variability depends on many factors, such as innate sensorimotor abilities, competing goals, concurrent tasks, environmental conditions, fatigue, mental stress, distraction, and circadian variations.
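The hundred-driver thought experiment in point 2 can be sketched the same way. In the snippet below, each driver's run is summarized by a single lateral deviation drawn from a driver-specific distribution; the spread of individual variability (0.2 to 0.8 feet) is an assumption for illustration only.

```python
import random

random.seed(2)
TOLERANCE_FT = 0.5   # clearance on either side in the seven-foot lane
N_DRIVERS = 100

# Each driver has his own innate steering variability (values assumed
# purely for illustration, not taken from any study).
driver_sd = [random.uniform(0.2, 0.8) for _ in range(N_DRIVERS)]

def run_trial():
    """Return the set of drivers whose deviation stays within tolerance."""
    return {i for i, sd in enumerate(driver_sd)
            if abs(random.gauss(0.0, sd)) <= TOLERANCE_FT}

first, second = run_trial(), run_trial()
print(f"stayed in lane on run 1: {len(first)} drivers")
print(f"stayed in lane on run 2: {len(second)} drivers")
print(f"stayed in lane on both:  {len(first & second)} drivers")
```

Some drivers succeed and some fail on each run, but the two passing sets overlap only partially: the same people, with unchanged ability, pass one trial and fail the next.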

Another reason that most people don't continuously drive straight down the road or stare 12-14 seconds ahead is simply that normal driving does not require this behavior. In a real sense, the system teaches the user the acceptable range of behaviors, i.e., it teaches them the system fault tolerance - how much deviation from perfection is appropriate. Perfect behavior is expensive in terms of time and mental effort. Adaptation is perhaps the fundamental aspect of human nature and our core skill. In most cases, system tolerance is not as clearly defined as the lines on the road. Instead, people learn the normal system fault tolerance and tune their actions to operate efficiently within it.

Moreover, since the efficiency gain is desirable, authorities are unlikely to punish such violations. On the contrary, they are likely simply to pretend that the violations are not happening. If you doubt this, consider the widely known fact that the best way to bring an organization to a halt is for everyone to follow the rules to the letter, i.e., to "work to rule." When an accident occurs, however, the authority hypocritically blames the miscreant for not following rules that have never been enforced. Authorities are serious about their rules only to the extent that they actually enforce them. Everything else is lip service.

Lastly, the system tolerance may tacitly represent the designing authority's view on acceptable error rate. When authorities set a narrow tolerance, they are essentially saying that a high "error" rate is acceptable, while a wide tolerance indicates that only a low "error" rate is acceptable. Since wider tolerance usually costs more, there is a tradeoff between expense and acceptable error rate. In most cases, the tolerance width is based on an optimism bias. In some, however, it is a calculated (literally) decision made by actuaries and lawyers.

Practical Drift & Satisficing

Two mechanisms produce the efficiency gain: practical drift and satisficing. Rules are for beginners, who quickly evolve beyond them so that they may work more efficiently (Dreyfus and Dreyfus, 1986). The problem is that rules are usually clumsy, narrow, theoretical and confining, and they limit efficiency. They are general and fail to recognize that situations vary.

The behavior of experienced people evolves away from rules and toward efficient performance as they adapt to the fault tolerance, a phenomenon termed "practical drift" (Snook, 2011). Occasional unintentional or intentional (perhaps by observing the more experienced) departures from the rule produce efficiency gains without any obvious negative consequences. This leads to further departures and further efficiency gains. As any operant psychologist (like me) would say, the environment has shaped behavior by reinforcing the rule violations. Since violations are common and accidents rare, violations seldom result in an accident. This process follows directly from basic operant conditioning, where contingencies, the connection between behavior and outcome, "shape" behavior by constantly saying that there is no risk. Perhaps the fundamental truth of all psychology is Thorndike's "Law of Effect": behavior followed by favorable consequences is more likely to be repeated. The result is a "contingency trap" (Fuller, 1991), where the user is led to believe that the behavior will be safe in the future because it has always been safe in the past. Such traps are highly recognizable as a variety of "error trap" (Reason, 2004), a recurrent situation that predictably snares a large number of different people. When such an error pattern is revealed, the locus of causation is not the individual but the condition. The rat is always right.
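A toy model makes the shaping process visible. The update rule and numbers below are assumptions for illustration (they are not drawn from Snook, Fuller, or Reason): a shortcut is reinforced whenever it pays off and punished only on the rare occasions it produces a bad outcome.

```python
import random

random.seed(3)
p_shortcut = 0.05    # initial probability of violating the rule (assumed)
REINFORCE = 0.02     # strength of reinforcement per rewarded shortcut (assumed)
P_ACCIDENT = 0.001   # chance that a given violation is punished (assumed)

for trial in range(1, 2001):
    if random.random() < p_shortcut:            # the shortcut is taken
        if random.random() < P_ACCIDENT:        # rare negative consequence
            p_shortcut *= 0.5                   # punishment suppresses the behavior
        else:                                   # efficiency gain reinforces it
            p_shortcut += REINFORCE * (1.0 - p_shortcut)
    if trial % 500 == 0:
        print(f"after {trial:4d} trials, p(shortcut) = {p_shortcut:.2f}")
```

Because the everyday feedback is almost always positive, the violation probability drifts steadily upward, and the rare punishment barely dents the trend; that is the contingency trap in miniature.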

Behavior is readily reinforced when the task is easier and faster to perform. The key word "efficiency" needs to be emphasized in any discussion of human performance. Humans are by nature "satisficers" and not "optimizers" (Simon, 1972). Optimizing is seeking the perfect solution. However, perfection, even if possible, usually requires a high level of effort. Instead, humans more typically seek a problem solution that is a "good enough" tradeoff between effort and outcome. These solutions are often mediated by "heuristics," rules of thumb that almost always give good results (e.g., Green et al., 2008) but which are not perfect. Even a moment of introspection should confirm the correctness of Simon's observation about our satisficing human nature. Perfection is expensive in terms of money and effort and is seldom justified by the return on investment. In many cases, it is also simply not possible. Moreover, humans typically act under "bounded rationality," which means that they have access to some, but not all, of the relevant information needed for perfect performance.
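Simon's distinction can also be sketched in a few lines. In the hypothetical example below (the payoffs and the aspiration level of 90 are assumptions chosen only to expose the effort/outcome tradeoff), the optimizer evaluates every option while the satisficer stops at the first one that is good enough.

```python
import random

random.seed(4)
options = [random.uniform(0, 100) for _ in range(1000)]  # payoff of each option

def optimize(opts):
    """Evaluate every option and take the best: maximal payoff, maximal effort."""
    return max(opts), len(opts)

def satisfice(opts, aspiration=90.0):
    """Take the first option that clears the aspiration level."""
    for effort, value in enumerate(opts, start=1):
        if value >= aspiration:
            return value, effort
    return max(opts), len(opts)   # nothing was good enough; fall back

for name, strategy in (("optimizer ", optimize), ("satisficer", satisfice)):
    payoff, effort = strategy(options)
    print(f"{name}: payoff {payoff:5.1f} after evaluating {effort:4d} options")
```

The satisficer gives up a little payoff and saves nearly all of the effort, which is the bargain the paragraph above describes; under bounded rationality, the full option list is usually not even available to inspect.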

In sum, human nature seeks to reach a satisfactory level of performance with minimal effort. It achieves this performance level by adapting to the system fault tolerance, which specifies the tradeoff between goal achievement and effort. As with other goals, humans reach a satisficing level of safe performance and do not attempt to optimize it. If you doubt this, consider the last time you walked 300 feet to the nearest crosswalk just to cross the street as safely as possible. Lastly, the goal of safety is almost never primary (Hale & Glendon, 1987), despite all the rhetoric about putting safety first. Humans will attempt to satisfice other goals first and trade them off against safety if necessary.

Attention

Perhaps the best example of learned tolerance lies in attention, which is a popular poster child for human fallibility and human error because it, or more exactly the lack of it, is the presumed cause of so many accidents. A person can always, in theory, have paid attention to the critical information (if it was visible) and so can always be accused of inattention for failing to notice it.

This view ignores the reality that humans are mentally incapable of paying attention to everything and that no one pays close attention to anything 100% of the time, especially in routine tasks (Land, 2006). It is no more possible for a human to "see" everything than to lift a 10,000 pound weight. Experience teaches viewers where to allocate their limited resources. Many recent research studies (e.g., Yanko & Spalek, 2013) have shown that drivers traveling familiar routes cease noticing benign road objects. High levels of alertness and vigilance are very stressful and exert a high cost (e.g., Warm, Parasuraman, & Matthews, 2008), which people seek to reduce over the long run. Given the tendency to satisfice, people either relax their attention and/or shift to a simpler and easier information source, a phenomenon called cue generalization. For example, they will shift attention from reading, which is relatively effortful and attention-demanding, to shape or location, which are readily and easily perceptible. Of course, people can concentrate attention for short periods, as when driving in a snowstorm or heavy traffic, but this only occurs because they recognize that the fault tolerance has drastically shrunk. They cannot maintain the high level of vigilance indefinitely, however. If drivers had to travel with this level of concentration all of the time, the road transportation system could not work and would require major redesign.

Accidents also frequently occur because a person unexpectedly encounters a situation where the learned fault tolerance decreases. In some cases, the change occurs so rapidly that the person cannot respond sufficiently fast (a pedestrian steps into the road in front of his car). In others, the person does not recognize that the situation has departed from the norm (a car on the freeway ahead is stopped). There is an unfortunate catch-22 here: the person has learned that close attention is unnecessary, so his chances of noticing changes from the normal situation are reduced. The human failure to see unlikely events is normal (e.g., Wolfe, Horowitz, & Kenner, 2005). TSA tests of baggage screeners at airports, for example, found that they failed to see 26% of smuggled guns. Radiologists who scan x-rays looking for tumors miss an astounding 30%. Moreover, tests show that 90% of the missed tumors were readily seen by radiologists after they knew where to look (Berlin, 1996).

The most astonishing errors often occur in the most familiar and apparently safe situations and to the most highly skilled people. These are exactly the people who are the most efficient because they have a highly tuned sense of the fault tolerances and have learned what to ignore. "The art of being wise is knowing what to overlook" (James, 1890).

The Dull End Of The System Creates The Fault Tolerance But Avoids Blame

In the scientific study of error, it is common to distinguish between the "sharp" and the "dull" ends of the system (Reason, 1990). The sharp end is the human who ultimately interacts with the hazard. The "dull" end designed the system and determined the hazards and tolerances.

Since the sharp end's actions immediately precede the negative outcome, he is most often blamed. Our innate sense of causality biases us to fix blame based on temporal sequence (Michotte, 1946/1963): if A immediately precedes B, then A caused B. The people at the dull end, the authorities, designers and managers, are more remote in space and time from the negative outcome and seem less directly connected. Their role as causal agents is not as intuitive. They are often shadowy figures who are not sitting in the courtroom and whose identities may not even be known. They readily escape blame despite having had much more control over the outcome. Unlike the driver, who may have a few seconds to make a life and death decision, designers and authorities have much more time and knowledge to develop and test possible solutions.

Moreover, people tend to assign causality to the mutable, i.e., changeable, part of the event. If a person falls off a roof, for example, few would attribute the cause to gravity, which is immutable. Human behavior is highly mutable, and it can be (and often is) argued that the person could have prevented the accident by acting otherwise, so he must have been at fault. The result is counterfactual thinking.

Counterfactuals are popular with those wishing to blame the sharp end because they underscore the mutability of behavior. They posit an imaginary world in which the sharp end could have acted otherwise. They also foster a 20/20 hindsight bias, making the correct action seem obvious after the fact. However, counterfactual thinking is highly misleading and essentially a blind alley. A driver might have prevented the accident by traveling slower or looking in a different direction, but he could also have prevented the accident by staying in bed that day or going to France instead. In fact, there are an infinite number of ways he could have prevented the accident. There has probably never been an accident that could not have been avoided if someone had acted differently.

The correct question is whether the action he performed was reasonable based on "bounded rationality," the information available before the event, which includes his past experience with the system. Saying what a person failed to do generally implies little about the reasonableness of the actions that he did perform (e.g., Dekker, 2002). Counterfactuals are little more than attempts at 20/20 hindsight, oversimplifying the situation and the complexity of the choices to be made.

Conclusion

Of course, sometimes drivers and product users do act outside the scope of normal behavior and are a partial or even complete cause of an accident. This point requires no elaboration because humans come designed with the tendency to blame people rather than circumstances (the fundamental attribution error; Ross, 1977). That is, they generally start with the assumption that the worker, driver, etc. is at fault.

Thinking in terms of fault tolerance can reduce the fundamental attribution error for several reasons. First, it emphasizes the interdependency between system design and "error-free" human behavior. Second, it acknowledges the reality that human behavior is inherently variable, so there is no point in expecting or requiring perfect behavior 100% of the time. If the system is not designed with sufficient tolerance to accommodate normal user behavior and variability, then "normal" users will make "errors." However, the errors are caused by the dull end and not by the sharp end. Third, it recognizes fundamental human nature - humans are highly adaptive and satisfice by learning to find a balance between effort and efficiency based on feedback from their environment.

This article has applied some fundamental human error concepts to explain why accident analysis should examine the system fault tolerance as well as the human behavior. In most cases, however, the spotlight falls on the human because he is the sharp end of the system and because his behavior is always seen as mutable. It is often less obvious that the system itself is mutable, i.e., that the designers could also have done otherwise. Moreover, blame is too often assigned to the human because of cognitive factors such as hindsight bias, which is amplified by counterfactual scenarios, and the fundamental attribution error. This leads to the wrong conclusions and ultimately to more accidents because the real cause of the mishap is not remedied.

References

Berlin, L. (1996). Malpractice issues in radiology: Perceptual errors. American Journal of Roentgenology, 167(3), 587-590.

Dekker, S. (2002). The Field Guide to Human Error Investigations. Burlington, VT: Ashgate Publishing.

Dreyfus, H. L., & Dreyfus, S. E. (1986). Mind Over Machine: The Power of Human Intuition and Expertise in the Era of the Computer. New York: Free Press.

Fuller, R. (1991). Behavior analysis and unsafe driving: Warning learning trap ahead! Journal of Applied Behavior Analysis, 24(1), 73.

Hale, A., & Glendon, I. (1987). Individual Behaviour in the Face of Danger. http://www.hastam.co.uk/personnel/publications/hale_and_glendon.html

James, W. (1890). The Principles of Psychology. New York: Henry Holt.

Land, M. (2006). Eye movements and the control of actions in everyday life. Progress in Retinal and Eye Research, 25, 296-324.

Langham, M., Hole, G., Edwards, J., & O'Neil, C. (2002). An analysis of 'looked but failed to see' accidents involving parked police vehicles. Ergonomics, 45, 167-185.

Michotte, A. (1946/1963). La Perception de la Causalité. Louvain: Institut Supérieur de Philosophie, 1946. [English translation by T. Miles & E. Miles, The Perception of Causality, Basic Books, 1963.]

Reason, J. (1990). Human Error. New York: Cambridge University Press.

Reason, J. (2004). Beyond the organisational accident: The need for error wisdom on the frontline. Quality and Safety in Health Care, 13(Suppl II), ii28-ii33.

Ross, L. (1977). The intuitive psychologist and his shortcomings: Distortions in the attribution process. In L. Berkowitz (Ed.), Advances in Experimental Social Psychology, Vol. 10 (pp. 173-220). New York: Academic Press.

Simon, H. A. (1972). Theories of bounded rationality. In C. B. McGuire & R. Radner (Eds.), Decision and Organization (pp. 161-176). Amsterdam: North-Holland.

Snook, S. A. (2011). Friendly Fire: The Accidental Shootdown of U.S. Black Hawks over Northern Iraq. Princeton, NJ: Princeton University Press.

Swain, A. D., & Guttman, H. E. (1983). Handbook of Human Reliability Analysis With Reference to Nuclear Power Plant Applications. Washington, DC: U.S. Nuclear Regulatory Commission, 2-7.

Warm, J., Parasuraman, R., & Matthews, G. (2008). Vigilance requires hard mental work and is stressful. Human Factors, 50, 433-441.

Wolfe, J. M., Horowitz, T. S., & Kenner, N. M. (2005). Rare items often missed in visual searches. Nature, 435, 439-440.

Yanko, M. R., & Spalek, T. M. (2013). Route familiarity breeds inattention: A driving simulator study. Accident Analysis & Prevention.

Endnotes

1. The definition of error may seem simple enough, but it has proved to be a difficult problem for those who study the topic. For example, is it an error if the bad outcome is intended, or only if it is unintentional? Is it an error if there is no bad outcome because the action is caught in time? The Swain and Guttman definition cuts through all of this and provides a precise criterion that can be operationalized in most situations.