Daniel Kahneman, Olivier Sibony, Cass R. Sunstein
The authors acknowledge that selecting “the best possible human judges” is a good way to reduce noise, as “the most highly skilled will be both less noisy and less biased” (257). The characteristics of good judges are the subject of Chapter 18. In this section, the authors also review other methods of reducing noise, including decision hygiene, sequencing relevant information, and structuring complex judgments.
According to the authors, good judgment depends on “what you know, how well you think, and how you think” (262). They argue that some people who are highly regarded by their professional peers become “respect-experts” (264). However, even professional doctrines that state how practitioners should act leave room for interpretation, and this is where noise occurs.
Intelligence, measured by GMA (general mental ability) tests that examine verbal, spatial, and quantitative capacity, is a key predictor of good decision-making. However, as individuals cannot submit everyone they encounter to a GMA test, they must guess who the higher-GMA people are. These are not necessarily the most confident and persuasive speakers.
People with better judgment are likely to score higher on a CRT (cognitive reflection test), deploying reflective rather than impulsive thought processes. However, “actively open-minded thinking” is also an important quality in judgment-making, as it means searching for information that contradicts pre-existing beliefs (271). Such people believe that being convinced by an opponent’s argument is a sign of good character, while strongly disagreeing with the idea that intuition is the best means of making decisions. They understand that no one’s judgment is ever perfect and that it must always be a work in progress.
The authors emphasize that the personality type of excellent judges does not always match that of decisive leaders. Rather than picking the commonly favored bold, eloquent leader who inspires confidence, it is better to pick the person who is open to counterarguments and does not reach conclusions prematurely.
Before testing a noise-reduction strategy, this chapter examines debiasing techniques that are commonly used by companies. Ex post, or corrective, debiasing corrects judgments after they are made, often intuitively. For example, a boss might add a buffer of an extra month to an over-ambitious team’s three-month project deadline. Ex ante, or preventative, debiasing can take the form of nudges, which alter the conditions of a decision to discourage certain biases or promote better choices. For example, this could mean automatically enrolling employees in pension plans, or placing healthy foods in accessible places. The second type of ex ante debiasing trains decision-makers to recognize and overcome their biases. This is a difficult process, which involves recognizing that “a new problem is similar to one we have seen elsewhere and that a bias that we have seen in one place is likely to materialize in other places” (277).
Debiasing has its limitations because the specific biases at work can be hard to identify. In complex situations, multiple psychological biases may operate simultaneously, offsetting or compounding one another unpredictably. However, ex post or ex ante debiasing can work, especially when the pattern of error is clear.
The authors advise that companies position decision observers in situations where important decisions are being made. As it is easier to identify biases in others than in oneself, the authors advocate that “observers can be trained to spot, in real time, the diagnostic signs that one or several familiar biases are affecting someone else’s decisions or recommendations” (280). The decision observer should be present when a group is making an important decision, and they should use a checklist to diagnose biases that are likely to affect the outcome. The decision observer should monitor the proposals and processes used by the deciding team, in addition to its group dynamics.
Another noise-reduction strategy is deploying decision hygiene. As noise errors are difficult to detect, it is best to prevent them before they happen. The authors use the analogy of handwashing, where “you may not know precisely which germ you are avoiding – you just know that handwashing is good prevention for a variety of germs,” to explain how decision hygiene helps reduce noise without knowing which errors you are avoiding (283).
Although forensic fingerprinting is seen as an objectively scientific discipline in the public imagination, it is subject to the psychological biases of examiners. To begin with, the process of matching fingerprints is not as precise as it seems. Latent fingerprints left on random surfaces are not as easy to read as those impressed on a machine expressly made for reading them. Deciding whether latent prints match those of a particular suspect is thus the job of highly specialized human fingerprint examiners who follow the ACE-V process: “analysis, comparison, evaluation, and verification” (288). If an identification takes place, the prints are verified by a second examiner.
Itiel Dror, a cognitive neuroscience researcher, embarked on a noise audit in the discipline of fingerprint analysis. Dror did not agree that fingerprint identification was an exact science, instead maintaining that it was “a matter of judgment […] and wherever there is judgment, there must be noise” (289). Dror began by looking at occasion noise and examining the variability that occurs when the same experts look at the same prints twice. This is a more effective experiment than having judges consider the same case twice, as prints are not easily memorable. When Dror added information that would weight the case towards a particular verdict, he found that the fingerprint examiners were susceptible to bias, with four out of five of them changing their previous identification in concert with the new information. Dror coined the term “forensic confirmation bias” to describe how context influences decisions about whether to make an identification or not (291).
A 2012 FBI-commissioned study confirmed that forensic experts are prone to occasion noise: about one decision in ten differed when examiners were asked to consider the same prints twice. Alarmingly, such inconsistency has led to some convictions where the evidence was inconclusive. Several further investigations have shown that noise and error are present in fingerprint identification. Reassuringly, however, a consistent finding across studies is that examiners, aware of the high cost of a wrongful conviction, err on the side of caution: they are more likely to call prints inconclusive than to make false positive identifications.
Still, the problem of noise and the erroneous judgments it enables will continue as long as the field of fingerprint forensics denies the role that noise and confirmation bias can play in it. However, Dror’s investigation has led an increasing number of forensic labs to take error-reducing measures. One of these is sequencing information: deciding which information about a case is relevant and which is irrelevant and potentially distracting, so that examiners receive only the case information they need at each step of the identification process. Additionally, when a second examination is needed, decision hygiene is applied: the second examiner is kept unaware of the first one’s conclusion.
Many judgments involve forecasting future events, such as quantifying the effects of climate change by 2050. Forecasting analysts make a clear distinction between bias and noise, which they also label “inconsistency or unreliability” (301).
Forecasters are often biased in favor of making their companies look good. For example, they may make overly optimistic predictions of growth. However, noise is often present, as forecasters disagree with one another.
One strategy for reducing noise in forecasting is to select better judges and to average their estimates of a situation: the more independently made judgments are averaged, the more noise is reduced. Pioneering work on forecasting quality began with the 2011 Good Judgment Project, started by Philip Tetlock, Barbara Mellers, and Don Moore. The group wanted to learn what makes some people especially good forecasters and whether forecasting skill could be taught or improved. The Project’s steering group allowed participants to adjust their forecasts whenever new information was received. The Project also used meteorologist Glenn W. Brier’s system of scoring the difference between predictions and what actually happened, which encourages people to take decisive stances and not hedge their bets. For example, when forecasting whether it will rain tomorrow, participants should feel confident about advising people whether to carry an umbrella.

Only two percent of the Good Judgment Project’s volunteers were superforecasters, meaning their predictions were “much better than chance” (308). The superforecasters’ shared traits included intelligence, reflected in high GMA scores, and a tendency to break a problem down into its component parts rather than act on a gut feeling. When tackling the question of whether the United Kingdom would leave the European Union, for example, they would ask “What would it take for the answer to be yes? What would it take for the answer to be no?” (309). Superforecasters are also comfortable taking the outside view, ignoring irrelevant information and seeking statistics about similar events. Importantly, superforecasters are distinguished more by how they apply their intelligence than by their level of intelligence. The best way to become a superforecaster is to keep updating your knowledge base and synthesizing other perspectives, even those different from your own.
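As a rough illustration (not from the book), Brier-style scoring can be computed as the mean squared difference between a forecast probability and the actual outcome; the numbers below are invented:

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities (0-1)
    and what actually happened (1 = event occurred, 0 = it did not).
    Lower scores are better."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Three events: the first and third occurred, the second did not.
outcomes = [1, 0, 1]

confident = brier_score([0.9, 0.1, 0.8], outcomes)  # 0.02
hedged = brier_score([0.5, 0.5, 0.5], outcomes)     # 0.25
```

The hedged forecaster is never badly wrong but is consistently penalized relative to the confident, correct one, which is why this scoring system discourages hedging.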
The Good Judgment Project used three strategies to improve forecasting: training in probabilistic reasoning, teaming (working with others and debating predictions), and selection, whereby only the top two percent of forecasters were retained. The Project also found that some people were especially gifted at seeking and analyzing relevant data and less prone to the random errors caused by noise; for example, the best superforecasters did not overreact to a seemingly relevant piece of news. On the other hand, some people tended to consistently over- or underestimate the probability of change and therefore exhibited a bias toward change or stability. Interestingly, the group also found that training people to counter their psychological biases reduced noise as well. Of the three strategies, selection helped the most, as it promoted “a better use of information” (313).
Still, the value of selection should not cause people to ignore the benefits of teaming, or pooling diverse perspectives. When composing a judging team, one’s first choice should be the best judge, but one’s second choice should be someone complementary, rather than an individual whose skills duplicate the first judge’s. While pattern noise will be higher in a diverse group, paradoxically the average of its judgments will be better than that of a noiseless homogeneous group.
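The claim that averaging independent judgments reduces noise has a simple statistical basis: the spread of an average of n independent, equally noisy judgments shrinks by roughly a factor of the square root of n. A minimal simulation (illustrative only; the true value and noise level are invented):

```python
import random
import statistics

random.seed(0)
TRUE_VALUE = 100.0
JUDGE_NOISE = 10.0  # standard deviation of a single judge's error

def judgment():
    """One unbiased but noisy judgment of the true value."""
    return random.gauss(TRUE_VALUE, JUDGE_NOISE)

def averaged_judgment(n):
    """Average the independent judgments of n judges."""
    return statistics.mean(judgment() for _ in range(n))

# Estimate the noise (spread) of solo vs. nine-judge averaged estimates.
solo_noise = statistics.stdev(averaged_judgment(1) for _ in range(5000))
team_noise = statistics.stdev(averaged_judgment(9) for _ in range(5000))
# team_noise comes out near one third of solo_noise (sqrt(9) = 3).
```

Note that this holds only for independent judgments, which is why the authors stress collecting estimates before any group discussion.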
Medical diagnosis requires a form of judgment that is not as objective as people might think. The variance between doctors’ assessments of the same symptoms can be so broad that getting a second opinion is standard practice. Medicine uses the kappa statistic to measure the noise, or variance, between different doctors’ opinions: a score of 1 reflects complete agreement, while 0 indicates agreement no better than chance. This variability means that whether someone is diagnosed with a serious disease like cancer can depend on the doctor they see. For example, in one clinic the diagnostic accuracy for melanomas was just 64 percent, meaning that doctors misdiagnosed roughly one in three patients.
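The kappa statistic can be sketched as follows (an illustrative implementation of Cohen’s kappa for two raters; the diagnoses are invented). It compares observed agreement with the agreement the raters would reach by chance alone:

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters, corrected for chance.
    1 = perfect agreement, 0 = no better than chance, < 0 = worse than chance."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own base rates.
    categories = set(ratings_a) | set(ratings_b)
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Two hypothetical doctors diagnosing the same five patients:
doctor_1 = ["malignant", "benign", "benign", "malignant", "benign"]
doctor_2 = ["malignant", "benign", "malignant", "malignant", "benign"]
# They agree on 4 of 5 cases, yet kappa is only about 0.62 once the
# agreement expected by chance is subtracted out.
```

This chance correction is why raw percent agreement between doctors can look reassuring while the kappa statistic still reveals substantial noise.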
Moreover, noise increases over the course of a doctor’s day, given rising fatigue and the tendency to run behind schedule. Studies have shown that doctors are more likely to order preventative measures such as cancer screenings early in the morning than late in the afternoon.
The authors argue that it would “be a major contribution not only to medicine but also to human knowledge to provide a comprehensive account of the existence and magnitude of noise in the context of different medical problems” (324). This is especially the case for areas of medicine where a high degree of judgment is required. Aggregation of medical opinions, in addition to the application of algorithms where possible, would help reduce noise.
Still, guidelines such as physician Virginia Apgar’s score for determining the health of newborns, which uses non-numerical criteria such as skin tone and the pitch of a cry, reduce noise while retaining the need for judgment. This score focuses on “the relevant predictors, simplification of the predictive model, and mechanical aggregation” (327).
Psychiatry may be the noisiest form of medicine, as psychiatrists often disagree on diagnosing the same patient. For that reason, noise-reduction strategies have been deployed in the discipline since the 1940s. Psychiatrists belonging to different schools—for example, a biomedical expert and a clinician with developmental training—might make different diagnoses. However, in this branch of medicine as in others, guidelines have succeeded in reducing bias and noise.
Performance assessment is almost ubiquitous in large organizations and can have a significant impact on people’s careers. However, performance assessments are extremely noisy, as output is difficult to measure, especially for knowledge workers. Studies have found that true performance accounts for only 20-30 percent of the variance between assessors in a performance review, while 70-80 percent of the divergence in ratings is system noise. Noise emerges when two raters either form different impressions of a candidate’s performance or use the rating scale “differently to express the same opinion” (335). Pattern noise is present, for example, if one rater has consistently positive or negative feelings toward a candidate, and occasion noise arises from a rater’s mood on the day. As one study puts it, “the relationship between job performance and ratings of job performance is likely to be weak or at best uncertain” (336).
One way of tackling noise in performance ratings is to aggregate ratings from several sources, including peers, subordinates, and bosses. This matters because good job performance means more than pleasing superiors. However, the development of overcomplex feedback questionnaires has not significantly improved the quality of performance reviews, and some rating systems suffered from inflation that made true excellence difficult to identify.
Introducing standardization to performance ratings, such as forced ranking, where employees are rated against each other, has been a key tool for improving them. There is far less noise in relative judgments than in absolute ones, as ranking reduces both pattern noise and level noise: lenient and tough rankers alike “use the same ranks” (341). While ranking helps distinguish the truly excellent from the merely adequate, it should not be used to distort situations in which all performers genuinely meet predefined expectations. Indeed, this would be expected in a high-performing organization.
While accuracy in performance ratings remains a concern, picking the right scale in assessment can be helpful, as it offers a common reference point. Moreover, descriptors for assessing performance should be specific enough to be interpreted consistently.
Vetting job candidates requires a form of judgment. The authors argue that while many employers use unstructured interviews and an intuitive approach to decide whether to hire someone, such informal measures are “often useless” when it comes to determining whether a candidate will succeed or fail in a role (350). The chance that the candidate who performs better at interview will also perform better in the role is only 56-60 percent, barely better than chance. While there is uncertainty about how well a selected candidate will adjust to the job and which life events will influence their performance, interviews are full of psychological biases in addition to noise, as different interviewers respond differently to the same candidate. The average rate of agreement about candidates among interviewers is 76 percent, and two interviewers can form conflicting views and cause-and-effect narratives about the candidate’s answers. This shows that human interpretations are always influenced by previously held attitudes.
First impressions of candidates influence interviewers disproportionately, especially in unstructured interviews where the interviewers steer the conversation in accordance with how the candidate behaves. They may ask harsher questions of candidates they do not initially warm to, collecting different evidence about them.
While a mechanical approach might reduce hiring errors, companies typically prefer to combine interviewers’ impressions through discussion, which introduces further noise. Google takes measures to reduce this noise by ensuring that all interviewers rate candidates individually before comparing notes. It also adopts a strategy called “structuring complex judgments” (357). This involves decomposition, breaking the decision down into mediating assessments, which encourages judges to home in on important criteria and filter out irrelevancies. Google recruiters also evaluate candidates’ answers against a predetermined rubric. While structured interviews are unpopular with both interviewers and interviewees for their exam-like quality, they are far better predictors of whether someone will perform a job well. Google also uses work sample tests and elicits backdoor references, not from referees the candidate has picked but from people with whom they have crossed paths. The final principle, “delayed holistic judgment” by a select hiring committee, involves not excluding intuition, that staple of unstructured interviews, but delaying it (360). The offer is granted or withheld depending on the average of the four interviewers’ scores.
Kahneman, Sibony, and Australian professor Dan Lovallo developed a new decision-making method called the “mediating assessments protocol,” which incorporates the decision hygiene strategies seen in previous chapters (363). The protocol is adaptable to many types of organizations and decisions; for example, a protocol used to evaluate candidates for a job can be adapted to evaluating strategic options. The protocol “maximizes the value of information by keeping the dimensions of the evaluation independent of each other” (365). It also postpones any leaning toward a particular conclusion until all the different assessments have been made.
The first task is to draw up a list of mediating assessments and to define the criteria to be assessed as specifically as possible: the more specific the criteria, the greater the reduction of noise and the capacity for improvement. The protocol then ensures that all assessments are made independently, to avoid the corrupting effect of mutual influence.
When collecting the results, it is important to remember that discordance is a good sign, as it prompts important conversations about how to improve things. Using the “estimate-talk-estimate” method, where people with opposing views explain their positions and listen to others, the matter can be deliberated further (375). Finally, intuition should be brought in only at the end of the process, once all factual assessments are available. This will save companies from costly mistakes.
In this section, readers learn about the traits of good judges and what it takes to be a superforecaster: someone who can make predictions of above-average accuracy about an uncertain future. While high GMA scores and a degree of specialized knowledge were predictably helpful traits, the application of intelligence is the most important factor in becoming a superforecaster. While quick thinking and rapid intuitions are often valued in today’s society, such traits were absent in superforecasters. Instead, they applied slower, more deliberative System 2 thinking to structure a problem into different assessments and then balanced seeking new information against eliminating superfluous evidence. Importantly, the Good Judgment Project urged forecasters not to share their findings until each had come to an individual, researched opinion, to avoid the halo effect of mutual influence. Additionally, rather than rushing to use intuition, the Project advocated delaying its use until the more objective assessments had been made.
The authors argue that while the Good Judgment Project was developed in a lab-like setting, its approaches would prevent noise-related mistakes in several areas. They show how, for example, Google uses elements of the Project’s strategy by insisting that interviewers evaluate candidates separately, seeking information about candidates from outside references, and delaying the use of intuition until the end.
While these strategies might seem more applicable to disciplines such as employment where the role of fallible human judgment seems obvious, the authors show how they can also help in more apparently scientific practices, such as correctly diagnosing disease or matching fingerprints. They demonstrate how even where advanced diagnostic technology is used, human judgment still plays a role, and this contributes to unwanted variance amongst professionals and even in the same individual on different days. For example, the fact that doctors are more likely to prescribe cancer screenings to patients they see in the mornings than in the afternoons is a bias that has more to do with the doctor’s daily trajectory than the patient’s susceptibility to disease. This example shows how noise could potentially cost lives. While the authors’ research may be alarming in showing readers the limits to the trust people place in professionals, it is also optimistic in its proposal of strategies to combat this.