Clinical Judgment and Mechanical Prediction Research Paper

Considerable effort has been made to improve the quality of assessment information (e.g., by constructing new tests). However, it is also important that advances be made in the way that assessment information is used to make judgments and decisions. Two general methods for making judgments and decisions will be described and critiqued in this research paper: clinical judgment and mechanical prediction.

Having suffered through statistics classes, students and professionals may be put off by the term mechanical prediction. They may even feel weak and bewildered when confronted with terms such as actuarial prediction, automated assessment, and statistical prediction. Terminology in this area is sometimes confusing, so it will behoove us to take a moment to clarify the meaning of these and other terms.

In the context of personality assessment, clinical judgment refers to the method by which judgments and decisions are made by mental health professionals. Statistical prediction refers to the method by which judgments and decisions are made by using mathematical equations (most often linear regression equations). These mathematical equations are usually empirically based; that is, the parameters and weights for these equations are usually derived from empirical data. However, some statistical prediction rules (e.g., unit weight linear rules) are not derived using empirical data. The terms statistical prediction and actuarial prediction are close in meaning: They can be used interchangeably to describe rules that are derived from empirical data. Statistical and actuarial prediction can be distinguished from automated assessment. Automated assessment computer programs consist of a series of if-then statements. These statements are written by expert clinicians based on their clinical experiences and their knowledge of the research literature and clinical lore. Computer-based test interpretation programs are examples of automated assessment programs. They have been enormously popular, for example, for the interpretation of the Minnesota Multiphasic Personality Inventory–2 (MMPI-2; Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989). Finally, the term mechanical prediction also needs to be defined. As defined by Grove, Zald, Lebow, Snitz, and Nelson (2000), mechanical prediction is “statistical prediction (using explicit equations), actuarial prediction (as with insurance companies’ actuarial tables), and what we may call algorithmic prediction (e.g., a computer program emulating expert judges). . . . Mechanical predictions are 100% reproducible” (p. 19). In other words, mechanical prediction is a global term that subsumes statistical prediction, actuarial prediction, and automated assessment, but not clinical judgment.
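To make these distinctions concrete, the sketch below applies three kinds of mechanical rule to the same client record: a linear rule with empirically derived weights, a unit-weight linear rule, and an automated-assessment rule built from if-then statements. All predictor names, weights, cutoffs, and report sentences are hypothetical placeholders, not values from any published rule. The point is only that each rule is a fixed procedure, so the same input always produces the same output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientRecord:
    """Hypothetical standardized (z-score) predictors for one client."""
    prior_incidents: float
    substance_use: float
    anger_rating: float


# Hypothetical weights standing in for coefficients estimated from a derivation sample.
REGRESSION_WEIGHTS = {"prior_incidents": 0.5, "substance_use": 0.3, "anger_rating": 0.2}


def statistical_rule(rec: ClientRecord) -> float:
    """Empirically weighted linear composite (weights would be fitted to data)."""
    return sum(w * getattr(rec, name) for name, w in REGRESSION_WEIGHTS.items())


def unit_weight_rule(rec: ClientRecord) -> float:
    """Unit-weight linear rule: each predictor counts equally; nothing is fitted."""
    return rec.prior_incidents + rec.substance_use + rec.anger_rating


def automated_assessment(rec: ClientRecord) -> str:
    """Hypothetical if-then statements of the kind an expert clinician might write."""
    if rec.prior_incidents > 1.0 and rec.anger_rating > 1.0:
        return "History of incidents combined with marked anger; risk may be elevated."
    if rec.substance_use > 1.5:
        return "Heavy substance use noted; consider its role in risk."
    return "No interpretive statements triggered."


client = ClientRecord(prior_incidents=1.2, substance_use=0.4, anger_rating=1.1)
# Mechanical rules are fully reproducible: the same input always gives the same output.
assert automated_assessment(client) == automated_assessment(client)
print(round(statistical_rule(client), 2), round(unit_weight_rule(client), 2))
print(automated_assessment(client))
```

Only the first function depends on empirical data; the unit-weight rule and the if-then rules are mechanical without being derived from criterion scores.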

To clarify how mechanical prediction rules can be used in personality assessment, it will be helpful to describe a model study. In a study conducted at Western Psychiatric Institute and Clinic at the University of Pittsburgh (Gardner, Lidz, Mulvey, & Shaw, 1996), the judgment task was to predict whether patients would become violent in the next 6 months. Clinicians were psychiatrists, psychiatric residents, and nurse-clinicians who had seen the patients in the emergency (admissions) department and who had conferred on the cases together. Clinical and statistical predictions were made for 784 patients. To obtain outcome scores, patients and significant others were interviewed over the following 6 months. Additional information was also used to learn whether a patient had become violent: commitment, hospital, and police records were searched for reports of violent incidents. Patients were said to be violent if they had “laid hands on another person with violent intent or threatened someone with a weapon” (Lidz, Mulvey, & Gardner, 1993, p. 1008). One of the strengths of the study is that the data were analyzed using receiver operating characteristic (ROC) analysis. ROC methods form an important part of signal detection theory. With ROC methods, measures of validity are unaffected by base rates or by clinicians’ biases for or against Type I or Type II errors (McFall & Treat, 1999; Mossman, 1994; Rice & Harris, 1995). For both clinical prediction and statistical prediction, the average area under the ROC curve (AUC) was reported. For this task, the AUC is the probability that a randomly selected violent patient will be rated as more likely to become violent than a randomly selected nonviolent patient. The greater the AUC, the greater the accuracy of predictions. A value of .5 represents the chance level of prediction, and a value of 1.0 represents perfect discrimination. With regard to the results, the AUC for statistical prediction was .74 and the AUC for clinical prediction was only .62.
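The probabilistic reading of the AUC can be computed directly: compare every violent patient with every nonviolent patient and count how often the violent patient received the higher risk rating, counting ties as one half (this pairwise count is the Mann-Whitney U statistic rescaled to the 0 to 1 range). The sketch below assumes risk ratings on any ordinal scale; the ratings in the usage lines are invented for illustration and are not data from Gardner et al. (1996).

```python
from itertools import product


def auc(scores_positive, scores_negative):
    """Probability that a randomly chosen positive case (e.g., a patient who
    became violent) is rated higher than a randomly chosen negative case.
    Ties count as half a win. 0.5 is chance; 1.0 is perfect discrimination."""
    if not scores_positive or not scores_negative:
        raise ValueError("need at least one case in each group")
    wins = 0.0
    for pos, neg in product(scores_positive, scores_negative):
        if pos > neg:
            wins += 1.0
        elif pos == neg:
            wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical 1-5 risk ratings, for illustration only.
violent = [5, 4, 4, 3]
nonviolent = [4, 3, 2, 2, 1]
print(auc(violent, nonviolent))
```

Because every comparison is made between one member of each group, the result does not depend on how many patients fall into each group, which is one way to see why the AUC is described as unaffected by base rates.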

Historically, the issue of clinical versus statistical prediction has been the subject of intense debate. The issue first drew a great deal of attention in 1954 when Paul Meehl published his classic book, Clinical versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence. For many years, this book was read by nearly all graduate students in clinical and counseling psychology programs. In his book, Meehl noted that in almost every comparison between clinical and statistical prediction, the statistical method was equal or superior to informal clinical judgment. This conclusion has generally been supported in subsequent reviews (e.g., Dawes, Faust, & Meehl, 1989, 1993; Garb, 1994; Goldberg, 1991; Grove et al., 2000; Grove & Meehl, 1996; Kleinmuntz, 1990; Marchese, 1992; Meehl, 1986; Wiggins, 1981). Meehl is one of the most highly regarded psychologists in the history of clinical psychology, and late in his career he bemoaned the fact that psychologists were neglecting the research on statistical prediction. According to Meehl (1986):

There is no controversy in social science that shows such a large body of qualitatively diverse studies coming out so uniformly in the same direction as this one. When you are pushing 90 investigations, predicting everything from the outcome of football games to the diagnosis of liver disease and when you can hardly come up with a half dozen studies showing even a weak tendency in favor of the clinician, it is time to draw a practical conclusion, whatever theoretical differences may still be disputed. (pp. 373–374)

According to Meehl and other advocates of statistical prediction, mental health professionals should be using statistical rules to make diagnoses, descriptions of traits and symptoms, behavioral predictions, and other types of judgments and decisions. Yet, clinicians rarely do this. One is left wondering why.

The following topics will be covered in this research paper: (a) results on clinical versus mechanical prediction, (b) the strengths and limitations of clinical judgment, (c) the strengths and limitations of automated assessment, and (d) the strengths and limitations of statistical prediction. Recommendations will be made for improving the way that judgments and decisions are made in clinical practice.

Clinical Versus Mechanical Prediction

The most comprehensive and sophisticated review of studies on clinical versus mechanical prediction was conducted by Grove et al. (2000). In addition to locating more studies than anyone else, they published the only meta-analysis in this area. Their review will be described in detail.

In their search of the literature, Grove et al. (2000) included only studies in the areas of psychology and medicine. Studies were included if clinicians and mechanical procedures were used to “predict human behavior, make psychological or medical diagnoses or prognoses, or assess states and traits (including abnormal behavior and normal personality)” (p. 20). Also, studies were included only if the clinicians and the mechanical procedures had access to “the same (or almost the same) predictor variables” (p. 20). After an extensive search of the literature, 136 studies were found that qualified for inclusion.

The results reported by Grove et al. (2000) favor mechanical prediction. Mechanical prediction techniques substantially outperformed clinical prediction in 60, or 44%, of the studies. In contrast, clinicians substantially outperformed mechanical prediction techniques in 8, or 6%, of the studies (results were calculated from their Figure 1, p. 21). In the remaining studies, clinical predictions were roughly as accurate as mechanical predictions. On average, mechanical prediction rules were about 10% more accurate than clinicians.
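Because these results can be stated both as counts and as percentages of the 136 qualifying studies, the two can be cross-checked. The sketch assumes that 60 studies substantially favored mechanical prediction and 8 substantially favored clinicians (eight is the number of clinician-favoring studies discussed below; 60 is the count that corresponds to 44% of 136). The remaining studies are treated as showing roughly equal accuracy.

```python
TOTAL_STUDIES = 136  # studies meeting Grove et al.'s (2000) inclusion criteria

# Assumed counts, inferred from the reported figures.
outcomes = {
    "mechanical substantially better": 60,
    "clinicians substantially better": 8,
}
outcomes["roughly equal"] = TOTAL_STUDIES - sum(outcomes.values())

for label, count in outcomes.items():
    print(f"{label}: {count} studies ({count / TOTAL_STUDIES:.0%})")
```

This prints 44%, 6%, and 50%, so about half of the studies showed no substantial difference between the two methods.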

Overall, the results of the meta-analysis support the general superiority of mechanical prediction. However, in light of these findings, comments made by statistical prediction advocates seem too extreme. For example, Meehl’s (1986, p. 374) claim that there are only “a half dozen studies showing even a weak tendency in favor of the clinician” no longer seems accurate. As noted by Grove et al. (2000), “Our results qualify overbroad statements in the literature opining that such superiority is completely uniform” (p. 25).

Grove et al. (2000) also reported additional interesting findings. The general superiority of mechanical prediction holds across categories: “It holds in general medicine, in mental health, in personality, and in education and training settings” (p. 25). They also found that mechanical prediction was usually superior regardless of whether clinicians were “inexperienced or seasoned judges” (p. 25). With regard to a third result, one variable was notable in the eight studies in which clinical judgment outperformed mechanical prediction: In seven of those eight studies, the clinicians received more data than the mechanical prediction rules. One implication of this finding is that optimal information has not always been used as input for mechanical prediction rules. One more result will be mentioned. Mechanical prediction rules were superior to clinicians by a larger margin when interview information was available. Limitations of interview information have been described in the clinical judgment literature (Ambady & Rosenthal, 1992; Garb, 1998, pp. 18–20).

To check on the integrity of their findings, Grove et al. (2000) conducted additional analyses:

[We] examined specific study design factors that are rationally related to quality (e.g., peer-reviewed journal versus chapter or dissertation, sample size, level of training and experience for judges, cross-validated versus non-cross-validated statistical formulae). Essentially all of these study-design factors failed to significantly influence study effect sizes; no such factor produced a sizable influence on study outcomes. (p. 25)

Thus, roughly the same results were obtained in studies varying in terms of methodological quality.

The Grove et al. (2000) meta-analysis is a landmark study, but it does not address many important issues. For example, specific mechanical prediction rules that clinicians should be using are not described; nor are obstacles to developing better mechanical prediction rules. Finally, conditions under which clinical judgment should be preferred to mechanical prediction are not described. These issues and others will now be discussed.

Critique of Mechanical Prediction

Automated Assessment

As already noted, automated assessment programs consist of a series of if-then statements. They are written by clinicians on the basis of their clinical experiences and their knowledge of the research literature and clinical lore. They are considered to be mechanical prediction rules because statements generated by automated assessment programs are 100% reproducible.

Several strengths of automated assessment programs can be described. First, they are written by clinicians who are generally thought to be experts. Another advantage is that they are mechanical prediction methods, and thus test-retest reliability is perfect (e.g., given a particular MMPI-2 test protocol, the same test report will always be written). Also, the general superiority of mechanical prediction methods was supported by Grove et al. (2000), although results were not analyzed separately for automated assessment programs and statistical prediction rules.

A number of weaknesses can also be described. First, in empirical studies, alleged experts often have been no more accurate than other clinicians (for reviews, see Garb, 1989, 1998; Garb & Schramke, 1996). Second, although test-retest reliability is perfect, interrater reliability is not. Computer-based test reports generated by automated assessment programs are generally used by clinicians along with other information (e.g., history information). One should not assume that psychologists will make similar judgments and decisions when they integrate all of this information. Finally, and perhaps most important, many automated assessment programs for interpreting psychological test results are not validated (Adams & Heaton, 1985; Garb, 1998, 2000b; Garb & Schramke, 1996; Honaker & Fowler, 1990; Lanyon, 1987; Matarazzo, 1986; Snyder, 2000; Snyder, Widiger, & Hoover, 1990; but also see Butcher, Perry, & Atlis, 2000). Thus, automated assessment programs can “lend an unwarranted impression of scientific precision” (Snyder, 2000, p. 52).

Statistical Prediction

One can expect statistical prediction rules to be more accurate than automated assessment programs and clinical judges. After all, statistical prediction rules are usually based on accurate feedback. That is, when deriving statistical prediction rules, accurate criterion scores are usually obtained. Put another way (Garb, 2000a), “In general, statistical prediction rules will do well because they make use of the inductive method. A statistical prediction rule will do well to the extent that one can generalize from a derivation sample to a new sample” (p. 32). In contrast, in the course of clinical practice, it is normally too expensive for clinicians to obtain good criterion scores. For example, clinicians are unable to follow up with patients over a 6-month period to learn whether they have become violent. Similarly, when writing a computer-based test interpretation program, an expert clinician will not normally collect criterion information.

There is another important reason one can expect statistical prediction rules to be more accurate than automated assessment programs and clinical judges. The use of statistical prediction rules can minimize the occurrence of errors and biases, including race bias and gender bias (Garb, 1997). Automated assessment programs may be biased (e.g., descriptions may be more accurate for White clients than Black clients), because criterion scores are not usually obtained to learn whether accuracy varies by client characteristic (e.g., race). Errors and biases that occur when clinicians make judgments will be described in a later section. Suffice it to say that a carefully derived statistical rule will not make predictions that vary as a function of race or gender unless race or gender has been shown to be related to the behavior one is predicting. To make sure that statistical predictions are unbiased, the effects of client characteristics (e.g., race, gender) need to be investigated.
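Investigating whether accuracy varies with client characteristics can begin with something as simple as computing a rule's hit rate separately within each group, using criterion scores collected for that purpose. The sketch assumes each record holds a group label (standing in for a characteristic such as race or gender), the rule's prediction, and the observed criterion; the records and the labels "A" and "B" are hypothetical. A fuller bias analysis would also compare error types (false positives versus false negatives) and take sampling error into account.

```python
from collections import defaultdict


def hit_rates_by_group(records):
    """Return {group: proportion of correct predictions} for an iterable of
    (group, predicted, actual) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {g: correct[g] / total[g] for g in total}


# Hypothetical records: (client group, rule's prediction, criterion outcome).
records = [
    ("A", True, True), ("A", False, False), ("A", False, False), ("A", True, False),
    ("B", True, False), ("B", False, True), ("B", False, False), ("B", True, True),
]
print(hit_rates_by_group(records))
```

In this made-up example the rule is correct for 75% of group A but only 50% of group B, the kind of discrepancy that would call for closer study before the rule is used.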

Although there are reasons to believe that statistical prediction rules will transform psychological assessment, it is important to realize that present-day rules are of limited value (Garb, 1994, 1998, 2000a). For tasks involving diagnosis or describing personality traits or psychiatric symptoms, many statistical prediction rules make use of only limited information (e.g., results from only a single psychological test). This might be satisfactory if investigators first determined that the assessment information represents the best information that is available. However, this is not the case. For tasks involving diagnosis and describing personality traits and psychiatric symptoms, investigators rarely collect a large amount of information and identify optimal predictors.

There is a methodological reason why optimal information has rarely been used for the tasks of diagnosis and describing personality traits and psychiatric symptoms. When statistical prediction rules have been derived for these tasks, criterion ratings have usually been made by psychologists who use information that is available in clinical practice (e.g., history and interview information). If information used by criterion judges is also used as input information for statistical prediction rules, criterion contamination can occur. To avoid criterion contamination, information that is given to criterion judges is not used as input information for statistical prediction rules, even though this information may be optimal. Thus, in many studies, statistical predictions are made using results from a psychological test but not results from history and interview information.

To avoid criterion contamination, new methods need to be used to construct and validate statistical rules for the tasks of diagnosis and describing personality traits and psychiatric symptoms (Garb, 1994, 1998, 2000a). For example, by collecting longitudinal information, one can obtain criterion scores that are not based on information that is normally used by mental health professionals. Thus, if a statistical rule makes a diagnosis of major depression, but longitudinal data reveal that the client later developed a manic episode, then we could say that this diagnosis was incorrect.

Criterion contamination is not a problem for behavioral prediction (e.g., predicting suicide), so it is not surprising that statistical prediction rules that have been used to predict behavior have been based on optimal information. For behavioral prediction, outcome scores are obtained after assessment information has been collected and predictions have been made. All of the information that is normally available in clinical practice can be used by a statistical prediction rule without fear of criterion contamination.

Most present-day statistical prediction rules have not been shown to be powerful. As already noted, statistical prediction rules for making diagnoses and describing personality traits and psychiatric symptoms have almost always made use of limited information that has not been shown to be optimal (e.g., Carlin & Hewitt, 1990; Danet, 1965; Goldberg, 1965, 1969, 1970; Grebstein, 1963; Hiler & Nesvig, 1965; Janzen & Coe, 1975; Kleinmuntz, 1967; Lindzey, 1965; Meehl, 1959; Oskamp, 1962; Stricker, 1967; Todd, 1954; Vanderploeg, Sison, & Hickling, 1987). Typically, the statistical prediction rules, and the clinicians to which they have been compared, have been given results from only a single test.

An example will be given. In one of the best-known studies on clinical versus statistical prediction (Goldberg, 1965), MMPI (Hathaway & McKinley, 1942) results were used to discriminate between neurotic and psychotic clients. Goldberg constructed a formula that involves adding and subtracting MMPI T scores: Lie (L) + Paranoia (Pa) + Schizophrenia (Sc) – Hysteria (Hy) – Psychasthenia (Pt). Using data collected by Meehl (1959), hit rates were 74% for the Goldberg index and only 68% for the average clinician. Clinicians in this study were not given any information other than the MMPI protocols. The study is well known not so much because the statistical rule did better than clinicians, but because a simple linear rule was more accurate than complex statistical rules including regression equations, profile typologies, Bayesian techniques, density estimation procedures, the Perceptron algorithm, and sequential analyses. However, one can question whether the Goldberg index should be used by itself in clinical practice to make differential diagnoses of neurosis versus psychosis. As observed by Graham (2000), “It is important to note that the index is useful only when the clinician is relatively sure that the person being considered is either psychotic or neurotic. When the index is applied to the scores of normal persons or those with personality disorder diagnoses, most of them are considered to be psychotic” (p. 252). Thus, before using the Goldberg index, one needs to rule out normal functioning and personality disorder, either by relying on clinical judgment or by using another statistical prediction rule. Of course, the other limitation of the Goldberg index is that it is possible, and perhaps even likely, that clinicians could outperform the index if they were given history and interview information in addition to MMPI results.
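Goldberg's formula is simple enough to write out directly. The sketch below computes the index from MMPI T scores and classifies a profile using a cutoff of 45, the value commonly cited for this index in the MMPI literature; the cutoff is not given in this paper, so treat it as an assumption. Following Graham's (2000) caution, the hypothetical `assume_neurotic_or_psychotic` flag is a guard added for illustration and is not part of Goldberg's rule; the profile values are also hypothetical.

```python
CUTOFF = 45  # assumed conventional cutoff; not stated in this paper


def goldberg_index(t_scores: dict) -> int:
    """Goldberg (1965) index: L + Pa + Sc - Hy - Pt, using MMPI T scores."""
    return (t_scores["L"] + t_scores["Pa"] + t_scores["Sc"]
            - t_scores["Hy"] - t_scores["Pt"])


def classify(t_scores: dict, assume_neurotic_or_psychotic: bool) -> str:
    """Neurotic-versus-psychotic call. Only meaningful once normal functioning
    and personality disorder have been ruled out by other means."""
    if not assume_neurotic_or_psychotic:
        raise ValueError("index applies only to the neurotic/psychotic contrast")
    return "psychotic" if goldberg_index(t_scores) >= CUTOFF else "neurotic"


# Hypothetical profile (T scores), for illustration only.
profile = {"L": 50, "Pa": 70, "Sc": 75, "Hy": 65, "Pt": 68}
print(goldberg_index(profile), classify(profile, assume_neurotic_or_psychotic=True))
```

Making the precondition an explicit argument is one way to keep the index from being applied, as Graham warns, to normal or personality-disordered clients.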

In contrast to diagnosis and the description of personality traits and psychiatric symptoms, present-day rules are more promising for the task of prediction. Statistical prediction rules have been developed for predicting violence (e.g., Gardner et al., 1996; Lidz et al., 1993; Monahan et al., 2000), but they are not yet ready for clinical use. In commenting on their prediction rule, Monahan et al. (2000, p. 318) noted that “the extent to which the accuracy of the actuarial tool developed here generalizes to other types of clinical settings (e.g., forensic hospitals) is unknown.” One can anticipate (and hope) that actuarial rules for predicting violence will soon be available for widespread use in clinical practice.

Although valuable actuarial rules for predicting violence may soon be available, prospects are less promising for the prediction of suicide. This is such an important task that if a rule could obtain even a low level of accuracy, it might be of use in clinical practice. However, results for actuarial rules have been disappointing. For example, in one study (R. B. Goldstein, Black, Nasrallah, & Winokur, 1991), predictions were made for 1,906 patients who had been followed for several years. Forty-six of the patients committed suicide. Several risk factors for suicide were identified (e.g., history of suicide attempts, suicidal ideation on index admission, and gender). However, these risk factors could not be meaningfully used to make predictions. When the risk factors were incorporated into a statistical rule, five predictions of suicide were made, but only one of them was valid and predictions of no suicide were made for 45 of the 46 patients who did kill themselves. The statistical rule did not do well even though it was derived and validated on the same data set.
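The figures reported for the Goldstein et al. (1991) rule can be laid out as a 2 × 2 table, which makes the effect of the low base rate visible. The counts below come from the text (1,906 patients, 46 suicides, 5 predictions of suicide of which 1 was correct); the remaining cells follow by subtraction. The comparison with a trivial strategy that never predicts suicide is added here for illustration.

```python
def confusion_summary(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard accuracy indices from a 2 x 2 prediction table."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),                # suicides correctly predicted
        "positive_predictive_value": tp / (tp + fp),  # suicide predictions that were right
        "specificity": tn / (tn + fp),                # non-suicides correctly predicted
        "hit_rate": (tp + tn) / total,                # overall proportion correct
    }


N_PATIENTS, N_SUICIDES = 1906, 46
TP, PREDICTED_SUICIDE = 1, 5
FP = PREDICTED_SUICIDE - TP          # 4 false alarms
FN = N_SUICIDES - TP                 # 45 missed suicides
TN = N_PATIENTS - N_SUICIDES - FP    # correct "no suicide" predictions

rule = confusion_summary(TP, FP, FN, TN)
# Hit rate of predicting "no suicide" for every patient.
base_rate_only = (N_PATIENTS - N_SUICIDES) / N_PATIENTS

for name, value in rule.items():
    print(f"{name}: {value:.3f}")
print(f"hit rate if no one is predicted to commit suicide: {base_rate_only:.3f}")
```

The rule's overall hit rate (about 97.4%) is slightly below the 97.6% obtained by never predicting suicide, while it identifies only about 2% of the patients who killed themselves.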

Among the most valuable statistical prediction rules currently available are those in the area of behavioral assessment. These rules are helpful for conducting functional analyses. As observed by Schlundt and Bell (1987),

when clients keep a self-monitoring diary, a large amount of data is often generated. Typically, clinicians review the records and use clinical judgment to identify patterns and draw inferences about functional relationships among antecedents, behaviors, and consequences. Although the clinical review of self-monitoring records provides data that might not be otherwise obtained, clinical judgment is known to be subject to inaccuracies . . . and statistical prediction is typically more accurate and reliable than clinical judgment. (p. 216)

The shortcomings of clinical judgment for functional analyses were illustrated in a study by O’Brien (1995). In this study, the self-monitoring data for a client who complained of headaches were given to eight clinical psychology graduate students. Over a period of 14 days, the client monitored a number of variables including stress level, arguments, hours of sleep, number of headaches, headache severity, duration of headaches, and number of painkillers taken. The task for the graduate students was to estimate “the magnitude of functional relationships that existed between pairs of target behaviors and controlling factors by generating a subjective correlation” (p. 352). Results were both surprising and disappointing: The graduate students identified the controlling variables that were most strongly correlated with each headache symptom only 51% of the time.

Given the shortcomings of clinical judgment for describing functional relationships, it is important to note that sequential and conditional probability analyses have been used to analyze self-monitoring data. These statistical analyses have been used to clarify the functional relationships involved in a variety of problems including smoking addiction, bulimia, hypertension, and obesity (e.g., Schlundt & Bell, 1987; Shiffman, 1993).
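A basic sequential analysis compares how often a target behavior occurs overall with how often it occurs right after a particular antecedent (a lag-1 conditional probability). The sketch assumes a time-ordered diary in which each entry is the set of codes recorded on that occasion; the coding scheme, the entries, and the helper `lag1_conditional` are hypothetical, and published sequential analyses are more elaborate.

```python
def lag1_conditional(entries, antecedent, behavior):
    """Return (P(behavior), P(behavior at t+1 | antecedent at t)) for a
    time-ordered list of diary entries, each a set of recorded codes."""
    base_rate = sum(behavior in e for e in entries) / len(entries)
    # Pair each entry with the one that follows it; keep pairs starting with the antecedent.
    pairs = [(a, b) for a, b in zip(entries, entries[1:]) if antecedent in a]
    if not pairs:
        return base_rate, float("nan")
    conditional = sum(behavior in b for _, b in pairs) / len(pairs)
    return base_rate, conditional


# Hypothetical eating-diary entries in time order.
diary = [{"stress"}, {"binge"}, set(), {"stress"}, {"binge"},
         set(), {"stress"}, set(), {"binge"}, set()]
base, cond = lag1_conditional(diary, "stress", "binge")
print(f"P(binge) = {base:.2f}, P(binge | stress on previous entry) = {cond:.2f}")
```

When the conditional probability clearly exceeds the base rate, as here (.67 versus .30), the antecedent becomes a candidate controlling variable for the functional analysis.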

In conclusion, there are reasons one can expect statistical prediction rules to be more accurate than automated assessment programs and clinical judges. However, relatively few statistical prediction rules can be recommended for clinical use. Substantial progress has occurred in predicting violence and child abuse and neglect among offenders, and it does seem likely that powerful statistical rules for these tasks will become available for use in clinical practice in the near future (see Wood, Garb, Lilienfeld, & Nezworski, 2002). Also, statistical rules for analyzing functional relationships are impressive. On the other hand, before powerful statistical rules become available for other tasks, such as diagnosis, the description of personality and psychopathology, and treatment planning, methodological barriers will have to be overcome.

Critique of Clinical Judgment

A strength of clinical judgment is that mental health professionals can make use of a wide range of information. Automated assessment programs and present-day statistical prediction rules generally make use of limited information, for example, results from a single psychological test. In contrast, mental health professionals can make judgments after reviewing all of the information that is normally available in clinical practice. As noted earlier, in seven of the eight studies that found clinicians to be substantially more accurate than mechanical prediction rules (Grove et al., 2000), clinicians had more information available than did the mechanical prediction rules.

Mental health professionals can make reliable and valid judgments if they are careful about the information they use, if they avoid making judgments for tasks that are extremely difficult (tasks that are so difficult the clinicians are unable to make reliable and valid judgments), and if they are careful in how they make their judgments (Garb, 1998). For example, they can make reliable and valid diagnoses if they adhere to diagnostic criteria. Similarly, they can make moderately valid predictions of violence.

The focus of this section is on the limitations of clinical judgment. Results from empirical studies reveal that it can be surprisingly difficult for mental health professionals to learn from clinical experience. That is, a large body of research contradicts the popular belief that the more experience clinicians have, the more likely it is that they will be able to make accurate judgments. Numerous studies have demonstrated that when different groups of clinicians are given identical sets of information, experienced clinicians are no more accurate than are less experienced clinicians (Dawes, 1994; Garb, 1989, 1998; Garb & Boyle, in press; Garb & Schramke, 1996; Goldberg, 1968; Wiggins, 1973; also see Meehl, 1997). Remarkably, these results even extend to comparisons of mental health professionals and graduate students in mental health fields. These results, along with results on the value of training, will be described. Afterward, the reasons clinicians have trouble learning from experience will be described.

Experience and Validity

The validity of judgments will be described for presumed expert versus nonexpert clinicians, experienced versus less experienced clinicians, clinicians versus graduate students, and graduate students followed over time. Also described will be research on illusory correlations. Results from all of these studies describe the relations among presumed expertise, experience, and validity.

For the task of interpreting objective and projective personality test results, alleged experts have been no more accurate than other clinicians, and experienced clinicians have been no more accurate than less experienced clinicians (Graham, 1967; Levenberg, 1975; Silverman, 1959; Turner, 1966; Walters, White, & Greene, 1988; Wanderer, 1969; Watson, 1967). In these studies, all of the clinicians were given the same assessment information. For example, in one study (Turner), expert judges were “25 Fellows in the Society for Projective Techniques with at least 10 years of clinical experience with the Rorschach” (p. 5). In this study, different groups of judges were to use Rorschach results to describe the personality functioning of clients. Not only were the presumed expert judges no more accurate than a group of recently graduated psychologists (PhDs) and a group of graduate students in clinical psychology, they were not even more accurate than a group of “25 undergraduate psychology majors who were unfamiliar with the technique” (p. 5). In another study (Graham, 1967), one group of PhD-level psychologists had used the MMPI much more frequently than a less experienced group of psychologists. Also, the experienced group, but not the inexperienced group, demonstrated a broad knowledge of the research literature on the MMPI. In this study, as in the others, judgmental validity was not related to experience and presumed expertise.

The relation between experience and validity has also been investigated among psychiatrists. Results indicate that experience is unrelated to the validity of diagnoses and treatment decisions, at least under some circumstances (Hermann, Ettner, Dorwart, Langman-Dorwart, & Kleinman, 1999; Kendell, 1973; Muller & Davids, 1999). For example, in one study (Muller & Davids, 1999), psychiatrists who described themselves as being experienced in the treatment of schizophrenic patients were no more adept than less experienced psychiatrists when the task was to assess positive and negative symptoms of schizophrenia. In another study (Hermann et al., 1999), the number of years of clinical experience was negatively related to validity. Hermann et al. found that “psychiatrists trained in earlier eras were more likely to use ECT [electroconvulsive therapy] for diagnoses outside evidence-based indications” (p. 1059). In this study, experienced psychiatrists may have made less valid judgments than younger psychiatrists because education regarding the appropriate use of ECT has improved in recent years. If this is true, then the value of having years of clinical experience did not compensate for not having up-to-date training.

Results have been slightly different in the area of neuropsychology. Neuropsychologists with national reputations did better than PhD psychologists when using the Bender-Gestalt Test to diagnose organic brain damage (Goldberg, 1959) and when using the Halstead-Reitan Neuropsychological Test Battery to describe neurological impairment (Wedding, 1983). Otherwise, results in the area of neuropsychology have been similar to results obtained in the areas of personality assessment and diagnosis. For example, neuropsychologists with the American Board of Professional Psychology (ABPP) diploma have generally been no more accurate than less experienced and presumably less qualified doctoral-level neuropsychologists (Faust et al., 1988; Gaudette, 1992; Heaton, Smith, Lehman, & Vogt, 1978; Wedding, 1983).

One of the neuropsychology studies will be described. In this study (Faust et al., 1988), 155 neuropsychologists evaluated results from several commonly used neuropsychological tools (including the Halstead-Reitan Neuropsychological Test Battery). The judgment task was to detect the presence of neurological impairment and describe the likely location, process, and etiology of any neurologic injury that might exist. Clinicians’ levels of training and experience were not related to the validity of their judgments. Measures of training included amount of practicum experience in neuropsychology, number of supervised neuropsychology hours, relevant coursework, specialized neuropsychology internship training, and the completion of postdoctoral training in neuropsychology. Measures of experience included years of practice in neuropsychology and number of career hours spent on issues related to neuropsychology. Status in the ABPP was used as a measure of presumed expertise. The results indicated that there is no meaningful relationship between validity, on the one hand, and training, experience, and presumed expertise, on the other.

An assumption that is often made, usually without our being aware of it, is that clinical and counseling psychologists are more accurate than psychology graduate students. However, with few exceptions, this assumption has not been supported. In empirical studies, psychologists and other types of mental health professionals have rarely been more accurate than graduate students, regardless of the type of information provided to clinicians. This has been true when judgments have been made on the basis of interviews (Anthony, 1968; Grigg, 1958; Schinka & Sines, 1974), case history information (Oskamp, 1965; Soskin, 1954), behavioral observations (Garner & Smith, 1976; E. Walker & Lewine, 1990), recordings of psychotherapy sessions (Brenner & Howard, 1976), MMPI protocols (Chandler, 1970; Danet, 1965; Goldberg, 1965, 1968; Graham, 1967, 1971; Oskamp, 1962; Walters et al., 1988; Whitehead, 1985), human figure drawing protocols (Levenberg, 1975; Schaeffer, 1964; Stricker, 1967), Rorschach protocols (Gadol, 1969; Turner, 1966; Whitehead, 1985), screening measures for detecting neurological impairment (Goldberg, 1959; Leli & Filskov, 1981, 1984; Robiner, 1978), and all of the information that clinical and counseling psychologists normally have available in clinical practice (Johnston & McNeal, 1967).

Although mental health professionals have rarely been more accurate than graduate students, two exceptions can be described. In both instances, the graduate students were just beginning their training. In the first study (Grebstein, 1963; reanalyzed by Hammond, Hursch, & Todd, 1964), the task was to use Rorschach results to estimate IQ. Clinical psychologists were more accurate than graduate students who had not yet had practicum training, although they were not more accurate than advanced graduate students. In a second study (Falvey & Hebert, 1992), the task was to write treatment plans after reading case histories. Certified clinical mental health counselors wrote better treatment plans than graduate students in master’s degree programs, but half of the graduate students had not yet completed a single class related to diagnosis or treatment planning.

Although mental health professionals were sometimes more accurate than beginning graduate students, this was not always the case. In one study (Whitehead, 1985), psychologists, first-year clinical psychology graduate students, and fully trained clinical psychology graduate students were instructed to make differential diagnoses on the basis of Rorschach or MMPI results. For example, one task they were given was to differentiate patients with schizophrenia from those with bipolar disorder. The first-year graduate students had received training in the use of the MMPI, but they had not yet received training in the use of the Rorschach. For this reason, the only Rorschach data given to beginning graduate students were transcripts of the Rorschach sessions. In contrast, the Rorschach data given to psychologists and fully trained graduate students included transcripts, response location sheets, and Rorschach scores (using the Comprehensive System Structural Summary; Exner, 1974). In general, all three groups of judges were able to make valid judgments (accuracy was better than chance), although they were significantly less accurate when the Rorschach was used as the sole source of data. A repeated measures analysis of variance indicated that accuracy did not differ across the three groups of judges for either the Rorschach data or the MMPI data.

To learn about the relation between experience and validity, one can conduct a longitudinal study. In one study (Aronson & Akamatsu, 1981), 12 graduate students made judgments using the MMPI before and after they completed a year-long assessment and therapy practicum. All of the students had already completed a course on MMPI interpretation. To determine validity, graduate students’ judgments were compared with criterion ratings made on the basis of patient and family interviews. Results revealed that validity increased from .42 to only .44 after graduate students completed their practicum. The practicum experience did not serve to improve accuracy significantly.

Studies on illusory correlations (Chapman & Chapman, 1967, 1969; Dowling & Graham, 1976; Golding & Rorer, 1972; Kurtz & Garfield, 1978; Lueger & Petzel, 1979; Mowrey, Doherty, & Keeley, 1979; Rosen, 1975, 1976; Starr & Katkin, 1969; R. W. Waller & Keeley, 1978) also demonstrate that it can be difficult for clinicians to learn from clinical experience (for a review, see Garb, 1998, pp. 23–25). An illusory correlation occurs when a person believes that events are correlated even though they really are not.

In a classic study that established the paradigm for studying illusory correlations, Chapman and Chapman (1967) hoped to learn why psychologists use the sign approach to interpret the draw-a-person test despite research that reflects negatively on its validity (Groth-Marnat & Roberts, 1998; Joiner & Schmidt, 1997; Kahill, 1984; Lilienfeld, Wood, & Garb, 2000, 2001; Motta, Little, & Tobin, 1993; Swensen, 1957; Thomas & Jolley, 1998). The sign approach involves interpreting a single feature of a drawing (e.g., size of figure, unusual eyes). It can be contrasted with the global approach, in which a number of indicators are summed to yield a total score. The global approach has a stronger psychometric foundation than the sign approach (e.g., Naglieri, McNeish, & Bardos, 1991).

In their study, Chapman and Chapman (1967) instructed psychologists to list features of drawings (signs) that are associated with particular symptoms and traits. They then presented human figure drawings to undergraduates. On the back of each drawing was a statement that described a trait or symptom that was said to be descriptive of the client who had drawn the picture. Undergraduates were to examine each drawing and then read the statement on the back. Afterwards, they were to describe signs that were associated with the traits and symptoms. The undergraduates were unaware that the experimenters had randomly paired the drawings and the statements on the back of the drawings. Remarkably, the undergraduates reported observing the same relations that had been reported by the clinicians.
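The logic of the random-pairing design can be illustrated with a small simulation. The sketch below is hypothetical and is not a reconstruction of the original materials: it assumes each drawing either shows a sign (e.g., unusual eyes) or does not, and that the statement on the back is assigned independently of the drawing, as random pairing implies. Under that assumption the true association between sign and trait, measured here with the phi coefficient for a 2 × 2 table, should be close to zero, so any relation the undergraduates reported could not have come from the materials themselves. The function names and probabilities are illustrative only.

```python
import random


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi coefficient for a 2 x 2 table of counts:

                     trait present   trait absent
    sign present           a              b
    sign absent            c              d
    """
    denom = ((a + b) * (c + d) * (a + c) * (b + d)) ** 0.5
    return (a * d - b * c) / denom if denom else 0.0


def random_pairing_phi(n_cases: int, p_sign: float, p_trait: float, seed: int = 0) -> float:
    """Hypothetical simulation of randomly paired drawings and statements.

    Each drawing shows the sign with probability p_sign; the statement on the
    back describes the trait with probability p_trait, drawn independently of
    the drawing, so the true sign-trait association is zero by construction.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    a = b = c = d = 0
    for _ in range(n_cases):
        sign = rng.random() < p_sign
        trait = rng.random() < p_trait  # independent of `sign`
        if sign and trait:
            a += 1
        elif sign:
            b += 1
        elif trait:
            c += 1
        else:
            d += 1
    return phi_coefficient(a, b, c, d)


if __name__ == "__main__":
    # With many cases the observed phi stays near zero.
    print(round(random_pairing_phi(10_000, p_sign=0.3, p_trait=0.4), 3))
```

Because the pairing is random, any sizable sign-trait relation a judge reports reflects what the judge brings to the task (for example, verbal associations), not covariation in the materials.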

The results of the Chapman and Chapman (1967) study indicate that clinicians respond to the verbal associations of human figure drawings. For example, both clinicians and undergraduates reported that there is a positive relation between unusually drawn eyes and watchfulness or suspiciousness.

The results from the Chapman and Chapman study help to explain why clinicians continue to interpret specific drawing signs even though the overwhelming majority of human figure drawing signs possess negligible or zero validity. Psychologists believe they have observed these relations in their clinical experience, even when they have not. Along with results from other studies on illusory correlation, the results from the Chapman and Chapman study show that clinicians can have a difficult time learning from experience.

Unanswered questions remain. Do psychologists who interpret projective drawings know the research literature on the validity of specific drawing signs? Would they stop making invalid interpretations if they became aware of negative findings or would they weigh their clinical experiences more heavily than the research findings? Research on experience and validity is important because it helps us understand the problems that can occur when psychologists ignore research findings and are guided only by their clinical experiences.

Training and Validity

Empirical results support the value of training. In some, but not all, studies, clinicians and graduate students were more accurate than lay judges. In other studies, mental health professionals with specialized training were more accurate than health professionals without specialized training.

When the task was to describe psychopathology using interview data, psychologists and graduate students outperformed undergraduate students (Grigg, 1958; Waxer, 1976; also see Brammer, 2002). However, for a similar task, they did not outperform physical scientists (Luft, 1950). Additional research needs to be done to clarify whether psychologists and graduate students did better than undergraduates because of the training they received or because they are more intelligent and mature.

When asked to describe psychopathology on the basis of case history data, clinicians outperformed lay judges when judgments were made for psychiatric patients (Horowitz, 1962; Lambert & Wertheimer, 1988; Stelmachers & McHugh, 1964; also see Holmes & Howard, 1980), but not when judgments were made for normal participants (Griswold & Dana, 1970; Oskamp, 1965; Weiss, 1963). Of course, clinicians rarely make judgments for individuals who are not receiving treatment. As a consequence, clinicians may incorrectly describe normals as having psychopathology because they are not used to working with them.

In other studies, judgments were made on the basis of psychological test results. Psychologists were not more accurate than lay judges (e.g., undergraduates) when they were given results from projective techniques, such as Rorschach protocols (Cressen, 1975; Gadol, 1969; Hiler & Nesvig, 1965; Levenberg, 1975; Schaeffer, 1964; Schmidt & McGowan, 1959; Todd, 1954, cited in Hammond, 1955; C. D. Walker & Linden, 1967). Nor were they more accurate than lay judges when the task was to detect brain impairment using screening instruments (Goldberg, 1959; Leli & Filskov, 1981, 1984; Nadler, Fink, Shontz, & Brink, 1959; Robiner, 1978). For example, in a study on the Bender-Gestalt Test (Goldberg, 1959) that was later replicated (Robiner, 1978), clinical psychologists were no more accurate than their own secretaries! Finally, positive results have been obtained for the MMPI. In several studies on the use of the MMPI, psychologists and graduate students were more accurate than lay judges (Aronson & Akamatsu, 1981; Goldberg & Rorer, 1965, and Rorer & Slovic, 1966, described in Goldberg, 1968; Karson & Freud, 1956; Oskamp, 1962). For example, in a study that was cited earlier, Aronson and Akamatsu (1981) compared the ability of graduate and undergraduate students to perform Q-sorts to describe the personality characteristics of psychiatric patients on the basis of MMPI protocols. Graduate students had completed coursework on the MMPI and had some experience interpreting the instrument. Undergraduates had attended two lectures on the MMPI. Validity was determined by using criterion ratings based on family and patient interviews. Validity coefficients were .44 and .24 for graduate and undergraduate students, respectively. Graduate students were significantly more accurate than undergraduates.
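The text does not spell out how the Q-sort validity coefficients were computed. A common approach in Q-sort research, assumed here, is to correlate a judge's placement of each descriptive item with the criterion placement of the same item. The minimal sketch below uses entirely hypothetical data: a 10-item sort on a 1 (least descriptive) to 5 (most descriptive) scale in which both sorts follow the same forced distribution. The function name `pearson_r` is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of item placements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical placements for the same 10 descriptive items (not study data).
criterion = [5, 4, 4, 3, 3, 3, 2, 2, 1, 1]  # from interview-based criterion ratings
judge = [4, 5, 3, 3, 2, 4, 2, 1, 1, 3]      # from one judge reading an MMPI protocol

print(round(pearson_r(judge, criterion), 2))  # prints 0.68
```

On this reading, a validity coefficient of .44 versus .24 means that graduate students' descriptions agreed more closely with the criterion descriptions than undergraduates' did, though neither group's agreement was close to perfect.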

The value of specialized training in mental health has also been supported. For example, neuropsychologists are more accurate than clinical psychologists at detecting neurological impairment (e.g., S. G. Goldstein, Deysach, & Kleinknecht, 1973), psychologists with a background in forensic psychology are more accurate than other psychologists when the task is to detect lying (Ekman, O’Sullivan, & Frank, 1999), and psychiatrists make more appropriate decisions than other physicians when prescribing antidepressant medicine (e.g., making sure a patient is on a therapeutic dose; Fairman, Drevets, Kreisman, & Teitelbaum, 1998).

Impediments to Learning From Experience

It is important to understand why it can be difficult for mental health professionals to learn from experience. Invalid assessment information, fallible cognitive processes, and inadequate feedback are some of the factors that can lead to poor judgments and a failure to learn from experience (Arkes, 1981; Brehmer, 1980; Dawes, 1994; Dawes et al., 1989; Einhorn, 1988; Garb, 1998).

Assessment Information

It will be difficult for clinicians to learn from experience if they are using invalid, or marginally valid, information. This point was made by Trull and Phares (2001):

The accuracy of predictions is limited by the available measures and methods that are used as aids in the prediction process. If scores from psychological tests, for example, are not strongly correlated with the criterion of interest (that is, highly valid), then it is unlikely one could ever observe an effect for clinical experience. The accuracy of predictions will remain modest at best and will not depend on how “clinically experienced” the clinician is. (p. 277)

Bearing this in mind, one should be aware that some psychological techniques are controversial, at least when they are used for some tasks. For example, there is a controversy surrounding the use of the Rorschach (Lilienfeld et al., 2000, 2001). One problem is that the norms of the Rorschach Comprehensive System (Exner, 1993) may be inaccurate and may tend to make individuals look pathological even when no pathology exists. This issue has been hotly contested (Aronow, 2001; Exner, 2001; Hunsley & Di Giulio, 2001; Meyer, 2001; Widiger, 2001; Wood, Nezworski, Garb, & Lilienfeld, 2001a, 2001b).

Cognitive Processes

Cognitive biases, cognitive heuristics, and memory processes can exert a major negative impact on judgment and decision-making strategies. Cognitive biases are preconceptions or beliefs that can negatively influence clinical judgment. Cognitive heuristics are simple rules that describe how clinicians, and other people, make judgments and treatment decisions. Reliance on cognitive heuristics can be efficient because they are simple and they allow us to make judgments and decisions quickly and with little effort, but they are fallible and can lead clinicians to fail to learn from their experiences. With regard to memory, it should be obvious that clinicians will not learn from their experiences when their memories of those experiences are incorrect.

Several cognitive biases and heuristics will be described. Confirmatory bias occurs when clinicians seek, attend to, and remember information that can support but not counter their hunches or hypotheses. When psychologists ask questions that can confirm but not refute their impressions of a client, they are unlikely to make good judgments and decisions and they are unlikely to learn from their experiences. Similarly, psychologists are unlikely to learn from experience if their memories are distorted to support their preconceptions. Empirical research indicates that confirmatory bias does occur when psychologists work with clients (Haverkamp, 1993; Lee, Barak, Uhlemann, & Patsula, 1995; Murdock, 1988; Strohmer, Shivy, & Chiodo, 1990).

Hindsight bias describes how individuals, including mental health professionals, generate explanations for events that have occurred. Psychologists are generally unaware that knowledge of an outcome influences the perceived likelihood of that outcome (Fischhoff, 1975). In other words, after an event has occurred, people are likely to believe that the event was bound to occur. Results on hindsight bias have been replicated across a range of judgment tasks (Hawkins & Hastie, 1990), including the diagnosis of neurological impairment (Arkes, Faust, Guilmette, & Hart, 1988). Hindsight bias is important for understanding why mental health professionals have difficulty learning from clinical experience because it suggests that they think in deterministic (not probabilistic) terms. As observed by Einhorn (1988):

The clinical approach to diagnosis and prediction can be characterized by its strong reliance on attempting to explain all the data. Indeed, a significant feature of diagnostic thinking is the remarkable speed and fluency that people have for generating explanations to explain any result. For example, “discussion sections” in journal articles are rarely at a loss to explain why the results did not come out as predicted (cf. Slovic & Fischhoff, 1977); psychotherapists are quick to point out that a patient’s suicide should have been anticipated; and commissions, panels, committees, and the like, place blame on administrators for not knowing what is “obvious” in hindsight. As Fischhoff (1975) has pointed out, the past has few surprises but the future has many. (p. 63)

Mental health professionals will have trouble learning from experience if they do not recognize that all assessment information is fallible and that we frequently cannot make predictions with a high degree of certainty. That is, they will believe they have learned many things from a case when they have not. In conclusion, the cognitive processes described by the hindsight bias can lead clinicians to the erroneous belief that a particular combination of symptoms or behaviors is almost invariably associated with a particular outcome.

With regard to cognitive heuristics, the heuristic that is most relevant to understanding why clinicians can have a difficult time learning from experience is the availability heuristic (Kahneman, Slovic, & Tversky, 1982). This heuristic describes how selective memory can lead to judgmental error. Mental health professionals typically recall only selected information about a case because it is difficult, or even impossible, to remember all the details about a client. If their memories of a case are inadequate, they will have trouble learning from the case. According to the availability heuristic, the strength of a memory is related to the vividness of information and the strength of verbal associative connections between events. For example, a mental health professional is likely to remember a client who is striking or unusual in some way. Similarly, when trying to remember if a test indicator and a symptom or behavior co-occurred, a mental health professional may be influenced by the verbal associative connections between the test indicator and the symptom or behavior.

Finally, a large body of research on covariation misestimation suggests that mental health professionals are more likely to remember instances in which a test indicator and symptom are present than those in which a test indicator is absent and a symptom is either present or absent (Arkes, 1981; Kayne & Alloy, 1988). To learn whether a test indicator can be used to describe a symptom, one has to remember instances when the test indicator is absent as well as instances when it is present. Of course, an illusory correlation is said to be present when clinicians cannot accurately determine how two events covary. Thus, in the Chapman and Chapman (1967) study on illusory correlation, when undergraduates mistakenly remembered that there is a positive relation between unusually drawn eyes and watchfulness or suspiciousness, they may have been remembering cases when clients drew unusual eyes but forgetting cases when this drawing characteristic was not present. To be more specific, if a significant proportion of clients who draw unusual eyes are watchful or suspicious, then clinicians may believe this is a valid indicator. However, if a significant proportion of clients who do not draw unusual eyes are also watchful or suspicious, then it would be inappropriate to conclude that unusual eyes is a valid indicator. Thus, covariation misestimation, in addition to verbal associative connections (as mentioned by Chapman & Chapman), may in part explain the occurrence of illusory correlation phenomena.
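The comparison described above can be written as a difference between two conditional proportions, sometimes called ΔP: the proportion of clients who have the symptom among those who show the test indicator, minus the proportion who have the symptom among those who do not. The sketch below uses hypothetical counts, not data from any study, and the function name is illustrative. A clinician who remembers only the 30 suspicious clients who drew unusual eyes sees many confirming cases, yet in this example suspiciousness is just as common when the sign is absent, so the indicator carries no information.

```python
def conditional_rates(a: int, b: int, c: int, d: int):
    """Conditional proportions for a 2 x 2 table of hypothetical client counts:

                             suspicious   not suspicious
    unusual eyes drawn           a              b
    unusual eyes not drawn       c              d

    Returns P(suspicious | sign present), P(suspicious | sign absent),
    and their difference (delta P).
    """
    p_given_sign = a / (a + b)
    p_given_no_sign = c / (c + d)
    return p_given_sign, p_given_no_sign, p_given_sign - p_given_no_sign


# Hypothetical counts chosen so that the sign is uninformative.
present, absent, delta = conditional_rates(a=30, b=20, c=90, d=60)
print(f"P(suspicious | unusual eyes)    = {present:.2f}")  # 0.60
print(f"P(suspicious | no unusual eyes) = {absent:.2f}")   # 0.60
print(f"delta P                         = {delta:.2f}")    # 0.00
```

Judging validity requires all four cells; attending only to cell `a` (sign and symptom both present) is exactly the pattern of recall that the covariation research describes.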

One other theory about memory and clinical judgment will be mentioned. The act of making a diagnosis can influence how a mental health professional remembers a client’s symptoms (Arkes & Harkness, 1980). According to this theory, a mental health professional may forget that a client has a particular symptom because the symptom is not typical of the symptoms associated with the client’s diagnosis. Similarly, a symptom that is typical of the diagnosis may be “recalled,” even though the client may not have that symptom. Of course, it is difficult to learn from experience when the details of cases are remembered incorrectly.

Environmental Factors

Mental health professionals learn from experience when they receive unbiased feedback, but the benefits of feedback are likely to be setting specific. In several studies (Goldberg & Rorer, 1965 and Rorer & Slovic, 1966, cited in Goldberg, 1968; Graham, 1971), psychologists made diagnoses using MMPI profiles. They became more accurate when they were told whether their diagnoses were valid or invalid, but only when all of the MMPI protocols came from the same setting.

Unfortunately, mental health professionals typically do not receive accurate feedback on whether their judgments and decisions are valid. For example, after making a diagnosis, no one comes along and tells them whether the diagnosis is correct or incorrect. They sometimes receive helpful feedback from a client, but client feedback is subjective and can be misleading. In contrast, when physicians make judgments, they frequently receive accurate feedback from laboratory results, radiology studies, and, in some cases, autopsies. In most cases, for mental health professionals to determine the accuracy of a judgment or decision, longitudinal or outcome data would have to be collected. Longitudinal and outcome data are collected in empirical studies, but most clinicians find these data too expensive and time consuming to collect in clinical practice.

Client feedback can be misleading for several reasons. First, clients may be reluctant to dispute their therapists’ hypotheses. This can occur if clients are passive, suggestible, fearful of authority, or motivated to be pleasing. Second, clients may be unable to give accurate feedback because they may not be able to describe all of their traits and symptoms accurately. Even their reports of whether they have improved will be subjective and will be influenced by how they feel when they are asked. Finally, mental health professionals may describe clients in general terms. Their descriptions may be true of clients in general and may not describe traits that are specific to a client (e.g., “You have a superb sense of humor” and “You have too strong a need for others to admire you”— from Logue, Sher, & Frensch, 1992, p. 228). This phenomenon has been labeled the Barnum effect, after the circus figure P. T. Barnum (Meehl, 1954). Occurrence of the Barnum effect will be misleading to clinicians if they believe their judgments and decisions are valid for a specific client and not for clients in general.

Client feedback will also be misleading if clinicians make incorrect interpretations but convince their clients that they are correct. For example, after being told by their therapists that they were abused, some clients falsely remember having been abused (Loftus, 1993; Ofshe & Watters, 1994). These therapists have used a variety of techniques to help clients believe they remember having been abused, including telling them that they were abused, repeatedly asking them to remember the events, interpreting their dreams, hypnotizing them, and referring them to incest-survivor groups. Of course, clinicians will have a hard time learning from experience if they convince clients to accept incorrect interpretations and judgments.

Summary and Discussion

It was not possible to cover all areas of research on clinical judgment and mechanical prediction in this research paper. Most notably, little was said about the validity of judgments made by mental health professionals (e.g., the reliability of diagnoses, the validity of descriptions of personality). An entire book on these topics has been written (Garb, 1998). However, conclusions from key areas of research were described. First, many automated assessment programs for interpreting psychological test results are not validated. Second, although there are reasons to believe that statistical prediction rules will transform psychological assessment, present-day rules are of limited value. Finally, the value of training in psychology and other mental health fields is supported, but research illustrates the difficulty of learning from clinical experience. These last results highlight the importance of continuing education, although continuing education may be of limited value unless it capitalizes on the findings of empirical research.

It is likely that clinical experience is valuable under certain circumstances. Experienced mental health professionals may be more adept at structuring judgment tasks (Brammer, 2002). In virtually all of the studies that have been done, the tasks were already structured for clinicians: They were told what judgments to make and they were given the information to use. However, in clinical practice, supervision can be helpful because questions are raised about what judgments and decisions need to be made (Do you think she is suicidal? Has the client ever had a manic episode?). Similarly, supervision can be helpful because supervisors provide direction on what information should be collected. Just the same, although experience may be helpful under certain circumstances, it does not seem to be useful for helping clinicians evaluate the validity of an assessment instrument. Nor does it seem to help clinicians make more valid judgments than graduate students when those judgments are made for a structured task.

A number of recommendations can be made for improving the way that judgments and decisions are made. The recommendations are made for both practicing clinicians and research investigators. First, mental health professionals should not use automated assessment programs to interpret test results unless they are appropriately validated. Second, as discussed earlier, new methods for building and validating statistical prediction rules need to be utilized. Data need to be collected for judgment tasks that have not yet been studied. Also, new analyses, including neural network models and multivariate taxometric analyses, should be used to build statistical rules (Marshall & English, 2000; Price, Spitznagel, Downey, Meyer, & Risk, 2000; N. G. Waller & Meehl, 1998). Third, mental health professionals need to become familiar with the research literature on clinical judgment. By becoming familiar with the results of studies on the validity of judgments made by mental health professionals, they can avoid making judgments for tasks that are surprisingly difficult and for which they are unlikely to be accurate. Fourth, clinicians should rely more on their notes and less on their memories. Fifth, to decrease confirmatory bias, clinicians should consider alternative hypotheses when making judgments and decisions. Sixth, when deciding whether to use an assessment instrument or treatment method, clinicians should weigh empirical findings more heavily than clinical experiences. That is, they should not use an assessment instrument or treatment method simply because it seems to work. In conclusion, to improve clinical practice dramatically, powerful statistical prediction rules need to be constructed and clinicians need to place less emphasis on their clinical experiences and greater emphasis on scientific findings.

Bibliography:

Adams, K. M., & Heaton, R. K. (1985). Automated interpretation of neuropsychological test data. Journal of Consulting and Clinical Psychology, 53, 790–802.
Ambady, N., & Rosenthal, R. (1992). Thin slices of expressive behavior as predictors of interpersonal consequences: A meta-analysis. Psychological Bulletin, 111, 256–274.
Anthony, N. (1968). The use of facts and cues in clinical judgments from interviews. Journal of Clinical Psychology, 24, 37–39.
Arkes, H. R. (1981). Impediments to accurate clinical judgment and possible ways to minimize their impact. Journal of Consulting and Clinical Psychology, 49, 323–330.
Arkes, H. R., Faust, D., Guilmette, T. J., & Hart, K. (1988). Eliminating the hindsight bias. Journal of Applied Psychology, 73, 305–307.
Arkes, H. R., & Harkness, A. R. (1980). Effect of making a diagnosis on subsequent recognition of symptoms. Journal of Experimental Psychology: Human Learning and Memory, 6, 568–575.
Aronow, E. (2001). CS norms, psychometrics, and possibilities for the Rorschach technique. Clinical Psychology: Science and Practice, 8, 383–385.
Aronson, D. E., & Akamatsu, T. J. (1981). Validation of a Q-sort task to assess MMPI skills. Journal of Clinical Psychology, 37, 831–836.
Brammer, R. (2002). Effects of experience and training on diagnostic accuracy. Psychological Assessment, 14, 110–113.
Brehmer, B. (1980). In one word: Not from experience. Acta Psychologica, 45, 223–241.
Brenner, D., & Howard, K. I. (1976). Clinical judgment as a function of experience and information. Journal of Clinical Psychology, 32, 721–728.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). Minnesota Multiphasic Personality Inventory-2 (MMPI-2): Manual for administration and scoring. Minneapolis: University of Minnesota Press.
Butcher, J. N., Perry, J. N., & Atlis, M. M. (2000). Validity and utility of computer-based test interpretation. Psychological Assessment, 12, 6–18.
Carlin, A. S., & Hewitt, P. L. (1990). The discrimination of patient generated and randomly generated MMPIs. Journal of Personality Assessment, 54, 24–29.
Chandler, M. J. (1970). Self-awareness and its relation to other parameters of the clinical inference process. Journal of Consulting and Clinical Psychology, 35, 258–264.
Chapman, L. J., & Chapman, J. P. (1967). Genesis of popular but erroneous psychodiagnostic observations. Journal of Abnormal Psychology, 72, 193–204.
Chapman, L. J., & Chapman, J. P. (1969). Illusory correlation as an obstacle to the use of valid psychodiagnostic signs. Journal of Abnormal Psychology, 74, 271–280.
Cressen, R. (1975). Artistic quality of drawings and judges’ evaluations of the DAP. Journal of Personality Assessment, 39, 132–137.
Danet, B. N. (1965). Prediction of mental illness in college students on the basis of “nonpsychiatric” MMPI profiles. Journal of Consulting Psychology, 29, 577–580.
Dawes, R. M. (1994). House of cards: Psychology and psychotherapy built on myth. New York: Free Press.
Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668–1674.
Dawes, R. M., Faust, D., & Meehl, P. E. (1993). Statistical prediction versus clinical prediction: Improving what works. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 351–367). Hillsdale, NJ: Erlbaum.
Dowling, J. F., & Graham, J. R. (1976). Illusory correlation and the MMPI. Journal of Personality Assessment, 40, 531–538.
Einhorn, H. J. (1988). Diagnosis and causality in clinical and statistical prediction. In D. C. Turk & P. Salovey (Eds.), Reasoning, inference, and judgment in clinical psychology (pp. 51–70). New York: Free Press.
Ekman, P., O’Sullivan, M., & Frank, M. G. (1999). A few can catch a liar. Psychological Science, 10, 263–266.
Exner, J. E., Jr. (1974). The Rorschach: A comprehensive system (Vol. 1). New York: Wiley.
Exner, J. E., Jr. (1993). The Rorschach: A comprehensive system, Vol. 1: Basic foundations (3rd ed.). New York: Wiley.
Exner, J. E. (2001). A comment on The misperception of psychopathology: Problems with the norms of the Comprehensive System for the Rorschach. Clinical Psychology: Science and Practice, 8, 386–388.
Fairman, K. A., Drevets, W. C., Kreisman, J. J., & Teitelbaum, F. (1998). Course of antidepressant treatment, drug type, and prescriber’s specialty. Psychiatric Services, 49, 1180–1186.
Falvey, J. E., & Hebert, D. J. (1992). Psychometric study of the Clinical Treatment Planning Simulations (CTPS) for assessing clinical judgment. Journal of Mental Health Counseling, 14, 490–507.
Faust, D., Guilmette, T. J., Hart, K., Arkes, H. R., Fishburne, F. J., & Davey, L. (1988). Neuropsychologists’ training, experience, and judgment accuracy. Archives of Clinical Neuropsychology, 3, 145–163.
Fischhoff, B. (1975). Hindsight ≠ foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental Psychology: Human Perception and Performance, 1, 288–299.
Gadol, I. (1969). The incremental and predictive validity of the Rorschach test in personality assessments of normal, neurotic, and psychotic subjects. Dissertation Abstracts, 29, 3482-B. (UMI No. 69-4469)
Garb, H. N. (1989). Clinical judgment, clinical training, and professional experience. Psychological Bulletin, 105, 387–396.
Garb, H. N. (1994). Toward a second generation of statistical prediction rules in psychodiagnosis and personality assessment. Computers in Human Behavior, 10, 377–394.
Garb, H. N. (1997). Race bias, social class bias, and gender bias in clinical judgment. Clinical Psychology: Science and Practice, 4, 99–120.
Garb, H. N. (1998). Studying the clinician: Judgment research and psychological assessment. Washington, DC: American Psychological Association.
Garb, H. N. (2000a). Computers will become increasingly important for psychological assessment: Not that there’s anything wrong with that! Psychological Assessment, 12, 31–39.
Garb, H. N. (2000b). Introduction to the Special Section on the use of computers for making judgments and decisions. Psychological Assessment, 12, 3–5.
Garb, H. N., & Boyle, P. (in press). Understanding why some clinicians use pseudoscientific methods: Findings from research on clinical judgment. In S. O. Lilienfeld, J. M. Lohr, & S. J. Lynn (Eds.), Science and pseudoscience in contemporary clinical psychology. New York: Guilford Press.
Garb, H. N., & Schramke, C. J. (1996). Judgment research and neuropsychological assessment: A narrative review and meta-analyses. Psychological Bulletin, 120, 140–153.
Gardner, W., Lidz, C. W., Mulvey, E. P., & Shaw, E. C. (1996). Clinical versus actuarial predictions of violence by patients with mental illnesses. Journal of Consulting and Clinical Psychology, 64, 602–609.
Garner, A. M., & Smith, G. M. (1976). An experimental videotape technique for evaluating trainee approaches to clinical judging. Journal of Consulting and Clinical Psychology, 44, 945–950.
Gaudette, M. D. (1992). Clinical decision making in neuropsychology: Bootstrapping the neuropsychologist utilizing Brunswik’s lens model (Doctoral dissertation, Indiana University of Pennsylvania, 1992). Dissertation Abstracts International, 53,
Goldberg, L. R. (1959). The effectiveness of clinicians’ judgments: The diagnosis of organic brain damage from the Bender-Gestalt test. Journal of Consulting Psychology, 23, 25–33.
Goldberg, L. R. (1965). Diagnosticians versus diagnostic signs: The diagnosis of psychosis versus neurosis from the MMPI. Psychological Monographs, 79(9, Whole No. 602).
Goldberg, L. R. (1968). Simple models or simple processes? Some research on clinical judgments. American Psychologist, 23, 483–496.
Goldberg, L. R. (1969). The search for configural relationships in personality assessment: The diagnosis of psychosis versus neurosis from the MMPI. Multivariate Behavioral Research, 4, 523–536.
Goldberg, L. R. (1970). Man versus model of man. Psychological Bulletin, 73, 422–432.
Goldberg, L. R. (1991). Human mind versus regression equation: Five contrasts. In W. M. Grove & D. Cicchetti (Eds.), Thinking clearly about psychology: Vol. 1. Matters of public interest (pp. 173–184). Minneapolis: University of Minnesota Press.
Golding, S. L., & Rorer, L. G. (1972). Illusory correlation and subjective judgment. Journal of Abnormal Psychology, 80, 249–260.
Goldstein, R. B., Black, D. W., Nasrallah, M. A., & Winokur, G. (1991). The prediction of suicide. Archives of General Psychiatry, 48, 418–422.
Goldstein, S. G., Deysach, R. E., & Kleinknecht, R. A. (1973). Effect of experience and amount of information on identification of cerebral impairment. Journal of Consulting and Clinical Psychology, 41, 30–34.
Graham, J. R. (1967). A Q-sort study of the accuracy of clinical descriptions based on the MMPI. Journal of Psychiatric Research, 5, 297–305.
Graham, J. R. (1971). Feedback and accuracy of clinical judgments from the MMPI. Journal of Consulting and Clinical Psychology, 36, 286–291.
Graham, J. R. (2000). MMPI-2: Assessing personality and psychopathology (3rd ed.). New York: Oxford University Press.
Grebstein, L. (1963). Relative accuracy of actuarial prediction, experienced clinicians, and graduate students in a clinical judgment task. Journal of Consulting Psychology, 27, 127–132.
Grigg, A. E. (1958). Experience of clinicians, and speech characteristics and statements of clients as variables in clinical judgment. Journal of Consulting Psychology, 22, 315–319.
Griswold, P. M., & Dana, R. H. (1970). Feedback and experience effects on psychological reports and predictions of behavior. Journal of Clinical Psychology, 26, 439–442.
Groth-Marnat, G., & Roberts, L. (1998). Human Figure Drawings and House Tree Person drawings as indicators of self-esteem: A quantitative approach. Journal of Clinical Psychology, 54, 219–222.
Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical-statistical controversy. Psychology, Public Policy, and Law, 2, 293–323.
Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12, 19–30.
Hammond, K. R. (1955). Probabilistic functioning and the clinical method. Psychological Review, 62, 255–262.
Hammond, K. R., Hursch, C. J., & Todd, F. J. (1964). Analyzing the components of clinical inference. Psychological Review, 71, 438–456.
Hathaway, S. R., & McKinley, J. C. (1942). The Minnesota Multiphasic Personality Inventory. Minneapolis: University of Minnesota Press.
Haverkamp, B. E. (1993). Confirmatory bias in hypothesis testing for client-identified and counselor self-generated hypotheses. Journal of Counseling Psychology, 40, 303–315.
Hawkins, S. A., & Hastie, R. (1990). Hindsight: Biased judgments of past events after the outcomes are known. Psychological Bulletin, 107, 311–327.
Heaton, R. K., Smith, H. H., Jr., Lehman, R. A. W., & Vogt, A. T. (1978). Prospects for faking believable deficits on neuropsychological testing. Journal of Consulting and Clinical Psychology, 46, 892–900.
Hermann, R. C., Ettner, S. L., Dorwart, R. A., Langman-Dorwart, , & Kleinman, S. (1999). Diagnoses of patients treated with ECT: A comparison of evidence-based standards with reported use. Psychiatric Services, 50, 1059–1065.
Hiler, E. W., & Nesvig, D. (1965). An evaluation of criteria used by clinicians to infer pathology from figure drawings. Journal of Consulting Psychology, 29, 520–529.
Holmes, C. B., & Howard, M. E. (1980). Recognition of suicide lethality factors by physicians, mental health professionals, ministers, and college students. Journal of Consulting and Clinical Psychology, 48, 383–387.
Honaker, L. M., & Fowler, R. D. (1990). Computer-assisted psychological assessment. In G. Goldstein & M. Hersen (Eds.), Handbook of psychological assessment (2nd ed., pp. 521–546). New York: Pergamon Press.
Horowitz, M. J. (1962). A study of clinicians’ judgments from projective test protocols. Journal of Consulting Psychology, 26, 251–256.
Hunsley, J., & Di Giulio, G. (2001). Norms, norming, and clinical assessment. Clinical Psychology: Science and Practice, 8, 378–382.
Janzen, W. B., & Coe, W. C. (1975). Clinical and sign prediction: The Draw-A-Person and female homosexuality. Journal of Clinical Psychology, 31, 757–765.
Johnston, R., & McNeal, B. F. (1967). Statistical versus clinical prediction: Length of neuropsychiatric hospital stay. Journal of Abnormal Psychology, 72, 335–340.
Joiner, T. E., & Schmidt, K. L. (1997). Drawing conclusions–or not–from drawings. Journal of Personality Assessment, 69, 476–481.
Kahill, S. (1984). Human figure drawing in adults: An update of the empirical evidence, 1967–1982. Canadian Psychology, 25, 269–292.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment under uncertainty: Heuristics and biases. New York: Cambridge University Press.
Karson, S., & Freud, S. L. (1956). Predicting diagnoses with the MMPI. Journal of Clinical Psychology, 12, 376–379.
Kayne, N. T., & Alloy, L. B. (1988). Clinician and patient as aberrant actuaries: Expectation-based distortions in assessment of covariation. In L. Y. Abramson (Ed.), Social cognition and clinical psychology: A synthesis (pp. 295–365). New York: Guilford Press.
Kendell, R. E. (1973). Psychiatric diagnoses: A study of how they are made. British Journal of Psychiatry, 122, 437–445.
Kleinmuntz, B. (1967). Sign and seer: Another example. Journal of Abnormal Psychology, 72, 163–165.
Kleinmuntz, B. (1990). Why we still use our heads instead of formulas: Toward an integrative approach. Psychological Bulletin, 107, 296–310.
Kurtz, R. M., & Garfield, S. L. (1978). Illusory correlation: A further exploration of Chapman’s paradigm. Journal of Consulting and Clinical Psychology, 46, 1009–1015.
Lambert, L. E., & Wertheimer, M. (1988). Is diagnostic ability related to relevant training and experience? Professional Psychology: Research and Practice, 19, 50–52.
Lanyon, R. I. (1987). The validity of computer-based personality assessment products: Recommendations for the future. Computers in Human Behavior, 3, 225–238.
Lee, D. Y., Barak, A., Uhlemann, M. R., & Patsula, P. (1995). Effects of preinterview suggestion on counselor memory, clinical impression, and confidence in judgments. Journal of Clinical Psychology, 51, 666–675.
Leli, D. A., & Filskov, S. B. (1981). Clinical-actuarial detection and description of brain impairment with the W-B Form I. Journal of Clinical Psychology, 37, 623–629.
Leli, D. A., & Filskov, S. B. (1984). Clinical detection of intellectual deterioration associated with brain damage. Journal of Clinical Psychology, 40, 1435–1441.
Levenberg, S. B. (1975). Professional training, psychodiagnostic skill, and Kinetic Family Drawings. Journal of Personality Assessment, 39, 389–393.
Lidz, C. W., Mulvey, E. P., & Gardner, W. (1993). The accuracy of predictions of violence to others. Journal of the American Medical Association, 269, 1007–1011.
Lilienfeld, S. O., Wood, J. M., & Garb, H. N. (2000). The scientific status of projective techniques. Psychological Science in the Public Interest, 1, 27–66.
Lilienfeld, S. O., Wood, J. M., & Garb, H. N. (2001, May). What’s wrong with this picture? Scientific American, 80–87.
Lindzey, G. (1965). Seer versus sign. Journal of Experimental Research in Personality, 1, 17–26.
Loftus, E. F. (1993). The reality of repressed memories. American Psychologist, 48, 518–537.
Logue, M. B., Sher, K. J., & Frensch, P. A. (1992). Purported characteristics of adult children of alcoholics: A possible “Barnum Effect.” Professional Psychology: Research and Practice, 23, 226–232.
Lueger, R. J., & Petzel, T. P. (1979). Illusory correlation in clinical judgment: Effects of amount of information to be processed. Journal of Consulting and Clinical Psychology, 47, 1120–1121.
Luft, J. (1950). Implicit hypotheses and clinical predictions. Journal of Abnormal and Social Psychology, 45, 756–760.
Marchese, M. C. (1992). Clinical versus actuarial prediction: A review of the literature. Perceptual and Motor Skills, 75, 583–594.
Marshall, D. B., & English, D. J. (2000). Neural network modeling of risk assessment in child protective services. Psychological Methods, 5, 102–124.
Matarazzo, J. D. (1986). Computerized clinical psychological test interpretations: Unvalidated plus all mean and no sigma. American Psychologist, 41, 14–24.
McFall, R. M., & Treat, T. A. (1999). Quantifying the information value of clinical assessments with signal detection theory. Annual Review of Psychology, 50, 215–241.
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis: University of Minnesota Press.
Meehl, P. E. (1959). A comparison of clinicians with five statistical methods of identifying psychotic MMPI profiles. Journal of Counseling Psychology, 6, 102–109.
Meehl, P. E. (1986). Causes and effects of my disturbing little book. Journal of Personality Assessment, 50, 370–375.
Meehl, P. E. (1997). Credentialed persons, credentialed knowledge. Clinical Psychology: Science and Practice, 4, 91–98.
Meyer, G. J. (2001). Evidence to correct misperceptions about Rorschach norms. Clinical Psychology: Science and Practice, 8, 389–396.
Monahan, J., Steadman, H. J., Appelbaum, P. S., Robbins, P. C., Mulvey, E. P., Silver, E., Roth, L. H., & Grisso, T. (2000). Developing a clinically useful actuarial tool for assessing violence risk. British Journal of Psychiatry, 176, 312–319.
Mossman, D. (1994). Assessing predictions of violence: Being accurate about accuracy. Journal of Consulting and Clinical Psychology, 62, 783–792.
Motta, R. W., Little, S. G., & Tobin, M. I. (1993). The use and abuse of human figure drawings. School Psychology Quarterly, 8, 162–169.
Mowrey, J. D., Doherty, M. E., & Keeley, S. M. (1979). The influence of negation and task complexity on illusory correlation. Journal of Abnormal Psychology, 88, 334–337.
Muller, M. J., & Davids, E. (1999). Relationship of psychiatric experience and interrater reliability in assessment of negative symptoms. Journal of Nervous and Mental Disease, 187, 316–318.
Murdock, N. L. (1988). Category-based effects in clinical judgment. Counseling Psychology Quarterly, 1, 341–355.
Nadler, E. B., Fink, S. L., Shontz, F. C., & Brink, R. W. (1959). Objective scoring vs. clinical evaluation of the Bender-Gestalt. Journal of Clinical Psychology, 15, 39–41.
Naglieri, J. A., McNeish, T. J., & Bardos, A. N. (1991). Draw-A-Person: Screening procedure for emotional disturbance. Austin, TX: ProEd.
O’Brien, W. H. (1995). Inaccuracies in the estimation of functional relationships using self-monitoring data. Journal of Behavior Therapy and Experimental Psychiatry, 26, 351–357.
Ofshe, R., & Watters, E. (1994). Making monsters: False memories, psychotherapy, and sexual hysteria. New York: Scribner’s.
Oskamp, S. (1962). The relationship of clinical experience and training methods to several criteria of clinical prediction. Psychological Monographs, 76(28, Whole No. 547).
Oskamp, S. (1965). Overconfidence in case-study judgments. Journal of Consulting Psychology, 29, 261–265.
Price, R. K., Spitznagel, E. L., Downey, T. J., Meyer, D. J., & Risk, N. K. (2000). Applying artificial neural network models to clinical decision making. Psychological Assessment, 12, 40–51.
Rice, M. E., & Harris, G. T. (1995). Violent recidivism: Assessing predictive validity. Journal of Consulting and Clinical Psychology, 63, 737–748.
Robiner, W. N. (1978). An analysis of some of the variables influencing clinical use of the Bender-Gestalt. Unpublished manuscript.
Rosen, G. M. (1975). On the persistence of illusory correlations with the Rorschach. Journal of Abnormal Psychology, 84, 571–573.
Rosen, G. M. (1976). “Associative homogeneity” may affect the persistence of illusory correlations but does not account for their occurrence. Journal of Abnormal Psychology, 85,
Schaeffer, R. W. (1964). Clinical psychologists’ ability to use the Draw-A-Person Test as an indicator of personality adjustment. Journal of Consulting Psychology, 28,
Schinka, J. A., & Sines, J. O. (1974). Correlates of accuracy in personality assessment. Journal of Clinical Psychology, 30, 374–377.
Schlundt, D. G., & Bell, C. (1987). Behavioral assessment of eating patterns and blood glucose in diabetes using the self-monitoring analysis system. Behavior Research Methods, Instruments, and Computers, 19, 215–223.
Schmidt, L. D., & McGowan, J. F. (1959). The differentiation of human figure drawings. Journal of Consulting Psychology, 23, 129–133.
Shiffman, S. (1993). Assessing smoking patterns and motives. Journal of Consulting and Clinical Psychology, 61, 732–742.
Silverman, L. H. (1959). A Q-sort study of the validity of evaluations made from projective techniques. Psychological Monographs, 73(7, Whole No. 477).
Slovic, P., & Fischhoff, B. (1977). On the psychology of experimental surprises. Journal of Experimental Psychology: Human Perception and Performance, 3, 544–551.
Snyder, D. K. (2000). Computer-assisted judgment: Defining strengths and liabilities. Psychological Assessment, 12, 52–60.
Snyder, D. K., Widiger, T. A., & Hoover, D. W. (1990). Methodological considerations in validating computer-based test interpretations: Controlling for response bias. Psychological Assessment, 2, 470–477.
Soskin, W. F. (1954). Bias in postdiction from projective tests. Journal of Abnormal and Social Psychology, 49, 69–74.
Starr, B. J., & Katkin, E. S. (1969). The clinician as an aberrant actuary: Illusory correlation and the Incomplete Sentences Blank. Journal of Abnormal Psychology, 74, 670–675.
Stelmachers, Z. T., & McHugh, R. B. (1964). Contribution of stereotyped and individualized information to predictive accuracy. Journal of Consulting Psychology, 28, 234–242.
Stricker, G. (1967). Actuarial, naive clinical, and sophisticated clinical prediction of pathology from figure drawings. Journal of Consulting Psychology, 31, 492–494.
Strohmer, D. C., Shivy, V. A., & Chiodo, A. L. (1990). Information processing strategies in counselor hypothesis testing: The role of selective memory and expectancy. Journal of Counseling Psychology, 37, 465–472.
Swensen, C. H. (1957). Empirical evaluations of human figure drawings. Psychological Bulletin, 54, 431–466.
Thomas, G. V., & Jolley, R. P. (1998). Drawing conclusions: A reexamination of empirical and conceptual bases for psychological evaluation of children from their drawings. British Journal of Clinical Psychology, 37, 127–139.
Trull, T. J., & Phares, E. J. (2001). Clinical Psychology (6th ed.). Belmont, CA: Wadsworth.
Turner, D. R. (1966). Predictive efficiency as a function of amount of information and level of professional experience. Journal of Projective Techniques and Personality Assessment, 30, 4–11.
Vanderploeg, R. D., Sison, G. F. P., Jr., & Hickling, E. J. (1987). A reevaluation of the use of the MMPI in the assessment of combat-related Posttraumatic Stress Disorder. Journal of Personality Assessment, 51, 140–150.
Walker, C. D., & Linden, J. D. (1967). Varying degrees of psychological sophistication in the interpretation of sentence completion data. Journal of Clinical Psychology, 23, 229–231.
Walker, E., & Lewine, R. J. (1990). Prediction of adult-onset schizophrenia from childhood home movies of the patients. American Journal of Psychiatry, 147, 1052–1056.
Waller, N. G., & Meehl, P. E. (1998). Multivariate taxometric procedures. Thousand Oaks, CA: Sage.
Waller, R. W., & Keeley, S. M. (1978). Effects of explanation and information feedback on the illusory correlation phenomenon. Journal of Consulting and Clinical Psychology, 46, 342–343.
Walters, G. D., White, T. W., & Greene, R. L. (1988). Use of the MMPI to identify malingering and exaggeration of psychiatric symptomatology in male prison inmates. Journal of Consulting and Clinical Psychology, 56, 111–117.
Wanderer, Z. W. (1969). Validity of clinical judgments based on human figure drawings. Journal of Consulting and Clinical Psychology, 33, 143–150.
Watson, C. G. (1967). Relationship of distortion to DAP diagnostic accuracy among psychologists at three levels of sophistication. Journal of Consulting Psychology, 31, 142–146.
Waxer, P. (1976). Nonverbal cues for depth of depression: Set versus no set. Journal of Consulting and Clinical Psychology, 44,
Wedding, D. (1983). Clinical and statistical prediction in neuropsychology. Clinical Neuropsychology, 5, 49–55.
Weiss, J. H. (1963). The effect of professional training and amount and accuracy of information on behavioral prediction. Journal of Consulting Psychology, 27, 257–262.
Whitehead, W. C. (1985). Clinical decision making on the basis of Rorschach, MMPI, and automated MMPI report data (Doctoral dissertation, University of Texas at Southwestern Medical Center at Dallas, 1985). Dissertation Abstracts International, 46(08), 2828B.
Widiger, T. A. (2001). The best and the worst of us? Clinical Psychology: Science and Practice, 8, 374–377.
Wiggins, J. S. (1973). Personality and prediction: Principles of personality assessment. Reading, MA: Addison-Wesley.
Wiggins, J. S. (1981). Clinical and statistical prediction: Where are we and where do we go from here? Clinical Psychology Review, 1, 3–18.
Wood, J. M., Nezworski, M. T., Garb, H. N., & Lilienfeld, S. O. (2001a). Problems with the norms of the Comprehensive System for the Rorschach: Methodological and conceptual considerations. Clinical Psychology: Science and Practice, 8, 397–402.
Wood, J. M., Nezworski, M. T., Garb, H. N., & Lilienfeld, S. O. (2001b). The misperception of psychopathology: Problems with the norms of the Comprehensive System for the Rorschach. Clinical Psychology: Science and Practice, 8, 350–373.
Wood, J. M., Garb, H. N., Lilienfeld, S. O., & Nezworski, M. T. (2002). Clinical assessment. Annual Review of Psychology, 53, 519–543.