Clinical Versus Actuarial Prediction Research Paper

Paul Meehl’s book Clinical Versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence (Meehl 1954) concluded that the prediction of numerical criterion variables of psychological interest (e.g., faculty ratings of graduate students who had just obtained a Ph.D.) from numerical predictor variables (e.g., scores on the Graduate Record Examination, grade point averages, ratings of letters of recommendation) is better done by a proper linear model than by the clinical intuition of people presumably skilled in such prediction. The point of this research paper is to review summaries and conclusions subsequent to Meehl’s original one and to present evidence that even what can be termed ‘improper’ linear models (Dawes 1979) often yield predictions superior to human intuition.

1. Type Of Statistical Models

A proper linear model is one in which the weights given to the predictor variables are chosen so as to optimize the relationship between the prediction and the criterion. Simple regression analysis, where the predictor variables are weighted in order to maximize the correlation between the resulting weighted composite and the actual criterion, is the most common example. Discriminant function analysis is another example; weights are given to the predictor variables in such a way that the resulting linear composites maximize the discrepancy between two or more groups. Ridge regression analysis, yet another example, attempts to assign weights in such a way that the linear composites correlate maximally with the criterion of interest in a new set of data.
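As a concrete illustration of such proper models, the following sketch fits ordinary least-squares and ridge weights to a small synthetic data set. It is a minimal sketch, not anything from the paper itself: the data, variable names, sample size, and ridge penalty are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: three predictors (say, a test score, a grade point
# average, and a rating of recommendation letters) and one criterion.
# All values here are made up for illustration.
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(scale=1.0, size=200)

# Ordinary least squares: weights chosen to maximize the correlation
# between the weighted composite X @ w and the criterion in this sample.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge regression: the penalty lam shrinks the weights, trading a little
# fit in this sample for better prediction in a new set of data.
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print("OLS weights:  ", np.round(w_ols, 3))
print("Ridge weights:", np.round(w_ridge, 3))
```

In both cases the weights are ‘proper’ in this paper’s sense: they are estimated from the data to optimize (or regularize) the fit, rather than chosen by fiat.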

2. Review Of The Empirical Findings

Meehl was concerned primarily with statistical vs. clinical methods for integrating information; thus, he compared instances in which both types of prediction had been made on the basis of exactly the same data. (He also insisted that the accuracy of the statistical model should not be checked on the same data on which it was derived—or that the sample size be so large that the model would not appear superior merely by capitalizing on chance fluctuations.) Twelve years later, Jack Sawyer (1966) published a review of about 45 studies; again, in none was clinical prediction superior. Unlike Meehl, Sawyer also included studies in which the clinician had access to more information than that used in a statistical model—for example, interviews of the people about whom the predictions were made, or interviews by experts who had access to the statistical model information prior to the interview. Such interviews did not improve the clinical predictions. In fact, the predictions were better when the opinions of the interviewers were ignored.

A prototypical study by Carroll et al. (1982) supports Sawyer’s conclusion. A Pennsylvania parole board considered about 25 percent of the 743 parolees to be failures within one year of being released, for reasons such as being recommitted to prison, absconding, being apprehended on a criminal charge, or committing a technical parole violation. A parole board interviewer’s ratings predicted none of these outcomes; the largest correlation was only 0.06. In contrast, a three-variable model based on the type of offense that had led to imprisonment, the number of past convictions, and the number of noncriminal violations of prison rules did have modest predictability, correlating about 0.22, a result consistent with earlier findings that actuarial predictions based on prior record correlate about 0.30 with outcomes across a large number of settings. When parolees were convicted of new offenses, the seriousness of their crimes correlated 0.27 with the interviewers’ ratings of assaultive potential, but a simple dichotomous evaluation of past heroin use correlated 0.46. All parole board interviewers had access to all this statistical information, but did worse.

None of these correlations is particularly high, for three reasons: first, the sample is highly select, being limited to those who have been convicted of a crime; second, not all the parolees who committed crimes were caught; and third, these types of behaviors are not as predictable as we believe they are or would like them to be. The difference in the effectiveness of actuarial vs. clinical prediction is, however, clear.

Moreover, this difference is consistent with comparisons of actuarial vs. clinical methods for predicting violence (see, e.g., Werner et al. 1984, Monahan 1997). An important qualification: the best prediction in general is that neither violence nor criminal behavior will be repeated. Although the general ‘base rate’ prediction is that people will not repeat problems, judges—professional and nonprofessional alike—have a bias to believe that repetition is common. The studies show that judgments about who is more likely than whom to repeat are much better made on an actuarial than on a clinical basis.

After his book had been out about 30 years, Meehl (1986) was able to conclude: ‘There is no controversy in social science which shows such a large body of qualitatively diverse studies coming out so uniformly in the same direction as this one.’ Since that time, even more evidence has accumulated in favor of Meehl’s generalization and practical conclusion; the 110 studies that Dawes et al. (1989) reviewed favored it. Subsequently, Grove and Meehl (1996) published a meta-analysis involving even more studies. Their conclusion (Grove and Meehl 1996, p. 293) was: ‘The clinical method relies on human judgment that is based on informal contemplation and, sometimes, discussion with others (e.g., case conferences). The mechanical method involves a formal, algorithmic, objective procedure (e.g., equation) to reach the decision. Empirical comparison of the accuracy of the two methods (136 studies over a wide range of predictands) shows that the mechanical method is almost invariably equal to or superior to the clinical method.’

There are four logical relationships possible between the information set on which a clinician or expert makes a prediction and the information set on which a formal model (e.g., an equation) is based. These sets may be identical, which was the original requirement for a comparison in Meehl’s 1954 book. Or one information set may be a subset of the other; in the studies reviewed by Sawyer and in almost all subsequent studies having this structure, the set on which the model is based is a subset of the set available to the clinician (who, for example, is allowed to supplement a sparse information set with an interview). In the field of psychology, the model prediction has always been superior. There are, however, exceptions in the field of medicine, in situations where the clinical physician has access to more information than is used by the model (not in situations where the inputs are identical). Even in these, however, the models may be modified on the basis of interviewing the clinicians themselves to ‘distill’ what they are responding to, so that once again the statistical prediction becomes superior. For example, the predictive system termed APACHE-II did not predict which critically ill patients would survive as well as physicians did, yet the modified model termed APACHE-III did predict survival over the first 24 hours better than the physicians (Knaus et al. 1991). The final possible relationship is that the information sets overlap without either containing the other, a case that has not been studied much.

3. Explanation For The Empirical Findings

Why the consistent results? The answer involves an understanding of which factors favor the linear model and which disfavor the intuitive prediction. Whatever additional factors there may be that disfavor the model or favor the clinician are outweighed by the former in almost all contexts.

First, consider the statistical prediction. Each instance to be predicted is characterized in terms of aspects that allow its location in a category or along a dimension. These categories and locations have been found in the past to be generally predictive. Aspects that have no such predictive power are automatically ignored. The model involves the weighting of the predictive aspects. Moreover, these aspects are often diverse, e.g., an undergraduate grade point average, a number of prison violations, an instance of violence that led to hospitalization (even a rating routinely made on the first day of hospitalization). The statistical model automatically makes the predictive variables comparable by assigning weights to integrate them.
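In symbols (standard notation, not notation taken from the original paper), the statistical prediction is simply a weighted sum of standardized predictors:

$$\hat{y} \;=\; \sum_{i=1}^{k} w_i z_i ,$$

where each $z_i$ is a predictor rescaled to mean 0 and standard deviation 1, so that grade point averages, prison violations, and hospital ratings become commensurable, and each weight $w_i$ carries both the direction and the importance of its predictor.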

Now, consider the drawbacks of attempting an intuitive integration of the information. Suppose that we are not even interested in making an optimal prediction, just in integrating information from diverse and incomparable dimensions. Suppose, for example, we were deciding between two jobs, where the important considerations are pay and enjoyment of the activities involved. It may be very easy, knowing our preferences for different types of activities, to judge which job will be more enjoyable. But suppose job A pays more than job B, yet is less enjoyable. Which should be chosen? We must weight the two dimensions—at least implicitly—if we are to make a choice. What psychologists have found (e.g., Svenson 1992, Langer 1994) is that in such conflicting-dimension situations people generally search for reasons to dismiss one or another of the dimensions as ‘not really that important.’ Then there is no conflict between dimensions, and all that must be done is to make an ordinal judgment, again of the form ‘more is better.’
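The explicit alternative is easy to state in code. The sketch below (with entirely made-up jobs, numbers, and weights) standardizes each dimension and applies explicit weights instead of dismissing one dimension:

```python
# Hypothetical choice between two jobs on two incomparable dimensions.
# All figures and weights are illustrative assumptions.
jobs = {
    "A": {"pay": 72_000, "enjoyment": 5.0},  # pays more, less enjoyable
    "B": {"pay": 60_000, "enjoyment": 8.0},
}
weights = {"pay": 0.4, "enjoyment": 0.6}     # explicit, not dismissed

def standardize(values):
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / sd if sd else 0.0 for v in values]

names = list(jobs)
z = {d: standardize([jobs[n][d] for n in names]) for d in weights}
scores = {n: sum(weights[d] * z[d][i] for d in weights)
          for i, n in enumerate(names)}
print(max(scores, key=scores.get))  # "B" under these assumed weights
```

The point is not that 0.4 and 0.6 are the right weights; it is that once any explicit weights are used, both dimensions enter the decision instead of one being rationalized away.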

Now, consider the additional complications of trying to decide on an intuitive basis which of two instances is more predictive of something. While we might at least have some insight—explicit or implicit—into how we assess differences and how much to care about them, we often have less insight into how to assess differences and weight them in order to predict. (Feedback in which all outcomes—e.g., of success or failure—are observed without being affected by our judgment is necessary, but not sufficient, for learning such weights; see Einhorn and Hogarth (1978).)

As noted, statistical integration has the advantage that weighting obviates the problem of conflicting dimensions. The question then arises of whether the weighting system need be optimal in order to maintain the advantage. The answer—based on mathematical considerations, simulations, and empirical investigation—is no. The robust result is that so long as the dimensions are weighted in the correct direction, ad hoc weighting (e.g., ‘intuitive’ weighting, unit weighting, or even weights chosen according to some random sampling scheme) yields results close enough to those provided by optimal weighting that the resulting linear composites still outperform clinical intuition, especially when the predictive variables tend to be positively correlated with each other. Composites based on such nonoptimal weights have been termed ‘improper linear models’ (Dawes 1979). In fact, not only may such models yield predictions similar enough to optimal models that they outperform clinical judgment, but they may even outperform optimal models on cross-validation, because they are not subject to the ‘overfitting’ problem that plagues many optimal models. For example, the robustness of unit-weighted models on cross-validation was noted as far back as the 1930s by Wilks (1938). The mathematical rationale for this robustness of unit weighting has been provided by Wainer (1976) and Wainer and Thissen (1976).
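One standard way to see why (an argument in the spirit of Wainer’s analysis, not a derivation quoted from this paper): for standardized predictors $\mathbf{z}$ with correlation matrix $\mathbf{R}$, the correlation between any two linear composites is

$$\operatorname{corr}\!\left(\mathbf{a}^{\top}\mathbf{z},\; \mathbf{b}^{\top}\mathbf{z}\right) \;=\; \frac{\mathbf{a}^{\top}\mathbf{R}\,\mathbf{b}}{\sqrt{\mathbf{a}^{\top}\mathbf{R}\,\mathbf{a}}\;\sqrt{\mathbf{b}^{\top}\mathbf{R}\,\mathbf{b}}} .$$

When the predictors are positively intercorrelated and both weight vectors point in the correct directions, this correlation is typically close to 1, so a unit-weighted composite is nearly collinear with the optimally weighted one and predicts nearly as well, while being immune to estimation error in the weights.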

In fact, improper models such as unit-weighted models may be particularly advantageous in situations where models developed in one context are to be applied in a slightly different context—which many of us (Dawes 1997) believe to be the norm in social science—as opposed to applying a model developed on one sample of a particular population to another sample drawn from that exact same population. (‘Cross-validation’ actually refers to such a subsequent application to a sample from the exact same population; a far more descriptive term would be simply ‘validation,’ and what is commonly termed a ‘validation’ sample should be termed a ‘development’ sample. The point here is that when we move across contexts that vary to some—perhaps known, perhaps unknown—degree, even the estimate based on what is standardly termed ‘cross-validation’ may be overly optimistic.)

An empirical overview of the success of improper models in general is provided by Dawes (1979). Perhaps the most famous example of an improper model is the one provided by Goldberg (1965) to predict a diagnosis of psychosis vs. neurosis from MMPI profiles. The unit-weighted composite obtained by adding together three scale scores indicating psychosis (L, Pa, Sc) and subtracting two indicating neurosis (Hy, Pt) not only ‘outperformed all diagnosticians’ (Goldberg 1965, p. 24) but was more stable across subsamples than were complex nonlinear scores—and it was not for want of trying enough of the latter.
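Goldberg’s rule is simple enough to state directly in code. The sketch below assumes the five MMPI scale scores are available as T-scores; the cutoff of 45 is the value usually cited for this rule, but treat it here as an assumption rather than as a detail confirmed by this paper.

```python
def goldberg_index(L, Pa, Sc, Hy, Pt):
    """Unit-weighted MMPI composite (Goldberg 1965): add the three
    psychosis-leaning scales, subtract the two neurosis-leaning ones.
    Inputs are assumed to be T-scores."""
    return (L + Pa + Sc) - (Hy + Pt)

def classify(L, Pa, Sc, Hy, Pt, cutoff=45):
    # The cutoff of 45 is the commonly cited value; it is an
    # assumption here, not a figure taken from this paper.
    return "psychotic" if goldberg_index(L, Pa, Sc, Hy, Pt) >= cutoff else "neurotic"

print(classify(L=50, Pa=65, Sc=70, Hy=60, Pt=55))  # index 70 -> "psychotic"
```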

Later, Dawes and Corrigan (1974) demonstrated that improper weights (in fact, two weighting systems chosen on a random basis except for the direction of the weights) not only outperformed clinical judgment in the MMPI diagnosis problem and several others, but did as well as the linear composites based on the diagnostic experts’ judgments. Unit weighting was even better.
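A small simulation in the spirit of these demonstrations (a sketch under assumed data-generating conditions, not a reproduction of Dawes and Corrigan’s study) makes the point concrete: with positively intercorrelated predictors, weights fit by least squares on a training sample, unit weights, and random positive weights all validate at about the same level on fresh data.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n_train, n_test = 5, 100, 10_000

# Positively intercorrelated predictors (assumed correlation 0.3).
R = np.full((k, k), 0.3) + 0.7 * np.eye(k)
chol = np.linalg.cholesky(R)
true_w = np.array([0.4, 0.3, 0.3, 0.2, 0.1])  # assumed true weights

def sample(n):
    X = rng.normal(size=(n, k)) @ chol.T
    return X, X @ true_w + rng.normal(scale=1.5, size=n)

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)

def validation_corr(w):
    return np.corrcoef(X_te @ w, y_te)[0, 1]

w_ols, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)  # proper weights
w_unit = np.ones(k)                                  # unit weights
w_rand = rng.uniform(0.1, 1.0, size=k)               # random, correct sign

for name, w in [("OLS", w_ols), ("unit", w_unit), ("random", w_rand)]:
    print(f"{name:6s} validation r = {validation_corr(w):.3f}")
```

Typically all three validation correlations land within a few hundredths of one another, which is the ‘robust beauty’ of improper linear models in miniature.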

4. Conclusions And Recommendations

Thus, ‘the whole trick is to know what variables to look at and then to know how to add’ (Dawes and Corrigan 1974, p. 105). Of course, there are some contexts where configural, multiplicative, or even more complicated models are more appropriate than simple additive models; such models may, for example, be found in the area of predator–prey population dynamics. But the decision making discussed here involves the prediction of important human outcomes where—although the human experience may be complex—what can best be distilled from it are simple predictive variables for which more (or less) is better (e.g., test scores, indicators of past performance, past criminal or psychiatric record). Not only do we find such simple monotone relationships in our studies, but we search for such relationships and tend to code our social world in terms of variables capturing monotone relationships to what is important to us. (Occasionally, a predictor variable has a single-peaked relationship to the criterion, as when moderate aggression is most desirable in a businessperson; such a variable is easily transformed into a monotone variable by evaluating distance from the ideal, as in the sketch below.) When there are strong scientific reasons for hypothesizing a complex model, naturally they should not be ignored. In the absence of such theory or evidence, however, the claim that it is possible to construct a valid nonadditive model ‘in our head’—particularly one representing some sort of valid, ineffable intuition—is pure hubris.
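The transformation just mentioned for single-peaked predictors is a one-liner; the ideal point below is an assumed illustrative value, not one taken from the paper.

```python
# Convert a single-peaked predictor (e.g., aggression, where a moderate
# level is best) into a monotone one: larger is now always better.
IDEAL = 5.0  # assumed ideal level, for illustration only

def monotone(x, ideal=IDEAL):
    return -abs(x - ideal)  # 0 at the ideal, more negative farther away

print([monotone(x) for x in (2.0, 5.0, 8.0)])  # [-3.0, 0.0, -3.0]
```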

In contrast, an understanding of the research reviewed in this paper leads to ‘an awareness of the modest results that are often achieved by even the best methods, [an awareness which] can help to counter unrealistic faith in our predictive powers and our understanding of human behavior. It may well be worth exchanging inflated beliefs for an unsettling sobriety, if the result is openness to new approaches and variables that ultimately increase our explanatory and predictive powers’ (Dawes et al. 1989, p. 1673).

For the most recent survey and analysis, see Swets et al. (2000).

Bibliography:

  1. Carroll J S, Wiener R L, Coates D, Galegher J, Alibrio J J 1982 Evaluation, diagnosis, and prediction in parole decision making. Law and Society Review 17: 199–228
  2. Dawes R M 1979 The robust beauty of improper linear models in decision making. American Psychologist 34: 571–82
  3. Dawes R M 1997 Qualitative consistency masquerading as quantitative fit. In: Dalla Chiara M L, Doets K, Mundici D (eds.) Structures and Norms in Science. Kluwer, Dordrecht, The Netherlands, pp. 387–94
  4. Dawes R M, Corrigan B 1974 Linear models in decision making. Psychological Bulletin 81: 95–106
  5. Dawes R M, Faust D, Meehl P E 1989 Clinical versus actuarial judgment. Science 243: 1668–74
  6. Einhorn H J, Hogarth R M 1978 Confidence in judgment: persistence in the illusion of validity. Psychological Review 85: 395–416
  7. Goldberg L R 1965 Diagnosticians vs. diagnostic signs: the diagnosis of psychosis vs. neurosis from the MMPI. Psychological Monographs 79: 1–28
  8. Grove W M, Meehl P E 1996 Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: the clinical–statistical controversy. Psychology, Public Policy, and Law 2: 293–323
  9. Knaus W A, Wagner D P, Lynn J 1991 Short-term mortality predictions for critically ill hospitalized adults: science and ethics. Science 254: 389–94
  10. Langer E 1994 The illusion of calculated decisions. In: Schank R, Langer E (eds.) Beliefs, Reasoning and Decision Making: Psycho-Logic in Honor of Bob Abelson. L. Erlbaum, Hillsdale, NJ, pp. 33–53
  11. Meehl P E 1954 Clinical Versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence. University of Minnesota Press, Minneapolis, MN
  12. Meehl P E 1986 Causes and effects of my disturbing little book. Journal of Personality Assessment 50: 370–5
  13. Monahan J 1997 Clinical and actuarial predictions of violence. In: Faigman D, Kaye D, Saks M, Sanders J (eds.) Modern Scientific Evidence: the Law and Science of Expert Testimony. West, St. Paul, MN, Vol. 1, pp. 300–18
  14. Sawyer J 1966 Measurement and prediction, clinical and statistical. Psychological Bulletin 66: 178–200
  15. Svenson O 1992 Differentiation and consolidation theory of human decision making: a frame of reference for the study of pre- and post-decision processes. Acta Psychologica 80: 143–68
  16. Swets J A, Dawes R M, Monahan J 2000 Psychological science can improve diagnostic decisions. Psychological Science in the Public Interest 1: 1–26
  17. Wainer H 1976 Estimating coefficients in linear models: it don’t make no nevermind. Psychological Bulletin 83: 213–7
  18. Wainer H, Thissen D 1976 Three steps toward robust regression. Psychometrika 41: 9–34
  19. Werner P D, Rose T L, Yesavage J A, Seeman K 1984 Psychiatrists’ judgments of dangerousness in patients on an acute care unit. American Journal of Psychiatry 141: 263–6
  20. Wilks S S 1938 Weighting systems for linear functions of correlated variables when there is no dependent variable. Psychometrika 3: 23–40